BullishAgent pulls from three distinct EDGAR systems, each optimized for different filing types:
EDGAR Full-Text Search (EFTS)
Primary source for 8-K filings, NT filings, S-1/S-3/424B registration statements, and SC TO (tender offers). EFTS indexes the full text of filings and supports filtering by form type, date range, and keyword. We poll this endpoint on a daily cron to discover new filings.
efts.sec.gov
EDGAR Submissions API
Used for Form 4 insider trades, Schedule 13D/13G institutional filings, and 13F quarterly holdings. The submissions endpoint returns the filing history for a given CIK (company or person) as structured JSON, so discovering filings requires no HTML parsing.
data.sec.gov
EDGAR XBRL / Financial Data API
Supplies structured financial data (revenue, EPS, cash flow) from 10-K and 10-Q filings. The XBRL API exposes the tagged values as machine-readable JSON, so there is no need to parse PDF or HTML financial statements.
data.sec.gov/api/xbrl
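As a rough illustration of what the XBRL API returns, the sketch below pulls full-year values for one tagged concept out of a company facts response. It assumes the `companyfacts` endpoint layout (`facts`, then taxonomy, then concept, then `units`). The helper names `company_facts_url` and `annual_values` are made up for this example, and `sample` is a hand-written stand-in, not real data.

```python
def company_facts_url(cik):
    """Build the company facts URL; data.sec.gov expects a 10-digit, zero-padded CIK."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{int(cik):010d}.json"


def annual_values(company_facts, concept, unit="USD", taxonomy="us-gaap"):
    """Return {period_end_date: value} for entries reported on a 10-K with fp == 'FY'.

    Simplification: a 10-K can repeat prior-year figures, so keying by the
    period end date (not the filing's fiscal year) keeps periods distinct.
    """
    entries = (
        company_facts.get("facts", {})
        .get(taxonomy, {})
        .get(concept, {})
        .get("units", {})
        .get(unit, [])
    )
    return {
        e["end"]: e["val"]
        for e in entries
        if e.get("form") == "10-K" and e.get("fp") == "FY"
    }


# Hand-written stand-in for a response body (values are illustrative only)
sample = {"facts": {"us-gaap": {"Revenues": {"units": {"USD": [
    {"end": "2024-12-31", "val": 1000, "fy": 2024, "fp": "FY", "form": "10-K"},
    {"end": "2025-03-31", "val": 300, "fy": 2025, "fp": "Q1", "form": "10-Q"},
]}}}}}

print(company_facts_url("320193"))
print(annual_values(sample, "Revenues"))
```

In a real run, the dictionary passed to `annual_values` would come from `requests.get(company_facts_url(cik), headers=...).json()`, with the same User-Agent header the other EDGAR endpoints require.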
Every filing goes through the same five-stage pipeline before it appears on BullishAgent:
1. Fetch
Daily cron (6 AM ET) queries EDGAR EFTS for all 8-K filings filed in the past 24 hours. A separate cron handles Form 4 and 13D/13G via the submissions API. 13F quarterly holdings run once per quarter after the filing deadline.
2. Parse
Each filing's HTML or XML is downloaded and parsed. For 8-Ks, we extract the item number, filing entity, and the full text of each item section. Form 4 XML is structured, so ticker, insider name, transaction type, shares, and price are read directly from fields. For 13F, we parse the information table XML for each holding.
3. Classify
The item number (for 8-Ks) or form type (for 13D/13G) is mapped to an event type: exec_change, acquisition, auditor_change, default, equity_issuance, etc. A supplementary rule set checks for keywords that indicate subtype — e.g., an 8-K Item 1.01 that mentions "merger agreement" is classified differently than one mentioning "licensing agreement."
4. Summarize
For 8-K filings with sufficient text, we pass the extracted section text to Claude Haiku with a prompt requesting a single factual sentence describing what happened and why it matters. The AI take is stored alongside the raw filing data. Summarization runs only once per filing — we don't re-summarize.
5. Store & serve
Parsed records are written to MySQL. The web application reads from the database in real time — no intermediate caching layer. Filing feed pages query the most recent N records; stock pages join on ticker to show filing history.
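To make the Parse stage more concrete, here is a minimal sketch of splitting the plain text of an 8-K into item sections. The regular expression and the function name `split_items` are assumptions for illustration; real filings format item headings inconsistently, so a production parser would need more cases than this.

```python
import re

# Matches headings such as "Item 5.02" or "ITEM 1.01." at the start of a line.
ITEM_HEADING = re.compile(r"^\s*item\s+(\d\.\d{2})\b\.?", re.IGNORECASE | re.MULTILINE)


def split_items(text):
    """Map each item number to the text between its heading and the next heading.

    If an item number appears twice, the later section overwrites the earlier one.
    """
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1)] = text[m.end():end].strip()
    return sections


# Made-up filing text, for illustration only
example = (
    "Item 5.02 Departure of Directors or Certain Officers\n"
    "Jane Doe resigned as CFO.\n"
    "Item 9.01 Financial Statements and Exhibits\n"
    "Exhibit 99.1"
)
for item, body in split_items(example).items():
    print(item, "->", body.splitlines()[0])
```

The item number found this way is what feeds the Classify stage, and the section body is what would be passed on for summarization.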
An 8-K filing can disclose almost anything — a CEO departure, a debt default, a new customer contract,
or a going concern opinion. The item number gives a coarse category; we map it to a typed
event that drives how the filing is displayed and filtered.
| Event type | Maps from | AI takes |
|---|---|---|
| other | - | 5,083 |
| reg_fd | 8-K Item 7.01 (Reg FD disclosures) | 2,530 |
| material_event | 8-K Item 8.01 (unclassified material events) | 2,394 |
| exec_change | 8-K Item 5.02 (officer/director changes) | 2,161 |
| agreement | 8-K Item 1.01 (material agreements) | 774 |
| earnings_results | 8-K Item 2.02 (earnings results) | 660 |
| financing | 8-K Item 2.03 (new debt obligations) | 526 |
| late_filing | NT 10-K / NT 10-Q filings | 407 |
| acquisition | 8-K Item 1.01 + merger/acquisition keywords | 196 |
| amendment | 8-K/A amendments to prior filings | 168 |
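The mapping in the table can be sketched as a lookup plus a keyword check. This is a simplified illustration, not the production rule set: the keyword list, the function name `classify`, and the order in which the rules are applied (late filings and amendments first) are all assumptions.

```python
from typing import Optional

# Item number -> event type, taken from the table above
ITEM_EVENT_TYPES = {
    "7.01": "reg_fd",
    "8.01": "material_event",
    "5.02": "exec_change",
    "1.01": "agreement",
    "2.02": "earnings_results",
    "2.03": "financing",
}

# Assumed keyword list; the real rule set is not published in this article
ACQUISITION_KEYWORDS = ("merger agreement", "agreement and plan of merger", "acquisition")


def classify(form_type: str, item: Optional[str] = None, text: str = "") -> str:
    """Return an event type for a filing (or for one item section of an 8-K)."""
    form = form_type.upper()
    if form.startswith("NT "):
        return "late_filing"
    if form == "8-K/A":
        return "amendment"
    event = ITEM_EVENT_TYPES.get(item, "other")
    if event == "agreement" and any(k in text.lower() for k in ACQUISITION_KEYWORDS):
        return "acquisition"
    return event


print(classify("8-K", "1.01", "The Company entered into a Merger Agreement with ..."))
print(classify("8-K", "1.01", "The Company entered into a licensing agreement with ..."))
```

Keeping the item lookup separate from the keyword rules means a new subtype only needs another keyword check, without touching the base mapping.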
BullishAgent uses Claude Haiku (Anthropic's fast, low-cost model) to
generate single-sentence filing summaries — what we call "AI takes." The goal of the
prompt is a factual, information-dense sentence that tells a reader what happened and why
it might matter, without editorializing or speculating beyond what the filing states.
The prompt passes the relevant section text (not the whole filing) and explicitly instructs
the model to:
# Simplified version of the summarization prompt
prompt = """You are summarizing a section of an SEC 8-K filing.
Write exactly one factual sentence (max 180 characters) describing what happened.
Be specific: name the people, dollar amounts, and companies involved.
Do not speculate. Do not use phrases like 'may impact' or 'could affect'.
State facts only.
Filing section:
{section_text}"""
Each AI take is generated once and stored. We do not re-run summarization unless the
filing is reprocessed. For filings with very short or boilerplate text (under ~100 words),
we skip AI summarization and store a rule-based summary instead.
The model sees only the filing text — no ticker price, no market context, no prior AI takes.
This keeps the summaries grounded in what was actually disclosed.
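The skip rule and the length limit described above could be wired together roughly as follows. This is a sketch under stated assumptions: `make_take`, `call_model`, and `rule_based` are hypothetical names, the model call is passed in as a plain function so no particular SDK is implied, and truncating an over-long answer is an illustrative choice, not something the pipeline is known to do.

```python
from typing import Callable

MIN_WORDS = 100  # "under ~100 words" threshold from the text
MAX_CHARS = 180  # character limit stated in the prompt


def make_take(section_text: str,
              prompt_template: str,
              call_model: Callable[[str], str],
              rule_based: Callable[[str], str]) -> str:
    """Return a one-sentence take, calling the model only when there is enough text."""
    if len(section_text.split()) < MIN_WORDS:
        return rule_based(section_text)  # short or boilerplate: skip the model
    take = call_model(prompt_template.format(section_text=section_text)).strip()
    if len(take) > MAX_CHARS:  # assumption: truncate if the model overshoots
        take = take[:MAX_CHARS - 3].rstrip() + "..."
    return take
```

A caller would pass the summarization prompt as `prompt_template` and a thin wrapper around the model client as `call_model`. Because the result is stored once per filing, the function has no retry or caching logic of its own.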
| Filing type | Source | Update frequency | AI summarized |
|---|---|---|---|
| 8-K (all items) | EDGAR EFTS | Daily, 6 AM ET | Yes (item section text) |
| NT 10-K / NT 10-Q | EDGAR EFTS | Daily, 6 AM ET | Yes (late filing context) |
| Form 4 insider trades | EDGAR Submissions API | Daily, 6 AM ET | No (structured data only) |
| SC 13D / 13G | EDGAR Submissions API | Daily, 6 AM ET | Yes (purpose section) |
| 13F quarterly holdings | EDGAR Submissions API | Quarterly (45-day lag) | No (structured data only) |
| S-1 / S-3 / 424B offerings | EDGAR EFTS | Daily, 6 AM ET | Yes (offering summary) |
| SC TO-I / SC TO-T (tender offers) | EDGAR EFTS | Daily, 6 AM ET | Yes |
The SEC provides a free, public REST API for accessing EDGAR data — no API key, no registration,
no cost. It consists of three main systems: the EFTS full-text search API
for searching filing content, the submissions API for looking up filings by
company, and the XBRL financial data API for structured financial statement data.
All three are used in production at BullishAgent.
The only requirement is a descriptive User-Agent
header on every request. Without it, the SEC will block your IP (HTTP 403).
The rate limit is 10 requests per second per IP address.
import time

# Required on every EDGAR API request: use your app name and contact email
headers = {"User-Agent": "MyApp contact@myemail.com"}

# Rate limit: 10 requests/second, so pause briefly between calls
time.sleep(0.1)  # 100 ms between calls caps you at 10 requests/second
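A fixed sleep works, but it also waits when no wait is needed. One possible alternative is a small throttle that only sleeps for the remaining part of the interval. The class name `Throttle` and the default of 8 requests per second (a margin below the 10/second limit) are assumptions for this sketch; the clock and sleep functions are injectable so the behaviour can be checked without real delays.

```python
import time


class Throttle:
    """Block so that successive calls to wait() are at least min_interval apart."""

    def __init__(self, max_per_second=8.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Sleep if the previous call was too recent; return the seconds slept."""
        delay = max(0.0, self._next_allowed - self._clock())
        if delay:
            self._sleep(delay)
        # Schedule the next slot from whichever is later: now or the planned slot.
        self._next_allowed = max(self._clock(), self._next_allowed) + self.min_interval
        return delay


throttle = Throttle()
for _ in range(3):
    throttle.wait()  # call this immediately before each EDGAR request
```

Calling `throttle.wait()` right before every `requests.get(...)` keeps a single process under the limit. Several processes sharing one IP address would still need to coordinate, since the limit is per IP.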
The EDGAR Full-Text Search System (EFTS) at efts.sec.gov
indexes the complete text of all EDGAR filings. You can filter by form type, date range,
company name, or keyword — and get results as JSON without any HTML parsing.
This is the fastest way to find all 8-K filings, NT filings, or activist disclosures
filed in a given time window.
import requests

# Search for all 8-K filings filed between two dates (here, a 7-day window)
url = "https://efts.sec.gov/LATEST/search-index"
params = {
    "forms": "8-K",
    "dateRange": "custom",
    "startdt": "2026-05-18",
    "enddt": "2026-05-25",
    "_source": "file_date,entity_name,file_num,period_of_report",
}
r = requests.get(url, params=params, headers={"User-Agent": "MyApp me@email.com"})
r.raise_for_status()  # fail loudly on HTTP errors such as 403
hits = r.json()["hits"]["hits"]
for h in hits:
    src = h["_source"]
    print(src.get("entity_name"), src.get("file_date"))  # .get() tolerates a missing field

# Also works for other form types, e.g. "forms": "SC 13D", "4", "13F-HR", or "NT 10-K"
# (requests encodes the space for you; write "+" only when building the URL by hand)
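For a daily job, the date window is usually computed, not hard-coded. The helper below is a small sketch of that; the name `efts_params` and its arguments are assumptions, and it only reuses the query parameters shown in the example above.

```python
from datetime import date, timedelta


def efts_params(form, days_back=1, today=None):
    """Build EFTS query parameters for one form type over a trailing date window."""
    end = today or date.today()
    start = end - timedelta(days=days_back)
    return {
        "forms": form,
        "dateRange": "custom",
        "startdt": start.isoformat(),  # YYYY-MM-DD
        "enddt": end.isoformat(),
    }


print(efts_params("8-K", days_back=7, today=date(2026, 5, 25)))
```

Because the filter works on whole dates, consecutive daily runs with `days_back=1` overlap by a day. Deduplicating on the accession number is one way to avoid storing a filing twice.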
The EDGAR submissions API at data.sec.gov/submissions/
returns the filing history of a given entity, identified by its CIK (Central Index Key).
In data.sec.gov URLs the CIK is zero-padded to 10 digits; paths under www.sec.gov/Archives use it
without leading zeros. You can look up a company's CIK on the EDGAR company search page, or in the
SEC's ticker-to-CIK mapping file (company_tickers.json).
import requests

# Apple Inc. CIK = 320193, zero-padded to 10 digits for data.sec.gov
cik = "0000320193"
url = f"https://data.sec.gov/submissions/CIK{cik}.json"
r = requests.get(url, headers={"User-Agent": "MyApp me@email.com"})
data = r.json()

# Recent filings are in data["filings"]["recent"], stored as parallel arrays
filings = data["filings"]["recent"]
for i, form in enumerate(filings["form"]):
    if form == "8-K":
        accession = filings["accessionNumber"][i]
        filed = filings["filingDate"][i]
        print(accession, filed)

# For companies with many filings, older ones are in data["filings"]["files"];
# fetch each additional JSON file listed there
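Since the `recent` block stores each field as a parallel array, it can be convenient to turn it into one record per filing and build the document URL at the same time. The sketch below assumes the response includes `primaryDocument` alongside `form`, `accessionNumber`, and `filingDate`, plus a top-level `cik`; the function name `recent_filings` is made up for this example.

```python
def recent_filings(submission, form_type):
    """Return one dict per recent filing of the given form type, with its document URL."""
    recent = submission["filings"]["recent"]
    cik = int(submission["cik"])  # Archives paths use the CIK without leading zeros
    rows = zip(
        recent["form"],
        recent["accessionNumber"],
        recent["filingDate"],
        recent["primaryDocument"],
    )
    results = []
    for form, accession, filed, primary in rows:
        if form != form_type:
            continue
        folder = accession.replace("-", "")  # folder name is the accession without dashes
        results.append({
            "accession": accession,
            "filed": filed,
            "url": f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{primary}",
        })
    return results
```

Passing the parsed JSON from the request above, `recent_filings(data, "8-K")` would return the same filings the loop prints, each with a URL ready to download.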
Each filing has an accession number in the format 0001234567-26-000001.
Remove the dashes to get the folder name. The filing index lists all documents in the submission —
find the primary document (usually an .htm file) and download it.
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "MyApp me@email.com"}
cik = "320193"
accession = "0000320193-26-000045"
acc_clean = accession.replace("-", "")  # "000032019326000045"
base = f"https://www.sec.gov/Archives/edgar/data/{cik}/{acc_clean}"

# 1. Fetch the filing index to list the documents in the submission
index = requests.get(f"{base}/index.json", headers=HEADERS).json()

# 2. Find the primary .htm document. index.json does not label items by form type,
#    so this takes the first .htm file that is not an index page (a heuristic; the
#    primaryDocument field in the submissions API is more reliable).
names = [f["name"] for f in index["directory"]["item"]]
primary = next(n for n in names if n.endswith(".htm") and "index" not in n)
doc_url = f"{base}/{primary}"

# 3. Download and parse with BeautifulSoup
html = requests.get(doc_url, headers=HEADERS).text
soup = BeautifulSoup(html, "html.parser")
text = soup.get_text(separator="\n", strip=True)
Form 4 filings are structured XML — unlike 8-Ks which are free-form HTML. The XML schema
is consistent across all filers, which makes parsing straightforward. The key fields are
in the nonDerivativeTransaction and
derivativeTransaction elements.
import requests
import xml.etree.ElementTree as ET

# Form 4 XML URL. The XML file name varies by filer, so take it from the filing
# index; the name below is a placeholder pattern.
cik = "1318605"  # Tesla, Inc.'s CIK (the issuer), not Elon Musk's personal CIK
acc_clean = "000131860526000012"
xml_url = f"https://www.sec.gov/Archives/edgar/data/{cik}/{acc_clean}/0001318605-26-000012.xml"
r = requests.get(xml_url, headers={"User-Agent": "MyApp me@email.com"})
root = ET.fromstring(r.text)

ticker = root.findtext(".//issuerTradingSymbol")
name = root.findtext(".//rptOwnerName")
for txn in root.findall(".//nonDerivativeTransaction"):
    code = txn.findtext(".//transactionAcquiredDisposedCode/value")  # 'A' or 'D'
    shares = txn.findtext(".//transactionShares/value")
    price = txn.findtext(".//transactionPricePerShare/value")
    # transactionCode holds its text directly (no <value> child)
    ttype = txn.findtext(".//transactionCode")  # 'P'=purchase, 'S'=sale
    print(ticker, name, ttype, shares, price)
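The same extraction can be exercised offline against a small XML fragment, which is handy for testing a parser without hitting EDGAR. The fragment below is hand-written and heavily trimmed (real Form 4 documents carry many more elements), and the `Trade` dataclass and `parse_form4` function are hypothetical names. The element paths reflect the Form 4 layout as I understand it, where `transactionCode` sits under `transactionCoding` and holds its text directly.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trade:
    ticker: str
    owner: str
    code: str                 # transaction code, e.g. 'P' (purchase) or 'S' (sale)
    acquired_disposed: str    # 'A' or 'D'
    shares: Optional[float]
    price: Optional[float]    # may be absent, e.g. for grants


def _to_float(text):
    return float(text) if text not in (None, "") else None


def parse_form4(xml_text: str) -> List[Trade]:
    """Extract non-derivative transactions from a Form 4 XML document."""
    root = ET.fromstring(xml_text)
    ticker = root.findtext(".//issuerTradingSymbol", default="")
    owner = root.findtext(".//rptOwnerName", default="")
    trades = []
    for txn in root.findall(".//nonDerivativeTransaction"):
        amounts = "transactionAmounts/"
        trades.append(Trade(
            ticker=ticker,
            owner=owner,
            code=txn.findtext("transactionCoding/transactionCode", default=""),
            acquired_disposed=txn.findtext(amounts + "transactionAcquiredDisposedCode/value", default=""),
            shares=_to_float(txn.findtext(amounts + "transactionShares/value")),
            price=_to_float(txn.findtext(amounts + "transactionPricePerShare/value")),
        ))
    return trades


# Hand-written, trimmed fragment for illustration (not a real filing)
SAMPLE = """<ownershipDocument>
  <issuer><issuerTradingSymbol>XYZ</issuerTradingSymbol></issuer>
  <reportingOwner><reportingOwnerId><rptOwnerName>DOE JANE</rptOwnerName></reportingOwnerId></reportingOwner>
  <nonDerivativeTable>
    <nonDerivativeTransaction>
      <transactionCoding><transactionCode>S</transactionCode></transactionCoding>
      <transactionAmounts>
        <transactionShares><value>1000</value></transactionShares>
        <transactionPricePerShare><value>12.50</value></transactionPricePerShare>
        <transactionAcquiredDisposedCode><value>D</value></transactionAcquiredDisposedCode>
      </transactionAmounts>
    </nonDerivativeTransaction>
  </nonDerivativeTable>
</ownershipDocument>"""

for trade in parse_form4(SAMPLE):
    print(trade)
```

Returning `None` for a missing price, instead of raising, reflects that some transactions report no per-share price.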
The SEC enforces a 10 requests per second limit per IP address under their
fair-access policy. Exceeding this results in a temporary block. For bulk data collection,
the SEC also provides pre-built bulk data downloads — full quarterly archives of all EDGAR
filings — which are faster than hitting the API for historical data.
# Bulk filing index: all filings for a given quarter (no rate limit concerns)
# Format: https://www.sec.gov/Archives/edgar/full-index/{year}/QTR{1-4}/company.idx
GET https://www.sec.gov/Archives/edgar/full-index/2026/QTR1/company.idx
# The same directory also holds form.idx (sorted by form type), master.idx
# (sorted by CIK, pipe-delimited), and crawler.idx
# company.idx columns (fixed-width): company name, form type, CIK, date filed, file name
# master.idx columns: CIK | company name | form type | date filed | file name
# For real-time monitoring, use EFTS with a date filter instead of polling submissions
GET https://efts.sec.gov/LATEST/search-index?forms=8-K&dateRange=custom&startdt=2026-05-25&enddt=2026-05-25
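Once a quarterly index has been downloaded, reading it is mostly string splitting. The sketch below parses the pipe-delimited `master.idx` variant (columns: CIK, company name, form type, date filed, file name) and skips its header block. The function name `parse_master_idx` is an assumption, and the sample text is made up to mimic the file's shape.

```python
def parse_master_idx(text):
    """Parse master.idx content into a list of dicts, ignoring the header lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split("|")
        # Data lines have exactly five fields and start with a numeric CIK;
        # this skips the description block, the column names, and the dashes.
        if len(parts) != 5 or not parts[0].strip().isdigit():
            continue
        cik, company, form, filed, filename = (p.strip() for p in parts)
        rows.append({
            "cik": int(cik),
            "company": company,
            "form": form,
            "filed": filed,
            "url": "https://www.sec.gov/Archives/" + filename,
        })
    return rows


# Made-up sample mimicking the layout of master.idx
sample_idx = """Description:           Master Index of EDGAR Dissemination Feed

CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1234567|EXAMPLE CORP|8-K|2026-01-15|edgar/data/1234567/0001234567-26-000001.txt
7654321|SAMPLE HOLDINGS INC|10-K|2026-02-20|edgar/data/7654321/0007654321-26-000010.txt
"""

eight_ks = [r for r in parse_master_idx(sample_idx) if r["form"] == "8-K"]
print(eight_ks)
```

Filtering the parsed rows by `form` gives a historical backfill list without any per-company API calls.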
Building vs using BullishAgent:
The EDGAR API is genuinely good — free, reliable, and comprehensive. The hard parts are not
the fetching: they're the parsing (every filer's HTML is different), the classification
(Item 5.02 with "resigned" signals something different from Item 5.02 with "appointed"),
and the AI summarization at scale. BullishAgent has already solved these for 8-K, Form 4,
13D/13G, and 13F filings — and the result runs daily on the
EDGAR filings feed.
The code above is exactly how we do it.