Chapter 9

How BullishAgent Works with EDGAR

Every filing summary, insider signal, and activist alert you see on BullishAgent starts with raw data from SEC EDGAR. This chapter explains how we fetch, parse, classify, and summarize EDGAR filings — and how you can access the same raw data yourself.

17,425 filings processed
15,721 AI takes generated
8,332 insider trades
8,714 13D / 13G filings

Data sources

BullishAgent pulls from three distinct EDGAR systems, each optimized for different filing types:

EDGAR Full-Text Search (EFTS)
Primary source for 8-K filings, NT filings, S-1/S-3/424B registration statements, and SC TO (tender offers). EFTS indexes the full text of filings and supports filtering by form type, date range, and keyword. We poll this endpoint on a daily cron to discover new filings.
efts.sec.gov
EDGAR Submissions API
Used for Form 4 insider trades, Schedule 13D/13G institutional filings, and 13F quarterly holdings. The submissions endpoint returns all filings for a given CIK (company or person) in structured JSON — no HTML parsing required.
data.sec.gov
EDGAR XBRL / Financial Data API
For structured financial data (revenue, EPS, cash flow) from 10-K and 10-Q filings. EDGAR's XBRL API exposes tagged financial data as machine-readable JSON, so there is no need to parse PDF or HTML financial statements.
data.sec.gov/api/xbrl

The pipeline — fetch to display

Every filing goes through the same five-stage pipeline before it appears on BullishAgent:

1. Fetch
A daily cron (6 AM ET) queries EDGAR EFTS for all 8-K filings filed in the past 24 hours. A separate cron handles Form 4 and 13D/13G via the submissions API. 13F quarterly holdings run once per quarter after the filing deadline.
2. Parse
Each filing's HTML or XML is downloaded and parsed. For 8-Ks, we extract the item number, filing entity, and the full text of each item section. For Form 4, the XML schema is structured: ticker, insider name, transaction type, shares, and price are direct fields. For 13F, we parse the information table XML for each holding.
3. Classify
The item number (for 8-Ks) or form type (for 13D/13G) is mapped to an event type: exec_change, acquisition, auditor_change, default, equity_issuance, etc. A supplementary rule set checks for keywords that indicate subtype. For example, an 8-K Item 1.01 that mentions "merger agreement" is classified differently than one mentioning "licensing agreement."
4. Summarize
For 8-K filings with sufficient text, we pass the extracted section text to Claude Haiku with a prompt requesting a single factual sentence describing what happened and why it matters. The AI take is stored alongside the raw filing data. Summarization runs only once per filing; we don't re-summarize.
5. Store & serve
Parsed records are written to MySQL. The web application reads from the database in real time, with no intermediate caching layer. Filing feed pages query the most recent N records; stock pages join on ticker to show filing history.
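The Parse stage for 8-Ks can be sketched as a split on "Item N.NN" headings. This is a minimal illustration, not the production parser: the function name `split_8k_items` and the heading pattern are assumptions, and real filings vary enough that a single regex will miss some layouts.

```python
import re

# Matches headings such as "Item 5.02" or "ITEM 1.01." at the start of a line.
ITEM_HEADING = re.compile(r"^\s*item\s+(\d\.\d{2})\b\.?", re.IGNORECASE | re.MULTILINE)

def split_8k_items(text: str) -> dict:
    """Return {item_number: section_text} for each Item heading found in text."""
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        # A section runs from the end of its heading to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end].strip()
        # If an item number repeats (e.g. in a table of contents), keep the longest body.
        if len(body) > len(sections.get(m.group(1), "")):
            sections[m.group(1)] = body
    return sections

# Made-up sample text for illustration only.
sample = """Item 5.02 Departure of Directors or Certain Officers
On May 20, Jane Doe resigned as CFO.
Item 9.01 Financial Statements and Exhibits
(d) Exhibits: 99.1 Press release."""

sections = split_8k_items(sample)
```

Each value in `sections` is the text that a later stage would classify and, if long enough, summarize.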

Event classification — how we label filings

An 8-K filing can disclose almost anything — a CEO departure, a debt default, a new customer contract, or a going concern opinion. The item number gives a coarse category; we map it to a typed event that drives how the filing is displayed and filtered.

| Event type | Maps from | AI takes |
|---|---|---|
| other | | 5,083 |
| reg_fd | 8-K Item 7.01 (Reg FD disclosures) | 2,530 |
| material_event | 8-K Item 8.01 (unclassified material events) | 2,394 |
| exec_change | 8-K Item 5.02 (officer/director changes) | 2,161 |
| agreement | 8-K Item 1.01 (material agreements) | 774 |
| earnings_results | 8-K Item 2.02 (earnings results) | 660 |
| financing | 8-K Item 2.03 (new debt obligations) | 526 |
| late_filing | NT 10-K / NT 10-Q filings | 407 |
| acquisition | 8-K Item 1.01 + merger/acquisition keywords | 196 |
| amendment | 8-K/A amendments to prior filings | 168 |
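The mapping in this table can be sketched as a lookup plus a keyword check. The item-to-event pairs come from the table; the function name `classify_filing`, the keyword list, and the rule that amendments and NT forms take precedence over item numbers are assumptions made for illustration.

```python
from typing import Optional

# Item-number mapping taken from the table; anything unlisted falls through to "other".
ITEM_TO_EVENT = {
    "7.01": "reg_fd",
    "8.01": "material_event",
    "5.02": "exec_change",
    "1.01": "agreement",
    "2.02": "earnings_results",
    "2.03": "financing",
}

# Illustrative keywords only; a real rule set would be longer.
ACQUISITION_KEYWORDS = ("merger agreement", "agreement and plan of merger", "acquisition")

def classify_filing(form_type: str, item: Optional[str] = None, text: str = "") -> str:
    """Map a form type, 8-K item number, and section text to an event type."""
    form = form_type.upper().strip()
    if form.startswith("NT "):
        return "late_filing"
    if form == "8-K/A":
        return "amendment"
    event = ITEM_TO_EVENT.get(item or "", "other")
    # Item 1.01 is upgraded to "acquisition" when merger language appears.
    if event == "agreement" and any(k in text.lower() for k in ACQUISITION_KEYWORDS):
        return "acquisition"
    return event
```

For example, `classify_filing("8-K", "1.01", "entered into a Merger Agreement")` returns `"acquisition"`, while the same item with licensing language stays `"agreement"`.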

AI take generation — Claude Haiku

BullishAgent uses Claude Haiku (Anthropic's fast, low-cost model) to generate single-sentence filing summaries — what we call "AI takes." The goal of the prompt is a factual, information-dense sentence that tells a reader what happened and why it might matter, without editorializing or speculating beyond what the filing states.

The prompt passes the relevant section text (not the whole filing) and explicitly instructs the model to write one specific, factual sentence without speculation:

    # Simplified version of the summarization prompt
    prompt = """You are summarizing a section of an SEC 8-K filing.
    Write exactly one factual sentence (max 180 characters) describing what happened.
    Be specific: name the people, dollar amounts, and companies involved.
    Do not speculate. Do not use phrases like 'may impact' or 'could affect'.
    State facts only.

    Filing section:
    {section_text}"""

Each AI take is generated once and stored. We do not re-run summarization unless the filing is reprocessed. For filings with very short or boilerplate text (under ~100 words), we skip AI summarization and store a rule-based summary instead.

The model sees only the filing text — no ticker price, no market context, no prior AI takes. This keeps the summaries grounded in what was actually disclosed.
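The logic around the model call can be sketched without the call itself. The 100-word threshold and 180-character cap come from the text above; the helper names `build_prompt` and `clamp_take`, the abbreviated template, and the idea of trimming an over-long response are assumptions, not a description of the production code.

```python
from typing import Optional

MIN_WORDS = 100  # sections shorter than this skip the model
MAX_CHARS = 180  # length cap stated in the prompt

# Abbreviated template; the model call itself is deliberately left out.
PROMPT_TEMPLATE = (
    "You are summarizing a section of an SEC 8-K filing. "
    "Write exactly one factual sentence (max 180 characters) describing what happened. "
    "State facts only.\n\nFiling section:\n{section_text}"
)

def build_prompt(section_text: str) -> Optional[str]:
    """Return a prompt, or None when the section is too short to summarize."""
    if len(section_text.split()) < MIN_WORDS:
        return None  # caller falls back to a rule-based summary
    return PROMPT_TEMPLATE.format(section_text=section_text.strip())

def clamp_take(take: str, max_chars: int = MAX_CHARS) -> str:
    """Collapse whitespace and trim an over-long take at a word boundary."""
    take = " ".join(take.split())
    if len(take) <= max_chars:
        return take
    return take[: max_chars - 3].rsplit(" ", 1)[0] + "..."
```

A caller would check `build_prompt(...)` for `None` before contacting the model, then pass the response through `clamp_take` before storing it.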

What's covered — and update frequency

| Filing type | Source | Update frequency | AI summarized |
|---|---|---|---|
| 8-K (all items) | EDGAR EFTS | Daily, 6 AM ET | Yes (item section text) |
| NT 10-K / NT 10-Q | EDGAR EFTS | Daily, 6 AM ET | Yes (late filing context) |
| Form 4 insider trades | EDGAR Submissions API | Daily, 6 AM ET | No (structured data only) |
| SC 13D / 13G | EDGAR Submissions API | Daily, 6 AM ET | Yes (purpose section) |
| 13F quarterly holdings | EDGAR Submissions API | Quarterly (45d lag) | No (structured data only) |
| S-1 / S-3 / 424B offerings | EDGAR EFTS | Daily, 6 AM ET | Yes (offering summary) |
| SC TO-I / SC TO-T (tender offers) | EDGAR EFTS | Daily, 6 AM ET | Yes |

What is the SEC EDGAR API?

The SEC provides a free, public REST API for accessing EDGAR data — no API key, no registration, no cost. It consists of three main systems: the EFTS full-text search API for searching filing content, the submissions API for looking up filings by company, and the XBRL financial data API for structured financial statement data. All three are used in production at BullishAgent.

The only requirement is a descriptive User-Agent header on every request. Without it, the SEC will block your IP (HTTP 403). The rate limit is 10 requests per second per IP address.

    import time

    # Required on every EDGAR API request: use your app name and contact email
    headers = {"User-Agent": "MyApp contact@myemail.com"}

    # Rate limit: 10 requests/second, so add a small sleep between calls
    time.sleep(0.1)  # a 100 ms pause plus request latency keeps you under the limit
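A fixed sleep works for a simple loop. When requests are issued from several places in a script, a small throttle object is one way to keep the spacing consistent. The sketch below is an assumption-laden illustration: the class name `Throttle` and the default of 8 requests per second (a margin under the stated limit of 10) are choices made here, not SEC guidance.

```python
import time

class Throttle:
    """Blocks so that successive calls are at least `1 / max_per_second` seconds apart."""

    def __init__(self, max_per_second=8.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the next slot relative to when this call is allowed to proceed.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Usage is one line before each request: create `throttle = Throttle()` once, then call `throttle.wait()` immediately before every `requests.get(...)` to EDGAR.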

The EDGAR Full-Text Search System (EFTS) at efts.sec.gov indexes the complete text of all EDGAR filings. You can filter by form type, date range, company name, or keyword — and get results as JSON without any HTML parsing. This is the fastest way to find all 8-K filings, NT filings, or activist disclosures filed in a given time window.

    import requests

    # Search for all 8-K filings in a 7-day window
    url = "https://efts.sec.gov/LATEST/search-index"
    params = {
        "forms": "8-K",
        "dateRange": "custom",
        "startdt": "2026-05-18",
        "enddt": "2026-05-25",
        "_source": "file_date,entity_name,file_num,period_of_report",
    }
    r = requests.get(url, params=params, headers={"User-Agent": "MyApp me@email.com"})
    r.raise_for_status()  # surfaces a 403 if the User-Agent is rejected

    hits = r.json()["hits"]["hits"]
    for h in hits:
        src = h["_source"]
        # .get() avoids a KeyError if a field is absent from a given hit
        print(src.get("entity_name"), src.get("file_date"))

    # Also works for: forms=SC 13D, forms=4, forms=13F-HR, forms=NT 10-K
    # (requests URL-encodes the space for you)

How do I get all SEC filings for a company using Python?

The EDGAR submissions API at data.sec.gov/submissions/ returns every filing ever made by a given entity, identified by its CIK (Central Index Key). In this API's URLs, the CIK is written as a 10-digit zero-padded number. You can look up a company's CIK on the EDGAR company search page, or via the company facts API.

    import requests

    # Apple Inc. CIK = 0000320193 (zero-padded to 10 digits)
    cik = "0000320193"
    url = f"https://data.sec.gov/submissions/CIK{cik}.json"
    r = requests.get(url, headers={"User-Agent": "MyApp me@email.com"})
    data = r.json()

    # Recent filings are in data["filings"]["recent"], stored as parallel lists
    filings = data["filings"]["recent"]
    for i, form in enumerate(filings["form"]):
        if form == "8-K":
            accession = filings["accessionNumber"][i]
            filed = filings["filingDate"][i]
            print(accession, filed)

    # For companies with many filings, older ones are in data["filings"]["files"];
    # fetch each additional JSON file listed there
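Because `recent` stores each field as a parallel list, it is often easier to work with one dictionary per filing. The following sketch shows one way to do that conversion, along with CIK zero-padding. The helper names `normalize_cik` and `recent_filings` are hypothetical, and the sample data is made up to mirror the shape used in the example above.

```python
from typing import Optional

def normalize_cik(cik) -> str:
    """Zero-pad a CIK (int or str, padded or not) to the 10 digits the URL expects."""
    return str(int(cik)).zfill(10)

def recent_filings(submissions: dict, form: Optional[str] = None) -> list:
    """Turn the columnar filings["recent"] block into a list of per-filing dicts."""
    recent = submissions["filings"]["recent"]
    keys = list(recent)
    # zip(*columns) walks the parallel lists row by row.
    rows = [dict(zip(keys, values)) for values in zip(*(recent[k] for k in keys))]
    return [r for r in rows if form is None or r["form"] == form]

# Made-up sample in the same shape as the API response.
sample = {"filings": {"recent": {
    "accessionNumber": ["0000320193-26-000045", "0000320193-26-000044"],
    "form": ["8-K", "4"],
    "filingDate": ["2026-05-01", "2026-04-28"],
}}}

eight_ks = recent_filings(sample, form="8-K")
```

Each row then carries its own accession number and filing date, which avoids index bookkeeping in later steps.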

How do I download and parse an SEC filing document?

Each filing has an accession number in the format 0001234567-26-000001. Remove the dashes to get the folder name. The filing index lists all documents in the submission — find the primary document (usually an .htm file) and download it.

    import requests
    from bs4 import BeautifulSoup

    HEADERS = {"User-Agent": "MyApp me@email.com"}
    cik = "320193"
    accession = "0000320193-26-000045"
    acc_clean = accession.replace("-", "")  # "000032019326000045"
    base = f"https://www.sec.gov/Archives/edgar/data/{cik}/{acc_clean}"

    # 1. Fetch the folder's directory listing to find the primary document
    index = requests.get(f"{base}/index.json", headers=HEADERS).json()

    # 2. Pick the first .htm file that is not itself an index page.
    #    The listing's "type" field is not the form type, so match on the name;
    #    this heuristic is an assumption and can pick an exhibit for some filers.
    primary = next(
        f for f in index["directory"]["item"]
        if f["name"].endswith(".htm") and "index" not in f["name"]
    )
    doc_url = f"{base}/{primary['name']}"

    # 3. Download and parse with BeautifulSoup
    html = requests.get(doc_url, headers=HEADERS).text
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text(separator="\n", strip=True)
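The URL construction described here is easy to get subtly wrong, so it can help to isolate it in one function. This is a sketch under stated assumptions: `filing_urls` is a hypothetical helper, the accession pattern is inferred from the example format above, and the CIK is written without leading zeros to match the example path.

```python
import re

# Accession numbers look like 0001234567-26-000001 (10 digits, 2 digits, 6 digits).
ACCESSION_RE = re.compile(r"^\d{10}-\d{2}-\d{6}$")

def filing_urls(cik, accession: str, primary_doc: str = "") -> dict:
    """Build the archive folder URL, and the document URL if a file name is given."""
    if not ACCESSION_RE.match(accession):
        raise ValueError(f"unexpected accession number format: {accession!r}")
    cik_path = str(int(cik))             # drop any leading zeros, as in the example
    folder = accession.replace("-", "")  # folder name is the accession without dashes
    base = f"https://www.sec.gov/Archives/edgar/data/{cik_path}/{folder}"
    urls = {"folder": base + "/"}
    if primary_doc:
        urls["document"] = f"{base}/{primary_doc}"
    return urls

urls = filing_urls("0000320193", "0000320193-26-000045", "doc.htm")
```

If you already have the submissions JSON, its `primaryDocument` column is a natural source for the `primary_doc` argument; the file name `doc.htm` above is a placeholder.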

How do I parse Form 4 insider trading data with Python?

Form 4 filings are structured XML — unlike 8-Ks which are free-form HTML. The XML schema is consistent across all filers, which makes parsing straightforward. The key fields are in the nonDerivativeTransaction and derivativeTransaction elements.

    import requests
    import xml.etree.ElementTree as ET

    # Form 4 XML URL pattern. The XML file name varies by filer, so treat this
    # one as a placeholder and look up the real name in the filing index.
    cik = "1318605"  # Tesla, Inc. (the issuer)
    acc_clean = "000131860526000012"
    xml_url = f"https://www.sec.gov/Archives/edgar/data/{cik}/{acc_clean}/0001318605-26-000012.xml"
    r = requests.get(xml_url, headers={"User-Agent": "MyApp me@email.com"})
    root = ET.fromstring(r.text)

    ticker = root.findtext(".//issuerTradingSymbol")
    name = root.findtext(".//rptOwnerName")

    for txn in root.findall(".//nonDerivativeTransaction"):
        code = txn.findtext(".//transactionAcquiredDisposedCode/value")  # 'A' or 'D'
        shares = txn.findtext(".//transactionShares/value")
        price = txn.findtext(".//transactionPricePerShare/value")
        # transactionCode holds its text directly (no <value> child)
        ttype = txn.findtext(".//transactionCoding/transactionCode")  # 'P'=purchase, 'S'=sale
        print(ticker, name, ttype, code, shares, price)
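To see the element layout without making a network call, the same lookups can be run against a small inline document. The XML below is a hand-written, heavily trimmed sample (the ticker, owner, and amounts are invented), and `parse_form4` is a hypothetical helper; real filings contain many more elements.

```python
import xml.etree.ElementTree as ET

# Hand-written sample following the Form 4 element names used above.
SAMPLE_FORM4 = """<ownershipDocument>
  <issuer><issuerTradingSymbol>XMPL</issuerTradingSymbol></issuer>
  <reportingOwner>
    <reportingOwnerId><rptOwnerName>Doe Jane</rptOwnerName></reportingOwnerId>
  </reportingOwner>
  <nonDerivativeTable>
    <nonDerivativeTransaction>
      <transactionCoding><transactionCode>S</transactionCode></transactionCoding>
      <transactionAmounts>
        <transactionShares><value>1000</value></transactionShares>
        <transactionPricePerShare><value>12.50</value></transactionPricePerShare>
        <transactionAcquiredDisposedCode><value>D</value></transactionAcquiredDisposedCode>
      </transactionAmounts>
    </nonDerivativeTransaction>
  </nonDerivativeTable>
</ownershipDocument>"""

def parse_form4(xml_text: str) -> list:
    """Return one dict per non-derivative transaction in a Form 4 document."""
    root = ET.fromstring(xml_text)
    ticker = root.findtext(".//issuerTradingSymbol")
    owner = root.findtext(".//rptOwnerName")
    rows = []
    for txn in root.findall(".//nonDerivativeTransaction"):
        shares = float(txn.findtext(".//transactionShares/value") or 0)
        price_text = txn.findtext(".//transactionPricePerShare/value")
        price = float(price_text) if price_text else None  # price can be absent
        rows.append({
            "ticker": ticker,
            "owner": owner,
            "code": txn.findtext(".//transactionCoding/transactionCode"),
            "acquired_disposed": txn.findtext(".//transactionAcquiredDisposedCode/value"),
            "shares": shares,
            "price": price,
            "value": shares * price if price is not None else None,
        })
    return rows

rows = parse_form4(SAMPLE_FORM4)
```

Converting shares and price to numbers at parse time makes it straightforward to compute a transaction value, which the raw text fields do not provide.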

What are the EDGAR API rate limits?

The SEC enforces a 10 requests per second limit per IP address under their fair-access policy. Exceeding this results in a temporary block. For bulk data collection, the SEC also provides pre-built bulk data downloads — full quarterly archives of all EDGAR filings — which are faster than hitting the API for historical data.

    # Bulk filing index: all filings for a given quarter (no rate limit concerns)
    # Format: https://www.sec.gov/Archives/edgar/full-index/{year}/QTR{1-4}/company.idx
    GET https://www.sec.gov/Archives/edgar/full-index/2026/QTR1/company.idx

    # Also available: crawler.idx and form.idx (plain text, like company.idx), plus full-index.json
    # Each line: company name | form type | CIK | date | filename

    # For real-time monitoring, use EFTS with a date filter instead of polling submissions
    GET https://efts.sec.gov/LATEST/search-index?forms=8-K&dateRange=custom&startdt=2026-05-25&enddt=2026-05-25

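For parsing, note that company.idx uses fixed-width columns. The sibling master.idx file in the same quarterly directory carries the same records pipe-delimited, in the order CIK, company name, form type, date filed, filename, which is simpler to split. The sketch below parses that layout; `parse_master_idx` is a hypothetical helper and the sample rows are invented.

```python
from typing import Optional

def parse_master_idx(text: str, form: Optional[str] = None) -> list:
    """Parse master.idx content into dicts, optionally keeping one form type."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Header and separator lines do not have five fields with a numeric CIK.
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        cik, name, form_type, date_filed, filename = parts
        if form is None or form_type == form:
            rows.append({
                "cik": cik,
                "company": name,
                "form": form_type,
                "date_filed": date_filed,
                "url": "https://www.sec.gov/Archives/" + filename,
            })
    return rows

# Invented sample in the master.idx layout.
sample_idx = """Description: Master Index of EDGAR Dissemination Feed
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------
1000001|EXAMPLE CORP|8-K|2026-01-15|edgar/data/1000001/0001000001-26-000001.txt
1000002|SAMPLE HOLDINGS INC|10-K|2026-02-01|edgar/data/1000002/0001000002-26-000007.txt
"""

only_8k = parse_master_idx(sample_idx, form="8-K")
```

One download per quarter followed by local filtering is usually gentler on EDGAR than thousands of per-company requests.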
Building vs using BullishAgent: The EDGAR API is genuinely good — free, reliable, and comprehensive. The hard parts are not the fetching: they're the parsing (every filer's HTML is different), the classification (Item 5.02 with "resigned" signals something different from Item 5.02 with "appointed"), and the AI summarization at scale. BullishAgent has already solved these for 8-K, Form 4, 13D/13G, and 13F filings — and the result runs daily on the EDGAR filings feed. The code above is exactly how we do it.