Chapter 9

How BullishAgent Works with EDGAR

Every filing summary, insider signal, and activist alert you see on BullishAgent starts with raw data from SEC EDGAR. This chapter explains how we fetch, parse, classify, and summarize EDGAR filings — and how you can access the same raw data yourself.

Filings processed: 22,360
AI takes generated: 17,499
Insider trades: 24,991
13D / 13G filings: 10,301

Data sources

BullishAgent pulls from three distinct EDGAR systems, each optimized for different filing types:

EDGAR Full-Text Search (EFTS)

Primary source for 8-K filings, NT filings, S-1/S-3 registration statements, 424B prospectuses, and SC TO (tender offers). EFTS indexes the full text of filings and supports filtering by form type, date range, and keyword. We poll this endpoint on a daily cron to discover new filings.

efts.sec.gov

EDGAR Submissions API

Used for Form 4 insider trades, Schedule 13D/13G beneficial-ownership filings, and 13F quarterly holdings. The submissions endpoint returns the filing history for a given CIK (company or person) as structured JSON, so locating filings requires no HTML parsing.

data.sec.gov

EDGAR XBRL / Financial Data API

For structured financial data (revenue, EPS, cash flow) from 10-K and 10-Q filings. EDGAR's XBRL APIs (companyfacts, companyconcept, and frames) expose the tagged financial data as machine-readable JSON, bypassing the need to parse PDF or HTML financial statements.

data.sec.gov/api/xbrl

The pipeline — fetch to display

Every filing goes through the same five-stage pipeline before it appears on BullishAgent:

Fetch

Daily cron (6 AM ET) queries EDGAR EFTS for all 8-K filings filed in the past 24 hours. A separate cron handles Form 4 and 13D/13G via the submissions API. 13F quarterly holdings run once per quarter after the filing deadline.

Parse

Each filing's HTML or XML is downloaded and parsed. For 8-Ks, we extract the item number, filing entity, and the full text of each item section. For Form 4, the XML schema is structured — ticker, insider name, transaction type, shares, and price are direct fields. For 13F, we parse the information table XML for each holding.

Classify

The item number (for 8-Ks) or form type (for 13D/13G) is mapped to an event type: exec_change, acquisition, auditor_change, default, equity_issuance, etc. A supplementary rule set checks for keywords that indicate subtype — e.g., an 8-K Item 1.01 that mentions "merger agreement" is classified differently than one mentioning "licensing agreement."

Summarize

For 8-K filings with sufficient text, we pass the extracted section text to Claude Haiku with a prompt requesting a single factual sentence describing what happened and why it matters. The AI take is stored alongside the raw filing data. Summarization runs only once per filing — we don't re-summarize.

Store & serve

Parsed records are written to MySQL. The web application reads from the database in real time — no intermediate caching layer. Filing feed pages query the most recent N records; stock pages join on ticker to show filing history.
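The store-and-serve stage comes down to two query shapes: the newest N rows for feed pages, and a ticker join for stock pages. Below is a minimal sketch with in-memory SQLite standing in for MySQL so it runs anywhere; the `stocks` and `filings` tables, their columns, and the helper names are hypothetical, not BullishAgent's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema is a hypothetical sketch
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stocks (ticker TEXT PRIMARY KEY, name TEXT);
CREATE TABLE filings (
    accession  TEXT PRIMARY KEY,
    ticker     TEXT,
    form_type  TEXT,
    event_type TEXT,
    filed_at   TEXT,   -- ISO-8601 date, so string order is date order
    ai_take    TEXT    -- NULL when no AI take was generated
);
CREATE INDEX idx_filings_ticker ON filings (ticker, filed_at);
CREATE INDEX idx_filings_filed  ON filings (filed_at);
""")


def save_filing(row: dict) -> None:
    """Write a parsed filing; an accession already stored is left untouched."""
    conn.execute(
        "INSERT OR IGNORE INTO filings VALUES "
        "(:accession, :ticker, :form_type, :event_type, :filed_at, :ai_take)",
        row,
    )


def latest_feed(n: int) -> list:
    """Filing feed page: the most recent N records."""
    return conn.execute(
        "SELECT accession, ticker, event_type FROM filings "
        "ORDER BY filed_at DESC LIMIT ?",
        (n,),
    ).fetchall()


def stock_history(ticker: str) -> list:
    """Stock page: join stocks to filings on ticker, newest filing first."""
    return conn.execute(
        "SELECT s.name, f.accession, f.event_type, f.filed_at "
        "FROM stocks s JOIN filings f ON f.ticker = s.ticker "
        "WHERE s.ticker = ? ORDER BY f.filed_at DESC",
        (ticker,),
    ).fetchall()
```

In MySQL the equivalent of `INSERT OR IGNORE` is `INSERT IGNORE`; keying on the accession number is what makes the write path idempotent when a cron run overlaps the previous one.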

Event classification — how we label filings

An 8-K filing can disclose almost anything — a CEO departure, a debt default, a new customer contract, or a going concern opinion. The item number gives a coarse category; we map it to a typed event that drives how the filing is displayed and filtered.

Event type	Maps from	AI takes
other	—	5,470
reg_fd	8-K Item 7.01 (Reg FD disclosures)	2,768
material_event	8-K Item 8.01 (unclassified material events)	2,729
exec_change	8-K Item 5.02 (officer/director changes)	2,413
agreement	8-K Item 1.01 (material agreements)	896
earnings_results	8-K Item 2.02 (earnings results)	779
financing	8-K Item 2.03 (new debt obligations)	591
late_filing	NT 10-K / NT 10-Q filings	407
acquisition	8-K Item 1.01 + merger/acquisition keywords	213
amendment	8-K/A amendments to prior filings	207
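The mapping in the table above can be sketched as a small rule function. The multi-item rule (first recognized item wins), the order of the checks, and the acquisition keyword list are assumptions made for illustration; this chapter does not publish the full rule set, and `classify_8k` is a hypothetical name.

```python
# Item-to-event map taken from the event table; unlisted items fall through to "other"
ITEM_EVENTS = {
    "1.01": "agreement",
    "2.02": "earnings_results",
    "2.03": "financing",
    "5.02": "exec_change",
    "7.01": "reg_fd",
    "8.01": "material_event",
}

# Assumed keywords that turn an Item 1.01 "agreement" into an "acquisition"
ACQUISITION_KEYWORDS = ("merger agreement", "agreement and plan of merger", "acquisition of")


def classify_8k(form_type: str, items: list, text: str) -> str:
    """Map a filing to one event type; checks run top to bottom, first match wins."""
    if form_type in ("NT 10-K", "NT 10-Q"):
        return "late_filing"
    if form_type.endswith("/A"):
        return "amendment"
    lowered = text.lower()
    for item in items:
        event = ITEM_EVENTS.get(item)
        if event == "agreement" and any(k in lowered for k in ACQUISITION_KEYWORDS):
            return "acquisition"
        if event:
            return event
    return "other"
```

Checking form type before item numbers means an 8-K/A is always labeled `amendment`, whatever items it amends; a production rule set might instead classify the amended item and flag the amendment separately.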

AI take generation — Claude Haiku

BullishAgent uses Claude Haiku (Anthropic's fast, low-cost model) to generate single-sentence filing summaries — what we call "AI takes." The goal of the prompt is a factual, information-dense sentence that tells a reader what happened and why it might matter, without editorializing or speculating beyond what the filing states.

The prompt passes only the relevant section text (not the whole filing) and explicitly instructs the model to write one specific, factual sentence with no speculation:

# Simplified version of the summarization prompt; fill it with prompt.format(section_text=...)
prompt = """You are summarizing a section of an SEC 8-K filing.
Write exactly one factual sentence (max 180 characters) describing what happened.
Be specific: name the people, dollar amounts, and companies involved.
Do not speculate. Do not use phrases like 'may impact' or 'could affect'.
State facts only.

Filing section:
{section_text}"""
    

Each AI take is generated once and stored. We do not re-run summarization unless the filing is reprocessed. For filings with very short or boilerplate text (under ~100 words), we skip AI summarization and store a rule-based summary instead.

The model sees only the filing text — no ticker price, no market context, no prior AI takes. This keeps the summaries grounded in what was actually disclosed.
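The once-only rule and the ~100-word cutoff can be sketched as a small gate in front of the model call. This is a sketch under stated assumptions: the Claude Haiku request is replaced by a callable you pass in (hypothetical `summarize_with_model`), the stored takes live in a plain dict rather than the database, and the rule-based fallback sentence is invented for illustration.

```python
from typing import Callable

# Abbreviated copy of the prompt shown above
PROMPT_TEMPLATE = (
    "You are summarizing a section of an SEC 8-K filing.\n"
    "Write exactly one factual sentence (max 180 characters) describing what happened.\n"
    "State facts only.\n\n"
    "Filing section:\n{section_text}"
)

MIN_WORDS = 100  # below this, skip the model and store a rule-based summary


def ai_take(accession: str, section_text: str, event_type: str,
            cache: dict, summarize_with_model: Callable[[str], str]) -> str:
    """Return the stored take if there is one; otherwise generate it exactly once."""
    if accession in cache:                      # already summarized: never re-run
        return cache[accession]
    if len(section_text.split()) < MIN_WORDS:   # short or boilerplate text
        take = f"Filed an 8-K classified as {event_type}."  # hypothetical rule-based text
    else:
        take = summarize_with_model(PROMPT_TEMPLATE.format(section_text=section_text))
    cache[accession] = take
    return take
```

Keying the cache on the accession number mirrors the "generated once and stored" rule: reprocessing a filing would mean deleting its stored take first.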

What's covered — and update frequency

Filing type	Source	Update frequency	AI summarized
8-K (all items)	EDGAR EFTS	Daily, 6 AM ET	Yes — item section text
NT 10-K / NT 10-Q	EDGAR EFTS	Daily, 6 AM ET	Yes — late filing context
Form 4 insider trades	EDGAR Submissions API	Daily, 6 AM ET	No — structured data only
SC 13D / 13G	EDGAR Submissions API	Daily, 6 AM ET	Yes — purpose section
13F quarterly holdings	EDGAR Submissions API	Quarterly (45d lag)	No — structured data only
S-1 / S-3 / 424B offerings	EDGAR EFTS	Daily, 6 AM ET	Yes — offering summary
SC TO-I / SC TO-T (tender offers)	EDGAR EFTS	Daily, 6 AM ET	Yes
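13F holdings skip summarization because the information table is already structured; the parse step only has to flatten it. A minimal standard-library sketch, assuming the usual information-table element names (`infoTable`, `nameOfIssuer`, `cusip`, `value`, `sshPrnamt`) and stripping XML namespaces rather than hard-coding one; `parse_13f_info_table` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Drop an XML namespace: '{uri}infoTable' -> 'infoTable'."""
    return tag.rsplit("}", 1)[-1]


def parse_13f_info_table(xml_bytes: bytes) -> list:
    """Return one dict per holding in a 13F information table."""
    root = ET.fromstring(xml_bytes)
    holdings = []
    for entry in root.iter():
        if _local(entry.tag) != "infoTable":
            continue
        # Flatten children and grandchildren (e.g. shrsOrPrnAmt/sshPrnamt) by local name
        fields = {_local(el.tag): (el.text or "").strip() for el in entry.iter()}
        holdings.append({
            "issuer": fields.get("nameOfIssuer"),
            "cusip": fields.get("cusip"),
            "value": int(fields.get("value", "0") or 0),
            "shares": int(fields.get("sshPrnamt", "0") or 0),
        })
    return holdings
```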

What is the SEC EDGAR API?

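The SEC provides free, public access to EDGAR data: no API key, no registration, no cost. It consists of three main systems: the EFTS full-text search endpoint for searching filing content, the submissions API for looking up filings by company, and the XBRL financial data API for structured financial statement data. All three are used in production at BullishAgent. The submissions and XBRL APIs are documented by the SEC on data.sec.gov; EFTS is the JSON backend of EDGAR's full-text search page and is not documented as a formal API, so its parameters and fields may change.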

The only requirement is a descriptive User-Agent header on every request; requests without one are refused with HTTP 403. The SEC's fair-access guidelines limit each user to 10 requests per second, regardless of how many machines send them.

import time

# Required on every EDGAR request: identify your app and a contact email
headers = {"User-Agent": "MyApp contact@myemail.com"}

# Rate limit: 10 requests/second. Sleeping 100 ms between calls caps you at
# 10/s; request latency keeps the real rate slightly below that.
time.sleep(0.1)
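For anything beyond a few calls, a small throttle is more reliable than a fixed sleep, because it counts time already spent waiting on the network. A minimal single-threaded sketch using only the standard library; `Throttle`, `edgar_get`, and the 8 requests/second headroom are illustrative choices, not part of any SEC client.

```python
import time
import urllib.request

USER_AGENT = "MyApp contact@myemail.com"  # replace with your app name and email


class Throttle:
    """Keep successive calls at least 1/max_per_second apart (single-threaded)."""

    def __init__(self, max_per_second: float = 10.0):
        self.min_interval = 1.0 / max_per_second
        self._last = float("-inf")  # first call never waits

    def wait(self) -> None:
        delay = self._last + self.min_interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()


throttle = Throttle(max_per_second=8)  # headroom below the 10/s cap


def edgar_get(url: str) -> bytes:
    """Fetch an EDGAR URL with the required User-Agent, respecting the throttle."""
    throttle.wait()
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

If several worker threads share one throttle, `wait` would need a lock; the sketch assumes a single fetch loop like a daily cron.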
    

How do I search SEC filings by keyword or form type? (EDGAR EFTS)

The EDGAR Full-Text Search System (EFTS) at efts.sec.gov indexes the complete text of all EDGAR filings. You can filter by form type, date range, company name, or keyword — and get results as JSON without any HTML parsing. This is the fastest way to find all 8-K filings, NT filings, or activist disclosures filed in a given time window.

import datetime as dt
import requests

HEADERS = {"User-Agent": "MyApp me@email.com"}

# Search for 8-K filings filed in the last 7 days
end   = dt.date.today()
start = end - dt.timedelta(days=7)

url = "https://efts.sec.gov/LATEST/search-index"
params = {
    "forms": "8-K",
    "dateRange": "custom",
    "startdt": start.isoformat(),
    "enddt":   end.isoformat(),
}
r = requests.get(url, params=params, headers=HEADERS)
r.raise_for_status()
hits = r.json()["hits"]["hits"]  # one page of results, not the full result set

for h in hits:
    src = h["_source"]
    # display_names holds strings like "Apple Inc.  (AAPL)  (CIK 0000320193)"
    print(src["display_names"][0], src["file_date"])

# Other forms use the same parameter, e.g. "forms": "SC 13D", "4", "13F-HR", or "NT 10-K"
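Each search hit points to a specific document, so the next step is usually building its archive URL. A sketch assuming (from observed EFTS responses, not official documentation) that a hit's `_id` has the form `<accession>:<filename>` and that `_source["ciks"]` lists zero-padded filer CIKs; `efts_hit_to_url` is a hypothetical helper.

```python
def efts_hit_to_url(hit: dict) -> str:
    """Build the sec.gov archive URL of the document an EFTS hit points to."""
    accession, filename = hit["_id"].split(":", 1)
    cik = str(int(hit["_source"]["ciks"][0]))  # archive paths use the unpadded CIK
    folder = accession.replace("-", "")        # folder name is the accession without dashes
    return f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{filename}"
```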
    

How do I get all SEC filings for a company using Python?

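The EDGAR submissions API at data.sec.gov/submissions/ returns every filing made by a given entity, identified by its CIK (Central Index Key): recent filings in the main file, older ones in linked pages. CIKs have up to 10 digits, and the submissions URL requires them zero-padded to exactly 10 (Apple's 320193 becomes 0000320193). You can look up a company's CIK on the EDGAR company search page, or in the SEC's ticker-to-CIK mapping file at sec.gov/files/company_tickers.json.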
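Most lookups start from a ticker rather than a CIK. A minimal sketch that reads company_tickers.json (an object whose values are rows with `cik_str`, `ticker`, and `title`) and zero-pads each CIK for the submissions API; `load_ticker_map` and `fetch_ticker_map` are hypothetical names.

```python
import json
import urllib.request

TICKERS_URL = "https://www.sec.gov/files/company_tickers.json"


def load_ticker_map(raw: dict) -> dict:
    """Map upper-case ticker -> 10-digit zero-padded CIK string."""
    return {row["ticker"].upper(): str(row["cik_str"]).zfill(10) for row in raw.values()}


def fetch_ticker_map(user_agent: str) -> dict:
    """Download the SEC mapping file (User-Agent required) and build the map."""
    req = urllib.request.Request(TICKERS_URL, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return load_ticker_map(json.load(resp))
```

The file changes as companies list and delist, so caching it for a day is usually enough.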

import requests

# Apple Inc. CIK = 0000320193 — zero-pad to 10 digits
cik = "0000320193"
url = f"https://data.sec.gov/submissions/CIK{cik}.json"
r = requests.get(url, headers={"User-Agent": "MyApp me@email.com"})
data = r.json()

# Recent filings are in data["filings"]["recent"]
filings = data["filings"]["recent"]
for i, form in enumerate(filings["form"]):
    if form == "8-K":
        accession = filings["accessionNumber"][i]
        filed     = filings["filingDate"][i]
        print(accession, filed)

# For companies with many filings, older ones are listed in data["filings"]["files"];
# fetch https://data.sec.gov/submissions/<name> for each entry there
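The submissions JSON is column-oriented: each field is its own parallel array, indexed by position. A sketch that zips those arrays into one dict per filing and follows the older pages, assuming each file listed in `data["filings"]["files"]` holds the same column arrays at its top level; `fetch_json` is a hypothetical callable you would back with your own rate-limited HTTP client.

```python
def rows_from_columns(block: dict) -> list:
    """Turn EDGAR's parallel arrays into one dict per filing."""
    keys = list(block)
    return [dict(zip(keys, values)) for values in zip(*(block[k] for k in keys))]


def all_filings(data: dict, fetch_json) -> list:
    """Recent filings plus every older page; fetch_json maps a URL to parsed JSON."""
    rows = rows_from_columns(data["filings"]["recent"])
    for page in data["filings"].get("files", []):
        older = fetch_json(f"https://data.sec.gov/submissions/{page['name']}")
        rows.extend(rows_from_columns(older))
    return rows
```

Filtering then becomes an ordinary comprehension, e.g. `[r for r in rows if r["form"] == "8-K"]`.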
    

How do I download and parse an SEC filing document?

Each filing has an accession number in the format 0001234567-26-000001. Remove the dashes to get its folder name under /Archives/edgar/data/{CIK}/. The folder's index.json lists every document in the submission, and the submissions API's primaryDocument field tells you which one is the main filing (usually an .htm file) to download.

import requests
from bs4 import BeautifulSoup

HEADERS    = {"User-Agent": "MyApp me@email.com"}
cik        = "320193"
accession  = "0000320193-26-000045"
acc_clean  = accession.replace("-", "")  # "000032019326000045"
folder_url = f"https://www.sec.gov/Archives/edgar/data/{cik}/{acc_clean}"

# 1. Look up the primary document in the submissions API
#    (folder_url + "/index.json" lists every file, but its "type" field is a
#    file-icon type, not the form type, so it can't pick out the main 8-K)
subs   = requests.get(f"https://data.sec.gov/submissions/CIK{cik.zfill(10)}.json",
                      headers=HEADERS).json()
recent = subs["filings"]["recent"]
i      = recent["accessionNumber"].index(accession)  # ValueError if not in "recent"
doc_url = f"{folder_url}/{recent['primaryDocument'][i]}"

# 2. Download and parse with BeautifulSoup
html = requests.get(doc_url, headers=HEADERS).text
soup = BeautifulSoup(html, "html.parser")
text = soup.get_text(separator="\n", strip=True)
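With plain text in hand, the Parse stage needs each Item section on its own. A heuristic sketch that splits on lines beginning with "Item N.NN"; the regex and the choice to keep the longest body when an item number repeats (some cover pages list item titles) are assumptions, not BullishAgent's actual parser, and `split_items` is a hypothetical name.

```python
import re

# "Item 5.02" at the start of a line; [^\S\n] also matches the non-breaking
# spaces that EDGAR HTML often puts between "Item" and the number
ITEM_HEADING = re.compile(r"^[^\S\n]*item[^\S\n]+(\d{1,2}\.\d{2})\b",
                          re.IGNORECASE | re.MULTILINE)


def split_items(text: str) -> dict:
    """Split plain 8-K text into {item number: section text}."""
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for m, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(text)
        body = text[m.end():end].strip()
        # Keep the longest body per item number
        if len(body) > len(sections.get(m.group(1), "")):
            sections[m.group(1)] = body
    return sections
```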
    

How do I parse Form 4 insider trading data with Python?

Form 4 filings are structured XML — unlike 8-Ks which are free-form HTML. The XML schema is consistent across all filers, which makes parsing straightforward. The key fields are in the nonDerivativeTransaction and derivativeTransaction elements.

import requests
import xml.etree.ElementTree as ET

# Form 4 XML URL pattern (accession and filename are placeholders)
cik       = "1318605"    # Tesla, Inc.'s CIK (the issuer)
acc_clean = "000131860526000012"
xml_name  = "form4.xml"  # hypothetical: the raw XML filename varies by filer; read it from the filing index
xml_url   = f"https://www.sec.gov/Archives/edgar/data/{cik}/{acc_clean}/{xml_name}"

r    = requests.get(xml_url, headers={"User-Agent": "MyApp me@email.com"})
root = ET.fromstring(r.content)  # pass bytes so the XML encoding declaration is honored

ticker = root.findtext(".//issuerTradingSymbol")
name   = root.findtext(".//rptOwnerName")

for txn in root.findall(".//nonDerivativeTransaction"):
    code   = txn.findtext(".//transactionAcquiredDisposedCode/value")  # 'A' or 'D'
    shares = txn.findtext(".//transactionShares/value")
    price  = txn.findtext(".//transactionPricePerShare/value")
    ttype  = txn.findtext(".//transactionCode")  # plain text, no <value> child; 'P'=purchase, 'S'=sale
    print(ticker, name, ttype, code, shares, price)
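Beyond a quick print, it helps to return records and include derivative transactions (options, RSUs) as well. A sketch using the same element names as above, but with explicit parent paths (`transactionAmounts`, `transactionCoding`) instead of `.//` searches; `form4_transactions` is a hypothetical helper, and a price element that carries only a footnote and no value comes back as `None`.

```python
import xml.etree.ElementTree as ET


def form4_transactions(xml_bytes: bytes) -> list:
    """Flatten a Form 4 XML document into one dict per transaction."""
    root = ET.fromstring(xml_bytes)
    ticker = root.findtext(".//issuerTradingSymbol")
    owner = root.findtext(".//rptOwnerName")
    rows = []
    for kind in ("nonDerivativeTransaction", "derivativeTransaction"):
        for txn in root.iter(kind):
            shares = txn.findtext("transactionAmounts/transactionShares/value")
            price = txn.findtext("transactionAmounts/transactionPricePerShare/value")
            rows.append({
                "ticker": ticker,
                "owner": owner,
                "derivative": kind == "derivativeTransaction",
                "date": txn.findtext("transactionDate/value"),
                "code": txn.findtext("transactionCoding/transactionCode"),
                "acq_disp": txn.findtext(
                    "transactionAmounts/transactionAcquiredDisposedCode/value"),
                "shares": float(shares) if shares else None,
                "price": float(price) if price else None,
            })
    return rows
```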
    

What are the EDGAR API rate limits?

The SEC's fair-access policy limits each user to 10 requests per second, regardless of how many machines send them. Exceeding this results in a temporary block. For bulk data collection, the SEC also publishes pre-built downloads: quarterly full indexes listing every EDGAR filing, and nightly ZIP archives of all submissions and XBRL company-facts JSON. These are much faster than hitting the API for historical data.

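# Bulk filing index: every filing for a given quarter in one file.
# It is still served by sec.gov (same 10 req/s limit), but one request replaces thousands.
# Format: https://www.sec.gov/Archives/edgar/full-index/{year}/QTR{1-4}/company.idx
GET https://www.sec.gov/Archives/edgar/full-index/2026/QTR1/company.idx

# Same data in other layouts in that directory: form.idx (sorted by form type),
# master.idx (pipe-delimited), crawler.idx; index.json lists the directory itself
# company.idx columns (fixed-width): company name | form type | CIK | date filed | filename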

# For real-time monitoring, use EFTS with a date filter instead of polling submissions
GET https://efts.sec.gov/LATEST/search-index?forms=8-K&dateRange=custom&startdt=2026-05-25&enddt=2026-05-25
    

Building vs using BullishAgent: The EDGAR API is genuinely good: free, reliable, and comprehensive. The hard parts are not the fetching; they're the parsing (every filer's HTML is different), the classification (Item 5.02 with "resigned" signals something different from Item 5.02 with "appointed"), and the AI summarization at scale. BullishAgent has already solved these for 8-K, Form 4, 13D/13G, and 13F filings, and the result runs daily on the EDGAR filings feed. The code above follows the same approach we use, simplified for readability.
