Methodology

LexDOGE is an open-source civic accountability platform for Lexington-Fayette Urban County Government (LFUCG). It ingests public records, runs statistical anomaly detection, and publishes human-reviewed findings. Every claim on this site links back to a primary source. Every AI-generated finding passes through a tiered review system before publication.

Data sources

  • Annual adopted and proposed budgets (~500 pp.), parsed with pymupdf4llm + pdfplumber into structured line items. Format: PDF. Updated: annually.
  • Comprehensive Annual Financial Reports: audited statements, pension data, debt schedules. Format: PDF. Updated: annually.
  • Council legislation, contracts, agendas, minutes, votes. Format: JSON API. Polled daily.
  • Socrata-backed datasets: GIS, permits, 311 service requests, expenditures. Format: Socrata. Updated: daily.
  • Kentucky Open Records Act requests filed via MuckRock, human-authorized before submission. Format: records request. Updated: ad hoc.
  • Kentucky Public Pensions Authority disclosures: CERS funded ratio, employer contribution rates. Format: PDF / XLSX. Updated: quarterly.
  • MSRB EMMA bond disclosures: LFUCG general obligation and revenue bond filings. Format: XML. Updated: as filed.

AI agent system

  • Ingestion (claude-haiku-4-5): Parses PDFs and structured feeds. Extracts line items, generates embeddings, normalizes vendor names.
  • Research (claude-sonnet-4): Deep-research over the knowledge base + external sources. Returns cited findings to the report writer.
  • Anomaly Detection (claude-sonnet-4): Runs statistical tests on budget + expenditure data. Surfaces unusual patterns to human reviewers.
  • Report Writer (claude-sonnet-4): Generates long-form analysis grounded in source documents. Every claim has a citation.
  • Meeting (claude-haiku-4-5): Summarizes council meetings, extracts fiscal-impact items, links to Legistar agenda entries.
  • FOIA (claude-sonnet-4): Drafts open records requests. Never submits without explicit human authorization.
  • Watchdog (claude-sonnet-4): Cross-checks council votes, contract awards, and bond filings against the public record.

Statistical methods

Benford's Law
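Tests the distribution of leading digits in financial data against the logarithmic distribution that Benford's Law predicts. Significant deviations may indicate data irregularities or fabrication.
Criterion: χ² > 15.5 (approximately the 5% critical value for 8 degrees of freedom, i.e. nine possible leading digits)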
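As a rough illustration of the test described above, the sketch below computes a first-digit chi-square statistic in plain Python. The names `leading_digit`, `benford_chi_square`, and `CHI_SQUARE_CUTOFF` are hypothetical and chosen for this example; they are not taken from the LexDOGE codebase, and the production implementation may differ.

```python
import math
from collections import Counter
from typing import Iterable, Optional

# Expected share of each leading digit d under Benford's Law: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Cutoff quoted in the methodology text (assumed to apply to first digits).
CHI_SQUARE_CUTOFF = 15.5


def leading_digit(amount: float) -> Optional[int]:
    """Return the first significant digit of a nonzero finite amount, else None."""
    magnitude = abs(amount)
    if magnitude == 0 or not math.isfinite(magnitude):
        return None
    # Scientific notation always begins with the first significant digit.
    return int(f"{magnitude:.15e}"[0])


def benford_chi_square(amounts: Iterable[float]) -> float:
    """Chi-square statistic of observed leading digits vs. Benford's Law."""
    digits = [d for d in map(leading_digit, amounts) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no nonzero amounts to test")
    observed = Counter(digits)
    return sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p) for d, p in BENFORD.items()
    )
```

A dataset would be surfaced for review when `benford_chi_square(amounts) > CHI_SQUARE_CUTOFF`. Small samples make the statistic unreliable, so a minimum sample size is usually worth enforcing before running the test.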
YoY Spike Detection
Flags line items whose year-over-year (YoY) increase exceeds 25% without a corresponding budget amendment or council action.
Criterion: >25% Δ
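The rule above can be sketched as a simple filter. This is an illustration only: the `LineItem` fields, including the `has_amendment` flag, are hypothetical, and how a line item is actually matched to a budget amendment or council action is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class LineItem:
    code: str            # hypothetical line-item identifier
    prior: float         # prior fiscal year amount
    current: float       # current fiscal year amount
    has_amendment: bool  # hypothetical: True if an amendment or council action explains the change


def yoy_spikes(items: Iterable[LineItem], threshold: float = 0.25) -> List[Tuple[str, float]]:
    """Return (code, fractional change) for unexplained increases above the threshold."""
    flagged = []
    for item in items:
        if item.prior <= 0:
            # Percent change is undefined for new or zero-base items; skip them here.
            continue
        change = (item.current - item.prior) / item.prior
        if change > threshold and not item.has_amendment:
            flagged.append((item.code, change))
    return flagged
```

Line items with no prior-year amount are skipped in this sketch because a percentage change cannot be computed for them; they would need separate handling.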
Threshold Avoidance
Identifies clusters of transactions just below known approval thresholds ($20k, $30k, $50k). Such clusters may indicate contract splitting.
Criterion: bunching ratio
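The exact definition of the bunching ratio is not given above. One common way to define it, assumed here purely for illustration, is the count of transactions in a window just below the threshold divided by the count in an equal-width window just above it. The function name and the default $1,000 window are hypothetical.

```python
from typing import Iterable


def bunching_ratio(amounts: Iterable[float], threshold: float, window: float = 1000.0) -> float:
    """Ratio of transactions just below a threshold to those just above it.

    Assumed definition: counts amounts in [threshold - window, threshold)
    against amounts in [threshold, threshold + window).
    """
    amounts = list(amounts)
    below = sum(1 for a in amounts if threshold - window <= a < threshold)
    above = sum(1 for a in amounts if threshold <= a < threshold + window)
    # Use a floor of 1 for the denominator so an empty upper window
    # does not cause a division by zero.
    return below / max(above, 1)
```

A ratio near 1 suggests amounts are spread evenly around the threshold; a much larger ratio suggests clustering just beneath it. The cutoff at which a ratio becomes a flag is a separate choice that this sketch does not make.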
Duplicate Payment
Fuzzy-matches payments on vendor name, amount, and date (±5 days) to surface potential duplicate payments.
Criterion: Levenshtein distance < 3
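A minimal sketch of this matching, assuming the Levenshtein distance is applied to lowercased vendor names and that amounts must match exactly. The `Payment` type and function names are hypothetical, and the pairwise comparison is quadratic, so a real pipeline would likely group by amount first.

```python
from datetime import date
from itertools import combinations
from typing import Iterable, List, NamedTuple, Tuple


class Payment(NamedTuple):
    vendor: str
    amount_cents: int  # integer cents avoid float equality problems
    paid_on: date


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def possible_duplicates(
    payments: Iterable[Payment], max_distance: int = 3, max_days: int = 5
) -> List[Tuple[Payment, Payment]]:
    """Pairs with equal amounts, dates within max_days, and vendor distance < max_distance."""
    pairs = []
    for p, q in combinations(list(payments), 2):
        if p.amount_cents != q.amount_cents:
            continue
        if abs((p.paid_on - q.paid_on).days) > max_days:
            continue
        if levenshtein(p.vendor.lower(), q.vendor.lower()) < max_distance:
            pairs.append((p, q))
    return pairs
```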
Vendor Concentration
Flags departments where >40% of discretionary spending flows to a single vendor over a fiscal year.
Criterion: HHI (Herfindahl-Hirschman Index) > 0.4
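The description above is phrased in terms of a single vendor's share, while the criterion is stated as an HHI value. The two measures are related but not identical, so the sketch below computes both for one department and fiscal year. The function name and input shape are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def vendor_concentration(payments: Iterable[Tuple[str, float]]) -> Tuple[str, float, float]:
    """Return (top_vendor, top_vendor_share, hhi) for (vendor, amount) pairs.

    HHI is the sum of squared vendor shares, on a 0-1 scale.
    """
    totals: Dict[str, float] = defaultdict(float)
    for vendor, amount in payments:
        totals[vendor] += amount
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total spending must be positive")
    shares = {vendor: total / grand_total for vendor, total in totals.items()}
    top_vendor = max(shares, key=shares.get)
    hhi = sum(share * share for share in shares.values())
    return top_vendor, shares[top_vendor], hhi
```

For spending split 50/30/20 across three vendors, the top share is 0.50 but the HHI is only 0.38, so a department can exceed a 40% single-vendor share without exceeding an HHI of 0.4.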
Contract Splitting
Detects multiple awards to the same vendor within 90 days, where the combined total exceeds a council-approval threshold.
Criterion: sum > threshold
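One way to express this check is a sliding 90-day window per vendor. The sketch below is an assumption about how such a scan could work, not a description of the production code; the function name and tuple layout are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def split_candidates(
    awards: Iterable[Tuple[str, date, float]], threshold: float, window_days: int = 90
) -> Dict[str, float]:
    """Map vendor -> largest combined total of 2+ awards within window_days that exceeds threshold."""
    by_vendor: Dict[str, List[Tuple[date, float]]] = defaultdict(list)
    for vendor, awarded_on, amount in awards:
        by_vendor[vendor].append((awarded_on, amount))

    flagged: Dict[str, float] = {}
    for vendor, rows in by_vendor.items():
        rows.sort()  # chronological order
        start = 0
        running = 0.0
        for end, (awarded_on, amount) in enumerate(rows):
            running += amount
            # Drop awards that fall outside the window ending at the current award.
            while (awarded_on - rows[start][0]).days > window_days:
                running -= rows[start][1]
                start += 1
            if end - start + 1 >= 2 and running > threshold:
                flagged[vendor] = max(flagged.get(vendor, 0.0), running)
    return flagged
```

A single award that exceeds the threshold on its own is not flagged here, since it would have gone through the approval process directly.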
Sole-Source Concentration
Tracks the percentage of contract dollars awarded without competitive bidding, by department and over time.
Criterion: >30% non-bid
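A small sketch of the aggregation described above, grouping by department and fiscal year. The input tuple layout and function names are hypothetical; the 30% cutoff is the one quoted in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Key = Tuple[str, int]  # (department, fiscal_year)


def sole_source_share(contracts: Iterable[Tuple[str, int, float, bool]]) -> Dict[Key, float]:
    """Share of contract dollars awarded without competitive bidding.

    Each contract is (department, fiscal_year, amount, competitively_bid).
    """
    totals: Dict[Key, List[float]] = defaultdict(lambda: [0.0, 0.0])  # [non_bid, all]
    for department, fiscal_year, amount, competitively_bid in contracts:
        bucket = totals[(department, fiscal_year)]
        bucket[1] += amount
        if not competitively_bid:
            bucket[0] += amount
    return {key: non_bid / total for key, (non_bid, total) in totals.items() if total > 0}


def flagged_departments(shares: Dict[Key, float], cutoff: float = 0.30) -> Set[Key]:
    """Department-years whose non-bid share is strictly above the cutoff."""
    return {key for key, share in shares.items() if share > cutoff}
```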
Anomaly flags are statistical observations, not allegations of wrongdoing. All flags are reviewed before publication.

Review tiers

Tier 1 reviewed
Pure factual data, >95% confidence. Auto-published.
Examples: adopted budget totals, council vote tallies, contract award amounts.
Review: automated checks only.
Tier 2 reviewed
Content mentioning named entities, 80–95% confidence.
Examples: vendor concentration summaries, departmental YoY changes.
Review: light editorial pass.
Tier 3 reviewed
Anomaly flags naming specific entities, or <80% confidence.
Examples: threshold-avoidance flags on a named vendor, sole-source concentration callouts.
Review: full editorial review.
Tier 4 reviewed
Any content that could be construed as alleging wrongdoing.
Examples: reports implying impropriety, duplicate-payment claims naming an official.
Review: editorial + legal review.
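The tier rules can be read as a routing function that checks the strictest tier first. The sketch below is one possible reading, not the actual implementation: it assumes confidence is on a 0-1 scale, that the Tier 2 conditions are alternatives (a named entity or 80–95% confidence), and that the boolean inputs are supplied by upstream classification. All names are hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    AUTOMATED = 1            # automated checks only
    LIGHT_EDITORIAL = 2      # light editorial pass
    FULL_EDITORIAL = 3       # full editorial review
    EDITORIAL_AND_LEGAL = 4  # editorial + legal review


def assign_tier(
    *,
    confidence: float,
    names_entity: bool,
    is_anomaly_flag: bool,
    could_allege_wrongdoing: bool,
) -> Tier:
    """Route a finding to the strictest review tier whose condition it meets."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if could_allege_wrongdoing:
        return Tier.EDITORIAL_AND_LEGAL
    if confidence < 0.80 or (is_anomaly_flag and names_entity):
        return Tier.FULL_EDITORIAL
    if names_entity or confidence <= 0.95:
        return Tier.LIGHT_EDITORIAL
    return Tier.AUTOMATED
```

Checking from Tier 4 downward means a finding that satisfies several conditions always receives the most thorough review it qualifies for.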

Limitations

Read this before citing LexDOGE
  • AI analysis can produce errors. Hallucination is reduced by strict source-grounding, but all findings should be independently verified before being cited or republished.
  • PDF parsing may miss or misinterpret data, especially from older scanned documents or non-standard tables.
  • Anomaly flags are statistical observations, not allegations. A flag indicates an unusual pattern, not wrongdoing.
  • Data availability depends on what LFUCG, KPPA, EMMA, and the Open Data Portal publish. Some information may be delayed, incomplete, or redacted.
  • Confidence scores are model self-reports, calibrated against hand-labeled samples. They are heuristics, not guarantees.

Awaiting ingestion

Pages and panels marked "awaiting ingestion" are wired to the production API, but the agent system has not yet populated the knowledge base behind them. The ingestion pipeline requires an OPENAI_API_KEY (used for embeddings) and an ANTHROPIC_API_KEY (used for the agents above). Until both keys are configured and the first full ingest completes, fallback values shown on the site are taken directly from the underlying public source documents and labeled as such.
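For anyone self-hosting the pipeline, a pre-flight check along these lines can confirm that both keys are present before an ingest is started. The helper name `missing_api_keys` is hypothetical; only the two environment variable names come from the text above.

```python
import os
from typing import List, Mapping

# Environment variables named in the methodology text.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are unset or blank in the given environment."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

An empty list means both keys are set; it does not verify that the keys are valid or that the providers accept them.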

Press inquiries

For interviews, dataset access, or embargo arrangements, or to flag a specific finding before publication, email info@lexdoge.org. Source code, ingestion scripts, and the full anomaly-detection test suite are public.

Report an error

If you believe any information on this site is inaccurate, please open an issue on the GitHub repository. We investigate and correct promptly — corrections are logged publicly with the offending revision.