Under construction. This deployment is live for testing only. Data, features, and findings may change without notice. To contribute code, tips, corrections, or legal review, contact info@lexdoge.org.
ROADMAP

Everything LexDOGE wants to be

What this is. The full picture of what LexDOGE wants to be — every capability planned, every audience served, every system designed. No dates, no quarters, no commitments. Just a map of the surface area.

What this is not. A schedule. A promise. A priority list with delivery timelines. Build order emerges from contributor capacity, funding, document availability, and what the city is doing in any given month — it is not pre-decided here. For point-in-time status, see the project status in the README.

How to use it. Find the row that fires you up. Open an issue, or pick up the linked module spec. Everything here is fair game — contribute anonymously, pseudonymously, or under your real name, your choice. See /about #community for channels.

How to read the status flags

live
Shipped and operating against real LFUCG data in production.
partial
Foundational scaffolding exists in the repo. Not user-facing or not yet running on real data.
specified
An executable module spec or ADR exists. Anyone can pick it up and build.
vision
Named here as part of the project's intent. Not yet specified in depth. Open to scoping by anyone.

These flags describe reality, not aspiration. They can move in either direction. A capability is only live when it is running on real LFUCG data in production.

§1 Foundation — the public-record substrate

Every other capability in this roadmap depends on this layer. LexDOGE covers every public entity a Lexington taxpayer funds — not just LFUCG. Foundation is organized by entity (who collects the tax dollar and writes the audit) plus cross-entity infrastructure (storage, search, lineage) that all sources share.

LFUCG · lexington-fayette urban county government
  • live
    LFUCG annual budget (adopted + proposed PDFs → structured line items)
    1,026 line items FY25+FY26 in prod
  • live
    LFUCG CAFR (Comprehensive Annual Financial Report) — pension, debt, audit findings
    pension_data, debt_obligations, debt_service_schedule populated (Round 5)
  • live
    Council legislation, agendas, minutes, votes (Legistar API)
    16,923 legislation rows + 3,238 contracts in prod, daily cron
  • live
    Contracts + vendor extraction from council legislation
    2,729 contracts with extracted vendor (Round 7+)
  • live
    FOIA / Kentucky Open Records requests (MuckRock API v2)
    JWT pipeline operational; submission still human-gated
  • live
    Open-data portal ingest (ArcGIS Hub: parcels, zoning, addresses, schools, STR registry, voting precincts, council districts, parks, historic districts, H-1 overlay — +143 more datasets available)
    6,796 rows across 10 LFUCG datasets; DCAT-US catalog ingested; daily Celery Beat refresh
  • vision
    Granicus meeting media (audio/video, captions, RSS) for LFUC Council
    primary source for Tier-2/3 meeting summaries
FCPS · fayette county public schools
  • live
    FCPS adopted + tentative budget (district financials)
    school_district_finances seeded FY26 working budget $827.2M (Round 6)
  • specified
    FCPS audited financial statements (independent CPA, annual)
    primary source replaces secondary-press magnitude errors (see methodology)
  • specified
    KDE District Financial Profile (state-level audit overlay)
    education.ky.gov/districts/FinRept — Kentucky Dept of Education
  • specified
    FCPS BoardDocs governance feed (agendas, minutes, board policies)
    parallel to LFUCG's Legistar — different vendor, same role
  • specified
    FCPS Open Records portal (KORA requests directed at the district)
    separate channel from LFUCG MuckRock pipeline
  • specified
    FCPS Board Voting Record + Alignment (school board roll-calls)
    Module 09 covers both LFUC Council and FCPS Board
other entities · lextran, lfchd, library, bgadd, pva, sheriff
  • specified
    LexTran (Lexington Transit Authority) — board resolutions, operating budget, federal grant disclosures
    FY26 ops budget $37.97M; ~70% from 6¢/$100 property tax
  • specified
    LFCHD (Lexington-Fayette County Health Dept.) — audited financials, 2.43¢/$100 health levy
    FY24 audited $26.27M revenues — source-of-truth must be the audit, not press
  • specified
    Lexington Public Library — board minutes, audited financials, dedicated property-tax allocation
    ~$24M magnitude per Library Board orientation materials
  • specified
    BGADD (Bluegrass Area Development District) — federal pass-through, regional planning
    FY25 total receipts $9.0M per KY Legislature ADD Annual Report
  • specified
    Fayette County PVA (Property Valuation Administrator) — parcel-level valuations + tax-roll exports
    feeds Module 06 — Property Tax + Parcel-Level Data
  • specified
    Fayette County Sheriff — property-tax collection statements (how the bill is actually billed)
    the missing piece between PVA valuation and entity allocation
  • vision
    Independent agencies + special districts (LFUCG-adjacent, e.g. Airport Board, BlueGrass Tomorrow)
    scoping needed — each appears in CAFR component-units footnotes
cross-entity infrastructure · pensions, bonds, archive, search, lineage
  • specified
    KPPA pension disclosures (CERS funded-ratio, employer rates) — county-employer view
    covers LFUCG CERS + FCPS KTRS in one pipeline; kyret.ky.gov primary source
  • specified
    EMMA bond filings (MSRB disclosures) — LFUCG GO + revenue bonds + FCPS school bonds
    emma.msrb.org continuing-disclosure API; closes the bondholder-view gap
  • specified
    Full-text document search across all ingested records
    pg_trgm + tsvector over budgets, CAFRs, minutes, FOIA responses, contracts
  • live
    Embedding index over ingested chunks for semantic search (pgvector)
    600+ chunks indexed and growing; OpenAI text-embedding-3-small
  • partial
    Cloudflare R2 raw-document archive (every primary source preserved with content-hash lineage)
    archive_document() + archive_bytes() helpers wired into budget/CAFR/MuckRock; activates on R2 token mint — see apps/api/.env.example
  • partial
    Source-document provenance + content-hash lineage (every claim traces to PDF + page + hash)
    content_hash + r2_key + archived_at columns added in m30; loop closes when R2 archive activates
  • vision
    Lincoln Institute fiscally-standardized cities dataset (cross-city benchmarks)
    context for 'is Lexington's burden typical for a 330k-population peer?'
  • specified
    KORA/FOIA response archive — every response document ingested + searchable
    feeds Module 03 — Open Records Request Tracker
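
The content-hash lineage idea in the list above can be illustrated with a short sketch. This is not the project's archive_document() or archive_bytes() helper, whose signatures are not shown here. ArchiveRecord, build_archive_record, verify, and the key layout are all hypothetical; only the column names content_hash, r2_key, and archived_at come from the roadmap. SHA-256 is an assumption based on the build label.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ArchiveRecord:
    """Hypothetical lineage row; field names mirror the columns named in the roadmap."""
    content_hash: str      # hex SHA-256 of the raw document bytes
    r2_key: str            # object key the bytes would be stored under
    archived_at: datetime  # UTC timestamp of archival


def build_archive_record(raw: bytes, source: str) -> ArchiveRecord:
    """Derive a content-addressed key so identical bytes always map to one object."""
    digest = hashlib.sha256(raw).hexdigest()
    # Assumed key layout: <source>/<first two hex chars>/<full digest>
    key = f"{source}/{digest[:2]}/{digest}"
    return ArchiveRecord(digest, key, datetime.now(timezone.utc))


def verify(raw: bytes, record: ArchiveRecord) -> bool:
    """Re-hash the bytes and confirm they still match the recorded hash."""
    return hashlib.sha256(raw).hexdigest() == record.content_hash
```

Because the key is derived from the bytes, re-ingesting an unchanged PDF yields the same key, and any silent change to a source document shows up as a hash mismatch.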

§2 Analysis — what the agents do with the substrate

Once data is ingested, agents analyze it. Methods are documented, reproducible, and described in plain English on the public methodology page.

statistical anomaly detection
  • live
    Benford's Law (leading-digit distribution)
  • live
    Year-over-year spike detection
    >25% Δ without matching council action
  • live
    Threshold-avoidance / contract-splitting
    clusters just below $20k / $30k / $50k thresholds
  • live
    Duplicate-payment detection
    fuzzy vendor+amount+date matching
  • live
    Vendor concentration (HHI > 0.4)
  • specified
    Sole-source concentration over time
  • vision
    Distributional anomaly across years
    longitudinal pattern detection — open scope
  • vision
    Network analysis (vendor → council member → committee assignment)
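
Three of the checks listed above are simple enough to sketch in a few lines. The functions below are illustrative only and are not the project's detectors: the names, input shapes, and defaults are assumptions, except for the 25% year-over-year threshold and the 0.4 HHI cutoff, which come from the list. The year-over-year sketch omits the "without matching council action" join, which needs legislation data.

```python
import math
from collections import Counter


def benford_chi_square(amounts):
    """Chi-square statistic of observed leading digits against Benford's expected shares."""
    digits = []
    for a in amounts:
        a = abs(a)
        if a == 0:
            continue  # zero has no leading significant digit
        # Scientific notation puts the leading significant digit first.
        digits.append(int(f"{a:.15e}"[0]))
    n = len(digits)
    if n == 0:
        return 0.0
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford share for digit d
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat


def yoy_spikes(prior, current, threshold=0.25):
    """Return {line_item: fractional change} where |change| exceeds the threshold."""
    flagged = {}
    for item, now in current.items():
        before = prior.get(item)
        if not before:  # new line item or zero base: no ratio to compute
            continue
        change = (now - before) / before
        if abs(change) > threshold:
            flagged[item] = change
    return flagged


def hhi(spend_by_vendor):
    """Herfindahl-Hirschman index on a 0-1 scale: the sum of squared spend shares."""
    total = sum(spend_by_vendor.values())
    if total <= 0:
        return 0.0
    return sum((v / total) ** 2 for v in spend_by_vendor.values())
```

As a rough guide, a Benford chi-square above about 15.5 (8 degrees of freedom, 5% level) suggests the leading digits deviate from the expected distribution, and a department whose vendor spend gives hhi(...) > 0.4 would meet the concentration cutoff named above. Either result is a prompt for review, not a finding in itself.
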
cross-cutting analytical modules
  • specified
    Whistleblower Channel — cryptographic intake of insider tips with metadata stripping
  • specified
    Settlements + Litigation Ledger — every settlement and judgment paid by LFUCG
  • specified
    Open Records Request Tracker — public log of every KORA request, response time, redactions
  • specified
    Public Payroll Search — name-searchable employee database with base, OT, longevity, total comp
  • specified
    Campaign Finance + Lobbying Overlay — donations + lobbying joined to votes and awards
  • specified
    Property Tax + Parcel-Level Data — who owns Lexington, who pays the bill, who got an exemption
  • specified
    Contract Lifecycle Tracker — solicitation → award → sole-source justification → change orders
  • specified
    Per-District Dashboard — twelve council districts × one accountability dashboard each
  • specified
    Council Voting Record + Alignment — full roll-call history (LFUC Council + FCPS Board)
  • specified
    Public Safety Metrics — use of force, complaints, settlements, response times, overtime

§3 Publication — what reaches the public

Anomalies, findings, and analyses become public artifacts. Every artifact carries citations to primary sources, a confidence score, and a tier classification (1 = automated; 4 = legal review required).

  • live
    Live budget dashboard (department breakdown, fund composition, YoY)
  • live
    Anomaly feed with per-flag citations
    503 anomalies live on /anomalies with plain-English humanizer + evidence disclosure
  • partial
    Long-form reports (Tier 3/4, agent-drafted, human-reviewed)
    report-writer agent exists; first publication pending
  • partial
    Council meeting summaries with fiscal-impact extraction
  • partial
    FOIA log on /foia with status, responses, produced documents
  • live
    Glossary tooltips on every technical term used on the site
    glossary.ts + Term component shipped, 31 terms
  • specified
    Corrections + retractions log with timestamped diffs
  • vision
    RSS / Atom feeds per category (anomalies, reports, FOIA, meetings)
  • specified
    Embed widgets (per-department chart, per-vendor table, FOIA tracker)
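
The artifact shape described at the top of this section (primary-source citations, a confidence score, and a tier from 1 to 4) could be modeled roughly as follows. This is a sketch under stated assumptions, not the project's schema: the class and field names, the 0.0 to 1.0 confidence scale, and the needs_human_review rule are hypothetical. The rule follows the note above that Tier 3/4 reports are human-reviewed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical pointer to a primary source: document, page, and content hash."""
    document: str
    page: int
    content_hash: str


@dataclass(frozen=True)
class Artifact:
    title: str
    tier: int                       # 1 = automated ... 4 = legal review required
    confidence: float               # assumed scale: 0.0 to 1.0
    citations: tuple[Citation, ...]

    def __post_init__(self):
        # Reject artifacts that could not be traced back to a primary source.
        if self.tier not in (1, 2, 3, 4):
            raise ValueError("tier must be between 1 and 4")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        if not self.citations:
            raise ValueError("at least one primary-source citation is required")


def needs_human_review(artifact: Artifact) -> bool:
    """Assumed rule: Tier 3 and Tier 4 artifacts wait for a human before publication."""
    return artifact.tier >= 3
```

Validating at construction time means an uncited or out-of-range artifact fails before it can reach a publication step.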

§4 Distribution — getting findings to those who can act

A finding nobody sees is no finding at all. Distribution is a first-class concern.

  • specified
    Email Alerts + Subscriptions — vendor watchlists, anomaly thresholds, department beats
    Module 10 — converts visitors into beat followers
  • specified
    Public API + Journalist Kits — REST + bulk exports + reproducibility kits + journalist program
    Module 11 — turns LexDOGE into civic-data infrastructure
  • specified
    News Monitoring + Autonomous Reports Pipeline — sensor-to-publish loop
    Module 18 — the largest single new build
  • vision
    Social auto-poster (anomalies + reports → X / Bluesky / Mastodon with citations)
  • partial
    Press kit / journalist onboarding flow

§5 Community interface — how Lexington engages

LexDOGE serves residents directly. Every interface here is designed to lower the bar to participation — for tipsters, contributors, and casual readers.

  • partial
    Anonymous / pseudonymous / identifiable contribution surface
    email + GitHub today; Module 01 is the upgrade
  • live
    /about page surfacing mission, independence, AI-experiment framing
  • live
    In-page glossary for civic terminology (TIF, CAFR, OPEB, KORA, etc.)
  • specified
    District-localized views (show me my council district)
  • live
    Mobile responsiveness across every page
    viewport meta + 3 breakpoint bands + table overflow scroll

§6 Governance and trust — transparency about the project itself

LexDOGE asks public institutions to disclose how they operate. It owes the public the same standard about itself.

  • live
    Public ADRs documenting every architectural / policy decision
    seven published; more expected
  • live
    Public methodology page describing data sources, agents, anomaly methods, tier system
  • live
    Forbidden-words content policy enforced by the codebase, not just the docs
  • specified
    Public corrections + retractions log
  • specified
    Self-disclosure: budget, funding sources, infrastructure costs, governing body, vendors
  • live
    Public source code for everything: parsers, prompts, thresholds, gates
    AGPL-3.0
  • vision
    501(c)(3) status + IRS Form 990 published as soon as filed

§7 Resilience and autonomy — what keeps the system honest

The project's premise is that AI agents can run a civic watchdog at low cost with bounded human oversight. That premise is only credible if the system can detect its own failures and recover from them.

  • specified
    Self-Monitoring + Resilience — every cron, parser, queue has health checks and SLOs
    Module 19 — load-bearing for autonomy
  • specified
    Autonomy Audit — end-to-end review of where humans sit in the loop
    guides which gates can be safely automated
  • partial
    Daily Discord digest of ingest activity (new PDFs, new FOIA responses, parser deltas)
    webhook plumbing shipped; webhook URL pending
  • partial
    Sentry observability across web + API + worker
    initialized; alerts pending
  • vision
    Automated cost monitoring (LLM spend per agent, per module, per finding)
  • vision
    Per-agent A/B testing infrastructure (compare prompts, models, thresholds)
    research-grade tooling
  • specified
    Reproducibility kit — given a finding, replay the exact pipeline that produced it
    element of Module 11
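
One simple form a health check could take is a freshness test: each source declares how stale it may get, and anything past that limit is reported. The sketch below is illustrative and is not taken from the Module 19 spec. The function name, the input shapes, and the source names and limits in the usage example are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_sources(last_success, max_age, now=None):
    """Return names of sources whose last successful run is missing or too old."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for name, limit in max_age.items():
        ran = last_success.get(name)
        # A source that has never succeeded counts as stale.
        if ran is None or now - ran > limit:
            stale.append(name)
    return sorted(stale)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    last = {"legistar": now - timedelta(minutes=18),
            "budget_pdf": now - timedelta(days=3)}
    limits = {"legistar": timedelta(hours=24),
              "budget_pdf": timedelta(hours=24),
              "emma": timedelta(days=7)}
    print(stale_sources(last, limits, now))
```

Passing now explicitly keeps the check deterministic in tests. A scheduler could run it and forward any non-empty result to an alert channel.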

§8 Multi-jurisdiction and the larger project

LexDOGE serves Lexington-Fayette first. But the codebase, methodology, and agent stack are not Lexington-specific. The intent is that any community wanting a civic-transparency dashboard can fork LexDOGE and inherit everything.

  • live
    AGPL-3.0 license that closes the SaaS-vendor loophole
    ADR-001
  • live
    Fork-over-multi-tenant architecture (each jurisdiction is its own deployment)
    ADR-002
  • vision
    Configuration-driven jurisdiction setup (one config file → new city)
    partially possible today; full extraction not yet done
  • vision
    Reference fork documentation (how to clone LexDOGE for Knoxville, Louisville, …)
  • live
    Shared upstream improvements flow back to canonical repo
  • vision
    Cross-jurisdiction benchmark dashboard (using Lincoln Institute's FiSC dataset)

§9 The bigger questions

These are not modules. They are the questions the project is, on its longest view, trying to answer — in public, by working.

  1. Can autonomous AI agents do investigative civic work at a quality and consistency journalists and auditors will trust? The answer must be demonstrated, not asserted.
  2. Can a small civic project, run by agents, monitor public finance with the depth that previously required an institutional newsroom?
  3. Can pseudonymous community contribution be a first-class pattern in civic transparency, without becoming an attack surface for bad actors?
  4. Can other communities adopt this codebase and produce findings as strong as Lexington's? The fork model exists. The replication does not — yet.
  5. What does positive-sum AI for public goods look like at scale? LexDOGE is one data point. More are needed.

How to add to this roadmap

Anyone can. Two paths:

  1. Adding a new capability. Open an issue with the rough idea, the audience it serves, and any references. Once it has at least a paragraph of motivation, it lands here as vision. Once someone writes a real spec for it, it advances to specified.
  2. Advancing an existing capability. Pick up the module spec or open issue, implement it, and the next maintainer pass will update the flag.

No separate approval step decides what counts as “in scope.” LexDOGE is a civic-data project for Lexington-Fayette. If something serves that mission and meets the legal and editorial bar in ADR-003, it belongs here.

This is a living document. It describes what the project intends to be, not what has been promised. Build order is not encoded here. See the README's Project Status for what is shipping right now.