The MCP Pattern: SQLite as the AI-Queryable Cache
I keep building the same thing.
Not the same product — the products are different. One indexes a Hugo blog. One indexes AI conversations. One consolidates medical records from three hospitals. One catalogs a hundred git repositories. But underneath, they all have the same skeleton. After the fifth time, I think the skeleton deserves a name.
The pattern
Domain files (ground truth)
↓ index
SQLite database (read-only cache, FTS5)
↓ expose
MCP server (tools + resources → AI assistant)
That’s it. Three layers. The domain files are always canonical — the database is a disposable cache you can rebuild from them at any time. SQLite gives you structured queries, full-text search, and JSON extraction over data that was previously trapped in flat files. MCP exposes it to an AI assistant that can write SQL, retrieve content, and (in some cases) create new content.
Here’s the inventory:
| Project | Domain | Ground Truth | What the MCP Exposes |
|---|---|---|---|
| hugo-memex | Blog content | Markdown files with YAML front matter | 951 pages, FTS5 search, taxonomy queries, JSON front matter extraction |
| memex | AI conversations | ChatGPT/Claude/Gemini exports | Conversation trees, FTS5 message search, tags, enrichments |
| chartfold | Medical records | Epic, MEDITECH, athenahealth exports | Labs, meds, encounters, imaging, pathology, cross-source reconciliation |
| arkiv | Personal archives | JSONL files from various sources | Unified SQL over heterogeneous personal data |
| repoindex | Git repositories | Local git repos + GitHub/PyPI/CRAN metadata | Repository catalog with activity tracking, publication status |
Five projects. Five completely different domains. One architecture.
Why SQLite
SQLite is the most deployed database in history. It’s on every phone, every browser, every Python installation. But that’s not why I use it.
I use it because it solves three problems at once:
Structured queries over unstructured data. Hugo front matter is YAML trapped inside markdown files. Medical records are scattered across three incompatible EHR export formats. AI conversations are JSON trees with branching paths. SQLite turns all of these into tables you can JOIN, GROUP BY, and aggregate. json_extract() handles the long tail of fields that don’t fit a fixed schema.
Full-text search. FTS5 with porter stemming and unicode61 tokenization gives you relevance-ranked search across any text corpus. No Elasticsearch, no external service, no running daemon. Just a virtual table that lives in the same database file.
...Read more →