I am fine-tuning a small LLM to generate Elasticsearch DSL queries from natural language. The project is on GitHub.
The idea: take a task that large models handle well (translating “find all orders over $100 from last month” into the corresponding Elasticsearch JSON query) and see if a tiny model can learn it from synthetic data.
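To make the target concrete, here is a sketch of what the output side of that translation might look like. The field names (`order_total`, `order_date`) and the date-math choice are illustrative assumptions, not taken from the project.

```python
import json

# One plausible Elasticsearch DSL target for the request
# "find all orders over $100 from last month".
# Field names are hypothetical; "now-1M/M".."now/M" is Elasticsearch
# date math for the previous calendar month.
query = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"order_total": {"gt": 100}}},
                {"range": {"order_date": {"gte": "now-1M/M", "lt": "now/M"}}},
            ]
        }
    }
}

print(json.dumps(query, indent=2))
```

The model's job is to emit this JSON given only the natural-language request, so well-formedness of the nested structure is part of what it has to learn.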
The data pipeline works like this. I started by generating examples from GPT-4. Then I wrote a script that samples from those outputs and uses them as few-shot examples for Mistral, which generates a much larger synthetic dataset. The next step is reshaping the data into the expected format and fine-tuning.
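The pipeline steps above can be sketched roughly as follows. Everything here is a hypothetical reconstruction: the seed pairs, the prompt format, the chat-style fine-tuning record shape, and all function names are my assumptions, not the project's actual code.

```python
import json
import random

# Seed pairs in the style of the GPT-4-generated examples (invented here).
seed_examples = [
    {"question": "find all orders over $100 from last month",
     "query": '{"query": {"range": {"order_total": {"gt": 100}}}}'},
    {"question": "count users who signed up this week",
     "query": '{"query": {"range": {"signup_date": {"gte": "now/w"}}}}'},
]

def build_prompt(examples, new_question):
    """Format sampled pairs as few-shot demonstrations for the generator model."""
    shots = "\n\n".join(
        f"Question: {ex['question']}\nQuery: {ex['query']}" for ex in examples
    )
    return f"{shots}\n\nQuestion: {new_question}\nQuery:"

def to_finetune_record(question, query):
    """Reshape one synthetic pair into a chat-style fine-tuning record."""
    return {"messages": [
        {"role": "user", "content": question},
        {"role": "assistant", "content": query},
    ]}

# Sample seeds as few-shot context, then reshape a pair for fine-tuning.
shots = random.sample(seed_examples, k=2)
prompt = build_prompt(shots, "show all products tagged 'sale'")
record = to_finetune_record(seed_examples[0]["question"],
                            seed_examples[0]["query"])
print(prompt)
print(json.dumps(record))
```

In the real pipeline the prompt would be sent to Mistral and its completions collected into the larger synthetic dataset; the record format would need to match whatever the fine-tuning framework expects.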
This is early-stage work: the synthetic data exists, but the fine-tuning has not happened yet. I will update this post with results.
If you are interested in collaborating on this or related projects, email me at lex@metafunctor.com.