A small language model running locally in your browser via WebAssembly. Pick a size below and click Load: the GGUF downloads once and is cached, then every token of inference happens on your machine. No backend, no API key, no tracking. The default 135M model is a toy (~90 MB, charming-but-dumb); the 1.7B is slow on first load but produces real prose.
It is still small. Treat it as a curio rather than an oracle: ask short, low-stakes questions and do not trust it about facts. The model knows nothing specific about this blog.
A toggle below enables the voice prior: a token-level n-gram model over every word I have published, mixed with the LLM in probability space at every step. The mixture is p(next) = α · p_ngram + (1 - α) · p_llm, computed directly: we pull the LLM’s full softmax distribution via getLogits(-1), look up each token’s empirical frequency under the longest matched suffix in my corpus, take the linear combination, apply temperature, and sample. With α = 0 it’s pure SmolLM2. With α = 1 it’s pure infinigram, which loops as soon as the generated context drifts off-corpus. Somewhere in the middle, the LLM provides the grammar and the n-gram provides the register.
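One step of that mixture can be sketched as a small pure function. This is an illustrative sketch, not the actual implementation: the helper names and toy distributions are mine, and in the real pipeline `pLlm` would come from the model's softmax (via `getLogits(-1)`) and `pNgram` from the longest-suffix lookup over the corpus.

```typescript
// One sampling step of the voice-prior mixture:
// p(next) = alpha * p_ngram + (1 - alpha) * p_llm, then temperature, then sample.

function softmax(logits: number[]): number[] {
  const m = Math.max(...logits);
  const exps = logits.map(x => Math.exp(x - m));
  const z = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / z);
}

function mixAndSample(
  pLlm: number[],    // toy stand-in for the LLM's softmax distribution
  pNgram: number[],  // toy stand-in for empirical suffix frequencies
  alpha: number,
  temperature: number,
  rand: () => number = Math.random,
): number {
  // Linear combination in probability space, per token.
  const mixed = pLlm.map((p, i) => alpha * pNgram[i] + (1 - alpha) * p);
  // Apply temperature to the mixed distribution via log-probs and renormalize.
  const tempered = softmax(mixed.map(p => Math.log(p + 1e-12) / temperature));
  // Inverse-CDF sampling.
  let r = rand();
  for (let i = 0; i < tempered.length; i++) {
    r -= tempered[i];
    if (r <= 0) return i;
  }
  return tempered.length - 1;
}
```

With `alpha = 0` the n-gram term drops out and you sample pure SmolLM2; with `alpha = 1` the LLM term drops out and you get pure infinigram, looping behavior and all.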