longecho — Interview Insights
This document captures key design decisions from an in-depth interview (2026-01).
Core Philosophy Confirmed
Trust the Future
A consistent theme: don't over-engineer for future scenarios. The archive's job is to preserve content; future systems (LLMs, tools, humans) will figure out how to use it.
Implications:
- Skip complex verification infrastructure
- Don't pre-annotate for semantic decay — future LLMs can explain context
- Don't build elaborate persona extraction — raw conversations ARE the persona
- Don't worry about LLM prompt format changes — future models will be smarter
Single Unified Archive
No audience tiers. One archive for everyone (family, researchers, public). Simplicity over audience-tailoring.
Your Archive, Your Call
No consent model for photos/content featuring others. This is a personal archive, not a collaborative one.
The True MVP
Get conversations exported in a durable format. That's it.
Specifically:
- JSON primary (full tree structure)
- Markdown secondary (human-readable, latest path)
- Maybe include SQLite if not too large
- Raw conversations ARE the persona — skip explicit persona extraction for MVP
This is dramatically simpler than the full spec suggests. Everything else is enhancement.
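A minimal sketch of what that export could look like, assuming conversations have already been loaded from ctk as plain dicts; the message fields used here (`id`, `parent_id`, `role`, `text`, `timestamp`) are illustrative assumptions, not the actual ctk schema:

```python
# Sketch only: export one conversation as JSON (full tree) plus Markdown
# (latest path). The dict shape is a hypothetical stand-in for ctk's output.
import json
from pathlib import Path

def latest_path(messages: list[dict]) -> list[dict]:
    """Walk from the most recently created leaf back to the root."""
    by_id = {m["id"]: m for m in messages}
    children = {m["id"]: [] for m in messages}
    for m in messages:
        if m.get("parent_id") in children:
            children[m["parent_id"]].append(m["id"])
    leaves = [m for m in messages if not children[m["id"]]]
    node = max(leaves, key=lambda m: m["timestamp"])  # latest leaf wins
    path = [node]
    while node.get("parent_id") in by_id:
        node = by_id[node["parent_id"]]
        path.append(node)
    return list(reversed(path))

def export_conversation(conv: dict, out_dir: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    # JSON primary: the full branching tree, nothing dropped.
    (out_dir / f"{conv['id']}.json").write_text(
        json.dumps(conv, ensure_ascii=False, indent=2)
    )
    # Markdown secondary: human-readable, latest path only.
    lines = [f"# {conv.get('title', conv['id'])}", ""]
    for msg in latest_path(conv["messages"]):
        lines.append(f"**{msg['role']}**: {msg['text']}")
        lines.append("")
    (out_dir / f"{conv['id']}.md").write_text("\n".join(lines))
```

The JSON keeps the full branching tree untouched; the Markdown collapses it to the most recently active path for human readers.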
Key Design Decisions
Temporal Identity
Latest version wins. When the ghost answers "what do you think about X", it represents current (most recent) views. Earlier views are context, not authoritative.
The ghost should have meta-awareness of intellectual history, but speak as current-you.
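A rough sketch of the "latest version wins" rule, assuming statements on a topic are stored as (timestamp, text) pairs — a hypothetical shape, not a real longecho structure:

```python
# Sketch: resolve a topic to the current view, keeping earlier views as context.
def current_view(statements: list[tuple[str, str]]) -> dict:
    ordered = sorted(statements)  # ISO-8601 timestamps sort chronologically
    return {
        "current": ordered[-1][1],                 # speak as current-you
        "history": [s[1] for s in ordered[:-1]],   # earlier views are context
    }
```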
Imperfection Handling
Include everything, let the ghost handle it gracefully. Don't curate out mistakes or embarrassing moments. Trust the ghost to acknowledge past errors naturally: "I used to think X, but later realized..."
AI Dialogue Representation
Keep full conversations including AI responses. But:
- The persona is YOUR messages only
- AI responses provide necessary context for understanding your responses
- AI contributions are not part of "your voice"
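A minimal sketch of that separation, assuming exported messages carry a `role` field (the field names are assumptions, not the ctk schema):

```python
# Sketch: split "your voice" from AI context in an exported message list.
def split_voice(messages: list[dict]) -> tuple[list[dict], list[dict]]:
    persona = [m for m in messages if m["role"] == "user"]  # your voice
    context = [m for m in messages if m["role"] != "user"]  # AI replies, kept for context
    return persona, context
```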
Source Authority for Persona
Conversations AND writings dominate. Both are "your direct voice." Other sources (bookmarks, repos, photos) support but don't define persona.
Quote: "These are actually my writings."
Cross-Reference Detection
Frequency-weighted. Casual mentions fade; obsessions surface. A book mentioned once is noise; a topic discussed 50 times is signal.
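A minimal sketch of the weighting, assuming topic mentions have already been extracted as plain strings (the extraction step and the `noise_floor` value are assumptions):

```python
# Sketch: frequency-weighted cross-references. Casual mentions fall below the
# noise floor and fade; repeatedly discussed topics surface.
from collections import Counter

def weighted_topics(mentions: list[str], noise_floor: int = 3) -> dict[str, int]:
    counts = Counter(m.lower() for m in mentions)
    return {topic: n for topic, n in counts.items() if n >= noise_floor}
```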
Synthesis vs. Source Data
Strict separation. Raw sources in one place, all synthesis/derived data clearly separated with provenance. Never mix interpretation with evidence.
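One possible shape for a derived record, with provenance fields that are illustrative assumptions rather than anything ECHO specifies:

```python
# Sketch: synthesis lives in its own store and always points back at the raw
# sources it was derived from, so interpretation never mixes with evidence.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DerivedRecord:
    claim: str             # the synthesized statement
    source_ids: list[str]  # raw conversations/writings it came from
    method: str            # how it was derived (e.g. "llm-summary")
    derived_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```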
Priority Adjustments
Promoted
- ctk export — THE critical path. MVP is conversations exported.
- Writings (blogs, notes) — Direct voice, high authority for persona.
Demoted
- ptk (photos) — Lower priority than spec suggests. Conversations/writings more urgent for persona.
- mtk (email) — Demoted significantly. Maybe not important at all.
- Verification/signing — Skip entirely. Over-engineering.
Unchanged
- btk (bookmarks) — MEDIUM
- ebk (ebooks) — MEDIUM
- repoindex — MEDIUM (use existing star/annotation features for curation)
ECHO vs. longecho
ECHO is the spec. longecho is one implementation.
This decoupling matters:
- ECHO describes format and philosophy
- longecho is a Python/SQLite implementation
- Others could build different implementations
- Don't couple the ECHO spec too tightly to Python specifics
Technical Ideas to Document
Infinigram Mixture
An idea for biasing LLM output toward the user's authentic voice:
Use a mixture distribution: small weight on an infinigram model (n-gram trained on user's text) combined with a large LLM. The infinigram biases generation toward characteristic phrases/patterns without dominating.
Already implemented by the user. Worth documenting as an option for SOUL layer implementation.
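A minimal sketch of the mixing step, i.e. p(t) = (1 - λ)·p_LLM(t) + λ·p_infinigram(t) with a small λ; the function names and the default weight are assumptions, not the author's implementation:

```python
# Sketch: mix next-token distributions from a large LLM and an infinigram
# model trained on the user's text, assuming both expose probabilities over
# a shared vocabulary.
def mixed_next_token_probs(
    llm_probs: dict[str, float],
    infinigram_probs: dict[str, float],
    lam: float = 0.1,  # small weight on the infinigram so it biases, not dominates
) -> dict[str, float]:
    vocab = set(llm_probs) | set(infinigram_probs)
    mixed = {
        tok: (1 - lam) * llm_probs.get(tok, 0.0) + lam * infinigram_probs.get(tok, 0.0)
        for tok in vocab
    }
    total = sum(mixed.values()) or 1.0
    return {tok: p / total for tok, p in mixed.items()}  # renormalize
```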
Update Cadence
Provide tools, let the user decide. longecho doesn't dictate when to update. The user might:
- Run manually when they feel like it
- Set up a cronjob (see the example below)
- Use a dead man's switch
- Update on life milestones
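For the cron option, a hypothetical crontab entry; the `longecho export` command and its flag are assumptions, since no such CLI is specified yet:

```
# Hypothetical: re-export the archive at 03:00 on the 1st of every month
0 3 1 * * longecho export --out ~/archive
```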
Living Use vs. Legacy
Legacy-focused. Primary purpose is for after death. Living use is incidental/bonus.
This affects design:
- Don't optimize for daily querying/interaction
- Optimize for durability and completeness
- Focus on making it understandable to strangers
Recipient Model
Public release. Intended to be publicly accessible after death.
Implications:
- No need for access control infrastructure
- No private/public tiers
- Anyone can have the archive
What Must Happen Now
Things the future CAN'T recover:
- Content capture — Export data from platforms before they die
- Context only you know — "This conversation was about X", "I was wrong here", "John is my brother"
But even these aren't "lost forever" — future systems can work with raw data. The context annotations are nice-to-have, not critical.
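For anyone who does want to capture that context now, one lightweight option is a sidecar annotations file kept next to the raw exports; the structure below is a sketch with assumed field names, not an ECHO-defined format:

```python
# Sketch: free-form notes keyed to the raw sources they describe, stored
# apart from the sources themselves.
import json

annotations = [
    {"source_id": "conv-0042", "note": "This conversation was about X."},
    {"source_id": "conv-0042", "note": "I was wrong here."},
    {"source_id": "photo-0187", "note": "John is my brother."},
]

with open("annotations.json", "w") as f:
    json.dump(annotations, f, ensure_ascii=False, indent=2)
```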
Core Tensions
Two things that keep the author up at night:
- Time pressure — Will there be enough time to build this?
- Over-engineering fear — Am I making this too complex despite trying not to?
These tensions are the heart of the project. Every decision should be evaluated against them.
Summary: The Minimal Path
- Export ctk conversations to JSON + markdown
- Include SQLite database if reasonable
- Skip persona extraction — raw conversations suffice
- Skip verification infrastructure
- Trust future LLMs to handle context, extrapolation, persona
- Public release, single archive, no tiers
- Other tools (btk, ebk, repoindex) can export when ready, but aren't blocking
Everything else is optional enhancement.