My AI Cofounder Forgot Everything. So I Built It a Brain.
How I went from 9 memory-dark projects to sub-500ms semantic recall in a single day, and what it taught me about why AI assistants feel dumb.
Last Tuesday, my AI assistant recommended an architecture I'd explicitly rejected two weeks earlier.
Not because it was being creative. Because it had forgotten. The decision was made, the reasoning was documented, and it still walked me through the same bad idea like we'd never spoken. I corrected it. It apologized. I knew it would do it again.
That's the moment I stopped building features and started building memory.
The Embarrassing Audit
I run 11 projects as a solo founder. Claude (operating as "Lex") is my cofounder in everything but equity. We ship code together, triage bugs, draft marketing copy, manage infrastructure. On a good day, it feels like having a brilliant partner who never sleeps.
On a bad day, it feels like onboarding a new hire. Every. Single. Morning.
So I ran an audit. Not a vague "how's our knowledge management?" review. A specific, quantitative one, inspired by a piece from Zak El Fassi showing that a team boosted AI recall from 60% to 93% just by restructuring how files were organized.
My results were worse than their baseline.
Nine of ten projects were what I started calling "memory-dark." No persistent knowledge. No structured context. Every session started from zero. I had 22 files documenting things Claude should never do again (like mocking the database in tests, or using em-dashes in marketing copy). Eleven of them had never been read.
The session log was 6,817 lines. Write-only. The decision log existed but wasn't loaded at startup. When FluxDiagram solved a tricky React animation problem, MyWritingTwin had no idea, even though the solution would have saved hours.
And the number that hurt most: the WHY capture rate was 40%. Meaning 60% of the time, when a decision was made, the reasoning behind it was gone within a week. The WHAT survived. The WHY didn't.
"You Have a Routing Problem, Not a Storage Problem"
Before writing code, I did something I'd started doing for big architectural calls: I ran a council.
Two AI advisors, different models, given the same problem and full context. Three rounds. Zero API cost (both run via CLI with subscription tiers). The adversarial structure is the point. One advisor frames it as cognitive science. The other frames it as engineering. The tension produces something better than either alone.
Lexi (Gemini) came in hot with a framework I hadn't considered: Transactive Memory Systems. It's the research on how teams distribute knowledge. Her point was sharp: "You don't have a storage problem. You have 1,034 documents across 11 projects. The knowledge exists. You have a routing problem. The right memory doesn't surface at the right moment."
LexT (Codex) disagreed on framing but agreed on diagnosis: "The biggest risk is ingress quality, not retrieval. If you're storing garbage, better search just finds garbage faster." He wanted an evaluation contract before any building. Define success metrics. Measure a baseline. Then build.
By round three, two principles had solidified:
Index by topic, not by date. When you need to know about the night-shift agent runner, you shouldn't have to know which Tuesday it was discussed. Chronological memory is how humans journal. It's not how you build institutional knowledge.
Mistakes are the highest-value memories. Failures, root causes, prevention rules. These should load before anything else. Think aviation black box, not changelog.
The Name
I called it Anamnesis.
Greek for "recollection." Specifically, the Platonic idea that learning isn't acquiring new knowledge but remembering what the soul already knows. In the Meno, Socrates shows that an uneducated slave boy can derive a geometry proof just by being asked the right questions. The knowledge was latent. The retrieval mechanism was the missing piece.
1,034 documents. 22 feedback files. 15 architectural decisions. All sitting in files. All invisible to the AI that needed them most.
One Day. Four Tracks. No Waiting.
This is the part where being an AI-native builder actually matters. Four independent tracks, four parallel agents, one afternoon. Not four sprints.
Track 1: Schema. Every memory object gets a structured shape: type, topic, evidence path, confidence score, and critically, a why field. Not "we chose Postgres." But "we chose Postgres because DuckDB's file-level locking was causing write conflicts in the concurrent ingestion pipeline, and the team doesn't have the bandwidth to serialize writes." That's what makes a decision retrievable.
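That schema can be sketched as a small dataclass. The field names follow the article; the concrete shape, the example values, and the JSON serialization are my assumptions, not the project's actual code.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class Memory:
    type: str            # "decision", "correction", "failure", ...
    topic: str           # a topic key, not a date: e.g. "night-shift-runner"
    evidence_path: str   # the file that backs this memory up
    confidence: float    # 0.0-1.0: how sure we are this still holds
    why: str             # the reasoning; the field that makes it retrievable

# Illustrative entry; wording paraphrases the Postgres example above.
m = Memory(
    type="decision",
    topic="database-choice",
    evidence_path="decisions/postgres-over-duckdb.md",
    confidence=0.9,
    why="DuckDB's file-level locking caused write conflicts in the concurrent ingestion pipeline",
)
print(json.dumps(asdict(m), indent=2))
```

Making why a required field is the point: a memory object without reasoning shouldn't be constructible at all.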
Track 2: Ingestion. The 6,817-line session log, compressed to 679 lines by collapsing old entries into topic summaries. All 22 correction files, indexed. A Failure Atlas: seven entries documenting real mistakes with root causes and prevention rules, formatted so the AI reads them before touching anything. Every session starts with the black box.
Track 3: Skills. Two retrieval skills for Claude Code. /brief with spaced-repetition scoring, because the most recently written thing isn't always the most important thing. /recall with simultaneous search across all ten memory stores.
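Here is one way the /brief scoring idea could look: exponential recency decay weighted by memory type, so a week-old failure can outrank yesterday's note. The half-life and type weights are invented for illustration; the article doesn't specify the actual formula.

```python
import math

# Invented weights: forgetting a failure costs more than forgetting a note.
TYPE_WEIGHT = {"failure": 3.0, "correction": 2.5, "decision": 2.0, "note": 1.0}

def brief_score(mem_type: str, days_old: float, half_life_days: float = 14.0) -> float:
    """Recency decays exponentially, scaled by how costly this type is to forget."""
    decay = math.exp(-math.log(2) * days_old / half_life_days)
    return TYPE_WEIGHT.get(mem_type, 1.0) * decay

# With these weights, a ten-day-old failure outranks yesterday's note.
items = [("note", 1), ("failure", 10), ("decision", 7)]
ranked = sorted(items, key=lambda t: brief_score(*t), reverse=True)
print(ranked)  # failure first, then decision, then note
```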
Track 4: Evaluation. Sixty questions across three difficulty tiers. Easy: "What's the SSH alias for the VPS?" Medium: "What projects use Remotion?" Hard: "Connect the decision to move the bot to M2 with the feedback about systemctl and the capability that FluxDiagram's animation engine could be borrowed."
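A golden-set entry and the Recall@5 metric from the baseline might look like this. The question texts come from the article; the memory IDs and retrieved lists are placeholders, since the real sixty questions and their answers aren't shown.

```python
# Hypothetical golden-set entries; IDs are illustrative placeholders.
golden_set = [
    {"tier": "easy",   "q": "What's the SSH alias for the VPS?",  "memory_id": "m-ssh"},
    {"tier": "medium", "q": "What projects use Remotion?",        "memory_id": "m-remotion"},
    {"tier": "hard",   "q": "Connect the M2 bot move to the systemctl feedback.",
     "memory_id": "m-m2-move"},
]

def recall_at_k(retrieved_ids, relevant_id, k=5):
    """Recall@5: did the relevant memory surface in the top k results?"""
    return 1.0 if relevant_id in retrieved_ids[:k] else 0.0

# Averaging over the set gives the headline number; here only the
# easy question's memory was retrieved, so the score is 1/3.
scores = [recall_at_k(["m-ssh", "m-vps", "m-logs"], q["memory_id"]) for q in golden_set]
print(sum(scores) / len(scores))
```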
Baseline result: Recall@5 = 91%, Accuracy = 86%.
I was proud of those numbers for about two hours. Then I switched the scoring method from word-overlap to an LLM judge, and the real numbers came back: Recall = 68.3%. Accuracy = 66.7%. Word-overlap was lying. The hard questions, the ones that actually test whether the system understands connections, scored 57%.
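To see how word-overlap scoring lies, here is a Jaccard-style overlap metric, my stand-in for the original scorer, which the article doesn't show. An answer with the opposite meaning still scores high because it reuses the same vocabulary.

```python
def word_overlap(expected: str, answer: str) -> float:
    """Jaccard similarity over word sets: shared words / total distinct words."""
    a, b = set(expected.lower().split()), set(answer.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

# Invented example answers: same vocabulary, opposite meaning.
expected = "we moved the bot to the M2 because systemctl restarts kept failing"
wrong    = "we moved the bot because the M2 systemctl restarts were reliable"

print(round(word_overlap(expected, wrong), 2))  # scores well above 0.5
```

An LLM judge reads for meaning instead of vocabulary, which is why the numbers dropped the moment one was used.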
Honesty hurts. But now I had a real baseline to improve against.
The Database
Phase 2 added the semantic layer. SQLite with FTS5 for keyword search, Gemini embeddings (3072 dimensions) for semantic rerank. Two-stage pipeline: keywords generate 50 candidates, embeddings rerank to the top 5 with confidence scores.
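A minimal sketch of that two-stage pipeline, assuming a SQLite build with FTS5 enabled (stock CPython usually ships one). The toy embed() is a bag-of-words stand-in for the real Gemini embedding call; in practice you'd substitute actual 3072-dimensional vectors.

```python
import sqlite3, math
from collections import Counter

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE mem USING fts5(topic, body)")
con.executemany("INSERT INTO mem VALUES (?, ?)", [
    ("database-choice", "chose postgres over duckdb due to write locking"),
    ("animation", "react animation fix in FluxDiagram via requestAnimationFrame"),
])

def embed(text):  # placeholder: bag-of-words vector, NOT a real embedding
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recall(query, k=5, candidates=50):
    # Stage 1: FTS5 generates keyword candidates.
    rows = con.execute(
        "SELECT topic, body FROM mem WHERE mem MATCH ? LIMIT ?", (query, candidates)
    ).fetchall()
    # Stage 2: semantic rerank, returning (confidence, topic) pairs.
    q = embed(query)
    scored = [(cosine(q, embed(body)), topic) for topic, body in rows]
    return sorted(scored, reverse=True)[:k]

print(recall("postgres locking"))
```

The shape is the interesting part: keyword search is cheap and broad, so it runs first; the expensive semantic comparison only touches the candidate pool.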
One constraint worth knowing if you build something similar: subagents in Claude Code can't write to ~/.claude/ paths. The indexer has to run from the main session. I burned an hour discovering this the hard way before switching to a staging pattern.
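The staging pattern, roughly: subagents write to a directory they can reach, and the main session promotes the files into the protected path. Directory names and function shapes here are my assumptions, not the system's actual layout.

```python
from pathlib import Path
import shutil

def stage(content: str, name: str, staging_dir: Path) -> Path:
    """Called from a subagent: write to a reachable staging dir, not ~/.claude/."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    out = staging_dir / name
    out.write_text(content)
    return out

def promote(staging_dir: Path, target_dir: Path) -> int:
    """Called from the main session, which CAN write the protected path."""
    target_dir.mkdir(parents=True, exist_ok=True)
    moved = 0
    for f in staging_dir.glob("*"):
        shutil.move(str(f), str(target_dir / f.name))
        moved += 1
    return moved
```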
Three Tiers of Knowing
Tier 1 is the black box. Every session starts with corrections, failures, recent decisions. Fixed overhead. Non-negotiable.
Tier 2 is on-demand search. When a question comes up mid-work, /recall searches all ten stores simultaneously. FTS for speed, embeddings for meaning.
Tier 3 is what changed the numbers most. Before touching any project's code, the AI reads that project's brain dump: tech stack, known issues, existing docs, current status. Not retrieved on demand. Pre-loaded as context.
Here's why Tier 3 matters more than the fancy embedding database: the DB indexes what's in files. But 1,034 documents across 11 projects weren't files the AI knew to look for. They existed. They weren't discoverable. Project-intel files are the table of contents. Without them, the library is useless no matter how good the search engine is.
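Tier 3 preloading can be as simple as reading one file per project before any work starts. The PROJECT_INTEL.md filename and the memory-dark marker below are illustrative, not the system's actual conventions.

```python
from pathlib import Path

def load_project_intel(project_dir: str) -> str:
    """Return the project's brain dump, or a loud marker if it's memory-dark."""
    intel = Path(project_dir) / "PROJECT_INTEL.md"
    if not intel.exists():
        return f"[memory-dark] {project_dir}: no intel file; this session starts from zero"
    return intel.read_text()

# Pre-loaded as context before any code is touched, not retrieved on demand.
context = load_project_intel("/tmp/nonexistent-demo-project")
print(context)
```

The "loud marker" branch matters as much as the happy path: a project that can't produce its intel file should announce itself as memory-dark instead of failing silently.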
The Real Insight
The deepest finding isn't technical.
Memory systems capture what CHANGED. They don't capture what EXISTS.
Session logs record this session's decisions. They don't describe the system the decisions were about. Feedback files record corrections. They don't describe the baseline. If the AI joins a project mid-stream, everything before the first logged correction is invisible.
That's why 9/10 projects were memory-dark. Not because no work had been done. A lot had. But the memory system only stored deltas. The current state of each project was never written down.
The fix is embarrassingly simple: a structured document describing what each project is, right now. Not its history. Its identity.
Think about the difference between a new team member who reads six months of meeting notes (temporal, exhausting) versus one who reads the product spec and architecture doc first (structural, fast). Both get there eventually. The second one gets there in a single session.
What I Got Wrong
The golden set was too easy. Most questions tested the main memory file, which gets loaded every session. Of course recall was 91%. The hard questions, the ones requiring cross-project synthesis, scored 57%. I was measuring the ceiling, not the floor.
The council was right about ingress quality. Some memory files were vague enough that even perfect retrieval wouldn't help. "We decided to use approach X" without saying why is a dead-end memory. It takes up index space and answers nothing.
The subagent limitation wasn't in any documentation I could find. Cost me an hour of "why isn't this writing?" before I figured out the sandboxing.
What This Means for Writing Identity
At MyWritingTwin, we work on the same problem in a different domain: preserving who you are across AI sessions.
The failure mode is identical. AI captures what you tell it this session. It doesn't carry forward your sentence patterns, your punctuation preferences, the phrases you'd never use. The WHY behind your writing style vanishes between sessions.
A Writing DNA Snapshot is the writing equivalent of a project-intel file. Not a log of corrections. A structural description of what your writing is: rhythm, vocabulary, formality, the anti-patterns that mark something as "not you." Loaded proactively, not retrieved on demand.
The Anamnesis project confirmed something we already believed: the gap between a useful AI and an exceptional one is almost never the model. It's the memory architecture. A model that remembers what matters about you, reliably, across sessions, with confidence scores attached, will outperform a better model that starts fresh every time.
Build Your Own
The architecture patterns are all open: the golden set evaluation approach, the council session format, the three-tier retrieval model, the hub-and-spoke memory schema. SQLite plus embeddings is genuinely fast and costs nothing at this scale.
The constraint isn't tooling. It's discipline. Capturing WHY alongside WHAT, every single time, even when you're moving fast and the decision feels obvious. The obvious ones are exactly the ones you'll forget the reasoning for.
Try It on Your Writing
Curious whether your AI assistant retains your writing identity across sessions? Paste the same email draft into Claude today and tomorrow. See if it gives you the same suggestions, or different ones. If different, the system isn't remembering you. It's improvising.
A Writing DNA Snapshot gives your AI a structural understanding of your style: the kind of persistent context that improves every session after it, not just this one.