Featured work
total-agent-memory totalmemory.dev
Persistent memory for Claude Code & Codex CLI — knowledge graph, multi-representation embeddings, 3D WebGL visualisation.
- Python
- SQLite
- FAISS
- BGE
- Ollama
- MCP
- Three.js
The problem
A coding agent without memory re-asks you the same question every session. Worse, it re-makes the same mistakes. Context windows are not memory — they’re a working set. Vector stores are not memory either — they’re Ctrl+F with embeddings.
I wanted something closer to how a senior engineer works: episodic recall (“we tried that last Tuesday, didn’t work”), semantic recall (“the company convention is to log in JSON”), decay, contradictions, and the right to say “I don’t know”.
Architecture
- Hot path: SQLite + FTS5 for sub-50 ms keyword recall.
- Warm path: BGE / FastEmbed embeddings indexed in FAISS for semantic similarity.
- Graph layer: entities + triples + cognitive activation spreading.
- Re-rank: optional CrossEncoder for higher precision.
- Diversity: MMR pass to prevent the agent from echo-chambering.
The 6-stage pipeline (FTS → BM25 → semantic → fuzzy → graph → MMR) hits R@5 = 97.45 % on LongMemEval default config, comfortably above the published baselines.
Implementation highlights
- Multi-representation storage — every memory is stored as raw text, normalised text, BM25-tokenised text, embedding, and graph node. The right representation is picked per query type.
- Self-extracting knowledge graph — entities and triples are extracted automatically with a small local LLM (Ollama-optional) and merged via a fact-merger that handles contradictions.
- Time travel — every change is checkpointed, so you can
kg at <timestamp>to see what the agent believed last Thursday. - 3D WebGL inspector — the entire decision graph is browsable in a Three.js front-end, drag nodes, replay episodes, see attention spread.
Results
Production-ready, MIT-licensed, MCP-native. Runs on a Raspberry Pi 4 if you turn off the embedding stack. Drop it into any Claude Code, Codex, Cursor, or Cline setup via MCP and the agent stops re-asking you basic questions.
Lessons
- Recall is a stack, not a service. The right answer to “what’s the best memory for AI agents?” is “all of them, fused”.
- Forgetting is a feature. Without decay, the store becomes a haunted attic in three weeks.
- A small, deliberate memory beats a giant index. R@5 gains came from pruning, not from adding more dimensions.