Featured work

total-agent-memory totalmemory.dev

Persistent memory for Claude Code & Codex CLI — knowledge graph, multi-representation embeddings, 3D WebGL visualisation.

Python
SQLite
FAISS
BGE
Ollama
MCP
Three.js


The problem

A coding agent without memory re-asks you the same question every session. Worse, it re-makes the same mistakes. Context windows are not memory — they’re a working set. Vector stores are not memory either — they’re Ctrl+F with embeddings.

I wanted something closer to how a senior engineer works: episodic recall (“we tried that last Tuesday, didn’t work”), semantic recall (“the company convention is to log in JSON”), decay, contradictions, and the right to say “I don’t know”.

Architecture

Hot path: SQLite + FTS5 for sub-50 ms keyword recall.
Warm path: BGE / FastEmbed embeddings indexed in FAISS for semantic similarity.
Graph layer: entities + triples + spreading activation across related nodes.
Re-rank: optional CrossEncoder for higher precision.
Diversity: MMR pass to prevent the agent from echo-chambering.
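
To make the diversity step concrete, here is a minimal sketch of Maximal Marginal Relevance in plain Python. It is an illustration under assumptions, not the project's actual code: the function names, the `lam` trade-off parameter, and the toy 2-D vectors are all hypothetical, and a real deployment would work on the FAISS/BGE embeddings instead.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec, candidates, k=3, lam=0.7):
    """Pick up to k memory ids, trading relevance against redundancy.

    candidates: list of (memory_id, vector) pairs (hypothetical shape).
    lam: 1.0 means pure relevance, lower values penalise near-duplicates.
    """
    vectors = dict(candidates)
    relevance = {mid: cosine(query_vec, vec) for mid, vec in vectors.items()}
    selected = []
    remaining = list(vectors)
    while remaining and len(selected) < k:
        def score(mid):
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max(
                (cosine(vectors[mid], vectors[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[mid] - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "a" and "b" are near-duplicates, "c" is less relevant but different.
query = [1.0, 0.0]
memories = [("a", [0.9, 0.1]), ("b", [0.9, 0.12]), ("c", [0.7, -0.7])]
diverse = mmr(query, memories, k=2, lam=0.5)    # skips the near-duplicate "b"
greedy = mmr(query, memories, k=2, lam=1.0)     # pure relevance keeps "a" and "b"
```

With `lam=1.0` the pass degenerates into plain top-k, which is exactly the echo chamber the diversity step is meant to avoid.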

The 6-stage pipeline (FTS → BM25 → semantic → fuzzy → graph → MMR) reaches R@5 = 97.45% on LongMemEval with the default configuration, above the published baselines.
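
For readers unfamiliar with the metric, the sketch below shows one common way to compute recall at k: the share of queries for which at least one gold memory lands in the top k results. Whether this matches LongMemEval's exact scoring rules is an assumption, and the function names and the `runs` data shape are hypothetical.

```python
def hit_at_k(ranked_ids, gold_ids, k=5):
    """True if any gold id appears among the first k ranked results."""
    return any(mid in gold_ids for mid in ranked_ids[:k])


def recall_at_k(runs, k=5):
    """Fraction of queries with at least one gold id in the top k.

    runs: list of (ranked_ids, gold_ids) pairs, one pair per query.
    """
    if not runs:
        return 0.0
    hits = sum(hit_at_k(ranked, gold, k) for ranked, gold in runs)
    return hits / len(runs)


# Two toy queries: the first finds its gold memory at rank 3,
# the second only at rank 6, so it counts as a miss at k=5.
runs = [
    (["m1", "m2", "m3"], {"m3"}),
    (["m4", "m5", "m6", "m7", "m8", "m9"], {"m9"}),
]
score_at_5 = recall_at_k(runs, k=5)
score_at_6 = recall_at_k(runs, k=6)
```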

Implementation highlights

Multi-representation storage: every memory is stored as raw text, normalised text, BM25-tokenised text, embedding, and graph node. The right representation is picked per query type.
Self-extracting knowledge graph: entities and triples are extracted automatically with a small local LLM (Ollama, optional) and merged via a fact-merger that handles contradictions.
Time travel: every change is checkpointed, so you can query the graph as of a past moment (kg at <timestamp>) and see what the agent believed last Thursday.
3D WebGL inspector: the entire decision graph is browsable in a Three.js front-end, where you can drag nodes, replay episodes, and watch attention spread.
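
One way to picture the time-travel idea is a table of triples with validity intervals, where a new fact closes the previous version instead of overwriting it. The sketch below uses only the standard-library `sqlite3` module. The schema, the helper names (`open_store`, `assert_fact`, `kg_at`), and the "one current value per subject and predicate" rule are assumptions made for illustration, not the project's real storage layout or API.

```python
import sqlite3


def open_store():
    """Create an in-memory store of versioned triples (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE triples (
               subject    TEXT NOT NULL,
               predicate  TEXT NOT NULL,
               object     TEXT NOT NULL,
               valid_from TEXT NOT NULL,  -- ISO-8601 timestamp
               valid_to   TEXT            -- NULL while the fact is current
           )"""
    )
    return db


def assert_fact(db, subject, predicate, obj, at):
    """Record a fact at time `at`, superseding any current contradicting one."""
    db.execute(
        "UPDATE triples SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND valid_to IS NULL",
        (at, subject, predicate),
    )
    db.execute(
        "INSERT INTO triples VALUES (?, ?, ?, ?, NULL)",
        (subject, predicate, obj, at),
    )


def kg_at(db, at):
    """Return the triples that were believed at time `at`."""
    rows = db.execute(
        "SELECT subject, predicate, object FROM triples "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY subject, predicate",
        (at, at),
    )
    return rows.fetchall()


db = open_store()
assert_fact(db, "service", "logs_in", "plaintext", "2024-05-01T09:00:00")
assert_fact(db, "service", "logs_in", "JSON", "2024-05-08T09:00:00")
before = kg_at(db, "2024-05-02T00:00:00")  # the older belief
after = kg_at(db, "2024-05-09T00:00:00")   # the corrected belief
```

Timestamps are compared as strings here, which only works because they share one fixed ISO-8601 format. Nothing is ever deleted, so an old belief stays queryable after it has been contradicted.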

Results

Production-ready, MIT-licensed, MCP-native. Runs on a Raspberry Pi 4 if you turn off the embedding stack. Drop it into any Claude Code, Codex, Cursor, or Cline setup via MCP and the agent stops re-asking you basic questions.

Lessons

Recall is a stack, not a service. The right answer to “what’s the best memory for AI agents?” is “all of them, fused”.
Forgetting is a feature. Without decay, the store becomes a haunted attic in three weeks.
A small, deliberate memory beats a giant index. R@5 gains came from pruning, not from adding more dimensions.
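
As a rough illustration of decay plus pruning, the sketch below halves a memory's score every fixed number of days since it was last used and drops anything that falls under a threshold. The 14-day half-life, the 0.1 threshold, and the field names are hypothetical values chosen for the example, not the project's tuned settings.

```python
def decayed_score(importance, age_days, half_life_days=14.0):
    """Exponential decay: the score halves every `half_life_days`."""
    return importance * 0.5 ** (age_days / half_life_days)


def prune(memories, today, threshold=0.1, half_life_days=14.0):
    """Keep only memories whose decayed score is still at or above threshold.

    memories: list of dicts with 'id', 'importance', and 'last_used_day'
    (a day counter; the shape is an assumption for this sketch).
    """
    kept = []
    for memory in memories:
        age = today - memory["last_used_day"]
        if decayed_score(memory["importance"], age, half_life_days) >= threshold:
            kept.append(memory)
    return kept


store = [
    {"id": "a", "importance": 1.0, "last_used_day": 0},  # important, fades slowly
    {"id": "b", "importance": 0.2, "last_used_day": 0},  # minor, falls below 0.1
    {"id": "c", "importance": 0.2, "last_used_day": 20},  # minor but recently used
]
survivors = [m["id"] for m in prune(store, today=21)]
```

Touching a memory resets its age, so facts the agent keeps using survive while one-off trivia fades out.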