Featured work

total-agent-memory totalmemory.dev

Persistent memory for Claude Code & Codex CLI — knowledge graph, multi-representation embeddings, 3D WebGL visualisation.

  • Python
  • SQLite
  • FAISS
  • BGE
  • Ollama
  • MCP
  • Three.js

The problem

A coding agent without memory re-asks you the same question every session. Worse, it re-makes the same mistakes. Context windows are not memory — they’re a working set. Vector stores are not memory either — they’re Ctrl+F with embeddings.

I wanted something closer to how a senior engineer works: episodic recall (“we tried that last Tuesday, didn’t work”), semantic recall (“the company convention is to log in JSON”), decay, contradictions, and the right to say “I don’t know”.

Architecture

  • Hot path: SQLite + FTS5 for sub-50 ms keyword recall.
  • Warm path: BGE / FastEmbed embeddings indexed in FAISS for semantic similarity.
  • Graph layer: entities + triples + spreading activation over the graph.
  • Re-rank: optional CrossEncoder for higher precision.
  • Diversity: MMR pass to prevent the agent from echo-chambering.
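
As a rough illustration of the graph layer, activation spreading can be sketched in a few lines of plain Python. This is a minimal sketch, not the project's code: the function name spread_activation, the decay and steps parameters, and the example triples are all assumptions made for illustration.

```python
from collections import defaultdict

def spread_activation(edges, seeds, decay=0.5, steps=2):
    """Propagate activation from seed entities along graph edges.

    edges: iterable of (subject, predicate, object) triples, treated as undirected.
    seeds: dict mapping entity -> initial activation.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    neighbours = defaultdict(set)
    for subj, _pred, obj in edges:
        neighbours[subj].add(obj)
        neighbours[obj].add(subj)

    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            if not neighbours[node]:
                continue
            # Each hop passes on a decayed share, split evenly across neighbours.
            share = energy * decay / len(neighbours[node])
            for other in neighbours[node]:
                next_frontier[other] += share
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation

# Hypothetical triples: the query mentions "auth-service" only.
triples = [
    ("auth-service", "logs_in", "JSON"),
    ("auth-service", "owned_by", "platform-team"),
    ("platform-team", "uses", "PostgreSQL"),
]
act = spread_activation(triples, {"auth-service": 1.0})
ranked = sorted(act, key=act.get, reverse=True)
```

Entities one hop from the seed end up more active than entities two hops away, so "PostgreSQL" is recalled, but with less weight than "JSON" or "platform-team".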

The 6-stage pipeline (FTS → BM25 → semantic → fuzzy → graph → MMR) reaches recall@5 (R@5) of 97.45 % on LongMemEval with the default config, comfortably above the published baselines.
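
The last stage, MMR (maximal marginal relevance), is easy to show in isolation. The sketch below is a generic textbook version, assuming cosine similarity over small hand-written vectors; the function names, the lam trade-off parameter, and the memory ids are hypothetical and not taken from the project.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, candidates, k=3, lam=0.7):
    """Greedy MMR. candidates: list of (id, vector). Returns ids in pick order.

    lam=1.0 ranks by relevance only; lower values penalise candidates
    that resemble something already selected.
    """
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        def score(item):
            _id, vec = item
            relevance = cosine(query_vec, vec)
            redundancy = max(
                (cosine(vec, s_vec) for _sid, s_vec in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [item_id for item_id, _vec in selected]

query = [1.0, 1.0, 0.0]
memories = [
    ("m1", [1.0, 0.10, 0.0]),  # near-duplicate of m2
    ("m2", [1.0, 0.12, 0.0]),
    ("m3", [0.0, 1.00, 0.0]),  # less relevant, but says something different
]
diverse = mmr(query, memories, k=2, lam=0.5)
relevance_only = mmr(query, memories, k=2, lam=1.0)
```

With lam=0.5 the second slot goes to m3 rather than the near-duplicate m1; with lam=1.0 the two near-duplicates fill both slots, which is the echo-chamber behaviour the diversity pass is meant to prevent.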

Implementation highlights

  • Multi-representation storage: every memory is stored as raw text, normalised text, BM25-tokenised text, an embedding, and a graph node. The right representation is picked per query type.
  • Self-extracting knowledge graph: entities and triples are extracted automatically by a small local LLM (Ollama is optional) and merged by a fact-merger that handles contradictions.
  • Time travel: every change is checkpointed, so you can query the knowledge graph as of a past moment (kg at <timestamp>) and see what the agent believed last Thursday.
  • 3D WebGL inspector: the entire decision graph is browsable in a Three.js front-end, where you can drag nodes, replay episodes, and see attention spread.
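
One way to picture the time-travel feature is a triple table with validity intervals in SQLite. This is a sketch under stated assumptions, not the project's schema: the table layout, the assert_fact and kg_at helpers, and the "newest fact wins" merge rule are all hypothetical stand-ins for the real checkpointing and fact-merger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE triples (
        subject TEXT, predicate TEXT, object TEXT,
        valid_from TEXT NOT NULL,   -- ISO-8601 timestamp when the fact was recorded
        valid_to   TEXT             -- NULL while the fact is still current
    )
""")

def assert_fact(subject, predicate, obj, at):
    """Record a fact; an older value for the same subject/predicate is closed.

    Assumed merge rule: the newest fact supersedes the contradicting one.
    """
    conn.execute(
        "UPDATE triples SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND valid_to IS NULL",
        (at, subject, predicate),
    )
    conn.execute(
        "INSERT INTO triples VALUES (?, ?, ?, ?, NULL)",
        (subject, predicate, obj, at),
    )

def kg_at(timestamp):
    """Return the triples that were believed at the given ISO-8601 timestamp."""
    rows = conn.execute(
        "SELECT subject, predicate, object FROM triples "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?) "
        "ORDER BY subject, predicate",
        (timestamp, timestamp),
    )
    return rows.fetchall()

assert_fact("api", "log_format", "plain text", "2025-01-06T09:00:00")
assert_fact("api", "log_format", "JSON", "2025-01-10T14:30:00")

before = kg_at("2025-01-09T00:00:00")
after = kg_at("2025-01-11T00:00:00")
```

Nothing is deleted when a fact is contradicted; the old row just gets an end timestamp, so the earlier belief stays queryable. ISO-8601 strings in a single fixed format sort chronologically, which is why plain text comparison is enough here.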

Results

Production-ready, MIT-licensed, MCP-native. Runs on a Raspberry Pi 4 if you turn off the embedding stack. Drop it into any Claude Code, Codex, Cursor, or Cline setup via MCP and the agent stops re-asking you basic questions.

Lessons

  • Recall is a stack, not a service. The right answer to “what’s the best memory for AI agents?” is “all of them, fused”.
  • Forgetting is a feature. Without decay, the store becomes a haunted attic in three weeks.
  • A small, deliberate memory beats a giant index. R@5 gains came from pruning, not from adding more dimensions.
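
The decay and pruning lessons can be made concrete with an exponential half-life. The sketch below is illustrative only: the 14-day half-life, the 0.2 threshold, the field names, and the example memories are assumptions, not values from the project.

```python
import math

def decayed_score(base_score, age_days, half_life_days=14.0):
    """Halve a memory's score every half_life_days since it was last used."""
    return base_score * math.pow(0.5, age_days / half_life_days)

def prune(memories, now_day, threshold=0.2, half_life_days=14.0):
    """Keep only memories whose decayed score is still at or above threshold.

    memories: list of dicts with 'text', 'score', and 'last_used_day' keys
    (hypothetical field names).
    """
    kept = []
    for memory in memories:
        age = now_day - memory["last_used_day"]
        if decayed_score(memory["score"], age, half_life_days) >= threshold:
            kept.append(memory)
    return kept

store = [
    {"text": "logs are written as JSON", "score": 1.0, "last_used_day": 41},
    {"text": "tried a Redis cache, reverted it", "score": 0.8, "last_used_day": 0},
]
kept = prune(store, now_day=42)
```

The recently used convention survives, while the six-week-old note decays to 0.1 and is dropped. Reusing a memory would reset last_used_day, so facts that keep proving useful never age out.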