
Featured work

AISWARM

in progress (testing)

Distributed cognitive workforce runtime — Claude, Codex, DeepSeek and local Llama working side-by-side over one codebase, with a2abridge messaging and BrainCore memory underneath.

  • Go 1.25
  • tmux
  • git worktree
  • SQLite (modernc.org)
  • MCP
  • A2A 1.0
  • BrainCore
  • Bubble Tea
  • REST API
  • sandbox-exec / bwrap

The problem

A single AI coding agent — no matter how capable — bottlenecks on its own context, its own provider’s outages and its own pricing curve. The honest production pattern is a swarm: several specialised agents working over the same codebase, each picked for a job it does best, with a planner deciding who gets what and a verifier deciding when to escalate.

AISWARM is what fell out of running that pattern for six months and getting tired of duct-tape. It only makes sense as a triad: AISWARM as the runtime, a2abridge as the open A2A 1.0 messaging mesh, and BrainCore as the cognitive memory plane. Each ships on its own; together they form a distributed cognitive workforce.

Architecture

  • One pure-Go binary, four providers. Claude Code, Codex, DeepSeek and a local Llama (LM Studio / llama.cpp) run in parallel, each in its own tmux session and its own git worktree, each driven by a uniform RPC contract.
  • KindPlanner with 12 typed agent roles. Plans are expanded into roles (planner, scaffolder, refactorer, tester, reviewer, doc-writer, …); a role-aware auto_choose routes tasks 30/30/20/20 across providers (high-stakes → Claude, scaffolding → Llama, tests/docs → Codex, low/medium-stakes glue → DeepSeek). Plans can be re-expanded mid-run via the plan_expand MCP tool.
  • Anthropic-format CLI shim. A small adapter normalises any OpenAI-compatible backend (DeepSeek, Llama via LM Studio) to Anthropic’s request/response shape, so the rest of the system only speaks one protocol.
  • 3-tier verifier. Every change is gated by a composite verifier: tier 1 runs the project’s own command (e.g. go test ./...), tier 2 replays the change against the integration branch to catch regressions, tier 3 asks an LLM judge for a semantic sign-off. Failures are re-queued with structured feedback attached.
  • Peer-helper consult (ASK_PEER). A stuck worker can emit a marker, and the dispatcher spawns a cross-model helper in a new tmux window. The helper is read-only and one-shot, each task is hard-capped at 3 consultations, and the answer comes back through a reply file that is polled. The original worker keeps its session; the helper writes its answer and exits.
  • Failover chain + soft pause/resume. If a provider returns a 5xx, hits a rate limit or times out, the planner shifts the task down a typed chain. Cancelling a task does not kill its dependants: the cascade pauses and resumes cleanly when the parent is fixed.
  • Subtask decomposition over MCP. Big tasks are sliced into atomic units the workers can actually finish in one turn.
  • Cost ledger. Tokens, dollars and latency are tracked per provider, per task; the dashboard tells you which provider is paying for itself.

Implementation highlights

  • 33 internal packages, 8 binaries — Phase 6 closed. The runtime is past the “moves under load” line: planner, dispatcher, runner, verifier, merger, watchdog, quota, state, mcp, a2abridge client, braincore client, language autodetect, notify, consult, tui, config — all green, lint silent.
  • a2abridge underneath. Workers and the orchestrator talk over the open A2A 1.0 protocol — not bespoke pipes — so any external A2A-speaking agent plugs in for free.
  • BrainCore (or any MCP memory) as a side-channel. AISWARM deliberately does not embed its own cognitive memory engine. Memory is a side-channel over MCP: point it at your personal total-agent-memory server or at BrainCore — both work.
  • Hardened workers. Each worker runs inside sandbox-exec (macOS) or bwrap (Linux), in its own git worktree with .a2a/ and .aiswarm/ excluded per-worktree so coordination files never end up in a commit.
  • Detach-by-default daemon. aiswarm run forks into the background after planning, prints the PID and dashboard URL, and returns control to the shell. A REST API (/api/projects, /api/projects/:id/swarms) and a signal(0) liveness probe let the WebUI respawn a dead daemon from the ▶ button.
  • Live dashboard, two surfaces. A Bubble Tea TUI for the terminal and a WebUI with A2A and Memory tabs in the browser — both read from the same atomic flock-protected JSON state store.
  • Pure-Go, zero CGO. SQLite via modernc.org/sqlite in ~/.aiswarm/aiswarm.db. Single static binary; no Docker required for the orchestrator itself.

Current status

Phase 6 is closed: 33 packages green, lint silent, smoke 2/2 merged, 8 binaries shipping. The repository is still private while the wire protocol between planner, verifier and workers stabilises, and while the cost-routing heuristics are tuned against real-world workloads. Once the orchestration layer reaches the same maturity as a2abridge and BrainCore, AISWARM will be open-sourced under the same MIT licence as its two peers.

Lessons so far

  • Swarms don’t fail like a single agent. A single agent gets stuck; a swarm cascades. Half the design budget went into stopping cascades — peer-helper consult, soft pause/resume and the 3-tier verifier all exist for the same reason.
  • Picking the right worker is worth more than picking the right model. A 30 % cheaper provider that is right 95 % of the time in its own lane is a better deal than the top model on everything, which is why auto_choose is role-aware, not just model-aware.
  • A verifier is not optional, and one tier isn’t enough. “Trust the model’s output” is the failure mode that justifies a verifier the next day. “Trust the unit tests” is the failure mode that justifies tier 2 and tier 3 the week after.
  • Memory belongs outside the runtime. Embedding a memory engine inside the orchestrator was the obvious mistake. Treating memory as a side-channel over MCP let BrainCore and total-agent-memory evolve independently — and let AISWARM ship without becoming a research project.