Services
What I do for clients.
Backend, AI integrations, RAG systems, and the unglamorous work of making AI-generated code actually shippable. Independent contractor, async-first, US / AU / EU clients.
-
Websites & landing pages
Fast, multilingual, SEO-ready — Lighthouse 100/100/100/100.
Marketing sites, product landings, multilingual portfolios. Built with Astro / Next / Vue + Tailwind, deployed via Docker + Traefik, with full structured data, sitemaps, RSS, OG images and i18n. Live examples: this site (vbcherepanov.com) and totalmemory.dev.
What you get
- Astro / Next.js / Vue + Tailwind — chosen by content shape, not by hype.
- JSON-LD (WebSite / Person / Article / Service), sitemap per content type, hreflang, OG.
- Lighthouse 100 / Core Web Vitals green / WCAG AA / cookie banner that respects consent.
- Multilingual (en/ru/sr or any other set of languages) with content collections and proper canonical URLs.
- CI/CD: GitLab or GitHub Actions, Docker multi-stage, Traefik / nginx, zero-downtime deploy.
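To make the structured-data and hreflang items above concrete, here is a minimal sketch of how a build step might emit a JSON-LD tag and the alternate-language links. The function names, the `/<locale>/<path>` URL layout and the placeholder person are assumptions for illustration, not the code behind this site.

```python
import json


def jsonld_script(data: dict) -> str:
    """Serialize a schema.org object into a JSON-LD <script> tag."""
    payload = json.dumps(data, ensure_ascii=False)
    # Escape "</" so a string value can never close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


def hreflang_links(base_url: str, path: str, locales: list[str], default: str) -> list[str]:
    """One alternate link per locale, plus x-default pointing at the default locale.

    Assumes a hypothetical /<locale>/<path> URL layout.
    """
    links = [
        f'<link rel="alternate" hreflang="{loc}" href="{base_url}/{loc}{path}">'
        for loc in locales
    ]
    links.append(
        f'<link rel="alternate" hreflang="x-default" href="{base_url}/{default}{path}">'
    )
    return links


person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Example Person",  # placeholder, not real site data
    "url": "https://example.com",
}
print(jsonld_script(person))
for link in hreflang_links("https://example.com", "/services/", ["en", "ru", "sr"], "en"):
    print(link)
```

Every locale page should emit the same full set of links, including one that points at itself, so the alternates stay reciprocal.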
Stack
- Astro
- Next.js
- Vue / Nuxt
- Tailwind
- Docker
- Traefik
- GitLab CI
-
Backend services & APIs
Go and PHP/Symfony backends that survive production.
REST + gRPC services, OAuth2/OIDC identity, message-driven cores, PostgreSQL/Redis/RabbitMQ. Clean architecture (handler ≤15 lines → service → repo), domain errors, structured logs, metrics + tracing wired in from day one — not patched after the first incident.
What you get
- Go 1.25+ or PHP 8.4 / Symfony 8.0 — picked per team and per workload.
- PostgreSQL 18 schema design, migrations, indexing strategy, keyset pagination.
- Event-driven with RabbitMQ / NATS / Kafka, idempotency keys, outbox pattern.
- OpenAPI / gRPC contracts, code-gen, contract tests, no `any`/`mixed` for business data.
- Observability: Prometheus metrics + structured slog/Monolog + OpenTelemetry traces.
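Keyset pagination, mentioned in the list above, can be sketched as follows. The example uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs anywhere; the `articles` table, its columns and the helper names are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with duplicate timestamps, so the id tie-breaker matters."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, created_at TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [
            (1, "2024-01-02", "c"),
            (2, "2024-01-01", "a"),
            (3, "2024-01-01", "b"),
            (4, "2024-01-03", "e"),
            (5, "2024-01-02", "d"),
        ],
    )
    return conn


def fetch_page(conn, limit, after=None):
    """Return (rows, next_cursor). The cursor is the (created_at, id) of the last row.

    Unlike OFFSET, the WHERE clause seeks straight past the previous page, so
    cost does not grow with page depth (given an index on (created_at, id)).
    """
    if after is None:
        rows = conn.execute(
            "SELECT id, created_at, title FROM articles "
            "ORDER BY created_at, id LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        created_at, last_id = after
        rows = conn.execute(
            "SELECT id, created_at, title FROM articles "
            "WHERE created_at > ? OR (created_at = ? AND id > ?) "
            "ORDER BY created_at, id LIMIT ?",
            (created_at, created_at, last_id, limit),
        ).fetchall()
    # A short page means there is nothing left to fetch.
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor


def all_ids(conn, limit):
    """Walk every page and collect ids in retrieval order."""
    ids, cursor = [], None
    while True:
        rows, cursor = fetch_page(conn, limit, cursor)
        ids.extend(row[0] for row in rows)
        if cursor is None:
            return ids
```

In PostgreSQL the same predicate is often written as a row comparison, `(created_at, id) > ($1, $2)`, which can use a composite index directly.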
Stack
- Go
- PHP / Symfony
- PostgreSQL
- RabbitMQ
- gRPC
- OpenAPI
- OAuth2/OIDC
-
AI integration
Plug LLMs into your product — without the demo-grade brittleness.
OpenAI, Anthropic, DeepSeek and local models (Ollama, llama.cpp, LM Studio) wired into your backend with proper structured output, tool/function-calling, streaming, retries, cost-aware routing and full observability. Multi-provider out of the box — no vendor lock-in.
What you get
- Provider abstraction (Anthropic / OpenAI / DeepSeek / local) with deterministic fallbacks.
- Tool calling, structured JSON output (zod / pydantic), streaming SSE / WebSocket.
- Token + cost budgets per request, per user, per feature — metered in Prometheus.
- Prompt versioning, eval harness, golden-set regression tests for every prompt change.
- Safety: PII redaction, prompt-injection hardening, content filters, audit log per call.
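The provider abstraction with deterministic fallbacks and per-request budgets listed above might look roughly like this. It is a sketch under stated assumptions: the `Provider` shape, the flat per-call cost and the stub providers are hypothetical, and no real vendor SDK is called.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider call that failed in a retryable way."""


class BudgetExceeded(Exception):
    """Raised when the next attempt would exceed the request budget."""


@dataclass(frozen=True)
class Provider:
    name: str
    complete: Callable[[str], str]  # hypothetical: prompt in, text out
    cost_microusd: int              # hypothetical flat cost per attempt


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Provider],
    budget_microusd: int,
    attempts_per_provider: int = 2,
):
    """Try providers in a fixed order; return (provider_name, text, spent)."""
    spent = 0
    for provider in providers:
        for _ in range(attempts_per_provider):
            if spent + provider.cost_microusd > budget_microusd:
                raise BudgetExceeded(f"budget of {budget_microusd} would be exceeded")
            # Failed attempts are charged too: a conservative assumption.
            spent += provider.cost_microusd
            try:
                return provider.name, provider.complete(prompt), spent
            except ProviderError:
                continue
    raise ProviderError("all providers failed")


# Stub providers for illustration only.
def always_down(prompt: str) -> str:
    raise ProviderError("upstream unavailable")


def echo(prompt: str) -> str:
    return f"echo: {prompt}"


flaky = Provider("flaky", always_down, cost_microusd=2)
stable = Provider("stable", echo, cost_microusd=1)
```

Because the order is fixed and retries are bounded, the same failure pattern always produces the same route, which keeps incidents reproducible. The `spent` value is what would feed a cost metric.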
Stack
- Anthropic
- OpenAI
- Ollama
- MCP
- Function calling
- Structured output
-
RAG systems
Retrieval that actually retrieves — same recipe as total-agent-memory (R@5 = 97 %).
Production RAG pipelines that beat naïve cosine-similarity baselines: hybrid 6-tier retrieval (FTS5/BM25 + embeddings + fuzzy + graph + cross-encoder + MMR), document chunking that respects semantics, evaluation against LongMemEval-style sets. Same architecture that ships in total-agent-memory at 97 % R@5.
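Of the tiers named above, MMR (maximal marginal relevance) is the easiest to show in isolation. The following is a generic textbook-style sketch in pure Python, not the total-agent-memory implementation; the vectors, document ids and the `lam` trade-off parameter are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec, doc_vecs, k, lam=0.7):
    """Greedy MMR: pick k doc ids, trading relevance against redundancy.

    lam=1.0 is pure relevance ranking; lower values penalize candidates
    that are similar to documents already selected.
    """
    selected = []
    candidates = list(doc_vecs)

    def score(doc_id):
        relevance = cosine(query_vec, doc_vecs[doc_id])
        redundancy = max(
            (cosine(doc_vecs[doc_id], doc_vecs[s]) for s in selected),
            default=0.0,
        )
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


docs = {
    "a": [1.0, 0.10],
    "b": [1.0, 0.12],   # near-duplicate of "a"
    "c": [1.0, -0.80],  # less relevant, but different
}
print(mmr([1.0, 0.0], docs, k=2, lam=1.0))
print(mmr([1.0, 0.0], docs, k=2, lam=0.5))
```

With `lam=1.0` the near-duplicate takes the second slot; with `lam=0.5` the more distinct document replaces it.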
What you get
- Vector stores: pgvector (Postgres-native), Qdrant, FAISS — picked per scale & ops model.
- Hybrid retrieval: BM25 + dense + sparse + reranker (bge-reranker-v2-m3 / Cohere / cross-encoder).
- Ingestion pipeline: parsers per format, semantic chunking, embedding reuse, deduplication.
- Evaluation: LongMemEval / LoCoMo / your-own golden set, R@K + nDCG tracked over time.
- Knowledge graph layer on top: entities, relations, temporal facts (Allen's interval algebra).
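For the evaluation item above, R@K can be tracked with a few lines. This sketch assumes one common definition (the fraction of a query's relevant documents that appear in the top K, averaged over queries). Benchmarks such as LongMemEval may define it differently, so treat the definition and the function names as assumptions.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant ids that appear in the first k retrieved ids."""
    if not relevant:
        raise ValueError("each query needs at least one relevant document")
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved, relevant) pairs from a golden set."""
    if not runs:
        raise ValueError("golden set is empty")
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Hypothetical golden set: (ranked retrieval output, ids judged relevant).
golden_runs = [
    (["d1", "d2", "d3"], {"d1"}),
    (["d9", "d4", "d5"], {"d4", "d7"}),
]
print(mean_recall_at_k(golden_runs, k=2))
```

Running this on every retrieval change and storing the result per commit is what turns a single score into a trend line.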
Stack
- pgvector
- Qdrant
- BGE / Cohere
- Hybrid retrieval
- Eval harness
- MCP
-
Clean up AI-generated codebases
Refactor vibe-coded sprawl into something you can ship and maintain.
You let an agent generate half a project, and now it half-works, half-compiles and is half-tested. I do a forensic pass: kill the half-done stubs, replace hardcoded values with config, separate domain from infra, add real tests instead of `// TODO: test`, and bring the architecture back to something a human can extend.
What you get
- Inventory: TODO/FIXME/XXX/HACK/NotImplemented/stub/panic("todo") — kill or implement.
- Hardcoded URLs / IPs / secrets / magic numbers → env, config, named constants.
- Demo / mock data ripped out of prod paths, moved to fixtures / seeds / factories.
- Architecture re-aligned to clean layers: handler ≤15 lines → service → repo, typed DTOs.
- Real tests (unit + integration + golden + regression) — not test files full of `assert true`.
- Security pass: SQL/SSRF/XSS, exposed admin routes, weak auth, secrets in repo history.
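The inventory step in the list above can start from something as simple as a marker scan. This is a minimal sketch: the marker set mirrors the list, while the function names and the default file suffixes are assumptions. A real pass would also look for empty function bodies and mock data.

```python
import re
from pathlib import Path

# Markers from the inventory list; NotImplementedError is matched as a variant.
MARKER_RE = re.compile(
    r'\b(?:TODO|FIXME|XXX|HACK|NotImplemented(?:Error)?)\b|panic\("todo"\)'
)


def scan_text(text: str):
    """Return (line_number, marker) pairs in order of appearance."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in MARKER_RE.finditer(line):
            findings.append((line_no, match.group(0)))
    return findings


def scan_tree(root, suffixes=(".py", ".go", ".php", ".ts")):
    """Map each file under root that has findings to its list of findings."""
    report = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            hits = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
            if hits:
                report[str(path)] = hits
    return report


sample = (
    "def pay(): raise NotImplementedError  # TODO: charge card\n"
    "x = 1\n"
    'panic("todo")\n'
)
print(scan_text(sample))
```

The output is a worklist: every entry ends up either implemented or deleted, and the scan can then run in CI to keep new stubs out.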
Stack
- Refactoring
- Tests
- Security audit
- Architecture review
- CI/CD
-
Tune AI agents to write production-grade code
Claude Code / Cursor / Codex / Cline — wired with memory, hooks, MCP and a feedback loop that catches the bad output before you do.
Most teams use coding agents at 10 % of their capability — no memory, no hooks, no project rules, no verification loop. I set up the same stack that runs on my own machine: total-agent-memory for cross-session knowledge, a2abridge for multi-agent coordination, CLAUDE.md / .cursorrules with real architecture rules, hooks that gate edits, and a feedback loop (tests/build/lint) that runs after every change.
What you get
- MCP servers wired in: total-agent-memory (persistent knowledge), filesystem, A2A bridge.
- CLAUDE.md / .cursorrules / AGENTS.md with real architecture, code-quality and git rules.
- Hooks: pre-edit guards, post-edit lint/test, memory_save reminders, no-stub enforcement.
- Multi-agent setup: Claude + Codex / DeepSeek / local Llama via AISWARM-style orchestration.
- Feedback loop: tests + build + lint + grep on every change; agent never reports "DONE" on red.
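The "never DONE on red" rule from the feedback-loop item above can be sketched as a small gate that runs each check as a subprocess. The check names and commands are placeholders to be replaced with a project's own test, build and lint commands; this is not the hook code from any particular agent tool.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    ok: bool
    output: str


def run_checks(checks, cwd=None):
    """Run every command in `checks` (name -> argv list) and record the outcome.

    All checks run even after a failure, so the agent sees the full picture.
    """
    results = []
    for name, argv in checks.items():
        proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
        output = (proc.stdout + proc.stderr).strip()
        results.append(CheckResult(name, proc.returncode == 0, output))
    return results


def verdict(results):
    """'DONE' only when at least one check ran and every check passed."""
    if results and all(r.ok for r in results):
        return "DONE"
    return "BLOCKED"


# Placeholder checks that only exercise the gate itself.
demo_checks = {
    "passes": [sys.executable, "-c", "print('ok')"],
    "fails": [sys.executable, "-c", "import sys; sys.exit(1)"],
}
```

Wired into a post-edit hook, the `BLOCKED` verdict and the captured output go back to the agent as the next instruction instead of reaching the human as a false "done".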
- Hand-over: written playbook for your team, plus a 1-hour walkthrough call.
Stack
- Claude Code
- Cursor
- Codex
- MCP
- total-agent-memory
- a2abridge
- AISWARM
-
Architecture & strategy consulting
Second opinion on stack, architecture, AI strategy — by the hour or per engagement.
A short, sharp engagement when you need a senior outside view: stack selection, architectural review, performance audit, AI strategy, hiring signal. Output is a written brief with concrete recommendations, ranked by ROI, with code-level pointers when relevant.
What you get
- Stack selection: Go vs PHP vs Node, Postgres vs Mongo, queue choice — with trade-offs.
- Architecture review: layered vs hexagonal vs microservices, where the bottleneck actually is.
- AI strategy: where LLMs help, where they cost more than they save, build-vs-buy.
- Performance audit: profiling, query plans, cache strategy, async vs sync.
- Hiring signal: read a candidate’s code, run a live design review.
- Format: 1-hour call → written brief, or 1-day on-site / paired session.
Stack
- Architecture review
- AI strategy
- Stack selection
- Performance audit
- Hiring