Services

What I do for clients.

Backend, AI integrations, RAG systems, and the unglamorous work of making AI-generated code actually shippable. Independent contractor, async-first, US / AU / EU clients.

Format
Fixed-price or hourly engagements. Quote on request.
Legal
B2B contracts via Serbian sole proprietorship (Vitalii Cherepanov PR Novi Sad, PIB 115184071). Invoice in EUR or RSD.
Process
Email → 30-min discovery call → written proposal → contract → ship.
  • Websites & landing pages

    Fast, multilingual, SEO-ready: Lighthouse 100 in all four categories (Performance, Accessibility, Best Practices, SEO).

    Marketing sites, product landings, multilingual portfolios. Built with Astro / Next / Vue + Tailwind, deployed via Docker + Traefik, with full structured data, sitemaps, RSS, OG images and i18n. Live examples: this site (vbcherepanov.com) and totalmemory.dev.

    What you get

    • Astro / Next.js / Vue + Tailwind — chosen by content shape, not by hype.
    • JSON-LD (WebSite / Person / Article / Service), sitemap per content type, hreflang, OG.
    • Lighthouse 100 / Core Web Vitals green / WCAG AA / cookie banner that respects consent.
    • Multilingual (en/ru/sr or any pair) with content collections and proper canonical URLs.
    • CI/CD: GitLab or GitHub Actions, Docker multi-stage, Traefik / nginx, zero-downtime deploy.

    Stack

    • Astro
    • Next.js
    • Vue / Nuxt
    • Tailwind
    • Docker
    • Traefik
    • GitLab CI
  • Backend services & APIs

    Go and PHP/Symfony backends that survive production.

    REST + gRPC services, OAuth2/OIDC identity, message-driven cores, PostgreSQL/Redis/RabbitMQ. Clean architecture (handler ≤15 lines → service → repo), domain errors, structured logs, metrics + tracing wired in from day one — not patched after the first incident.

    What you get

    • Go 1.25+ or PHP 8.4 / Symfony 8.0 — picked per team and per workload.
    • PostgreSQL 18 schema design, migrations, indexing strategy, keyset pagination.
    • Event-driven with RabbitMQ / NATS / Kafka, idempotency keys, outbox pattern.
    • OpenAPI / gRPC contracts, code-gen, contract tests, no `any`/`mixed` for business data.
    • Observability: Prometheus metrics + structured slog/Monolog + OpenTelemetry traces.

    Stack

    • Go
    • PHP / Symfony
    • PostgreSQL
    • RabbitMQ
    • gRPC
    • OpenAPI
    • OAuth2/OIDC
  • AI integration

    Plug LLMs into your product — without the demo-grade brittleness.

    OpenAI, Anthropic, DeepSeek and local models (Ollama, llama.cpp, LM Studio) wired into your backend with proper structured output, tool/function-calling, streaming, retries, cost-aware routing and full observability. Multi-provider out of the box — no vendor lock-in.

    What you get

    • Provider abstraction (Anthropic / OpenAI / DeepSeek / local) with deterministic fallbacks.
    • Tool calling, structured JSON output (zod / pydantic), streaming SSE / WebSocket.
    • Token + cost budgets per request, per user, per feature — metered in Prometheus.
    • Prompt versioning, eval harness, golden-set regression tests for every prompt change.
    • Safety: PII redaction, prompt-injection hardening, content filters, audit log per call.

    Stack

    • Anthropic
    • OpenAI
    • Ollama
    • MCP
    • Function calling
    • Structured output
  • RAG systems

    Retrieval that actually retrieves — same recipe as total-agent-memory (R@5 = 97 %).

    Production RAG pipelines that beat naïve cosine-similarity baselines: hybrid 6-tier retrieval (FTS5/BM25 + embeddings + fuzzy + graph + cross-encoder + MMR), document chunking that respects semantics, evaluation against LongMemEval-style sets. Same architecture that ships in total-agent-memory at 97 % R@5.
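
    MMR (maximal marginal relevance), the last tier in that list, re-orders an already-ranked candidate set so the context window is not filled with near-duplicate chunks. A minimal Go sketch of the standard greedy formulation, assuming each candidate already carries an embedding and a relevance score from the earlier tiers; `lambda` = 1 means pure relevance, lower values push toward diversity. This is the textbook algorithm, not necessarily the exact variant used in total-agent-memory.

```go
package main

import (
	"fmt"
	"math"
)

// Candidate is one retrieved chunk: its embedding plus a relevance score
// (e.g. query cosine similarity or a cross-encoder score) from earlier tiers.
type Candidate struct {
	ID        string
	Embedding []float64
	Relevance float64
}

func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// MMR greedily selects up to k candidates. Each step picks the one maximising
// lambda*relevance - (1-lambda)*(max cosine similarity to anything already picked).
func MMR(cands []Candidate, k int, lambda float64) []Candidate {
	if k <= 0 {
		return nil
	}
	picked := make([]Candidate, 0, k)
	used := make([]bool, len(cands))
	for len(picked) < k && len(picked) < len(cands) {
		best, bestScore := -1, math.Inf(-1)
		for i, c := range cands {
			if used[i] {
				continue
			}
			redundancy := 0.0 // no penalty before the first pick
			if len(picked) > 0 {
				redundancy = math.Inf(-1)
				for _, p := range picked {
					redundancy = math.Max(redundancy, cosine(c.Embedding, p.Embedding))
				}
			}
			if score := lambda*c.Relevance - (1-lambda)*redundancy; score > bestScore {
				best, bestScore = i, score
			}
		}
		used[best] = true
		picked = append(picked, cands[best])
	}
	return picked
}

func main() {
	cands := []Candidate{
		{ID: "a", Embedding: []float64{1, 0}, Relevance: 0.90},
		{ID: "a-dup", Embedding: []float64{1, 0}, Relevance: 0.85},
		{ID: "b", Embedding: []float64{0, 1}, Relevance: 0.50},
	}
	for _, c := range MMR(cands, 2, 0.5) {
		fmt.Println(c.ID)
	}
}
```

    With `lambda` = 0.5 the near-duplicate `a-dup` is skipped in favour of `b`; with `lambda` = 1 the result is plain relevance order and `a-dup` is kept.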

    What you get

    • Vector stores: pgvector (Postgres-native), Qdrant, FAISS — picked per scale & ops model.
    • Hybrid retrieval: BM25 + dense + sparse + reranker (bge-reranker-v2-m3 / Cohere / cross-encoder).
    • Ingestion pipeline: parsers per format, semantic chunking, embedding reuse, deduplication.
    • Evaluation: LongMemEval / LoCoMo / your-own golden set, R@K + nDCG tracked over time.
    • Knowledge graph layer on top: entities, relations, temporal facts (Allen's interval algebra).

    Stack

    • pgvector
    • Qdrant
    • BGE / Cohere
    • Hybrid retrieval
    • Eval harness
    • MCP
  • Clean up AI-generated codebases

    Refactor vibe-coded sprawl into something you can ship and maintain.

    You let an agent generate half a project and now it half-works, half-compiles, half-tests. I do a forensic pass: kill the half-done stubs, replace hardcoded values with config, separate domain from infra, add real tests instead of `// TODO: test`, and bring the architecture back to something a human can extend.

    What you get

    • Inventory: TODO/FIXME/XXX/HACK/NotImplemented/stub/panic("todo") — kill or implement.
    • Hardcoded URLs / IPs / secrets / magic numbers → env, config, named constants.
    • Demo / mock data ripped out of prod paths, moved to fixtures / seeds / factories.
    • Architecture re-aligned to clean layers: handler ≤15 lines → service → repo, typed DTOs.
    • Real tests (unit + integration + golden + regression) — not test files full of `assert true`.
    • Security pass: SQL injection, SSRF, XSS, exposed admin routes, weak auth, secrets in repo history.

    Stack

    • Refactoring
    • Tests
    • Security audit
    • Architecture review
    • CI/CD
  • Tune AI agents to write production-grade code

    Claude Code / Cursor / Codex / Cline — wired with memory, hooks, MCP and a feedback loop that catches the bad output before you do.

    Most teams use coding agents at 10 % of their capability — no memory, no hooks, no project rules, no verification loop. I set up the same stack that runs on my own machine: total-agent-memory for cross-session knowledge, a2abridge for multi-agent coordination, CLAUDE.md / .cursorrules with real architecture rules, hooks that gate edits, and a feedback loop (tests/build/lint) that runs after every change.

    What you get

    • MCP servers wired in: total-agent-memory (persistent knowledge), filesystem, A2A bridge.
    • CLAUDE.md / .cursorrules / AGENTS.md with real architecture, code-quality and git rules.
    • Hooks: pre-edit guards, post-edit lint/test, memory_save reminders, no-stub enforcement.
    • Multi-agent setup: Claude + Codex / DeepSeek / local Llama via AISWARM-style orchestration.
    • Feedback loop: tests + build + lint + grep on every change; agent never reports "DONE" on red.
    • Hand-over: written playbook for your team, plus a 1-hour walkthrough call.
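
    The "never DONE on red" rule boils down to a gate that runs every check and refuses to report success if any of them fails. A minimal Go sketch, assuming a Go project where `go build`, `go vet` (standing in for a linter) and `go test` are the checks; the `Step` and `Gate` names are hypothetical, and in a real setup this would be wired into an agent hook or a CI job rather than called by hand.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// Step is one verification command, e.g. build, lint or test.
type Step struct {
	Name string
	Args []string
}

// Gate runs the steps in order and stops at the first failure, returning its
// combined output, so nothing downstream can report DONE while the chain is red.
func Gate(steps []Step) error {
	for _, s := range steps {
		if len(s.Args) == 0 {
			return errors.New(s.Name + ": empty command")
		}
		out, err := exec.Command(s.Args[0], s.Args[1:]...).CombinedOutput()
		if err != nil {
			return fmt.Errorf("%s failed: %w\n%s", s.Name, err, out)
		}
	}
	return nil
}

func main() {
	err := Gate([]Step{
		{Name: "build", Args: []string{"go", "build", "./..."}},
		{Name: "vet", Args: []string{"go", "vet", "./..."}},
		{Name: "test", Args: []string{"go", "test", "./..."}},
	})
	if err != nil {
		fmt.Println("NOT DONE:", err)
		return
	}
	fmt.Println("DONE")
}
```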

    Stack

    • Claude Code
    • Cursor
    • Codex
    • MCP
    • total-agent-memory
    • a2abridge
    • AISWARM
  • Architecture & strategy consulting

    Second opinion on stack, architecture, AI strategy — by the hour or per engagement.

    A short, sharp engagement when you need a senior outside view: stack selection, architectural review, performance audit, AI strategy, hiring signal. Output is a written brief with concrete recommendations, ranked by ROI, with code-level pointers when relevant.

    What you get

    • Stack selection: Go vs PHP vs Node, Postgres vs Mongo, queue choice — with trade-offs.
    • Architecture review: layered vs hexagonal vs microservices, where the bottleneck actually is.
    • AI strategy: where LLMs help, where they cost more than they save, build-vs-buy.
    • Performance audit: profiling, query plans, cache strategy, async vs sync.
    • Hiring signal: read a candidate’s code, run a live design review.
    • Format: 1-hour call → written brief, or a 1-day on-site / paired session.

    Stack

    • Architecture review
    • AI strategy
    • Stack selection
    • Performance audit
    • Hiring