Featured work

php-llamacpp-benchmarks

Six llama.cpp-inspired optimisation patterns translated into idiomatic PHP 8.4 with JIT, plus a naive-vs-optimised CSV-to-Postgres importer. Reproducible via Docker + Make.

PHP 8.4
llama.cpp
FFI
JIT
Docker
PostgreSQL 16
PHPStan L8


Why this exists

llama.cpp is one of the most aggressively optimised C/C++ codebases in active use. It relies on mmap’d weights, flat dense buffers, value pools, table dispatch, token streaming, and columnar layouts, with every cache line and every allocation accounted for. Most of that toolkit is supposed to be language-agnostic. So it is fair to ask: how much of it survives translation to PHP 8.4 with JIT, and what does each pattern actually buy you?

This repo answers that with numbers. Six micro-benchmarks plus one realistic case study, all reproducible via docker compose and a Makefile. The companion Medium article — I Scaled PHP Until It Broke. Three llama.cpp Patterns Saved It. — walks through the wins and the losses.

What’s in the suite

ID	Technique	What’s measured
B01	FFI mmap	10M-entry table: load time, heap, lookup p50/p95/p99, cross-process cold start
B02	SplFixedArray	10M ints: memory, population, full iteration, random R/W
B03	Object pool	5M Point3D allocations vs reused pool, GC delta
B04	Lookup table vs match vs switch	10M classifications, three implementations side-by-side
B05	Generator	5M records: materialised array vs streaming, peak memory
B06	Column- vs row-oriented	5M records × 5 fields: single-column scan + full-row scan
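To make the B06 row concrete, here is a minimal sketch of the two layouts and the two scans. It is not the repo's benchmark code: the field names (`id`, `price`, `qty`), the three-field record, and the 1,000-row size are assumptions chosen for illustration.

```php
<?php
declare(strict_types=1);

$n = 1_000; // illustrative size; the suite uses 5M records

// Row-oriented: one associative array per record.
$rows = [];
for ($i = 0; $i < $n; $i++) {
    $rows[] = ['id' => $i, 'price' => $i * 2, 'qty' => $i % 7];
}

// Column-oriented: one packed list per field, aligned by index.
$columns = ['id' => [], 'price' => [], 'qty' => []];
foreach ($rows as $row) {
    $columns['id'][]    = $row['id'];
    $columns['price'][] = $row['price'];
    $columns['qty'][]   = $row['qty'];
}

// Single-column scan: the columnar layout touches only one list.
$rowSum = 0;
foreach ($rows as $row) {
    $rowSum += $row['price'];
}
$colSum = array_sum($columns['price']);

// Full-row scan: the columnar layout has to index every list per record.
$rowTotal = 0;
foreach ($rows as $row) {
    $rowTotal += $row['price'] * $row['qty'];
}
$price = $columns['price'];
$qty   = $columns['qty'];
$colTotal = 0;
for ($i = 0; $i < $n; $i++) {
    $colTotal += $price[$i] * $qty[$i];
}
```

Both layouts produce the same sums. What the benchmark measures is how much time and memory each scan costs, which this sketch does not time.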

Every run is warmup → measured iterations → hrtime(true) → memory_get_peak_usage(true) → percentiles via linear interpolation. A 1.2× win shows up as 1.2×. No cooking.
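The measurement loop described above can be sketched as follows. This is an assumed shape, not the repo's actual harness: the function names `bench` and `percentile` and the default warmup and iteration counts are hypothetical. Only `hrtime(true)` and `memory_get_peak_usage(true)` come from the text.

```php
<?php
declare(strict_types=1);

/**
 * Percentile with linear interpolation between the two closest ranks.
 * $samples need not be sorted; $p is in the range 0..100.
 */
function percentile(array $samples, float $p): float
{
    if ($samples === []) {
        throw new InvalidArgumentException('No samples.');
    }
    sort($samples);
    $rank = ($p / 100) * (count($samples) - 1);
    $lo   = (int) floor($rank);
    $hi   = (int) ceil($rank);
    $frac = $rank - $lo;

    return $samples[$lo] + ($samples[$hi] - $samples[$lo]) * $frac;
}

/**
 * Hypothetical harness: run warmup iterations untimed, then time each
 * measured iteration with the monotonic nanosecond clock.
 */
function bench(callable $fn, int $warmup = 3, int $iterations = 10): array
{
    for ($i = 0; $i < $warmup; $i++) {
        $fn();
    }

    $ns = [];
    for ($i = 0; $i < $iterations; $i++) {
        $start = hrtime(true);
        $fn();
        $ns[] = hrtime(true) - $start;
    }

    return [
        'p50_ns'     => percentile($ns, 50),
        'p95_ns'     => percentile($ns, 95),
        'p99_ns'     => percentile($ns, 99),
        'peak_bytes' => memory_get_peak_usage(true),
    ];
}

// Usage: time a small summation loop.
$result = bench(static function (): void {
    $sum = 0;
    for ($i = 0; $i < 100_000; $i++) {
        $sum += $i;
    }
});
```

With only ten measured iterations, p95 and p99 are interpolated almost entirely from the slowest one or two samples, so tail percentiles only become meaningful with a larger iteration count.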

The end-to-end case study

bin/run-case-study.php runs two importers against the same 100K-row CSV and the same Postgres table:

Naive importer — full-array CSV read, new Record() per row, assoc-array dedupe, JSON-loaded country map, single-row INSERTs.
Optimised importer — Generator CSV reader (B05), pooled Record (B03), SplFixedArray-backed dedupe with linear probing (B02), mmap’d binary country table (B01), 1000-row multi-VALUES INSERT.
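Of the pieces listed for the optimised importer, the SplFixedArray-backed dedupe with linear probing is the least self-explanatory, so here is a minimal sketch of the idea. The class name `IntSet`, the power-of-two capacity, the integer keys, and the hash function are all assumptions. The repo's importer may derive its keys and size its table differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical fixed-capacity set of integer keys using open addressing
 * with linear probing. A null slot means "empty".
 */
final class IntSet
{
    private SplFixedArray $slots;
    private int $mask;
    private int $count = 0;

    public function __construct(int $capacity)
    {
        if ($capacity < 2 || ($capacity & ($capacity - 1)) !== 0) {
            throw new InvalidArgumentException('Capacity must be a power of two.');
        }
        $this->slots = new SplFixedArray($capacity); // all slots start as null
        $this->mask  = $capacity - 1;
    }

    /** Returns true if $key was newly added, false if it was already present. */
    public function add(int $key): bool
    {
        // Cheap bit-mixing hash, reduced to a slot index by masking.
        $i = ($key ^ ($key >> 16)) & $this->mask;

        // Walk forward until we find the key or an empty slot.
        while (($existing = $this->slots[$i]) !== null) {
            if ($existing === $key) {
                return false;
            }
            $i = ($i + 1) & $this->mask;
        }

        // Always keep one slot empty so the probe loop above terminates.
        if ($this->count >= $this->mask) {
            throw new OverflowException('Set is full.');
        }
        $this->slots[$i] = $key;
        $this->count++;

        return true;
    }

    public function count(): int
    {
        return $this->count;
    }
}

// Usage: keep only the first occurrence of each id.
$seen   = new IntSet(16);
$unique = [];
foreach ([42, 7, 42, 19, 7, 23] as $id) {
    if ($seen->add($id)) {
        $unique[] = $id;
    }
}
```

The table never grows, so the capacity has to be chosen up front with headroom over the expected number of distinct keys. Probe chains get long as the table fills up.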

Six patterns, one realistic workload. The point isn’t a single hero number — it’s seeing exactly which pattern pays for itself, and which one is overkill.

Honesty notes

Benchmarks run inside Docker (php:8.4-cli-bookworm, opcache + JIT). Run the suite a few times — first-run numbers are noisier.
B01 measures warm load time (kernel page cache primed). Real cold-disk numbers depend on storage and are out of scope for an in-process bench.
B03’s pool win is usually modest in a one-shot CLI script. The pattern’s real value shows up in long-running workers (queues, websockets).
B04 may show a smaller win than C-style intuition predicts because PHP 8.4’s JIT compiles match very well. The article discusses this.
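For reference, the three implementation styles B04 compares look roughly like this. The byte classes and the `KIND_*` constants are hypothetical stand-ins for whatever the benchmark actually classifies. The sketch only shows the shape of each variant and checks nothing about speed.

```php
<?php
declare(strict_types=1);

const KIND_OTHER = 0;
const KIND_DIGIT = 1;
const KIND_ALPHA = 2;
const KIND_SPACE = 3;

// Variant 1: match expression.
function classifyMatch(int $byte): int
{
    return match (true) {
        $byte >= 0x30 && $byte <= 0x39 => KIND_DIGIT,
        ($byte >= 0x41 && $byte <= 0x5A) || ($byte >= 0x61 && $byte <= 0x7A) => KIND_ALPHA,
        $byte === 0x20 || $byte === 0x09 || $byte === 0x0A || $byte === 0x0D => KIND_SPACE,
        default => KIND_OTHER,
    };
}

// Variant 2: classic switch with the same rules.
function classifySwitch(int $byte): int
{
    switch (true) {
        case $byte >= 0x30 && $byte <= 0x39:
            return KIND_DIGIT;
        case ($byte >= 0x41 && $byte <= 0x5A) || ($byte >= 0x61 && $byte <= 0x7A):
            return KIND_ALPHA;
        case $byte === 0x20 || $byte === 0x09 || $byte === 0x0A || $byte === 0x0D:
            return KIND_SPACE;
        default:
            return KIND_OTHER;
    }
}

// Variant 3: precompute all 256 answers once, then classify by index.
function buildTable(): array
{
    $table = [];
    for ($b = 0; $b < 256; $b++) {
        $table[] = classifyMatch($b);
    }

    return $table;
}

$table = buildTable();
$kind  = $table[ord('7')]; // a single array read replaces the branch chain
```

The table trades 256 precomputed entries for branch-free classification. Whether that beats match under JIT is what the benchmark measures.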

What you actually get when you clone it

make build        # build the php image (one-off)
make all          # install, fixtures, db, bench, case-study

Output: results/results.md and results/results.json with per-benchmark p50/p95/p99, peak memory, and the commit hash for reproducibility. Run it in your CI and watch the numbers drift over time.

Article: I Scaled PHP Until It Broke. Three llama.cpp Patterns Saved It.
Source code: github.com/vbcherepanov/php-llamacpp-benchmarks
