Featured work
php-llamacpp-benchmarks
Six llama.cpp-inspired optimisation patterns translated into idiomatic PHP 8.4 with JIT, plus a naive-vs-optimised CSV-to-Postgres importer. Reproducible via Docker + Make.
- PHP 8.4
- llama.cpp
- FFI
- JIT
- Docker
- PostgreSQL 16
- PHPStan L8
Why this exists
llama.cpp is one of the most aggressively optimised C/C++ codebases in active circulation. Mmap’d weights, flat dense buffers, value pools, table dispatch, token streaming, columnar layouts — every cache line and every allocation accounted for. Most of that toolkit is supposed to be language-agnostic. So the question is honest: how much of it survives translation to PHP 8.4 with JIT, and what does each pattern actually buy you?
This repo answers that with numbers. Six micro-benchmarks plus one realistic case study, all reproducible via docker compose and a Makefile. The companion Medium article — I Scaled PHP Until It Broke. Three llama.cpp Patterns Saved It. — walks through the wins and the losses.
What’s in the suite
| ID | Technique | What’s measured |
|---|---|---|
| B01 | FFI mmap | 10M-entry table: load time, heap, lookup p50/p95/p99, cross-process cold start |
| B02 | SplFixedArray | 10M ints: memory, population, full iteration, random R/W |
| B03 | Object pool | 5M Point3D allocations vs reused pool, GC delta |
| B04 | Lookup table vs match vs switch | 10M classifications, three implementations side-by-side |
| B05 | Generator | 5M records: materialised array vs streaming, peak memory |
| B06 | Column- vs row-oriented | 5M records × 5 fields: single-column scan + full-row scan |
Every run is warmup → measured iterations → hrtime(true) → memory_get_peak_usage(true) → percentiles via linear interpolation. A 1.2× win shows up as 1.2×. No cooking.
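The measurement loop described above can be sketched in a few lines. This is a minimal illustration, not the repository's harness: the function names `percentile` and `bench`, the default warmup and iteration counts, and the returned array shape are all assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Percentile via linear interpolation between the two nearest ranks.
 *
 * @param list<int|float> $samples
 */
function percentile(array $samples, float $p): float
{
    if ($samples === []) {
        throw new InvalidArgumentException('No samples to summarise.');
    }
    sort($samples);

    $rank = ($p / 100) * (count($samples) - 1); // fractional index into the sorted list
    $lo   = (int) floor($rank);
    $hi   = (int) ceil($rank);
    $frac = $rank - $lo;

    return $samples[$lo] + ($samples[$hi] - $samples[$lo]) * $frac;
}

/**
 * Hypothetical harness: warmup runs are discarded, measured runs are timed.
 *
 * @return array{p50: float, p95: float, p99: float, peak_bytes: int}
 */
function bench(callable $fn, int $warmup = 3, int $iterations = 20): array
{
    for ($i = 0; $i < $warmup; $i++) {
        $fn(); // lets opcache/JIT and caches settle before timing starts
    }

    $samples = [];
    for ($i = 0; $i < $iterations; $i++) {
        $start = hrtime(true);              // monotonic clock, nanoseconds
        $fn();
        $samples[] = hrtime(true) - $start;
    }

    return [
        'p50'        => percentile($samples, 50),
        'p95'        => percentile($samples, 95),
        'p99'        => percentile($samples, 99),
        'peak_bytes' => memory_get_peak_usage(true), // real memory obtained from the OS
    ];
}

// Example workload (arbitrary, for illustration only).
$stats = bench(static fn () => array_sum(range(1, 10_000)));
printf(
    "p50=%.0f ns  p95=%.0f ns  p99=%.0f ns  peak=%d bytes\n",
    $stats['p50'], $stats['p95'], $stats['p99'], $stats['peak_bytes']
);
```

Note that `memory_get_peak_usage(true)` reports the peak for the whole process, so a harness along these lines would probably need to run each benchmark in its own process to keep the memory figures independent.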
The end-to-end case study
bin/run-case-study.php runs two importers against the same 100K-row CSV and the same Postgres table:
- Naive importer: full-array CSV read, new Record() per row, assoc-array dedupe, JSON-loaded country map, single-row INSERTs.
- Optimised importer: Generator CSV reader (B05), pooled Record (B03), SplFixedArray-backed dedupe with linear probing (B02), mmap’d binary country table (B01), 1000-row multi-VALUES INSERT.
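To make the "SplFixedArray-backed dedupe with linear probing" item concrete, here is one way such a structure could look. It is a sketch under assumptions: the class name `IntSet`, integer dedupe keys, the power-of-two sizing, and the bit-mixing hash are all hypothetical and may differ from what the repository does.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical open-addressing set of integer keys backed by SplFixedArray.
 * A null slot means "empty"; collisions are resolved by linear probing.
 */
final class IntSet
{
    private SplFixedArray $slots;
    private int $mask;
    private int $count = 0;

    public function __construct(int $expectedKeys)
    {
        // Power-of-two capacity at no more than ~50% load keeps probe chains short.
        $capacity = 16;
        while ($capacity < $expectedKeys * 2) {
            $capacity <<= 1;
        }
        $this->slots = new SplFixedArray($capacity); // slots start as null
        $this->mask  = $capacity - 1;
    }

    /** Returns true if $key was newly added, false if it was already present. */
    public function add(int $key): bool
    {
        // Cheap bit mix; the mask keeps the index non-negative and in range.
        $i = ($key ^ ($key >> 15)) & $this->mask;

        while (($current = $this->slots[$i]) !== null) {
            if ($current === $key) {
                return false;              // duplicate
            }
            $i = ($i + 1) & $this->mask;   // probe the next slot, wrapping around
        }

        // Always keep one slot empty so the probe loop is guaranteed to stop.
        if ($this->count === $this->mask) {
            throw new OverflowException('IntSet is full.');
        }

        $this->slots[$i] = $key;
        $this->count++;

        return true;
    }

    public function count(): int
    {
        return $this->count;
    }
}
```

In an importer loop, a call such as `if (!$seen->add($id)) { continue; }` would skip rows whose key has already been imported. The fixed capacity is the trade-off: the expected number of keys has to be known, or overestimated, up front.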
Six patterns, one realistic workload. The point isn’t a single hero number — it’s seeing exactly which pattern pays for itself, and which one is overkill.
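Two pieces of the optimised importer, the Generator CSV reader and the multi-VALUES INSERT, can be sketched as follows. The function names `readCsv`, `batches`, and `buildInsert`, and the table, column, and file names in the trailing comment, are hypothetical; the repository's actual code may be organised differently.

```php
<?php
declare(strict_types=1);

/**
 * Streams CSV rows one at a time instead of loading the whole file.
 *
 * @return Generator<int, list<string|null>>
 */
function readCsv(string $path): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }
    try {
        // Explicit separator, enclosure, and (empty) escape arguments.
        while (($row = fgetcsv($handle, null, ',', '"', '')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv returns [null] for a blank line
            }
            yield $row;
        }
    } finally {
        fclose($handle);
    }
}

/**
 * Groups any iterable into arrays of at most $size items.
 *
 * @return Generator<int, list<mixed>>
 */
function batches(iterable $rows, int $size): Generator
{
    $batch = [];
    foreach ($rows as $row) {
        $batch[] = $row;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch !== []) {
        yield $batch; // final partial batch
    }
}

/**
 * Builds one parameterised multi-VALUES INSERT.
 * $table and $columns are interpolated, so they must be trusted identifiers.
 *
 * @param list<string> $columns
 */
function buildInsert(string $table, array $columns, int $rowCount): string
{
    $row = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';

    return sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, $rowCount, $row))
    );
}

// Hypothetical wiring with PDO (names are placeholders):
//
// foreach (batches(readCsv('records.csv'), 1000) as $batch) {
//     $stmt = $pdo->prepare(buildInsert('records', ['id', 'name', 'country'], count($batch)));
//     $stmt->execute(array_merge(...$batch)); // flatten rows into one parameter list
// }
```

Only one batch of rows is held in memory at a time, and each batch costs a single round trip to Postgres instead of one per row.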
Honesty notes
- Benchmarks run inside Docker (php:8.4-cli-bookworm, opcache + JIT). Run the suite a few times; first-run numbers are noisier.
- B01 measures warm load time (kernel page cache primed). Real cold-disk numbers depend on storage and are out of scope for an in-process bench.
- B03’s pool win is usually modest in a one-shot CLI script. The pattern’s real value shows up in long-running workers (queues, websockets).
- B04 may show a smaller win than C-style intuition predicts because PHP 8.4’s JIT compiles match very well. The article discusses this.
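For a feel of what B04 compares, here is a small side-by-side of a match expression and a lookup table. The code-to-category mapping and all names are invented for illustration, the switch variant is omitted for brevity, and the timing loop is far too crude to stand in for the suite's numbers.

```php
<?php
declare(strict_types=1);

// Hypothetical domain: map a small integer code to a category label.
const CATEGORY_TABLE = ['control', 'control', 'data', 'data', 'data', 'error'];

function classifyMatch(int $code): string
{
    return match ($code) {
        0, 1    => 'control',
        2, 3, 4 => 'data',
        5       => 'error',
        default => 'unknown',
    };
}

function classifyTable(int $code): string
{
    // Codes outside the table fall back to the default via ??.
    return CATEGORY_TABLE[$code] ?? 'unknown';
}

/** Times $iterations calls of $classify and returns elapsed nanoseconds. */
function timeClassifier(callable $classify, int $iterations): int
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $classify($i & 7); // cycles through codes 0..7, including two unknowns
    }

    return hrtime(true) - $start;
}

// The indirect call adds overhead of its own, so treat the output as a
// rough indication only.
foreach (['classifyMatch', 'classifyTable'] as $fn) {
    printf("%-14s %8.2f ms\n", $fn, timeClassifier($fn, 200_000) / 1e6);
}
```

Both functions return the same label for every input, so any difference in the printed timings comes from dispatch cost alone.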
What you actually get when you clone it
make build # build the php image (one-off)
make all # install, fixtures, db, bench, case-study
Output: results/results.md and results/results.json with per-benchmark p50/p95/p99, peak memory and a reproducible commit hash. Throw it at your CI and watch the numbers drift over time.