Notes from the backend.
How we build distributed systems, data platforms, and agentic infrastructure — and what we learned the hard way.
Why we rewrote our event router in Rust
A 40 ms GC pause was costing us tail latency we couldn’t explain away. Here’s what moving the hot path to Rust actually bought us, and what it didn’t.
Backpressure is a feature, not a failure
The systems that survive load aren’t the ones that never say no. They’re the ones that say no early, clearly, and on purpose.
Evaluating agentic systems in production
Offline benchmarks tell you a model can do a task. They don’t tell you your agent will. Here’s the harness we run continuously against live traffic.
The petabyte pipeline that stays debuggable
Scale is easy to add and hard to keep legible. The discipline that kept our Spark pipelines understandable as they grew past a petabyte a day.
On-call you can sleep through: our SLO playbook
Good on-call isn’t heroics at 3am. It’s the boring upfront work of defining what “broken” means before anything breaks.
Scala 3 in anger: what changed for our data team
A year in. The migration cost, the features that earned their keep, and the ones we still don’t reach for.