The archive, mapped
Every blog post and case study on the site, mapped to the principles and playbooks each piece argues for. Filter by any principle or playbook to see just the evidence behind it.
- How I run multiple OpenAI-compatible LLM endpoints on a small K3s cluster with AMD Radeon GPUs, and what I had to do to make it stable.
- SLOs for Inference: Latency, Errors, Saturation (Dec 29, 2025 · 6 min)
How to define meaningful SLOs for production inference workloads, and what to do when they break.
- GPU Failure Modes: What Breaks and How to Debug It (Dec 29, 2025 · 5 min)
Common GPU infrastructure failures in production and how to diagnose them before they become incidents.
- GPU Cost Baseline: What to Measure, What Lies (Dec 29, 2025 · 4 min)
Before you can cut GPU costs, you need to measure them correctly. Here is what to track and what the cloud console will not tell you.
- AI Infra Readiness Audit: What I Check (and What You Get) (Dec 29, 2025 · 3 min)
A practical checklist for auditing production AI infrastructure: GPU cost baselines, reliability risks, and an executable roadmap.
- Optimizing Real-Time Kubernetes Visualizations: From 25ms to 12ms Per Frame (Dec 25, 2025 · 7 min)
A deep dive into optimizing Canvas 2D and Three.js visualizations for Kubernetes dashboards, covering algorithmic complexity, memory management, and GPU-efficient rendering patterns.
- A real integration failure mode: when an API contract is implicit, patient matching becomes a risk management problem. Here is the defensive pattern (validation + contract tests + instrumentation) that makes it operable.
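The validation + contract tests + instrumentation pattern named in that last entry can be sketched in a few lines. This is a minimal illustration, not the post's actual implementation: the field names (`mrn`, `family_name`, `birth_date`) and the `MatchInstrumentation` helper are hypothetical stand-ins for whatever the real implicit contract carries.

```python
from dataclasses import dataclass, field

# Hypothetical required fields for an inbound patient-matching payload.
# The real contract in the case study is implicit; these are illustrative.
REQUIRED_FIELDS = {"mrn", "family_name", "birth_date"}


@dataclass
class MatchInstrumentation:
    """Counters so contract violations surface as metrics, not silent drops."""
    accepted: int = 0
    rejected: dict = field(default_factory=dict)  # rejection reason -> count

    def record_rejection(self, reason: str) -> None:
        self.rejected[reason] = self.rejected.get(reason, 0) + 1


def validate_patient_payload(payload: dict, metrics: MatchInstrumentation) -> bool:
    """Validate at the boundary: reject anything the matcher cannot interpret."""
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        metrics.record_rejection("missing:" + ",".join(sorted(missing)))
        return False
    if not str(payload["mrn"]).strip():
        metrics.record_rejection("empty:mrn")
        return False
    metrics.accepted += 1
    return True


# Contract test: pins the implicit contract down so an upstream change
# breaks loudly in CI instead of silently corrupting matches in production.
def test_rejects_missing_birth_date():
    metrics = MatchInstrumentation()
    assert not validate_patient_payload({"mrn": "42", "family_name": "Doe"}, metrics)
    assert metrics.rejected == {"missing:birth_date": 1}
```

The three pieces map directly onto the pattern: the validator turns the implicit contract into an explicit boundary check, the test freezes that contract in CI, and the counters make violations observable before they become incidents.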