The archive, mapped
Every blog post and case study on the site, mapped to the principles and playbooks each piece argues for. Filter by any principle or playbook to see just the evidence behind it.
- How I run multiple OpenAI-compatible LLM endpoints on a small K3s cluster with AMD Radeon GPUs, and what I had to do to make it stable.
- SLOs for Inference: Latency, Errors, Saturation (Dec 29, 2025 · 6 min)
How to define meaningful SLOs for production inference workloads, and what to do when they break.
- Standing Up a GPU-Ready Private AI Platform (Harvester + K3s + Flux + GitLab) (Dec 29, 2025 · 6 min)
Field notes from building and operating a small private GPU platform with Harvester, K3s, and a GitLab -> Flux delivery loop.
- Hybrid/On-Prem GPU: The Boring GitOps Path (Dec 29, 2025 · 4 min)
A practical guide to running GPU workloads on-prem or hybrid, using Kubernetes and GitOps patterns that make operations boring.
- GPU Failure Modes: What Breaks and How to Debug It (Dec 29, 2025 · 5 min)
Common GPU infrastructure failures in production and how to diagnose them before they become incidents.
- GPU Cost Baseline: What to Measure, What Lies (Dec 29, 2025 · 4 min)
Before you can cut GPU costs, you need to measure them correctly. Here is what to track and what the cloud console will not tell you.
- AI Infra Readiness Audit: What I Check (and What You Get) (Dec 29, 2025 · 3 min)
A practical checklist for auditing production AI infrastructure: GPU cost baselines, reliability risks, and an executable roadmap.