The archive, mapped
Every blog post and case study on the site, mapped to the principles and playbooks each piece argues for. Filter by any principle or playbook to see just the evidence behind it.
All content
23 items
- A One-Page AI Usage Policy That Actually Works · Apr 20, 2026 · 8 min
A short, adoptable AI usage policy for engineering teams: what to put on the page, what to leave off, and why the policy matters less than the habits it makes explicit.
- The First 90 Days: Introducing AI-Assisted Dev to a New Team · Apr 20, 2026 · 9 min
How I would roll out AI-assisted development on a team that has not standardized: what to do in week one, what to earn the right to argue about later, and what almost always goes wrong.
- Getting Gemma 4 Running on a Radeon 7900 XTX (with and without TurboQuant) · Apr 4, 2026 · 8 min
What it took to get Gemma 4 E4B serving cleanly on Radeon through FlexInfer: a stable TRITON lane on a 7900 XTX, an experimental TurboQuant long-context lane on a second node, and the GPTQ pipeline work still underway.
- Build Your Own Legs Before the Crutches Fail · Mar 9, 2026 · 14 min
AI-assisted development is useful leverage, but only if you convert borrowed competence into real judgment before the support becomes a dependency.
- Repo Design Patterns for AI-Assisted Dev: Control Loops, Hooks, and Memory · Feb 9, 2026 · 5 min
Treat your repo like a control system: instruction hierarchy, workflows, hooks, and shared memory that make AI-assisted dev fast, reproducible, and hard to derail.
- Loom: One Registry, Many AI Coding Assistants · Feb 9, 2026 · 6 min
How Loom keeps MCP servers and skills in sync across Codex, Claude, Gemini, VS Code, Antigravity, Kilocode, OpenCode, and Zed.
- Loom-Mode MCP for Advanced, Fast AI-Assisted Dev (Go-Native, Proxy+Daemon) · Feb 9, 2026 · 7 min
How to keep AI-assisted development fast and token-efficient: one proxy entry, a Go daemon that routes calls, and a small set of Go-native MCP servers.
- Two-Lane Text GPU Allocation: Quality + Vision/Fast (Plus a Media Lane) · Feb 9, 2026 · 11 min
How I redistributed six models across three GPU nodes to eliminate contention, using priority-based shared groups and label-based aliases for routing and failover.
How I run multiple OpenAI-compatible LLM endpoints on a small K3s cluster with AMD Radeon GPUs, and what I had to do to make it stable.
A case study in healthcare integration: how Source Profiles, a three-phase parsing pipeline, and a workflow DSL turn messy legacy formats into semantic events.
- Deploying MLC-LLM on Dual RX 7900 XTX GPUs: Debugging VRAM, KV Cache, and K8s GPU Scheduling · Jan 4, 2026 · 12 min
What actually broke when I deployed MLC-LLM across two RX 7900 XTX nodes, and the fixes that made it stable: quantization, KV cache sizing, and Kubernetes GPU hygiene.
- SLOs for Inference: Latency, Errors, Saturation · Dec 29, 2025 · 6 min
How to define meaningful SLOs for production inference workloads, and what to do when they break.
- Standing Up a GPU-Ready Private AI Platform (Harvester + K3s + Flux + GitLab) · Dec 29, 2025 · 6 min
Field notes from building and operating a small private GPU platform with Harvester, K3s, and a GitLab -> Flux delivery loop.
- Hybrid/On-Prem GPU: The Boring GitOps Path · Dec 29, 2025 · 4 min
A practical guide to running GPU workloads on-prem or hybrid, using Kubernetes and GitOps patterns that make operations boring.
- GPU Failure Modes: What Breaks and How to Debug It · Dec 29, 2025 · 5 min
Common GPU infrastructure failures in production and how to diagnose them before they become incidents.
- GPU Cost Baseline: What to Measure, What Lies · Dec 29, 2025 · 4 min
Before you can cut GPU costs, you need to measure them correctly. Here is what to track and what the cloud console will not tell you.
- AI Infra Readiness Audit: What I Check (and What You Get) · Dec 29, 2025 · 3 min
A practical checklist for auditing production AI infrastructure: GPU cost baselines, reliability risks, and an executable roadmap.
- Optimizing Real-Time Kubernetes Visualizations: From 25ms to 12ms Per Frame · Dec 25, 2025 · 7 min
A deep dive into optimizing Canvas 2D and Three.js visualizations for Kubernetes dashboards, covering algorithmic complexity, memory management, and GPU-efficient rendering patterns.
A real integration failure mode: when an API contract is implicit, patient matching becomes a risk management problem. Here is the defensive pattern (validation + contract tests + instrumentation) that makes it operable.
- Welcome to My Homelab · Nov 27, 2025 · 5 min
The infrastructure, product surfaces, and live demos behind the FlexInfer, Loom, and fi-fhir work I publish here.
- Running LLMs on Radeon GPUs with ROCm · Nov 20, 2025 · 5 min
What still works, what changed, and which guardrails matter when you run AMD Radeon GPUs for always-on inference.
- Building Practical AI Agents · Nov 15, 2025 · 4 min
What makes an AI agent reliable in production: explicit loops, bounded tools, and visible operator state.
How I built FlexDeck, a full-stack operations dashboard with real-time K8s monitoring, GitLab CI/CD visualization, and AI model management, using Go and SolidJS.