Blog
Project updates, tutorials, and notes on Loom, MCP, inference, and healthcare integration. For longer-form implementation write-ups, see Writing and Case Studies.
Showing all 12 posts
Getting Gemma 4 Running on a Radeon 7900 XTX (with and without TurboQuant)
What it took to get Gemma 4 E4B serving cleanly on Radeon through FlexInfer: a stable TRITON lane on a 7900 XTX, an experimental TurboQuant long-context lane on a second node, and the GPTQ pipeline work still underway.
Build Your Own Legs Before the Crutches Fail
AI-assisted development is useful leverage, but only if you convert borrowed competence into real judgment before the support becomes a dependency.
Two-Lane Text GPU Allocation: Quality + Vision/Fast (Plus a Media Lane)
How I redistributed 6 models across 3 GPU nodes to eliminate contention, using priority-based shared groups and label-based aliases for routing and failover.
Loom-Mode MCP for Advanced, Fast AI-Assisted Dev (Go-Native, Proxy+Daemon)
How to keep AI-assisted development fast and token-efficient: one proxy entry, a Go daemon that routes calls, and a small set of Go-native MCP servers.
Loom: One Registry, Many AI Coding Assistants
How Loom keeps MCP servers and skills in sync across Codex, Claude, Gemini, VS Code, Antigravity, Kilocode, OpenCode, and Zed.
Repo Design Patterns for AI-Assisted Dev: Control Loops, Hooks, and Memory
Treat your repo like a control system: instruction hierarchy, workflows, hooks, and shared memory that make AI-assisted dev fast, reproducible, and hard to derail.
Deploying MLC-LLM on Dual RX 7900 XTX GPUs: Debugging VRAM, KV Cache, and K8s GPU Scheduling
What actually broke when I deployed MLC-LLM across two RX 7900 XTX nodes, and the fixes that made it stable: quantization, KV cache sizing, and Kubernetes GPU hygiene.
AI Infra Readiness Audit: What I Check (and What You Get)
A practical checklist for auditing production AI infrastructure: GPU cost baselines, reliability risks, and an executable roadmap.
GPU Cost Baseline: What to Measure, What Lies
Before you can cut GPU costs, you need to measure them correctly. Here is what to track and what the cloud console will not tell you.
GPU Failure Modes: What Breaks and How to Debug It
Common GPU infrastructure failures in production and how to diagnose them before they become incidents.
Hybrid/On-Prem GPU: The Boring GitOps Path
A practical guide to running GPU workloads on-prem or hybrid, using Kubernetes and GitOps patterns that make operations boring.
Standing Up a GPU-Ready Private AI Platform (Harvester + K3s + Flux + GitLab)
Field notes from building and operating a small private GPU platform with Harvester, K3s, and a GitLab -> Flux delivery loop.