Blog

Project updates, tutorials, and thoughts on AI, homelab, and development.

All Posts

January 4, 2026 · 10 min read
Lab

Deploying MLC-LLM on Dual RX 7900 XTX GPUs: Debugging VRAM, KV Cache, and K8s GPU Scheduling

What actually broke when I deployed MLC-LLM across two RX 7900 XTX nodes, and the fixes that made it stable: quantization, KV cache sizing, and Kubernetes GPU hygiene.

mlc-llm, rocm, amd, radeon, +4 more
December 29, 2025 · 3 min read
Professional

AI Infra Readiness Audit: What I Check (and What You Get)

A practical checklist for auditing production AI infrastructure: GPU cost baselines, reliability risks, and an executable roadmap.

consulting, gpu, kubernetes, mlops, +3 more
December 29, 2025 · 4 min read
Professional

GPU Cost Baseline: What to Measure, What Lies

Before you can cut GPU costs, you need to measure them correctly. Here is what to track and what the cloud console will not tell you.

gpu, finops, cost, mlops, +1 more
December 29, 2025 · 5 min read
Professional

GPU Failure Modes: What Breaks and How to Debug It

Common GPU infrastructure failures in production and how to diagnose them before they become incidents.

reliability, gpu, debugging, kubernetes, +1 more
December 29, 2025 · 4 min read
Professional

Hybrid/On-Prem GPU: The Boring GitOps Path

A practical guide to running GPU workloads on-prem or hybrid, using Kubernetes and GitOps patterns that make operations boring.

gpu, kubernetes, gitops, on-prem, +2 more
December 29, 2025 · 4 min read
Professional

Standing Up a GPU-Ready Private AI Platform (Harvester + K3s + Flux + GitLab)

Field notes from building and operating a small private AI platform with GPU scheduling, GitOps, and production-grade guardrails.

case-study, platform-engineering, kubernetes, k3s, +10 more
December 29, 2025 · 4 min read
Professional

SLOs for Inference: Latency, Errors, Saturation

How to define meaningful SLOs for production inference workloads, and what to do when they break.

reliability, slo, inference, mlops, +1 more
December 25, 2025 · 7 min read
Lab

Optimizing Real-Time Kubernetes Visualizations: From 25ms to 12ms Per Frame

A deep dive into optimizing Canvas 2D and Three.js visualizations for Kubernetes dashboards, covering algorithmic complexity, memory management, and GPU-efficient rendering patterns.

performance, three.js, canvas, d3, +3 more
November 27, 2025 · 4 min read
Lab

Welcome to My Homelab

An introduction to my personal site and the homelab infrastructure powering my AI experiments.

homelab, kubernetes, gitops, rocm
November 20, 2025 · 2 min read
Lab

Running LLMs on Radeon GPUs with ROCm

A guide to getting local LLM inference working on AMD Radeon GPUs using ROCm.

rocm, amd, llm, inference
November 15, 2025 · 2 min read
Lab

Building Practical AI Agents

Thoughts on designing AI agents that actually work for real tasks.

agents, langgraph, llm