Welcome to My Homelab
November 27, 2025
Hey there! I'm Cody Blevins, and this is my little corner of the internet where I share projects, experiments, and learnings from my homelab. What started as "I wonder if I can run LLMs locally" has evolved into a full-blown datacenter in my partially finished basement/game room.
What You'll Find Here
I've been tinkering with technology for as long as I can remember, and this site is where I document the interesting stuff I'm building:
- Radeon GPU Inference - Running LLMs locally on AMD hardware using ROCm
- AI Agent Systems - Building and orchestrating autonomous AI agents
- Kubernetes Everything - Because why run one container when you can orchestrate hundreds?
- Fun Side Projects - Weekend hacks that sometimes turn into real tools
The Philosophy
My day job involves a lot of integration architecture and cloud infrastructure. The homelab lets me experiment with cutting-edge tech without the constraints of production environments (or cloud bills that make your CFO cry). There's something deeply satisfying about running your own infrastructure—knowing exactly where your data lives and having full control over the stack.
Plus, let's be honest: repurposing old gaming rigs into GPU compute nodes is just fun.
The Hardware Zoo
Over time I've accumulated what I affectionately call "the zoo"—a collection of machines that spans enterprise iron to retired gaming PCs, all working together in surprising harmony.
The Backbone: Dell R730xd
The heart of the operation is a Dell PowerEdge R730xd running Harvester HCI. This beast handles all my virtualization and provides distributed storage via Longhorn. It's not the quietest machine (good thing it's in the basement), but it's rock solid and gives me the flexibility to spin up VMs on demand.
From this single server, I run:
- 3 K3s control plane nodes (HA Kubernetes, because I've been burned before)
- Multiple worker VMs for general compute
- Storage pools that back the entire cluster
The GPU Fleet
This is where things get interesting. I've assembled a heterogeneous GPU cluster that would make a cloud provider's pricing team weep:
AMD Radeon RX 7900 XTX (x2) - The workhorses. With 24GB VRAM each, these handle the heavy lifting for LLM inference. One lives in a dedicated bare-metal node, the other in a repurposed Intel i7-5930K gaming rig that's found new purpose in life. ROCm support has come a long way, and with the right tuning (chunked prefill, speculative decoding), these things absolutely scream; there's a sketch of that tuning at the end of this section.
AMD Radeon VII - The OG. This card was doing ROCm before it was cool. 16GB of HBM2 makes it surprisingly capable for inference workloads, and it handles the overflow when the 7900 XTXs are busy.
NVIDIA GTX 980 Ti - The wildcard. Sometimes you need CUDA for that one tool that doesn't support ROCm yet. It's old, it's loud, but it still works.
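As promised, here's a minimal sketch of the kind of vLLM tuning I lean on for the 7900 XTXs. Treat the model name, memory fraction, and environment variable as illustrative placeholders rather than my exact config:

```python
import os

# Pin the process to one GPU. HIP_VISIBLE_DEVICES is ROCm's analog of
# CUDA_VISIBLE_DEVICES; set it before vLLM initializes the device.
os.environ.setdefault("HIP_VISIBLE_DEVICES", "0")

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder: anything that fits in 24GB
    gpu_memory_utilization=0.90,       # leave a little headroom on the card
    max_model_len=8192,
    enable_chunked_prefill=True,       # break long prompts into scheduler-sized chunks
    # Speculative decoding is configured separately, and the exact kwargs have
    # shifted between vLLM releases, so check the docs for your version.
)

out = llm.generate(["Why run LLMs at home?"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```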
The Edge
Raspberry Pi 5 - Because every homelab needs at least one Pi. Currently handling lightweight monitoring and edge tasks. It's not much, but it's honest work.
The Gaming PC Graveyard
Several machines in my cluster are former gaming rigs that served their time grinding through The Witcher 3, optimizing Factorio factories, and queuing for Dota 2. There's poetry in watching an i7-5930K that once rendered those open-world landscapes now serving LLM inference requests.
The Software Stack
All of this hardware is orchestrated through a GitOps workflow using ArgoCD and Fleet. Everything is defined in code, version controlled, and automatically reconciled. Break something? git revert and grab a coffee.
Kubernetes (K3s)
I run K3s across the cluster—lightweight enough to not waste resources, but full-featured enough to run real workloads. The control plane is HA across three VMs, with workers spread across both VMs and bare-metal GPU nodes.
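When I want a quick sanity check of that split, the Kubernetes Python client does the job. A small sketch, assuming `pip install kubernetes` and a kubeconfig pointed at the K3s cluster:

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (works the same against K3s).
config.load_kube_config()

# K3s labels control-plane nodes with node-role.kubernetes.io/control-plane;
# everything else in the cluster is a worker.
for node in client.CoreV1Api().list_node().items:
    labels = node.metadata.labels or {}
    role = "control-plane" if "node-role.kubernetes.io/control-plane" in labels else "worker"
    print(f"{node.metadata.name:24s} {role}")
```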
AI/ML Stack
- vLLM & SGLang - The dynamic duo for LLM serving. vLLM handles most inference with its PagedAttention magic, while SGLang comes in for workloads that benefit from its RadixAttention tree caching.
- LiteLLM - The universal adapter that makes all my models look like OpenAI to upstream services (there's a sketch of this after the list).
- ComfyUI - For when I want to generate images instead of text.
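The payoff of the LiteLLM layer is that client code never changes. Here's a minimal sketch using the standard OpenAI client; the URL, key, and model alias are placeholders for whatever your own proxy exposes:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://litellm.lab.local:4000/v1",  # hypothetical in-cluster address
    api_key="sk-local",  # depending on config, LiteLLM may accept a placeholder key
)

# "local-llama" is an alias defined in the LiteLLM config, not a real model id;
# LiteLLM routes the request to whichever backend (vLLM, SGLang) serves it.
resp = client.chat.completions.create(
    model="local-llama",
    messages=[{"role": "user", "content": "Summarize why homelabs are fun."}],
)
print(resp.choices[0].message.content)
```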
Storage
Longhorn provides distributed block storage across the cluster. It's not the fastest, but the redundancy and Kubernetes-native integration make it worth the trade-off. Important data gets replicated; experiments get ephemeral storage.
What's Next?
I'll be posting deep dives into specific projects—expect tutorials on getting local LLM inference humming on AMD GPUs, building AI agents that actually do useful things, and the occasional war story about what happens when you let Kubernetes autoscaling get too ambitious.
Topics on the horizon:
- Optimizing vLLM for the 7900 XTX (spoiler: it involves a lot of environment variables)
- Building a multi-model inference router
- The great migration: moving from Docker Compose to Kubernetes
- Why Harvester might be the best hypervisor you've never heard of
Stay tuned, and feel free to reach out if you want to chat about any of this stuff. There's something special about the homelab community—we're all just tinkerers at heart, trying to build cool things with whatever hardware we can get our hands on.