Welcome to My Homelab
November 27, 2025
Hey there! I'm Cody Blevins, and this is my little corner of the internet where I share projects, experiments, and learnings from my homelab. What started as "I wonder if I can run LLMs locally" has evolved into a full-blown datacenter in my partially finished basement/game room.
What You'll Find Here
I've been tinkering with technology for as long as I can remember, and this site is where I document the interesting stuff I'm building:
- Radeon GPU Inference - Running LLMs locally on AMD hardware using ROCm
- AI Agent Systems - Building and orchestrating autonomous AI agents
- Kubernetes Everything - Because why run one container when you can orchestrate hundreds?
- Fun Side Projects - Weekend hacks that sometimes turn into real tools
The Philosophy
My day job involves a lot of integration architecture and cloud infrastructure. The homelab lets me experiment with cutting-edge tech without the constraints of production environments (or cloud bills that make your CFO cry). There's something deeply satisfying about running your own infrastructure—knowing exactly where your data lives and having full control over the stack.
Plus, let's be honest: repurposing old gaming rigs into GPU compute nodes is just fun.
The Hardware Zoo
Over time I've accumulated what I affectionately call "the zoo"—a collection of machines that spans enterprise iron to retired gaming PCs, all working together in surprising harmony.
The Backbone: Dell R730xd
The heart of the operation is a Dell PowerEdge R730xd running Harvester HCI. This beast handles all my virtualization and provides distributed storage via Longhorn. It's not the quietest machine (good thing it's in the basement), but it's rock solid and gives me the flexibility to spin up VMs on demand.
From this single server, I run:
- 3 K3s control plane nodes (HA Kubernetes, because I've been burned before)
- Multiple worker VMs for general compute
- Storage pools that back the entire cluster
The GPU Fleet
This is where things get interesting. I've assembled a heterogeneous GPU cluster that would make a cloud provider's pricing team weep:
AMD Radeon RX 7900 XTX (x2) - The workhorses. With 24GB VRAM each, these handle the heavy lifting for LLM inference. One lives in a dedicated bare-metal node, the other in a repurposed Intel i7-5930K gaming rig that's found new purpose in life. ROCm support has come a long way, and with the right tuning (chunked prefill, speculative decoding), these things absolutely scream; there's a sketch of that tuning at the end of this section.
AMD Radeon VII - The OG. This card was doing ROCm before it was cool. 16GB of HBM2 makes it surprisingly capable for inference workloads, and it handles the overflow when the 7900 XTXs are busy.
NVIDIA GTX 980 Ti - The wildcard. Sometimes you need CUDA for that one tool that doesn't support ROCm yet. It's old, it's loud, but it still works.
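As promised, here's a minimal sketch of the kind of vLLM tuning I lean on for the 7900 XTXs. Treat the model name, memory fraction, and environment variable as illustrative placeholders rather than my exact config:

```python
import os

# Pin the process to one GPU. HIP_VISIBLE_DEVICES is ROCm's analog of
# CUDA_VISIBLE_DEVICES; set it before vLLM initializes the device.
os.environ.setdefault("HIP_VISIBLE_DEVICES", "0")

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder: anything that fits in 24GB
    gpu_memory_utilization=0.90,       # leave a little headroom on the card
    max_model_len=8192,
    enable_chunked_prefill=True,       # break long prompts into scheduler-sized chunks
    # Speculative decoding is configured separately, and the exact kwargs have
    # shifted between vLLM releases, so check the docs for your version.
)

out = llm.generate(["Why run LLMs at home?"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```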
The Edge
Raspberry Pi 5 - Because every homelab needs at least one Pi. Currently handling lightweight monitoring and edge tasks. It's not much, but it's honest work.
The Gaming PC Graveyard
Several machines in my cluster are former gaming rigs that served their time grinding through The Witcher 3, optimizing Factorio factories, and queuing for Dota 2. There's poetry in watching an i7-5930K that once rendered those open-world landscapes now serving LLM inference requests.
The Software Stack
All of this hardware is orchestrated through a GitOps workflow using ArgoCD and Fleet. Everything is defined in code, version controlled, and automatically reconciled. Break something? git revert and grab a coffee.
Kubernetes (K3s)
I run K3s across the cluster—lightweight enough to not waste resources, but full-featured enough to run real workloads. The control plane is HA across three VMs, with workers spread across both VMs and bare-metal GPU nodes.
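When I want a quick sanity check of that split, the Kubernetes Python client does the job. A small sketch, assuming `pip install kubernetes` and a kubeconfig pointed at the K3s cluster:

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (works the same against K3s).
config.load_kube_config()

# K3s labels control-plane nodes with node-role.kubernetes.io/control-plane;
# everything else in the cluster is a worker.
for node in client.CoreV1Api().list_node().items:
    labels = node.metadata.labels or {}
    role = "control-plane" if "node-role.kubernetes.io/control-plane" in labels else "worker"
    print(f"{node.metadata.name:24s} {role}")
```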
AI/ML Stack
- vLLM & SGLang - The dynamic duo for LLM serving. vLLM handles most inference with its PagedAttention magic, while SGLang comes in for workloads that benefit from its RadixAttention tree caching.
- LiteLLM - The universal adapter that makes all my models look like OpenAI to upstream services (there's a sketch of this after the list).
- ComfyUI - For when I want to generate images instead of text.
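The payoff of the LiteLLM layer is that client code never changes. Here's a minimal sketch using the standard OpenAI client; the URL, key, and model alias are placeholders for whatever your own proxy exposes:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://litellm.lab.local:4000/v1",  # hypothetical in-cluster address
    api_key="sk-local",  # depending on config, LiteLLM may accept a placeholder key
)

# "local-llama" is an alias defined in the LiteLLM config, not a real model id;
# LiteLLM routes the request to whichever backend (vLLM, SGLang) serves it.
resp = client.chat.completions.create(
    model="local-llama",
    messages=[{"role": "user", "content": "Summarize why homelabs are fun."}],
)
print(resp.choices[0].message.content)
```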
Storage
Longhorn provides distributed block storage across the cluster. It's not the fastest, but the redundancy and Kubernetes-native integration make it worth the trade-off. Important data gets replicated; experiments get ephemeral storage.
What's Next?
I'll be posting deep dives into specific projects—expect tutorials on getting local LLM inference humming on AMD GPUs, building AI agents that actually do useful things, and the occasional war story about what happens when you let Kubernetes autoscaling get too ambitious.
Topics on the horizon:
- Optimizing vLLM for the 7900 XTX (spoiler: it involves a lot of environment variables)
- Building a multi-model inference router
- The great migration: moving from Docker Compose to Kubernetes
- Why Harvester might be the best hypervisor you've never heard of
Stay tuned, and feel free to reach out if you want to chat about any of this stuff. There's something special about the homelab community—we're all just tinkerers at heart, trying to build cool things with whatever hardware we can get our hands on.