Running LLMs on Radeon GPUs with ROCm
November 20, 2025
One of my main homelab projects is getting large language models running efficiently on AMD Radeon GPUs. While NVIDIA dominates the ML space, AMD's ROCm platform has come a long way and offers a compelling alternative for local inference.
Why AMD?
- Price/Performance - The RX 7900 XTX offers great value for inference workloads
- 24GB VRAM - Enough to run 7B models at fp16 and 13B models with 8-bit quantization comfortably
- Open Source - The ROCm stack is open source, from the kernel driver up through the runtime and libraries
The Stack
My current setup uses:
- ROCm 6.x with PyTorch
- vLLM with ROCm support (a quick-start sketch follows this list)
- Custom Docker containers for reproducibility
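For serving, vLLM's Python API is the same on a ROCm build as on CUDA, so a quick smoke test is a good sanity check. This is a minimal sketch, assuming a ROCm-enabled vLLM install; the model name and sampling settings are illustrative, not my exact configuration:

from vllm import LLM, SamplingParams

# Load a 7B model in fp16; vLLM picks up the ROCm/HIP device automatically.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", dtype="float16")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain ROCm in one paragraph."], params)
print(outputs[0].outputs[0].text)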
Getting Started
The key is getting ROCm properly installed and configured. Here's a quick overview:
# Install ROCm (Ubuntu)
wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_6.0.60000-1_all.deb
sudo dpkg -i amdgpu-install_6.0.60000-1_all.deb
sudo amdgpu-install --usecase=rocm
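# Per AMD's post-install docs, add your user to the render and video groups
# so non-root processes can access the GPU, then log out and back in
sudo usermod -aG render,video $USER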
# Verify installation
rocm-smi
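Once rocm-smi reports the card, it's worth confirming that PyTorch can see it too. On ROCm builds of PyTorch the usual torch.cuda calls are backed by HIP, so a minimal check (assuming the ROCm wheel of PyTorch is installed) looks like:

import torch

print(torch.cuda.is_available())      # True if the GPU is visible
print(torch.cuda.get_device_name(0))  # e.g. an "AMD Radeon RX 7900 XTX" string
print(torch.version.hip)              # HIP/ROCm version the wheel was built against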
Performance Notes
On my RX 7900 XTX, I'm seeing:
- ~30 tokens/second for Llama 2 7B (fp16)
- ~45 tokens/second with int8 quantization
- Memory efficiency (quantization, KV-cache size) is the main constraint when moving to larger models; a rough way to measure throughput is sketched below
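For context, a tokens-per-second figure like the ones above can be approximated with a simple wall-clock measurement. The sketch below uses vLLM; the prompt, model name, and settings are illustrative and not the exact setup behind my numbers:

import time
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", dtype="float16")
params = SamplingParams(temperature=0.0, max_tokens=256)

start = time.perf_counter()
outputs = llm.generate(["Write a short story about a homelab."], params)
elapsed = time.perf_counter() - start

# Count generated tokens across all requests and divide by wall-clock time.
generated = sum(len(c.token_ids) for out in outputs for c in out.outputs)
print(f"{generated / elapsed:.1f} tokens/s")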
Next Steps
I'm working on a comprehensive benchmarking suite to compare different models and configurations. Stay tuned for more detailed performance data.
Check out the ROCm Inference Stack project for the full setup.