Running LLMs on Radeon GPUs with ROCm

November 20, 2025

One of my main homelab projects is getting large language models running efficiently on AMD Radeon GPUs. While NVIDIA dominates the ML space, AMD's ROCm platform has come a long way and offers a compelling alternative for local inference.

Why AMD?

  • Price/Performance - The RX 7900 XTX offers great value for inference workloads
  • 24GB VRAM - Enough to run 7B models at fp16 and 13B-class models with quantization (rough math below)
  • Open Source - ROCm is fully open source, unlike NVIDIA's proprietary CUDA toolkit
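
To put rough numbers on that, the weight memory alone works out like the sketch below (KV cache and runtime overhead add a few more GB on top):

def weight_gb(params_billion, bytes_per_param):
    # Approximate VRAM needed just to hold the model weights
    return params_billion * 1e9 * bytes_per_param / 1024**3

print(f"7B  fp16: {weight_gb(7, 2):.1f} GB")   # ~13 GB, fits easily
print(f"13B fp16: {weight_gb(13, 2):.1f} GB")  # ~24 GB, right at the card's limit
print(f"13B int8: {weight_gb(13, 1):.1f} GB")  # ~12 GB once quantized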

The Stack

My current setup uses:

  • ROCm 6.x with PyTorch
  • vLLM with ROCm support (minimal usage sketch below)
  • Custom Docker containers for reproducibility
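
To give a flavor of the vLLM piece, a minimal offline-inference script looks something like this (the model name is just an example; the Python API on the ROCm build is the same as on CUDA):

from vllm import LLM, SamplingParams

# Example checkpoint only - swap in whatever model you actually run
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", dtype="float16")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain ROCm in one paragraph."], params)
print(outputs[0].outputs[0].text)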

Getting Started

The key is getting ROCm properly installed and configured. Here's a quick overview:

# Install ROCm (Ubuntu)
wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/amdgpu-install_6.0.60000-1_all.deb
sudo dpkg -i amdgpu-install_6.0.60000-1_all.deb
sudo amdgpu-install --usecase=rocm
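
# Add your user to the render and video groups so ROCm tools work without root
sudo usermod -a -G render,video $LOGNAME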

# Verify installation
rocm-smi
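
Beyond rocm-smi, it's worth checking that the ROCm build of PyTorch can actually see the card. ROCm wheels expose the GPU through the usual torch.cuda API, backed by HIP under the hood:

import torch

print(torch.cuda.is_available())       # True if the Radeon GPU is visible
print(torch.version.hip)               # HIP/ROCm version string (None on CUDA builds)
print(torch.cuda.get_device_name(0))   # should name your Radeon card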

Performance Notes

On my RX 7900 XTX, I'm seeing:

  • ~30 tokens/second for Llama 2 7B (fp16)
  • ~45 tokens/second for the same model with int8 quantization (a rough measurement sketch follows this list)
  • Memory efficiency becomes the deciding factor for larger models
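
For reference, a quick-and-dirty way to measure tokens/second with vLLM looks something like this (single prompt and an example model name; real benchmarks need batching and longer prompts):

import time
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", dtype="float16")  # example model
params = SamplingParams(temperature=0.0, max_tokens=256)

start = time.perf_counter()
outputs = llm.generate(["Write a short story about a homelab."], params)
elapsed = time.perf_counter() - start

# Total generated tokens divided by wall-clock time
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} tokens/s")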

Next Steps

I'm working on a comprehensive benchmarking suite to compare different models and configurations. Stay tuned for more detailed performance data.

Check out the ROCm Inference Stack project for the full setup.