Resources
Self Hosted GPU Comparisons
Cost per token for running LLMs locally. Compare RTX 4090, 5090, 5080, and 20+ GPUs across 8B, 14B, and 70B models.
Running your own AI locally gives you control over your data, predictable costs, and no per-token API bills. The trade-off: upfront hardware cost and ongoing electricity. This guide compares GPUs by cost per million tokens, so you can see which card fits your volume and budget.
All figures assume 24/7 operation for one year, UK electricity at 26.35p/kWh, and power draw of GPU TDP plus 100 W system overhead. "Full cost" counts the entire hardware price against Year 1; "3yr amort" spreads it over three years. Power is charged annually in both.
Our lab reference card: NVIDIA GeForce RTX 4090 (24 GB VRAM, 450 W TDP), driver 590.48.01 / CUDA 13.1. £1,800 RRP, 550 W system draw, £1,269/year in power.
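Under the assumptions above, the per-token figures reduce to simple arithmetic. A minimal Python sketch using the RTX 4090 lab numbers (£1,800 hardware, 550 W system draw, 104 tok/s on the 8B model); the function and variable names are ours, not from any library:

```python
HOURS_PER_YEAR = 24 * 365   # 24/7 operation
PENCE_PER_KWH = 26.35       # UK electricity rate

def cost_per_million(hardware_gbp, system_watts, tok_per_s, amort_years=1):
    """£ per 1M tokens: hardware spread over amort_years, plus one year of power."""
    power_gbp = (system_watts / 1000) * HOURS_PER_YEAR * PENCE_PER_KWH / 100
    annual_cost = hardware_gbp / amort_years + power_gbp
    tokens_millions = tok_per_s * 3600 * HOURS_PER_YEAR / 1e6
    return annual_cost / tokens_millions

# RTX 4090 running the 8B model at 104 tok/s
full  = cost_per_million(1800, 550, 104)                 # "full cost" column
amort = cost_per_million(1800, 550, 104, amort_years=3)  # "3yr amort" column
```

Plugging in the lab numbers reproduces the RTX 4090 rows below: roughly £0.94/1M at full Year 1 cost and £0.57/1M amortized over three years.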
| Tier | Memory | Tok/s | Year 1 total | £/1M (full) | £/1M (3yr amort) |
|---|---|---|---|---|---|
| Large (70B) | 24 GB | — | — | Cannot run 70B | — |
| Medium (14B) | 24 GB | 69 | £3,069 | £1.41 | £0.86 |
| Small (8B) | 24 GB | 104 | £3,069 | £0.94 | £0.57 |
| Optimized (8B) | 24 GB | ~130 | £3,069 | £0.75 | £0.46 |
Qwen3 8B at 16K context, 4-bit. Sorted by 3yr amortized cost.
| GPU | Memory | Tok/s | £/1M (full) | £/1M (3yr) |
|---|---|---|---|---|
| RTX 5070 Ti | 16 GB | 88 | £0.60 | £0.42 |
| RTX 5080 | 16 GB | 94 | £0.69 | £0.47 |
| RTX 5060 Ti | 16 GB | 51 | £0.65 | £0.48 |
| RTX 3080 Ti | 12 GB | 88 | £0.72 | £0.49 |
| RTX 5090 | 32 GB | 145 | £0.78 | £0.49 |
| RTX 3080 10GB | 10 GB | 74 | £0.66 | £0.50 |
| RTX 3090 Ti | 24 GB | 94 | £0.86 | £0.52 |
| RTX 3090 | 24 GB | 87 | £0.82 | £0.52 |
| RTX 3060 | 12 GB | 42 | £0.66 | £0.53 |
| RTX 4090 | 24 GB | 104 | £0.94 | £0.57 |
| RTX 4090 48GB | 48 GB | 106 | £1.11 | £0.62 |
| RTX 6000 Ada | 48 GB | 99 | £1.74 | £0.78 |
| RTX PRO 6000 WS | 96 GB | 141 | £1.98 | £0.80 |
Large (70B) models. Sorted by 3yr amortized cost.
| GPU | Memory | Tok/s | £/1M (full) | £/1M (3yr) |
|---|---|---|---|---|
| RTX 4090 48GB | 48 GB | 18 | £6.55 | £3.68 |
| RTX PRO 6000 WS | 96 GB | 28 | £9.95 | £4.01 |
| RTX 6000 Ada | 48 GB | 14 | £12.28 | £5.49 |
Cloud APIs (GPT-5 mini, Gemini Flash-Lite) run roughly £0.10–0.50/1M tokens. Self-hosted at 9,000+ tokens per minute (150 tok/s) can be £0.40–0.50/1M, competitive for high-volume, batch-friendly workloads. Large 70B models at 2,400 TPM (40 tok/s) are often more expensive per token than cloud.
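Because the self-hosted cost is fixed whether the card is busy or idle, the cloud comparison comes down to a break-even throughput: the sustained token rate at which your annual cost per token matches a given API price. A rough sketch under the same assumptions (the function name and the £0.50/1M cloud price are illustrative):

```python
SECONDS_PER_YEAR = 3600 * 24 * 365

def breakeven_tok_per_s(annual_cost_gbp, cloud_gbp_per_million):
    """Sustained tok/s needed for self-hosting to match a cloud API price."""
    tokens_needed = annual_cost_gbp / cloud_gbp_per_million * 1e6
    return tokens_needed / SECONDS_PER_YEAR

# RTX 4090, Year 1 total £3,069, vs a £0.50/1M cloud API
rate = breakeven_tok_per_s(3069, 0.50)   # sustained tok/s, 24/7
```

With the lab card's Year 1 total of £3,069, the break-even against £0.50/1M is about 195 tok/s sustained, above the 4090's 104 tok/s on the 8B model; on the 3yr-amortized annual cost of £1,869 it drops to about 119 tok/s, which is why only the cheapest-to-run cards in the table above land in cloud-competitive territory.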
Best value for small (8B) and medium (14B) models: the RTX 5070 Ti or RTX 5080. For 70B: the RTX PRO 6000 WS. The RTX 4090 sits mid-pack, a solid lab reference.
Sources: Hardware Corner GPU ranking, Cloudrift benchmarks. UK RRP from Scan, Which, bestvaluegpu.com. Last updated Feb 2026.