
Self-Hosted GPU Comparisons

Cost per token for running LLMs locally. Compare RTX 4090, 5090, 5080, and 20+ GPUs across 8B, 14B, and 70B models.

Why self-hosted?

Running your own AI locally gives you control over data, predictable costs, and no per-token API bills. The trade-off: upfront hardware cost and power. This guide compares GPUs by cost per million tokens—so you can see which card fits your volume and budget.

Assumptions

All figures assume 24/7 operation for 1 year, UK electricity at 26.35 p/kWh, and GPU TDP plus 100W system overhead. "Full cost" = Year 1 only; "3yr amort" = hardware spread over 3 years.
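These assumptions reduce to a simple formula: annual cost (hardware plus power) divided by annual token output. A minimal sketch, using the lab RTX 4090 figures quoted below (£1,800 RRP, 450W TDP, 104 tok/s on an 8B model):

```python
HOURS_PER_YEAR = 24 * 365      # 8,760 h of continuous operation
ELEC_GBP_PER_KWH = 0.2635      # UK electricity, 26.35 p/kWh
SYSTEM_OVERHEAD_W = 100        # CPU, fans, PSU losses on top of GPU TDP

def cost_per_million(hw_gbp: float, tdp_w: float, tok_per_s: float,
                     amort_years: float = 1.0) -> float:
    """£ per 1M tokens, running 24/7 for one year."""
    draw_kw = (tdp_w + SYSTEM_OVERHEAD_W) / 1000
    power_gbp = draw_kw * HOURS_PER_YEAR * ELEC_GBP_PER_KWH
    year_cost = hw_gbp / amort_years + power_gbp
    tokens_m = tok_per_s * 3600 * HOURS_PER_YEAR / 1e6  # millions of tokens/yr
    return year_cost / tokens_m

# Lab RTX 4090: £1,800, 450W TDP, 104 tok/s (8B model)
print(round(cost_per_million(1800, 450, 104), 2))                 # 0.94 (full)
print(round(cost_per_million(1800, 450, 104, amort_years=3), 2))  # 0.57 (3yr amort)
```

Plugging in any row's price, TDP, and throughput reproduces the table figures below to within rounding.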

RTX 4090 — Lab Reference

Our lab testing equipment: NVIDIA GeForce RTX 4090 (24GB VRAM), 450W TDP, Driver 590.48.01 / CUDA 13.1. £1,800 RRP, 550W system draw, £1,269/year power.

| Tier | Memory | Tok/s | Year 1 total | £/1M (full) | £/1M (3yr amort) |
|---|---|---|---|---|---|
| Large (70B) | 24 GB | n/a | Cannot run 70B | n/a | n/a |
| Medium (14B) | 24 GB | 69 | £3,069 | £1.41 | £0.86 |
| Small (8B) | 24 GB | 104 | £3,069 | £0.94 | £0.57 |
| Optimized (8B) | 24 GB | ~130 | £3,069 | £0.75 | £0.46 |

Best value by model size

  • 8B models: RTX 5070 Ti (£0.42/1M amortized) or RTX 5080 (£0.47/1M)
  • 14B models: RTX 5070 Ti (£0.64/1M) or RTX 5080 (£0.69/1M)
  • 70B models: RTX 4090 48GB is cheapest per token (£3.68/1M amortized); the RTX PRO 6000 WS (£4.01/1M) trades a higher price for faster output (28 vs 18 tok/s). Only 40GB+ VRAM cards can run 70B

Full GPU comparison — 8B models

Qwen3 8B at 16K context, 4-bit. Sorted by 3yr amortized cost.

| GPU | Memory | Tok/s | £/1M (full) | £/1M (3yr) |
|---|---|---|---|---|
| RTX 5070 Ti | 16 GB | 88 | £0.60 | £0.42 |
| RTX 5080 | 16 GB | 94 | £0.69 | £0.47 |
| RTX 5060 Ti | 16 GB | 51 | £0.65 | £0.48 |
| RTX 3080 Ti | 12 GB | 88 | £0.72 | £0.49 |
| RTX 5090 | 32 GB | 145 | £0.78 | £0.49 |
| RTX 3080 10GB | 10 GB | 74 | £0.66 | £0.50 |
| RTX 3090 Ti | 24 GB | 94 | £0.86 | £0.52 |
| RTX 3090 | 24 GB | 87 | £0.82 | £0.52 |
| RTX 3060 | 12 GB | 42 | £0.66 | £0.53 |
| RTX 4090 | 24 GB | 104 | £0.94 | £0.57 |
| RTX 4090 48GB | 48 GB | 106 | £1.11 | £0.62 |
| RTX 6000 Ada | 48 GB | 99 | £1.74 | £0.78 |
| RTX PRO 6000 WS | 96 GB | 141 | £1.98 | £0.80 |

70B models — only 40GB+ VRAM

| GPU | Memory | Tok/s | £/1M (full) | £/1M (3yr) |
|---|---|---|---|---|
| RTX PRO 6000 WS | 96 GB | 28 | £9.95 | £4.01 |
| RTX 4090 48GB | 48 GB | 18 | £6.55 | £3.68 |
| RTX 6000 Ada | 48 GB | 14 | £12.28 | £5.49 |
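The two cost columns together pin down the hidden inputs for each row: the Year-1 figure counts hardware once, the 3-yr figure counts a third of it, and both divide by the same annual token volume. A back-calculation sketch using the RTX PRO 6000 WS row (28 tok/s, £9.95 full, £4.01 amortized) — note the recovered numbers are approximate because the published £/1M figures are rounded to pennies:

```python
SECONDS_PER_YEAR = 3600 * 24 * 365

def back_out_costs(tok_per_s, full_per_m, amort_per_m, amort_years=3):
    """Recover hardware price and annual power cost from the two £/1M figures."""
    tokens_m = tok_per_s * SECONDS_PER_YEAR / 1e6  # millions of tokens per year
    full_year = full_per_m * tokens_m              # hardware + power
    amort_year = amort_per_m * tokens_m            # hardware/amort_years + power
    hardware = (full_year - amort_year) * amort_years / (amort_years - 1)
    power = full_year - hardware
    return round(hardware), round(power)

print(back_out_costs(28, 9.95, 4.01))  # (7868, 918)
```

The implied ~£918/year power bill corresponds to roughly a 400W system draw at 26.35 p/kWh, which is consistent with the stated 100W overhead on a workstation-class card.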

Cloud vs self-hosted

Cloud APIs (GPT-5 mini, Gemini Flash-Lite) run roughly £0.10–0.50 per 1M tokens. Self-hosted at 9,000+ tokens per minute can reach £0.40–0.50/1M, competitive for high-volume, batch-friendly workloads. Large 70B models at around 2,400 tokens per minute are often more expensive per token than cloud.
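One caveat: the headline £/1M figures assume 24/7 full load. At lower duty cycles the amortized hardware cost spreads over fewer tokens, so the break-even against cloud shifts. A sketch using the lab RTX 4090 figures (£1,800, 550W system draw, 104 tok/s); this assumes the machine draws no power while idle, which understates real cost:

```python
HOURS_PER_YEAR = 24 * 365
ELEC_GBP_PER_KWH = 0.2635  # UK electricity, as in the assumptions above

def gbp_per_million(hw_gbp, draw_w, tok_per_s, utilization, amort_years=3):
    """3yr-amortized £/1M at a given duty cycle (idle power ignored)."""
    busy_hours = HOURS_PER_YEAR * utilization
    power_gbp = (draw_w / 1000) * busy_hours * ELEC_GBP_PER_KWH
    tokens_m = tok_per_s * 3600 * busy_hours / 1e6
    return (hw_gbp / amort_years + power_gbp) / tokens_m

# Lab RTX 4090 at several duty cycles:
# 100% -> £0.57, 50% -> £0.75, 25% -> £1.12, 10% -> £2.22 per 1M tokens
for util in (1.0, 0.5, 0.25, 0.1):
    print(f"{util:>4.0%}: £{gbp_per_million(1800, 550, 104, util):.2f}/1M")
```

At 100% utilization this reproduces the table's £0.57/1M; below roughly quarter utilization the card drifts out of the £0.10–0.50/1M cloud range quoted above.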

The takeaway

Best value for small models (8B): RTX 5070 Ti or RTX 5080; the same pair leads for 14B. For 70B, the RTX 4090 48GB is cheapest per token and the RTX PRO 6000 WS is fastest. The RTX 4090 sits mid-pack: a solid lab reference.

Sources: Hardware Corner GPU ranking, Cloudrift benchmarks. UK RRP from Scan, Which, bestvaluegpu.com. Last updated Feb 2026.
