Infrastructure

Self-hosted GPU comparisons

Cost per token for running LLMs locally, across RTX 4090, 5090, 5080, and 20+ other GPUs. When self-hosting becomes the cheaper answer.

[Illustration: stylised server rack with three shelves of GPU cards]


Why self-hosted?

Running LLMs locally gives you control over your data, predictable costs, and no per-token API bills. The trade-off is an upfront hardware cost and an ongoing electricity bill. This guide compares GPUs by cost per million tokens, so you can see which card fits your volume and budget.

Assumptions

All figures assume 24/7 operation for one year, UK electricity at 26.35 p/kWh, and a total draw of GPU TDP plus 100 W system overhead. "Full cost" charges the whole hardware price to Year 1; "3yr amort" spreads the hardware price over three years.
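The power side of these assumptions can be sketched as a small cost model. The constants below come from this article; the function name is illustrative, not from any library.

```python
# Sketch of the article's power-cost assumptions. Constants are the article's
# figures; the function name is illustrative.

HOURS_PER_YEAR = 24 * 365          # 24/7 operation
ELEC_GBP_PER_KWH = 0.2635          # UK electricity, 26.35 p/kWh
SYSTEM_OVERHEAD_W = 100            # added to GPU TDP for total system draw


def annual_power_cost(gpu_tdp_w: float) -> float:
    """Yearly electricity cost in GBP for GPU TDP plus system overhead."""
    draw_kw = (gpu_tdp_w + SYSTEM_OVERHEAD_W) / 1000
    return draw_kw * HOURS_PER_YEAR * ELEC_GBP_PER_KWH


# RTX 4090: 450 W TDP -> 550 W system draw -> roughly £1,269/year, as in the text.
print(annual_power_cost(450))
```

For the RTX 4090 this gives about £1,269.5 per year, matching the figure quoted below.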

RTX 4090: lab reference

Our lab reference card: NVIDIA GeForce RTX 4090 (24 GB VRAM), 450 W TDP, driver 590.48.01 / CUDA 13.1. £1,800 RRP; its 550 W system draw works out to £1,269/year in power.

RTX 4090 performance: tokens/sec, Year 1 total, and cost per million tokens by model tier.
Tier            Memory   Tok/s   Year 1 total   £/1M (full)   £/1M (3yr amort)
Large (70B)     24 GB    -       cannot run 70B
Medium (14B)    24 GB    69      £3,069         £1.41         £0.86
Small (8B)      24 GB    104     £3,069         £0.94         £0.57
Optimised (8B)  24 GB    ~130    £3,069         £0.75         £0.46
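Cost per million tokens follows directly from the assumptions: a year's spend divided by a year's output. A hedged check of the RTX 4090 8B row, using only figures from this article (the helper function is illustrative):

```python
# Worked check of the RTX 4090 "Small (8B)" row: £1,800 hardware, £1,269/yr
# power, 104 tok/s. The helper is illustrative, not from any library.

SECONDS_PER_YEAR = 3600 * 24 * 365


def cost_per_million(hardware_gbp, power_gbp_yr, tok_per_s, amort_years=1):
    """GBP per million tokens: (amortised hardware + power) over a year's tokens."""
    yearly_cost = hardware_gbp / amort_years + power_gbp_yr
    millions_of_tokens = tok_per_s * SECONDS_PER_YEAR / 1e6
    return yearly_cost / millions_of_tokens


full = cost_per_million(1800, 1269, 104)                   # hardware all in Year 1
amort = cost_per_million(1800, 1269, 104, amort_years=3)   # hardware over 3 years
print(f"£{full:.2f}/1M full, £{amort:.2f}/1M amortised")   # -> £0.94/1M full, £0.57/1M amortised
```

Both results round to the table's £0.94 and £0.57.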

Best value by model size

  • 8B models: RTX 5070 Ti (£0.42/1M amortised) or RTX 5080 (£0.47/1M).
  • 14B models: RTX 5070 Ti (£0.64/1M) or RTX 5080 (£0.69/1M).
  • 70B models: RTX PRO 6000 WS (£4.01/1M amortised). Only 40GB+ VRAM cards can run 70B.

Full GPU comparison: 8B models

Qwen3 8B at 16K context, 4-bit. Sorted by 3yr amortised cost.

Self-hosted GPUs ranked by 3-year amortised cost per million tokens on 8B models.
GPU               Memory   Tok/s   £/1M (full)   £/1M (3yr)
RTX 5070 Ti       16 GB    88      £0.60         £0.42
RTX 5080          16 GB    94      £0.69         £0.47
RTX 3080 Ti       12 GB    88      £0.72         £0.49
RTX 5060 Ti       16 GB    51      £0.65         £0.48
RTX 3080 10GB     10 GB    74      £0.66         £0.50
RTX 3090 Ti       24 GB    94      £0.86         £0.52
RTX 3090          24 GB    87      £0.82         £0.52
RTX 5090          32 GB    145     £0.78         £0.49
RTX 4090          24 GB    104     £0.94         £0.57
RTX 3060          12 GB    42      £0.66         £0.53
RTX 4090 48GB     48 GB    106     £1.11         £0.62
RTX 6000 Ada      48 GB    99      £1.74         £0.78
RTX PRO 6000 WS   96 GB    141     £1.98         £0.80

70B models: 40GB+ VRAM only

GPUs capable of running 70B models (40 GB+ VRAM only).
GPU               Memory   Tok/s   £/1M (full)   £/1M (3yr)
RTX PRO 6000 WS   96 GB    28      £9.95         £4.01
RTX 4090 48GB     48 GB    18      £6.55         £3.68
RTX 6000 Ada      48 GB    14      £12.28        £5.49

Cloud vs self-hosted

Cloud APIs (GPT-5 mini, Gemini Flash-Lite) run roughly £0.10–0.50 per million tokens. Self-hosted at 9,000+ TPM (tokens per minute) can reach £0.40–0.50/1M, which is competitive for high-volume, batch-friendly workloads. Large 70B models at around 2,400 TPM are often more expensive per token than cloud.
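One way to frame this comparison is a break-even check: since the self-hosted yearly spend is fixed, the effective £/1M rises as the card sits idle, so there is a minimum utilisation below which cloud wins. A sketch under that assumption, using the RTX 5070 Ti figure from the 8B table (the function name is illustrative):

```python
# Break-even sketch: with a fixed yearly spend, effective cost per million
# tokens is (base £/1M at 100% utilisation) / utilisation. Self-hosting beats
# cloud once utilisation exceeds base/cloud. Function name is illustrative.

def breakeven_utilisation(self_host_gbp_per_1m: float, cloud_gbp_per_1m: float) -> float:
    """Fraction of 24/7 throughput needed before self-hosting beats cloud."""
    return self_host_gbp_per_1m / cloud_gbp_per_1m


# RTX 5070 Ti at £0.42/1M (3yr amortised, 100% utilisation) vs a £0.50/1M cloud API:
u = breakeven_utilisation(0.42, 0.50)
print(f"needs ~{u:.0%} utilisation to match cloud")  # -> needs ~84% utilisation to match cloud
```

Against a cheaper £0.10/1M cloud tier the same card can never break even, which is why the comparison favours self-hosting only for sustained, batch-friendly volume.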

The takeaway

Best value for small models (8B): RTX 5070 Ti or RTX 5080, and the same two cards lead at 14B. For 70B, only big-VRAM workstation cards qualify, with the RTX PRO 6000 WS cheapest per token. The RTX 4090 sits mid-pack: a solid lab reference rather than a value pick.

Sources: Hardware Corner GPU ranking, Cloudrift benchmarks. UK RRP from Scan, Which, bestvaluegpu.com. Last updated Feb 2026.