Self-hosted GPU comparisons
Cost per token for running LLMs locally, across RTX 4090, 5090, 5080, and 20+ other GPUs. When self-hosting becomes the cheaper answer.
Why self-hosted?
Running LLMs on your own hardware gives you control over your data, predictable costs, and no per-token API bills. The trade-off is the upfront hardware cost and the ongoing electricity bill. This guide compares GPUs by cost per million tokens, so you can see which card fits your volume and budget.
Assumptions
All figures assume 24/7 operation for one year, UK electricity at 26.35 p/kWh, and total system draw of GPU TDP plus 100 W overhead. "Full cost" charges the whole hardware price to year 1; "3yr amort" spreads the hardware price over three years.
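To make the maths reproducible, here is a minimal sketch of that cost model. The constants and function name are illustrative, not taken from the benchmark source:

```python
# Minimal sketch of the cost model above. Constants and names are illustrative.

HOURS_PER_YEAR = 24 * 365           # 24/7 operation for one year
ELEC_GBP_PER_KWH = 0.2635           # UK electricity at 26.35 p/kWh
SYSTEM_OVERHEAD_W = 100             # non-GPU system draw

def cost_per_million_tokens(hardware_gbp: float, gpu_tdp_w: float,
                            tokens_per_sec: float,
                            amortise_years: int | None = None) -> float:
    """£ per 1M tokens for a card run 24/7 over one year.

    "Full cost" charges the whole hardware price to year 1; pass
    amortise_years=3 to spread the hardware over three years instead.
    """
    draw_kw = (gpu_tdp_w + SYSTEM_OVERHEAD_W) / 1000
    power_gbp = draw_kw * HOURS_PER_YEAR * ELEC_GBP_PER_KWH
    hw_gbp = hardware_gbp / amortise_years if amortise_years else hardware_gbp
    tokens_per_year = tokens_per_sec * 3600 * HOURS_PER_YEAR
    return (hw_gbp + power_gbp) / (tokens_per_year / 1_000_000)
```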
RTX 4090: lab reference
Our lab reference card is an NVIDIA GeForce RTX 4090 (24 GB VRAM, 450 W TDP) on driver 590.48.01 / CUDA 13.1. At £1,800 RRP and 550 W total system draw, it costs roughly £1,269/year in electricity.
| Tier | Memory | Tok/s | Year 1 total | £/1M (full) | £/1M (3yr amort) |
|---|---|---|---|---|---|
| Large (70B) | 24 GB | — | — | Cannot run (insufficient VRAM) | — |
| Medium (14B) | 24 GB | 69 | £3,069 | £1.41 | £0.86 |
| Small (8B) | 24 GB | 104 | £3,069 | £0.94 | £0.57 |
| Optimised (8B) | 24 GB | ~130 | £3,069 | £0.75 | £0.46 |
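Feeding the RTX 4090's figures (£1,800 RRP, 450 W TDP) into the cost_per_million_tokens sketch from the assumptions section reproduces the table values to within a penny:

```python
# RTX 4090 reference: £1,800 RRP, 450 W TDP; throughputs from the table above.
# Uses cost_per_million_tokens() from the sketch in the assumptions section.
for tier, tok_s in [("Medium (14B)", 69), ("Small (8B)", 104), ("Optimised (8B)", 130)]:
    full = cost_per_million_tokens(1800, 450, tok_s)
    amort = cost_per_million_tokens(1800, 450, tok_s, amortise_years=3)
    print(f"{tier}: £{full:.2f} full, £{amort:.2f} 3yr amortised per 1M tokens")
# Medium (14B): £1.41 full, £0.86 3yr amortised per 1M tokens
# Small (8B): £0.94 full, £0.57 3yr amortised per 1M tokens
# Optimised (8B): £0.75 full, £0.46 3yr amortised per 1M tokens
```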
Best value by model size
- 8B models: RTX 5070 Ti (£0.42/1M amortised) or RTX 5080 (£0.47/1M).
- 14B models: RTX 5070 Ti (£0.64/1M) or RTX 5080 (£0.69/1M).
- 70B models: RTX PRO 6000 WS (£4.01/1M amortised); the RTX 4090 48GB is cheaper per token (£3.68/1M) but much slower (18 vs 28 tok/s). Only cards with 40GB+ VRAM can run 70B.
Full GPU comparison: 8B models
Qwen3 8B at 16K context, 4-bit. Sorted by 3yr amortised cost.
| GPU | Memory | Tok/s | £/1M (full) | £/1M (3yr) |
|---|---|---|---|---|
| RTX 5070 Ti | 16 GB | 88 | £0.60 | £0.42 |
| RTX 5080 | 16 GB | 94 | £0.69 | £0.47 |
| RTX 5060 Ti | 16 GB | 51 | £0.65 | £0.48 |
| RTX 3080 Ti | 12 GB | 88 | £0.72 | £0.49 |
| RTX 5090 | 32 GB | 145 | £0.78 | £0.49 |
| RTX 3080 10GB | 10 GB | 74 | £0.66 | £0.50 |
| RTX 3090 Ti | 24 GB | 94 | £0.86 | £0.52 |
| RTX 3090 | 24 GB | 87 | £0.82 | £0.52 |
| RTX 3060 | 12 GB | 42 | £0.66 | £0.53 |
| RTX 4090 | 24 GB | 104 | £0.94 | £0.57 |
| RTX 4090 48GB | 48 GB | 106 | £1.11 | £0.62 |
| RTX 6000 Ada | 48 GB | 99 | £1.74 | £0.78 |
| RTX PRO 6000 WS | 96 GB | 141 | £1.98 | £0.80 |
70B models: 40GB+ VRAM only
| GPU | Memory | Tok/s | £/1M (full) | £/1M (3yr) |
|---|---|---|---|---|
| RTX PRO 6000 WS | 96 GB | 28 | £9.95 | £4.01 |
| RTX 4090 48GB | 48 GB | 18 | £6.55 | £3.68 |
| RTX 6000 Ada | 48 GB | 14 | £12.28 | £5.49 |
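The 40GB+ cut-off is easy to sanity-check from the weights alone. A rough sketch follows; the 4.5 bits-per-weight figure is an assumed approximation for 4-bit quantisation plus scale/zero-point overhead, and KV cache and runtime overhead add several GB on top:

```python
# Back-of-the-envelope VRAM for a 4-bit quantised 70B model (weights only).
params = 70e9
bits_per_weight = 4.5        # assumed: ~4-bit quant plus scale/zero-point overhead
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weights_gb:.0f} GB of weights")   # ~39 GB, before KV cache and overhead
```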
Cloud vs self-hosted
Cloud APIs (GPT-5 mini, Gemini Flash-Lite) run roughly £0.10–0.50 per 1M tokens. A self-hosted card sustaining 9,000+ tokens per minute (TPM) lands around £0.40–0.50/1M, which is competitive for high-volume, batch-friendly workloads. Large 70B models, at roughly 2,400 TPM, are often more expensive per token than cloud.
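For a sense of scale, here is a sketch of what sustained 9,000 TPM means in monthly volume and spend at the per-token rates above; the £0.45 self-hosted figure is simply the midpoint of the quoted range, not a measured value:

```python
# What sustained 9,000 tokens/minute means per month, priced at the rates above.
tpm = 9_000
tokens_per_month = tpm * 60 * 24 * 30                 # ~389M tokens
for label, gbp_per_1m in [("Cloud API (low end)", 0.10),
                          ("Cloud API (high end)", 0.50),
                          ("Self-hosted (midpoint)", 0.45)]:
    monthly = tokens_per_month / 1e6 * gbp_per_1m
    print(f"{label}: £{monthly:,.0f}/month")
```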
The takeaway
For small models (8B), best value is the RTX 5070 Ti or RTX 5080, and the same two cards lead for 14B. For 70B it is the RTX PRO 6000 WS. The RTX 4090 sits mid-pack and remains a solid lab reference.
Sources: Hardware Corner GPU ranking, Cloudrift benchmarks. UK RRP from Scan, Which, bestvaluegpu.com. Last updated Feb 2026.