Product guide · Self-hosted

Ollama

Run LLMs locally. Free and open source: no API bills. Your costs are hardware and power.

Use cases

What teams actually use it for

  • Drafting and Q&A
  • Code assistance
  • Document analysis
  • Private or internal workloads
  • Offline or air-gapped environments
  • High-volume workloads where cost predictability matters

Pricing

Pricing model

Free software with no per-token costs. You pay for GPU hardware, electricity, and maintenance time.

Software

Free

Open source, MIT license

Hardware

One-off

GPU required; 8 GB+ VRAM for smaller (~7B) models, 24 GB+ for 14B models, 40 GB+ for 70B models
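The VRAM figures above follow a common rule of thumb: model weights take roughly (parameters × bits per weight ÷ 8) bytes, plus extra headroom for the KV cache and activations. A minimal sketch of that arithmetic (the 4-bit default and the 20% overhead factor are illustrative assumptions, not vendor figures):

```python
def estimated_vram_gb(params_billion: float,
                      bits_per_weight: int = 4,
                      overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB for a quantised model.

    params_billion : model size in billions of parameters
    bits_per_weight: quantisation level (4-bit is a common default)
    overhead       : multiplier for KV cache / activations (assumed ~20%)
    """
    weight_gb = params_billion * bits_per_weight / 8  # GB of weights alone
    return weight_gb * overhead

# 70B at 4-bit comes out around 42 GB, consistent with the 40 GB+ guidance
print(round(estimated_vram_gb(70), 1))
```

Longer contexts and higher-precision quantisations push the overhead up, so treat the result as a floor, not a guarantee.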

Power

Ongoing

See our Self-hosted GPU Comparisons for £/1M token estimates

Business fit

What to know before you commit

Pros

  • No data leaves your premises
  • Predictable cost (hardware + power)
  • No per-token bills
  • Wide model support (Llama, Mistral, Qwen, etc.)
  • Simple local setup

Considerations

  • Upfront hardware cost
  • You manage updates and security
  • Model size, and therefore output quality, is capped by your hardware
  • No managed SLA

When it makes sense: A good fit for data-sensitive workloads, sustained high volume where cumulative cloud costs would exceed the hardware spend, or air-gapped environments. Compare with our GPU cost guide.
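The high-volume case can be made concrete with a break-even calculation: divide the one-off hardware cost by the per-token saving over a cloud API. A sketch with hypothetical numbers (the £2,000 hardware cost, £2.00/1M cloud price, and £0.10/1M power cost are illustrative assumptions, not quotes):

```python
def break_even_million_tokens(hardware_cost: float,
                              cloud_price_per_million: float,
                              power_cost_per_million: float = 0.0) -> float:
    """Million tokens at which self-hosting overtakes a cloud API.

    hardware_cost           : one-off spend on GPU/hardware (GBP)
    cloud_price_per_million : cloud API price per 1M tokens (GBP)
    power_cost_per_million  : marginal electricity cost per 1M tokens (assumed)
    """
    saving_per_million = cloud_price_per_million - power_cost_per_million
    if saving_per_million <= 0:
        raise ValueError("cloud must cost more per token for a break-even to exist")
    return hardware_cost / saving_per_million

# Hypothetical: £2,000 GPU vs £2.00/1M cloud, £0.10/1M power
print(round(break_even_million_tokens(2000, 2.00, 0.10)))  # ~1,053M tokens
```

Below the break-even volume, a cloud API is cheaper; above it, the hardware pays for itself. Maintenance time is deliberately left out of this sketch.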

Data handling

Where your data goes

All data stays on your hardware. No data sent to third parties. Full control.

GDPR / compliance. Data never leaves your infrastructure.

Data sovereignty. Complete: you control where models run and where prompts, outputs, and logs are stored.
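One concrete consequence: by default Ollama serves its API on localhost port 11434, so requests resolve to a loopback address and never cross a network boundary. A small check of that property (assumes the default bind address, which can be changed via the OLLAMA_HOST environment variable):

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Default Ollama endpoint (configurable via OLLAMA_HOST)
OLLAMA_URL = "http://localhost:11434/api/generate"

host = urlparse(OLLAMA_URL).hostname
addr = socket.gethostbyname(host)

# localhost resolves to a loopback address: traffic stays on the machine
print(ipaddress.ip_address(addr).is_loopback)  # True
```

If you deliberately expose the API on a non-loopback address, that guarantee no longer holds and you are responsible for securing the endpoint.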


Want a recommendation for your use case?

Every team's fit is different. We'll model cost and ROI across cloud, self-hosted, and hybrid before recommending anything, including this product.