Where you run your AI (cloud, sovereign, or hybrid) shapes its energy footprint, its heat profile, its hardware lifecycle, and the audit trail you can produce at the end of the year. This page is about that choice, with an honest look at what good actually looks like.
The silent hum of servers and cooling systems is the sound of a significant and growing environmental footprint. It does not have to stay that way. But the lever isn't a sustainability slogan. It's the infrastructure choice underneath.
Cloud AI is brilliant for trying things out. For long-running production workloads, the picture changes. The energy is real, the heat is real, the supply chain is real, and the data sovereignty question that finance and security teams already ask quietly maps almost one-to-one onto the question a sustainability officer is starting to ask out loud.
You own the stack, you own the data, and you own the meter. That last one is what makes the ESG number defensible. Pillar 1: Measurable AI is the page about the meter; this is the page about the stack.
People talk about AI's carbon as if it's just the kilowatt-hours the GPU draws while running inference. That's a slice of it. The full picture is bigger:
The point isn't that one number captures it. The point is that an infrastructure decision is a decision about all five.
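To make the first slice concrete: operational carbon is just metered energy multiplied by the grid's carbon intensity at the time of use. A minimal sketch of that arithmetic, with illustrative placeholder figures rather than real grid data:

```python
# Operational CO2e from metered energy: a minimal sketch.
# The grid intensity below is an illustrative placeholder, not a real figure.

def operational_co2e_kg(kwh: float, grid_kg_per_kwh: float) -> float:
    """CO2e (kg) = metered energy (kWh) x grid carbon intensity (kgCO2e/kWh)."""
    return kwh * grid_kg_per_kwh

# A GPU server drawing 0.8 kW for 24 hours on a 0.25 kgCO2e/kWh grid:
daily_kwh = 0.8 * 24                           # 19.2 kWh
emissions = operational_co2e_kg(daily_kwh, 0.25)  # roughly 4.8 kg CO2e
```

The same formula is why grid mix is a lever: the identical workload on a 0.05 kgCO₂e/kWh grid emits a fifth as much.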
Server heat is usually wasted. Vented to the atmosphere, dumped into a chiller, gone. A small but growing number of operators are doing something better with it.
A few examples worth naming.
It turns something wasteful into something useful, cuts carbon, and makes a data centre a better neighbour. None of these are pilots. They are operating today, at scale, in jurisdictions that decided the regulation needed to lead the market rather than wait for it.
The honest follow-up: most operators aren't doing this. Most data centres still run air-cooled, vent the heat, and quote their PUE in the marketing materials. The gap between what's possible and what's typical is wide. If you're picking a colocation partner, ask the question. If you're hosting it yourself, look at liquid cooling. For the latest generation of high-density servers, air cooling is becoming inefficient anyway, and liquid systems open the door to heat recovery later.
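PUE itself is simple arithmetic worth having to hand when you read those marketing materials; the figures below are made up for illustration:

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy.
    1.0 is the theoretical ideal; air-cooled sites typically sit well above it."""
    return total_facility_kwh / it_equipment_kwh

# Illustrative numbers: 130 MWh facility draw, 100 MWh of it reaching IT load.
site_pue = pue(130_000, 100_000)  # -> 1.3
```

Everything above 1.0 is overhead, mostly cooling, which is exactly the energy heat-recovery designs put back to work.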
Source: "The Hidden Costs of Cloud AI" and "The Silent Hum: Building Sustainable Data Centers" (mattshore.co.uk).
The people asking "where is our data" and the people asking "what's our carbon" are arriving at the same answer from opposite directions. Both are asking the operator: what do you control, and what can you prove?
None of this is novel infrastructure engineering. The pieces (Proxmox, Ollama, n8n, vLLM, a smart PDU, a rack-level meter, a managed Kubernetes layer) are commodity. The novel part is the sustainability-reporting engineering: exposing the measurements you already have, timestamping them, and keeping them in a form an auditor can trace.
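One illustrative shape for that reporting layer (the field names and the hash chaining are assumptions for this sketch, not the Horizon Portal's actual schema): each reading is timestamped at capture and chained to the previous record, so retroactive edits are detectable by anyone replaying the log.

```python
import hashlib
import json
from datetime import datetime, timezone

def append_reading(log: list, source: str, kwh: float) -> dict:
    """Append a timestamped meter reading, chained to the previous record."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "source": source,  # e.g. a rack-level meter or smart PDU
        "kwh": kwh,
        "prev": prev_hash,
    }
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    log.append(record)
    return record

log = []
append_reading(log, "rack-meter-01", 12.4)
append_reading(log, "rack-meter-01", 12.9)
# Tampering with an earlier record breaks every later chain link:
assert log[1]["prev"] == log[0]["hash"]
```

Nothing here is exotic; it's the same append-and-hash pattern any audit log uses, applied to power readings.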
If your team is reading this because of the Broadcom VMware licensing squeeze or a similar lock-in event, sovereign infrastructure isn't just a sustainability story. It's also the cleanest off-ramp from a vendor that decided to renegotiate. The migration path is well-trodden; the ESG benefit is a side-effect that becomes the headline once the auditor asks.
An honest list. We'd rather name the gap than pretend it's solved.
None of these are reasons to stop. They are the reasons we built a tooling layer underneath the claims, so that when the framework asks for the next number, you can produce it instead of estimating it.
Picking the right components matters as much as the architecture. The KB4AI knowledge base ranks self-hosted and sovereign AI tools on three dimensions: Data Security, Trust, and Risk. Same lens an auditor would apply, applied at procurement.
Proxmox or managed Kubernetes for the orchestration layer; Ollama, vLLM, or llama.cpp as the inference runtime. All open-source. All run on commodity hardware. None of them ship your prompts to a third party.
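A local runtime keeps the whole request loop on your own network. With Ollama, for instance, inference is a POST to the daemon's local HTTP API; the model name and prompt below are placeholders, and the request is constructed but deliberately not sent, since it needs a running daemon:

```python
import json
import urllib.request

# Request against a local Ollama daemon; nothing leaves the host.
# "llama3" is a placeholder for whatever model you've pulled locally.
payload = {
    "model": "llama3",
    "prompt": "Summarise this maintenance log entry.",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",  # Ollama's default port
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would return the completion as JSON.
```

The endpoint is localhost; that single fact is the data-sovereignty argument in miniature.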
The Horizon Portal tracks per-agent power, kWh, and CO₂e, with method-accuracy badges and a methodology footer on every report. Pulse covers Proxmox-and-Docker visibility for the underlying estate.
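Per-agent kWh implies an attribution rule. One common, simple approach (an assumption for this sketch, not necessarily the portal's method) is to split metered rack energy by each agent's share of GPU-seconds, carrying a method badge so the report states how the number was made:

```python
def attribute_kwh(rack_kwh: float, gpu_seconds: dict) -> dict:
    """Split metered rack energy across agents by their GPU-seconds share."""
    total = sum(gpu_seconds.values())
    return {
        agent: {
            "kwh": rack_kwh * secs / total,
            "method": "gpu-seconds-proration",  # the badge an auditor sees
        }
        for agent, secs in gpu_seconds.items()
    }

# Hypothetical agents sharing one metered rack over a reporting window:
usage = {"summariser": 3_600, "classifier": 1_200}
report = attribute_kwh(9.6, usage)  # summariser gets 3/4 of 9.6 kWh
```

Proration is an estimate; the badge is what keeps it honest, because the methodology travels with the number.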
n8n for the deterministic orchestration. Hermes as the Horizon Portal's AI assistant layer, backed by the same self-hosted Ollama instance and with strict read-only access to the stack. AI-assisted ops without handing over write credentials, and without any prompt or stack data leaving your perimeter.
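"Strict read-only access" can be enforced mechanically rather than by policy document. A minimal sketch (the endpoint paths are hypothetical) is a gate that only lets the assistant issue non-mutating requests against an explicit allowlist:

```python
READ_ONLY_METHODS = {"GET", "HEAD"}

def allow(method: str, path: str, readable_prefixes: tuple) -> bool:
    """Permit a request only if it cannot mutate the stack."""
    return (
        method.upper() in READ_ONLY_METHODS
        and path.startswith(readable_prefixes)
    )

# Hypothetical monitoring endpoints the assistant may read:
prefixes = ("/api/metrics", "/api/nodes")
assert allow("GET", "/api/metrics/rack-01", prefixes)
assert not allow("POST", "/api/nodes/restart", prefixes)  # writes are refused
```

Deny-by-default plus an allowlist is the whole trick: the assistant can observe the estate but holds no write credentials to hand over.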
Audit logs, access control, and primary-data capture aren't add-ons; they're the substrate. The governance library covers the topics any sovereign deployment needs to think through before it ships.
What this isn't. A claim that sovereign infrastructure is automatically lower-carbon than cloud. At small scale, hyperscalers' efficiency and renewable-purchasing often win on raw kWh. The sovereign win is data quality and control. The kWh number you produce is yours, measured, and defensible. The grid mix and the runtime are still the levers that move the absolute figure.