Research

GPU Total Cost of Ownership

Beyond purchase price: the real cost of deploying GPU capacity

[01]

Sticker Price vs. Total Cost of Ownership

Current-generation Blackwell B200 sticker price: $40-50K per GPU at volume; 8-GPU node systems run $400-485K (Q1 2026; hardware costs are rising rapidly, so treat as indicative). A 10MW cluster ($64-80M accelerator capex, roughly 1,600 GPUs) requires additional capex: networking (InfiniBand/RoCE) adds $3.8-7.7M; power infrastructure adds $2.4-5.1M; cooling systems add $1.4-3.8M; racks, PDUs, cabling, and software add another $3.8-7.7M. Total capex reaches $75-105M, or roughly $47-65K per GPU all-in. Sticker price represents only ~77-85% of true hardware capex.
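As a sanity check, the capex build-up can be re-added in a few lines, using only the component dollar ranges quoted above (a back-of-envelope sketch, not vendor pricing):

```python
def cluster_capex_m(accel_m: float, adders_m: list[float]) -> float:
    """Total hardware capex in $M: accelerators plus non-GPU adders
    (networking, power infrastructure, cooling, racks/PDUs/software)."""
    return accel_m + sum(adders_m)

GPUS = 1_600  # $64-80M of accelerators at $40-50K per GPU

low_total = cluster_capex_m(64.0, [3.8, 2.4, 1.4, 3.8])    # ~$75.4M
high_total = cluster_capex_m(80.0, [7.7, 5.1, 3.8, 7.7])   # ~$104.3M

per_gpu_low = low_total * 1e6 / GPUS      # ~$47K all-in per GPU
per_gpu_high = high_total * 1e6 / GPUS    # ~$65K all-in per GPU
sticker_share_lo = 80.0 / high_total      # ~0.77: sticker as share of capex
sticker_share_hi = 64.0 / low_total       # ~0.85
```

The per-GPU all-in figure is simply total capex divided by fleet size; the sticker share falls as the non-GPU adders grow.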

TCO extends beyond capex to five-year operating costs: depreciation, power, cooling, networking, staffing, insurance, and contingency. A 10MW cluster with $75M capex on a 4-year depreciation schedule depreciates $18.75M annually. At $5.00/hour blended pricing and ~50% effective utilisation, annual revenue is ~$35M (1,600 GPUs × 8,760 hours × 50% × $5.00), making depreciation 54% of gross revenue.

Over five years, total depreciation ($75M, the full capex written off on the 4-year schedule) plus operating expenses ($15-25M annually, or $75-125M over five years) results in TCO of $150-200M against $175M cumulative revenue. Operating margins (pre-financing costs) are thin, roughly -14% to +14%, and deteriorate quickly if pricing declines or utilisation falls short.
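A minimal sketch of this five-year unit-economics math (it assumes the section's $35M annual revenue figure, which at $5.00/hour across 1,600 GPUs implies roughly 50% effective utilisation; all figures in $M):

```python
def five_year_economics(capex_m: float = 75.0, dep_years: int = 4,
                        gpus: int = 1_600, price_per_hr: float = 5.00,
                        utilisation: float = 0.50,
                        opex_m_per_yr: tuple = (15.0, 25.0)):
    """Five-year revenue, TCO range, and pre-financing operating margins."""
    annual_rev = gpus * 8_760 * utilisation * price_per_hr / 1e6  # ~$35M
    rev_5y = 5 * annual_rev                                        # ~$175M
    dep_5y = min(5, dep_years) * (capex_m / dep_years)             # $75M total
    tco_lo = dep_5y + 5 * opex_m_per_yr[0]                         # ~$150M
    tco_hi = dep_5y + 5 * opex_m_per_yr[1]                         # ~$200M
    margin_lo = (rev_5y - tco_hi) / rev_5y                         # ~-14%
    margin_hi = (rev_5y - tco_lo) / rev_5y                         # ~+14%
    return annual_rev, tco_lo, tco_hi, margin_lo, margin_hi
```

Running the defaults shows how narrow the band is: a modest drop in price or utilisation pushes the whole range negative.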

[02]

Power Consumption & Cost Analysis

Modern GPU accelerators (Blackwell B200) are rated around 1,000W per GPU at full load; with host CPUs, memory, and node overhead, per-GPU node draw reaches ~4kW. A 1,600-GPU, 10MW-class cluster therefore delivers ~6.4MW to GPU nodes; networking (5-8%), cooling overhead (8-12%), and management systems (2-3%) take the total to 7.2-7.8MW cluster-wide power draw. At $0.10/kWh (moderate US colocation pricing), annual power cost is $6.3-6.8M (18-19% of gross revenue). Power is the largest opex line item.

Regional variance is significant: Virginia/Texas at $0.08-$0.10/kWh costs $4.6-5.8M annually; Northern Europe at $0.12-$0.15/kWh costs $8.6-10.9M (25-31% of gross revenue); sovereign/subsidised facilities (Saudi Arabia, Iceland) at $0.06-$0.08/kWh cost $3.5-4.6M annually (10-13%).

Power cost sensitivity dominates TCO: a $0.02/kWh increase adds ~$1.26M in annual cost, reducing EBITDA margin by 3.6 percentage points. Models must include PPA assumptions: long-term PPAs (5-10 years, fixed price) lock in cost but require volume commitments; spot pricing creates optionality but also volatility. Renewable PPAs (solar, wind) cost $0.04-$0.08/kWh but require 5-10 year commitments and impose location constraints. Hyperscalers increasingly pair GPU clusters with renewable capacity; mid-market operators often fall back on grid power at higher cost.
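The sensitivity is straightforward to reproduce. A sketch assuming a flat ~7.2MW facility draw (the low end of the cluster-wide range) and the $35M gross revenue figure:

```python
HOURS_PER_YEAR = 8_760

def annual_power_cost_m(facility_mw: float, usd_per_kwh: float) -> float:
    """Annual power cost in $M, assuming a flat 24/7 load."""
    return facility_mw * 1_000 * HOURS_PER_YEAR * usd_per_kwh / 1e6

base = annual_power_cost_m(7.2, 0.10)          # ~$6.31M at $0.10/kWh
bump = annual_power_cost_m(7.2, 0.12) - base   # ~$1.26M for +$0.02/kWh
margin_hit = bump / 35.0                       # ~3.6 points of margin on $35M revenue
```

The same function reproduces the regional spread by swapping in $0.06-$0.15/kWh.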

[03]

Cooling Architecture & Liquid Cooling Economics

Air cooling (standard CRAC/CRAH) achieves PUE of 1.6-1.8. Modern air cooling with hot-aisle containment achieves 1.15-1.25. Liquid cooling (immersion, liquid loop, or hybrid) achieves 1.02-1.10, eliminating most cooling overhead. For a 10MW cluster at 6.5MW IT load: air cooling at 1.20 PUE requires 7.8MW facility power; liquid cooling at 1.05 PUE requires 6.825MW. The difference, 0.975MW, is worth roughly $850K-$1.28M annually in power savings (at $0.10-$0.15/kWh).

Liquid cooling capex is substantial: immersion cooling requires $4-8M capex and adds $0.10-$0.20/hour operational overhead (fluid top-up, monitoring, remediation). Liquid loop cooling requires $2-4M capex and adds $0.05-$0.15/hour overhead. Payback is typically 4-6 years at $0.10+/kWh; at lower power costs ($0.06-$0.08/kWh), payback extends to 6-10 years or becomes uneconomical.
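The payback math can be sketched directly from the PUE delta (assuming a 6.5MW IT load and the figures above; capex and $/kWh are the variables worth stress-testing):

```python
HOURS_PER_YEAR = 8_760

def annual_cooling_savings_m(it_mw: float, pue_air: float,
                             pue_liquid: float, usd_per_kwh: float) -> float:
    """Annual $M saved by moving from air to liquid cooling at a given IT load."""
    delta_mw = it_mw * (pue_air - pue_liquid)
    return delta_mw * 1_000 * HOURS_PER_YEAR * usd_per_kwh / 1e6

def payback_years(capex_m: float, it_mw: float = 6.5, pue_air: float = 1.20,
                  pue_liquid: float = 1.05, usd_per_kwh: float = 0.10) -> float:
    """Simple payback: cooling capex divided by annual power savings."""
    return capex_m / annual_cooling_savings_m(it_mw, pue_air, pue_liquid, usd_per_kwh)
```

At $0.10/kWh the 0.15 PUE delta saves ~$854K/year, so a $4M liquid-loop retrofit pays back in ~4.7 years; at $0.07/kWh the same capex stretches to ~6.7 years, which is where the economics break down.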

Hyperscalers (Google, Meta, Microsoft) routinely deploy liquid cooling for large fleets (30-50% of capacity); mid-market operators adopt selectively; smaller operators typically cannot justify capex. Financing and lease structures exist for cooling (vendor retains ownership, operator pays per-kWh overhead), reducing upfront capex but locking in long-term cost.

[04]

Networking & Interconnect Tradeoffs

GPU cluster networking is critical for training performance (AllReduce, parameter synchronisation) and inference latency. High-speed interconnect options: (1) InfiniBand (HDR 200Gbps, NDR 400Gbps), (2) RoCE (RDMA over Converged Ethernet), (3) standard Ethernet (100Gbps commodity).

InfiniBand capex: $40-80K per GPU ($64-128M for 1,600 GPUs total) plus $500K-$1M annually for support staff and operations. RoCE capex: $10-20K per GPU ($16-32M total) with $200-500K annual overhead. Ethernet capex: $2-5K per GPU ($3.2-8M total) with $50-150K annual overhead.

InfiniBand required for large-scale training (512+ GPUs) where synchronisation latency is critical; RoCE supports mid-scale training (64-512 GPUs) with modest latency penalty; Ethernet suffices for inference where batch latency is not critical. Choosing InfiniBand over RoCE adds $48-96M capex, requiring $8-16M additional annual financing cost.
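The tiering above reduces to a simple decision rule. A toy helper using this section's per-GPU capex bands and scale thresholds (illustrative figures, not vendor pricing):

```python
# Per-GPU interconnect capex bands quoted in this section.
TIERS = {
    "infiniband": (40_000, 80_000),  # large-scale training (512+ GPUs)
    "roce":       (10_000, 20_000),  # mid-scale training (64-512 GPUs)
    "ethernet":   (2_000, 5_000),    # inference-dominated fleets
}

def recommend_fabric(largest_training_job_gpus: int) -> str:
    """The largest synchronous training job drives the fabric choice."""
    if largest_training_job_gpus >= 512:
        return "infiniband"
    if largest_training_job_gpus >= 64:
        return "roce"
    return "ethernet"

def fabric_capex_m(tier: str, gpus: int = 1_600) -> tuple:
    """Cluster-wide fabric capex range in $M for a given fleet size."""
    lo, hi = TIERS[tier]
    return gpus * lo / 1e6, gpus * hi / 1e6
```

For a 1,600-GPU fleet this reproduces the section's ranges: InfiniBand $64-128M, RoCE $16-32M, Ethernet $3.2-8M.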

Neoclouds (CoreWeave, Lambda) primarily deploy RoCE or Ethernet (lower capex, faster deployment, well suited to inference-heavy fleets); labs running large-scale training (OpenAI, Anthropic) deploy InfiniBand. Align the networking choice with the workload mix: training-heavy fleets need InfiniBand or RoCE; inference-heavy fleets can use Ethernet and save 50-80% of networking capex.

[05]

Software Stack, Insurance, & Facility Costs

GPU cluster software stack includes: (1) hypervisor/orchestration (Kubernetes, OpenStack, proprietary), (2) monitoring tools (Prometheus, Grafana, DataDog, New Relic), (3) networking software (ONAP for SDN, vendor-specific management), (4) customer-facing APIs and billing systems. Software licences and subscriptions typically cost $100K-$500K annually. Open-source software (Kubernetes, Prometheus) minimises licensing cost but requires in-house engineering; commercial software (DataDog, Splunk) adds cost but provides support. Budget $0.01-$0.05/hour per GPU in software/SaaS costs.

Insurance for GPU clusters (equipment, liability, business interruption) typically costs $200K-$500K annually (roughly 0.3-0.7% of a $75M capex base). Facility rent varies widely: US colocation at $1-3/kW/month runs $120K-$360K annually for 10MW; owned facilities eliminate rent but require land/building capex ($10-30M).

Data centre real estate is a strategic asset; neoclouds typically lease from large colocation operators (Equinix, Digital Realty) rather than build proprietary centres. All-in facility costs (rent + utilities + maintenance + property tax) typically range $1.50-$3.00/kW/month in mature markets. Greenfield data centre construction (sovereign facilities in Saudi Arabia, UAE, India) often benefits from government land grants, reducing costs to $0.50-$1.50/kW/month.
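Converting $/kW/month quotes into annual facility cost is a one-liner, shown here for a 10MW footprint at the mature-market band above:

```python
def annual_facility_cost_usd(it_kw: float, usd_per_kw_month: float) -> float:
    """Annual facility cost from a $/kW/month colo-style rate."""
    return it_kw * usd_per_kw_month * 12

lo = annual_facility_cost_usd(10_000, 1.50)  # $180K/year at the band floor
hi = annual_facility_cost_usd(10_000, 3.00)  # $360K/year at the band ceiling
```

The same conversion makes the sovereign-facility discount ($0.50-$1.50/kW/month) directly comparable.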

Key Takeaways
01

All-in capex for GPU clusters: roughly $47-65K per GPU (vs. $40-50K per GPU sticker; indicative Q1 2026 pricing, with hardware costs rising rapidly), sticker price representing only ~77-85% of total capex

02

Power dominates opex: 30-40% of total operating cost; regional variation ($0.06-$0.15/kWh) drives 50%+ cost variance across geographies

03

Liquid cooling reduces PUE from 1.20 to 1.05 but requires $2-8M capex; payback is 4-6 years, and only at higher power costs ($0.10+/kWh)

04

Networking choice (InfiniBand vs. RoCE vs. Ethernet) adds $10-80K capex per GPU; InfiniBand only justified for training-heavy workloads

Next Steps

This analysis is produced by Disintermediate, drawing on data from its GPU intelligence platform (tracking 2,800+ companies across 72 categories and real-time GPU pricing from 70+ providers) and on advisory engagement experience across the GPU infrastructure value chain.