Research

On-Prem vs Cloud GPU: The Build-vs-Buy Calculus

TCO breakpoints. Operational overhead. Scaling economics.

[01]

The Three Deployment Models

Enterprise GPU infrastructure deployment falls into three models, each with distinct cost structures, operational requirements, and risk profiles.

Cloud GPU (rented): no upfront capex, pay-per-use or committed-use pricing, provider manages all infrastructure. Cost: $4.80-$7.50/hr per Blackwell GPU on-demand; committed-use discounts of 30-55% for 1-3 year terms. Advantages: speed to deploy (hours to days), elastic scaling, zero operational burden. Disadvantages: highest long-run cost, vendor lock-in risk, limited customisation.

Colocation (owned hardware, rented space): moderate upfront capex for hardware, monthly recurring cost for power, cooling, and rack space. Cost: hardware capex ($60,000-$65,000 per Blackwell GPU) plus colocation fees ($450-$900 per kW per year in primary markets). Advantages: hardware ownership, long-term cost advantage, provider manages facility. Disadvantages: hardware refresh risk, 4-12 week deployment timeline, minimum scale requirements.

On-premise (owned everything): highest upfront capex covering hardware, facility, power, cooling, and networking. Cost: all-in facility build at $15M-$17.25M per MW plus hardware. Advantages: full control, maximum customisation, lowest long-run unit cost at scale. Disadvantages: massive upfront capital, 12-24 month build timeline, full operational responsibility, facility risk.

[02]

TCO Breakeven Analysis

The crossover point between cloud and owned infrastructure depends on utilisation, commitment length, and scale.

At 80% utilisation on a 12-month horizon: cloud GPU costs approximately $28,000-$42,000 per GPU per year (committed-use pricing). Colocation with owned hardware costs approximately $22,000-$28,000 per GPU per year (amortising hardware over 3 years plus colo fees). On-premise costs approximately $18,000-$24,000 per GPU per year at scale (amortising hardware and facility over 5 years).
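The amortisation arithmetic behind these per-GPU figures can be sketched in a few lines. This is an illustrative model, not a full TCO calculation: the committed hourly rate, per-GPU power draw, colo fee, and facility cost per MW are assumed midpoints of the ranges quoted above, and committed cloud capacity is modelled as billed for the full term.

```python
# Rough per-GPU annual cost sketch for the three deployment models.
# All inputs are illustrative midpoints of the ranges cited in the text.

HOURS_PER_YEAR = 8760

def cloud_annual(committed_rate=3.6):
    """Committed-use cloud: the whole committed term is billed, so
    annual cost per GPU is rate x hours (assumed rate in $/hr)."""
    return committed_rate * HOURS_PER_YEAR

def colo_annual(gpu_capex=62_500, amort_years=3,
                kw_per_gpu=1.8, colo_fee_per_kw_year=675):
    """Owned hardware amortised over 3 years plus colocation fees."""
    return gpu_capex / amort_years + kw_per_gpu * colo_fee_per_kw_year

def onprem_annual(gpu_capex=62_500, facility_per_mw=16_000_000,
                  kw_per_gpu=1.8, amort_years=5):
    """Hardware plus allocated facility build, amortised over 5 years."""
    facility_per_gpu = facility_per_mw * kw_per_gpu / 1000
    return (gpu_capex + facility_per_gpu) / amort_years

for name, fn in [("cloud", cloud_annual), ("colocation", colo_annual),
                 ("on-premise", onprem_annual)]:
    print(f"{name:12s} ~${fn():,.0f} per GPU per year")
```

With these assumed inputs the model lands inside each of the ranges above; the point is the structure of the calculation, not the specific figures.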

Breakeven timing: Cloud-to-colocation crossover occurs at approximately 12-18 months for steady-state workloads at 100+ GPU scale. Cloud-to-on-premise crossover occurs at approximately 24-36 months, but requires 500+ GPU scale to justify facility investment.

The hidden variable is utilisation. Cloud's pay-per-use model means cost scales linearly with usage. Owned infrastructure has fixed costs regardless of utilisation. Below 60% utilisation, cloud is cheaper at any scale. Above 75% utilisation, owned infrastructure becomes progressively more cost-effective. Most enterprises overestimate their initial utilisation by 15-25%, which shifts the breakeven point 6-12 months later than modelled.
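The utilisation crossover can be made concrete with a two-line cost model: cloud cost rises linearly with hours used, while owned-infrastructure cost is fixed. The effective hourly rate and the fixed annual cost below are illustrative assumptions chosen to sit inside the ranges above.

```python
# Utilisation breakeven sketch: cloud scales with usage, owned is fixed.
# rate_per_hr and fixed_annual are illustrative assumptions.

HOURS_PER_YEAR = 8760

def cloud_cost(utilisation, rate_per_hr=3.9):
    """Pay-per-use: billed only for hours actually consumed."""
    return rate_per_hr * HOURS_PER_YEAR * utilisation

def owned_cost(fixed_annual=22_000):
    """Amortised hardware plus colo fees: constant regardless of use."""
    return fixed_annual

def breakeven_utilisation(rate_per_hr=3.9, fixed_annual=22_000):
    """Utilisation at which cloud and owned annual costs are equal."""
    return fixed_annual / (rate_per_hr * HOURS_PER_YEAR)

print(f"owned beats cloud above ~{breakeven_utilisation():.0%} utilisation")
```

Shifting the assumed rate or fixed cost moves the breakeven, which is exactly why the 60%/75% thresholds in the text are a band rather than a single number.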

[03]

Operational Overhead: The Hidden Cost

TCO models that focus on hardware and hosting miss the largest variable: operational overhead. Running GPU infrastructure requires specialist skills that are expensive and scarce.

Cloud GPU operational overhead is minimal: the provider handles hardware, networking, storage, and base-level orchestration. Internal cost is limited to ML engineering and application-level operations. Incremental staff: 0-2 FTEs for infrastructure management.

Colocation operational overhead is moderate: you own the hardware and manage the software stack (OS, drivers, orchestration, monitoring, security) while the colocation provider manages facility operations. Incremental staff: 2-5 FTEs for a 100-500 GPU deployment, covering systems engineering, networking, and on-call.

On-premise operational overhead is substantial: full responsibility for facility operations, hardware maintenance, networking, storage, security, and compliance. Incremental staff: 8-20 FTEs for a 500+ GPU deployment, including facilities engineering, electrical, cooling systems, physical security, and IT operations.

At $150,000-$250,000 fully loaded cost per specialist FTE, operational staffing alone adds $300,000-$5M annually depending on deployment model and scale. This cost is often underestimated in business cases by 40-60%.
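The staffing arithmetic is simple enough to tabulate directly. The FTE ranges come from the text; pairing the low FTE count with the low per-FTE cost and the high with the high gives the bounding envelope for each model.

```python
# Incremental operational staffing cost by deployment model,
# using the FTE ranges and fully loaded cost per specialist above.

fte_ranges = {"cloud": (0, 2), "colocation": (2, 5), "on-premise": (8, 20)}
COST_PER_FTE = (150_000, 250_000)  # fully loaded, low / high

staff_cost = {}
for model, (lo_fte, hi_fte) in fte_ranges.items():
    staff_cost[model] = (lo_fte * COST_PER_FTE[0], hi_fte * COST_PER_FTE[1])
    low, high = staff_cost[model]
    print(f"{model:12s} ${low:,} - ${high:,} per year")
```

The colocation floor ($300K) and the on-premise ceiling ($5M) bracket the headline range quoted above.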

[04]

Decision Framework: Which Model, When

The deployment model decision depends on four variables: time horizon, scale, utilisation predictability, and operational capability.

Choose cloud GPU when: time horizon is under 18 months, workload is unpredictable or bursty, scale is under 100 GPUs, you lack GPU infrastructure operations expertise, or speed-to-deploy is critical. Cloud is also the right starting point for proof-of-concept and initial model development regardless of eventual deployment target.

Choose colocation when: time horizon is 2-4 years, workload is steady-state with predictable utilisation above 70%, scale is 100-1,000 GPUs, you have or can hire 2-5 infrastructure specialists, and you want hardware ownership without facility risk. Colocation is the sweet spot for most enterprises scaling from cloud experimentation to production AI infrastructure.

Choose on-premise when: time horizon is 5+ years, scale exceeds 1,000 GPUs, you have specific physical security or isolation requirements that colocation cannot satisfy, you have existing facility capability (power, cooling, space), and you can recruit and retain 8-20 specialist operations staff.

The hybrid path is increasingly common: start on cloud for speed and experimentation (months 1-12), migrate steady-state workloads to colocation as patterns establish (months 12-24), and consider purpose-built facility only if scale and duration justify the capital and operational investment (month 24+).
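The four-variable framework above can be encoded as a rules-of-thumb function. The thresholds are taken directly from the text; the function itself is an illustrative sketch, not a substitute for a full TCO model, and it defaults to cloud when signals are mixed because flexibility is the cheapest mistake to unwind.

```python
# Rules-of-thumb encoding of the decision framework; thresholds from the text.

def pick_model(horizon_months: int, gpus: int,
               utilisation: float, ops_ftes_available: int) -> str:
    # Short horizon, sub-scale, or no ops capability: stay on cloud.
    if horizon_months < 18 or gpus < 100 or ops_ftes_available < 2:
        return "cloud"
    # Long horizon at large scale with a real ops team: on-premise.
    if horizon_months >= 60 and gpus > 1000 and ops_ftes_available >= 8:
        return "on-premise"
    # Steady-state utilisation at mid-scale: the colocation sweet spot.
    if utilisation >= 0.70 and 100 <= gpus <= 1000:
        return "colocation"
    return "cloud"  # mixed signals: default to flexibility

print(pick_model(36, 300, 0.80, 4))  # a typical scaling enterprise
```

A 36-month horizon at 300 GPUs with 80% utilisation and four infrastructure specialists resolves to colocation, matching the "sweet spot" described above.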

Key Takeaways
01

Cloud-to-colocation TCO crossover at 12-18 months for steady-state workloads at 100+ GPU scale with 80%+ utilisation; cloud-to-on-premise at 24-36 months requiring 500+ GPUs

02

Below 60% utilisation, cloud is cheaper at any scale; above 75%, owned infrastructure becomes progressively more cost-effective — utilisation is the critical variable

03

Operational staffing adds $300K-$5M annually depending on model: 0-2 FTEs for cloud, 2-5 for colocation, 8-20 for on-premise — typically underestimated by 40-60%

04

Most enterprises overestimate initial utilisation by 15-25%, shifting breakeven 6-12 months later than modelled — conservative assumptions protect capital allocation

05

The hybrid path (cloud → colocation → on-premise) is increasingly standard: start for speed, migrate for cost, build for scale and control

Next Steps

This analysis is produced by Disintermediate, drawing on data from its GPU intelligence platform (tracking 2,800+ companies across 72 categories and real-time GPU pricing from 70+ providers) and on advisory engagement experience across the GPU infrastructure value chain.