Research

How GPU Cloud Pricing Works: On-Demand, Reserved, and Spot

Three pricing models. Very different tradeoffs. Significant cost implications.

[01]

On-Demand: Pay As You Go

On-demand pricing lets you provision GPUs when needed and release them when finished, paying by the hour or by the second. No commitments. No upfront payment.

AWS charges $98.32/hour for a p5.48xlarge instance (8 H100 GPUs at 80GB each). Lambda Labs charges around $24/hour for equivalent capacity. CoreWeave charges $22-$28/hour depending on configuration and region.

Pricing data is current as of Q1 2026; cloud GPU rates change frequently, so verify current rates with providers directly. On-demand is the simplest and most flexible model. It is also the most expensive.

Providers price on-demand at a premium in exchange for the guarantee of availability. For exploratory work (testing a new model architecture, running a one-off fine-tuning job, prototyping a pipeline), on-demand is appropriate. For sustained training runs consuming thousands of GPU-hours per month, on-demand pricing is economically painful. A team training on 64 H100s full-time at $24 per GPU-hour is spending $36,864 per day, or over $1.1M per month. The economics of at-scale training demand a different pricing structure.
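The arithmetic above can be reproduced with a short back-of-envelope helper. The function name is illustrative, not a provider API; the figures are the ones quoted above, treating $24 as a per-GPU-hour rate, which is what the daily total implies.

```python
def monthly_on_demand_cost(num_gpus: int, rate_per_gpu_hour: float,
                           hours_per_day: float = 24, days: int = 30) -> float:
    """Total cost of running num_gpus continuously at a per-GPU hourly rate."""
    return num_gpus * rate_per_gpu_hour * hours_per_day * days

daily = monthly_on_demand_cost(64, 24.0, days=1)  # 64 H100s at $24/GPU-hour
monthly = monthly_on_demand_cost(64, 24.0)        # 30-day month
print(f"daily: ${daily:,.0f}")      # daily: $36,864
print(f"monthly: ${monthly:,.0f}")  # monthly: $1,105,920
```

The same helper makes it easy to stress-test a budget against different rates before committing to a provider.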

[02]

Reserved Instances: Commitment in Exchange for Discount

Reserved instances commit you to paying for a fixed amount of compute for a fixed term (typically one or three years) in exchange for a significant discount versus on-demand. AWS Reserved Instances for GPU compute provide a 30-60% discount depending on term length and upfront payment. A 1-year reserved p5.48xlarge with partial upfront payment runs at an effective rate of approximately $52/hour, a 47% discount.
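Two numbers drive the reserved-versus-on-demand decision: the effective discount, and the utilization at which the reservation breaks even against paying on-demand only for hours actually used. A minimal sketch, using the rates quoted above (the helper functions are illustrative):

```python
def effective_discount(on_demand_rate: float, reserved_rate: float) -> float:
    """Discount of a reserved rate relative to on-demand, as a fraction."""
    return 1 - reserved_rate / on_demand_rate

def break_even_utilization(on_demand_rate: float, reserved_rate: float) -> float:
    """Fraction of contracted hours you must actually use for the
    reservation to beat paying on-demand for only the hours used."""
    return reserved_rate / on_demand_rate

disc = effective_discount(98.32, 52.0)        # p5.48xlarge: $98.32 vs $52
print(f"discount: {disc:.0%}")                # discount: 47%
util = break_even_utilization(98.32, 52.0)
print(f"break-even utilization: {util:.0%}")  # break-even utilization: 53%
```

At a 47% discount, the reservation pays for itself once the hardware is busy a little over half the time; below that utilization, on-demand would have been cheaper.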

CoreWeave's long-term contracts (12-36 months) produce effective rates of $14-$18/hour for H100 capacity. The commitment is real: you are obligated to pay whether or not you use the capacity.

For teams with variable or uncertain demand, the commitment risk is material. A reserved instance contract on hardware that becomes obsolete mid-term is deadweight capital. Contract flexibility (early termination rights, upgrade provisions) matters significantly in a market where GPU generations turn over every 18-24 months. For reserved capacity contract analysis, term sheet templates, and deal comparables, speak to our advisory team at disintermediate.global/services.

[03]

Spot Instances: Surplus at Steep Discount

Spot (or preemptible) instances offer provider surplus capacity at deep discounts versus on-demand pricing. The catch: the provider can reclaim the capacity on short notice (typically 2 minutes on AWS) when demand for on-demand capacity increases. Spot rates are highly volatile and subject to supply-demand conditions; specific prices change frequently and should not be relied upon for planning. Contact providers directly for current availability and indicative pricing.

The interruption risk makes spot unsuitable for long training runs without careful engineering. Spot is highly suitable for: batch inference workloads that can be checkpointed and retried; distributed training using fault-tolerant frameworks like PyTorch's elastic training; data preprocessing pipelines that are naturally resumable. Teams that engineer for spot interruption can reduce training costs significantly versus on-demand, a genuinely transformative cost reduction at scale.
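The engineering pattern behind "checkpoint and retry" is simple in outline: persist progress periodically, trap the reclaim signal, and resume from the last checkpoint on restart. A minimal sketch, assuming the interruption notice is delivered as SIGTERM (common in containerized spot setups; some environments require polling a metadata endpoint instead), with a hypothetical single-file checkpoint:

```python
import os
import signal

CKPT = "checkpoint.txt"  # hypothetical checkpoint location
interrupted = False

def _on_sigterm(signum, frame):
    # Flag the interruption so the loop can checkpoint and exit cleanly
    # within the provider's notice window.
    global interrupted
    interrupted = True

signal.signal(signal.SIGTERM, _on_sigterm)

def load_step() -> int:
    """Resume from the last checkpointed step, or start at 0."""
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return int(f.read())
    return 0

def save_step(step: int) -> None:
    with open(CKPT, "w") as f:
        f.write(str(step))

def train(total_steps: int = 1000, ckpt_every: int = 100) -> int:
    step = load_step()
    while step < total_steps and not interrupted:
        # ... one training step would run here ...
        step += 1
        if step % ckpt_every == 0:
            save_step(step)
    save_step(step)  # final checkpoint on completion or interruption
    return step
```

A real training job would checkpoint model and optimizer state rather than a step counter, but the control flow is the same: the job loses at most `ckpt_every` steps of work per interruption, which is what makes the spot discount usable.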

[04]

Marketplace Aggregators: Spot Across Providers

A class of platform has emerged that aggregates GPU capacity across multiple providers. Vast.ai, RunPod, and similar platforms list GPU inventory from data centres, mining operators, and enterprise deployments globally.

Pricing on aggregator platforms is highly variable and changes frequently; rates have compressed significantly since 2023 as supply expanded. The tradeoff is reliability and support.

Inventory comes from smaller operators with variable uptime guarantees. For researchers with flexible timelines and fault-tolerant workloads, aggregators offer compelling economics. For enterprises with compliance requirements or latency-sensitive production workloads, aggregators introduce unacceptable risk. Contact providers directly for current pricing; rates are not quoted here as they change too frequently to be reliable reference figures.

[05]

Choosing the Right Mix

Sophisticated GPU buyers use all three models simultaneously. Reserved capacity covers the predictable baseline. On-demand covers peak bursts. Spot covers fault-tolerant batch work.

This portfolio approach is standard among well-run AI teams spending over $500,000/year on compute. The goal is to match the cost of each GPU-hour to the criticality of the workload running on it. Paying on-demand rates for workloads that could tolerate spot interruption is a common and expensive mistake. For provider-level pricing intelligence, contract negotiation support, and a tailored compute cost model, get in touch at disintermediate.global/contact.
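The portfolio approach reduces to a weighted average: the blended cost per GPU-hour is each pricing model's rate weighted by the share of GPU-hours it serves. A sketch with hypothetical rates and an illustrative workload split (none of these figures are quoted by providers):

```python
def blended_hourly_rate(mix: dict, rates: dict) -> float:
    """Weighted-average $/GPU-hour across pricing models.

    mix maps model name -> fraction of total GPU-hours (must sum to 1);
    rates maps model name -> $/GPU-hour for that model.
    """
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "mix fractions must sum to 1"
    return sum(mix[m] * rates[m] for m in mix)

# Hypothetical rates: reserved baseline, on-demand bursts, spot batch work.
rates = {"reserved": 16.0, "on_demand": 24.0, "spot": 8.0}
mix = {"reserved": 0.60, "on_demand": 0.15, "spot": 0.25}
print(f"${blended_hourly_rate(mix, rates):.2f}/GPU-hour")  # $15.20/GPU-hour
```

Shifting even a quarter of GPU-hours from on-demand to spot moves the blended rate materially, which is why matching each workload to the cheapest pricing model it can tolerate is the core discipline.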

Key Takeaways
01

On-demand: maximum flexibility, highest cost; H100 capacity at $22-$98/hour depending on provider (pricing data current as of Q1 2026; verify current rates with providers directly); appropriate for exploratory and urgent workloads

02

Reserved (1-3 year): 30-60% discount versus on-demand; optimal for predictable sustained training workloads with 12+ month runway

03

Spot/preemptible: significant discounts with interruption risk; requires fault-tolerant training infrastructure; rates vary widely and are subject to change; contact providers for current availability and pricing

04

Aggregator marketplaces (Vast.ai, RunPod) list H100 capacity at deep discounts; suitable for researchers with flexible timelines; rates change frequently and vary by provider

05

Sophisticated buyers mix pricing models: reserved capacity for the predictable baseline, on-demand for bursts, preemptible for fault-tolerant batch workloads

Next Steps

This analysis is produced by Disintermediate, drawing on data from the GPU intelligence platform (tracking 2,800+ companies across 72 categories and real-time GPU pricing from 70+ providers) and on advisory engagement experience across the GPU infrastructure value chain.