GPU Infrastructure Glossary

Terms, technologies, and concepts in AI compute infrastructure.

A

  • All-Reduce Operation

    All-reduce is a collective communication operation that aggregates data from all participating processes and distributes the combined result back to every participant. In distributed training it is the primitive that synchronises gradients across GPUs after each backward pass.
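
    A minimal sketch of the operation using PyTorch's collective API (the launch command, world size, and backend choice are illustrative; "nccl" is the usual backend on GPU nodes):

        # Run with: torchrun --nproc_per_node=4 allreduce_demo.py  (illustrative)
        import torch
        import torch.distributed as dist

        dist.init_process_group(backend="gloo")   # "nccl" on GPU nodes
        rank = dist.get_rank()

        # Each process contributes its own tensor (a stand-in for local gradients).
        t = torch.tensor([float(rank + 1)])

        # After all-reduce, every rank holds the sum across all ranks.
        dist.all_reduce(t, op=dist.ReduceOp.SUM)
        print(f"rank {rank}: {t.item()}")

        dist.destroy_process_group()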

B

  • Bare Metal

    Bare metal refers to GPU servers accessed without virtualisation or hypervisor overhead — the customer receives direct hardware access to the entire server, including its GPUs, CPUs, and network interfaces, with no performance loss or noisy-neighbour effects from shared tenancy.

C

  • Colocation

    Colocation (colo) is a data centre service model where the facility provider supplies power, cooling, physical security, and network connectivity, while the customer owns, installs, and operates its own IT equipment within the facility.

  • Capacity Planning

    Capacity planning is the discipline of forecasting GPU compute demand and aligning infrastructure procurement, deployment, and financing timelines with that demand. Done well, capacity arrives neither too late to capture revenue nor too early to sit idle and depreciate.

  • Cluster Utilisation

    Cluster utilisation measures the percentage of available GPU capacity that is actively generating revenue at any given time. It is the central lever of GPU cloud economics: an idle GPU still accrues depreciation, power, and facility costs.
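
    A toy calculation, with all figures assumed for illustration:

        gpus = 1024
        hours_in_month = 730
        available_gpu_hours = gpus * hours_in_month

        sold_gpu_hours = 640_000          # hours actually billed to customers
        utilisation = sold_gpu_hours / available_gpu_hours
        print(f"utilisation: {utilisation:.1%}")        # ~85.6%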

D

  • Distributed Training

    Distributed training is the practice of splitting a deep learning training workload across multiple GPUs — within a node, across nodes, or across an entire cluster — using strategies such as data, tensor, and pipeline parallelism to train models too large or too slow for a single device.
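
    A minimal data-parallel sketch with PyTorch's DistributedDataParallel (the model, sizes, and launch command are placeholders; DDP synchronises gradients via all-reduce during backward()):

        # Launch with: torchrun --nproc_per_node=<gpus> train.py  (illustrative)
        import torch
        import torch.distributed as dist
        from torch.nn.parallel import DistributedDataParallel as DDP

        dist.init_process_group(backend="nccl")
        rank = dist.get_rank()
        torch.cuda.set_device(rank)

        model = DDP(torch.nn.Linear(512, 512).cuda(rank), device_ids=[rank])
        opt = torch.optim.SGD(model.parameters(), lr=0.01)

        x = torch.randn(32, 512, device=rank)    # each rank sees its own data shard
        loss = model(x).pow(2).mean()
        loss.backward()                           # DDP all-reduces gradients here
        opt.step()

        dist.destroy_process_group()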

E

  • Edge Compute

    Edge compute deploys GPU processing capacity at the network edge — closer to end users and data sources rather than in centralised data centres. This reduces round-trip latency for inference and keeps data local where regulation or bandwidth demands it.

G

  • GPU-as-a-Service (GPUaaS)

    GPU-as-a-Service is a cloud delivery model where GPU compute capacity is rented on-demand or via reservation, rather than purchased outright. Customers gain access to scarce accelerators without the capital outlay and operational burden of owning hardware.

  • GPU Memory (VRAM)

    GPU memory (VRAM, specifically HBM — High Bandwidth Memory) is the memory stacked on the same package as the GPU die that stores model parameters, activations, gradients, and optimiser state. Capacity per GPU is often the binding constraint on which models a cluster can train or serve.
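
    A back-of-envelope sizing example (parameter count, precision, and per-GPU capacity are illustrative; real footprints also include KV cache, activations, and framework overhead):

        import math

        params = 70e9                  # a 70B-parameter model
        bytes_per_param = 2            # FP16/BF16 weights
        weights_gb = params * bytes_per_param / 1e9
        print(f"weights alone: ~{weights_gb:.0f} GB")             # ~140 GB

        hbm_per_gpu_gb = 80            # an 80 GB-class accelerator
        print("minimum GPUs just to hold weights:",
              math.ceil(weights_gb / hbm_per_gpu_gb))             # 2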

H

  • Hyperscaler

    A hyperscaler is a cloud infrastructure provider operating at massive global scale — specifically AWS (Amazon), Azure (Microsoft), and Google Cloud (Alphabet) — whose capital budgets and global footprint set the baseline against which other compute providers compete.

I

  • InfiniBand

    InfiniBand is a high-bandwidth, low-latency networking technology developed by Mellanox (now NVIDIA Networking) that serves as the dominant interconnect for large-scale GPU training clusters, providing RDMA and microsecond-scale latencies that standard Ethernet has only recently begun to rival.

  • Immersion Cooling

    Immersion cooling is a thermal management technique where IT equipment is fully submerged in a thermally conductive but electrically non-conductive dielectric fluid, removing heat far more effectively than air and supporting much higher power densities per rack.

  • Inference Endpoint

    An inference endpoint is a deployed model serving layer that accepts input data and returns predictions or generated content over a network API, typically HTTP or gRPC. It is the unit at which serving capacity, latency targets, and cost per request are managed.
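
    A minimal HTTP sketch using FastAPI (the route, schema, and model call are placeholders; gRPC or any other serving stack follows the same shape):

        from fastapi import FastAPI
        from pydantic import BaseModel

        app = FastAPI()

        class GenerateRequest(BaseModel):
            prompt: str

        @app.post("/v1/generate")
        def generate(req: GenerateRequest) -> dict:
            # A real endpoint would invoke the loaded model here.
            return {"completion": f"(model output for {req.prompt!r})"}

        # Serve with: uvicorn endpoint:app --port 8000  (names illustrative)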

L

  • Liquid Cooling

    Liquid cooling encompasses any data centre thermal management approach that uses liquid — typically water or a dielectric fluid — to move heat away from IT equipment, whether through rear-door heat exchangers, direct-to-chip cold plates, or full immersion. It is increasingly mandatory as GPU rack densities exceed what air cooling can remove.

N

  • Neocloud

    A neocloud is a GPU-focused cloud provider that emerged outside the hyperscaler ecosystem to serve AI and high-performance computing workloads, competing on GPU availability, price, and specialisation rather than breadth of managed services; CoreWeave is a frequently cited example.

  • NVLink

    NVLink is NVIDIA's proprietary high-speed interconnect for GPU-to-GPU communication within a single node. Unlike InfiniBand, which connects nodes across a cluster, NVLink offers far higher bandwidth over short distances, and with NVSwitch lets all GPUs in a node communicate at full speed simultaneously.

  • Network Topology

    Network topology describes the physical and logical arrangement of interconnections between nodes in a GPU cluster. The chosen topology, commonly a fat-tree (Clos) design in training clusters, determines how much bisection bandwidth is available and whether collective operations bottleneck as the cluster grows.

  • Network Bandwidth

    Network bandwidth is the maximum data transfer rate of a network connection, measured in gigabits per second (Gb/s) or terabits per second (Tb/s). In GPU clusters it bounds how quickly gradients, activations, and training data can move between devices.

  • Network Latency

    Network latency is the time delay for data to travel between two points in a network, measured in microseconds (µs) or milliseconds (ms). For tightly synchronised collectives, tail latency matters as much as the average; latency and bandwidth combine into the simple transfer-time model sketched below.
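
    A sketch of that model, time = latency + size / bandwidth, with illustrative numbers:

        def transfer_seconds(size_bytes, bandwidth_gbps, latency_us):
            # time = propagation latency + serialisation time on the link
            return latency_us * 1e-6 + size_bytes * 8 / (bandwidth_gbps * 1e9)

        grad_bytes = 2 * 7e9   # one FP16 copy of a 7B-parameter model's gradients
        t = transfer_seconds(grad_bytes, bandwidth_gbps=400, latency_us=5)
        print(f"~{t:.2f} s per full gradient copy on a 400 Gb/s link")   # ~0.28 s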

P

  • Pipeline Parallelism

    Pipeline parallelism distributes different layers of a neural network across multiple GPUs or nodes, with each stage processing a different microbatch in sequence so the stages stay busy; the idle "bubbles" at the start and end of each batch are its main efficiency cost.

  • Power Density

    Power density measures the electrical power consumed per unit of data centre floor space, typically expressed as kilowatts (kW) per rack. Modern GPU systems have pushed densities from the traditional 5–10 kW per rack toward 100 kW and beyond, reshaping power delivery and cooling design.

  • Power Usage Effectiveness (PUE)

    PUE is the ratio of total facility energy to IT equipment energy, measuring how efficiently a data centre delivers power to its computing load. A PUE of 1.0 would mean zero overhead; well-run modern facilities typically land between roughly 1.1 and 1.5.
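
    The arithmetic, with assumed figures:

        it_load_mw = 20.0     # power reaching servers, storage, and network gear
        facility_mw = 24.0    # IT load plus cooling, power conversion, lighting

        pue = facility_mw / it_load_mw
        print(f"PUE: {pue:.2f}")                                   # 1.20
        print(f"overhead: {facility_mw - it_load_mw:.1f} MW of non-IT load")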

Q

  • Quantisation

    Quantisation reduces the numerical precision of model weights and activations — from 32-bit floating point (FP32) to 16-bit (FP16/BF16), 8-bit (FP8/INT8), or lower — cutting memory footprint, bandwidth needs, and compute cost at a usually modest accuracy penalty.
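
    A minimal symmetric INT8 sketch in NumPy (per-tensor scaling for brevity; production schemes use per-channel scales, calibration data, or quantisation-aware training):

        import numpy as np

        def quantise_int8(w):
            scale = np.abs(w).max() / 127.0            # map max magnitude to 127
            q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
            return q, scale

        w = np.random.randn(4, 4).astype(np.float32)
        q, scale = quantise_int8(w)
        roundtrip = q.astype(np.float32) * scale       # dequantise
        print(f"max round-trip error: {np.abs(w - roundtrip).max():.4f}")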

R

  • Reserved Instances

    Reserved instances are GPU compute resources purchased via a time-bound commitment — typically 1, 6, 12, or 36 months — in exchange for substantial discounts relative to on-demand pricing and, for the provider, predictable revenue.
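
    A breakeven sketch against on-demand pricing (rates are assumptions, not quotes). A reservation bills every hour while on-demand bills only the hours used, so the commitment pays off above a utilisation threshold:

        on_demand_per_hr = 4.00
        reserved_per_hr = 2.60   # assumed effective rate on a 1-year commitment

        # Costs are equal when utilisation u satisfies u * on_demand = reserved:
        breakeven = reserved_per_hr / on_demand_per_hr
        print(f"reservation is cheaper above {breakeven:.0%} utilisation")  # 65%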

S

  • Spot Instances

    Spot instances are GPU compute resources offered at variable, discounted pricing with the caveat that the provider can reclaim the capacity at short notice. They suit fault-tolerant, checkpointable workloads; a checkpointing sketch follows this section's entries.

  • Sovereign Compute

    Sovereign compute refers to nationally controlled GPU and AI infrastructure operated within a country's borders, subject to its own jurisdiction and insulated from foreign legal reach. It has become a policy priority as governments treat AI capacity as strategic infrastructure.
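
    The checkpointing pattern referenced under Spot Instances, sketched with PyTorch (the model, path, and interval are placeholders):

        import os
        import torch

        CKPT = "checkpoint.pt"                    # illustrative path
        model = torch.nn.Linear(512, 512)
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        step = 0

        if os.path.exists(CKPT):                  # resume after a reclaim
            state = torch.load(CKPT)
            model.load_state_dict(state["model"])
            opt.load_state_dict(state["opt"])
            step = state["step"]

        while step < 10_000:
            loss = model(torch.randn(32, 512)).pow(2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
            step += 1
            if step % 500 == 0:                   # checkpoint to bound lost work
                torch.save({"model": model.state_dict(),
                            "opt": opt.state_dict(), "step": step}, CKPT)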

T

  • Tensor Parallelism

    Tensor parallelism is a distributed computing strategy that splits individual neural network layers across multiple GPUs, with each device computing a shard of every matrix multiplication and collective operations recombining the results (a sketch follows this section's entries).

  • Training Cluster

    A training cluster is a tightly coupled array of GPU nodes connected via high-bandwidth interconnects — typically InfiniBand or high-speed Ethernet. Such clusters are purpose-built for large-scale distributed training, where a single slow link or failed node can stall the entire job.
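
    The sketch referenced under Tensor Parallelism: a column-parallel matrix multiply, with NumPy arrays standing in for per-GPU shards (real systems recombine the partial outputs with NCCL collectives):

        import numpy as np

        x = np.random.randn(8, 256)          # activations, replicated on every rank
        w = np.random.randn(256, 512)        # full weight, kept here for reference
        shards = np.split(w, 4, axis=1)      # 4-way column split, one per "GPU"

        partials = [x @ s for s in shards]   # each rank's local matmul
        y = np.concatenate(partials, axis=1) # the all-gather step

        assert np.allclose(y, x @ w)         # matches the unsharded computation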

U

  • Unit Economics

    Unit economics in GPU infrastructure refers to the revenue, cost, and margin analysis at the per-GPU or per-MW level. The core questions are what revenue each deployed GPU generates per hour, what it costs to power, house, and finance it, and how quickly the asset pays back before it depreciates.
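
    A toy per-GPU monthly model (every input is an assumption for illustration):

        hours, price_per_hr, utilisation = 730, 2.50, 0.80
        revenue = hours * utilisation * price_per_hr           # ~$1,460

        power_cost = hours * 1.0 * 0.08      # 1 kW share at $0.08/kWh
        depreciation = 30_000 / (5 * 12)     # $30k capex, 5-year straight line
        other_opex = 150                     # colo, network, staff allocation

        cost = power_cost + depreciation + other_opex
        print(f"revenue ${revenue:,.0f}, cost ${cost:,.0f}, "
              f"margin {1 - cost / revenue:.0%}")              # ~51%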


Each term includes a definition, technical context, and relevance to GPU infrastructure decision-making. The glossary is updated as technology and market terminology evolve.
