GPU Infrastructure Glossary

Terms, technologies, and concepts in AI compute infrastructure.

A

  • All-Reduce Operation

    All-reduce is a collective communication operation that aggregates data from all participating processes and distributes the combined result back to every participant. In distributed training it is the primitive that synchronises gradients across GPUs after each backward pass.
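
    A minimal sketch of the operation using PyTorch's collective API (the launch command, world size, and backend choice are illustrative; "nccl" is the usual backend on GPU nodes):

        # Run with: torchrun --nproc_per_node=4 allreduce_demo.py  (illustrative)
        import torch
        import torch.distributed as dist

        dist.init_process_group(backend="gloo")   # "nccl" on GPU nodes
        rank = dist.get_rank()

        # Each process contributes its own tensor (a stand-in for local gradients).
        t = torch.tensor([float(rank + 1)])

        # After all-reduce, every rank holds the sum across all ranks.
        dist.all_reduce(t, op=dist.ReduceOp.SUM)
        print(f"rank {rank}: {t.item()}")

        dist.destroy_process_group()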

B

  • Bare Metal

    Bare metal refers to GPU servers accessed without virtualisation or hypervisor overhead — the customer receives direct hardware access to the entire server, including its GPUs, CPUs, and network interfaces, with no performance loss or noisy-neighbour effects from shared tenancy.

C

  • Colocation

    Colocation (colo) is a data centre service model where the facility provider supplies power, cooling, physical security, and network connectivity, while the customer owns, installs, and operates its own IT equipment within the facility.

  • Capacity Planning

    Capacity planning is the discipline of forecasting GPU compute demand and aligning infrastructure procurement, deployment, and financing timelines with that demand. Done well, capacity arrives neither too late to capture revenue nor too early to sit idle and depreciate.

  • Cluster Utilisation

    Cluster utilisation measures the percentage of available GPU capacity that is actively generating revenue at any given time. It is the central lever of GPU cloud economics: an idle GPU still accrues depreciation, power, and facility costs.
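
    A toy calculation, with all figures assumed for illustration:

        gpus = 1024
        hours_in_month = 730
        available_gpu_hours = gpus * hours_in_month

        sold_gpu_hours = 640_000          # hours actually billed to customers
        utilisation = sold_gpu_hours / available_gpu_hours
        print(f"utilisation: {utilisation:.1%}")        # ~85.6%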

D

  • Distributed Training

    Distributed training is the practice of splitting a deep learning training workload across multiple GPUs — within a node, across nodes, or across an entire cluster — using strategies such as data, tensor, and pipeline parallelism to train models too large or too slow for a single device.
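
    A minimal data-parallel sketch with PyTorch's DistributedDataParallel (the model, sizes, and launch command are placeholders; DDP synchronises gradients via all-reduce during backward()):

        # Launch with: torchrun --nproc_per_node=<gpus> train.py  (illustrative)
        import torch
        import torch.distributed as dist
        from torch.nn.parallel import DistributedDataParallel as DDP

        dist.init_process_group(backend="nccl")
        rank = dist.get_rank()
        torch.cuda.set_device(rank)

        model = DDP(torch.nn.Linear(512, 512).cuda(rank), device_ids=[rank])
        opt = torch.optim.SGD(model.parameters(), lr=0.01)

        x = torch.randn(32, 512, device=rank)    # each rank sees its own data shard
        loss = model(x).pow(2).mean()
        loss.backward()                           # DDP all-reduces gradients here
        opt.step()

        dist.destroy_process_group()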

E

  • Edge Compute

    Edge compute deploys GPU processing capacity at the network edge — closer to end users and data sources rather than in centralised data centres. This reduces round-trip latency for inference and keeps data local where regulation or bandwidth demands it.

G

  • GPU-as-a-Service (GPUaaS)

    GPU-as-a-Service is a cloud delivery model where GPU compute capacity is rented on-demand or via reservation, rather than purchased outright. Customers gain access to scarce accelerators without the capital outlay and operational burden of owning hardware.

  • GPU Memory (VRAM)

    GPU memory (VRAM, specifically HBM — High Bandwidth Memory) is the memory stacked on the same package as the GPU die that stores model parameters, activations, gradients, and optimiser state. Capacity per GPU is often the binding constraint on which models a cluster can train or serve.
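
    A back-of-envelope sizing example (parameter count, precision, and per-GPU capacity are illustrative; real footprints also include KV cache, activations, and framework overhead):

        import math

        params = 70e9                  # a 70B-parameter model
        bytes_per_param = 2            # FP16/BF16 weights
        weights_gb = params * bytes_per_param / 1e9
        print(f"weights alone: ~{weights_gb:.0f} GB")             # ~140 GB

        hbm_per_gpu_gb = 80            # an 80 GB-class accelerator
        print("minimum GPUs just to hold weights:",
              math.ceil(weights_gb / hbm_per_gpu_gb))             # 2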

H

  • Hyperscaler

    A hyperscaler is a cloud infrastructure provider operating at massive global scale — specifically AWS (Amazon), Azure (Microsoft), and Google Cloud (Alphabet) — whose capital budgets and global footprint set the baseline against which other compute providers compete.

I

  • InfiniBand

    InfiniBand is a high-bandwidth, low-latency networking technology developed by Mellanox (now NVIDIA Networking) that serves as the dominant interconnect for large-scale GPU training clusters, providing RDMA and microsecond-scale latencies that standard Ethernet has only recently begun to rival.

  • Immersion Cooling

    Immersion cooling is a thermal management technique where IT equipment is fully submerged in a thermally conductive but electrically non-conductive dielectric fluid, removing heat far more effectively than air and supporting much higher power densities per rack.

  • Inference Endpoint

    An inference endpoint is a deployed model serving layer that accepts input data and returns predictions or generated content over a network API, typically HTTP or gRPC. It is the unit at which serving capacity, latency targets, and cost per request are managed.
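
    A minimal HTTP sketch using FastAPI (the route, schema, and model call are placeholders; gRPC or any other serving stack follows the same shape):

        from fastapi import FastAPI
        from pydantic import BaseModel

        app = FastAPI()

        class GenerateRequest(BaseModel):
            prompt: str

        @app.post("/v1/generate")
        def generate(req: GenerateRequest) -> dict:
            # A real endpoint would invoke the loaded model here.
            return {"completion": f"(model output for {req.prompt!r})"}

        # Serve with: uvicorn endpoint:app --port 8000  (names illustrative)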

L

  • Liquid Cooling

    Liquid cooling encompasses any data centre thermal management approach that uses liquid — typically water or a dielectric fluid — to move heat away from IT equipment, whether through rear-door heat exchangers, direct-to-chip cold plates, or full immersion. It is increasingly mandatory as GPU rack densities exceed what air cooling can remove.

N

  • Neocloud

    A neocloud is a GPU-focused cloud provider that emerged outside the hyperscaler ecosystem to serve AI and high-performance computing workloads, competing on GPU availability, price, and specialisation rather than breadth of managed services; CoreWeave is a frequently cited example.

  • NVLink

    NVLink is NVIDIA's proprietary high-speed interconnect for GPU-to-GPU communication within a single node. Unlike InfiniBand, which connects nodes across a cluster, NVLink offers far higher bandwidth over short distances, and with NVSwitch lets all GPUs in a node communicate at full speed simultaneously.

  • Network Topology

    Network topology describes the physical and logical arrangement of interconnections between nodes in a GPU cluster. The chosen topology, commonly a fat-tree (Clos) design in training clusters, determines how much bisection bandwidth is available and whether collective operations bottleneck as the cluster grows.

  • Network Bandwidth

    Network bandwidth is the maximum data transfer rate of a network connection, measured in gigabits per second (Gb/s) or terabits per second (Tb/s). In GPU clusters it bounds how quickly gradients, activations, and training data can move between devices.

  • Network Latency

    Network latency is the time delay for data to travel between two points in a network, measured in microseconds (µs) or milliseconds (ms). For tightly synchronised collectives, tail latency matters as much as the average; latency and bandwidth combine into the simple transfer-time model sketched below.
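
    A sketch of that model, time = latency + size / bandwidth, with illustrative numbers:

        def transfer_seconds(size_bytes, bandwidth_gbps, latency_us):
            # time = propagation latency + serialisation time on the link
            return latency_us * 1e-6 + size_bytes * 8 / (bandwidth_gbps * 1e9)

        grad_bytes = 2 * 7e9   # one FP16 copy of a 7B-parameter model's gradients
        t = transfer_seconds(grad_bytes, bandwidth_gbps=400, latency_us=5)
        print(f"~{t:.2f} s per full gradient copy on a 400 Gb/s link")   # ~0.28 s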

P

  • Pipeline Parallelism

    Pipeline parallelism distributes different layers of a neural network across multiple GPUs or nodes, with each stage processing a different microbatch in sequence so the stages stay busy; the idle "bubbles" at the start and end of each batch are its main efficiency cost.

  • Power Density

    Power density measures the electrical power consumed per unit of data centre floor space, typically expressed as kilowatts (kW) per rack. Modern GPU systems have pushed densities from the traditional 5–10 kW per rack toward 100 kW and beyond, reshaping power delivery and cooling design.

  • Power Usage Effectiveness (PUE)

    PUE is the ratio of total facility energy to IT equipment energy, measuring how efficiently a data centre delivers power to its computing load. A PUE of 1.0 would mean zero overhead; well-run modern facilities typically land between roughly 1.1 and 1.5.
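
    The arithmetic, with assumed figures:

        it_load_mw = 20.0     # power reaching servers, storage, and network gear
        facility_mw = 24.0    # IT load plus cooling, power conversion, lighting

        pue = facility_mw / it_load_mw
        print(f"PUE: {pue:.2f}")                                   # 1.20
        print(f"overhead: {facility_mw - it_load_mw:.1f} MW of non-IT load")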

Q

  • Quantisation

    Quantisation reduces the numerical precision of model weights and activations — from 32-bit floating point (FP32) to 16-bit (FP16/BF16), 8-bit (FP8/INT8), or lower — cutting memory footprint, bandwidth needs, and compute cost at a usually modest accuracy penalty.
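
    A minimal symmetric INT8 sketch in NumPy (per-tensor scaling for brevity; production schemes use per-channel scales, calibration data, or quantisation-aware training):

        import numpy as np

        def quantise_int8(w):
            scale = np.abs(w).max() / 127.0            # map max magnitude to 127
            q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
            return q, scale

        w = np.random.randn(4, 4).astype(np.float32)
        q, scale = quantise_int8(w)
        roundtrip = q.astype(np.float32) * scale       # dequantise
        print(f"max round-trip error: {np.abs(w - roundtrip).max():.4f}")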

R

  • Reserved Instances

    Reserved instances are GPU compute resources purchased via a time-bound commitment — typically 1, 6, 12, or 36 months — in exchange for substantial discounts relative to on-demand pricing and, for the provider, predictable revenue.
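
    A breakeven sketch against on-demand pricing (rates are assumptions, not quotes). A reservation bills every hour while on-demand bills only the hours used, so the commitment pays off above a utilisation threshold:

        on_demand_per_hr = 4.00
        reserved_per_hr = 2.60   # assumed effective rate on a 1-year commitment

        # Costs are equal when utilisation u satisfies u * on_demand = reserved:
        breakeven = reserved_per_hr / on_demand_per_hr
        print(f"reservation is cheaper above {breakeven:.0%} utilisation")  # 65%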

S

  • Spot Instances

    Spot instances are GPU compute resources offered at variable, discounted pricing with the caveat that the provider can reclaim the capacity at short notice. They suit fault-tolerant, checkpointable workloads; a checkpointing sketch follows this section's entries.

  • Sovereign Compute

    Sovereign compute refers to nationally controlled GPU and AI infrastructure operated within a country's borders, subject to its own jurisdiction and insulated from foreign legal reach. It has become a policy priority as governments treat AI capacity as strategic infrastructure.
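
    The checkpointing pattern referenced under Spot Instances, sketched with PyTorch (the model, path, and interval are placeholders):

        import os
        import torch

        CKPT = "checkpoint.pt"                    # illustrative path
        model = torch.nn.Linear(512, 512)
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        step = 0

        if os.path.exists(CKPT):                  # resume after a reclaim
            state = torch.load(CKPT)
            model.load_state_dict(state["model"])
            opt.load_state_dict(state["opt"])
            step = state["step"]

        while step < 10_000:
            loss = model(torch.randn(32, 512)).pow(2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
            step += 1
            if step % 500 == 0:                   # checkpoint to bound lost work
                torch.save({"model": model.state_dict(),
                            "opt": opt.state_dict(), "step": step}, CKPT)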

T

  • Tensor Parallelism

    Tensor parallelism is a distributed computing strategy that splits individual neural network layers across multiple GPUs, with each device computing a shard of every matrix multiplication and collective operations recombining the results (a sketch follows this section's entries).

  • Training Cluster

    A training cluster is a tightly coupled array of GPU nodes connected via high-bandwidth interconnects — typically InfiniBand or high-speed Ethernet. Such clusters are purpose-built for large-scale distributed training, where a single slow link or failed node can stall the entire job.
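
    The sketch referenced under Tensor Parallelism: a column-parallel matrix multiply, with NumPy arrays standing in for per-GPU shards (real systems recombine the partial outputs with NCCL collectives):

        import numpy as np

        x = np.random.randn(8, 256)          # activations, replicated on every rank
        w = np.random.randn(256, 512)        # full weight, kept here for reference
        shards = np.split(w, 4, axis=1)      # 4-way column split, one per "GPU"

        partials = [x @ s for s in shards]   # each rank's local matmul
        y = np.concatenate(partials, axis=1) # the all-gather step

        assert np.allclose(y, x @ w)         # matches the unsharded computation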

U

  • Unit Economics

    Unit economics in GPU infrastructure refers to the revenue, cost, and margin analysis at the per-GPU or per-MW level. The core questions are what revenue each deployed GPU generates per hour, what it costs to power, house, and finance it, and how quickly the asset pays back before it depreciates.
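
    A toy per-GPU monthly model (every input is an assumption for illustration):

        hours, price_per_hr, utilisation = 730, 2.50, 0.80
        revenue = hours * utilisation * price_per_hr           # ~$1,460

        power_cost = hours * 1.0 * 0.08      # 1 kW share at $0.08/kWh
        depreciation = 30_000 / (5 * 12)     # $30k capex, 5-year straight line
        other_opex = 150                     # colo, network, staff allocation

        cost = power_cost + depreciation + other_opex
        print(f"revenue ${revenue:,.0f}, cost ${cost:,.0f}, "
              f"margin {1 - cost / revenue:.0%}")              # ~51%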


Each term includes a definition, technical context, and relevance to GPU infrastructure decision-making. The glossary is updated as technology and market terminology evolve.
