Pipeline Parallelism
Pipeline parallelism distributes different layers of a neural network across multiple GPUs or nodes, with each stage processing its assigned layers sequentially. Unlike tensor parallelism, which splits within layers, pipeline parallelism splits between layers — Stage 1 processes layers 1-10, Stage 2 processes layers 11-20, and so on. Data flows through the pipeline as micro-batches, allowing all stages to compute simultaneously on different micro-batches. Pipeline parallelism can work across network-connected nodes because inter-stage communication (a single activation tensor per micro-batch at each stage boundary) is far less frequent than the intra-layer communication tensor parallelism requires.
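The micro-batch flow described above can be sketched as a simple schedule simulation. This is a hypothetical illustration, not any library's API: micro-batch `m` reaches stage `s` at time step `s + m`, so after an initial fill period all stages work concurrently on different micro-batches.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Return, for each time step, the micro-batch index each stage
    processes (None = idle). Micro-batch m reaches stage s at step s + m,
    assuming every stage takes one uniform time step per micro-batch."""
    total_steps = num_stages + num_microbatches - 1
    return [
        [t - s if 0 <= t - s < num_microbatches else None
         for s in range(num_stages)]
        for t in range(total_steps)
    ]

# 3 stages, 4 micro-batches: by step 2 all stages are busy at once.
for t, row in enumerate(pipeline_schedule(3, 4)):
    print(f"step {t}: {row}")
```

Running this prints `step 2: [2, 1, 0]` — stage 0 on micro-batch 2, stage 1 on micro-batch 1, stage 2 on micro-batch 0 — while the `None` entries at the start and end are the pipeline bubbles discussed below.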
Pipeline parallelism introduces "pipeline bubbles" — periods where stages sit idle waiting for activations from the previous stage or gradients from the next. Efficient implementations minimise bubbles through micro-batching and interleaved scheduling. GPipe and PipeDream pioneered these optimisation techniques. In practice, most large-scale training uses a combination of tensor parallelism within nodes and pipeline parallelism across nodes, sometimes combined with data parallelism for additional scale — a configuration known as 3D parallelism.
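The bubble overhead can be quantified. For a GPipe-style schedule with p stages and m micro-batches of uniform cost, the idle fraction is (p - 1) / (m + p - 1), which is why raising the micro-batch count is the standard lever for shrinking bubbles. A minimal sketch (the function name is ours, not from any framework):

```python
def bubble_fraction(num_stages, num_microbatches):
    """Idle fraction of a GPipe-style pipeline schedule with uniform
    per-stage cost: (p - 1) / (m + p - 1)."""
    p, m = num_stages, num_microbatches
    return (p - 1) / (m + p - 1)

# With 8 stages and only 8 micro-batches, nearly half the time is bubble;
# 64 micro-batches cut the overhead to under 10%.
print(bubble_fraction(8, 8))
print(bubble_fraction(8, 64))
```

The first call returns 7/15 ≈ 0.467 and the second 7/71 ≈ 0.099, illustrating the trade-off: more micro-batches improve utilisation but shrink per-micro-batch size, which can hurt per-stage GPU efficiency.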
Pipeline parallelism is relevant when evaluating multi-node cluster configurations. The inter-node bandwidth requirements for pipeline parallelism are lower than for tensor parallelism, which affects network infrastructure requirements and therefore deployment costs.
This glossary is maintained by Disintermediate as a reference for GPU infrastructure professionals, investors, and operators. Each entry reflects terminology as used in active advisory engagements and market intelligence work.