Tensor Parallelism vs Pipeline Parallelism for LLM Inference
Tensor parallelism and pipeline parallelism split a model across GPUs in fundamentally different ways, with different implications for latency and throughput. This practical guide covers how each works, the NVLink vs InfiniBand trade-offs, and how the two combine for multi-node serving of 70B+ models.