Parallel Training Strategies
Overview of common distributed training parallelism approaches used in large-scale machine learning.
Key Concepts
- Data Parallelism: Replicate the full model on every device; each replica processes a different shard of each batch, and gradients are averaged (all-reduced) across replicas before the optimizer step (first sketch below).
- Model Parallelism: Partition the model's weights across devices so that no single device holds the whole network; in the simplest form, different layers live on different devices, while tensor parallelism splits individual layers (second sketch below).
- Pipeline Parallelism: Split the network into consecutive stages, place each stage on a different device, and stream micro-batches through them so stages can work on different micro-batches concurrently (third sketch below).
- 3D Parallelism: Combine data, model (tensor), and pipeline parallelism along the three axes of a device grid to scale to extremely large models and training runs (final sketch below).
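
To make the data-parallel idea concrete, here is a minimal sketch using PyTorch's `DistributedDataParallel`. The linear model, tensor sizes, and `nccl` backend are illustrative assumptions; in practice you would launch one process per GPU, e.g. via `torchrun --nproc_per_node=N`.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # One process per GPU; rank/world-size env vars are set by the launcher.
    dist.init_process_group(backend="nccl")
    device = torch.device(f"cuda:{dist.get_rank() % torch.cuda.device_count()}")
    torch.cuda.set_device(device)

    # Every rank holds a full replica of the (placeholder) model.
    model = DDP(torch.nn.Linear(1024, 1024).to(device), device_ids=[device.index])
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)

    # Each rank processes a different shard of the batch; DDP averages
    # gradients across replicas during backward(), keeping them in sync.
    x = torch.randn(32, 1024, device=device)
    loss = model(x).pow(2).mean()
    loss.backward()   # gradient all-reduce happens here
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```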
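
A minimal sketch of the layer-placement form of model parallelism described above. The two-GPU split and layer sizes are assumptions for illustration, not a prescribed recipe; the point is that activations, not gradients, cross device boundaries.

```python
import torch
import torch.nn as nn

class TwoDeviceMLP(nn.Module):
    """Each layer lives on its own GPU; no device holds the whole model."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1024, 4096).to("cuda:0")
        self.fc2 = nn.Linear(4096, 1024).to("cuda:1")

    def forward(self, x):
        h = torch.relu(self.fc1(x.to("cuda:0")))
        return self.fc2(h.to("cuda:1"))  # activations hop between devices

model = TwoDeviceMLP()
out = model(torch.randn(8, 1024))  # input starts on CPU; forward() moves it
```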
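
A sketch of the micro-batching at the heart of pipeline parallelism, reusing the same hypothetical two-device split. The loop is written serially for readability; production schedules (GPipe's fill-drain, 1F1B) explicitly keep every stage busy on a different micro-batch.

```python
import torch
import torch.nn as nn

# Two hypothetical stages, one per GPU.
stage0 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
stage1 = nn.Linear(4096, 1024).to("cuda:1")

def pipeline_forward(batch, n_micro=4):
    """Split the batch into micro-batches and stream them through the stages."""
    outs = []
    for micro in batch.chunk(n_micro):
        h = stage0(micro.to("cuda:0"))
        outs.append(stage1(h.to("cuda:1")))  # stage1 works while stage0 takes the next micro-batch
    return torch.cat(outs)

y = pipeline_forward(torch.randn(32, 1024))
```

Because CUDA kernel launches are asynchronous, even this simple loop gets some overlap: stage0 can begin micro-batch k+1 while stage1 is still processing micro-batch k.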
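
3D parallelism is largely a question of how a cluster's flat ranks are factored into a data × pipeline × tensor grid. The mapping below is a hypothetical illustration for 8 GPUs, not the layout any particular framework guarantees.

```python
# Hypothetical 2 x 2 x 2 mesh over 8 GPUs: data x pipeline x tensor.
DP, PP, TP = 2, 2, 2

def coords(rank):
    """Factor a flat rank into (data, pipeline, tensor) coordinates."""
    return rank // (PP * TP), (rank // TP) % PP, rank % TP

# Ranks sharing the same (pp, tp) coordinates form a data-parallel group
# (gradients are all-reduced across them); ranks sharing (dp, pp) form a
# tensor-parallel group that jointly computes each layer.
for rank in range(DP * PP * TP):
    dp, pp, tp = coords(rank)
    print(f"rank {rank} -> dp={dp} pp={pp} tp={tp}")
```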