DeepSpeed is an open-source deep learning optimization library for PyTorch. It is designed to reduce the computing power and memory required for training and to train large distributed models with better parallelism on existing hardware. DeepSpeed is part of Microsoft's AI at Scale initiative to enable artificial intelligence capabilities at scale.
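A minimal sketch of how a PyTorch model is handed to DeepSpeed follows; deepspeed.initialize and the configuration keys shown are part of the library's public API, while the toy model and the specific values are illustrative:

    import torch
    import deepspeed

    # Any torch.nn.Module can be wrapped; this toy model stands in for a real network.
    model = torch.nn.Linear(1024, 1024)

    # DeepSpeed is driven by a JSON-style configuration; the values here are illustrative.
    ds_config = {
        "train_batch_size": 32,
        "fp16": {"enabled": True},
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
        "zero_optimization": {"stage": 1},
    }

    # deepspeed.initialize returns an engine that manages data parallelism,
    # mixed precision, and ZeRO memory partitioning on the model's behalf.
    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )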
DeepSpeed can train deep learning models with trillions of parameters on current-generation GPU clusters. Adopters of DeepSpeed have used it to produce Turing-NLG, a language model with over 17 billion parameters.
DeepSpeed provides memory-efficient data parallelism and makes it possible to train large models without model parallelism. For example, it can train models with up to 13 billion parameters on NVIDIA V100 GPUs with 32 GB of device memory.
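Continuing the sketch above, the returned engine replaces PyTorch's usual backward and optimizer calls, which is where DeepSpeed applies its memory-efficient data-parallel bookkeeping (the input tensor and toy loss are illustrative):

    # Inputs are cast to half precision because fp16 was enabled in the config above.
    inputs = torch.randn(32, 1024).to(model_engine.device).half()
    loss = model_engine(inputs).sum()  # forward pass with a toy loss

    model_engine.backward(loss)  # gradient computation, with ZeRO partitioning applied
    model_engine.step()          # optimizer step and gradient zeroing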
Part of DeepSpeed's effectiveness comes from reducing the training memory footprint through the Zero Redundancy Optimizer (ZeRO). ZeRO partitions model states and gradients across the data-parallel processes instead of replicating them on every device, which saves significant memory. It also reduces activation memory and memory fragmentation.
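How much ZeRO partitions is chosen through the stage field of the zero_optimization configuration block: stage 1 partitions optimizer states, stage 2 additionally partitions gradients, and stage 3 additionally partitions the parameters themselves. A sketch of such a block, using keys from DeepSpeed's documented configuration schema (the CPU-offload entry is optional and the values are illustrative):

    # Illustrative zero_optimization block for the DeepSpeed config.
    zero_config = {
        "zero_optimization": {
            "stage": 3,  # partition optimizer states, gradients, and parameters
            "offload_optimizer": {"device": "cpu"},  # optional: move optimizer states to host memory
        }
    }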