News

With PyTorch 1.5, the RPC framework can be used to build training applications that take advantage of distributed architectures where they are available.
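To make that concrete, here is a minimal sketch of the torch.distributed.rpc API that was stabilized in PyTorch 1.5, assuming two processes on a single machine; the worker names, the add_tensors helper, and the address/port values are illustrative choices, not details from the article.

```python
import os
import torch
import torch.distributed.rpc as rpc
import torch.multiprocessing as mp

def add_tensors(a, b):
    # Executes on whichever worker the RPC is sent to.
    return a + b

def run(rank, world_size):
    # Rendezvous settings; localhost and 29500 are placeholder values.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    # Every process joins the RPC group under a unique name.
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)
    if rank == 0:
        # rpc_sync blocks until the remote call on worker1 returns.
        result = rpc.rpc_sync(
            "worker1", add_tensors, args=(torch.ones(2), torch.ones(2))
        )
        print(result)  # tensor([2., 2.])
    # Blocks until all outstanding RPCs in the group have completed.
    rpc.shutdown()

if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2, join=True)
```

The same rpc_sync/rpc_async primitives are what higher-level patterns such as parameter servers are built from; this sketch only shows the plumbing.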
Fortunately, almost all of the PyTorch optimizers' parameters have reasonable default values. As a general rule of thumb, for binary classification problems I start by trying SGD using the default ...
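As a sketch of that rule of thumb, the snippet below hands a toy binary classifier to torch.optim.SGD with every optional parameter (momentum, weight_decay, nesterov) left at its default. Note that SGD's lr has no default and must be supplied; the 0.01 used here is an arbitrary placeholder, not a recommendation from the article.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 1))  # toy binary classifier
# Only lr is required; momentum=0, weight_decay=0, nesterov=False by default.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()         # standard binary-classification loss

x = torch.randn(8, 10)                   # dummy batch of features
y = torch.randint(0, 2, (8, 1)).float()  # dummy {0, 1} labels

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```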
The solutions will include: 1) MPI-driven Deep Learning, 2) Co-designing Deep Learning Stacks with High-Performance MPI, 3) Out-of-core DNN training, and 4) Hybrid (Data and Model) parallelism. Case ...
In this video from the Swiss HPC Conference, DK Panda from Ohio State University presents: Scalable and Distributed DNN Training on Modern HPC Systems. The current wave of advances in Deep Learning ...
Soumith Chintala, PyTorch project lead, seems to share Zaharia's view that distributed training is the next big thing in deep learning, given that it has been introduced in the latest version of PyTorch.