After months in preview, PyTorch 2.0 has been made generally available by the PyTorch Foundation.
The open source PyTorch project is among the most widely used technologies for machine learning (ML) training. Originally developed at Facebook (now Meta), PyTorch saw its 1.0 release in 2018 and has benefitted from years of incremental improvements.
In September 2022, the PyTorch Foundation was created in a bid to enable more open governance and encourage more collaboration and contributions. The effort has paid dividends, with the beta of PyTorch 2.0 going into preview in December 2022. PyTorch 2.0 benefits from 428 different contributors who provided new code and capabilities to the open source effort.
Performance is a primary focus for PyTorch 2.0 and one that developers have not been shy to promote. In fact, one of the key new features is Accelerated Transformers, formerly known as "Better Transformers." These are at the heart of modern large language models (LLMs) and generative AI, enabling models to make connections between different concepts.
"We are particularly excited about the significant performance enhancements in this next generation of PyTorch series, which empowers developers with greater innovation to shape the future of PyTorch," Ibrahim Haddad, executive director of the PyTorch Foundation, said in a written statement to VentureBeat.
How PyTorch 2.0 will accelerate the ML landscape
A goal for the PyTorch project is to make training and deployment of state-of-the-art transformer models easier and faster.
Transformers are the foundational technology that has helped enable the modern era of generative AI, including OpenAI's models such as GPT-3 (and now GPT-4). PyTorch 2.0's accelerated transformers provide high-performance support for training and inference, using a custom kernel architecture for an approach known as scaled dot product attention (SDPA).
As there are multiple types of hardware that can support transformers, PyTorch 2.0 can support multiple SDPA custom kernels. Going a step further, PyTorch integrates custom kernel selection logic that will pick the highest-performance kernel for a given model and hardware type.
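To illustrate how this surfaces to developers (a minimal sketch, not code from the article): PyTorch 2.0 exposes SDPA as `torch.nn.functional.scaled_dot_product_attention`, and the dispatcher picks the fastest kernel available for the current hardware automatically. The tensor shapes here are arbitrary toy values.

```python
import torch
import torch.nn.functional as F

# Toy attention inputs: (batch, num_heads, sequence_length, head_dim)
query = torch.randn(2, 4, 16, 32)
key = torch.randn(2, 4, 16, 32)
value = torch.randn(2, 4, 16, 32)

# PyTorch 2.0 dispatches internally to the fastest available kernel
# (e.g., flash attention, memory-efficient attention, or a math
# fallback) based on the inputs and the hardware.
out = F.scaled_dot_product_attention(query, key, value)
print(out.shape)  # same shape as the query tensor
```

Because the kernel choice happens inside the dispatcher, the same call runs on CPUs and GPUs without code changes.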
The impact of the acceleration is non-trivial: it enables developers to train models faster than with prior iterations of PyTorch.
"With just one line of code to add, PyTorch 2.0 gives a speedup between 1.5x and 2x in training Transformers models," Sylvain Gugger, primary maintainer of Hugging Face Transformers, wrote in a statement published by the PyTorch project. "This is the most exciting thing since mixed precision training was introduced!"
Intel helps to lead work on improving PyTorch for CPUs
Among the many contributors to PyTorch 2.0 is none other than silicon giant Intel.
Arun Gupta, VP and GM of open ecosystems at Intel, told VentureBeat that his company is highly supportive of open-source software and PyTorch moving to an open governance model in the PyTorch Foundation hosted by the Linux Foundation. Gupta noted that Intel is a top 3 contributor to PyTorch and is active within the community.
While AI and ML work is often closely associated with GPUs, there is a role for CPUs as well, and that has been an area of focus for Intel. Gupta said that Intel leads the TorchInductor optimizations for CPUs. Gupta explained that the TorchInductor CPU optimization enables the benefits of the new PyTorch compiler that is part of the 2.0 release to run on CPUs.
PyTorch also integrates capabilities referred to by the project as the Unified Quantization Backend for x86 CPU platforms. The unified backend gives PyTorch the ability to choose the best quantization implementation for a training platform. Intel has been developing its own oneDNN technology, which is also available for the rival open source TensorFlow ML library. The new unified backend also supports the FBGEMM approach originally developed by Facebook/Meta.
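To show what "selecting a single CPU backend" looks like in practice (a sketch under the assumption of an x86 build of PyTorch 2.0, not code from the article): the unified backend is selectable as the "x86" quantized engine, alongside the older "fbgemm" and "onednn" engines it unifies.

```python
import torch

# Quantized engines compiled into this particular build of PyTorch.
print(torch.backends.quantized.supported_engines)

# Select the unified x86 backend if this build provides it; it then
# chooses between fbgemm and onednn kernels internally for best
# performance on the host CPU.
if "x86" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "x86"
print(torch.backends.quantized.engine)
```

On non-x86 builds (e.g., ARM), the "x86" engine is absent and the guard above simply leaves the default engine in place.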
“The end user benefit is they just select a single CPU backend, with best performance and best portability,” said Gupta. “Intel sees compilation as a powerful technology that will help PyTorch users get great performance even when running new and innovative models.”