Installing PyTorch on Linux has no business being as complicated as it still is in late 2022.
The main problem is that there are five independently moving parts, with very little guidance on how to align them:
- Python version
- PyTorch version
- CUDA version
- NVIDIA driver version
- GPU card
At first glance, it should just work, no? After all, conda is the de facto queen of scientific computing, and both PyTorch and NVIDIA provide configurators that generate platform-specific command-line installation instructions: PyTorch for PyTorch itself, NVIDIA for CUDA. Ubuntu is relatively “mainstream” and “corporate”, meaning there is a single-click option to install the proprietary NVIDIA drivers, automatically selected based on the GPU card you have.
Anyone who has tried to install PyTorch has realized there is an interdependence that’s not always easy to debug and resolve. Having lost a couple of weeks to it a year ago, I was aware of the problem when I started configuring a new machine for ML work, but I still lost almost half a day debugging it and making it work.
Specifically, the problem was that NVIDIA CUDA is currently at version 12 (12.1 specifically), whereas the latest version of PyTorch wants 11.6 or 11.7, not even 11.8, the last release of the 11 series.
So we need to start by checking the PyTorch requirements on the official site and choosing the latest compatible CUDA version. Here it is 11.7.
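Choosing the CUDA version boils down to a small version-matching problem. A minimal sketch, assuming you copy the CUDA versions listed on the PyTorch site and a few release names from the CUDA archive into the two lists by hand; the function names are hypothetical, and the lists only mirror the versions mentioned in this post:

```python
from __future__ import annotations


def parse_version(v: str) -> tuple[int, ...]:
    """Turn '11.7' or '11.7.1' into a comparable tuple: (11, 7) or (11, 7, 1)."""
    return tuple(int(part) for part in v.split("."))


def latest_compatible_cuda(pytorch_cuda: list[str], cuda_releases: list[str]) -> str | None:
    """Return the newest CUDA release whose major.minor is one PyTorch supports."""
    supported = {parse_version(v)[:2] for v in pytorch_cuda}
    candidates = [r for r in cuda_releases if parse_version(r)[:2] in supported]
    return max(candidates, key=parse_version, default=None)


# Hand-copied values (assumption): what PyTorch offered, and a subset of the CUDA archive.
PYTORCH_CUDA = ["11.6", "11.7"]
CUDA_RELEASES = ["11.6.2", "11.7.0", "11.7.1", "11.8.0", "12.0.0", "12.1.0"]

print(latest_compatible_cuda(PYTORCH_CUDA, CUDA_RELEASES))  # 11.7.1
```

Comparing parsed tuples rather than raw strings matters here: as strings, "11.10" would sort before "11.7".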
After that, we locate the relevant version in the CUDA release archive: here, CUDA 11.7.1. However, there is a catch. The default web installer any sane user would use (add the key to the keyring, then apt-get install) will actually install CUDA 12. Yuuuup. And the downgrading experience is neither pleasant nor straightforward. So you MUST use the local installer command, which pins the version.
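Because the default installer can silently pull CUDA 12, it is worth checking what actually ended up on the machine. A small sketch, assuming `nvcc` is on the `PATH` and that `nvcc --version` prints a line of the form `Cuda compilation tools, release 11.7, V11.7.99` (the sample string below is an assumed example of that format, not captured output); both helper names are hypothetical:

```python
from __future__ import annotations

import re
import shutil
import subprocess


def cuda_release_from_nvcc(output: str) -> str | None:
    """Extract 'X.Y' from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def check_pinned_cuda(expected: str = "11.7") -> bool | None:
    """True/False if nvcc is found on PATH, None if it cannot be found."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # e.g. /usr/local/cuda/bin was never added to PATH
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
    return cuda_release_from_nvcc(out) == expected


SAMPLE = "Cuda compilation tools, release 11.7, V11.7.99"  # assumed format
print(cuda_release_from_nvcc(SAMPLE))  # 11.7
```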
That’s not all, though. Before installing CUDA, you need to make sure you have a driver version compatible with both CUDA and the graphics card.
The current NVIDIA Linux driver for my graphics card is 525.XX.XX. Fortunately for me, thanks to NVIDIA’s backward compatibility, it works with CUDA 11.7; otherwise a compatibility package would have been needed. Moreover, your graphics card might not be supported by the latest NVIDIA drivers, in which case you would need to work backwards to find the last release of PyTorch and related packages that still supports the CUDA stack you have access to.
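The driver check can be scripted too. This sketch relies on the `CUDA Version: X.Y` field in the `nvidia-smi` header, which reports the newest CUDA version the installed driver supports, and applies a deliberately simplified rule that ignores CUDA minor-version compatibility. The helper names are hypothetical, and the sample header is a made-up line in the `nvidia-smi` format with the driver version left elided:

```python
from __future__ import annotations

import re
import shutil
import subprocess


def version_tuple(v: str) -> tuple[int, ...]:
    return tuple(int(p) for p in v.split("."))


def driver_max_cuda(smi_output: str) -> str | None:
    """Read the 'CUDA Version: X.Y' field from the nvidia-smi header."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


def needs_compat_package(driver_cuda: str, toolkit_cuda: str) -> bool:
    """Simplified rule: a driver runs toolkits up to the CUDA version it reports;
    anything newer would need NVIDIA's compatibility package."""
    return version_tuple(toolkit_cuda) > version_tuple(driver_cuda)


def query_driver_max_cuda() -> str | None:
    """Run nvidia-smi if present; None if the driver is missing or unreadable."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return None
    out = subprocess.run([smi], capture_output=True, text=True).stdout
    return driver_max_cuda(out)


# Hypothetical header line; driver version intentionally left as placeholders.
HEADER = "| NVIDIA-SMI 525.xx.xx    Driver Version: 525.xx.xx    CUDA Version: 12.0 |"
print(needs_compat_package(driver_max_cuda(HEADER), "11.7"))  # False
```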
Fortunately for me, that was not the case, so I could start installing from there:
- Install the latest NVIDIA Linux drivers
- Install the specific CUDA version with the local installer (11.7.1 for me)
- Install the latest Anaconda
- Finally, install the current version of PyTorch
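Once everything is installed, one quick way to confirm that the five parts line up is to ask PyTorch itself. A minimal sketch using `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`, plus `nvidia-smi --query-gpu=driver_version`; the function name `describe_stack` is hypothetical, and it reports `None` for anything it cannot find instead of failing:

```python
from __future__ import annotations

import platform
import shutil
import subprocess


def describe_stack() -> dict[str, object]:
    """Collect the five moving parts in one place (None = not found)."""
    info: dict[str, object] = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        torch = None
    info["pytorch"] = getattr(torch, "__version__", None)
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds)
    info["torch_cuda"] = torch.version.cuda if torch else None
    info["gpu_visible"] = bool(torch and torch.cuda.is_available())
    info["gpu_name"] = torch.cuda.get_device_name(0) if info["gpu_visible"] else None
    smi = shutil.which("nvidia-smi")
    if smi:
        out = subprocess.run(
            [smi, "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True,
        ).stdout.strip()
        info["driver"] = out.splitlines()[0] if out else None
    else:
        info["driver"] = None  # no NVIDIA driver tools on PATH
    return info


for key, value in describe_stack().items():
    print(f"{key:12} {value}")
```

Note that `torch.version.cuda` is the CUDA version PyTorch was built against, not necessarily the toolkit installed system-wide, so it is worth comparing it with the `nvcc` output as well.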
This could and should have been a one-liner with automated dependency resolution, or at least part of the installation configurator on the PyTorch website.
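To illustrate the kind of resolution step meant here, a sketch that picks the newest PyTorch release, and the newest CUDA build of it, that a given driver can run. The `PYTORCH_BUILDS` table is illustrative and hand-written (check it against PyTorch’s previous-versions page before relying on it), and `resolve` is a hypothetical helper, not part of any installer:

```python
from __future__ import annotations


def vt(v: str) -> tuple[int, ...]:
    """Parse '11.7' into (11, 7) for correct numeric comparison."""
    return tuple(int(p) for p in v.split("."))


# Illustrative table (assumption): PyTorch release -> CUDA builds offered for it.
PYTORCH_BUILDS: dict[str, list[str]] = {
    "1.12.1": ["10.2", "11.3", "11.6"],
    "1.13.0": ["11.6", "11.7"],
}


def resolve(driver_max_cuda: str, builds: dict[str, list[str]] = PYTORCH_BUILDS) -> tuple[str, str] | None:
    """Walk PyTorch releases newest-first; return the first (PyTorch, CUDA) pair
    whose CUDA build does not exceed what the driver supports."""
    for torch_version in sorted(builds, key=vt, reverse=True):
        usable = [c for c in builds[torch_version] if vt(c) <= vt(driver_max_cuda)]
        if usable:
            return torch_version, max(usable, key=vt)
    return None


print(resolve("12.0"))  # ('1.13.0', '11.7')
print(resolve("11.4"))  # ('1.12.1', '11.3')
```

Given the `CUDA Version` that `nvidia-smi` reports, this walks backwards through releases in the same way described earlier for cards that the latest drivers no longer support.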
Instead, it’s an outdated installation procedure straight from the 1990s, with the user figuring out dependencies and working around their unexpected behaviors.
In 2022 we can and usually do better than that.
Especially for a major toolchain used by millions.