How to install PyTorch Nvidia GPU stack on Ubuntu LTS 22.04 (late 2022 – early 2023)

December 30, 2022January 8, 2023 Andrei KNo Comments

Installing PyTorch has no business being so complicated on Linux as it still is in late 2022.

The main problem is that there are four independently moving parts and very little guidance on how to align them:

Python version
PyTorch version
CUDA version
Nvidia drivers version
GPU card

On the first impression, it should work easily no? After all, conda is the de-facto queen of scientific computing, both Pytorch and Nvidia provide configurators for command-line, platform specific installations, PyTorch installation and CUDA drivers installation. Ubuntu is relatively “mainstream” and “corporate”, meaning that there is a single-click choice to install proprietary drivers from NVIDIA that are automatically determined based on the GPU card you have

Right?

Wrong.

For anyone who had a shot at trying to install PyTorch has realized there is an interdependence that’s not always easy to debug and resolve. After a couple of weeks lost a year ago, I was aware of the problem when I was starting to configure a new machine for ML work, but I still lost almost half a day to debug it and make it works.

Specifically, the problem was that NVIDIA CUDA version is currently at 12 (12.1 specifically), whereas the latest version of PyTorch wants 11.6 or at least 11.7, not even the last 11 series release – the 11.8.

For that, we will need to start by checking PyTorch requirements on the official site and choose the last compatible CUDA version. Here it is 11.7.

After that, we go and locate in the CUDA releases archives the relevant version. Here it is the CUDA-11.7.1. However, there is a catch-22 here. The default web installer any sane user would use (add key to keyring + apt-get install) will actually install CUDA-12. Yuuuup. And the downgrading experience is not the best, nor the most straightforward. So you MUST use a local installer command, that pins the version (here).

However, this is not it yet. Before installing CUDA, you need to make sure you have the proper drivers version, that are compatible with CUDA and the graphics card.

The current drivers version for Linux for NVIDIA drivers is 525.XX.XX for my graphics according to Nvidia’s reverse compatibility, fortunately for me it works with CUDA 11.7, otherwise a compatibility pack would have been needed. Moreover, your graphic cards might not be supported by the latest NVIDIA drivers, in which case you would need to work backwards to find the last release of PyTorch and connected packages that would still be supporting the CUDA stack you have access to.

Fortunately for me, it was not the case, so I could start installing things from there.

So:

Install latest Linux Nvidia drivers
Install the specific CUDA version, in local (11.7.1 for me)
Install the latest Anaconda
Finally, install the current version of PyTorch

This could and should have been a one-liner with automated dependencies resolution or at least part of the installation stack on the Pytorch website.

It isn’t.

It’s an outdated installation procedure straight from the 1990s, with user figuring out dependencies and resolving unexpected behaviors from those dependencies.

In 2022 we can and usually do better than that.

Especially for a major toolchain used by millions.

Jupyter/Ipython notebooks

September 6, 2018 Andrei K1 Comment

After writing it down a couple of weeks ago for Hacker News, here is the recap and some updates:

I am a computational biologist with a heavy emphasis on the data analysis. I did try Jupyter a couple of years ago and here are my concerns with it, compared to my usual flow (Pycharm + pure python + pickle to store results of heavy processing).

Extracting functions is harder
Your git commits become completely borked
Opening some data-heavy notebooks is neigh impossible once they have been shut down
Import of other modules you have in local is pretty non-trivial.
Refactoring is pretty hard
Sphinx for autodoc extraction is pretty much out of the picture
Non-deterministic re-runs – depending on the cell
execution order you can get very different results. That’s an issue
when you are coming back to your code a couple of months later and
try to figure what you did to get there.
Connecting to the ipython notebook, even from the environments like Pycharm is highly non-trivial, just as the mapping to the OS
filesystem
Hard to impossible to inspect the contents of the ipython notebook when it’s hosted on Github due to the encoding snafus

There are likely work-arounds for most of these problems, but the issue is that with my standard workflow they are non-issues to start with.

In my experience, Jupyter is pretty good if you rely only on existing libraries that you are piecing together, but once you need to do more involved development work, you are screwed.

How to upgrade MediaWiki – approximate 2018 guide

February 13, 2018February 13, 2018 Andrei KNo Comments

Unfortunately, unlike WordPress, MediaWiki doesn’t come with a single-button update version. Perhaps because of that, perhaps because of my laziness, I have been postponing my updates of Wikimedia websites for over five years by now. However, in the light of recent vulnerability revelations, I have finally decided to upgrade my installations and started trying to figure what exactly I needed, given I only have web interfaces and FTP access to the website I manage.

First of all, this link gives a good overview of the whole process. For my specific case, I was upgrading to the 1.30, which required a number of edits to the config file, explained here. Now, what seemed to be happening was that after backing up my database (done for me by my hosting provider) and files (that I could to by FTP), I just needed to copy the files from the latest release version (REL1_30 in my case – DO NOT DO IT, see edit below) and copy it to the directories via FTP and then just run the database update script at wiki.mywebsite.org/mw-config/. Seems pretty easy, right?

Nope, not so fast! The problem is that this distribution does not contain a crucial directory that you need to run the installation and without which you wiki installation will fail with a 500 code without leaving anything in the error logs of the server.

This step isn’t really mentioned in the installation guide, but you actually need to remove the existing /vendor folder in your installation over FTP, build the latest version for your build with a git clone https://gerrit.wikimedia.org/r/p/mediawiki/vendor.git into a /vendor folder on your machine and then upload the files to your server.

Only after that step can you connect the /mw-config/ and finish upgrading the wiki.

So yeah, let’s hope that in a not-so-distant future MediaWiki would have the same handy ‘update now’ button as the WordPress. Because something is telling me that there are A LOT of outdated MediaWiki installs out there…

Edit:

After spending a couple additional hours dealing with additional issues: do not use the “core” build, but instead download the complete one, including all the skins, extensions and vendor files from here.

Linux server security

March 2, 2017March 2, 2017 Andrei KNo Comments

DISCLAIMER: I AM NOT AN INFOSEC EXPERT. THIS ARTICLE IS MORE OF A MEMO FOR MYSELF. IF YOU LOOSE DATA OR HAVE A BREACH, I BEAR NO RESPONSIBILITY IN IT.

Now, because of all the occasions at which I had to act as a makeshift sysadmin, I did end up reading a number of policies and pick up some advice I wanted to group in a single place, if but for my own memory.

Installation:

Use SE Linux distro
Use an intrusion prevention tool, such as Fail2Ban
Configure primary and secondary DHS
Switch away from the password-protected SSH to a key-based SSH log-in. Diable root login all together (/etc/ssh/sshd_config, PermitRootLogin no). Here is an Ubuntu/OpenSSH guide.
Remove network super-service packages
Disable Telnet and FTP (SFTP should be used)
use chroot where available, notably for webservers and FTP servers
encrypt the filesystem
disable remote root login
disable sudo su – all the root actions need to be done with a sudo

Audit:

Once the server has been build, run Lynsis. It will audit your system and suggest additional steps to protect your machine
Force multi-factor authentification for the roots, especially via SSH. Here is a tutorial from Digital Ocean.

Watching the logs:

Tools like Cacti, Nagios, Splunk, sysstat to help you to monitor the logs for an intrusion
Automated log review with messages send upon admin log-in
Install psacct/acct or similar
Install an integrity checking tool (Tripwire, AIDe, SamHain, OSSEC)

If you have more than one logging system to watch:

Snare
Splunk and Lucene (ElasticSearch) for log search
LogWatch (guide from Digital Ocean), LogStash, Graylog2, Sawmill

Configuring PyCharm for remote development

March 2, 2017March 2, 2017 Andrei KNo Comments

I do most of my programming from my windows laptop and/or desktop computer. However, in order to be able to develop anything sane, I need to operate fully in Linux. I used to have to dualboot or even to have two machines, but now that I have access to a stable server I can safely ssh into, I would rather just use my IDE to develop directly on it. Lucky enough for me, PyCharm has an option for it.

A how-to guide to do this is pretty straightforward, well-explained on the PyCharm blog and docs explaining how to configure a remote server that is not a Vagrant box.

There are three steps in the configuration:

setting up the deployment server and auto-update
setting up the remote interpreter
setting up the run configuration

Setting up the deployment server:

Tools | Deployment | Configuration > configure your sftp server, go ahead and perform the root autodetection (usually the /home/uname) and uncheck the “available only for this project. You will need that last option in order to configure the remote interpreter. Go ahead, go into the mapping, perform the equivalence mappings for the project, but be aware the home from the previous screen, if filled, would be prepended to any path you try to map to on the remote server. So if you want your project to go to /home/uname/PycharmProjects/my_project and your root is /home/uname/, the path you are mapping to needs to be /PycharmProjects/my_projet.

Now, head to the Tools | Deployment click the automatic upload, so that every edit you do on your machine is constantly uploaded to the remote server.

Setting up the remote interpreter:

Head to the File | Settings | Project | Interpreter, click on the cogwheel and click on add remote. At that point by default PyCharm will fill in the properties for the “deployment configuration”. In my case I needed to tweak a bit the python interpreter path, since I use Anaconda Python (scientific computing). If like me you use Anaconda2 and store it in your home directory, you will need to replace the interpreter path by /home/uname/anaconda/bin/python. At that point, just click save and you are good for this part.

Setting up the run configuration:

With the previous two steps finished, when you go into Run | Edit configuration, add the main running script to the Script field, check that the python interpreter is configured to be the remote one and then click on the three small dots next to “path mappings” field and fill it out, at least with the location of the script on your machine mapped to it’s location on the remote.

That’s it, you are good to go!

My Python programming stack for a refactoring job

December 16, 2015February 10, 2017 Andrei KNo Comments

IDE:

Pycharm professional edition:

Marvelous support for refactoring: renaming, moving\
Marvelous support for code testing coverage
Excellent support for styling (PEP8, PEP308)
Marking of where code was edited + side bar allowing a really easy navigation through code
Split edition windows (in case you are editing two very distant locations withing the same file)
Support for all the file formats I ever have to work with – Excellent integration with git/GitHub

Before I push:

flake8 --select=C --max-complexity 10 myProject

As a rule of thumb, everything with McCabe complexity over 15 needs to be refactored
Everything with complexity over 10 needs to be looked into to make sure method is not trying to do too much at once.

clonedigger

Tells the complementary story of the project complexity and redundance
In my experience, more easy to read and understand than pylint version
Allows me to poll together methods that do same things and extract it as a function for better control

Upon push:

Travis-Ci.org + Coveralls.io

Made me think about how my end users will install it
Forces me to write unittests wherever possible.
Unittests + coverage are good: they force me to think about paths that code can be taking and implement proper tests or insert nocoverage statements
Once implemented, they make testing the code much simpler

Quantified Code.com Landscape.io

Checks my code for smells and advice consistent with different PEPs.
Cites sources, so that I don’t have just the verification, but also the spirit behind its introduction
Sends me an e-mail upon each push telling me if I’ve introduced additional smells or errors upon pushing
In the end fairly complementary to the tools I already have when it comes to the static analysis’

In addition to all of the above:

Developer Decision log:

Because I need to have a bird’s eye view of the project, I put all the decisions or decisions to make as I read and/or refactor code there, to make sure I know what is done and what is to do.
Unlike issues, it is a little bit more in-depth explanation of what is going on, not necessarily to be shown in the docs, nor necessarily worth opening/closing an issue.

What is missing

Profiling tools. I would really like to know how the modifications I introduce to code impact performance, but I don’t think I have yet found a way of doing it other than log everything and profile specific functions when I feel they are being really slow.

On the other premature optimization never brought anything good and a proper structuring of the project would make profiling and optimization easier in the long run.

Python: importing, monkey-patching and unittesting

November 23, 2015 Andrei KNo Comments

I am in the phase of refactoring a lot of my code from several years ago for a project relying a lot on module-level constants (like database connection configurations). For me, defining constants in the beginning of the module and then several functions based on them that I will be using later on in the code instead of wrapping all the internals in a class that is dynamically initialized every time one of its methods needs to be used elsewhere just sounds much more Pythonic.

However I have been progressively running into more and more issues with this approach. At first, when I tried to use Sphinx-autodoc to extract the API documentation for my project. Sphinx imports modules one by one in order to extract the docstrings and generate an API documentation from them. Things can get messy when it does it on the development machine, but things get worse when the whole process is started in an environment that doesn’t have all the necessary third-party software installed, that would allow for instance a proper database connection. In my case I god hurt by the RTFD and had to solve the problem through the use of environment variables.

on_rtd = os.environ.get('READTHEDOCS', None) == 'True'

This, however lead to the pollution of production code with switches that were there just to prevent constants initialization. In addition to that, a couple of months down the road, when I started using Travis-Ci and writing unit-tests, this practice of using modules came back to bite me in my back again. In fact, when I was importing the modules that contained functions that relied on interaction with database, it automatically pulled the module that was responsible for connection with database and attempted to connect it with the database that was not necessarily present in the Travis-Ci boxed environment nor that I would be particularly eager to test while testing a completely function.

In response to that, I can see several possible ways of managing it:

Keep using the environment variables in the production code. Rely on RTFD to supply READTHEDOCS environment variable and set the UNITTEST environment variable when the unittesting framework is getting started. Check for those environment variables each time we are about to perform an IO operation and mock it if they are true.
Instead of environment variables, use an active configs pattern: import configs.py and read/write variables within it from the modules where it gets imported.
Pull together all the active behavior from the modules into class initialization routines and perform initialization in the __init__.py for classes, once again depending of what is going on.
Use the dynamic nature of Python to monkey-patch actual DB connection module before it gets imported in the subsequent code.

Following a question I’ve asked on Stackoverflow, it seems that the last option would be the most recommended, because it does not involve increasing the complexity of the production code, just move elements to the module that implements the unittesting.

I think that what I would really need to use in Python would be a pre-import patch that would replace some functions BEFORE they are imported in a given environment. All in all it leaves an uneasy feeling of the fact that unlike many other parts of Python, the import system isn’t as well thought through as it should be. If I had to propose an “I’d wish” be of the Python import system, these two suggestions would be the biggest ones:

Import context replacement wrappers:

@patch(numpy, mocked_numpy)
import module_that_uses_numpy

Proper relative imports (there should always be only one right way of doing it):

<Myapp.scripts.user_friendly_script.py>

from MyApp.source.utilities.IO_Manager import my_read_fle

[... rest of the module ..]


> cd MyApp/scripts/
> python user_friendly_script.py
   Works properly!

Compare that to the current way things are implemented:

> python -m MyApp.souruser_friendly_script
   Works properly!
> cd MyApp/scripts/
> python user_friendly_script.py
   Fails...

It seems however that the implementation of the pre-import patching of modules is possible in Python, even if it is not really that easy to implement.

After digging through this blog post, it seems that once modules have been imported once, they are inserted into the `sys.modules` dictionary that buffers them for future imports. In other terms, if I want to do run-time patching, I will need to inject a mock object into that dictionary to override the name that was originally that is used in importing and that leads to the secondary effect of database connection.

Provided that sys.modules modification has a potential to break the Python import machinery, a (relatively) saner injection of Mock module would have been to insert a finder object into sys.meta_path which won’t break the core python import mechanics. This can be achieved by implementing a find_module() class within the importlib.abc.Finder. However, these methods seem to be specific to the Python 3.4 and that we might need to run an alternative import from a path that would instead patch the normal module behavior and mock database connection.

Let’s see if I will manage to pull this one off…

Dependency of a dependency of a dependency

October 29, 2015 Andrei KNo Comments

Or why cool projects often fail to get traction

Today I tried to install a project I have been working for a while on a new machine. It relies heavily on storing and quering data in “social network” manner, and hence not necessarily very well adapted to the relational databases. When I was staring to work on it back in the early 2013, I was still a fairly inexperienced programmer, so I decided to go with a new technology to underlie it neo4j graph database. And since I was coding in Python and fairly familiar with the excellent SQLAlchemy ORM and was looking for something similar to work with graph databases my choice fell on the bulbflow framework by James Thronotn. I complemented it with JPype native binding to python for quick insertion of the data. After the first couple of months of developer’s bliss and everything working as expected and being build as fast as humanely possible, I realized that things were not going to be as fun as I initially expected.

Python 3 was not compatible with JPype library that I was accessing to rapidly insert data into neo4j from Python. In addition to that JPype was quickly dropping out of support and was in general too hard to set up, so I had to drop it down.
Bulbflow framework in reality relied on the Gremlin/Groovy Tinkerpop stack implementation in the neo4j database, was working over a REST interface and had no support for batching. Despite several promises of implementation of batching by it’s creator and maintainer, it never came to life and I found myself involved in a re-implementation that would follow that principles. Unfortunately I had not enough experience with programming to develop a library back then, nor enough time to do it. I had instead to settle for a slow insertion cycle (that was more than compensated for by the gain of time on retrieval)
A year later, neo4j released the 2.0 version and dropped the Gremlin/Groovy stack I relied on to run my code. They had however the generosity of leaving the historical 1.9 maintenance branch going, so provided that I had already poured along the lines of three month full-time into configuration and debugging of my code to work with that architecture, I decided to stick with 1.9 and maintain them
Yesterday (two and a half years after start of development, when I had the time to pour the equivalent of six more month of full-time into the development of that project), I realized that the only version of 1.9 neo4j still available for download to common of mortals that will not know how to use maven to assemble the project from GitHub repository is crashing with a “Perm Gen: java heap out of memory” exception. Realistically, provided that I am one of the few people still using 1.9.9 community edition branch and one of the even fewer people likely to run into this problem, I don’t expect developers will dig through all the details to find the place where the error is occurring and correct it. So at that point, my best bet is to put onto GitHub a 1.9.6 neo4j and link to it from my project, hoping that neo4j developers will show enough understanding to not pull it down

All in all, the experience isn’t that terrible, but one thing is for sure. Next time I will be building a project I would see myself maintain in a year’s time and installing on several machines, I will think twice before using a relatively “new” technology, even if it is promising and offers x10 performance gain. Simply because I won’t know how it will be breaking and changing in the upcoming five years and what kind of efforts it will require for me to maintain the dependencies of my project.

Installing Nvidia CUDA on a Ubuntu 14.04

September 4, 2015September 4, 2015 Andrei KNo Comments

First, we need to properly install and configure the Nvidia drivers. It is pretty well explained how to do it here.
In my case I had to run a console installation from the console because for some reason the GUI have not properly recognized my GeForce GT 610

Then we need to install the CUDA drivers as explained here.

Understanding the M-word

April 19, 2015November 18, 2015 Andrei KNo Comments

After hearing quite a lot about monads from pretty excited folks in Haskel and javascript I’ve decided to figure out on my own what the M-word meant and everyone was so excited about it.

After reading the “You could have invented Monads” and the “Translation from Haskell to Javascript of selected portions of the best introductions to monads I’ve ever read”, it seems that monads can be well represented by a type-modifying wrapper combined with an associated composition rule that would make the compositions that were working on the non-wrapped functions work on the wrapped functions.

One of the reasons why monads might be really exciting for the people is because they are a set of mathematical objects whose implementation in code allows a very powerful control over the side-effects in the functional style programming. A second would be the fact that they avoid a vast amount of the template, boilerplate code that is there to manage the piping of the program and not its business logic. As an instance, a monad would allows a simple modification of the function so that it can manage an std.err pipe in addition to the std.out pipe, without having to manually edit every single functions that might require such a re-piping.

If you ever had to maintain any larger codebase, you are probably aware by now that your worst enemies are side-effects. You can wrap them into arguments of functions you pass them to, but that is long and tedious. You can also try to break them down into classes or namescopes, but if you are trying to do something sufficiently complex, at some point you will end up either with a superclass or an arbitrarily shattered module.

As a Pythonista, I am able to offset the complexity problems a little bit with use of wrappers, iterators and a bit of functional programming capacities in Python, and if monads are really as good as they are claimed to be in controlling the computational complexity, I am all for having more of them!

However, there are a couple of StackExchange questions and blog posts make me wonder if there is not more to it and if I just don’t have the background to understand it yet.

Update on 18 Nov 2015:

After reading more in dept about the monads in Haskell and monad transformers, it seems that monads are a way of beefing-up types by defining a modification that shifts normal types into monadic type space to add additional properties to them, making the logic more clear. All in all it seems to be a pretty neat pattern for a well-controlled type alteration (quite in the spirit of LISP macros, but with more rigidity). Monad transformes seem to be taking the combination power even further by allowing the type alterations to be mixed together, but it still seems that there is no unique and clear definition of what they are and how they work, at least outside Haskell.

Increasing information density

Evolving ideas

Category: Programming