Python package management#
Introduction#
Python comes with an extensive standard library, and you are strongly encouraged to use those packages as much as possible, since this will ensure that your code can be run on any platform that supports Python.
However, many useful extensions to and libraries for Python come in the form of packages that have to be installed separately. There are a couple of different supported approaches to using and installing Python packages on the VSC clusters.
Since many Python packages have been made available through the module system, using Python packages from modules is usually the best starting point. Given the astounding number of packages, it is however not sustainable to install each and every one system wide, so if you need a somewhat exotic package and/or a specific version of a package that is not available via the module system, you will need to install it yourself.
The recommended approach in that case is to manage a virtual environment with pip or alternatively to make use of the uv package manager. In order to automate support for different (micro)architectures and facilitate building environments on top of modules, the section introducing vsc-venv is worth reading. Finally, you can also consider Conda-based environment managers.
Python packages from modules#
Before attempting to install a Python package yourself, it is worth checking
if an appropriate version of the package is already installed on the cluster.
For numpy (a package commonly used in scientific computing), the search
query and example output look as follows:
$ module spider numpy
----------------------
numpy:
----------------------
Versions:
numpy/1.25.1 (E)
numpy/1.26.4 (E)
numpy/2.3.1 (E)
...
For detailed information about a specific "numpy" package (including how
to load the modules) use the modules full name. Note that names that
have a trailing (E) are extensions provided by other modules. For
example:
$ module spider numpy/2.3.1
The trailing (E) after the listed versions indicates that numpy is not a
module in its own right, but is provided as an extension of another
module, which is quite common in the case of Python packages. To get
information on which module should be loaded in order to make a specific
version of numpy available, you can use:
$ module spider numpy/2.3.1
------------------------
numpy: numpy/2.3.1 (E)
------------------------
This extension is provided by the following modules. To access the
extension you must load one of the following modules. Note that any
module names in parentheses show the module location in the software
hierarchy.
SciPy-bundle/2025.06-gfbf-2025a
This output tells us that the module SciPy-bundle/2025.06-gfbf-2025a has
to be loaded in order to make numpy version 2.3.1 available. The
SciPy-bundle provides many packages needed for scientific computing.
To check which Python packages are currently available, you can execute:
python3 -m pip list -v
It will list all packages that are installed for the Python distribution you are using. This can include Python packages from loaded modules as well as Python packages you installed yourself.
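The same information can also be queried from within Python. The following is a minimal sketch and not part of the VSC tooling: the helper name `installed_version` and the package names in the loop are illustrative assumptions, while `importlib.metadata` is part of the standard library (Python 3.8 and later).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Example package names (assumptions); replace them with the ones you need.
    for name in ("numpy", "scipy", "icecream"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

Running this with the Python interpreter of the loaded module (or of an activated virtual environment) reports what that particular interpreter can see.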
Managing Python virtual environments with pip#
If the Python package you need is not available from a module, you can either request a central installation or install it yourself. This section shows how you can install packages yourself in a Python virtual environment or venv. A Python virtual environment is an isolated environment in which you can safely install Python packages, independently of those installed in the system or in other virtual environments. Virtual environments are, for example, very convenient for Python developers because they make it possible to work on multiple software projects at the same time.
We recommend relying on centrally installed modules as much as possible. They provide a robust and performant base on which to build your virtual environments.
In this section, we show how you can combine modules with virtual environments on the HPC to get the best of both worlds.
Warning
Since Python is an interpreted language, pure Python code is independent of
the CPU microarchitecture, i.e., Python
scripts are identical on each machine where they are installed. There is
however a very important caveat: many scientific Python packages do their
heavy computational lifting inside libraries written in a lower-level
language (typically C, C++ or Fortran) and those libraries will typically
target one or more specific (micro)architectures. As a consequence, installed Python
packages can be different depending on the hardware on which they are
installed. To make sure your installation works and gives good performance,
we recommend creating a virtual environment on a node with the same
(micro)architecture as the nodes where the virtual environment will be used. This
is especially important for heterogeneous clusters, where the (micro)architecture may
differ across login nodes and cluster partitions. To get the (micro)architecture
of the current node, you can use the $VSC_ARCH_LOCAL environment
variable. If you need to use an environment on multiple (micro)architectures,
create a separate one for each. The vsc-venv utility can help with this.
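As an illustration of keeping one environment per (micro)architecture, the sketch below derives a directory name from `$VSC_ARCH_LOCAL`. The function name `venv_name` and the fallback to `platform.machine()` are assumptions made for this example; the fallback only yields a coarse value such as `x86_64` and is no substitute for the real (micro)architecture.

```python
import os
import platform


def venv_name(prefix="venv"):
    """Build an architecture-specific directory name such as 'venv-zen4'."""
    # VSC_ARCH_LOCAL is set on VSC clusters; elsewhere fall back to the
    # generic machine type, which is only a rough stand-in (assumption).
    arch = os.environ.get("VSC_ARCH_LOCAL") or platform.machine() or "unknown"
    return f"{prefix}-{arch}"


if __name__ == "__main__":
    print(venv_name())
```

A job script could use such a name to pick the environment that matches the node it happens to run on.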
Start by launching an interactive job in the cluster partition of choice, for example in the zen4 partition (see the documentation on running jobs on Slurm or Torque clusters for more information). Make sure to apply the options appropriate for the specific cluster you are working on.
srun --partition=zen4 --pty bash -l
Load a Python module as base of the virtual environment. Choose a Python version that is suitable for the additional Python packages that will be installed in the virtual environment:
module load Python/3.11.3-GCCcore-12.3.0
Optional: load other modules that provide additional Python packages:
module load SciPy-bundle/2023.07-gfbf-2023a
The Python software modules on the HPC include only a very limited list of Python packages, but many other modules are also available. A common software module is SciPy-bundle, a bundle of data science packages such as numpy, pandas, and scipy. The Python-bundle-PyPI modules also provide a number of popular Python packages. Use the method discussed in Python packages from modules to search for packages you need as dependencies.
Create your virtual environment.
Warning
Avoid creating your virtual environments in your home directory. The storage space of your home directory is very small and can quickly fill up with installation files. Use a folder in your personal $VSC_SCRATCH or $VSC_DATA storage, or in your VO if you are part of one. Also take into account that a virtual environment typically contains many small files, which parallel file systems do not handle well. If the large number of files becomes problematic, it is worthwhile to look into containerizing your virtual environment.
The following command creates a new virtual environment in the venv-zen4 directory (on a zen4 node, where $VSC_ARCH_LOCAL expands to zen4). The -zen4 suffix in the name indicates the (micro)architecture on which, and for which, this environment was created.
python3 -m venv venv-${VSC_ARCH_LOCAL} --system-site-packages
The option --system-site-packages ensures that the Python packages already available via the loaded modules are used instead of being installed again in the virtual environment.
Before we can use the virtual environment, we must activate it:
source venv-zen4/bin/activate
Once the virtual environment is active, its name, (venv-zen4) in this example, will be displayed in front of the shell prompt. Make sure to keep this virtual environment activated when executing the following steps.
We recommend always upgrading pip to the latest version:
(venv-zen4) $ pip install pip --upgrade
Now we can install additional Python packages inside the active virtual environment, for example the icecream package:
(venv-zen4) $ pip install icecream --no-cache-dir --no-build-isolation
The option --no-cache-dir disables pip's cache, so that the most recent compatible versions of the dependencies are fetched instead of reusing whatever is available in your cache.
The option --no-build-isolation ensures that the Cython compiler and other (build) dependencies from the loaded modules are used instead of building in an isolated environment.
It is possible to request a specific version of a package. For instance, to install tblite version 0.4.0 the command becomes:
(venv-zen4) $ pip install tblite==0.4.0 --no-cache-dir --no-build-isolation
Once your work is finished, use the deactivate command to deactivate your virtual environment:
(venv-zen4) $ deactivate
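To confirm from within Python whether a virtual environment is active, and whether it was created with --system-site-packages, a short check along the following lines can help. This is a sketch under stated assumptions: the function names are made up for this example. It relies on two documented behaviours of the standard venv module, namely that `sys.prefix` differs from `sys.base_prefix` inside an environment and that the environment directory contains a `pyvenv.cfg` file with an `include-system-site-packages` key.

```python
import sys
from pathlib import Path


def in_virtualenv():
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


def uses_system_site_packages(venv_dir):
    """Read pyvenv.cfg and report whether --system-site-packages was used."""
    cfg = Path(venv_dir) / "pyvenv.cfg"
    if not cfg.is_file():
        return False
    for line in cfg.read_text().splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    return False


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    if in_virtualenv():
        print("system site packages:", uses_system_site_packages(sys.prefix))
```

Placing such a check at the top of a job script's Python entry point can catch the common mistake of forgetting to activate the environment.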
Reactivating your virtual environment#
Whenever you want to go back to any of your virtual environments, make sure to:
Launch an interactive job in the same partition you used when creating the virtual environment (or add the sbatch --partition option to your job script):
srun --partition=zen4 --pty bash -l
Load the same software modules that you used in the creation of the virtual environment:
module load Python/3.11.3-GCCcore-12.3.0 SciPy-bundle/2023.07-gfbf-2023a
Reactivate the virtual environment:
source venv-zen4/bin/activate
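Because the same modules must be loaded every time, it can be useful to record them when the environment is created. The sketch below reads the colon-separated `LOADEDMODULES` environment variable, which module systems such as Lmod maintain; the function name `loaded_modules` is an assumption made for this example, and the output also includes modules that were loaded as dependencies.

```python
import os


def loaded_modules():
    """Return the currently loaded modules as a list of 'name/version' strings."""
    # LOADEDMODULES is a colon-separated list maintained by the module system.
    value = os.environ.get("LOADEDMODULES", "")
    return [entry for entry in value.split(":") if entry]


if __name__ == "__main__":
    # Run this while the environment is set up and keep the output with the venv.
    print("\n".join(loaded_modules()))
```

Redirecting the output to a file next to the virtual environment gives a simple record of what to load before reactivating it.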
Recreating an environment#
It is common to want to recreate an environment, for instance to install it for a different (micro)architecture or to give other people exactly the same package versions. This is easily achieved with pip list, which produces a list of the currently available packages and their versions:
$ source venv-zen4/bin/activate
(venv-zen4) $ pip list --format=freeze
...
icecream==2.1.8
...
numpy==1.25.1
...
By saving the listed packages to a file:
(venv-zen4) $ pip list --format=freeze > requirements.txt
it becomes easy to recreate the environment, in the following example for another (micro)architecture:
$ srun --partition=zen5 --pty bash -l
$ module load Python/3.11.3-GCCcore-12.3.0 SciPy-bundle/2023.07-gfbf-2023a
$ python3 -m venv venv-zen5 --system-site-packages
$ source venv-zen5/bin/activate
(venv-zen5) $ pip install -r requirements.txt
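After recreating an environment, it can be reassuring to verify that the installed versions match the pinned ones. The following is a minimal sketch, assuming a requirements.txt that consists of simple `name==version` lines as produced by the freeze format above; the function names `parse_pins` and `mismatches` are illustrative and not part of pip.

```python
from importlib import metadata
from pathlib import Path


def parse_pins(text):
    """Extract {name: version} from 'name==version' lines, skipping the rest."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, _, version = line.partition("==")
            pins[name.strip()] = version.strip()
    return pins


def mismatches(pins):
    """Return {name: (wanted, found)} for pins the environment does not satisfy."""
    result = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            result[name] = (wanted, found)
    return result


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.is_file():
        for name, (wanted, found) in mismatches(parse_pins(req.read_text())).items():
            print(f"{name}: wanted {wanted}, found {found}")
```

Run inside the newly created environment, the script prints nothing when every pinned package is present at the requested version.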
The vsc-venv utility#
As discussed earlier, it is recommended to create separate Python virtual
environments for separate architectures. Together with the fact that exactly
the same modules that were used during the environment creation need to be
loaded whenever the environment is used, managing Python virtual environments
on VSC clusters can become tedious. This is where the vsc-venv script
comes to the rescue: it encapsulates the creation and management of Python
virtual environments and avoids multiple issues with the default venv
included in Python. The vsc-venv command manages multiple virtual
environments for different clusters in a transparent way, while guaranteeing
the same module environment.
Usage#
A virtual environment can be activated by running the following command:
$ module load vsc-venv
$ source vsc-venv --activate --requirements requirements.txt [--modules modules.txt]
Here, requirements.txt is the path to a file containing the Python
dependencies to install in the virtual environment. For more information on
the requirements.txt file, see the pip documentation.
The optional --modules option can be used to provide a modules.txt file
that lists the modules to load before activating the virtual environment.
The modules are then loaded and the environment is activated automatically. When running this command for the first time, the dependencies from the requirements file are installed.
Now, the software can be run and Python packages installed in the virtual environment can be used, along with software provided via centrally installed modules.
You can get insights on the current environment using the following commands:
$ python --version # Python version
$ pip list # List of installed Python packages
$ module list # List of loaded modules
To deactivate the virtual environment, run:
$ source vsc-venv --deactivate
Example#
For this example, it is assumed the following files are present in the current directory:
modules.txt:
SciPy-bundle/2023.11-gfbf-2023b
Pillow/10.2.0-GCCcore-13.2.0
and requirements.txt:
beautifulsoup4==4.12.3
We run the following commands to create and activate the environment:
$ module load vsc-venv
$ source vsc-venv --activate --requirements requirements.txt --modules modules.txt
As this creates the virtual environment for the first time, a venvs
subdirectory is created in the current directory. Within venvs/, an
additional subdirectory is created for the virtual environment:
for example venv-RHEL8-zen2 (note that the name depends on the type of node
you are working on; it is automatically determined based on environment
variables like $VSC_ARCH_LOCAL and $VSC_OS_LOCAL).
Now, Python 3.11 is loaded and the numpy (provided by the SciPy-bundle
module), PIL (provided by the Pillow module), and bs4 Python
packages can be used.
To deactivate the virtual environment, run:
$ source vsc-venv --deactivate
If we want to create a virtual environment for another (micro)architecture, simply
repeat the steps above on a node of that (micro)architecture. After this, the
venvs directory will contain an additional subdirectory with the virtual
environment for the new (micro)architecture.
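To see at a glance for which node types an environment already exists, the venvs directory can be inspected. The sketch below assumes that each environment is a standard venv directory containing a `pyvenv.cfg` marker file; whether vsc-venv lays out its environments exactly this way is an assumption, and the function name `list_environments` is made up for this example.

```python
from pathlib import Path


def list_environments(root="venvs"):
    """Return the sorted names of environment directories found under `root`."""
    base = Path(root)
    if not base.is_dir():
        return []
    # A directory counts as an environment when it contains pyvenv.cfg,
    # the marker file written by Python's venv module (assumed layout).
    return sorted(p.name for p in base.iterdir() if (p / "pyvenv.cfg").is_file())


if __name__ == "__main__":
    for name in list_environments():
        print(name)
```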
uv as Python package manager#
The discussion thus far made use of pip as the package manager, but there are other options available. This section shows in particular how uv (an extremely fast Python package and project manager written in Rust) can be used. Some advantages of uv over pip are that it resolves dependencies much faster and that it can provide any desired Python version (whereas pip relies on existing Python installations).
The first step is getting uv itself, which usually can be done conveniently on the cluster by loading a module, for instance:
$ module load uv/0.4.20-GCCcore-13.3.0
uv provides a so-called pip interface, which makes it very easy to use uv if you are familiar with pip. For instance, to create a virtual environment with uv similar to the one discussed in a previous section, you can use the following commands:
$ uv venv venv-zen4 --python 3.11
$ source venv-zen4/bin/activate
$ uv pip install icecream --no-cache-dir --no-build-isolation
$ deactivate
Most of the remarks and warnings from before apply to uv as well. For more specific uv information, please consult the uv documentation pages.