wICE quick start guide

wICE is the most recent KU Leuven/UHasselt Tier-2 cluster. It can be used for most workloads and has big memory nodes as well as GPU nodes.

wICE does not have separate login nodes; it can be accessed from the Genius login nodes.

Running jobs on wICE

There are several types of nodes in the wICE cluster: normal compute nodes, GPU nodes, big memory nodes and nodes configured for interactive use. The resource specifications of your jobs have to be tuned to use these nodes properly.

In general, the maximum walltime for wICE jobs is 3 days (72 hours). Only jobs submitted to the batch_long partition are allowed to have walltimes up to 7 days (168 hours), as illustrated below.
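
You can list the wICE partitions and their maximum walltimes yourself with sinfo (the -M option selects the cluster):

$ sinfo -M wice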

The wICE cluster uses a different workload manager than Genius: Slurm instead of Torque+Moab. More information about converting PBS scripts and commands to Slurm can be found here.
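
As a rough illustration (the resource values are just an example), Torque directives in a Genius job script such as

#PBS -l nodes=2:ppn=36
#PBS -l walltime=2:00:00
#PBS -A lp_myproject

would correspond to the following Slurm directives in a wICE job script:

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=36
#SBATCH --time=2:00:00
#SBATCH --account=lp_myproject

and the job is then submitted with sbatch instead of qsub.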

Submit to a compute node

In these examples, jobs are submitted from the login nodes of the Genius cluster, so you always need to specify the cluster you want to use.

To submit to a compute node, you need to provide the required number of nodes and cores. For example, to request 2 nodes with 72 cores each for 2 hours, you can submit like this:

$ sbatch --cluster=wice --nodes=2 --ntasks-per-node=72 --time=2:00:00  -A lp_myproject  myjobscript.slurm
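
The resource options can also be placed inside the job script itself as #SBATCH directives. As a minimal sketch (the module and program names below are placeholders for the software you actually use), myjobscript.slurm could look like this:

#!/bin/bash -l
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=72
#SBATCH --time=2:00:00
#SBATCH --account=lp_myproject

# load the software stack you need (placeholder module)
module load foss/2022a

# launch the (MPI) program; my_program is a placeholder
srun ./my_program

The job is then submitted with "sbatch --cluster=wice myjobscript.slurm".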

More information about the accounting used on wICE can be found here.

Submit a long job to a compute node

To submit a job longer than 3 days to a compute node, you need to request the batch_long partition:

$ sbatch --cluster=wice --nodes=2 --ntasks-per-node=72 --time=6-16:00:00 --partition=batch_long -A lp_myproject  myjobscript.slurm
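
After submission, you can check the status of your jobs on wICE with squeue, again specifying the cluster:

$ squeue -M wice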

Submit to the interactive partition

The interactive nodes are located in a separate partition. Users are allowed to request at most 8 cores for a maximum walltime of 16 hours. These nodes are intended for interactive use: instead of submitting a job script, you open an interactive session on a compute node as follows:

$ srun -n 1 -t 01:00:00 -A lp_myproject --partition=interactive --cluster=wice --pty bash -l

If a GPU is needed for visualization, it can be requested as well (at most 1 GPU and 8 cores for at most 16 hours). The available GPU is a single A100 which has been split into 7 GPU instances (one of which will be allocated to your job). Additionally, X11 forwarding should be enabled:

$ srun -N 1 -t 16:00:00 --ntasks-per-node=8 --gpus-per-node=1 -A lp_myproject -p interactive --cluster=wice --x11 --pty bash -l
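
Once the session has started, you can for example check which GPU (MIG) instance was assigned to you:

$ nvidia-smi -L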

Note

The interactive partition is intended for relatively lightweight interactive work, such as compiling software, running small preprocessing scripts, small-scale debugging, or visualization. This is why the amount of resources you can request per job is limited on this partition. If you do need to do heavy computational work in an interactive way, it is also possible to submit interactive jobs to the other partitions. For instance, suppose you need to debug a program using more than 8 cores: you can then use the command above, changing the partition to batch, gpu, or bigmem and adapting the resources as needed, as shown in the example below. Do note that it is generally recommended to run heavy computational work as a batch job script (so without opening an interactive terminal on the compute node).
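
As a sketch, such a heavier interactive session (here a single task with 36 cores on a regular compute node, for 2 hours) could be started like this; adapt the partition, resources and project to your own needs:

$ srun -n 1 -c 36 -t 02:00:00 -A lp_myproject --partition=batch --cluster=wice --pty bash -l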

Submit to a big memory node

The big memory nodes (2048 GB of RAM) are also located in a separate partition. For these nodes it is also important to add your memory requirements; at most 28000 MB of memory per core can be requested (72 cores x 28000 MB is about 2016 GB, just below the total memory of a node). For example:

$ sbatch --cluster=wice --time=01:00:00 --nodes=2 --ntasks-per-node=72 --partition=bigmem --mem-per-cpu=28000M --account=lp_myproject myjobscript.slurm
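
The same request can be expressed as #SBATCH directives inside the job script; a minimal sketch (my_program is a placeholder):

#!/bin/bash -l
#SBATCH --time=01:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=72
#SBATCH --partition=bigmem
#SBATCH --mem-per-cpu=28000M
#SBATCH --account=lp_myproject

srun ./my_program

which is then submitted with "sbatch --cluster=wice myjobscript.slurm".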

Submit to a GPU node

The GPU nodes are also located in a separate partition, so you need to specify it explicitly when submitting your job. The GPU nodes are also configured as shared resources, meaning that different users can simultaneously use portions of the same node. However, every user gets exclusive access to the number of GPUs requested. If you want to use a single A100 GPU, you can for example submit like this:

$ sbatch --cluster=wice -A lp_myproject -N 1 --ntasks=18 --gpus-per-node=1 --partition=gpu myjobscript.slurm

Note that for 1 GPU you have to request 18 cores. If you need more GPUs, multiply the 18 cores by the number of GPUs requested; for example, for 3 GPUs you have to specify this:

$ sbatch --cluster=wice -A lp_myproject -N 1 --ntasks=54 --gpus-per-node=3 --partition=gpu myjobscript.slurm
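
As a sketch, the job script for such a GPU job could contain for example (the CUDA module version and program name are placeholders):

#!/bin/bash -l
#SBATCH --account=lp_myproject
#SBATCH --nodes=1
#SBATCH --ntasks=54
#SBATCH --gpus-per-node=3
#SBATCH --partition=gpu

# load a CUDA toolkit (placeholder module/version)
module load CUDA/11.7.0

# run the GPU application; my_gpu_program is a placeholder
./my_gpu_program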