What are standard terms used in HPC?#

HPC cluster

A relatively tightly coupled collection of compute nodes, the interconnect typically allows for high bandwidth, low latency communication. Access to the cluster is provided through a login node. A resource manager and scheduler provide the logic to schedule jobs efficiently on the cluster. A detailed description of the VSC clusters and other hardware is available.

Compute node

An individual computer, part of an HPC cluster. Currently most compute nodes have two sockets, each with a single CPU, volatile working memory (RAM), a hard drive, typically small, and only used to store temporary files, and a network card. The hardware specifications for the various VSC compute nodes is available.

CPU

Central Processing Unit, the chip that performs the actual computation in a compute node. A modern CPU is composed of numerous cores, typically 10 to 36. It has also several cache levels that help in data reuse.

Core

Part of a modern CPU, a core is capable of running processes, and has its own processing logic and floating point unit. Each core has its own level 1 and level 2 cache for data and instructions. Cores share last level cache.

Cache

A relatively small amount of (very) fast volatile memory (when compared to regular RAM), on the CPU chip. A modern CPU has three cache level, L1 and L2 are specific to each core, while L3 (also referred to as Last Level Cache, LLC) is shared among all the cores of a CPU.

RAM

Random Access Memory used as working memory for the CPUs. On current hardware, the size of RAM is expressed in gigabytes (GB). The RAM is shared between the two CPUs on each of the sockets. This is volatile memory in the sense that once the process that creates the data ends, the data in the RAM is no longer available. The complete RAM can be accessed by each core.

Walltime

The actual time an application runs (as in clock on the wall), or is expected to run. When submitting a job, the walltime refers to the maximum time the application can run, i.e., the requested walltime. For accounting purposes, the walltime is the time the application actually ran, typically less than the requested walltime.

Node-hour

Unit of work indicating that an application ran for a time t on n nodes, such that n*t = 1 hour. Using 1 node for 1 hour is 1 node-hour. This is irrespective of the number of cores on the node you actually use.

Node-day

Unit of work indicating that an application ran for a time t on n nodes such that n*t = 24 hours. Using 3 nodes for 8 hours results in 1 node day.

Core-hour

Unit of work indicating that an application ran for a time t on p cores, such that p*t = 1 hour. Using 20 cores, no matter on how many nodes, for 1 hour results in 20 core-hours.

Memory requirement

The amount of RAM required to successfully run an application. It can be specified per process for a distributed application, expressed in GB.

Storage requirement

The amount of disk space required to store the input and output of an application, expressed in GB or TB.

Temporary storage requirement

The amount of disk space needed to store temporary files during the run of an application, expressed in GB or TB.

Single user per node policy

Indicates that when a process of user A runs on a compute node, no process of another user will run on that compute node concurrently, i.e., the compute node will be exclusive to user A. However, if one or more processes of user A are running on a compute node, and that node’s capacity in terms of available cores and memory is not exceeded, processes that are part of another job submitted by user A may start on that compute node.

Single job per node policy

Indicates that when a process of a job is running on a compute node, no other job will concurrently run on that node, regardless of the resource that still remain available.

Serial application

A program that runs a single process, with a single thread. All computations are done sequentially, i.e., one after the other, no explicit parallelism is used.

Shared memory application

An application that uses multiple threads for its computations, ideally concurrently executed on multiple cores, one per thread. Each thread has access to the application’s global memory space (hence the name), and has some thread-private memory. A shared memory application runs on a single compute node. Such an application is also referred to as a multi-core or a multi-threaded application.

Threads

A process can concurrently perform multiple computations, i.e., program flows. In scientific applications, threads typically process their own subset of data, or a subset of loop iterations.

OpenMP

A standard for shared memory programming using C/C++/Fortran that makes abstraction of explicit threads. OpenMP is widely used for scientific programming.

Distributed application

An application that uses multiple processes. The application’s processes can run on multiple compute nodes. These processes communicate by exchanging messages, typically implemented by calls to an MPI library. Messages can be used to exchange data and coordinate the execution.

Process

An independent computation running on a computer. It may interact with other processes, and it may run multiple threads. A serial and shared memory application run as a single process, while a distributed application consists of multiple, coordinated processes.

MPI

Message Passing Interface, a de-facto standard that defines functions for inter-process communication. Many implementations in the form of libraries exist for C/C++/Fortran, some vendor specific.

GPU

A Graphical Processing Unit is a hardware component specifically designed to perform graphics related tasks efficiently. GPUs have been pressed into service for scientific computing. A compute node can be equipped with multiple GPUs. Software has to be designed specifically to use GPUs, and for scientific computing, CUDA and OpenACC are the most popular programming paradigms.

GPGPU

General Purpose computing on Graphical Processing Units refers to using graphic accelerators for non-graphics related tasks such as scientific computing.

CUDA

Compute Unified Device Architecture, an extension to the C programming language to develop software that can use GPU for computations. CUDA application run exclusively on NVIDIA hardware.

OpenACC

Open ACCelerators is a standard for developing C/C++/Fortran applications that can use GPUs for general purpose computing. OpenACC is mainly targeted to scientific computing.