wICE hardware

wICE is KU Leuven/UHasselt's latest Tier-2 cluster. It has thin nodes, large memory nodes, interactive nodes and GPU nodes, and has been in production since February 2023.

Hardware details

  • 172 thin nodes

    • 2 Intel Xeon Platinum 8360Y CPUs@2.4 GHz (Ice Lake), 36 cores each
      (1 NUMA domain and 1 L3 cache per CPU)

    • 256 GiB RAM

    • 960 GB SSD local disk

    • partitions batch/batch_long (see the submit options; an example job script is given below the hardware diagram)

  • 5 big memory nodes

    • 2 Intel Xeon Platinum 8360Y CPUs@2.4 GHz (Ice Lake), 36 cores each
      (2 NUMA domains and 1 L3 cache per CPU)

    • 2048 GiB RAM

    • 960 GB SSD local disk

    • partition bigmem (see the submit options)

  • 4 GPU nodes, 16 GPU devices

    • 2 Intel Xeon Platinum 8360Y CPUs@2.4 GHz (Ice Lake), 36 cores each
      (2 NUMA domains and 1 L3 cache per CPU)

    • 512 GiB RAM

    • 4 NVIDIA A100 SXM4 GPUs, 80 GiB HBM2e each, connected with NVLink

    • 960 GB SSD local disk

    • partition gpu (see the submit options)

  • 5 interactive nodes

    • 2 Intel Xeon Platinum 8358 CPUs@2.6 GHz (Ice Lake), 32 cores each
      (2 NUMA domains and 1 L3 cache per CPU)

    • 512 GiB RAM

    • 1 NVIDIA A100, 80 GiB HBM2e

    • 960 GB SSD local disk

    • partition interactive (see the submit options)

The nodes are connected using an InfiniBand HDR-100 network; the network islands are indicated in the diagram below.

wICE hardware diagram
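
As an illustration of the submit options for the partitions listed above, the following is a minimal sketch of a Slurm job script for a single thin node in the batch partition. It assumes the usual VSC conventions of selecting the cluster with --clusters=wice and charging a credit account with --account; the account name lp_myproject, the module and the executable are placeholders to be replaced with your own.

```bash
#!/bin/bash -l
#SBATCH --clusters=wice          # target the wICE cluster (assumed cluster name)
#SBATCH --partition=batch        # thin nodes; batch_long for longer walltimes
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=72     # 2 x 36 Ice Lake cores per thin node
#SBATCH --time=02:00:00          # requested walltime (hh:mm:ss)
#SBATCH --account=lp_myproject   # placeholder credit account

# Placeholder software environment and application.
module load foss/2023a
srun ./my_mpi_application
```

The same structure applies to the other partitions: replacing batch with bigmem, gpu or interactive (and adjusting the resource requests accordingly) targets the corresponding node types.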

Hardware details (extension)

We are currently installing and testing additional hardware for wICE, which will be made accessible during the spring of 2024:

  • 68 thin nodes

    • 2 Intel Xeon Platinum 8468 CPUs (Sapphire Rapids), 48 cores each
      (4 NUMA domains and 1 L3 cache per CPU)
      The base and max CPU frequencies are 2.1 GHz and 3.8 GHz, respectively.

    • 256 GiB RAM

    • 960 GB SSD local disk

    • partitions batch_sapphirerapids/batch_sapphirerapids_long

  • 4 GPU nodes, 16 GPU devices

    • 2 AMD EPYC 9334 CPUs (Genoa), 32 cores each
      (4 NUMA domains and 4 L3 caches per CPU)
      The base and max CPU frequencies are 2.7 GHz and 3.9 GHz, respectively.

    • 768 GiB RAM

    • 4 NVIDIA H100 SXM5 GPUs, 80 GiB HBM3 each, connected with NVLink

    • 960 GB SSD local disk

    • partition gpu_h100 (an example job request for this partition is given at the end of this section)

  • 1 huge memory node

    • 2 Intel Xeon Platinum 8360Y CPUs (Ice Lake), 36 cores each
      (1 NUMA domain and 1 L3 cache per CPU)
      The base and max CPU frequencies are 2.4 GHz and 3.5 GHz, respectively.

    • 8 TiB RAM

    • 960 GB SSD local disk

    • partition hugemem

Of these additional nodes, only the thin nodes are interconnected with InfiniBand HDR-100, all of them inside the same network island. The GPU nodes can only communicate over Ethernet (no high-performance interconnect). All nodes are, however, connected to the Lustre parallel file system through an InfiniBand HDR-100 network.

The thin nodes and GPU nodes are furthermore the first ones in the data center to use direct liquid cooling.
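
Once the gpu_h100 partition becomes available, a GPU job request could look like the sketch below. It again assumes Slurm with the same --clusters=wice and --account conventions as above; pairing 16 cores with each GPU simply divides the 64 Genoa cores of a node evenly over its 4 H100 devices, and the module and executable names are placeholders.

```bash
#!/bin/bash -l
#SBATCH --clusters=wice          # target the wICE cluster (assumed cluster name)
#SBATCH --partition=gpu_h100     # new H100 nodes; use gpu for the existing A100 nodes
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1        # one of the 4 H100 devices in the node
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16       # 64 Genoa cores / 4 GPUs = 16 cores per GPU
#SBATCH --time=01:00:00          # requested walltime (hh:mm:ss)
#SBATCH --account=lp_myproject   # placeholder credit account

# Placeholder software environment and application.
module load CUDA/12.2.0
srun ./my_gpu_application
```

For multi-node GPU runs, keep in mind that these H100 nodes only communicate over Ethernet, so inter-node scaling will be more limited than on the InfiniBand-connected thin nodes.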