Memory bandwidth and latency measurements on KU Leuven Tier-2
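This page lists memory bandwidths and latencies for main memory, as well as
latencies for the L2 and last-level cache (LLC), for each Tier-2 node type.
The measurements were performed with Intel's Memory Latency Checker (mlc).
Bandwidths are reported in MB/sec (1 MB/sec = 1,000,000 bytes/sec) and
latencies in ns.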
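If you save mlc output to a file and want to post-process it, the peak injection bandwidth lines can be pulled out with a regular expression. The Python sketch below assumes the line format of the mlc v3.10 output on this page (for example `ALL Reads : 351762.1`); other mlc versions may format these lines differently. `PEAK_LINE` and `parse_peak_injection` are illustrative names, not part of mlc.

```python
from __future__ import annotations

import re

# Assumption: lines look like the mlc v3.10 output on this page, e.g.
# "ALL Reads : 351762.1" or "3:1 Reads-Writes : 313084.7".
# \s* also accepts the tabs and padding mlc prints before the value.
PEAK_LINE = re.compile(
    r"(ALL Reads|\d+:\d+ Reads-Writes|Stream-triad like)\s*:\s*([\d.]+)"
)


def parse_peak_injection(text: str) -> dict[str, float]:
    """Return {traffic mix: bandwidth in MB/sec} from saved mlc output.

    mlc uses decimal units: 1 MB/sec = 1,000,000 bytes/sec.
    """
    return {mix: float(value) for mix, value in PEAK_LINE.findall(text)}


# Peak injection section of the icelake output, as plain text.
sample = (
    "ALL Reads : 351762.1 3:1 Reads-Writes : 313084.7 "
    "2:1 Reads-Writes : 297154.4 1:1 Reads-Writes : 291841.4 "
    "Stream-triad like: 320198.4"
)
peaks = parse_peak_injection(sample)
for mix, mb_per_s in peaks.items():
    # Decimal conversion: 1 GB/sec = 1000 MB/sec.
    print(f"{mix:18s} {mb_per_s / 1000:7.1f} GB/s")
```

The same function works on the line-broken form of the output, since the pattern does not depend on newlines.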
icelake nodes (Tier-2)
    Intel(R) Memory Latency Checker - v3.10
    *** Unable to modify prefetchers (try executing 'modprobe msr')
    *** So, enabling random access for latency measurements
    Measuring idle latencies for random access (in ns)...
                    Numa node
    Numa node            0        1
           0          89.2    142.7
           1         142.9     89.0

    Measuring Peak Injection Memory Bandwidths for the system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using traffic with the following read-write ratios
    ALL Reads        :  351762.1
    3:1 Reads-Writes :  313084.7
    2:1 Reads-Writes :  297154.4
    1:1 Reads-Writes :  291841.4
    Stream-triad like:  320198.4

    Measuring Memory Bandwidths between nodes within system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using Read-only traffic type
                    Numa node
    Numa node            0           1
           0      175690.9     55676.8
           1       55692.3    175912.6

    Measuring Loaded Latencies for the system
    Using all the threads from each core if Hyper-threading is enabled
    Using Read-only traffic type
    Inject  Latency  Bandwidth
    Delay   (ns)     MB/sec
    ==========================
     00000   264.04   351871.6
     00002   265.59   351747.1
     00008   259.87   351724.8
     00015   243.81   350234.2
     00050   184.82   327723.9
     00100   153.80   297201.4
     00200   109.56   173692.5
     00300   102.11   123303.1
     00400    99.73    95482.2
     00500    98.46    77827.2
     00700   100.06    56746.4
     01000    95.63    41144.3
     01300    95.68    32297.0
     01700    94.76    25179.4
     02500    94.09    17571.8
     03500    94.28    12837.9
     05000    93.89     9247.0
     09000    92.56     5483.5
     20000    91.23     2867.0

    Measuring cache-to-cache transfer latency (in ns)...
    Local Socket L2->L2 HIT  latency    60.1
    Local Socket L2->L2 HITM latency    64.2
    Remote Socket L2->L2 HITM latency (data address homed in writer socket)
                            Reader Numa Node
    Writer Numa Node     0        1
                0        -    127.1
                1    127.2        -
    Remote Socket L2->L2 HITM latency (data address homed in reader socket)
                            Reader Numa Node
    Writer Numa Node     0        1
                0        -    127.0
                1    127.4        -
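The NUMA penalty can be read directly off the two node-by-node matrices in the icelake output: diagonal entries are local accesses, off-diagonal entries are remote ones. As a small illustration, the Python sketch below averages both sets for the idle-latency and read-only bandwidth matrices; `numa_summary` is a hypothetical helper. With these numbers, remote accesses take about 1.6 times as long as local ones, and remote bandwidth is roughly one third of local bandwidth.

```python
# icelake idle latency (ns) and read-only bandwidth (MB/sec) matrices,
# copied from the mlc output above. Row and column indices are NUMA nodes.
idle_latency_ns = [
    [89.2, 142.7],
    [142.9, 89.0],
]
bandwidth_mb_s = [
    [175690.9, 55676.8],
    [55692.3, 175912.6],
]


def numa_summary(matrix):
    """Return (mean local value, mean remote value, remote/local ratio).

    Local = diagonal entries, remote = all off-diagonal entries.
    """
    n = len(matrix)
    local = [matrix[i][i] for i in range(n)]
    remote = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    local_mean = sum(local) / len(local)
    remote_mean = sum(remote) / len(remote)
    return local_mean, remote_mean, remote_mean / local_mean


lat_local, lat_remote, lat_ratio = numa_summary(idle_latency_ns)
bw_local, bw_remote, bw_ratio = numa_summary(bandwidth_mb_s)
print(f"idle latency: local {lat_local:.1f} ns, remote {lat_remote:.1f} ns "
      f"(x{lat_ratio:.2f})")
print(f"bandwidth:    local {bw_local:.0f} MB/s, remote {bw_remote:.0f} MB/s "
      f"(x{bw_ratio:.2f})")
```

In practice this is why binding processes and their memory to the same NUMA node matters on these dual-socket nodes.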
cascadelake nodes (Tier-2)
    Intel(R) Memory Latency Checker - v3.10
    *** Unable to modify prefetchers (try executing 'modprobe msr')
    *** So, enabling random access for latency measurements
    Measuring idle latencies for random access (in ns)...
                    Numa node
    Numa node            0        1
           0          80.8    140.4
           1         141.6     81.5

    Measuring Peak Injection Memory Bandwidths for the system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using traffic with the following read-write ratios
    ALL Reads        :  225630.8
    3:1 Reads-Writes :  208349.9
    2:1 Reads-Writes :  206510.3
    1:1 Reads-Writes :  200031.8
    Stream-triad like:  187482.4

    Measuring Memory Bandwidths between nodes within system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using Read-only traffic type
                    Numa node
    Numa node            0           1
           0      113073.7     34305.5
           1       34322.2    112810.7

    Measuring Loaded Latencies for the system
    Using all the threads from each core if Hyper-threading is enabled
    Using Read-only traffic type
    Inject  Latency  Bandwidth
    Delay   (ns)     MB/sec
    ==========================
     00000   155.60   226127.7
     00002   156.23   226129.4
     00008   155.32   226082.8
     00015   155.90   226627.2
     00050   157.25   221318.3
     00100   119.65   181499.4
     00200   103.11   120242.4
     00300    97.46    87252.6
     00400    94.40    67737.9
     00500    94.95    55202.5
     00700    90.12    40541.6
     01000    88.52    29035.8
     01300    87.39    22709.1
     01700    88.36    17635.5
     02500    85.50    12323.4
     03500    84.00     9059.1
     05000    83.91     6582.9
     09000    86.54     3975.8
     20000    82.21     2240.7

    Measuring cache-to-cache transfer latency (in ns)...
    Local Socket L2->L2 HIT  latency    48.8
    Local Socket L2->L2 HITM latency    49.3
    Remote Socket L2->L2 HITM latency (data address homed in writer socket)
                            Reader Numa Node
    Writer Numa Node     0        1
                0        -    114.1
                1    113.2        -
    Remote Socket L2->L2 HITM latency (data address homed in reader socket)
                            Reader Numa Node
    Writer Numa Node     0        1
                0        -    114.0
                1    113.5        -
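The loaded-latency table ties bandwidth and latency together. Assuming Little's law applies (average requests outstanding = throughput times latency) and that each request moves one 64-byte cache line, the Python sketch below estimates how many cache lines are in flight for a few rows of the cascadelake table. This is a back-of-the-envelope interpretation, not a quantity mlc reports; `lines_in_flight` and `CACHE_LINE_BYTES` are hypothetical names.

```python
CACHE_LINE_BYTES = 64  # assumption: 64-byte cache lines (x86)

# (inject delay, latency in ns, bandwidth in MB/sec): a subset of rows from
# the cascadelake loaded-latency table above.
loaded = [
    (0, 155.60, 226127.7),
    (100, 119.65, 181499.4),
    (1000, 88.52, 29035.8),
    (20000, 82.21, 2240.7),
]


def lines_in_flight(latency_ns: float, bandwidth_mb_s: float) -> float:
    """Little's law estimate of outstanding cache-line requests."""
    bytes_per_s = bandwidth_mb_s * 1e6  # mlc: 1 MB/sec = 1,000,000 bytes/sec
    bytes_in_flight = bytes_per_s * latency_ns * 1e-9  # ns -> s
    return bytes_in_flight / CACHE_LINE_BYTES


for delay, lat, bw in loaded:
    print(f"delay {delay:5d}: {lat:6.2f} ns, {bw:9.1f} MB/s -> "
          f"~{lines_in_flight(lat, bw):6.1f} lines outstanding")
```

At zero inject delay this gives roughly 550 lines outstanding across the whole node, versus about 3 at the largest delay, which is consistent with latency rising as the memory system approaches its peak bandwidth.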
skylake nodes (Tier-2)
    Intel(R) Memory Latency Checker - v3.10
    *** Unable to modify prefetchers (try executing 'modprobe msr')
    *** So, enabling random access for latency measurements
    Measuring idle latencies for random access (in ns)...
                    Numa node
    Numa node            0        1
           0          90.2    144.3
           1         143.2     89.1

    Measuring Peak Injection Memory Bandwidths for the system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using traffic with the following read-write ratios
    ALL Reads        :  211194.2
    3:1 Reads-Writes :  193422.0
    2:1 Reads-Writes :  192554.0
    1:1 Reads-Writes :  185556.1
    Stream-triad like:  172770.4

    Measuring Memory Bandwidths between nodes within system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using Read-only traffic type
                    Numa node
    Numa node            0           1
           0      104362.2     34290.6
           1       34283.3    106028.4

    Measuring Loaded Latencies for the system
    Using all the threads from each core if Hyper-threading is enabled
    Using Read-only traffic type
    Inject  Latency  Bandwidth
    Delay   (ns)     MB/sec
    ==========================
     00000   166.18   210643.1
     00002   177.85   208001.7
     00008   166.21   206740.7
     00015   167.47   206801.3
     00050   163.31   201772.4
     00100   128.73   164062.6
     00200   116.34   110208.0
     00300   105.64    79347.3
     00400   103.73    61575.2
     00500   101.35    50291.8
     00700   102.52    36809.1
     01000    96.23    26420.4
     01300    95.39    20649.2
     01700    94.13    16054.2
     02500    93.31    11207.0
     03500    95.06     8212.5
     05000    92.51     5982.5
     09000    92.46     3634.3
     20000    91.66     2027.6

    Measuring cache-to-cache transfer latency (in ns)...
    Local Socket L2->L2 HIT  latency    50.5
    Local Socket L2->L2 HITM latency    49.9
    Remote Socket L2->L2 HITM latency (data address homed in writer socket)
                            Reader Numa Node
    Writer Numa Node     0        1
                0        -    114.0
                1    114.6        -
    Remote Socket L2->L2 HITM latency (data address homed in reader socket)
                            Reader Numa Node
    Writer Numa Node     0        1
                0        -    114.0
                1    123.2        -
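To compare the three node types side by side, the Python sketch below takes the `ALL Reads` peak injection bandwidth from each mlc output on this page, converts it from mlc's decimal MB/sec to GB/s and GiB/s, and normalizes it to the skylake nodes. The dictionary simply holds numbers copied from the outputs above. On these figures the icelake nodes reach about 1.67 times the skylake all-read bandwidth, and the cascadelake nodes about 1.07 times.

```python
# Peak injection bandwidth for all-read traffic, in MB/sec
# (1 MB/sec = 10**6 bytes/sec), copied from the three mlc outputs.
all_reads_mb_s = {
    "skylake": 211194.2,
    "cascadelake": 225630.8,
    "icelake": 351762.1,
}

baseline = all_reads_mb_s["skylake"]
for node, mb_s in all_reads_mb_s.items():
    gb_s = mb_s / 1000            # decimal GB/s
    gib_s = mb_s * 1e6 / 2**30    # binary GiB/s, for tools that report GiB
    print(f"{node:12s} {gb_s:6.1f} GB/s  {gib_s:6.1f} GiB/s  "
          f"x{mb_s / baseline:.2f} vs skylake")
```

Keeping track of decimal versus binary units matters when comparing these numbers with other benchmarks, since the two differ by about 7%.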