In HPC we talk about Slots, Cores and CPUs: what does it all mean?

Dr Rosemary Francis
Aug 15, 2024

In High Performance Computing (HPC) the terms slots, cores and CPUs are often used interchangeably, but they are not quite the same, so I’m going to explore the differences here.

Most readers of this post probably have a reasonable handle on CPUs vs cores. In the past a CPU was a single core. Then, as CPUs got larger, designers started to duplicate various components so that instructions could be executed in parallel. Later came hyper-threading, also called simultaneous multithreading or SMT, where multiple threads could be executed in parallel across those duplicated components. Finally, we got multi-core architectures, which became common in the mid-2000s. These days CPUs with 48 cores are increasingly common. We even have exotic architectures that present multiple virtual cores for each physical core on a device, so that the machine looks like it has more cores than it really has, but with hardware support for virtualization and the associated isolation.

So, the hardware-centric view is that “CPU” is short for “CPU socket”, with multiple cores per socket and multiple SMT threads per core. The largest servers have multiple sockets, so they may have multiple CPUs, each with many cores. The more Linux-centric and OS-centric definition is that “CPU” is short for “logical CPU”, capable of running one SMT thread. AWS refers to logical CPUs as vCPUs, as does Azure. Microsoft specifically gives the definition of a vCPU as being a core on a physical machine. Thus CPU can mean socket, core, or thread, and core can mean CPU, core, or thread. In HPC, for the most part, when we talk about CPUs or cores we really mean logical CPUs, each capable of running one thread and presented as a CPU core to the OS.
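As a rough illustration of how the three levels multiply up, here is a minimal Python sketch. The function name and the example server (two sockets, 24 cores per socket, two SMT threads per core) are hypothetical and chosen only for the arithmetic; `os.cpu_count()` is the standard-library call that reports the number of logical CPUs the OS sees.

```python
import os


def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Logical CPUs presented to the OS: sockets x cores per socket x SMT threads per core."""
    return sockets * cores_per_socket * threads_per_core


# Hypothetical two-socket server with 24 cores per socket and 2 SMT threads per core.
print(logical_cpus(2, 24, 2))  # 96

# Logical CPUs reported by the OS on the machine running this script (may be None).
print(os.cpu_count())
```

On a machine with SMT enabled, the second number is typically the thread count rather than the physical core count, which is the sense of “CPU” that most HPC tooling uses.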

Where do slots come in? Slots offer a means to control access to resources on a compute node while abstracting away from the physical architecture. In HPC the number of slots available usually corresponds to the number of cores on a machine. They are not always the same, as we will see below, but it’s a good place to start. So if you have 64 slots available on a compute node then you can run 64 one-core applications (jobs) on that node, or you can run four 16-core jobs, or any other combination that does not exceed 64 cores. Workload managers vary in their ability to control CPU affinity. If a job requests one core and then uses more than that, most workload managers will permit it by default. Many have settings that allow stricter allocation and enforcement of resources.
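The slot accounting described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular workload manager is implemented: the `Node` class and `try_start` method are hypothetical names, and the model tracks free slots only, with no CPU affinity or enforcement.

```python
class Node:
    """Toy compute node that tracks free slots only (no affinity, no enforcement)."""

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.used = 0

    def try_start(self, requested: int) -> bool:
        """Admit a job if enough slots remain; return whether it was admitted."""
        if requested <= 0:
            raise ValueError("a job must request at least one slot")
        if self.used + requested > self.slots:
            return False
        self.used += requested
        return True


node = Node(slots=64)
# Four 16-core jobs fill the node; a fifth has to wait.
admitted = [node.try_start(16) for _ in range(5)]
print(admitted)  # [True, True, True, True, False]
```

Note that nothing in this model stops an admitted job from using more cores than it requested, which mirrors the default behaviour described above.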

How do specific workload managers handle slots, cores and CPUs?

One difficult aspect of HPC is that different workload managers use different terminology for slots, cores and CPUs.

Altair PBS Professional uses the term ncpus to control access to slots. Jobs request ncpus, and once the ncpus on a compute node have been exhausted no other jobs will be scheduled on that node. PBS Professional can be configured to define and allocate ncpus as either cores or threads. By default, resources_available.ncpus is set to the number of SMT threads reported by the OS, but PBS can also be configured to set it to the number of physical cores. In addition, PBS can optionally make the SMT threads corresponding to the allocated cores available to the job processes without counting them in resources_available.ncpus, so that they are not individually scheduled resources.

Altair Grid Engine instead uses the term slots to control access to slots. As in PBS, the number of slots on a machine usually corresponds to the number of SMT threads available, but could also correspond to the number of cores or may be an entirely different value. I will discuss when this could be different later on.

Altair Accelerator is a little different in that it allows independent selection of cores and slots. The number of cores it makes available for scheduling always corresponds to the number of cores on the machine. The number of slots usually corresponds to the number of jobs that can run on that machine, which may be the same as or less than the number of cores. Although a job may request a number of slots and cores alongside other requests such as memory or licenses, it is common for users to request cores only and for each job to consume one slot. In this way the concept of cores in Accelerator is a closer match for the concept of slots in Grid Engine or ncpus in PBS.
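To make the idea of two independent limits concrete, here is a minimal Python sketch of a node that tracks cores and slots separately. It is a toy model only and does not reflect Accelerator's actual interface: the `DualNode` class, its fields and the default of one slot per job are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class DualNode:
    """Toy node with two independent limits: cores and slots (hypothetical model)."""

    cores_free: int
    slots_free: int

    def try_start(self, cores: int = 1, slots: int = 1) -> bool:
        """Admit a job only if both its core and slot requests fit."""
        if cores > self.cores_free or slots > self.slots_free:
            return False
        self.cores_free -= cores
        self.slots_free -= slots
        return True


# 8 cores but only 4 slots: the slot count caps the number of jobs.
node = DualNode(cores_free=8, slots_free=4)
results = [node.try_start(cores=1) for _ in range(5)]
print(results)          # [True, True, True, True, False]
print(node.cores_free)  # 4 cores are still free, but no slots remain
```

With single-core jobs that each take one slot, the slot count acts as a cap on how many jobs run at once, regardless of how many cores are left over.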

Why might you have more slots than cores?

It is common in PBS and Grid Engine to configure more slots than you have cores on a machine when the customer is running a lot of jobs that have very low CPU utilization. These jobs could be GPU-intensive jobs with low CPU involvement, but more commonly they are interactive jobs with very low CPU utilization. It is therefore advantageous for the admin to allow 32 interactive sessions on a machine with 16 cores and they would do that by configuring 32 slots or ncpus.
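A back-of-the-envelope check shows why this kind of oversubscription can be safe. The sketch below is illustrative only: the function name is hypothetical and the average CPU utilization of 25% per interactive session is an assumed figure, not a measurement.

```python
def expected_busy_cores(sessions: int, avg_cpu_fraction: float) -> float:
    """Expected number of cores kept busy if each session averages the given CPU fraction."""
    return sessions * avg_cpu_fraction


cores = 16
slots = 32               # twice as many slots as cores
avg_cpu_fraction = 0.25  # assumed average utilization of one interactive session

busy = expected_busy_cores(slots, avg_cpu_fraction)
print(busy)           # 8.0
print(busy <= cores)  # True: on average the node is only half loaded
```

The average hides bursts, so the right ratio for a real cluster depends on how spiky the sessions are.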

Why might you have more cores than slots?

For more CPU-intensive workloads it is common to configure fewer slots than cores. This is because there is always some system overhead on any given machine, and a workload that genuinely consumes all cores on a compute node will be slowed down by system tasks. For a 64-core machine you might configure 60 slots, for example, to leave some cores available for housekeeping on that machine.

If the workload manager has not been configured to enforce CPU utilization within the requested resources, then it can be a good idea to configure fewer slots than cores to allow some jobs to use more CPU cores than they have requested. Altair Accelerator customers often configure, e.g., 20 slots on a machine with 24 cores so that no more than 20 single-core jobs can run, allowing some of them to exceed their core requests.
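The headroom in the two examples above is simple arithmetic, sketched here in Python. The function name is hypothetical; the numbers are the ones used in the text.

```python
def spare_cores(cores: int, slots: int) -> int:
    """Cores left unscheduled when a node is configured with fewer slots than cores."""
    if slots > cores:
        raise ValueError("this helper only covers the fewer-slots-than-cores case")
    return cores - slots


# 64-core node with 60 slots: 4 cores left for system housekeeping.
print(spare_cores(64, 60))  # 4

# 24-core node with 20 slots: 4 cores of slack for jobs that exceed their request.
print(spare_cores(24, 20))  # 4
```

In the second case, up to four of the 20 single-core jobs could each use one extra core before the node becomes oversubscribed.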

Summary

So in summary, a compute node in an HPC cluster may have multiple CPU sockets. Each CPU socket may have multiple cores and each core may run multiple threads, presented as vCPUs. The number of slots available on that machine may be more or less than the number of cores or threads or vCPUs. Jobs may request slots and in some cases cores. The physical resources used ideally will correspond to the resources they have requested.

Written by Dr Rosemary Francis

Computer Scientist. Founder. Entrepreneur. Mum. Fellow of the Royal Academy of Engineering. Member of the Raspberry Pi Foundation.
