Slurm Workload Manager
Slurm is an open-source workload manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
Slurm's design is highly modular, with dozens of optional plugins. In its simplest configuration, it can be installed and configured in a couple of minutes (see Caos NSA and Perceus: All-in-one Cluster Software Stack by Jeffrey B. Layton), and it is used by Intel on their 48-core "cluster on a chip". More complex configurations can satisfy the job scheduling needs of world-class computer centers and rely upon a MySQL database for archiving accounting records, managing resource limits by user or bank account, or supporting sophisticated job prioritization algorithms.
While other workload managers do exist, Slurm is unique in several respects:
- Scalability: It is designed to operate in a heterogeneous cluster with up to tens of millions of processors.
- Performance: It can accept 1,000 job submissions per second and fully execute 500 simple jobs per second (depending upon hardware and system configuration).
- Free and Open Source: Its source code is freely available under the GNU General Public License.
- Portability: It is written in C with a GNU autoconf configuration engine. Although it was initially written for Linux, other UNIX-like operating systems have proven easy porting targets.
- Power Management: Jobs can specify their desired CPU frequency, and power use is recorded per job. Idle resources can be powered down until needed.
- Fault Tolerant: It is highly tolerant of system failures, including failure of the node executing its control functions.
- Flexibility: A plugin mechanism exists to support various interconnects, authentication mechanisms, schedulers, etc. These plugins are documented and simple enough for the motivated end user to understand the source and add functionality.
- Resizable Jobs: Jobs can grow and shrink on demand. Job submissions can specify size and time limit ranges.
- Status Jobs: The status of running jobs can be reported at the level of individual tasks to help identify load imbalances and other anomalies.
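The "Resizable Jobs" item above says that a job submission can specify size and time limit ranges. As a minimal sketch of what such a submission might look like, the Python helper below assembles an sbatch command line using the --nodes=MIN-MAX, --time, and --time-min options. The helper name build_sbatch_args and the script name job.sh are hypothetical and chosen only for illustration; the helper builds the argument list and does not invoke Slurm.

```python
import shlex


def build_sbatch_args(script, min_nodes, max_nodes, time_limit, time_min=None):
    """Build an sbatch argument list with a node-count range and a time range.

    Hypothetical helper for illustration; it only assembles arguments and
    never runs sbatch itself.
    """
    if min_nodes < 1 or max_nodes < min_nodes:
        raise ValueError("need 1 <= min_nodes <= max_nodes")

    # A single value requests a fixed size; MIN-MAX requests a size range.
    if min_nodes == max_nodes:
        nodes = str(min_nodes)
    else:
        nodes = f"{min_nodes}-{max_nodes}"

    args = ["sbatch", f"--nodes={nodes}", f"--time={time_limit}"]

    # --time-min gives the scheduler a lower bound on the acceptable time limit.
    if time_min is not None:
        args.append(f"--time-min={time_min}")

    args.append(script)
    return args


if __name__ == "__main__":
    # Request 2 to 8 nodes and a time limit between 30 minutes and 2 hours.
    print(shlex.join(build_sbatch_args("job.sh", 2, 8, "02:00:00", "00:30:00")))
```

The resulting list can be passed to subprocess.run on a host where sbatch is installed. Whether the scheduler actually uses the range depends on the site's configuration.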
Slurm provides workload management on many of the most powerful computers in the world including:
- Sequoia, an IBM BlueGene/Q system at Lawrence Livermore National Laboratory with 1.6 petabytes of memory, 96 racks, 98,304 compute nodes, and 1.6 million cores, with a peak performance of over 20 Petaflops.
- Stampede at the Texas Advanced Computing Center/University of Texas, a Dell system with over 80,000 Intel Xeon cores, Intel Phi co-processors, plus 128 NVIDIA GPUs, delivering 2.66 Petaflops.
- Tianhe-1A, designed by The National University of Defense Technology (NUDT) in China, with 14,336 Intel CPUs and 7,168 NVIDIA Tesla M2050 GPUs, with a peak performance of 2.507 Petaflops.
- TGCC Curie, owned by GENCI and operated in the TGCC by CEA, offering 3 different fractions of x86-64 computing resources for addressing a wide range of scientific challenges, with an aggregate peak performance of 2 Petaflops.
- Tera 100 at CEA with 140,000 Intel Xeon 7500 processing cores, 300 TB of central memory and a theoretical computing power of 1.25 Petaflops.
- Lomonosov, a T-Platforms system at Moscow State University Research Computing Center with 52,168 Intel Xeon processing cores and 8,840 NVIDIA GPUs.
- LOEWE-CSC, a combined CPU-GPU Linux cluster at The Center for Scientific Computing (CSC) of the Goethe University Frankfurt, Germany, with 20,928 AMD Magny-Cours CPU cores (176 Teraflops peak performance) plus 778 ATI Radeon 5870 GPUs (2.1 Petaflops peak performance single precision and 599 Teraflops double precision) and QDR InfiniBand interconnect.
- Rosa, a Cray XT5 at the Swiss National Supercomputer Centre named after Monte Rosa in the Swiss-Italian Alps, elevation 4,634 m. It has 3,688 AMD hexa-core Opteron processors at 2.4 GHz, 28.8 TB of DDR2 RAM, 290 TB of disk, and 9.6 GB/s interconnect bandwidth (SeaStar).
Last modified 7 December 2012