Prolog and Epilog Guide

SLURM supports a multitude of prolog and epilog programs. The first table below identifies which prologs and epilogs are available for job allocations, and when and where they run.

Parameter | Location | Invoked by | User | When executed
--------- | -------- | ---------- | ---- | -------------
Prolog (from slurm.conf) | Compute or front end node | slurmd daemon | SlurmdUser (normally user root) | First job or job step initiation on that node
PrologSlurmctld (from slurm.conf) | Head node (where slurmctld daemon runs) | slurmctld daemon | SlurmctldUser | At job allocation
Epilog (from slurm.conf) | Compute or front end node | slurmd daemon | SlurmdUser (normally user root) | At job termination
EpilogSlurmctld (from slurm.conf) | Head node (where slurmctld daemon runs) | slurmctld daemon | SlurmctldUser | At job termination
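
To illustrate the Prolog row of the table above, here is a minimal sketch of a node-level prolog written in Python. The `check_node` helper, the per-job scratch directory layout and the `/tmp/slurm_scratch` path are hypothetical and chosen only for illustration. The sketch also assumes that slurmd exports `SLURM_JOB_ID` to the Prolog environment, which this guide does not state, so verify it against the slurm.conf documentation for your version. A non-zero return value is meant to be used as the script's exit code, which the Failure Handling section treats as a Prolog failure.

```python
#!/usr/bin/env python3
"""Hypothetical Prolog sketch: prepare a per-job scratch directory."""
import os


def check_node(env, scratch_root):
    """Return 0 if a per-job scratch directory could be prepared, else 1.

    `env` is a mapping of environment variables. SLURM_JOB_ID is ASSUMED
    to be set by slurmd when it runs the Prolog; this guide does not say so.
    """
    job_id = env.get("SLURM_JOB_ID")
    if not job_id:
        # Without a job id there is nothing sensible to prepare.
        return 1
    job_dir = os.path.join(scratch_root, "job_" + job_id)
    try:
        # exist_ok=True makes the call safe if the directory already exists.
        os.makedirs(job_dir, exist_ok=True)
    except OSError:
        return 1
    return 0


# A real script would finish with something like the line below
# (the path is a made-up location for this sketch):
#   sys.exit(check_node(os.environ, "/tmp/slurm_scratch"))
```

Keeping the logic in a function that takes the environment as an argument makes it possible to exercise the script outside of SLURM before naming it in slurm.conf.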

The second table below identifies which prologs and epilogs are available for job step allocations, and when and where they run.

Parameter | Location | Invoked by | User | When executed
--------- | -------- | ---------- | ---- | -------------
SrunProlog (from slurm.conf) or srun --prolog | srun invocation node | srun command | User invoking srun command | Prior to launching job step
TaskProlog (from slurm.conf) | Compute node | slurmstepd daemon | User invoking srun command | Prior to launching job step
srun --task-prolog | Compute node | slurmstepd daemon | User invoking srun command | Prior to launching job step
TaskEpilog (from slurm.conf) | Compute node | slurmstepd daemon | User invoking srun command | At job step completion
srun --task-epilog | Compute node | slurmstepd daemon | User invoking srun command | At job step completion
SrunEpilog (from slurm.conf) or srun --epilog | srun invocation node | srun command | User invoking srun command | At job step completion

Plugin functions may also be useful for executing logic at various well-defined points.

SPANK is another mechanism that may be useful to invoke logic in the user commands, slurmd daemon, and slurmstepd daemon.

Failure Handling

If the Epilog fails (returns a non-zero exit code), the node is set to a DOWN state. If the EpilogSlurmctld fails (returns a non-zero exit code), the failure is only logged. If the Prolog fails (returns a non-zero exit code), the node is set to a DOWN state and the job is requeued to execute on another node. If the PrologSlurmctld fails (returns a non-zero exit code), the job is requeued to execute on another node if possible. Only batch jobs can be requeued. Interactive jobs (salloc and srun) are cancelled if the PrologSlurmctld fails.
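
The rules above can be summarized as a small decision function. The following Python sketch only models the text of this section; it does not call any SLURM interface, and the function name `failure_outcome` and the action strings are made up for illustration. For a failed Prolog on an interactive job, the guide says only that the job cannot be requeued, so the sketch reports just the node state change in that case.

```python
def failure_outcome(script, exit_code, batch_job=True):
    """Return the actions this guide describes for a script's exit code.

    `script` is one of "Prolog", "PrologSlurmctld", "Epilog",
    "EpilogSlurmctld". `batch_job` is False for interactive jobs
    (salloc and srun). The action strings are illustrative only.
    """
    known = ("Prolog", "PrologSlurmctld", "Epilog", "EpilogSlurmctld")
    if script not in known:
        raise ValueError("unknown script: " + script)
    if exit_code == 0:
        return []  # success: no failure handling is triggered
    if script == "Epilog":
        return ["set node DOWN"]
    if script == "EpilogSlurmctld":
        return ["log failure"]
    if script == "Prolog":
        actions = ["set node DOWN"]
        if batch_job:
            # Only batch jobs can be requeued.
            actions.append("requeue job on another node")
        return actions
    # PrologSlurmctld
    if batch_job:
        return ["requeue job on another node if possible"]
    return ["cancel job"]
```

For example, `failure_outcome("PrologSlurmctld", 1, batch_job=False)` yields a single "cancel job" action, while a failing Epilog only affects the node state.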


Based upon work by Jason Sollom, Cray Inc. and used by permission.

Last modified 26 November 2012