Report CPU/Mem usage in slurm job standard output

On O2 the SLURM scheduler does not include a report of CPU and memory usage in the standard output file or in the completion email once a job has finished. That information is available after a job completes by querying the SLURM database with the command sacct; examples of how to use the sacct command are available here.
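For instance, the usage of an already completed job can be queried along these lines, where 12345678 is a placeholder job ID and the format fields match the ones used in the examples below:

# 12345678 is a placeholder for the numeric ID of a completed job
sacct --units M --format=JobID,State,CPUTime,MaxRSS,ReqTRES%25,Start,End -j 12345678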

It is also possible to print information about a job in its standard output file, which is created by the scheduler for each job. The two sections below contain examples of how to include a detailed report on the resources used by a job, both for jobs submitted with the --wrap="" option and for jobs submitted via an sbatch script.

A quick example with the --wrap option

(Note: '-n 1' is needed for the srun command; it makes sure that only one copy of the command is run.)

In the example below a job is submitted using the sbatch command, passing the commands to be executed directly with the --wrap="" option. Here the command to be executed is hostname, which simply returns the name of the compute node; you can replace hostname with the command your job actually runs. Note that the hostname command is preceded by srun -n 1: this tells the scheduler to run the command as a first, separate step and to collect resource usage data when it completes.

The second command passed to the --wrap="" flag is sleep 5s; this sleep acts as a time buffer to make sure the scheduler has time to transfer the usage data recorded locally on the compute node to the central scheduler database. The last command, sacct, queries the central scheduler database for the desired information, now available for this job, and prints it to the job's standard output file.

# submit a simple job to check host name
sbatch -p short -t 0-0:10:0 --mem 2G -o myjob.log --wrap "srun -n 1 hostname; sleep 5s; sacct --units M --format=jobid,user%5,state%7,CPUTime,ExitCode%4,MaxRSS,NodeList,Partition,ReqTRES%25,Start,End -j \$SLURM_JOBID"

# Here is the content printed in the standard output file myjob.log
compute-e-16-187.o2.rc.hms.harvard.edu
   JobID  User   State    CPUTime Exit     MaxRSS           NodeList  Partition                   ReqTRES        Start          End
-------- ----- ------- ---------- ---- ---------- ------------------ ---------- ------------------------- ------------ ------------
10388280  ld32 RUNNING   00:00:00  0:0              compute-e-16-193      short    cpu=1,mem=2048M,node=1     10:14:31      Unknown
1038828+       COMPLE+   00:00:00  0:0      0.65M   compute-e-16-193                                           10:14:31     10:14:31

A quick example submitting the job using an sbatch script

(Note: '-n 1' is needed for the srun command; it makes sure that only one copy of the command is run.)

First create a script, in this example named myJob.sh, that contains both the scheduler flags, in the usual #SBATCH flag format, and the commands to be executed. In this second example the command to be executed is date, which returns the current date and time.

As in the first example, the date command is preceded by srun -n 1 so that it is executed as a single, separate SLURM step and the following sacct command can report the job's resource usage.

#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-0:10:0
#SBATCH -o myJob.out
#SBATCH --mem=2G

unset SLURM_CPU_BIND

srun -n 1 date

sleep 5   # wait for slurm to get the job status into its database
sacct --format=JobID,Submit,Start,End,State,Partition,ReqTRES%30,CPUTime,MaxRSS,NodeList%30 --units=M -j $SLURM_JOBID

After creating the myJob.sh file, submit the job using the sbatch command, passing the script file as an argument:

sbatch myJob.sh

If a workflow includes a sequence of commands, each of them can be preceded by srun -n 1; in this case the final sacct command will report detailed resource usage for each command separately. For example:
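(A minimal sketch, assuming two illustrative commands, hostname and date, run as separate steps; replace them and the #SBATCH flags with your own workflow.)

#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-0:10:0
#SBATCH -o myWorkflow.out
#SBATCH --mem=2G

unset SLURM_CPU_BIND

# each command wrapped in srun -n 1 becomes its own SLURM step
srun -n 1 hostname
srun -n 1 date

sleep 5   # wait for slurm to get the step records into its database
sacct --format=JobID,Submit,Start,End,State,Partition,ReqTRES%30,CPUTime,MaxRSS,NodeList%30 --units=M -j $SLURM_JOBID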

Important:

When the sacct command is executed the overall job will be reported as RUNNING even if the srun steps are complete. This happens because the job is in fact still running: it is executing the sacct command itself.

The scheduler considers each command preceded by srun as a separate step. This implies that each step command is evaluated separately, and it is possible to have a COMPLETED job that contains FAILED steps. Moreover, if a srun step fails the scheduler will move on to the next srun step and will not terminate the job.

For example, running the following job:
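(The script below is a sketch reconstructed from the two srun steps discussed in the rest of this section; this_is_an_error is simply a non-existent command used to force a step failure, and the #SBATCH flags mirror the earlier example.)

#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-0:10:0
#SBATCH -o myJob.out
#SBATCH --mem=2G

unset SLURM_CPU_BIND

srun -n 1 this_is_an_error   # this step fails: the command does not exist
srun -n 1 date               # this step still runs and completes

sleep 5
sacct --format=JobID,Submit,Start,End,State,Partition,ReqTRES%30,CPUTime,MaxRSS,NodeList%30 --units=M -j $SLURM_JOBID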

Will produce the following output:

The first job step produces an error, as expected; however, the scheduler moves on and still executes the second job step, srun date, which prints the current date to the standard output file. This is also captured in the "State" field of the job summary.

The SLURM scheduler provides a flag (-K) that can be passed to the srun command to catch an error and, in that case, interrupt the job; however, the -K flag does not appear to be functional at the moment and RC is awaiting a response from SchedMD, the company supporting the scheduler.

At the moment a workaround that achieves the same result is to modify the previous script as shown in the example below:
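(A sketch of the modified script, assuming the same #SBATCH flags as in the earlier example; only the two srun lines change, as explained right after the script.)

#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-0:10:0
#SBATCH -o myJob.out
#SBATCH --mem=2G

unset SLURM_CPU_BIND

# if a step command fails, cancel the whole job so later steps do not run
srun -n 1 bash -c "this_is_an_error || scancel $SLURM_JOBID"
srun -n 1 bash -c "date || scancel $SLURM_JOBID"

sleep 5
sacct --format=JobID,Submit,Start,End,State,Partition,ReqTRES%30,CPUTime,MaxRSS,NodeList%30 --units=M -j $SLURM_JOBID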

where the lines  

srun -n 1 this_is_an_error

srun -n 1 date

are modified to be

srun -n 1 bash -c "this_is_an_error || scancel $SLURM_JOBID "

srun -n 1 bash -c "date || scancel $SLURM_JOBID "

In this case srun executes a short bash sequence: first the desired job command (here, our artificial error); then, if and only if that command fails, the job is cancelled before the second srun step can be executed.

In this case the output will be:

Please note that since the job is interrupted before reaching the sacct command, no resource usage information is printed to the standard output.
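If needed, that information can still be retrieved afterwards by querying the SLURM database directly, for example with a command of this form (replace <jobid> with the numeric ID of the cancelled job):

# <jobid> is a placeholder for the ID of the cancelled job
sacct --format=JobID,Submit,Start,End,State,Partition,ReqTRES%30,CPUTime,MaxRSS,NodeList%30 --units=M -j <jobid>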