On O2 the SLURM scheduler does not include a report of CPU and memory usage in the standard output file or in an email once the job is completed. That information is available after a job completes by querying the SLURM database with the sacct command; examples of how to use sacct are available here.
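
For example, to check a job that has already finished you can run sacct from a login node. The job ID 12345678 below is a placeholder; replace it with the ID of your own job.

Code Block
linenumberstrue
# query the SLURM accounting database for a completed job (12345678 is a placeholder job ID)
sacct --units M --format=JobID,User,State,Elapsed,CPUTime,MaxRSS,ReqTRES%25,NodeList -j 12345678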

It is also possible to print information about a job in its standard output file, which is created by the scheduler for each job. The two paragraphs below contain examples of how to include a detailed report of the resources used by the job, both for jobs submitted using the --wrap="" option and for jobs submitted via a SLURM sbatch script.

A quick example with the --wrap option (Note: '-n 1' is needed for the srun command; it makes sure only one copy of the command is run)

In the example below a job is submitted using the sbatch command, passing the commands to be executed directly with the --wrap="" option. In this example the command to be executed is hostname, which simply returns the name of the compute node. You can replace hostname with the command your job is running. Note that the hostname command is preceded by srun -n 1; this tells the scheduler to run the command as a separate job step and to collect resource usage data for it when it completes.

...

Code Block
linenumberstrue
# submit a simple job to check host name
sbatch -p short -t 0-0:10:0 --mem 2G -o myjob.log --wrap "srun -n 1 hostname; sleep 5s; sacct --units M --format=jobid,user%5,state%7,CPUTime,ExitCode%4,MaxRSS,NodeList,Partition,ReqTRES%25,Start,End -j \$SLURM_JOBID" 


# Here is the content printed in the standard output file myjob.log
compute-e-16-187.o2.rc.hms.harvard.edu
JobID     User   State    CPUTime   Exit  MaxRSS  NodeList          Partition  ReqTRES                  Start     End
--------  -----  -------  --------  ----  ------  ----------------  ---------  -----------------------  --------  --------
10388280  ld32   RUNNING  00:00:00  0:0           compute-e-16-193  short      cpu=1,mem=2048M,node=1   10:14:31  Unknown
1038828+         COMPLE+  00:00:00  0:0   0.65M   compute-e-16-193                                      10:14:31  10:14:31

A quick example submitting the job using an sbatch script (Note: '-n 1' is needed for the srun command; it makes sure only one copy of the command is run.)

First create a script, in this example named myJob.sh, that contains both the scheduler flags, in the usual #SBATCH flag format, and the commands to be executed. In this second example the command to be executed is date, which returns the current date and time.

As in the first example, the date command is preceded by srun -n 1 to execute it as a single, separate SLURM step so that the following sacct command can be used to report the job's resource usage.


Code Block
linenumberstrue
#!/bin/bash
#SBATCH -p short           
#SBATCH -t 0-0:10:0
#SBATCH -o myJob.out
#SBATCH --mem=2G

srun -n 1 date                          # run the command as a separate job step so sacct can report its usage
sleep 5                                 # wait for slurm to get the job status into its database
sacct --format=JobID,Submit,Start,End,State,Partition,ReqTRES%30,CPUTime,MaxRSS,NodeList%30 --units=M -j $SLURM_JOBID
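
Assuming the script is saved as myJob.sh (the name used above), the job is then submitted with sbatch and the resource report appears at the end of myJob.out once the job finishes.

Code Block
linenumberstrue
# submit the script; the sacct report is written to myJob.out together with the job output
sbatch myJob.sh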

...

Code Block
linenumberstrue
#!/bin/bash
#SBATCH -p short           
#SBATCH -t 0-0:10:0
#SBATCH -o myJob.out
#SBATCH --mem=2G

# each command runs as its own srun step so its resource usage is recorded separately
srun -n 1 your_first_command_here
srun -n 1 your_second_command_here
srun -n 1 your_third_command_here
             
sleep 5                                 # wait for slurm to get the job status into its database
sacct --format=JobID,Submit,Start,End,State,Partition,ReqTRES%30,CPUTime,MaxRSS,NodeList%30 --units=M -j $SLURM_JOBID


Important:

When the sacct command is executed the overall job will be reported as RUNNING even if the srun steps are complete. This happens because the job is in fact still running: it is executing the sacct command itself.
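
If you need the final state of the overall job, you can run sacct again from a login node after the job has ended; the parent job line will then show COMPLETED (or FAILED) instead of RUNNING. The job ID 12345678 below is again a placeholder.

Code Block
linenumberstrue
# re-check the job record after the job has ended
sacct --units M --format=JobID,State,Elapsed,CPUTime,MaxRSS,NodeList -j 12345678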

...