In O2 the SLURM scheduler does not include a report of CPU and memory usage in the standard output file or email once a job is completed. That information is available after a job completes by querying the SLURM database with the command sacct; examples of how to use the sacct command are available here.
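For example, a completed job can be queried from a login node as follows (the job ID 12345678 and the field list are placeholders; substitute your own job ID and the fields you need):

```shell
# Query the SLURM accounting database for a finished job.
# 12345678 is a hypothetical job ID; replace it with your own.
sacct -j 12345678 --units=M \
      --format=JobID,State,ExitCode,CPUTime,MaxRSS,ReqTRES%25,Start,End
```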
It is also possible to print information about a job in its standard output file, which is created by the scheduler for each job. The two sections below contain examples of how to include a detailed report on the resources used by a job, both for jobs submitted with the --wrap="" option and for jobs submitted via an sbatch script.
A quick example with the --wrap option (Note: '-n 1' is needed for the srun command; it makes sure that only one copy of the command is run.)
In the example below a job is submitted using the sbatch command, passing the commands to be executed directly with the --wrap="" option. In this example the command to be executed is hostname, which simply returns the name of the compute node; you can replace hostname with the command your job runs. Note that the hostname command is preceded by srun -n 1: this tells the scheduler to run the command as a first and separate step and to collect resource usage data when it completes.
...
```
# submit a simple job to check host name
sbatch -p short -t 0-0:10:0 --mem 2G -o myjob.log --wrap "srun -n 1 hostname; sleep 5s; sacct --units M --format=jobid,user%5,state%7,CPUTime,ExitCode%4,MaxRSS,NodeList,Partition,ReqTRES%25,Start,End -j \$SLURM_JOBID"

# Here is the content printed in the standard output file myjob.log
compute-e-16-187.o2.rc.hms.harvard.edu
   JobID  User   State    CPUTime Exit     MaxRSS           NodeList  Partition                   ReqTRES        Start          End
-------- ----- ------- ---------- ---- ---------- ------------------ ---------- ------------------------- ------------ ------------
10388280  ld32 RUNNING   00:00:00  0:0              compute-e-16-193      short    cpu=1,mem=2048M,node=1     10:14:31      Unknown
1038828+       COMPLE+   00:00:00  0:0      0.65M   compute-e-16-193                                           10:14:31     10:14:31
```
A quick example submitting the job using an sbatch script (Note: '-n 1' is needed for the srun command; it makes sure that only one copy of the command is run.)
First create a script, in this example named myJob.sh, that contains both the scheduler flags, each in the usual #SBATCH flag format, and the commands to be executed. In this second example the command to be executed is date, which returns the current date and time.
Similarly to the first example, the date command is preceded by srun -n 1 to execute it as a single, separate slurm step, so that the following sacct command can be used to report the job's resource usage.
```
#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-0:10:0
#SBATCH -o myJob.out
#SBATCH --mem=2G

srun -n 1 date

sleep 5  # wait for slurm to get the job status into its database
sacct --format=JobID,Submit,Start,End,State,Partition,ReqTRES%30,CPUTime,MaxRSS,NodeList%30 --units=M -j $SLURM_JOBID
```
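Once saved, the script can be submitted with sbatch; for example:

```shell
# Submit the script; the scheduler prints the assigned job ID.
sbatch myJob.sh
# After the job finishes, the resource report appears at the end of myJob.out.
```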
...
```
#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-0:10:0
#SBATCH -o myJob.out
#SBATCH --mem=2G

srun -n 1 your_first_command_here
srun -n 1 your_second_command_here
srun -n 1 your_third_command_here

sleep 5  # wait for slurm to get the job status into its database
sacct --format=JobID,Submit,Start,End,State,Partition,ReqTRES%30,CPUTime,MaxRSS,NodeList%30 --units=M -j $SLURM_JOBID
```
Important:
When the sacct command is executed, the overall job will still be reported as RUNNING even if the srun steps are complete. This happens because the job is in fact still running: it is executing the sacct command itself.
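To see the final state of all steps, sacct can be run again from a login node once the job has actually finished, using the job ID that sbatch printed at submission (shown here reusing the example job ID 10388280 from above):

```shell
# Re-query the accounting database after the job has ended;
# the job and its steps will now report COMPLETED instead of RUNNING.
sacct -j 10388280 --format=JobID,State,ExitCode,CPUTime,MaxRSS --units=M
```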
...