Table of Contents

...

Note: Any time <userid> appears in this document, replace it with your HMS ID (formerly eCommons ID) and omit the angle brackets (<>). Likewise, replace <jobid> with an actual job ID, such as 12345, and replace <jobscript> with the name of your batch job submission script.

SLURM command

Sample command syntax

Meaning

sbatch

sbatch <jobscript>

Submit a batch (non-interactive) job.

srun

srun --pty -t 0-0:5:0 -p interactive /bin/bash

Start a five-minute interactive session in the interactive partition with the default of 1 CPU core and 4 GB of memory.

squeue

squeue -u <userid>

View status of your jobs in the queue. Only non-completed jobs will be shown.

We have an easier-to-use alternative command called O2squeue.

scontrol

scontrol show job <jobid>

Show detailed information about a running job. Add the -dd parameter for even more detail.

scancel

scancel <jobid>

Cancel a job. scancel can also be used to kill job arrays or job steps.

scontrol

scontrol hold <jobid>

Hold a pending job so that it does not start. A hold does not pause a job that is already running.

scontrol

scontrol release <jobid>

Release a held job (allow it to run)

sacct

sacct -j <jobid>

Check job accounting data. Running sacct is most useful for completed jobs.

We have an easier-to-use alternative command called O2_jobs_report.

sinfo

sinfo

See node and partition information. Use the -N parameter to see information per node.
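The sbatch <jobscript> row assumes you already have a job script. Below is a minimal sketch of one. The partition name short, the resource values, and the file name myjob.sh are illustrative assumptions. Adjust them to your work and to the partitions you can access. The #SBATCH lines are ordinary comments to bash, so sbatch reads them as options while the shell ignores them.

```shell
#!/bin/bash
#SBATCH -p short              # partition (assumed name; use one you have access to)
#SBATCH -t 0-00:10            # wall-time limit, D-HH:MM
#SBATCH -c 1                  # number of CPU cores
#SBATCH --mem=1G              # memory for the job
#SBATCH -o myjob_%j.out       # standard output; %j expands to the job ID
#SBATCH -e myjob_%j.err       # standard error

# Slurm exports SLURM_JOB_ID inside a job; fall back to "local" when the
# script is run directly with bash as a quick check outside Slurm.
JOB_ID="${SLURM_JOB_ID:-local}"
echo "Job ${JOB_ID} started on $(hostname) at $(date)"
```

If you save this as myjob.sh, submit it with sbatch myjob.sh. You can then follow it with squeue -u <userid> or inspect it with scontrol show job <jobid>. Cancel it with scancel <jobid> if needed, and after it finishes, review it with sacct -j <jobid>.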

...

We recommend using commands such as O2squeue or O2_jobs_report for job monitoring instead of relying on the email notifications, which do not contain much information.

...

Again, we recommend using commands such as O2squeue or O2_jobs_report for job monitoring instead of the SLURM email notifications.

...

If we save this script as srun_in_sbatch.sh, we can submit it with sbatch srun_in_sbatch.sh. After the job completes, you can view its statistics, broken down by numbered job steps, by running sacct -j <jobid> or O2_jobs_report -j <jobid>.

Job Arrays

Job arrays can be leveraged to quickly submit a number of similar jobs. For example, you can use job arrays to start multiple instances of the same program on different input files, or with different input parameters. A job array is technically one job, but with multiple tasks.
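Here is a minimal sketch of an array job script for that pattern. Each task receives its own index in SLURM_ARRAY_TASK_ID. The script uses that index to pick one input file and one parameter value. The partition name, the input_<N>.txt naming scheme, the parameter values, and my_program are hypothetical placeholders, not O2 conventions.

```shell
#!/bin/bash
#SBATCH -p short              # partition (assumed name)
#SBATCH -t 0-00:30            # wall-time limit per task, D-HH:MM
#SBATCH -c 1                  # CPU cores per task
#SBATCH --mem=2G              # memory per task
#SBATCH --array=1-3           # three tasks, with indices 1, 2 and 3
#SBATCH -o array_%A_%a.out    # %A = array job ID, %a = task index

# Slurm sets SLURM_ARRAY_TASK_ID for each task; default to 1 so the
# script can be dry-run with plain bash outside Slurm.
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical inputs: one file and one parameter value per task.
INPUT="input_${TASK_ID}.txt"
case "$TASK_ID" in
    1) PARAM=0.1 ;;
    2) PARAM=0.5 ;;
    3) PARAM=1.0 ;;
    *) echo "unexpected task ID: ${TASK_ID}" >&2; exit 1 ;;
esac

echo "Task ${TASK_ID}: input=${INPUT} param=${PARAM}"
# ./my_program --alpha "${PARAM}" "${INPUT}"   # hypothetical program and flag
```

The script is submitted once with sbatch, and squeue lists its tasks under a single job ID. scancel <jobid> cancels the whole array, while scancel <jobid>_<index> cancels a single task.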

...


Example monitoring commands

...

By default, you will not receive an e-mail report when a job you have submitted to Slurm completes. If you would like such notifications, use --mail-type and, optionally, --mail-user in your job submission. The SLURM e-mails contain minimal information (job ID, job name, run time, status, and exit code); this is an intentional design choice by the SLURM developers. For more detail than the e-mails provide, we suggest running sacct or O2_jobs_report queries.
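Here is a minimal sketch of a job script header that requests e-mail at the end of a job or when it fails. The partition name and resource values are assumptions, and <your_email> is a placeholder to replace with your own address. Other accepted --mail-type values include BEGIN and ALL.

```shell
#!/bin/bash
#SBATCH -p short                   # partition (assumed name)
#SBATCH -t 0-01:00                 # wall-time limit, D-HH:MM
#SBATCH --mem=1G                   # memory for the job
#SBATCH --mail-type=END,FAIL       # e-mail when the job finishes or fails
#SBATCH --mail-user=<your_email>   # optional; replace the placeholder

# ... your commands here ...
```

You can also give the same options on the command line instead of in the script, for example sbatch --mail-type=END,FAIL <jobscript>.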

...

We recommend monitoring your jobs using commands on O2 such as squeue or O2squeue, and sacct or O2_jobs_report.

More information about a job can be found by running sacct -j <jobid> or O2_jobs_report -j <jobid>. The following command can be used to obtain accounting information for a completed job:

...

Similarly, if your job used too much memory, you will receive an error like: Job <jobid> exceeded memory limit <memorylimit>, being killed. For such a job, sacct or O2_jobs_report will report a MaxRSS larger than ReqMem and a job state of OUT_OF_MEMORY. You will need to rerun the job and request more memory.
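To compare MaxRSS with ReqMem yourself, you can ask sacct for just those columns, for example sacct -j <jobid> --format=JobID,State,ReqMem,MaxRSS,Elapsed. The sketch below adds two hypothetical helpers, to_mib and exceeded_memory, which are not O2 commands. They convert Slurm memory strings such as 4096K, 4G or 4Gn to MiB so the two values can be compared numerically. Two details are assumptions: older Slurm versions append n (per node) or c (per CPU) to ReqMem, and a value with no unit letter is treated as bytes.

```shell
#!/bin/bash
# Hypothetical helper: print a Slurm memory string (e.g. "512K", "3.5G",
# "4Gn") as whole MiB, truncating any fraction.
to_mib() {
    local v="${1%[nc]}"                      # drop older per-node/per-CPU suffix
    local unit num
    case "$v" in
        *[KMGT]) unit="${v#"${v%?}"}"; num="${v%?}" ;;  # split off unit letter
        *)       unit="";             num="$v" ;;       # no unit: assume bytes
    esac
    awk -v n="$num" -v u="$unit" 'BEGIN {
        f["K"] = 1 / 1024; f["M"] = 1; f["G"] = 1024; f["T"] = 1048576
        m = (u == "") ? n / 1048576 : n * f[u]
        printf "%d\n", m
    }'
}

# Hypothetical helper: succeed (exit status 0) if MAXRSS ($1) exceeds REQMEM ($2).
exceeded_memory() {
    [ "$(to_mib "$1")" -gt "$(to_mib "$2")" ]
}

# Example use on O2 (requires Slurm; <jobid> and 8G are placeholders):
#   sacct -j <jobid> --format=JobID,State,ReqMem,MaxRSS,Elapsed
#   sbatch --mem=8G <jobscript>   # resubmit with a larger memory request
```

For instance, exceeded_memory 5000000K 4G succeeds because about 4882 MiB is more than 4096 MiB. That job would need a larger --mem value when it is rerun.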

...