Comment: de-duplicated table with sbatch workflow information

...

The current partition configuration:

| Partition | Workflow | Specification * |
|---|---|---|
| short | > 2 short jobs | 12 hours |
| medium | > 2 medium jobs | 5 days |
| long | > 2 long jobs | 30 days |
| interactive | Interactive work (e.g. MATLAB graphical interface, editing/testing code) rather than a batch job | 2 job limit, 12 hours (default 4GB memory) |
| mpi | MPI parallel job using multiple nodes (with sbatch -a and mpirun.sh) | 5 days limit |
| priority | 1 or 2 jobs at a time | 2 job limit, 30 day limit |
| transfer | | 4 cores max, 5 days limit, 5 concurrent cores per user |
| gpu, gpu_requeue, gpu_quad, gpu_mpi_quad | GPU job | see Using O2 GPU resources |

...

Note: Wherever <userid> appears in this document, replace it with your HMS ID (formerly eCommons ID), omitting the <>. Likewise, replace <jobid> with an actual job ID, such as 12345, and replace <jobscript> with the name of your batch job submission script.

| SLURM command | Sample command syntax | Meaning |
|---|---|---|
| sbatch | sbatch <jobscript> | Submit a batch (non-interactive) job. |
| srun | srun --pty -t 0-0:5:0 -p interactive /bin/bash | Start an interactive session for five minutes in the interactive queue with the default 1 CPU core and 4GB of memory. |
| squeue | squeue -u <userid> | View status of your jobs in the queue. Only non-completed jobs will be shown. We have an easier-to-use alternative command called O2squeue. |
| scontrol | scontrol show job <jobid> | Look at a running job in detail. For more information about the job, add the -dd parameter. |
| scancel | scancel <jobid> | Cancel a job. scancel can also be used to kill job arrays or job steps. |
| scontrol | scontrol hold <jobid> | Hold a pending job (prevent it from starting). |
| scontrol | scontrol release <jobid> | Release a held job (allow it to run). |
| sacct | sacct -j <jobid> | Check job accounting data. Running sacct is most useful for completed jobs. We have an easier-to-use alternative command called O2_jobs_report. |
| sinfo | sinfo | See node and partition information. Use the -N parameter to see information per node. |
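The commands in the table are often used together right after a submission. The following is a minimal sketch, not an official O2 tool: `submit_and_check` is a hypothetical function name, and it assumes sbatch's `--parsable` option is available so that only the job ID is printed. It uses `squeue -j` to show just the new job rather than everything listed by `squeue -u <userid>`.

```shell
#!/bin/bash
# Hypothetical helper (not an O2 command): submit a job script, then
# show its queue entry and detailed record.
submit_and_check() {
    local jobscript=$1
    local jobid

    # --parsable makes sbatch print only the job ID
    # (followed by ";cluster" on multi-cluster setups).
    jobid=$(sbatch --parsable "$jobscript") || return 1
    jobid=${jobid%%;*}   # drop the optional ";cluster" suffix

    echo "Submitted job $jobid"
    squeue -j "$jobid"            # queue status of this job only
    scontrol show job "$jobid"    # detailed job record
}
```

On a login node this would be called as `submit_and_check <jobscript>`. The job can later be cancelled with `scancel <jobid>` or reviewed after completion with `sacct -j <jobid>`.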

...

  • the partition (using -p)

  • a runtime limit, i.e., the maximum hours and minutes (-t 2:30:00) the job will run. The job will be killed if it runs longer than this limit, so it's better to overestimate.
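As a rough illustration of these two options, the sketch below writes a small job script that requests the short partition and a 2.5 hour runtime limit. The file name `myjob.sh` and the echoed message are hypothetical; a real job would replace the `echo` line with its own commands and would likely need further options not shown here.

```shell
#!/bin/bash
# Write a small job script (myjob.sh is a hypothetical name).
cat > myjob.sh <<'EOF'
#!/bin/bash
# Partition to run in
#SBATCH -p short
# Runtime limit: 0 days, 2 hours, 30 minutes (D-HH:MM:SS)
#SBATCH -t 0-02:30:00

echo "Job started on ${HOSTNAME}"
EOF

# Check the script for shell syntax errors without running it.
bash -n myjob.sh && echo "myjob.sh parses cleanly; submit it with: sbatch myjob.sh"
```

The `#SBATCH` lines are ordinary comments to bash, so the script can be checked locally before it is submitted; sbatch reads them as options at submission time.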

Most users will submit jobs to the short, medium, long, priority, or interactive partitions, and some to mpi. Here is a guide to choosing an appropriate partition:

...

| Workflow | Best queue to use | Time limit |
|---|---|---|
| 1 or 2 jobs at a time | priority | 1 month |
| > 2 short jobs | short | 12 hours |
| > 2 medium jobs | medium | 5 days |
| > 2 long jobs | long | 1 month |
| MPI parallel job using multiple nodes (with sbatch -a and mpirun.sh) | mpi | 5 days |
| GPU job | gpu, gpu_quad, gpu_mpi_quad, gpu_requeue | 5 days |
| Interactive work (e.g. MATLAB graphical interface, editing/testing code) rather than a batch job | interactive | |

...
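The batch rows of the guide above can be sketched as a small shell function. `pick_partition` is a hypothetical helper, not an O2 command. It assumes the runtime is given in whole hours and treats 1 month as 30 days (720 hours). It ignores the mpi, GPU, and interactive rows, which depend on the kind of work rather than on job count and runtime.

```shell
#!/bin/bash
# Hypothetical helper: suggest a batch partition from the number of jobs
# to be submitted and the expected runtime of each job in whole hours.
pick_partition() {
    local njobs=$1 hours=$2

    # Assumption: "1 month" means 30 days = 720 hours.
    if [ "$hours" -gt 720 ]; then
        echo "error: runtime exceeds the 30 day limit" >&2
        return 1
    fi

    if [ "$njobs" -le 2 ]; then
        echo priority      # 1 or 2 jobs at a time
    elif [ "$hours" -le 12 ]; then
        echo short         # up to 12 hours
    elif [ "$hours" -le 120 ]; then
        echo medium        # up to 5 days
    else
        echo long          # up to 1 month
    fi
}
```

For example, `pick_partition 10 48` prints `medium`, which could then be passed to sbatch with `-p`.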

sbatch options quick reference

...

If yes, the job will automatically cancel itself if one of its dependencies fails. Our current configuration already removes jobs whose dependencies will never be met, but it is probably best to always include this flag when submitting a job with dependencies (in case our configuration changes in the future).
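The option this paragraph describes is not named in this excerpt. The sketch below assumes it is sbatch's `--kill-on-invalid-dep` option, combined with a `--dependency=afterok:<jobid>` condition. `submit_chain` is a hypothetical function name, and the two script arguments stand for any pair of job scripts where the second should run only if the first succeeds.

```shell
#!/bin/bash
# Hypothetical helper: submit two job scripts so that the second starts
# only after the first finishes successfully.
submit_chain() {
    local first=$1 second=$2
    local jobid

    # --parsable makes sbatch print only the job ID
    # (followed by ";cluster" on multi-cluster setups).
    jobid=$(sbatch --parsable "$first") || return 1
    jobid=${jobid%%;*}

    # Assumption: --kill-on-invalid-dep=yes is the flag discussed above.
    # It cancels the second job if its dependency can never be satisfied.
    sbatch --dependency=afterok:"$jobid" --kill-on-invalid-dep=yes "$second"
}
```

Capturing the first job's ID in a variable avoids copying it by hand into the second submission.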

Monitoring Jobs

There are several commands you may wish to use to see the status of your jobs, including:

...