...
The current partition configuration:
Partition | Workflow | Specification * |
---|---|---|
short | > 2 short jobs | 12 hours |
medium | > 2 medium jobs | 5 days |
long | > 2 long jobs | 30 days |
interactive | Interactive work (e.g. MATLAB graphical interface, editing/testing code) rather than a batch job | 2 job limit, 12 hours (default 4GB memory) |
mpi | MPI parallel job using multiple nodes (with sbatch -a and mpirun.sh) | 5 day limit |
priority | 1 or 2 jobs at a time | 2 job limit, 30 day limit |
transfer | | 4 cores max, 5 day limit, 5 concurrent cores per user |
gpu, gpu_requeue, gpu_quad, gpu_mpi_quad |
...
Note: Any time <userid> is mentioned in this document, replace it with your HMS ID (formerly eCommons ID) and omit the <>. Likewise, replace <jobid> with an actual job ID, such as 12345. Insert the name of a batch job submission script wherever <jobscript> is mentioned.
SLURM command | Sample command syntax | Meaning |
---|---|---|
sbatch | | Submit a batch (non-interactive) job. |
srun | | Start an interactive session for five minutes in the interactive queue with default 1 CPU core and 4GB of memory |
squeue | | View status of your jobs in the queue. Only non-completed jobs will be shown. We have an easier-to-use alternative command called O2squeue. |
scontrol | | Look at a running job in detail. For more information about the job, add the |
scancel | | Cancel a job. |
scontrol | | Pause a job |
scontrol | | Release a held job (allow it to run) |
sacct | | Check job accounting data. Running We have an easier-to-use alternative command called O2_jobs_report. |
sinfo | | See node and partition information. Use the |
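The sample syntax column did not survive in the table above, so the following is only a sketch of what typical invocations might look like, using standard SLURM options. The values `myjob.sh`, `12345`, and `abc123` are hypothetical stand-ins for `<jobscript>`, `<jobid>`, and `<userid>`, and the exact commands originally shown in the table may differ. The script prints the commands instead of running them, so it is safe to try anywhere.

```shell
#!/bin/sh
# Hypothetical placeholder values; substitute your own.
jobscript="myjob.sh"
jobid="12345"
userid="abc123"

# Print (do not execute) one assumed sample invocation per table row.
print_slurm_examples() {
  cat <<EOF
sbatch ${jobscript}
srun --pty -p interactive -t 0:05:00 /bin/bash
squeue -u ${userid}
scontrol show job ${jobid}
scancel ${jobid}
scontrol hold ${jobid}
scontrol release ${jobid}
sacct -j ${jobid}
sinfo -p short
EOF
}

print_slurm_examples
```

To use one of these on the cluster, copy the printed line into your terminal after checking that the partition, time, and IDs match your own job.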
...
the partition (using -p)
a runtime limit, i.e., the maximum hours and minutes (-t 2:30:00) the job will run. The job will be killed if it runs longer than this limit, so it's better to overestimate.
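As a minimal sketch of how these two settings might appear in a batch script, the example below places them in `#SBATCH` directive lines. The partition `short` and the echoed message are illustrative assumptions, not a recommended configuration; replace the placeholder workload with your own commands.

```shell
#!/bin/bash
#SBATCH -p short          # partition to submit to (assumed here: short)
#SBATCH -t 2:30:00        # runtime limit: 2 hours 30 minutes
# sbatch reads the #SBATCH lines above; the shell itself treats them as comments.

# Placeholder workload (hypothetical): replace with your own commands.
msg="Job started on $(hostname)"
echo "$msg"
```

The same options can also be given on the command line when submitting, in which case they take precedence over the directives in the script.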
Most users will submit jobs to the short, medium, long, priority, or interactive partitions, and some to mpi. Here is a guide to choosing an appropriate partition:
...
Workflow | Best queue to use | Time limit |
---|---|---|
1 or 2 jobs at a time | priority | 1 month |
> 2 short jobs | short | 12 hours |
> 2 medium jobs | medium | 5 days |
> 2 long jobs | long | 1 month |
MPI parallel job using multiple nodes (with sbatch -a and mpirun.sh) | mpi | 5 days |
... | gpu, gpu_quad, gpu_mpi_quad, gpu_requeue | 5 days |
Interactive work (e.g. MATLAB graphical interface, editing/testing code) rather than a batch job | interactive | |
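The batch rows of the guide above can be sketched as a small helper that suggests a partition from the expected runtime. The function name `pick_partition` is hypothetical, and the sketch assumes "1 month" means the 30 days listed in the partition configuration. It covers only the short, medium, and long partitions, so it ignores priority, mpi, gpu, and interactive use.

```shell
#!/bin/sh
# Suggest a partition for a batch job from its expected runtime in whole hours.
# Limits follow the guide above: short 12 hours, medium 5 days, long 30 days.
pick_partition() {
  hours=$1
  if [ "$hours" -le 12 ]; then
    echo "short"
  elif [ "$hours" -le 120 ]; then   # 5 days * 24 hours
    echo "medium"
  elif [ "$hours" -le 720 ]; then   # 30 days * 24 hours (assumed "1 month")
    echo "long"
  else
    echo "error: no partition allows ${hours} hours" >&2
    return 1
  fi
}

pick_partition 3     # prints: short
pick_partition 48    # prints: medium
```

Because a job is killed when it exceeds its limit, it is safer to round the runtime estimate up before passing it to the helper.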
...
sbatch options quick reference
...
If yes, the job will automatically cancel itself when one of its dependencies fails. Our current configuration already removes jobs whose dependencies can never be met, but it is best to always include this flag when submitting a job with dependencies, in case our configuration changes in the future.
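The flag name is not shown in this excerpt. The sketch below assumes the passage describes sbatch's `--kill-on-invalid-dep` option and pairs it with an `afterok` dependency. The job ID `12345` and the script name `myjob.sh` are hypothetical placeholders. The command is only printed here so it can be reviewed before submission.

```shell
#!/bin/sh
# Hypothetical placeholders: the job to wait for and the script to submit.
jobid="12345"
jobscript="myjob.sh"

# Assumed flag: --kill-on-invalid-dep=yes cancels this job if the
# dependency can never be satisfied (for example, the earlier job failed).
cmd="sbatch --dependency=afterok:${jobid} --kill-on-invalid-dep=yes ${jobscript}"

# Print the command for review; paste it into a cluster shell to submit.
echo "$cmd"
```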
Monitoring Jobs
There are several commands you may wish to use to see the status of your jobs, including:
...