This page gives a basic introduction to the O2 cluster for new cluster users. Reading this page will help you to submit interactive and batch jobs to the Slurm scheduler on O2, as well as teach you how to monitor and troubleshoot your jobs as needed.
...
The basic process of running jobs:

1. You log in via SSH (secure shell) to the host `o2.hms.harvard.edu`. If you are connecting to O2 from outside of the HMS network, you will be required to use two-factor authentication. Please reference this wiki page for detailed instructions on how to log in to the cluster.
2. If necessary, you copy your data to the cluster from your desktop or another location. See the File Transfer page for data transfer instructions. You may also want to copy large data inputs to the scratch filesystem for faster processing.
3. You submit your job - for example, a program to map DNA reads to a genome - specifying how long your job will take to run and which partition to run in. You can modify your job submission in many ways, like requesting a large amount of memory, as described below.
4. Your job sits in a partition (job status `PENDING`), and when it's your turn to run, SLURM finds a computer that's not too busy.
5. Your job runs on that computer (also known as a "compute node") and goes to job status `RUNNING`. While it's running, you don't interact with the program. If you're running a program that requires user input or pointing and clicking, see Interactive Sessions below.
6. The job finishes running (job status `COMPLETED`, or `FAILED` if it had an error). You get an email when the job is finished if you specified `--mail-type` (and optionally `--mail-user`) in your job submission command.
7. If necessary, you might want to copy data back from the scratch filesystem to a backed-up location, or to your desktop.
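The submission step above can be sketched as a minimal batch script (the partition, runtime, and memory values are illustrative, and `myjob.sh` is a hypothetical name):

```shell
#!/bin/bash
#SBATCH -p short                 # partition to run in
#SBATCH -t 0-01:00               # runtime limit (D-HH:MM)
#SBATCH --mem=4G                 # memory requested
#SBATCH --mail-type=END          # email when the job finishes
# Payload: replace this echo with your actual program.
MSG="job running on $(hostname)"
echo "$MSG"
```

Saved as `myjob.sh`, this would be submitted with `sbatch myjob.sh`; SLURM reads the `#SBATCH` comment lines as submission options.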
Definitions
Here's some of the terminology that you'll find used in relation to SLURM in this and other documents:
...
We can also get more granular in talking about where to store your files, specifically which directories a cluster user can write to on O2. Every cluster user has a `/home` directory with a 100GiB quota. When you log in to the cluster, you will be placed in your home directory. This directory is named like `/home/user`, where "user" is replaced with your HMS ID in lowercase. Additionally, each cluster user can use `/n/scratch` for storage of temporary or intermediary files. The per-user scratch quota is 25TiB or 2.5 million files/directories. Any file on scratch that is not modified for 45 days will be deleted, and there are no backups of scratch data. A cluster user must create their scratch directory using a provided script. Finally, there are lab or group directories. These are located under `/n/groups`, `/n/data1`, or `/n/data2`. The quota for group directories is shared by all group members, and there is no standard quota that all groups have. Any of these directory options (`/home`, `/n/scratch`, group directory) can be used to store data that will be computed against on O2. When you submit a job to the SLURM scheduler on O2, you will need to specify in which directory your data (that you want to compute on) is stored and where you want the output data to be stored.
For more details on using these storage options on O2, please refer to the Filesystems, Filesystem Quotas, Scratch Storage, and File Transfer wiki pages.
...
The current partition configuration:
Partition | Specification * |
---|---|
short | 12 hours |
medium | 5 days |
long | 30 days |
interactive | 2 job limit, 12 hours (default 4GB memory) |
mpi | 5 days limit |
priority | 2 job limit, 30 day limit |
transfer | 4 cores max, 5 days limit, 5 concurrent cores per user |
gpu, gpu_requeue, gpu_quad, gpu_mpi_quad | |
Check our How to choose a partition in O2 chart to see which partition you should use.
...
Note: Any time `<userid>` is mentioned in this document, it should be replaced with your HMS account ID (formerly called an eCommons ID), omitting the `<>`. Likewise, `<jobid>` should be replaced with an actual job ID, such as 12345. The name of a batch job submission script should be inserted wherever `<jobscript>` is mentioned.
SLURM command | Sample command syntax | Meaning |
---|---|---|
sbatch | `sbatch <jobscript>` | Submit a batch (non-interactive) job. |
srun | `srun --pty -p interactive -t 0:5:0 /bin/bash` | Start an interactive session for five minutes in the interactive partition with the default 1 CPU core and 4GB of memory. |
squeue | `squeue -u <userid>` | View the status of your jobs in the queue. Only non-completed jobs will be shown. We have an easier-to-use alternative command called O2squeue. |
scontrol | `scontrol show job <jobid>` | Look at a running job in detail. For more information about the job, add the `-dd` flag. |
scancel | `scancel <jobid>` | Cancel a job. |
scontrol | `scontrol hold <jobid>` | Pause a job. |
scontrol | `scontrol release <jobid>` | Release a held job (allow it to run). |
sacct | `sacct -j <jobid>` | Check job accounting data. We have an easier-to-use alternative command called O2_jobs_report. |
sinfo | `sinfo` | See node and partition information. Use the `-N` parameter to see information in a node-oriented format. |
Submitting Jobs
There are two commands to submit jobs on O2: `sbatch` or `srun`.
...
Most users will be submitting jobs to the `short`, `medium`, `long`, `priority`, or `interactive` partitions, some to `mpi`. Here is a guide to choosing a proper partition:
Workflow | Best queue to use | Time limit |
---|---|---|
1 or 2 jobs at a time | priority | 1 month |
> 2 short jobs | short | 12 hours |
> 2 medium jobs | medium | 5 days |
> 2 long jobs | long | 1 month |
MPI parallel job using multiple nodes (with `sbatch`) | mpi | 5 days |
File transfer jobs | transfer | 5 days |
Interactive work (e.g. MATLAB graphical interface, editing/testing code) rather than a batch job | interactive | 12 hours |
`sbatch` options quick reference
...
We recommend using commands such as O2squeue or O2_jobs_report for job monitoring instead of relying on the email notifications, which do not contain much information.
...
Again, we would recommend leveraging commands like O2squeue or O2_jobs_report for job monitoring instead of the SLURM email notifications.
...
If we save this script as `srun_in_sbatch.sh`, it can be submitted with `sbatch srun_in_sbatch.sh`. After the job completes, you can see the job statistics (which will be broken down by numbered job steps) by running `sacct -j <jobid>` or by using `O2_jobs_report -j <jobid>`.
Job Arrays
Job arrays can be leveraged to quickly submit a number of similar jobs. For example, you can use job arrays to start multiple instances of the same program on different input files, or with different input parameters. A job array is technically one job, but with multiple tasks.
...
The full set of environment variables set when a job array is submitted:
ENV_VAR | function |
---|---|
SLURM_JOB_ID | the job ID of each job in the array (distinct); available in the job environment as `$SLURM_JOB_ID` |
SLURM_ARRAY_JOB_ID | the job ID of the whole array (the same for every job in the array; equal to the SLURM_JOB_ID of the first job dispatched in the array); available as `$SLURM_ARRAY_JOB_ID` |
SLURM_ARRAY_TASK_ID | the index of the job in the array (distinct); available as `$SLURM_ARRAY_TASK_ID` |
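For instance, `SLURM_ARRAY_TASK_ID` is commonly used to give each task in the array a different input file (a sketch; the filename pattern is hypothetical):

```shell
#!/bin/bash
# Inside a job array, SLURM sets SLURM_ARRAY_TASK_ID for each task;
# the fallback to 1 is only so this sketch also runs outside a job.
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"
INPUT="input_${TASK_ID}.fastq"
echo "task ${TASK_ID} processing ${INPUT}"
```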
To control how many jobs can be executed at a time, specify this inside the `--array` flag with `%`. To modify the above `sbatch` command to only allow 5 running jobs in the array at a time:
...
Other dependency parameters:
parameter | usage |
---|---|
after:jobid[:jobid...] | asynchronous execution (begin after <jobid>(s) have begun) |
afterany:jobid[:jobid...] | begin after <jobid>(s) have terminated (EXIT or DONE) |
afterok:jobid[:jobid...] | begin after <jobid>(s) have successfully finished with an exit code of 0 |
afternotok:jobid[:jobid...] | begin after <jobid>(s) have failed |
singleton | begin after any jobs with the same name and user have terminated |
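These parameters are passed to `sbatch` through the `--dependency` flag. On O2 you would run the command directly; the sketch below only builds and prints it (the job ID and script name are illustrative):

```shell
# Submit step2.sh only after job 12345 finishes successfully (exit code 0).
CMD="sbatch --dependency=afterok:12345 step2.sh"
echo "$CMD"
```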
Using `?` with a dependency allows it to be satisfied no matter what. It is possible to chain multiple dependencies together:
...
- `squeue` - by default `squeue` will show information about all users' jobs. Use `-u <userid>` to get information just about yours. An easier-to-use alternative command to squeue is called O2squeue.
- `scontrol` - most `scontrol` options can't be invoked by regular users, but `scontrol show job <jobid>` is a useful command that gives detailed job information. This command only works for currently running jobs.
- `sstat` - shows status information for currently running jobs. Many fields can be requested using the `--format` parameter. Reference the job status fields in the sstat documentation for more information.
- `sacct` - reports accounting information for jobs and job steps. This works for both running and completed jobs, but it is most useful for completed jobs. Many fields can be requested using the `--format` parameter. Check the job accounting fields in the sacct documentation for more information. An easier-to-use alternative command to sacct is called O2_jobs_report.
Example monitoring commands
...
By default, you will not receive an execution report by e-mail when a job you have submitted to Slurm completes. If you would like to receive such notifications, please use `--mail-type` and optionally `--mail-user` in your job submission. Currently, the SLURM emails contain minimal information (jobid, job name, run time, status, and exit code); this was an intentional design decision by the SLURM developers. At this time, we suggest running `sacct`/`O2_jobs_report` queries for more detailed information than the job emails provide.
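For example, a submission requesting an email at completion might look like the following (the address and script name are placeholders; the sketch only prints the command it would run):

```shell
# Ask SLURM to email this address when the job ends.
CMD="sbatch --mail-type=END --mail-user=someone@example.edu jobscript.sh"
echo "$CMD"
```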
...
We recommend monitoring your jobs using commands on O2 such as `squeue` or `O2squeue`, and `sacct` or `O2_jobs_report`.
More information for a job can be found by running `sacct -j <jobid>`, or by using `O2_jobs_report -j <jobid>`.
The following command can be used to obtain accounting information for a completed job:
...
Similarly, if your job used too much memory, you will receive an error like: `Job <jobid> exceeded memory limit <memorylimit>, being killed`. For this job, `sacct` or `O2_jobs_report` will report a larger `MaxRSS` than `ReqMem`, and an `OUT_OF_MEMORY` job status. You will need to rerun the job, requesting more memory.
...
You will need to connect to O2 with the `-XY` flags in your SSH command. Substitute <username> with your HMS ID:
```
$ ssh -XY <username>@o2.hms.harvard.edu
```
...
Each job has a runtime limit, which is the maximum amount of time that the job can be in the RUNNING state. This is also known as "wall clock time". You specify this limit with the `-t` parameter to `sbatch` or `srun`. If your job does not complete prior to the runtime limit, your job will be killed. Each partition has a maximum time limit. See below for details:
Queue | Maximum time limit |
---|---|
short | 12 hours |
medium | 5 days |
long | 30 days |
interactive | 12 hours |
mpi | 5 days |
priority | 30 days |
transfer | 5 days |
gpu, gpu_requeue, gpu_quad, gpu_mpi_quad | |
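SLURM accepts the `-t` value in several standard formats; a few examples (stored in shell variables here only so they can be checked):

```shell
# Common SLURM time-limit formats for -t / --time:
T_MINUTES="30"       # 30 minutes (bare minutes)
T_HMS="12:00:00"     # 12 hours (hours:minutes:seconds)
T_DAYS="5-00:00"     # 5 days (days-hours:minutes)
echo "$T_MINUTES $T_HMS $T_DAYS"
```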
Requesting resources
You may want to request a node with specific resources for your job. For example, your job may require 4 GB of free memory in order to run. Or you might want one of the nodes set aside for file transfers or other purposes.
...
- `groups` (`/n/groups`)
- `log` (`/n/log` - for web hosting users)
- `files` (`/n/files` - accessible through the `transfer` partition, but you must request access to run jobs in this partition)
- `scratch` (`/n/scratch` - for storage of temporary or intermediary files)
Example Usage:
sbatch --constraint="files"
...