Moving from Orchestra to O2
Deprecated Page
This documentation was written in 2016 specifically for those transitioning from our previous cluster named Orchestra to the then-brand-new O2 cluster. As the Orchestra cluster was retired in March 2018, we would strongly recommend that any new O2 cluster user review the "Using Slurm Basic" page instead of this page. This page is no longer being updated.
The Orchestra cluster used the LSF scheduler. LSF takes bsub commands and dispatches jobs to cluster nodes.
O2 uses a different scheduler called Slurm. Most of the concepts are the same, but the command names and flags are different. For example, in LSF you submit a bowtie job to a queue with a time limit using bsub -q mcore -W 10:0 -n 4 bowtie -p 4 ...; in Slurm you submit a job to a partition using sbatch -p short -t 10:0:0 -c 4 -N 1 --wrap="bowtie -p 4 ...". (There is more than one way to submit jobs through Slurm; more details on the changes can be found below.)
Logging in to O2
Note: if you have only logged into Orchestra before, you need a separate O2 account. Contact Research Computing to get one.
- ssh to the hostname o2.hms.harvard.edu
- If you're connecting to O2 from outside of the HMS network, you will need to use two-factor authentication.
- Use PuTTY on Windows, Terminal on Mac, or whatever method you used for Orchestra. You will land on a machine named something like login01, login02, etc.
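For example, from a Mac or Linux terminal (ab123 here is a hypothetical eCommons ID; substitute your own):
ssh ab123@o2.hms.harvard.edu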
Where are my files?
In general, O2 reads the same file paths as Orchestra. If you change a file in /home/ab123/blah on Orchestra and then log into O2, that same file will be changed there too.
IMPORTANT NOTE: /groups on Orchestra is called /n/groups on O2! Everything underneath it should look the same, though. So, for example, many bowtie2 databases are in /n/groups/shared_databases/bowtie2_indexes.
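For example, a path that you used on Orchestra translates like this (the lab directory name here is made up for illustration):
ls /groups/mylab/data        # Orchestra
ls /n/groups/mylab/data      # the same directory on O2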
Where is my software?
We have switched from the "environment modules" system to "Lmod". For the average user, this won't make much difference, although you will need to change the names of modules you load. The modules are no longer in categories like "seq" and "dev".
Like on Orchestra, typing module avail will tell you what software is available, and module load samtools/1.3.1 loads the samtools module.
Typing module spider will give you a list of all software modules available. There is also a search tool, module keyword, which can be used with a search term to return matching modules.
As Lmod is a hierarchical system, you may need to load a prerequisite module to be able to load the one you need (or to see it in module avail). For example, if you'd like to load the Cython module but don't know the prerequisite modules, run module spider cython. Then you can load the prerequisites (listed in the module spider output) and will be able to load Cython.
NOTE: If you are used to running programs with full paths like /opt/samtools/..., that won't work, because there is no /opt on O2. The apps are installed under /n/app, but we'd prefer that you use modules when possible.
Beta test note: You need to module load gcc/6.2.0 to see many of the available bioinformatics modules. module spider will list all of the available modules, even if you do not have gcc/6.2.0 loaded.
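A typical Lmod workflow, as a sketch (the samtools version shown is just an example; use module spider to see what is actually installed):
module spider samtools        # find available versions and any prerequisites
module load gcc/6.2.0         # load the prerequisite compiler module
module avail                  # the bioinformatics modules are now visible
module load samtools/1.3.1    # load the tool itself
module list                   # confirm which modules are currently loaded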
What are nodes like?
The majority of nodes on O2 have 32 cores and 256 GB RAM each.
NOTE: you might be tempted to run your jobs with as many cores as possible (20 is the maximum number of cores you can request in all queues except mpi). Most multi-threaded programs (like many NGS applications) will not be 20 times faster on 20 cores. You might want to experiment with 1 core, 4, 8, etc., to see what kind of speedups you get. Asking for more cores will often lead to a substantially longer pend time, which can remove any benefit of a slight speedup in run time.
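As a rough sketch of such an experiment (index and file names here are hypothetical), you could submit the same job with a few different core counts and compare elapsed times afterwards:
for cores in 1 4 8; do
    sbatch -p short -t 2:00:00 -c $cores -N 1 -o bowtie_${cores}core_%j.out \
        --wrap="bowtie -p $cores hg19_index reads.fastq"
done
# once the jobs finish, compare Elapsed and CPUTime:
sacct -u $USER --format JobID,NCPUS,Elapsed,CPUTime,State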
SLURM accounts
Though you can log in to O2 login nodes with your eCommons authentication credentials, in order to submit jobs to the SLURM scheduler, your eCommons ID needs to be assigned to a SLURM account. To check whether this has already been done, please run the command:
sshare -Unu $USER
If you get output similar in form to this:
simpson_hjs42 abc12 1 0.001106 0 0.000000 1.000000
...then you're ready to start submitting jobs to SLURM.
Getting no output from this command, on the other hand, means that your eCommons ID is not yet assigned to a SLURM account. As a result, if you attempt to submit a job to SLURM, you will see an error like one of the following:
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
srun: error: Unable to allocate resources: Invalid account or account/partition combination specified
In order to associate your eCommons ID with a SLURM account, and thereby be permitted to submit jobs to the SLURM scheduler (and avoid such errors), please submit a request to us using the Account Request form.
What if I am using both Orchestra and O2?
If you have customized your environment (for example, with commands in the .bashrc file in your home directory, aka ~/.bashrc), you may notice error messages when logging into O2, or when starting an interactive session. Orchestra and O2 read the same ~/.bashrc, which means the commands in the file will execute regardless of which cluster you're on. For example, if you load modules in your ~/.bashrc, these same modules may not exist on O2. Likewise, if you add a directory in /groups to your PATH, it will not work because /groups is now /n/groups on O2.
If you want your login customization files (~/.bashrc, ~/.bash_profile, etc.) to work properly on all the RC clusters (O2, Orchestra, and Transfer), you can have their code test for the value of the HMS_CLUSTER environment variable, which is automatically set upon login to one of the values o2, orchestra, or transfer.
if [[ "$HMS_CLUSTER" = o2 ]]; then
    # O2-specific settings
elif [[ "$HMS_CLUSTER" = orchestra ]]; then
    # Orchestra-specific settings
fi
Translate LSF commands
This table gives rough equivalents for the LSF commands you're used to using. There's more information below for some of them. You can also read the man pages for the Slurm commands.
(Note: Below and throughout the document, text like <jobid> should be replaced with an actual job ID, like 12345, and <userid> would be replaced by your eCommons ID, without the <> around it.)
LSF command | Slurm analogue | sample command syntax |
---|---|---|
bsub | sbatch (submit a batch job) srun (for interactive, or starting a job step) | sbatch <jobscript> |
bjobs | squeue | squeue -j <jobid> squeue -u <userid> (all jobs by a user) scontrol show job <jobid> (like bjobs -l) |
bkill | scancel | scancel <jobid> |
bstop | scontrol hold | scontrol hold <jobid> |
bresume | scontrol release | scontrol release <jobid> |
bqueues | sinfo | sinfo -p <partition> |
bhosts | sinfo -N (or scontrol show nodes) | |
bhist | sacct | sacct -j <jobid> |
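A few concrete translations, using a hypothetical job ID (12345) and user ID (ab123):
squeue -u ab123           # like bjobs -u ab123: list that user's jobs
scontrol show job 12345   # like bjobs -l 12345: detailed info on one job
scancel 12345             # like bkill 12345: cancel the job
sacct -j 12345            # like bhist 12345: accounting/history for the job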
Partitions (aka Queues)
The partitions on O2 are like the shared queues on Orchestra, including short, medium, long, priority, interactive, mpi, transfer, and gpu.
Changes from Orchestra and other notes:
- The medium queue allows jobs up to 5 days
- Jobs in the long queue (for now) will not be suspended
- There is no mcore queue. You can run multi-core jobs in the regular queues
- The mpi partition is for MPI jobs only, not multi-threaded shared-memory jobs (this is true of Orchestra too). Please check our dedicated mpi wiki page for additional information
- The gpu partition limit is based on the total amount of GPU-hours allocated for each user
Check our How to choose a partition in O2 chart to see which partition you should use.
Submitting jobs
The sbatch command
Like LSF, Slurm allows you to submit jobs from the command line, or to create scripts that get submitted. We encourage users to use scripts, which are more reproducible and easier to troubleshoot. The #SBATCH commands at the top of the script are like the #BSUB commands at the top of LSF scripts.
A typical (basic) batch job submission script looks like the following:
#!/bin/bash
#SBATCH -c 1                   # Number of cores requested
#SBATCH -t 5                   # Runtime in minutes
                               # Or use HH:MM:SS or D-HH:MM:SS, instead of just number of minutes
#SBATCH -p short               # Partition (queue) to submit to
#SBATCH --mem-per-cpu=8G       # 8 GB memory needed (memory PER CORE)
#SBATCH --open-mode=append     # append adds to outfile, truncate deletes first
### In filenames, %j=jobid, %a=index in job array
#SBATCH -o %j.out              # Standard out goes to this file
#SBATCH -e %j.err              # Standard err goes to this file
#SBATCH --mail-type=END        # Mail when the job ends

# write command-line commands below this line
hostname
You can run this script by saving it as myjob.sh
and running sbatch myjob.sh
on the command line.
Useful variable substitutions:
Sub | Meaning | LSF analogue |
---|---|---|
%j | jobid | %J |
%a | job array id | %I (capital i) |
%A | master jobid for array | %J |
%N | node name | |
%u | userid |
Slurm job arrays utilize %A
and %a
to make distinctions between jobs in the array, so it is suggested to use these variables for output files.
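For example, you might put these lines in a job array script (the base file name is arbitrary):
#SBATCH -o myarray_%A_%a.out    # e.g. myarray_12345_7.out for task 7 of array 12345
#SBATCH -e myarray_%A_%a.err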
You don't have to use a separate script. You can use the --wrap option to run a single command. However, we discourage running jobs this way: they are harder to troubleshoot (SLURM job accounting doesn't retain commands used with --wrap), and certain complex commands (e.g. those that include piping with | ) will not be interpreted properly.
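As an illustration of that caveat (the file name is hypothetical), a submission like this may not behave as intended because of the pipe:
sbatch -p short -t 10 --wrap="grep pattern input.txt | wc -l"
Putting the same pipeline in a short script and submitting it with sbatch is more reliable and more reproducible.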
Here are some interesting sbatch flags and LSF analogues. Most flags have long and short versions. For example, sbatch --ntasks=4
is the same as sbatch -n 4
.
Note the --mem-per-cpu flag, which reserves memory. The default memory reservation on O2 is only 1 gigabyte, which is less than on Orchestra. So if you get an error that says "Exceeded job memory limit", you'll need to use this option.
flag | usage | LSF analogue | notes |
---|---|---|---|
--ntasks (-n) | -n <num_cores> | -n <num_cores> | number of tasks. (Use with -N). Currently, we suggest that the -c parameter be used for most jobs, not -n. |
--cpus-per-task (-c) | -c <num_cores> | -n <num_cores> | cores requested PER TASK, this is what is typically used for multithreaded parallel jobs. |
--nodes (-N) | -N <num_nodes> | -R "span[hosts=<num>]" | Automatically set to 1 except for MPI queue |
--time (-t) | -t <runtime> | -W <runtime> | If just a number is given, runtime is in minutes; with one colon, the format is MM:SS. Longer formats like HH:MM:SS and D-HH:MM:SS also work. Required for all job submissions. |
--partition (-p) | -p <queue_name> | -q <queue_name> | Queues are called "Partitions" in Slurm |
--mem-per-cpu | --mem-per-cpu=<memory> | -R "rusage[mem=<memory>]" | RAM memory per core. To avoid confusion, it's best to include a size unit, one of K, M, G, T for kilobytes, megabytes, etc. So e.g. --mem-per-cpu=8G asks for 8 GB per core. |
--open-mode | --open-mode=[append|truncate] | -o vs -oo; -e vs -eo | choose whether to append to or truncate output files. Default action on Orchestra was like append. |
--output (-o) | -o <output_file> | -o <output_file> | default = stdout and stderr are in same file, named "slurm-%j.out", where the "%j" is replaced by the job ID. The default file name for job arrays is "slurm-%A_%a.out", where "%A" is the job ID and "%a" is the array index. |
--error (-e) | -e <err_file> | -e <err_file> | |
--job-name (-J) | --job-name=<jobname> | -J <jobname> | default = name of batch script |
--mail-type | --mail-type=<type>[,<type>...] | -B (BEGIN) -N (END) | types: NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of time limit) BEGIN, END, FAIL will only send once if used with a job array. |
--mail-user | --mail-user=<email> | -u <email> | default = user that submitted the job (~/.forward files work like on Orchestra) used with --mail-type |
--dependency (-d) | --dependency=<deps> | -w <deps> | See below for details on specifying dependencies |
--wrap | --wrap='echo "hello"' | bsub echo "hello" | Runs a command without needing to make a shell script |
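Putting several of these flags together, a hypothetical four-core job in the short partition, with a four-hour runtime and 4 GB of memory per core, could be submitted like this:
sbatch -p short -t 4:00:00 -c 4 -N 1 --mem-per-cpu=4G -o myjob_%j.out myjob.sh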
Job Customization
If you need some flexibility in your script (e.g. you need to submit the same job but with multiple input files, etc.), submission scripts can take command line arguments as well as inheriting environment variables (your entire current environment is exported with the job when it submits). Here is a shell script that takes a single command line argument:
#!/bin/bash
#SBATCH -c 1                               # Request one core
#SBATCH -N 1                               # Request one node (if you request more than one core with -c, also using
                                           # -N 1 means all cores will be on the same node)
#SBATCH -t 0-00:05                         # Runtime in D-HH:MM format
#SBATCH -p short                           # Partition to run in
#SBATCH --mem=100                          # Memory total in MB (for all cores)
#SBATCH -o hostname_%j.out                 # File to which STDOUT will be written, including job ID
#SBATCH -e hostname_%j.err                 # File to which STDERR will be written, including job ID
#SBATCH --mail-type=FAIL                   # Type of email notification- BEGIN,END,FAIL,ALL
#SBATCH --mail-user=abc123@hms.harvard.edu # Email to which notifications will be sent

echo $1 > file.txt
If you submit this script like so:
sbatch arguments.sh hello
When the job finishes, you should have an output file called file.txt
that contains the text hello
. If you are savvy at bash, you can use this functionality to script loops that can vary the parameters you send to your submission script to submit lots of similar jobs at the same time without having to manually modify the script itself.
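For example, a minimal sketch of such a loop (the path and file names are hypothetical) that submits one job per fastq file:
for fq in /path/to/data/*.fastq; do
    sbatch arguments.sh "$fq"    # each job receives a different file name as $1
done
(You would also adapt the script so that each job writes to a distinct output file rather than the shared file.txt.)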
Your submission script can also use environment variables. Say you run the following command:
export DESTFILE=file.txt
Then make the following change to arguments.sh
before submitting it as above:
echo $1 > $DESTFILE
You will still get the same result as above, even though you have not exported DESTFILE
inside your submission script. The value of DESTFILE
was exported along with the rest of your environment at submission time, because it was already set. However, for obvious reasons, it is recommended that you make such exports inside your submission script for documentation purposes so that you have a convenient record of what you set those variables to if you return to your submission some time later.
The srun command
The srun command is used for job submissions and shares many options with sbatch! However, srun is usually for job steps, not batch jobs like sbatch. Some example use cases for srun are starting an interactive session, or running job steps (such as within a script that will be submitted with sbatch). Depending on how srun is invoked, you may need to request resources to be allocated.
Interactive Sessions
In the dev cluster, interactive sessions are currently configured to be submitted to any partition, not just interactive
. To submit an interactive session (with a run time of 12 minutes and memory of 8GB):
srun -p interactive --pty --mem 8000 -t 12:00 /bin/bash
For all batch jobs, sbatch
should be used instead.
Note: It is not recommended to submit srun
commands from within an interactive session. These commands will be executed on the same resource allocation as the interactive job. If you want to submit other jobs from within an interactive job, use sbatch
commands instead.
Within sbatch scripts
With SLURM, srun commands can be used in scripts to denote job steps. When monitoring your job, each command prepended with srun will be shown as a different step. Additionally, if your job fails, job steps will help you troubleshoot by narrowing down which command had the issue.
Here is an example script that will give the hostname of the node it was executed on, and return information about the partitions and nodes in O2.
#!/bin/bash
#SBATCH -c 1                               # 1 core
#SBATCH -t 0-00:05                         # Runtime of 5 minutes, in D-HH:MM format
#SBATCH -p short                           # Run in short partition
#SBATCH -o hostname_sinfo_%j.out           # File to which STDOUT + STDERR will be written, including job ID in filename
#SBATCH --mail-type=FAIL                   # Email notification type (send mail if the job fails)
#SBATCH --mail-user=abc123@hms.harvard.edu # Email to which notifications will be sent

srun hostname
srun sinfo
If we save this script as srun_in_sbatch.sh
, it can be submitted by sbatch srun_in_sbatch.sh
. After the job completes, you can see the job statistics (which will be broken down by job step) by running sacct -j <jobid>
.
Getting Job Information
There are multiple ways to find information about your jobs: the squeue command (works on currently pending/running jobs only), emailed job reports (which contain less detail than those from Orchestra), and sacct (works for running and completed jobs). More information about job monitoring is detailed in Using Slurm Basic.
Monitoring pending or running jobs
squeue
gives information about jobs currently in the scheduler (pending, running, etc.). Unlike bjobs
, though, by default it shows jobs by ALL users (equivalent to bjobs -u all
). So you will usually want to run squeue -u <userid>
.
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
279 short blast.sb ak150 R 0:02 1 compute-a-16-28
The ST
column gives job state: R
means running, PD
means pending, etc. A full list of job states can be found here in the SLURM squeue documentation.
If you have pending jobs and you'd like to know approximately when they will start, use squeue
--start -u <userid>
. The START_TIME
column reports the estimated time your job will begin. However, this estimate assumes that all other jobs on the cluster will run to the maximum runtime limits that users input (with sbatch -t
). So often your job will begin well before the reported START_TIME
.
JOBID PARTITION NAME USER ST START_TIME NODES SCHEDNODES NODELIST(REASON)
1289470 long wrap kmk34 PD 2017-05-20T17:19:58 1 compute-a-16-96 (Resources)
Useful squeue options:
squeue option | bjobs equivalent | Notes |
---|---|---|
-j | <jobid> | Report on a specific job ID (or comma-separated set of job IDs) |
-n | -J | Job name |
-o / -O | | Lets you ask for specific types of information about the job. See the squeue man page |
-p | -q | Only jobs in a given partition |
--start | | Report expected start time of pending jobs |
-t | -r, -p, -d | Show jobs in a particular state (e.g. `-t R` is like `bjobs -r`) or comma-separated set of states |
-u | -u | Note that you can also give a comma-separated list of IDs |
-w | -m | Show jobs on a particular node |
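These options can be combined. For example, to list only your pending jobs in the short partition (ab123 is a hypothetical user ID):
squeue -u ab123 -p short -t PD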
Job completion emails or output/error files
You will need to specify --mail-type (and optionally --mail-user) in your job submission to receive these notifications. Also, your job output will not be emailed to you, even if you request email notifications. Slurm will automatically put job error and output in a file called slurm-<jobid>.out in the directory you submitted the job from. If you would prefer, you can direct the output by specifying the -o (stdout) and -e (stderr) options in the job submission command. For more detailed information, use sacct queries.
Job accounting data using sacct
Emailed job reports only report the exit code, status, and run time of your job. More information for a job can be found by running sacct -j <jobid>. By default, sacct only outputs a limited amount of information. You can specify additional output fields using the --format option. You can see a full list of possible fields by running sacct --helpformat.
Here is an example command to obtain accounting information for a completed job:
sacct -j <jobid> --format JobId,NNodes,Partition,NCPUs,State,ReqMem,MaxRSS,Elapsed,CPUTime,TimeLimit,ExitCode,Start,End
If you are contacting Research Computing because a job did not behave as expected, it's often helpful to include sacct
output in your request. Just attach it to the email, or paste it into the form.
Job Arrays
There are two ways to run job arrays. You can use a job array script just like a typical script, but submit 30 copies:
sbatch --array=1-30 submit.sh
Or you can insert this line in the script instead:
#SBATCH --array=1-30
(In either case, you will probably want to use %a and %A in output filenames.)
If files are cleverly numbered, you can reference them with ${SLURM_ARRAY_TASK_ID}, which expands to the task's index in the array. For example, if you need to process 30 fastq files, and they are named (something like) fastq1.fastq, fastq2.fastq, etc.:
<command that processes fastq files> /path/to/fastq"${SLURM_ARRAY_TASK_ID}".fastq
A different input file will go to each job in the array, mapped 1-to-1.
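Pulling that together, a minimal job array script might look like the following sketch (the partition, runtime, path, and processing command are placeholders to adapt):
#!/bin/bash
#SBATCH -c 1                  # 1 core per array task
#SBATCH -t 0-01:00            # runtime per task, in D-HH:MM format
#SBATCH -p short              # Partition to run in
#SBATCH --array=1-30          # 30 tasks, numbered 1 through 30
#SBATCH -o array_%A_%a.out    # one output file per task

<command that processes fastq files> /path/to/fastq"${SLURM_ARRAY_TASK_ID}".fastq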
The full set of environment variables when a job array is submitted:
ENV_VAR | function |
---|---|
SLURM_JOB_ID | the jobID of each job in the array (distinct). |
SLURM_ARRAY_JOB_ID | the jobID of the whole array (the same for every job in the array; equal to the SLURM_JOB_ID of the first job dispatched in the array). Passable with %A . |
SLURM_ARRAY_TASK_ID | index of the job in the array (distinct). Passable with %a . |
To control how many jobs can be executed at a time, specify this inside the --array
flag with %
. To modify the above sbatch
command to only allow 5 running jobs in the array at a time:
sbatch --array=1-30%5 submit.sh
This also works with the #SBATCH
directive.
To cancel specific jobs in the array:
# cancel job array 20, entries 5-10
scancel 20_[5-10]

# cancel job array 20, entries 15 and 17
scancel 20_15 20_17

# Cancel the current job or job array element (if job array)
if [[ -z $SLURM_ARRAY_JOB_ID ]]; then
    scancel $SLURM_JOB_ID
else
    scancel ${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}
fi
Job Dependencies
If you need to submit a job that is reliant on a previous job(s) (e.g. <jobid> must complete successfully first), use
sbatch --dependency=afterok:<jobid>[:<jobid>...] submit.sh
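In practice you often don't know the job ID in advance. One way to capture it (a sketch, with hypothetical script names step1.sh and step2.sh) is sbatch's --parsable option, which prints just the ID of the submitted job:
jid=$(sbatch --parsable step1.sh)
sbatch --dependency=afterok:$jid step2.sh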
Other dependency parameters:
parameter | usage |
---|---|
after:jobid[:jobid...] | begin after <jobid>(s) has begun execution |
afterany:jobid[:jobid...] | begin after <jobid>(s) has terminated (EXIT or DONE) |
afterok:jobid[:jobid...] | begin after <jobid>(s) have successfully finished with exit code of 0 |
afternotok:jobid[:jobid...] | begin after <jobid>(s) has failed |
singleton | begin after any jobs with the same name and user have terminated |
Using ?
with a dependency allows it to be satisfied no matter what. It is possible to chain multiple dependencies together:
sbatch --dependency=afterok:1:2,afterany:3:4,?afternotok:5 submit.sh
This job will submit only after jobs 1 and 2 have completed successfully AND after 3 and 4 have terminated, and after 5 has failed (or not failed, since ? allows that dependency to be satisfied either way). If any requirement is not met (e.g. job 1 fails), the job will never run. The squeue command will show such jobs as PENDING, with DependencyNeverSatisfied as the reason the job is waiting for execution.
To allow Slurm to manage dependencies for you, provide the following flag:
--kill-on-invalid-dep=<yes|no>
If set to yes, then a job whose dependency can never be satisfied will automatically cancel itself. Our current configuration already removes jobs whose dependencies will never be met, but it is probably best to always include this flag when submitting a job with dependencies (in case our configuration changes in the future).