NOTICE: FULL O2 Cluster Outage, January 3 - January 10

O2 will be completely offline for a planned HMS IT data center relocation from Friday, Jan 3, 6:00 PM, through Friday, Jan 10.

  • On Jan 3 (5:30-6:00 PM): O2 login access will be turned off.
  • On Jan 3 (6:00 PM): O2 systems will start being powered off.

This project will relocate existing services, consolidate servers, reduce power consumption, and decommission outdated hardware to improve efficiency, enhance resiliency, and lower costs.

Specifically:

  • The O2 Cluster will be completely offline, including O2 Portal.
  • All data on O2 will be inaccessible.
  • Any jobs still pending when the outage begins will need to be resubmitted after O2 is back online.
  • Websites on O2 will be completely offline, including all web content.

More details at: https://harvardmed.atlassian.net/l/cp/1BVpyGqm & https://it.hms.harvard.edu/news/upcoming-data-center-relocation

Overefficient Jobs

If you received an email with the subject line "O2 Overefficient Jobs" and you're not sure why, you're in the right place.

Quick Overefficiency Example

An overefficient job uses more CPU cores or threads than you requested for the job. For example,

#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-01:00
#SBATCH -c 4

bowtie -p 10 -S hg19 -1 read1.fastq -2 read2.fastq myalign.sam

In the above sbatch script, the multithreading option in the bowtie command (-p 10) does not match the number of requested threads for the sbatch job (#SBATCH -c 4).

X bowtie -p 10 -S hg19 -1 read1.fastq -2 read2.fastq myalign.sam



To fix this, simply adjust the value of the -p argument to match the value specified with #SBATCH -c:

✔ bowtie -p 4 -S hg19 -1 read1.fastq -2 read2.fastq myalign.sam



NOTE: The particular letter or word used for the multithreading option depends on the particular software tool. It's -p for bowtie and -num_threads for blast. Some programs can't multithread at all, in which case requesting multiple cores from SLURM will simply waste resources and increase your pending time.
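
For instance, a blastn run would match its -num_threads option to the requested cores in the same way. A minimal sketch, assuming BLAST+ is available and using placeholder database and file names:

#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-01:00
#SBATCH -c 4

# blastn uses -num_threads instead of -p; keep it equal to #SBATCH -c
blastn -num_threads 4 -db nt -query seqs.fasta -out hits.txt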

What is Job Efficiency?

Job efficiency is the ratio of the CPU resources a job actually uses to the CPU resources you requested for the job (with #SBATCH -c or srun -c). Ideally, these will be equal, and the efficiency will be approximately 1 (100%).
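
To check a specific job yourself, one option is to compare the CPU time it actually consumed with the CPU time it reserved. A minimal sketch using the standard SLURM sacct command (the job ID below is only a placeholder):

# Replace 12345678 with your own job ID.
sacct -j 12345678 --format=JobID,Elapsed,AllocCPUS,TotalCPU
# Efficiency is roughly TotalCPU / (Elapsed x AllocCPUS); close to 1 (100%) is ideal.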

Why is Job Efficiency Important?

O2 has dozens of cores per node. In general, each core can run a single process/thread. The SLURM scheduler dispatches (assigns) jobs to nodes based on the number of cores requested, as well as the amount of memory required (plus other criteria specified in a user's sbatch command). Multiple jobs from different users often run on the same node, so one user's job may negatively impact another user's jobs.
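
For illustration, a minimal sketch of the sbatch directives the scheduler looks at when placing a job (the 8G memory value is only a placeholder, not a recommendation):

#!/bin/bash
#SBATCH -p short       # partition
#SBATCH -t 0-01:00     # requested wall time
#SBATCH -c 4           # requested cores
#SBATCH --mem=8G       # requested memory (placeholder value)

bowtie -p 4 -S hg19 -1 read1.fastq -2 read2.fastq myalign.sam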

Efficiency Categories

A SLURM job can fall into one of three categories:

  1. Efficient - If the number of active threads started within the job matches the number of requested CPU cores and the efficiency is close to 100%, the process and its threads are efficiently using the allocated resources. 


  2. Overefficient - If the process starts more threads than the number of cores allocated. The efficiency won't exceed 100% because a Linux kernel feature called cgroups restricts all the parallel threads to the allocated cores; however, those threads are now competing for the same, over-subscribed CPU resources. Any parallelization advantage is therefore lost, and the job actually runs slower because it pays the overhead of parallelization without receiving any benefit. In addition, overbooking the CPU resources with extra threads overloads the entire node and impacts the performance of all users running on the same node.


  3. Underefficient - If the efficiency is <75%, you are likely using fewer resources than you've allocated. This is occasionally unavoidable, for example for single-core jobs performing very intense input/output operations. However, if you allocated multiple cores and your job is expected to run with parallelization (multithreading), a low CPU efficiency usually indicates that the job is not leveraging all the allocated resources in an optimal way.



SLURM Variable

After submitting an sbatch job, the SLURM scheduler creates a variable ($SLURM_CPUS_PER_TASK) with the number of requested threads. That variable can be used as the value for a multithreading option. For example,

#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-01:00
#SBATCH -c 4

bowtie -p $SLURM_CPUS_PER_TASK -S hg19 -1 read1.fastq -2 read2.fastq myalign.sam

In the above example, the SLURM variable was used to specify the value (4) for the multithreading option (-p) in the bowtie command.

Troubleshooting Overefficient and Underefficient Jobs 

The most common cause of overefficient and underefficient jobs is requesting a certain number of cores from SLURM and requesting a different number of cores from the program you're running, as in the example at the top of the page. This can happen for several reasons:

  • Not explicitly requesting a specific number of cores from SLURM, in which case it will allocate only 1 core.

  • Not explicitly requesting a specific number of cores from the program (with a multithreading option). Some programs will then use a single core; others (expecting to be run on a 4-core laptop) will try to use all of the cores on the node, which is guaranteed to lead to an overefficient job (see the sketch after this list).

  • Asking for multiple cores from SLURM when the program has no multithreading option. Some people think that asking for more cores will always make a program run faster, but in this case the job will still run on a single core; SLURM will nevertheless reserve the extra cores for the duration of the job, which wastes resources and increases your pending time.
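
One defensive pattern is to always derive the program's thread count from the SLURM request and fall back to a single core if the variable is not set. A minimal sketch (the THREADS variable name is just an illustration):

#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-01:00
#SBATCH -c 4

# Fall back to 1 if SLURM_CPUS_PER_TASK is not set (e.g. -c was omitted).
THREADS=${SLURM_CPUS_PER_TASK:-1}
bowtie -p "$THREADS" -S hg19 -1 read1.fastq -2 read2.fastq myalign.sam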

R package doParallel

In the registerDoParallel function, please set the cores argument to nthreads - 1, where "nthreads" is the number of threads requested in the sbatch job. For example, in R,

library(doParallel)
nthreads <- as.numeric(Sys.getenv('SLURM_CPUS_PER_TASK'))
registerDoParallel(cores = nthreads - 1)

If using doParallel, you must subtract one thread in the registerDoParallel call to avoid overefficient jobs on O2. doParallel is an MPI-like package that requires an extra background process to orchestrate the other tasks. The subtraction (nthreads - 1) reserves a thread for that background process and keeps the average load of the job within the allocated resources.

R package BiocParallel

Please set the workers parameter to nthreads - 2, where "nthreads" is the number of threads requested for the SLURM job. For example, you can add lines like the following to dynamically adjust the workers value based on the requested resources.
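
A minimal sketch, following the same pattern as the doParallel example above (MulticoreParam is assumed as the parallel backend here):

library(BiocParallel)
nthreads <- as.numeric(Sys.getenv('SLURM_CPUS_PER_TASK'))
# Subtract two from the requested cores, as recommended for BiocParallel on O2
register(MulticoreParam(workers = nthreads - 2))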

If using BiocParallel, you must subtract two from the value of the workers argument in the MulticoreParam function to avoid overefficient jobs on O2.