NOTICE: FULL O2 Cluster Outage, January 3 - January 10

O2 will be completely offline for a planned HMS IT data center relocation from Friday, Jan 3, 6:00 PM, through Friday, Jan 10.

  • On Jan 3 (5:30-6:00 PM): O2 login access will be turned off.
  • On Jan 3 (6:00 PM): O2 systems will start being powered off.

This project will relocate existing services, consolidate servers, reduce power consumption, and decommission outdated hardware to improve efficiency, enhance resiliency, and lower costs.

Specifically:

  • The O2 Cluster will be completely offline, including O2 Portal.
  • All data on O2 will be inaccessible.
  • Any jobs still pending when the outage begins will need to be resubmitted after O2 is back online.
  • Websites on O2 will be completely offline, including all web content.

More details at: https://harvardmed.atlassian.net/l/cp/1BVpyGqm & https://it.hms.harvard.edu/news/upcoming-data-center-relocation

Understanding O2 Slurm Accounts/Associations and Unix Accounts/Groups

The purpose of this page is to illustrate the different components controlling authentication and job submission in the O2 cluster.

There are four main components: Unix accounts, Unix groups, Slurm accounts, and Slurm associations.

  • Note that the term "HMS Account" is equivalent to what was formerly called "eCommons ID"

Unix account/user

used for O2 login access (your HMS Account)

Any user must have a unix account in order to access the O2 cluster. The unix account is the "identity" of the user, and it is involved in the authentication process when connecting to the cluster. Some key information is also associated with the unix account, such as the default Linux shell and the $HOME path. You can use the command getent passwd $USER to see more about your unix account.
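
For example, the output looks roughly like the following; the username, UID/GID, full name, and paths shown here are made up purely for illustration:

$ getent passwd $USER
ab123:*:4216842:4216842:Jane Doe:/home/ab123:/bin/bash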

O2 authentication is based on HMS Accounts, formerly known as eCommons accounts. Therefore, your unix account name is the same as your HMS Account name, and your password for the O2 unix account is the same as your HMS Account password.
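
As a quick sketch, a login from your local machine therefore looks like the following, where ab123 stands in for your own HMS Account name and o2.hms.harvard.edu is the O2 login address (any additional prompts, such as two-factor verification, depend on the current login policy):

$ ssh ab123@o2.hms.harvard.edu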

Unix group 

Each unix account is in at least one unix group. This "personal user group" has the same name as the unix account.

A unix account may also be a member of other unix groups to provide access to shared storage space or special computational resources on O2.
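
For instance, a shared lab folder is typically owned by such a group, which you can check with ls; the path and group name below are only hypothetical placeholders:

$ ls -ld /n/groups/park_lab
drwxrws--- 25 root park_lab 4096 Jan  2 10:15 /n/groups/park_lab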

To see which groups are associated with your unix account, run the command groups.
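
The output is a simple space-separated list; the group names below are just an illustration (yours will reflect your own lab and any shared resources you have access to):

$ groups
ab123 park_lab shared_scratch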

Slurm Account

Slurm accounts are different from Unix accounts; they are used to track cluster utilization, control access to resources, and enforce limits for groups of users (unix accounts).

Each Lab or Core that uses O2, at HMS or an Affiliated Institution, is assigned a unique Slurm account. Typically the Slurm account name is composed of the PI's last name and their HMS account username, like park_pjp8 (or, less often, just park).

The PI is responsible for the cluster utilization of all users associated with their Slurm account.
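
If you need to see which users are associated with a given Slurm account, one option is to query the Slurm accounting database with sacctmgr; this is only a sketch using the example account name from above, and whether regular users can read this information depends on the cluster's configuration:

$ sacctmgr show associations where account=park_pjp8 format=Account,User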

Slurm Association

In order to use O2 and submit jobs to the Slurm scheduler, any user (unix account) must be associated with at least one Slurm Account. Typically, each user has a Slurm association with the Slurm Account assigned to their Lab/PI.

If a user works for multiple Labs, they might have multiple associations with the corresponding Slurm accounts. A simple way to see your associations is to run the command sshare -u $USER -U; the first column reports the Slurm account associations.

For example, the following user is associated with two Slurm accounts (i.e., rccg and lab1).

$ sshare -u $USER -U
             Account       User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare
-------------------- ---------- ---------- ----------- ----------- ------------- ----------
                rccg      ab123          1    0.000241           0      0.000000   1.000000
                lab1      ab123          1    0.000241       21252      0.000066   0.968913

If the user wants to submit jobs using the lab1 Slurm account, then:

  • Interactive Job (Resources requested: use Slurm account lab1, 30 minutes of walltime, 2 threads, and 2GB of memory)

    • srun --account=lab1 -t 30:00 --pty -p interactive -c 2 --mem=2G bash

  • Non-interactive or sbatch job

    • #SBATCH --account=lab1

    • See below for an example of using it within a script called sbatch_R_example.slurm

 

$ cat sbatch_R_example.slurm
#!/bin/bash
#SBATCH --account=lab1
#SBATCH -p short              # Partition to submit to
#SBATCH -t 0-00:01            # Time in minutes DD-HH:MM; DD-HH; MM:SS
#SBATCH -c 1                  # Number of cores requested
#SBATCH -N 1                  # Ensure that all cores are on one machine
#SBATCH --mem=2G              # Memory total in GB
#SBATCH -o hostname.%j.out    # Standard out goes to this file
#SBATCH -e hostname.%j.err    # Standard err goes to this file

# Commands below
module load gcc/6.2.0
module load R/3.6.1

# To run an R script called my_r_script.R
Rscript my_r_script.R

# Submit the non-interactive job
$ sbatch sbatch_R_example.slurm
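
Once the job has been submitted, you can confirm which Slurm account it was charged to with sacct; this is a minimal sketch using standard sacct format fields:

$ sacct -u $USER --format=JobID,JobName,Account,Partition,State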