NOTICE: FULL O2 Cluster Outage, January 3 - January 10

O2 will be completely offline for a planned HMS IT data center relocation from Friday, Jan 3, 6:00 PM, through Friday, Jan 10.

  • On Jan 3 (5:30-6:00 PM): O2 login access will be turned off.
  • On Jan 3 (6:00 PM): O2 systems will start being powered off.

This project will relocate existing services, consolidate servers, reduce power consumption, and decommission outdated hardware to improve efficiency, enhance resiliency, and lower costs.

Specifically:

  • The O2 Cluster will be completely offline, including O2 Portal.
  • All data on O2 will be inaccessible.
  • Any jobs still pending when the outage begins will need to be resubmitted after O2 is back online (see the example after this list).
  • Websites on O2 will be completely offline, including all web content.
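
For example, before the shutdown you can save a list of your pending jobs from an O2 shell, so you know what to resubmit later:

  # List your pending jobs (jobid, name, partition, state)
  squeue -u $USER --states=PENDING --format="%i %j %P %T"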

More details at: https://harvardmed.atlassian.net/l/cp/1BVpyGqm & https://it.hms.harvard.edu/news/upcoming-data-center-relocation

How to use the HMS RC Desktop App

This app starts an interactive Linux Desktop on one of the O2 cluster compute nodes. This allows you to run graphical Linux tools, which may be helpful if you don’t have a Linux desktop available. Also, the O2 compute nodes have more RAM (memory) and cores available than many desktop machines, and you can reserve these resources as shown below.

Several fields, described below, are available to customize your job submission.

Account:

This is the Slurm Account associated with your Slurm User. You can find your Slurm account by running the command sshare -U -u $USER from a shell within the O2 cluster. It’s usually related to your lab name.
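
For example, from a shell on O2:

  # Show the Slurm account associated with your user;
  # the account appears in the first column of the output
  sshare -U -u $USER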

Partition:

This is the partition to which the job will be submitted. You can use any of the standard O2 partitions except "interactive"; we recommend the "priority" partition, unless you are already running other jobs there.
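
If you are not sure which partitions are available, you can list them from an O2 shell:

  # Summarize available partitions, their time limits, and node counts
  sinfo -s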

Number of cores:

This is the number of CPU cores to allocate for this job. You probably need only 1 core, unless you expect to run multiple tools simultaneously within the desktop, or you are running a multi-core program.

GPUs (optional):

If you selected a GPU partition, specify the number of GPUs to allocate for the job.

GPU card type (optional):

If you selected a GPU partition, select a GPU card type; the equivalent Slurm syntax is sketched below.
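
For reference, these selections correspond to Slurm’s generic-resource (GRES) request syntax; the card type name below is illustrative, and the actual names depend on the partition:

  # Request 1 GPU of any type
  --gres=gpu:1
  # Request 1 GPU of a specific card type (type name is illustrative)
  --gres=gpu:rtx8000:1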

Wall Time requested in hours:

This is the desired amount of time, in hours, to allocate for the OOD job. The maximum value allowed depends on the partition you selected.
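
For example, to check the maximum wall time of the "priority" partition from an O2 shell:

  # Print the partition name (%P) and its time limit (%l)
  sinfo -p priority --format="%P %l"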

Total Memory in GB:

This is the amount of memory (RAM) in GB you want to allocate for your job. 

Filesystems to make available:

By default, only your home directory (like /home/abc123, aka $HOME) and the O2 application filesystems are available. To start the Desktop with additional filesystems (/n/groups, /n/data1, ...), check the desired box(es). For example, you may want to access your lab’s group folder.

Slurm Custom Arguments:

This is an optional text field that can be used to pass additional flags to the Slurm scheduler when submitting the job.
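
For example, standard sbatch flags can be entered here; the email address and node name below are illustrative:

  # Receive an email when the job starts and ends
  --mail-type=BEGIN,END --mail-user=abc123@hms.harvard.edu
  # Keep the job off a specific node
  --exclude=compute-a-16-163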

After setting the above fields, click on the Launch button, which will submit the job.

While your job is pending in the queue, you should see a page like the following:

The highlighted Session ID link opens a new OOD browser tab where you can see the log files created for the current job.

When the job is dispatched and ready to run, you should see a screen like the following:

To open the Desktop application, click on the Launch Desktop Mate button.

A new tab should open displaying the Desktop interface, like the following:

Once you finish using the app, close the Desktop browser tab and click the Delete button on the Open OnDemand interactive session page in the previous tab.


Note: Closing the browser will not terminate active applications. Your OOD job will keep running until it reaches the requested Wall Time limit or the "Delete" button is used.
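
If you are not sure whether a Desktop job is still running, you can check from an O2 shell and cancel it by jobid if needed (the jobid below is illustrative):

  # List your jobs currently in the queue
  squeue -u $USER
  # Cancel the OOD job by its jobid
  scancel 29091140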


How to troubleshoot problems

If the job does not work properly, please record the actual O2 jobid printed at the top of the interactive app window (29091140 in this example), then click on the highlighted Session ID link, which should open a web browser page like the following:


To troubleshoot your problem, you can start by checking the output log in the file output.log. Select the file and click the "View" button to see its contents.
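
From an O2 shell you can also check the job’s final state and read the same log directly; this is a sketch using the example jobid above, and the log path follows Open OnDemand’s usual layout but is an assumption, so copy the actual full path from the OOD file page:

  # Check the state, exit code, and elapsed time of the job
  sacct -j 29091140 --format=JobID,JobName,State,ExitCode,Elapsed
  # Read the session log; the path is an assumption, use the full path shown on the OOD file page
  less ~/ondemand/data/sys/dashboard/batch_connect/sys/desktop/output/<session-id>/output.log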

If you need additional help, you can reach out to rchelp@hms.harvard.edu; make sure to include the full path listed on the OOD file page along with any content printed in the output.log file.