Graphical User Interface App

This app can be used to launch your favorite GUI or start an interactive shell with GUI support on one of the O2 compute nodes.

After clicking on the HMS RC Graphical User Interface application, you should see the page:

 

where you can select several parameters for your GUI job:

Slurm Account:

This is the Slurm Account associated with your Slurm User. You can find your Slurm account by running the command sshare -U -u $USER from a shell within the O2 cluster.
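If you prefer to extract just the account names from that command, the following is a minimal sketch. It assumes the standard sshare options -n (no header), -P (pipe-delimited output), and -o Account are available in the installed Slurm version; the function names and the sample account names are hypothetical.

```python
import subprocess


def parse_sshare_accounts(text: str) -> list[str]:
    """Return unique, non-empty account names from pipe-delimited sshare output."""
    accounts = []
    for line in text.splitlines():
        # With "-o Account" each line holds a single field; split defensively anyway.
        name = line.split("|")[0].strip()
        if name and name not in accounts:
            accounts.append(name)
    return accounts


def my_slurm_accounts(user: str) -> list[str]:
    """Run sshare from a shell on the cluster and list the accounts tied to `user`."""
    result = subprocess.run(
        ["sshare", "-U", "-u", user, "-n", "-P", "-o", "Account"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sshare_accounts(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample text, so the parsing step can be tried without Slurm.
    print(parse_sshare_accounts("lab_abc\nlab_abc\nlab_xyz\n"))
```

The parsing step is kept separate from the sshare call so that it can be tried on any machine; my_slurm_accounts only works where sshare is installed.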

Partition:

This is the partition you want to use to submit the job. 

Wall Time requested in hours:

This is the time, in hours, you want to allocate for the OpenOnDemand (OOD) job. The maximum allowed value depends on the partition you select.

Number of cores:

This is the number of CPU cores you want to allocate for this job.

Number of GPU cards:

This is the number of GPU cards you want to allocate for this job. If you want to allocate one or more GPU cards, make sure to select a partition that supports GPU jobs. Leave this field blank if you do not need a GPU card.

GPU card type:

Here you can select a particular type of GPU card. If you request a specific type of GPU card, make sure to select a partition that includes that GPU type.

Total Memory in GB:

This is the amount of memory (RAM) in GB you want to allocate for your job. 
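To relate the form fields above to a Slurm resource request, here is a rough sketch of how they could map to standard sbatch flags. This is an assumption for illustration only: the flags the OOD app actually passes may differ, and the function names, the partition name, and the GPU type in the example are hypothetical.

```python
def hours_to_slurm_time(hours: float) -> str:
    """Convert a wall time in hours to Slurm's hours:minutes:seconds format."""
    total_minutes = round(hours * 60)
    h, m = divmod(total_minutes, 60)
    return f"{h}:{m:02d}:00"


def build_slurm_flags(account, partition, hours, cores, memory_gb,
                      gpus=0, gpu_type=""):
    """Translate the form fields into a list of sbatch-style flags (assumed mapping)."""
    flags = [
        f"--account={account}",
        f"--partition={partition}",
        f"--time={hours_to_slurm_time(hours)}",
        f"--cpus-per-task={cores}",
        f"--mem={memory_gb}G",
    ]
    # A GPU request is added only when the "Number of GPU cards" field is filled in.
    if gpus:
        gres = f"gpu:{gpu_type}:{gpus}" if gpu_type else f"gpu:{gpus}"
        flags.append(f"--gres={gres}")
    return flags


if __name__ == "__main__":
    # Hypothetical values; use your own account and a partition you can access.
    print(" ".join(build_slurm_flags("lab_abc", "some_partition", 1.5, 4, 8)))
```

Leaving gpus at 0 mirrors leaving the GPU field blank in the form: no GPU request is added.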

Additional modules to be preloaded:

Enter here any O2 modules to preload before the GUI command starts; for example, the module for the GUI tool you want to run.

Command to start the GUI application:

Enter here the shell command that will start your GUI. You can use the Additional modules to be preloaded field or the Commands required to set a customized environment field to configure your shell for running the desired GUI command.

The default command is xterm, which starts a standard O2 interactive shell (terminal) job with GUI support. You can also start your custom GUI tool from the xterm shell.

Commands required to set a customized environment:

This is an optional text field that can be used to set your required environment for running the GUI command. For example, you can use this form to activate a Python or Conda environment.

Note that commands you enter here can also be executed directly in the xterm shell.

Slurm Custom Arguments:

This is an optional text field that can be used to pass additional flags to the Slurm scheduler when submitting the job.
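As an illustration of what such a text field might contain, the sketch below splits a string of extra flags into separate arguments the way a shell would. How the OOD app actually parses this field is not documented here, so treat this as an assumption; the function name is hypothetical, while --mail-type and --job-name are standard sbatch options.

```python
import shlex


def split_custom_arguments(text: str) -> list[str]:
    """Split a free-text flags field into arguments, honoring shell-style quotes."""
    return shlex.split(text)


if __name__ == "__main__":
    # Quoted values stay together as a single argument.
    print(split_custom_arguments('--mail-type=END --job-name="my gui"'))
```

Quoting matters whenever a flag value contains spaces; without the quotes, the job name above would be split into two arguments.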

 

After setting the above fields, click on the Launch button, which will submit the job.

While your job is pending in the queue, you should see a page like:

The highlighted Session ID link opens the log files created for the current job in a new OOD browser tab.

When the job is dispatched and ready to run, you should see a screen like:

You can control the compression and quality of the graphics with the two control bars. To open the application GUI, click the Launch Graphical User Interface button.

A new tab should open with the GUI; if you are using xterm the tab should look like this:

When done, close the GUI browser tab and click the Delete button in the OpenOnDemand interactive session.

Note: Closing the OpenOnDemand browser window will not terminate active applications. Your OOD job will keep running until it reaches the requested Wall Time limit or the "Delete" button is used.

How to debug problems

If something does not work properly, record the actual O2 jobid printed at the top of the interactive app window (11452350 in the example), then click the highlighted Session ID link, which should open the OOD file editor in the folder where the job's log files are written.

To debug your problem, you can start by checking the output log in the file output.log.

If you need additional help, you can reach out to rchelp@hms.harvard.edu; make sure to include the full path listed on the OOD file page along with any content printed in the output.log file and the jobid number.
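Before writing to the help desk, it can be useful to pull the most relevant lines out of output.log. The following is a minimal sketch; the keyword list is an assumption rather than an official list of error markers, the function names are hypothetical, and the sample text is made up for illustration.

```python
from pathlib import Path

# Assumed keywords that often mark a problem; adjust them to your application.
KEYWORDS = ("error", "failed", "killed", "out of memory")


def suspicious_lines(log_text: str, keywords=KEYWORDS) -> list[tuple[int, str]]:
    """Return (line number, line) pairs whose text contains any keyword."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.strip()))
    return hits


def scan_log(path: str) -> list[tuple[int, str]]:
    """Read a log file (for example, a downloaded copy of output.log) and scan it."""
    return suspicious_lines(Path(path).read_text(errors="replace"))


if __name__ == "__main__":
    # Made-up sample text standing in for a real log.
    sample = "starting session\nERROR: example failure\nsession ready\n"
    for number, line in suspicious_lines(sample):
        print(f"{number}: {line}")
```

The matching is case-insensitive and intentionally broad, so expect some false positives; the full log is still what the help desk will want to see.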

What to do if you accidentally minimize the GUI window

If you accidentally minimize the GUI window, you can bring it back using the VNC Alt option and the Tab key on your keyboard. Your session is still active. First click on the VNC menu bar, then select the "A" option and the "Alt" button.

 

Finally, press the "Tab" key on your local keyboard to display all active windows in the session, then click the desired GUI with the mouse to bring it back to the screen.

After restoring your GUI, remember to unselect the Alt button before resuming your work; leaving the VNC Alt button selected is equivalent to adding Alt to everything you type.