GPU Resources in O2

There are 27 GPU nodes with a total of 133 GPU cards available on the O2 cluster. The nodes are accessible through three GPU partitions: gpu, gpu_quad, and gpu_requeue.

The gpu partition includes 32 double precision GPU cards: 16 Tesla K80, 8 Tesla M40 and 8 Tesla V100.

The gpu_quad partition includes 71 GPUs: 47 single precision RTX 8000 cards and 24 double precision Tesla V100s cards.

The gpu_requeue partition includes 30 GPUs: 28 single precision RTX 6000 cards and 2 double precision Tesla M40 cards.

To list current information about all the nodes and cards available in a specific partition, use the command sinfo --Format=nodehost,available,memory,statelong,gres:20 -p <partition>, for example:

login02:~ sinfo  --Format=nodehost,available,memory,statelong,gres:20 -p gpu,gpu_quad,gpu_requeue
HOSTNAMES           AVAIL               MEMORY              STATE               GRES
compute-g-16-175    up                  257548              mixed               gpu:teslaM40:4
compute-g-16-176    up                  257548              mixed               gpu:teslaM40:4
compute-g-16-177    up                  257548              mixed               gpu:teslaK80:8
compute-g-16-194    up                  257548              mixed               gpu:teslaK80:8
compute-g-16-254    up                  373760              mixed               gpu:teslaV100:4
compute-g-16-255    up                  373760              mixed               gpu:teslaV100:4
compute-g-17-145    up                  770000              idle                gpu:rtx8000:10
compute-g-17-146    up                  770000              idle                gpu:rtx8000:10
compute-g-17-147    up                  383000              idle                gpu:teslaV100s:4
compute-g-17-148    up                  383000              idle                gpu:teslaV100s:4
compute-g-17-149    up                  383000              idle                gpu:teslaV100s:4
compute-g-17-150    up                  383000              idle                gpu:teslaV100s:4
compute-g-17-151    up                  383000              idle                gpu:teslaV100s:4
compute-g-17-152    up                  383000              idle                gpu:teslaV100s:4
compute-g-17-153    up                  383000              idle                gpu:rtx8000:3
compute-g-17-154    up                  383000              idle                gpu:rtx8000:3
compute-g-17-155    up                  383000              idle                gpu:rtx8000:3
compute-g-17-156    up                  383000              idle                gpu:rtx8000:3
compute-g-17-157    up                  383000              idle                gpu:rtx8000:3
compute-g-17-158    up                  383000              idle                gpu:rtx8000:3
compute-g-17-159    up                  383000              idle                gpu:rtx8000:3
compute-g-17-160    up                  383000              idle                gpu:rtx8000:3
compute-g-17-161    up                  383000              idle                gpu:rtx8000:3
compute-gc-17-247   up                  383000              mixed               gpu:rtx6000:8
compute-g-16-197    up                  257548              idle                gpu:teslaM40:2
compute-gc-17-245   up                  383000              idle                gpu:rtx6000:10
compute-gc-17-246   up                  383000              idle                gpu:rtx6000:10

GPU Partition

The gpu partition is open to all O2 users; to run jobs on the gpu partition, use the flag -p gpu.
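For example, a batch script could be submitted to the gpu partition with a command like the one below (my_gpu_job.sh is a placeholder name; requesting GPU cards with --gres is described further down this page):

login01:~ sbatch -p gpu --gres=gpu:1 -t 2:00:00 my_gpu_job.sh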

GPU_QUAD and GPU_MPI_QUAD Partitions

The gpu_quad partition is open to any user working for a PI with a primary or secondary appointment in a pre-clinical department; to run jobs on the gpu_quad partition, use the flag -p gpu_quad. If you work at an affiliate institution but are collaborating with an on-Quad PI, please contact Research Computing to gain access.

The gpu_mpi_quad partition can support GPU jobs using distributed memory parallelization; if you believe your jobs can benefit from this partition, please reach out to rchelp@hms.harvard.edu to gain access.
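As a sketch only, an MPI-based GPU job might be submitted to gpu_mpi_quad with multiple tasks spread across nodes; the task counts, GPU counts, time limit, and script name below are placeholders to be adapted to your application:

login01:~ sbatch -p gpu_mpi_quad -n 8 --ntasks-per-node=4 --gres=gpu:4 -t 12:00:00 my_mpi_gpu_job.sh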


GPU_REQUEUE Partition

The O2 cluster includes several contributed GPU cards, purchased and owned directly by HMS labs. When idle, these GPU resources are made available to every user on O2 through our gpu_requeue partition. However, if a member of a purchasing lab submits a job, your job may be killed (preempted) and resubmitted at any time.

For detailed information about the gpu_requeue partition, see O2 GPU Re-Queue Partition.
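Because jobs on gpu_requeue can be preempted at any time, workloads that write periodic checkpoints are the best fit for this partition. As a sketch, the optional --requeue flag asks Slurm to make a preempted batch job eligible for requeuing instead of leaving it cancelled (the script name below is a placeholder):

login01:~ sbatch -p gpu_requeue --gres=gpu:1 -t 4:00:00 --requeue my_checkpointing_job.sh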


GPU Partition Limits

The following limits apply only to the gpu partition, in order to ensure fair use of the limited resources:

GPU hours

The amount of GPU resources that can be used by each user at a given time is measured in GPU hours per user. Currently there is an active limit of 160 GPU hours per user.

For example, at any time each user can allocate* at most 2 GPU cards for 80 hours, 16 GPU cards for 10 hours, or any other combination that does not exceed the total GPU hours limit. (If you use just 1 GPU card, the partition maximum wall time will limit you to 120 hours.)

* as resources allow 
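In other words, the GPU hours of a job are the number of GPU cards requested multiplied by the requested wall time. For instance, this hypothetical submission accounts for exactly 4 x 40 = 160 GPU hours:

login01:~ sbatch -p gpu --gres=gpu:4 -t 40:00:00 my_gpu_job.sh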

Memory

Each user can have a total of up to 420 GiB of memory allocated across all of their currently running GPU jobs.

CPU cores

Each user can have a total of up to 34 cores allocated across all of their currently running GPU jobs.


These limits will be adjusted as our GPU capacity evolves. If these limits are reached by your running jobs, any remaining pending jobs will display AssocGrpGRESRunMinutes in the NODELIST(REASON) field.
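You can check the pending reason for your own jobs with squeue; in the illustrative command below, the reason is reported in the last column of the default output:

login01:~ squeue -u $USER -p gpu -t PENDING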

The gpu_quad and gpu_requeue partitions are not affected by these limits.

How to submit a GPU job

Most GPU applications require access to the CUDA Toolkit libraries, so before submitting a job you will likely need to load one of the available CUDA modules, for example:

login01:~ module load gcc/6.2.0 cuda/10.1


Note that if you are running a precompiled GPU application, for example a pip-installed TensorFlow, you will need to load the same version of CUDA that was used to compile your application (tensorflow==2.2.0 was compiled against CUDA 10.1).
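One way to verify that a precompiled framework can actually see the card you were allocated is to query it from within a GPU job; for example, with a pip-installed TensorFlow 2.x (a sketch only, to be run inside an interactive GPU session rather than on a login node):

compute-g-16-176:~ python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"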

To submit a GPU job on O2, use one of the available partitions (gpu, gpu_quad, or gpu_requeue) and add a flag like --gres=gpu:1 to request a GPU resource. The example below starts an interactive bash job requesting 1 CPU core and 1 GPU card; this starts a session on one of the GPU nodes, where you can test and debug programs that use a GPU.

login01:~ srun -n 1 --pty -t 1:00:00 -p gpu --gres=gpu:1 bash

srun: job 6900282 queued and waiting for resources
srun: job 6900282 has been allocated resources
compute-g-16-176:~
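Once the interactive session starts on a GPU node, you can check which card was allocated with the standard nvidia-smi utility, for example:

compute-g-16-176:~ nvidia-smi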


The next example submits a batch job requesting 2 GPU cards and 4 CPU cores:

login01:~ sbatch gpujob.sh
Submitted batch job 6900310


where gpujob.sh contains


#-----------------------------------------------------------------------------------------
#!/bin/bash
#SBATCH -c 4
#SBATCH -t 6:00:00
#SBATCH -p gpu_quad
#SBATCH --gres=gpu:2

module load gcc/6.2.0
module load cuda/9.0

./deviceQuery  #this is just an example 


#-----------------------------------------------------------------------------------------


It is also possible to request a specific type of GPU card by using the --gres flag. For example, --gres=gpu:teslaM40:3 can be used to request 3 Tesla M40 GPU cards.

Currently the available GPU card types are teslaK80, teslaM40, teslaV100, teslaV100s, rtx6000, and rtx8000; however, each partition might only have a subset of these card types, as indicated in the first paragraph.
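For example, the following interactive request asks for one Tesla V100 card on the gpu partition:

login01:~ srun -n 1 --pty -t 1:00:00 -p gpu --gres=gpu:teslaV100:1 bash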

How to compile and run CUDA programs

In most cases a CUDA module and a compiler module must be loaded in order to compile CUDA programs. To see which CUDA modules are available, use the command module spider cuda, then use the command module load to load the desired version. Currently the latest available version of the CUDA Toolkit is 10.2.


login04:~ module spider cuda

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  cuda:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Versions:
        cuda/8.0
        cuda/9.0
        cuda/10.0
        cuda/10.1
        cuda/10.2

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  For detailed information about a specific "cuda" module (including how to load the modules) use the module's full name.
  For example:

     $ module spider cuda/9.0
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------




login01:~ module load gcc/6.2.0 cuda/10.1
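Once a compiler and a CUDA module are loaded, CUDA sources can be compiled with nvcc, the compiler shipped with the CUDA Toolkit. As a minimal sketch (my_kernel.cu is a placeholder source file):

login01:~ nvcc -o my_kernel my_kernel.cu

The resulting binary should then be run inside a GPU job (for example via srun or sbatch as shown above), since the GPU cards are only available on the compute nodes of the GPU partitions.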


Note: you will likely still need to load a CUDA module to run any precompiled CUDA/GPU software.

How to run double precision GPU jobs

The GPU partitions now contain both single and double precision GPU nodes. If you are certain that your GPU job requires a double precision card (i.e. a Tesla card), you can add the flag --constraint=gpu_doublep to your submission line to ensure that your job will be dispatched to a double precision GPU node.
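For example, an interactive request constrained to a double precision node might look like:

login01:~ srun -n 1 --pty -t 1:00:00 -p gpu --gres=gpu:1 --constraint=gpu_doublep bash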





