Using O2 GPU resources

1 GPU Resources in O2
2 GPU Partition Limits
3 How to submit a GPU job
4 How to compile and run Cuda programs
5 How to run double precision GPU jobs
6 How to log the job's GPU utilization.

GPU Resources in O2

There are 53 GPU nodes with a total of 240 GPU cards available on the O2 cluster. The nodes are accessible through three GPU partitions: gpu, gpu_quad, and gpu_requeue.

The gpu partition currently includes 44 GPU cards: 8 Tesla V100 cards with 16GB of VRAM and 36 L40S cards with 48GB of VRAM.

The gpu_quad partition includes 140 GPUs: 44 single precision RTX 8000 cards with 48GB of VRAM, 8 single precision A40 cards with 48GB of VRAM, 52 single precision L40S cards with 48GB of VRAM, 24 double precision Tesla V100s cards with 32GB of VRAM, 4 double precision A100 cards with 80GB of VRAM, and 8 A100 MIG cards with 40GB of VRAM.

The gpu_requeue partition includes 88 GPUs: 28 single precision RTX 6000 cards with 24GB of VRAM, 36 single precision L40S cards, 2 single precision L40 cards, 2 A100 cards with 40GB of VRAM, and 22 A100 cards with 80GB of VRAM.

To list current information about all the nodes and cards available in a specific partition, use the command sinfo --Format=nodehost,available,memory,statelong,gres:40 -p <partition>. For example:

GPU Partition

sinfo --Format=nodehost,available,memory,statelong,gres:40 -p gpu,gpu_quad,gpu_requeue 
HOSTNAMES AVAIL MEMORY STATE GRES 
compute-g-17-169 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-g-16-254 up 373000 mixed gpu:teslaV100:4,vram:no_consume:16G 
compute-g-16-255 up 373000 mixed gpu:teslaV100:4,vram:no_consume:16G 
compute-g-17-168 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-g-17-169 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-g-17-170 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-g-17-200 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-g-17-201 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-g-17-202 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-g-17-203 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-g-17-204 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-g-17-205 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-g-17-145 up 770000 mixed gpu:rtx8000:10,vram:no_consume:48G 
compute-g-17-146 up 770000 mixed gpu:rtx8000:10,vram:no_consume:48G 
compute-g-17-147 up 383000 mixed gpu:teslaV100s:4,vram:no_consume:32G 
compute-g-17-148 up 383000 mixed gpu:teslaV100s:4,vram:no_consume:32G 
compute-g-17-149 up 383000 mixed gpu:teslaV100s:4,vram:no_consume:32G 
compute-g-17-150 up 383000 mixed gpu:teslaV100s:4,vram:no_consume:32G 
compute-g-17-151 up 383000 mixed gpu:teslaV100s:4,vram:no_consume:32G 
compute-g-17-152 up 383000 mixed gpu:teslaV100s:4,vram:no_consume:32G 
compute-g-17-153 up 383000 mixed gpu:rtx8000:3,vram:no_consume:48G 
compute-g-17-154 up 383000 mixed gpu:rtx8000:3,vram:no_consume:48G 
compute-g-17-156 up 383000 mixed gpu:rtx8000:3,vram:no_consume:48G 
compute-g-17-157 up 383000 mixed gpu:rtx8000:3,vram:no_consume:48G 
compute-g-17-158 up 383000 mixed gpu:rtx8000:3,vram:no_consume:48G 
compute-g-17-159 up 383000 mixed gpu:rtx8000:3,vram:no_consume:48G 
compute-g-17-160 up 383000 mixed gpu:rtx8000:3,vram:no_consume:48G 
compute-g-17-161 up 383000 mixed gpu:rtx8000:3,vram:no_consume:48G 
compute-g-17-162 up 500000 mixed gpu:a40:4,vram:no_consume:48G 
compute-g-17-163 up 500000 mixed gpu:a40:4,vram:no_consume:48G 
compute-g-17-164 up 500000 mixed gpu:a100:4,vram:no_consume:80G
compute-g-17-165 up 500000 mixed gpu:a100.mig:8,vram:no_consume:40G 
compute-g-17-166 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-g-17-167 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-g-17-171 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-gc-17-211 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-gc-17-241 up 1000000 mixed gpu:l40s:4,vram:no_consume:48G
compute-gc-17-242 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-gc-17-243 up 500000 mixed gpu:l40s:4,vram:no_consume:48G
compute-gc-17-244 up 1030000 mixed gpu:l40:2,vram:no_consume:45G 
compute-gc-17-245 up 383000 mixed gpu:rtx6000:10,vram:no_consume:24G 
compute-gc-17-246 up 383000 mixed gpu:rtx6000:10,vram:no_consume:24G 
compute-gc-17-247 up 383000 mixed gpu:rtx6000:8,vram:no_consume:24G 
compute-gc-17-249 up 1000000 mixed gpu:a100:2,vram:no_consume:40G
compute-gc-17-252 up 1000000 mixed gpu:a100:4,vram:no_consume:80G
compute-gc-17-254 up 1000000 mixed gpu:a100:4,vram:no_consume:80G
compute-gc-17-206 up 500000 allocated gpu:l40s:4,vram:no_consume:48G
compute-gc-17-207 up 500000 allocated gpu:l40s:4,vram:no_consume:48G
compute-gc-17-208 up 500000 allocated gpu:l40s:4,vram:no_consume:48G
compute-gc-17-209 up 500000 allocated gpu:l40s:4,vram:no_consume:48G
compute-gc-17-210 up 500000 allocated gpu:l40s:4,vram:no_consume:48G
compute-gc-17-253 up 1000000 allocated gpu:a100:4,vram:no_consume:80G

The gpu partition is open to all O2 users. To run jobs on the gpu partition, use the flag -p gpu.

GPU_QUAD and GPU_MPI_QUAD Partitions

The gpu_quad partition is open to any user working for a PI with a primary or secondary appointment in a pre-clinical department. To run jobs on the gpu_quad partition, use the flag -p gpu_quad. If you work at an affiliate institution but are collaborating with an on-Quad PI, please contact Research Computing to gain access.

The gpu_mpi_quad partition supports GPU jobs that use distributed memory parallelization. If you believe your jobs can benefit from this partition, please reach out to rchelp@hms.harvard.edu to gain access.

GPU_REQUEUE Partition

The O2 cluster includes several contributed GPU cards, purchased and owned directly by HMS Labs. When idle, those GPU resources are made available in O2 under our gpu_requeue partition. However, if a member of a purchasing lab submits a job, your job may be killed and resubmitted at any time.

Since July 1st 2021, the gpu_requeue partition has been available only to users working for a PI with a primary or secondary appointment in a pre-clinical department.

For detailed information about the gpu_requeue partition, see O2 GPU Re-Queue Partition.

GPU Partition Limits

The following limits apply only to the gpu partition, to facilitate fair use of the limited resources:

GPU hours

The amount of GPU resources that can be used by each user at a given time is measured in terms of GPU hours / user. Currently there is an active limit of 200 GPU hours for each user.

For example, at any time each user can allocate* at most 2 GPU cards for 100 hours, 20 GPU cards for 10 hours, or any other combination that does not exceed the total GPU hours limit. (If you use just 1 GPU card, the partition's maximum wall time will limit you to 120 hours.)

* as resources allow
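
The arithmetic behind the GPU hours limit can be sketched in bash as shown below. This is only an illustration: the function names gpu_hours and fits_gpu_limit are hypothetical helpers (not O2 or Slurm commands), and the sketch assumes the limit is counted simply as cards multiplied by requested wall-time hours, as in the example above. The scheduler's own accounting may differ in detail.

```bash
# Per-user limit quoted above, in GPU hours.
GPU_HOURS_LIMIT=200

# Hypothetical helper: GPU hours consumed by a request = cards x wall-time hours.
gpu_hours() {
    local cards="$1" hours="$2"
    echo $(( cards * hours ))
}

# Hypothetical helper: succeeds (exit status 0) only if the request fits the limit.
fits_gpu_limit() {
    [ "$(gpu_hours "$1" "$2")" -le "$GPU_HOURS_LIMIT" ]
}

gpu_hours 2 100                                             # prints 200
fits_gpu_limit 20 10 && echo "20 cards x 10 h fits"
fits_gpu_limit 4 60  || echo "4 cards x 60 h exceeds the limit"
```

Only whole numbers of cards and hours are handled, since bash arithmetic is integer-only.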

Memory

Each user can have a total of up to 420 GiB of memory allocated across all currently running GPU jobs.

CPU cores

Each user can have a total of up to 34 cores allocated across all currently running GPU jobs.

These limits will be adjusted as our GPU capacity evolves. If these limits are reached by running jobs, any remaining pending jobs will display AssocGrpGRESRunMinutes in the NODELIST(REASON) field.

The gpu_quad and gpu_requeue partitions are not affected by these limits.

How to submit a GPU job

You must not reassign the variable CUDA_VISIBLE_DEVICES. The Slurm scheduler presets the correct value for CUDA_VISIBLE_DEVICES, and altering the preset value will likely cause your job to run without a GPU card.
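
Reading the variable is safe; only reassigning it causes problems. The sketch below shows one way to inspect it from inside a job without modifying it. The function name count_assigned_gpus is a hypothetical helper, and the sketch assumes the value is a comma-separated list of device identifiers (for example 0,1).

```bash
# Hypothetical helper: count the GPUs Slurm assigned by reading
# (never writing) CUDA_VISIBLE_DEVICES.
count_assigned_gpus() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    local -a ids=()
    if [ -n "$devices" ]; then
        # Split the comma-separated list into an array of identifiers.
        IFS=',' read -r -a ids <<< "$devices"
    fi
    echo "${#ids[@]}"
}

echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
echo "GPUs assigned: $(count_assigned_gpus)"
```

Outside of a GPU job the variable is normally unset, in which case the count printed is 0.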

Most GPU applications require access to CUDA Toolkit libraries, so before submitting a job you will likely need to load one of the available CUDA modules, for example:

login01:~ module load gcc/9.2.0 cuda/11.7

Note that if you are running a precompiled GPU application, for example a pip-installed TensorFlow, you will need to load the same version of CUDA that was used to compile your application (for example, Tensorflow==2.2.0 was compiled using CUDA 10.1).

To submit a GPU job on O2, use one of the available partitions (gpu, gpu_quad or gpu_requeue) and add a flag like --gres=gpu:1 to request a GPU resource. The example below starts an interactive bash job requesting 1 CPU core and 1 GPU card. This starts a session on one of the GPU nodes, where you can test and debug programs that use a GPU.

login01:~ srun -n 1 --pty -t 1:00:00 -p gpu --gres=gpu:1 bash

srun: job 6900282 queued and waiting for resources
srun: job 6900282 has been allocated resources
compute-g-16-176:~

The following example submits a batch job requesting 2 GPU cards and 4 CPU cores:

login01:~ sbatch gpujob.sh
Submitted batch job 6900310


where gpujob.sh contains


#-----------------------------------------------------------------------------------------
#!/bin/bash
#SBATCH -c 4
#SBATCH -t 6:00:00
#SBATCH -p gpu_quad
#SBATCH --gres=gpu:2

module load gcc/9.2.0
module load cuda/11.7

./deviceQuery  #this is just an example 


#-----------------------------------------------------------------------------------------

It is possible to request a specific type of GPU card by using the --gres flag. For example, --gres=gpu:teslaM40:3 requests 3 Tesla M40 GPU cards.

The GPU type flags currently available are: teslaK80, teslaM40, teslaV100, teslaV100s, rtx6000, rtx8000, a100. However, each partition might only have a subset of those card types, as described in the first section.

It is also possible to request a minimum amount of VRAM on the GPU card allocated to the job, using the vram gres. For example, the flag --gres=gpu:1,vram:15G requests a GPU card that has at least 15G of VRAM. To see the VRAM of each card type in O2, use the Slurm command sinfo -p gpu,gpu_quad,gpu_requeue --Format=nodehost,gres:40
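
To make the --gres syntax described above concrete, here is a small sketch that assembles the flag from its parts. The function build_gres is a hypothetical helper written for illustration, not an O2 or Slurm command. The two calls shown reproduce the flags quoted above; combining a card type with a vram request in one flag is an assumption that this page does not confirm.

```bash
# Hypothetical helper (not an O2 or Slurm command): assemble a --gres flag.
#   $1 = number of cards, $2 = optional card type, $3 = optional minimum VRAM
build_gres() {
    local count="$1" card_type="${2:-}" min_vram="${3:-}"
    local gres="gpu"
    if [ -n "$card_type" ]; then
        gres="${gres}:${card_type}"
    fi
    gres="${gres}:${count}"
    if [ -n "$min_vram" ]; then
        gres="${gres},vram:${min_vram}"
    fi
    echo "--gres=${gres}"
}

build_gres 3 teslaM40     # prints --gres=gpu:teslaM40:3
build_gres 1 "" 15G       # prints --gres=gpu:1,vram:15G
```

The printed flag can then be passed to srun or sbatch together with the partition flag, for example -p gpu.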

How to compile and run Cuda programs

In most cases a CUDA library and compiler module must be loaded in order to compile CUDA programs. To see which CUDA modules are available, use the command module spider cuda, then use the command module load to load the desired version.

login04:~ module spider cuda

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  cuda:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Versions:
        cuda/8.0
        cuda/9.0
        cuda/10.0
        cuda/10.1
        cuda/10.2
        cuda/11.2
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  For detailed information about a specific "cuda" module (including how to load the modules) use the module's full name.
  For example:

     $ module spider cuda/9.0
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------




login01:~ module load gcc/6.2.0 cuda/10.1

Note: you will likely still need to load a CUDA module to run any precompiled CUDA/GPU software.
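
Once a CUDA module is loaded, a program can be compiled with nvcc, the compiler shipped with the CUDA Toolkit. The sketch below writes a minimal test program and compiles it. The file name hello_gpu.cu and its contents are hypothetical and used only for illustration. Run the compiled program inside a GPU job so that a card is actually available.

```bash
# Hypothetical source file; replace with your own CUDA program.
src="hello_gpu.cu"

# Write a minimal CUDA program: one block of 4 threads, each printing its index.
cat > "$src" <<'EOF'
#include <cstdio>

__global__ void hello() {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();
    cudaDeviceSynchronize();  // wait for the kernel to finish before exiting
    return 0;
}
EOF

# Compile and run only if nvcc is on the PATH (i.e. a CUDA module is loaded).
if command -v nvcc >/dev/null 2>&1; then
    nvcc -o hello_gpu "$src" && ./hello_gpu
else
    echo "nvcc not found: load a CUDA module first, e.g. module load gcc/9.2.0 cuda/11.7"
fi
```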

How to run double precision GPU jobs

GPU partitions are now composed of both single and double precision GPU nodes. If you are certain that your GPU job requires a double precision card (e.g. Tesla), you can add the flag --constraint=gpu_doublep to your submission line to ensure that your job is dispatched to a double precision GPU node.

How to log the job's GPU utilization.

The Slurm scheduler cannot capture the percentage of GPU resources actually used by the processes running within an O2 GPU job. To provide some basic information about GPU utilization, we created the script job_gpu_monitor.sh, which users can run within their jobs. It stores utilization data in a file called <jobid>.gpulog, created in the job's default working directory. Each entry, recorded at 5-minute intervals, contains: Timestamp GPU_utilization(%) GPU_VRAM(%) GPU_VRAM.

Timestamp is the time when the usage is measured; GPU_utilization(%) is the percent utilization of the GPU card as reported by nvidia-smi; GPU_VRAM(%) is the percent of the total memory on the card used by the job; GPU_VRAM is the amount of memory used, in MiB.

To collect information about the actual GPU utilization, add the line /n/cluster/bin/job_gpu_monitor.sh & to your sbatch script, right before the job's commands.

For example:

#!/bin/bash
#SBATCH -c 4
#SBATCH -t 6:00:00
#SBATCH -p gpu_quad
#SBATCH --gres=gpu:2

module load gcc/9.2.0
module load cuda/11.7

/n/cluster/bin/job_gpu_monitor.sh &

./deviceQuery  #this is just an example
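
After the job finishes, the <jobid>.gpulog file can be summarized with standard tools. The sketch below is one possible approach and rests on assumptions about the file layout that this page does not spell out: one whitespace-separated record per line, with GPU_utilization(%), GPU_VRAM(%) and GPU_VRAM as the last three fields (so a timestamp containing spaces is tolerated). Lines whose utilization field is not a number, such as a header, are skipped. The function name summarize_gpulog is a hypothetical helper.

```bash
# Hypothetical helper: print sample count, mean and max GPU utilization
# from a .gpulog file. Assumes the last three fields of each record are
# GPU_utilization(%), GPU_VRAM(%) and GPU_VRAM (MiB).
summarize_gpulog() {
    awk '
        BEGIN { max = 0 }
        NF >= 4 {
            util = $(NF-2)
            gsub(/%/, "", util)              # tolerate a trailing percent sign
            if (util ~ /^[0-9]+(\.[0-9]+)?$/) {
                util = util + 0              # force numeric comparison
                sum += util
                n++
                if (util > max) max = util
            }
        }
        END {
            if (n > 0)
                printf "samples=%d mean_util=%.1f max_util=%.1f\n", n, sum / n, max
            else
                print "samples=0"
        }
    ' "$1"
}

# Usage (replace <jobid> with your job ID):
#   summarize_gpulog <jobid>.gpulog
```

A mean utilization close to zero suggests the job is not actually using the GPU it requested.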