Table of Contents

...

The gpu_quad partition includes 91 GPUs: 47 single precision RTX 8000 cards with 48GB of VRAM, 8 single precision A40 cards with 48GB of VRAM, 24 double precision Tesla V100s cards with 32GB of VRAM, 4 double precision A100 cards with 80GB of VRAM, and 8 A100 MIG cards with 40GB of VRAM.

The gpu_requeue partition includes 44 GPUs: 28 single precision RTX 6000 cards with 24GB of VRAM, 2 single precision Tesla M40 cards, 2 A100 cards with 40GB of VRAM, and 12 A100 cards with 80GB of VRAM.

To list current information about all the nodes and cards available in a specific partition, use the command sinfo --Format=nodehost,available,memory,statelong,gres:40 -p <partition>. For example:

Code Block
languagetext
login02:~ sinfo  --Format=nodehost,available,memory,statelong,gres:40 -p gpu,gpu_quad,gpu_requeue
HOSTNAMES           AVAIL               MEMORY              STATE               GRES
compute-g-16-175    up                  257548              mixed               gpu:teslaM40:4,vram:24G
compute-g-16-176    up                  257548              mixed               gpu:teslaM40:4,vram:12G
compute-g-16-177    up                  257548              mixed               gpu:teslaK80:8,vram:12G
compute-g-16-194    up                  257548              mixed               gpu:teslaK80:8,vram:12G
compute-g-16-254    up                  373760              mixed               gpu:teslaV100:4,vram:16G
compute-g-16-255    up                  373760              mixed               gpu:teslaV100:4,vram:16G
compute-g-17-145    up                  770000              mixed               gpu:rtx8000:10,vram:48G
compute-g-17-146    up                  770000              mixed               gpu:rtx8000:10,vram:48G
compute-g-17-147    up                  383000              mixed               gpu:teslaV100s:4,vram:32G
compute-g-17-148    up                  383000              mixed               gpu:teslaV100s:4,vram:32G
compute-g-17-149    up                  383000              mixed               gpu:teslaV100s:4,vram:32G
compute-g-17-150    up                  383000              mixed               gpu:teslaV100s:4,vram:32G
compute-g-17-151    up                  383000              mixed               gpu:teslaV100s:4,vram:32G
compute-g-17-152    up                  383000              mixed               gpu:teslaV100s:4,vram:32G
compute-g-17-153    up                  383000              mixed               gpu:rtx8000:3,vram:48G
compute-g-17-154    up                  383000              mixed               gpu:rtx8000:3,vram:48G
compute-g-17-155    up                  383000              mixed               gpu:rtx8000:3,vram:48G
compute-g-17-156    up                  383000              mixed               gpu:rtx8000:3,vram:48G
compute-g-17-157    up                  383000              mixed               gpu:rtx8000:3,vram:48G
compute-g-17-158    up                  383000              mixed               gpu:rtx8000:3,vram:48G
compute-g-17-159    up                  383000              mixed               gpu:rtx8000:3,vram:48G
compute-g-17-160    up                  383000              mixed               gpu:rtx8000:3,vram:48G
compute-g-17-161    up                  383000              mixed               gpu:rtx8000:3,vram:48G
compute-g-17-162    up                  500000              mixed               gpu:a40:4,vram:48G
compute-g-17-163    up                  500000              mixed               gpu:a40:4,vram:48G
compute-g-17-164    up                  500000              mixed               gpu:a100:4,vram:80G
compute-g-17-165    up                  500000              mixed               gpu:a100.mig:8,vram:40G
compute-g-16-197    up                  257548              mixed               gpu:teslaM40:2,vram:12G
compute-gc-17-245   up                  383000              mixed               gpu:rtx6000:10,vram:24G
compute-gc-17-246   up                  383000              mixed               gpu:rtx6000:10,vram:24G
compute-gc-17-247   up                  383000              mixed               gpu:rtx6000:8,vram:24G
compute-gc-17-249   up                  1000000             mixed               gpu:a100:2,vram:40G
compute-gc-17-252   up                  1000000             mixed               gpu:a100:4,vram:80G
compute-gc-17-253   up                  1000000             allocated           gpu:a100:4,vram:80G
compute-gc-17-254   up                  1000000             mixed               gpu:a100:4,vram:80G
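
Each entry in the GRES column packs the card model, card count, and VRAM into one string. The fields can be pulled apart with standard tools; the snippet below is an illustrative sketch (not an O2-provided utility):

```shell
# Split a GRES string such as "gpu:rtx8000:10,vram:48G" into its fields
gres="gpu:rtx8000:10,vram:48G"
model=$(echo "$gres" | cut -d, -f1 | cut -d: -f2)   # card model
count=$(echo "$gres" | cut -d, -f1 | cut -d: -f3)   # cards per node
vram=$(echo "$gres"  | cut -d, -f2 | cut -d: -f2)   # VRAM per card
echo "$model $count $vram"   # rtx8000 10 48G
```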

GPU Partition

The gpu partition is open to all O2 users; to run jobs on the gpu partition, use the flag -p gpu.
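
For example, a minimal batch script for the gpu partition might look like the following; the core count, wall time, and the nvidia-smi check are illustrative values, not requirements:

```shell
#!/bin/bash
#SBATCH -p gpu               # run on the gpu partition
#SBATCH --gres=gpu:1         # request one GPU of any type
#SBATCH -c 2
#SBATCH -t 1:00:00

module load gcc/9.2.0 cuda/11.7
nvidia-smi                   # print the card(s) assigned to this job
```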

...

The gpu_quad partition is open to any users working for a PI with a primary or secondary appointment in a pre-clinical department; to run jobs on the gpu_quad partition use the flag -p gpu_quad. If you work at an affiliate institution but are collaborating with an on-Quad PI, please contact Research Computing to gain access.

...

Code Block
login01:~ module load gcc/9.2.0 cuda/11.7


Note that if you are running a precompiled GPU application, for example a pip-installed TensorFlow, you will need to load the same version of CUDA that was used to compile your application (TensorFlow==2.2.0 was compiled against CUDA 10.1).
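
One way to confirm which CUDA release a loaded module provides is to inspect the output of nvcc --version; the snippet below extracts the release number from a sample string that mimics that output format (illustrative, not an O2 tool):

```shell
# Sample of what `nvcc --version` prints with cuda/10.1 loaded
sample="Cuda compilation tools, release 10.1, V10.1.243"
# Extract the "release X.Y" number
release=$(echo "$sample" | sed -n 's/.*release \([0-9][0-9]*\.[0-9]*\).*/\1/p')
echo "$release"   # 10.1
```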

...

Code Block
languagetext
login01:~ sbatch gpujob.sh
Submitted batch job 6900310


where gpujob.sh contains


#-----------------------------------------------------------------------------------------
#!/bin/bash
#SBATCH -c 4
#SBATCH -t 6:00:00
#SBATCH -p gpu_quad
#SBATCH --gres=gpu:2

module load gcc/9.2.0
module load cuda/11.7

./deviceQuery  #this is just an example 


#-----------------------------------------------------------------------------------------
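
Slurm can also target a specific card model by GRES type, where the type name must match the GRES column reported by sinfo. The header below is a sketch along those lines (the chosen card type and count are illustrative):

```shell
#!/bin/bash
#SBATCH -c 4
#SBATCH -t 6:00:00
#SBATCH -p gpu_quad
#SBATCH --gres=gpu:teslaV100s:2   # two V100s cards specifically, by GRES type

module load gcc/9.2.0
module load cuda/11.7

./deviceQuery  # this is just an example
```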

...

Code Block
languagetext
#!/bin/bash
#SBATCH -c 4
#SBATCH -t 6:00:00
#SBATCH -p gpu_quad
#SBATCH --gres=gpu:2

module load gcc/9.2.0
module load cuda/11.7

/n/cluster/bin/job_gpu_monitor.sh &

./deviceQuery  #this is just an example 

...