Table of Contents
...
The gpu_quad partition includes 91 GPUs: 47 single precision RTX 8000 cards with 48GB of VRAM, 8 single precision A40 cards with 48GB of VRAM, 24 double precision Tesla V100s cards with 32GB of VRAM, 4 double precision A100 cards with 80GB of VRAM, and 8 A100 MIG cards with 40GB of VRAM.
The gpu_requeue partition includes 44 GPUs: 28 single precision RTX 6000 cards with 24GB of VRAM, 2 double precision Tesla M40 cards, 2 A100 cards with 40GB of VRAM, and 12 A100 cards with 80GB of VRAM.
To list current information about all the nodes and cards available for a specific partition, use the command sinfo --Format=nodehost,available,memory,statelong,gres:40 -p <partition>, for example:
Code Block
login02:~ sinfo --Format=nodehost,available,memory,statelong,gres:40 -p gpu,gpu_quad,gpu_requeue
HOSTNAMES           AVAIL  MEMORY   STATE      GRES
compute-g-16-175    up     257548   mixed      gpu:teslaM40:4,vram:24G
compute-g-16-176    up     257548   mixed      gpu:teslaM40:4,vram:12G
compute-g-16-177    up     257548   mixed      gpu:teslaK80:8,vram:12G
compute-g-16-194    up     257548   mixed      gpu:teslaK80:8,vram:12G
compute-g-16-254    up     373760   mixed      gpu:teslaV100:4,vram:16G
compute-g-16-255    up     373760   mixed      gpu:teslaV100:4,vram:16G
compute-g-17-145    up     770000   mixed      gpu:rtx8000:10,vram:48G
compute-g-17-146    up     770000   mixed      gpu:rtx8000:10,vram:48G
compute-g-17-147    up     383000   mixed      gpu:teslaV100s:4,vram:32G
compute-g-17-148    up     383000   mixed      gpu:teslaV100s:4,vram:32G
compute-g-17-149    up     383000   mixed      gpu:teslaV100s:4,vram:32G
compute-g-17-150    up     383000   mixed      gpu:teslaV100s:4,vram:32G
compute-g-17-151    up     383000   mixed      gpu:teslaV100s:4,vram:32G
compute-g-17-152    up     383000   mixed      gpu:teslaV100s:4,vram:32G
compute-g-17-153    up     383000   mixed      gpu:rtx8000:3,vram:48G
compute-g-17-154    up     383000   mixed      gpu:rtx8000:3,vram:48G
compute-g-17-155    up     383000   mixed      gpu:rtx8000:3,vram:48G
compute-g-17-156    up     383000   mixed      gpu:rtx8000:3,vram:48G
compute-g-17-157    up     383000   mixed      gpu:rtx8000:3,vram:48G
compute-g-17-158    up     383000   mixed      gpu:rtx8000:3,vram:48G
compute-g-17-159    up     383000   mixed      gpu:rtx8000:3,vram:48G
compute-g-17-160    up     383000   mixed      gpu:rtx8000:3,vram:48G
compute-g-17-161    up     383000   mixed      gpu:rtx8000:3,vram:48G
compute-g-17-162    up     500000   mixed      gpu:a40:4,vram:48G
compute-g-17-163    up     500000   mixed      gpu:a40:4,vram:48G
compute-g-17-164    up     500000   mixed      gpu:a100:4,vram:80G
compute-g-17-165    up     500000   mixed      gpu:a100.mig:8,vram:40G
compute-g-16-197    up     257548   mixed      gpu:teslaM40:2,vram:12G
compute-gc-17-245   up     383000   mixed      gpu:rtx6000:10,vram:24G
compute-gc-17-246   up     383000   mixed      gpu:rtx6000:10,vram:24G
compute-gc-17-247   up     383000   mixed      gpu:rtx6000:8,vram:24G
compute-gc-17-249   up     1000000  mixed      gpu:a100:2,vram:40G
compute-gc-17-252   up     1000000  mixed      gpu:a100:4,vram:80G
compute-gc-17-253   up     1000000  allocated  gpu:a100:4,vram:80G
compute-gc-17-254   up     1000000  mixed      gpu:a100:4,vram:80G
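The GPU model names shown in the GRES column can also be used to request a specific card type in a job submission. The lines below are a minimal sketch assuming standard Slurm GRES type syntax; substitute whichever model name and count from the table above fits your job.

Code Block
# Request one RTX 8000 card on gpu_quad; the type name must match the GRES
# column of the table above (e.g. rtx8000, teslaV100s, a40, a100)
#SBATCH -p gpu_quad
#SBATCH --gres=gpu:rtx8000:1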
GPU Partition
The gpu partition is open to all O2 users; to run jobs on the gpu partition, use the flag -p gpu.
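For example, an interactive session with one GPU can be requested as shown below (a minimal sketch; the run time and shell are placeholders to adjust for your workload).

Code Block
# Request a 1-hour interactive bash session with a single GPU on the gpu partition
login01:~ srun --pty -t 1:00:00 -p gpu --gres=gpu:1 /bin/bash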
...
The gpu_quad partition is open to any users working for a PI with a primary or secondary appointment in a pre-clinical department; to run jobs on the gpu_quad partition use the flag -p gpu_quad. If you work at an affiliate institution but are collaborating with an on-Quad PI, please contact Research Computing to gain access.
...
Code Block
login01:~ module load gcc/9.2.0 cuda/11.7
Note that if you are running a precompiled GPU application, for example a pip-installed TensorFlow, you will need to load the same version of CUDA that was used to compile your application (tensorflow==2.2.0 was compiled using CUDA 10.1).
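As an illustration, a pip-installed tensorflow==2.2.0 would therefore be paired with the CUDA 10.1 module rather than the newest release. The sketch below is a minimal example; the gcc/CUDA module versions and the virtual environment path are placeholders rather than O2-specific instructions.

Code Block
# Inside a GPU job: load the CUDA release matching the precompiled wheel
# (tensorflow==2.2.0 was built against CUDA 10.1); module versions are illustrative
module load gcc/6.2.0 cuda/10.1
# Activate the (hypothetical) virtual environment where TensorFlow was pip-installed
source ~/tensorflow-env/bin/activate
# Confirm that TensorFlow can see the allocated GPU(s)
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"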
...
Code Block
login01:~ sbatch gpujob.sh
Submitted batch job 6900310

where gpujob.sh contains

#-----------------------------------------------------------------------------------------
#!/bin/bash
#SBATCH -c 4
#SBATCH -t 6:00:00
#SBATCH -p gpu_quad
#SBATCH --gres=gpu:2

module load gcc/9.2.0
module load cuda/11.7

./deviceQuery  #this is just an example
#-----------------------------------------------------------------------------------------
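Once the job is running, you can verify which cards were assigned from within the job script itself; the commands below are standard Slurm/NVIDIA tools rather than anything O2-specific.

Code Block
# Slurm exports the GPUs allocated to the job in CUDA_VISIBLE_DEVICES
echo $CUDA_VISIBLE_DEVICES
# nvidia-smi lists the visible cards and their current utilization
nvidia-smi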
...
Code Block
#!/bin/bash
#SBATCH -c 4
#SBATCH -t 6:00:00
#SBATCH -p gpu_quad
#SBATCH --gres=gpu:2

module load gcc/9.2.0
module load cuda/11.7

/n/cluster/bin/job_gpu_monitor.sh &

./deviceQuery  #this is just an example
...