...

The gpu_quad partition is open to any user working for a PI with a primary or secondary appointment in a pre-clinical department. To run jobs on the gpu_quad partition, use the flag -p gpu_quad. If you work at an affiliate institution but are collaborating with an on-Quad PI, please contact Research Computing to gain access.

...

Note
Starting July 1st, 2021, the gpu_requeue partition will be available only to users working for a PI with a primary or secondary appointment in a pre-clinical department.

For detailed information about the gpu_requeue partition, see O2 GPU Re-Queue Partition.

...

GPU partitions now include both single and double precision GPU nodes. If you are certain that your GPU job requires a double precision card (i.e. Tesla), add the flag --constraint=gpu_doublep to your submission line to ensure that your job is dispatched to a double precision GPU node.
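
As an illustration, a job script requesting a double precision node might look like the sketch below. Only -p gpu_quad and --constraint=gpu_doublep come from the text above; the core count, time limit, and number of GPUs are placeholder values, and the nvidia-smi call is just one possible way to check which card the job received.

```shell
#!/bin/bash
#SBATCH -c 1                        # placeholder: number of CPU cores
#SBATCH -t 1:00:00                  # placeholder: wall-time limit
#SBATCH -p gpu_quad
#SBATCH --gres=gpu:1                # placeholder: number of GPUs
#SBATCH --constraint=gpu_doublep    # request a double precision GPU node

# Print the model of the GPU(s) assigned to the job.
nvidia-smi --query-gpu=name --format=csv,noheader || echo "nvidia-smi not found"
```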

How to log the job's GPU utilization.

The Slurm scheduler cannot capture the percentage of GPU resources actually used by the processes running within an O2 GPU job. To provide some basic information about GPU utilization, we created the script job_gpu_monitor.sh, which users can run within their jobs. The script stores utilization data in a file called <jobid>.gpulog, which is created in the job's default working directory. The file contains the following entries, recorded at 5-minute intervals: Timestamp, GPU_utilization(%), GPU_VRAM(%), GPU_VRAM.

Timestamp is the time at which the usage was measured; GPU_utilization(%) is the percentage of GPU card utilization as reported by nvidia-smi; GPU_VRAM(%) is the percentage of the card's total memory used by the job; and GPU_VRAM is the amount of memory used, in MiB.

To collect information about the actual GPU utilization, add the line /n/cluster/bin/job_gpu_monitor.sh & to your sbatch script, right before the job's commands.

For example:

#!/bin/bash
#SBATCH -c 4
#SBATCH -t 6:00:00
#SBATCH -p gpu_quad
#SBATCH --gres=gpu:2

module load gcc/6.2.0
module load cuda/9.0

/n/cluster/bin/job_gpu_monitor.sh &

./deviceQuery  # example GPU workload; replace with your own commands
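
Once the job has finished, the resulting <jobid>.gpulog file can be summarized with a few lines of awk. The sketch below is not part of job_gpu_monitor.sh: summarize_gpulog is a hypothetical helper, and it assumes that each data line ends with three whitespace-separated numeric fields (GPU_utilization(%), GPU_VRAM(%), GPU_VRAM in MiB) with no unit suffixes, preceded by a timestamp that may itself contain spaces. Adjust the parsing if the actual log layout differs.

```shell
#!/bin/bash
# Hypothetical helper: summarize a <jobid>.gpulog file.
# Assumed layout (not confirmed by the monitor script itself):
#   <timestamp ...> <GPU_utilization %> <GPU_VRAM %> <GPU_VRAM MiB>
# Usage: summarize_gpulog 12345678.gpulog
summarize_gpulog() {
    awk '
        # True if s is a plain non-negative number such as 42 or 42.5.
        function isnum(s) { return s ~ /^[0-9]+(\.[0-9]+)?$/ }

        # Only use lines whose last three fields are numeric;
        # this skips any header or malformed lines.
        NF >= 4 && isnum($(NF-2)) && isnum($(NF-1)) && isnum($NF) {
            util = $(NF-2) + 0
            vram = $NF + 0
            n++
            util_sum += util
            if (util > util_max) util_max = util
            if (vram > vram_max) vram_max = vram
        }

        END {
            if (n == 0) { print "no samples found"; exit 1 }
            printf "samples=%d mean_util=%.1f%% peak_util=%.1f%% peak_vram=%dMiB\n",
                   n, util_sum / n, util_max, vram_max
        }
    ' "$1"
}
```

A low mean utilization over a long job may indicate that the job is not making effective use of the GPUs it requested.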
