...

Code Block
sinfo --Format=nodehost,cpusstate,memory,statelong,gres -p gpu_requeue
HOSTNAMES           CPUS(A/I/O/T)       MEMORY              STATE               GRES
compute-g-16-197    0/20/0/20           257548              idle                gpu:teslaM40:2
compute-gc-17-245   0/48/0/48           385218              idle                gpu:rtx6000:10
compute-gc-17-246   0/48/0/48           385218              idle                gpu:rtx6000:10
compute-gc-17-247   0/48/0/48           385218              idle                gpu:rtx6000:8

How Preemption Works

The labs that purchased these nodes have preemption priority on their own hardware. If the nodes are full and a researcher from one of those labs submits a job, one or more GPU jobs running on the gpu_requeue partition may be killed and re-queued to free resources for the lab's job. That is, the gpu_requeue job is cancelled, just as if you had run the scancel command, and then re-submitted, provided you originally submitted it with the --requeue flag.
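
For reference, a minimal batch script sketch that targets this partition and opts in to automatic re-queueing might look like the following. The script name, wall time, and memory values are illustrative placeholders, not recommendations:

Code Block
#!/bin/bash
#SBATCH --partition=gpu_requeue   # preemptible GPU partition shown above
#SBATCH --gres=gpu:1              # request one GPU
#SBATCH --requeue                 # allow Slurm to re-queue this job if it is preempted
#SBATCH --open-mode=append        # append to the output file after a requeue instead of overwriting it
#SBATCH --time=0-06:00            # illustrative wall-time limit
#SBATCH --mem=8G                  # illustrative memory request

# hypothetical placeholder for the actual workload
./your_training_script.sh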

...