Job Reason

Meaning

AssocGrpMemLimit

The job cannot run because you are already using the maximum total memory allowed per user (12 TiB). A similar reason appears if you have reached the maximum number of cores a single user may use at one time (1500 cores).

AssocGrpCPURunMinutesLimit

The Lab may have reached its allocatable CPU-hour limit. This can happen when a few users in the Lab have allocated thousands of medium-length jobs (2 to 5 days) or hundreds of longer jobs.

Dependency

The job can't start until a job dependency finishes.

JobHeldAdmin

The job will stay pending, as it has been held by an administrator.

JobHeldUser

The job will stay pending, as it has been held by the user.

NodeDown

A node that the job requires is in the "down" state, meaning that the node cannot currently be used.

Priority

Your job has lower priority than others in the queue. The jobs with higher priority must dispatch first.

QOSMaxJobsPerUserLimit

This job cannot run because you have submitted more jobs of a certain type than are allowed to run at one time (e.g. more than 2 jobs in the interactive partition, or more than 2 jobs in the priority partition). "QOS" stands for "quality of service", the mechanism through which the number of concurrent jobs is limited.

ReqNodeNotAvail

A node that the job requested cannot currently accept jobs. ReqNodeNotAvail is a generic reason; in the simplest case, it means that the node is fully in use and cannot run any more jobs. In other cases, it can indicate a problem with the node.

ReqNodeNotAvail, UnavailableNodes:

This job reason is most commonly seen when there is an upcoming reservation for a maintenance window. Reservations are used to ensure that the required resources are available during a specific time frame. RC uses reservations to reserve all the nodes in the cluster while maintenance is being done, so that no user jobs are affected. A job whose requested time limit would run into the reservation typically cannot start before it.

Resources

The required resources for running this job are not yet available.
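
To see which of the reasons above apply to your own pending jobs, the output of squeue can be grouped by reason. The following is a minimal sketch, not an official RC tool. It assumes squeue is on your PATH and is called with the format string "%i|%r" (job ID and reason). The helper names (parse_pending, hint_for, pending_jobs) and the short hint texts are illustrative paraphrases of the table, and the prefix match for ReqNodeNotAvail is an assumption about how the reason string is printed on your cluster.

```python
import subprocess
from collections import defaultdict

# Short hints paraphrased from the table above (illustrative, not exhaustive).
REASON_HINTS = {
    "AssocGrpMemLimit": "per-user memory limit reached (a similar reason appears for the core limit)",
    "AssocGrpCPURunMinutesLimit": "the Lab may have reached its CPU-hour limit",
    "Dependency": "waiting for a job dependency to finish",
    "JobHeldAdmin": "held by an administrator",
    "JobHeldUser": "held by the user",
    "NodeDown": "a required node is down",
    "Priority": "higher-priority jobs must dispatch first",
    "QOSMaxJobsPerUserLimit": "too many concurrent jobs of this type",
    "ReqNodeNotAvail": "a requested node cannot currently accept jobs",
    "ReqNodeNotAvail, UnavailableNodes:": "likely an upcoming maintenance reservation",
    "Resources": "required resources are not yet available",
}


def parse_pending(squeue_text):
    """Group job IDs by reason from lines of the form 'jobid|reason'."""
    by_reason = defaultdict(list)
    for line in squeue_text.splitlines():
        line = line.strip()
        if not line or "|" not in line:
            continue  # skip blank or malformed lines
        job_id, reason = line.split("|", 1)
        by_reason[reason.strip()].append(job_id.strip())
    return dict(by_reason)


def hint_for(reason):
    """Return a short hint for a reason string, or a fallback if unknown."""
    if reason in REASON_HINTS:
        return REASON_HINTS[reason]
    # Assumption: the reservation variant carries a node list after the name.
    if reason.startswith("ReqNodeNotAvail"):
        if "UnavailableNodes" in reason:
            return REASON_HINTS["ReqNodeNotAvail, UnavailableNodes:"]
        return REASON_HINTS["ReqNodeNotAvail"]
    return "not listed in the table above"


def pending_jobs(user):
    """Run squeue for one user's pending jobs and group them by reason."""
    result = subprocess.run(
        ["squeue", "-h", "-u", user, "-t", "PENDING", "-o", "%i|%r"],
        capture_output=True, text=True, check=True,
    )
    return parse_pending(result.stdout)
```

On a login node, calling pending_jobs with your username returns a dictionary from reason to job IDs, and hint_for can be applied to each key. Separating parse_pending from the squeue call lets the parsing be checked without cluster access.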
