...

Code Block
login05:~ O2squeue
JOBID     PARTITION    STATE    TIME_LIMIT  TIME     NODELIST(REASON)  ELIGIBLE_TIME        START_TIME           TRES_ALLOC
21801263  interactive  RUNNING  12:00:00    2:09:52  compute-a-16-160  2020-11-09T11:35:49  2020-11-09T11:36:19  cpu=1,mem=2G,node=1,billing=1


The STATE field shows the state of each of your jobs and will normally be either PENDING or RUNNING. When a job is pending, NODELIST(REASON) gives the reason why it is pending. The most common reasons are:

...

Code Block
login05:~ O2sacct
JobID            Partition   State       NodeList          Start                TimeLimit   Elapsed     CPUefficiency_%  AllocTRES                         MaxMemoryUsed
---------------  ----------  ----------  ----------------  -------------------  ----------  ----------  ---------------  --------------------------------  -------------
21769333         priority    COMPLETED   compute-a-16-162  2020-11-09T00:15:11  00:05:00    00:00:43    67.44            billing=1,cpu=1,mem=0.98G,node=1  0.06G
21769333.batch               COMPLETED   compute-a-16-162  2020-11-09T00:15:11              00:00:43    67.44            cpu=1,mem=0.98G,node=1            0.53G
21769333.extern              COMPLETED   compute-a-16-162  2020-11-09T00:15:11              00:00:45    0.00             billing=1,cpu=1,mem=0.98G,node=1  0.00G
21775057         priority    COMPLETED   compute-a-16-168  2020-11-09T01:17:10  00:05:00    00:00:50    58.00            billing=1,cpu=1,mem=0.98G,node=1
21775057.batch               COMPLETED   compute-a-16-168  2020-11-09T01:17:10              00:00:50    58.00            cpu=1,mem=0.98G,node=1            0.50G
21775057.extern              COMPLETED   compute-a-16-168  2020-11-09T01:17:10              00:00:50    0.00             billing=1,cpu=1,mem=0.98G,node=1  0


The field CPUefficiency_% indicates how efficiently the job used the CPU cores allocated to it. If this number is less than 75% and the job is requesting more than one CPU core, then your job is probably requesting more cores than it can use.
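The 75% rule of thumb can be checked with a few lines of Python. This is only a sketch: it assumes the percentage is the CPU time actually used (what Slurm calls TotalCPU) divided by the CPU time reserved (elapsed time multiplied by the number of allocated cores). The exact formula used by O2sacct is not stated on this page, and the function names below are hypothetical helpers, not part of any O2 tool.

```python
def slurm_time_to_seconds(value: str) -> float:
    """Convert a Slurm duration (D-HH:MM:SS, HH:MM:SS or MM:SS.mmm) to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in value.split(":")]
    # Pad on the left so MM:SS and HH:MM:SS are handled the same way.
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency_percent(total_cpu: str, elapsed: str, cores: int) -> float:
    """Assumed definition: used CPU time / (elapsed time * cores) * 100."""
    reserved = slurm_time_to_seconds(elapsed) * cores
    if reserved == 0:
        return 0.0
    return 100.0 * slurm_time_to_seconds(total_cpu) / reserved


def probably_requesting_too_many_cores(efficiency: float, cores: int) -> bool:
    """Apply the rule above: below 75% and more than one core requested."""
    return cores > 1 and efficiency < 75.0


# Hypothetical job: 4 cores reserved for 10 minutes, 12 minutes of CPU time used.
eff = cpu_efficiency_percent("00:12:00", "00:10:00", 4)
print(f"{eff:.2f}%", probably_requesting_too_many_cores(eff, 4))
```

A single-core job is never flagged by this check, because a low percentage on one core usually means the job was waiting (for example on I/O) rather than holding cores it cannot use.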

AllocTRES reports the total amount of resources (cpu, memory, etc.) allocated for the job. 

MaxMemoryUsed reports the maximum amount of memory used by the job. If this value is significantly smaller than the allocated memory reported by AllocTRES, you should reduce the memory requested by your job. (Note: for MPI jobs this is the maximum amount of memory used on each node.)
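As a rough illustration of that comparison, the sketch below extracts the mem= entry from an AllocTRES string and computes what fraction of it MaxMemoryUsed represents. The helper names are hypothetical, the unit handling (K, M, G, T suffixes in powers of 1024) is an assumption about how the values are printed, and what counts as "significantly smaller" is left to the reader.

```python
import re

# Conversion factors to GiB; assumes binary (1024-based) suffixes.
UNIT_TO_GIB = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}


def size_to_gib(text: str) -> float:
    """Parse a size such as '0.98G' or '2000M' into GiB."""
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)([KMGT])?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    if unit is None:
        # Without a suffix the unit is unknown; only a plain zero is accepted.
        if float(number) != 0:
            raise ValueError(f"size without unit: {text!r}")
        return 0.0
    return float(number) * UNIT_TO_GIB[unit]


def allocated_mem_gib(alloc_tres: str) -> float:
    """Return the mem= entry of an AllocTRES string in GiB."""
    for item in alloc_tres.split(","):
        key, _, value = item.partition("=")
        if key == "mem":
            return size_to_gib(value)
    raise ValueError("no mem= entry in AllocTRES")


def memory_use_fraction(alloc_tres: str, max_memory_used: str) -> float:
    """Fraction of the allocated memory that the job actually used."""
    return size_to_gib(max_memory_used) / allocated_mem_gib(alloc_tres)


# Illustrative values in the same format as the table columns.
fraction = memory_use_fraction("cpu=1,mem=0.98G,node=1", "0.53G")
print(f"used {fraction:.0%} of the allocated memory")
```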

...

Possible job states are:

CA = job cancelled

CD = job completed

F = job failed

NF = job failed due to Node failure

TO = job timeout

R = job running

OOM = job out of memory

PD = job pending

PR = job preempted 
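
When post-processing job listings in a script, the abbreviations above can be kept in a small lookup table. The sketch below simply mirrors the list; the dictionary and function names are hypothetical, and any code not listed above is reported as unknown rather than guessed.

```python
# State abbreviations exactly as listed above.
JOB_STATES = {
    "CA": "job cancelled",
    "CD": "job completed",
    "F": "job failed",
    "NF": "job failed due to node failure",
    "TO": "job timeout",
    "R": "job running",
    "OOM": "job out of memory",
    "PD": "job pending",
    "PR": "job preempted",
}


def describe_state(code: str) -> str:
    """Return the description for a state abbreviation, ignoring case and spaces."""
    return JOB_STATES.get(code.strip().upper(), "unknown state")


for code in ("PD", "R", "OOM"):
    print(code, "=", describe_state(code))
```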


-------------------

Examples:

...