...
```
login05:~ O2squeue
JOBID     PARTITION    STATE    TIME_LIMIT  TIME     NODELIST(REASON)  ELIGIBLE_TIME        START_TIME           TRES_ALLOC
21801263  interactive  RUNNING  12:00:00    2:09:52  compute-a-16-160  2020-11-09T11:35:49  2020-11-09T11:36:19  cpu=1,mem=2G,node=1,billing=1
```
The STATE field reports the state of each of your jobs and will normally be either PENDING or RUNNING. When a job is pending, NODELIST(REASON) gives the reason why it is pending. The most common reasons are:
...
```
login05:~ O2sacct
JobID            Partition  State      NodeList          Start                TimeLimit  Elapsed   CPUefficiency_%  AllocTRES                         MaxMemoryUsed
---------------  ---------  ---------  ----------------  -------------------  ---------  --------  ---------------  --------------------------------  -------------
21769333         priority   COMPLETED  compute-a-16-162  2020-11-09T00:15:11  00:05:00   00:00:43  67.44            billing=1,cpu=1,mem=0.98G,node=1
21769333.batch              COMPLETED  compute-a-16-162  2020-11-09T00:15:11             00:00:43  67.44            cpu=1,mem=0.98G,node=1            0.53G
21769333.extern             COMPLETED  compute-a-16-162  2020-11-09T00:15:11             00:00:45  0.00             billing=1,cpu=1,mem=0.98G,node=1  0
21775057         priority   COMPLETED  compute-a-16-168  2020-11-09T01:17:10  00:05:00   00:00:50  58.00            billing=1,cpu=1,mem=0.98G,node=1
21775057.batch              COMPLETED  compute-a-16-168  2020-11-09T01:17:10             00:00:50  58.00            cpu=1,mem=0.98G,node=1            0.50G
21775057.extern             COMPLETED  compute-a-16-168  2020-11-09T01:17:10             00:00:50  0.00             billing=1,cpu=1,mem=0.98G,node=1  0
```
The field CPUefficiency_% indicates how efficiently the job used the CPU cores allocated to it. If this number is below 75% and the job requested more than one CPU core, then your job is probably requesting more cores than it can use.
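As a rough illustration of what an efficiency percentage of this kind measures, the sketch below assumes it is the ratio of the CPU time actually consumed (the standard Slurm `TotalCPU` field) to the CPU time reserved (elapsed time multiplied by the number of allocated cores, which Slurm reports as `CPUTime`). The function names and the sample values are hypothetical, and this is not necessarily how O2sacct computes the column.

```python
def parse_duration(text):
    """Convert a Slurm-style duration ([D-]HH:MM:SS or MM:SS[.fff]) to seconds."""
    days = 0
    if "-" in text:
        # A leading "D-" prefix counts whole days, e.g. "1-00:00:00".
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in text.split(":")]
    # Pad missing leading fields so that "MM:SS" is read as 0 hours.
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(elapsed, cores, cpu_used):
    """Percentage of the reserved CPU time that was actually used (assumed definition)."""
    reserved = parse_duration(elapsed) * cores
    if reserved == 0:
        return 0.0
    return 100.0 * parse_duration(cpu_used) / reserved


# Hypothetical job: 4 cores held for one hour, but only one core-hour of work done.
print(f"{cpu_efficiency('01:00:00', 4, '01:00:00'):.2f}")  # 25.00
```

Under this assumed definition, a single-threaded program running on four cores cannot exceed 25%, which is the situation the 75% guideline is meant to catch.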
AllocTRES reports the total amount of resources (cpu, memory, etc.) allocated for the job.
MaxMemoryUsed reports the maximum amount of memory used by the job. If this value is significantly smaller than the allocated memory reported by AllocTRES, you should reduce the memory requested by your job. (Note: for MPI jobs this is the maximum amount of memory used on each node.)
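The comparison between allocated and used memory can be sketched as follows. This is a minimal example that assumes the values look like the ones in the output above (`mem=0.98G` inside AllocTRES, `0.53G` for the memory used). The helper names, the treatment of unit-less numbers as bytes, and the 50% threshold standing in for "significantly smaller" are all assumptions made for illustration.

```python
# Conversion factors from a unit suffix to GiB.
UNITS = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}


def to_gib(size):
    """Convert a size such as '0.53G' or '2000M' to GiB."""
    size = size.strip()
    suffix = size[-1].upper()
    if suffix in UNITS:
        return float(size[:-1]) * UNITS[suffix]
    # Assumption: a number without a suffix is a byte count.
    return float(size) / 1024**3


def allocated_mem_gib(alloc_tres):
    """Extract the 'mem' entry from an AllocTRES string such as 'cpu=1,mem=2G,node=1'."""
    fields = dict(item.split("=", 1) for item in alloc_tres.split(","))
    return to_gib(fields["mem"])


def memory_overrequested(alloc_tres, max_used, threshold=0.5):
    """True when peak usage is below `threshold` times the allocation (threshold is an assumption)."""
    return to_gib(max_used) < threshold * allocated_mem_gib(alloc_tres)


print(memory_overrequested("cpu=1,mem=8G,node=1", "0.5G"))
print(memory_overrequested("cpu=1,mem=0.98G,node=1", "0.53G"))
```

Rows with an empty memory column, such as the job-level row, would need to be skipped before calling these helpers.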
...
Possible job states are:
CA = job cancelled
CD = job completed
F = job failed
NF = job failed due to Node failure
TO = job timeout
R = job running
OOM = job out of memory
PD = job pending
PR = job preempted
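
When post-processing accounting output in a script, the abbreviations above can be kept in a small lookup table. The sketch below is only an illustration: the dictionary mirrors the list above, while the function names are hypothetical.

```python
from collections import Counter

# Abbreviations and descriptions copied from the list of possible job states.
JOB_STATES = {
    "CA": "job cancelled",
    "CD": "job completed",
    "F": "job failed",
    "NF": "job failed due to node failure",
    "TO": "job timeout",
    "R": "job running",
    "OOM": "job out of memory",
    "PD": "job pending",
    "PR": "job preempted",
}


def describe_state(code):
    """Return the description for a state abbreviation, ignoring case and whitespace."""
    return JOB_STATES.get(code.strip().upper(), "unknown state")


def summarize(codes):
    """Count how many jobs fall into each described state."""
    return Counter(describe_state(code) for code in codes)


print(summarize(["R", "R", "PD", "CD"]))
```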
-------------------
Examples:
...