Errors in submitting jobs
...
To prevent those errors, you can remove the inner srun command or submit the sbatch+srun jobs from a login node instead of from an interactive job.
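If you are not sure whether your current shell is inside an interactive job, a quick check is to look for Slurm environment variables; this is a minimal sketch, assuming a Bash shell:
env | grep SLURM_MEM    # variables such as SLURM_MEM_PER_CPU are set inside an interactive job started with a memory request
echo $SLURM_JOB_ID      # empty on a login node, set inside any job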
Jobs fail with the message: srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.
To reproduce the error:
srun --pty -p interactive --mem-per-cpu 1G -t 0-01:00 /bin/bash
sbatch -p short --mem 2G -t 10 --wrap="srun hostname"
tail slurm-jobID.out
...
srun: fatal: SLURM_MEM_PER_CPU, SLURM_MEM_PER_GPU, and SLURM_MEM_PER_NODE are mutually exclusive.
This error occurs when a job of the form sbatch -p partition -t DD-HH:MM --wrap="srun <your command>" is submitted from within an interactive job. The first srun command (the one that started the interactive session) sets the environment variable SLURM_MEM_PER_CPU, which is inherited by the batch job and conflicts with the memory request seen by the srun command inside it. To avoid the conflict, unset the memory variables before submitting:
srun --pty -p interactive --mem-per-cpu 1G -t 0-01:00 /bin/bash
unset SLURM_MEM_PER_CPU SLURM_MEM_PER_GPU SLURM_MEM_PER_NODE
sbatch -p short --mem 2G -t 10 --wrap="srun hostname"
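To confirm the workaround, you can verify that the variables are gone and that the new job finishes cleanly; a short check, with <jobID> standing in for the ID printed by sbatch:
env | grep SLURM_MEM_PER                          # prints nothing once the variables are unset
sacct -j <jobID> --format=JobID,State,ExitCode    # the job should reach COMPLETED with ExitCode 0:0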
Slurm Job States
Your job will report different states before, during, and after execution. The most common ones are listed below, but this is not an exhaustive list; see the Job State Codes section of the squeue manual or the corresponding section of the sacct manual for more detail.
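To see these states in practice, you can query squeue for jobs that are still pending or running and sacct for jobs that have finished; a short sketch, assuming your own username:
squeue -u $USER                                        # the ST column shows the state of pending and running jobs
sacct -u $USER -S today --format=JobID,JobName,State   # states of today's jobs, including finished ones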
...