Sometimes the problem comes from the program itself and has nothing to do with Slurm. In that case, the error message usually appears in the output and error files you specified with the -o and/or -e parameters. You can also look at the accounting information for your job with sacct. If you cannot figure out why the program exited, contact Research Computing. Depending on the program and the error, we may or may not be able to diagnose or fix the problem.
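As a minimal sketch of the accounting lookup mentioned above, the helper below wraps sacct for a single job. The function name job_summary and the choice of columns are illustrative assumptions, not a site convention; adjust the field list to what you need.

```shell
# Hypothetical helper (the name is illustrative): summarize one job from Slurm accounting.
# Usage: job_summary <jobid>
job_summary() {
    # State and ExitCode show how the job ended; Elapsed, ReqMem, and MaxRSS
    # help spot time-limit or memory problems.
    sacct -j "$1" --format=JobID,JobName,State,ExitCode,Elapsed,ReqMem,MaxRSS
}
```

For example, job_summary 12345 (a placeholder job ID) prints one row for the job and one for each of its steps.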

Jobs fail with the message: Unable to satisfy CPU bind request

With the current version of Slurm (23.02.3), sbatch jobs that contain srun, mpirun, or mpiexec commands fail if they are submitted from within an interactive srun job. This applies to commands such as

sbatch -p partition -t DD-HH:MM --wrap="srun <your command>"

and to equivalent sbatch scripts. In this case, conflicting variables cause the job to fail with an error like:

srun: error: CPU binding outside of job step allocation, allocated CPUs are: 0x001A800.
srun: error: Task launch for StepId=12345.0 failed on node compute-e-16-182: Unable to satisfy cpu bind request
srun: error: Application launch failed: Unable to satisfy cpu bind request
srun: Job step aborted
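One way to catch this situation before submitting is to check whether the current shell is already running inside a Slurm job. The sketch below relies on the SLURM_JOB_ID environment variable, which Slurm sets inside every job; the function name in_slurm_job is a hypothetical helper. Note that the check cannot distinguish an interactive srun session from any other kind of job, so treat it as a hint rather than a definitive test.

```shell
# Hypothetical helper: succeeds when the current shell is inside a Slurm job.
# Slurm sets SLURM_JOB_ID in the environment of every job, including
# interactive srun sessions.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "Warning: this shell is inside Slurm job ${SLURM_JOB_ID}; an sbatch job submitted from here that calls srun, mpirun, or mpiexec may fail." >&2
fi
```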

Slurm Job States

Your job reports different states before, during, and after execution. The most common ones are listed below; this is not an exhaustive list. For more detail, see Job State Codes in the squeue manual or the corresponding section in the sacct manual.

...
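To look up the state of a particular job, a sketch along these lines may help. It asks squeue first, which covers pending and running jobs, and falls back to sacct for jobs that have already finished. The function name job_state is a hypothetical helper, and the sketch assumes a single, non-array job ID.

```shell
# Hypothetical helper: print the current state of one job.
# Usage: job_state <jobid>
job_state() {
    local state
    # squeue covers pending and running jobs; -h drops the header, %T is the state.
    state=$(squeue -h -j "$1" -o "%T" 2>/dev/null) || state=""
    if [ -z "$state" ]; then
        # Fall back to accounting: -X limits output to the job allocation,
        # -n drops the header.
        state=$(sacct -X -n -j "$1" --format=State | head -n 1)
    fi
    # Unquoted expansion trims the column padding that sacct adds.
    echo $state
}
```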
