...
Slurm allows you to connect, with a standard terminal shell, to the nodes where your jobs are running. From those shell connections you can use the same computational resources allocated to your jobs. You can also run a command directly from the login nodes against resources allocated to a separate, already running job.
...
Monitoring your running sbatch jobs in real time (you can use commands like "top" or "ps" to check your processes, or look at the contents of local /tmp folders on the compute node).
Using resources allocated to a running sbatch job that you know might be idle at a given time, for example GPU or CPU computing power that is temporarily unused.
Getting immediate access to computational resources when you have an urgent need. For example, you might be running a GPU job and still have enough free VRAM (GPU memory) on that card to run a separate process.
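As a rough illustration of the monitoring use case, the sketch below wraps a one-off command in a job step of an already running job. The function name `run_in_job` is hypothetical and not part of Slurm; it relies only on the `srun --jobid` option. Whether the step starts immediately depends on how the cluster is configured and on what the job is currently doing.

```shell
# Hypothetical helper (not part of Slurm): run one command inside an
# already running job, from a login node.
# Usage: run_in_job <jobid> <command> [args...]
run_in_job() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_in_job <jobid> <command> [args...]" >&2
        return 2
    fi
    local jobid="$1"
    shift
    # srun --jobid starts a job step under the existing allocation,
    # so the command sees the resources already granted to that job.
    srun --jobid="$jobid" "$@"
}
```

For example, `run_in_job 10610731 ps -u "$USER"` would list your processes on the compute node where job 10610731 is running.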
...
Once the sbatch job starts running, it is possible to start a shell as a Slurm job step using the same resources allocated for the sbatch job (10610731 in this example).
...
Everything executed within that srun shell shares the resources already allocated for the sbatch job.
It is possible to connect multiple times to the same sbatch job, as long as the sbatch job is in the “RUNNING” state. However, it is not possible to have concurrent srun --jobid connections to the same running sbatch job.
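Because a connection only works while the job is in the RUNNING state, it can help to check the state before attaching. The following is a minimal sketch under a few assumptions: the helper name `attach_to_job` is hypothetical, the job is a single (non-array) job so `squeue` prints one state, and `bash` is the desired shell.

```shell
# Hypothetical helper (not part of Slurm): open an interactive shell
# inside a running job, refusing if the job is not RUNNING.
# Usage: attach_to_job <jobid>
attach_to_job() {
    local jobid="$1"
    local state
    # -h drops the header line; -o %T prints the job state
    # (for example RUNNING or PENDING).
    state="$(squeue -h -j "$jobid" -o %T)"
    if [ "$state" != "RUNNING" ]; then
        echo "job $jobid is not RUNNING (state: ${state:-unknown})" >&2
        return 1
    fi
    # --pty attaches a pseudo-terminal so the shell is interactive.
    srun --jobid="$jobid" --pty bash
}
```

Calling `attach_to_job 10610731` opens the shell only if the job is running; otherwise it prints the current state and returns a non-zero status.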
...