How to connect to your running sbatch job
Slurm allows you to connect, with a standard terminal shell, to the nodes where your jobs are running. From those shell connections you will be able to use the same computational resources allocated to your jobs. You can also run a command directly from the login nodes against the resources allocated to a separate, already running job.
This feature can be useful for:
monitoring your running sbatch jobs in real time (you can use commands like "top" or "ps" to check your processes, or look at the contents of the local /tmp folder on the compute node)
using resources allocated to a running sbatch job that you know might be idle at a given time, for example GPU or CPU computing power that is temporarily unused
getting immediate access to computational resources if you have an urgent need. For example, you might be running a GPU job and still have enough free VRAM (GPU memory) on that card to run a separate process.
The syntax to use is srun --jobid=<jobid_number> followed by your desired command. The shell or the commands started via srun --jobid will be constrained to use only the resources allocated to that job.
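For example, assuming a running job with the hypothetical job ID 12345678, you could run a one-off command against that job's allocation or open an interactive shell on its node(s):

# run a single command using the resources of job 12345678
srun --jobid=12345678 nvidia-smi

# or start an interactive shell inside the job's allocation
srun --jobid=12345678 --pty /bin/bash

From that shell you can run commands like top or ps, or look at the local /tmp folder, exactly as if you were logged into the compute node.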
Below is an example of how you can connect from a login node to a running sbatch job.
First, we submit a standard sbatch job, in this example called my_sbatch_job.sh:
#!/bin/bash
#SBATCH -c 1 # Number of cores requested
#SBATCH -t 4:00:00 # Wall-time
#SBATCH -p priority # Partition
#SBATCH --mem=2G # Memory per node
# Your sbatch job commands here
python3 python_sleep.py
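Putting it together, a minimal sketch of the workflow might look like the following; the job ID 12345678 shown here is hypothetical, and you would use the ID that sbatch or squeue reports for your job:

# submit the batch job; sbatch prints the assigned job ID
sbatch my_sbatch_job.sh

# verify the job is running and note its job ID and node
squeue -u $USER

# connect to the running job with an interactive shell
srun --jobid=12345678 --pty /bin/bash

# inside that shell, monitor your processes on the compute node
top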