Get more informative Slurm email notifications and logs through rcbio/1.2

 

This page shows you how to submit a bash script to Slurm. The runSingleJob script, accessible through the rcbio/1.2 module, takes a bash command and an sbatch command, generates a Slurm job script from them, and submits it to the Slurm scheduler as a single job for you.

Features of this way of submitting jobs:

  • An informative email notification is sent when the job fails or succeeds.

  • The log is more detailed and includes memory and CPU usage.

  • The job is re-run automatically if a node fails.

  • When you re-run in the same data folder, you are asked to confirm whether to re-run steps that already finished successfully.

Please read below for an example.

Log on to O2

If you need help connecting to O2, please review the Using Slurm Basic and the How to Login to O2 wiki pages.

From Windows, use the graphical PuTTY program to connect to o2.hms.harvard.edu and make sure the port is set to the default value of 22.

From a Mac Terminal, use the ssh command, inserting your eCommons ID instead of user123:

    ssh user123@o2.hms.harvard.edu

Start an interactive job and create a working folder

For example, for user abc123, the working directory will be /n/scratch3/users/a/abc123/test:

    srun --pty -p interactive -t 0-12:0:0 --mem 2000MB -c 1 /bin/bash
    mkdir /n/scratch3/users/${USER:0:1}/${USER}/test
    cd /n/scratch3/users/${USER:0:1}/${USER}/test

    # This will set up the path and environment variables for the pipeline
    module load rcbio/1.2
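The ${USER:0:1} part of the path above is a bash substring expansion that takes the first character of the user name. A minimal sketch of how the path is built, using the example user abc123 in a local variable called user (a name chosen here for illustration) instead of the real $USER:

```shell
# Illustration only: "user" stands in for $USER so this runs anywhere with bash
user="abc123"

# ${user:0:1} is bash substring expansion: offset 0, length 1 (the first letter)
workdir="/n/scratch3/users/${user:0:1}/${user}/test"

echo "$workdir"   # prints /n/scratch3/users/a/abc123/test
```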

The bash script

    cp /n/app/rcbio/1.2/bin/bashScript.sh .

    # Run cat command to show the content of bashScript.sh
    cat bashScript.sh

    #!/bin/sh

    for i in A B; do
        echo John >> John.txt; echo Mike >> Mike.txt
        echo Nick >> Nick.txt; echo Julia >> Julia.txt
    done

    cat John.txt Mike.txt Nick.txt Julia.txt > all.txt
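To see what the script produces before submitting it, the same loop can be run by hand in a throwaway folder. This is only a local sketch; the temporary folder created with mktemp is an assumption made here so that the test files do not end up in your working directory.

```shell
# Run the example loop in a scratch folder so nothing is left behind
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

for i in A B; do
    echo John >> John.txt; echo Mike >> Mike.txt
    echo Nick >> Nick.txt; echo Julia >> Julia.txt
done
cat John.txt Mike.txt Nick.txt Julia.txt > all.txt

# Each name is appended once per loop pass (A and B), so all.txt has 8 lines
line_count=$(wc -l < all.txt)
echo "all.txt has $((line_count)) lines"
```

Because the script appends with >>, running it again in the same folder adds two more lines to each name file instead of starting over.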

Submit the script to Slurm:

    runSingleJob "bash bashScript.sh" "sbatch -p short -t 10:0 -c 1"

    # Below is the output
    Running flag/slurmPipeLine.202103311551.run.sh
    module list:

    Currently Loaded Modules:
      1) gcc/6.2.0   2) python/2.7.12   3) rcbio/1.2

    depend on no job
    sbatch -p short -t 10 -c 1 --mail-type=FAIL --nodes=1 -J Job_bash_bashScript.sh -o /home/ld32/rcbio/flag/Job_bash_bashScript.sh.out -e /home/ld32/rcbio/flag/Job_bash_bashScript.sh.out /home/ld32/rcbio/flag/Job_bash_bashScript.sh.sh
    # Submitted batch job 43976
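The last line of the output carries the Slurm job ID. As a small sketch, the ID can be pulled out of that line and passed to sacct, the standard Slurm accounting command, to look up the state, run time, and peak memory of the job. The variable names are chosen here for illustration, the job ID is the one from the sample output above, and the sacct call is skipped when the command is not available.

```shell
# The confirmation line from the sample output above
line="Submitted batch job 43976"

# Strip everything up to the last space, leaving only the job ID
jobid=${line##* }
echo "job ID: $jobid"

# Query Slurm accounting only where sacct exists (for example, on the cluster)
if command -v sacct >/dev/null 2>&1; then
    sacct -j "$jobid" --format=JobID,JobName,State,Elapsed,MaxRSS \
        || echo "sacct could not look up job $jobid"
else
    echo "sacct not found; run this part on the cluster"
fi
```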

Monitoring the jobs

To see the job status (running, pending, etc.), use the command:

    squeue -u $USER

You also get two emails for each step, one when the step starts and one when it ends.
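With squeue's default output format, job names are shortened to eight characters, so names such as Job_bash_bashScript.sh are cut off. A possible variation, using squeue's standard -o format option to widen the name column, is sketched below. The column widths and variable names are arbitrary choices made here, and the call is guarded so the snippet does nothing on a machine without Slurm.

```shell
# Fall back to id -un if USER is not set in the environment
me=${USER:-$(id -un)}

if command -v squeue >/dev/null 2>&1; then
    # %i job ID, %P partition, %j job name, %T state, %M time used, %R reason or nodes
    squeue -u "$me" -o "%.10i %.10P %.40j %.10T %.10M %R" || echo "squeue failed"
    squeue_checked=yes
else
    echo "squeue not found; run this on the cluster"
    squeue_checked=no
fi
```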

Check job log

To list the logs, use the command:

    ls -l flag

This command lists all the logs created by the pipeline runner. The *.sh files are the Slurm scripts for each step and the *.out files are the output files for each step. A *.success file means the step finished successfully, and a *.failed file means the step failed.
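With many steps, a quick pass over the marker files can be easier to read than the full listing. The function below is a small sketch that relies only on the *.success and *.failed naming described above; the function name summarize_flags is made up for this example and is not part of rcbio.

```shell
# Print one line per step marker found in a flag folder (default: ./flag)
summarize_flags() {
    dir=${1:-flag}
    for f in "$dir"/*.success "$dir"/*.failed; do
        [ -e "$f" ] || continue          # skip patterns that matched nothing
        case "$f" in
            *.success) echo "OK      ${f##*/}" ;;
            *.failed)  echo "FAILED  ${f##*/}" ;;
        esac
    done
}

# Only report when a flag folder exists in the current directory
if [ -d flag ]; then
    summarize_flags flag
fi
```

Run it from the working folder, or pass another folder as the first argument, for example summarize_flags /path/to/run/flag.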


Re-run the job

You can re-run the same command in the same folder:

    runSingleJob "bash bashScript.sh" "sbatch -p short -t 10:0 -c 1"

This command checks whether the earlier run has finished. If it has not, you are asked whether to kill the running jobs. You are then asked whether to re-run the steps that finished successfully. Type 'y' to re-run them, or press the Enter key alone to skip them.

To run your own script as a Slurm pipeline

In the example above, we ran a bash script. You can run any command you like:

    runSingleJob "python pythonScript.py" "sbatch -p short -t 10:0 -c 1"

    # or

    runSingleJob "module load bowtie/1.2.2; bowtie -x /n/groups/shared_databases/bowtie_indexes/hg19 -p 2 -1 read1.fq -2 read2.fq --sam > out.sam" "sbatch -p short -t 1:0:0 -c 2 --mem 8G"
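For longer command lines, one option is to put the commands in their own script and submit that script the same way bashScript.sh was submitted earlier on this page. The sketch below reuses the bowtie example; the file name align.sh is a placeholder chosen here, and the submit step only runs where runSingleJob is available (after module load rcbio/1.2).

```shell
# Write the commands to a script; the quoted 'EOF' keeps the text literal
cat > align.sh <<'EOF'
#!/bin/bash
module load bowtie/1.2.2
bowtie -x /n/groups/shared_databases/bowtie_indexes/hg19 -p 2 \
    -1 read1.fq -2 read2.fq --sam > out.sam
EOF

# Check the script for syntax errors without running it
bash -n align.sh && echo "align.sh parses cleanly"

# Submit only where runSingleJob is on the PATH
if command -v runSingleJob >/dev/null 2>&1; then
    runSingleJob "bash align.sh" "sbatch -p short -t 1:0:0 -c 2 --mem 8G"
else
    echo "runSingleJob not found; load rcbio/1.2 on the cluster first"
fi
```

Keeping the commands in a file also avoids nested quoting problems when the command line itself contains quotes.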

Let us know if you have any questions. Please include your working folder and commands used in your email. Any comments and suggestions are welcome!