...
...
...
...
This page shows you how to run a regular bash script as a pipeline. The runAsPipeline script, accessible through the rcbio/1.0 module, converts an input bash script into a pipeline that submits jobs to the Slurm scheduler for you.
Features of the new pipeline:
Submits each step as a cluster job using sbatch.
Automatically arranges dependencies among jobs.
Sends an email notification when each job fails or succeeds.
If a job fails, all of its downstream jobs are killed automatically.
When re-running the pipeline on the same data folder, the user is asked whether to kill any unfinished jobs.
When re-running the pipeline on the same data folder, the user is asked to confirm whether to re-run any step that already finished successfully.
Please read below for an example.
...
If you need help connecting to O2, please review the Using Slurm Basic wiki page.
From Windows, use the graphical PuTTY program to connect to o2.hms.harvard.edu and make sure the port is set to the default value of 22.
From a Mac Terminal, use the ssh command, inserting your eCommons ID instead of user123:

    ssh user123@o2.hms.harvard.edu

Start interactive job, and create working folder
For example, for user abc123, the working directory will be /n/scratch3/users/a/abc123/testRunBashScriptAsSlurmPipeline:

    srun --pty -p interactive -t 0-12:0:0 --mem 2000MB -n 1 /bin/bash
    mkdir /n/scratch3/users/a/abc123/testRunBashScriptAsSlurmPipeline
    cd /n/scratch3/users/a/abc123/testRunBashScriptAsSlurmPipeline

...
Load the pipeline related modules

    # This will set up the path and environment variables for the pipeline
    module load rcbio/1.0

Build some testing data in the current folder

    echo -e "John Paul\nMike Smith\nNick Will\nJulia Johnson\nTom Jones" > universityA.txt
    cp universityA.txt universityB.txt

Take a look at the example files

    # this command shows the content of file universityA.txt
    cat universityA.txt
    # Below is the content of universityA.txt
    John Paul
    Mike Smith
    Nick Will
    Julia Johnson
    Tom Jones


    # this command shows the content of file universityB.txt
    cat universityB.txt
    # below is the content of universityB.txt
    John Paul
    Mike Smith
    Nick Will
    Julia Johnson
    Tom Jones

The original bash script

    cp /n/app/rcbio/1.0/bin/bash_script_v1.sh .
    cat bash_script_v1.sh

    # Below is the content of bash_script_v1.sh
    #!/bin/sh
    for i in A B; do
    u=university$i.txt
    grep -H John $u >> John.txt; grep -H Mike $u >> Mike.txt
    grep -H Nick $u >> Nick.txt; grep -H Julia $u >> Julia.txt
    done
    cat John.txt Mike.txt Nick.txt Julia.txt > all.txt

...
The modified bash script

    cp /n/app/rcbio/1.0/bin/bash_script_v2.sh .
    cat bash_script_v2.sh

    # Below is the content of bash_script_v2.sh
    #!/bin/sh

    #loopStart,i
    for i in A B; do
    u=university$i.txt

    #@1,0,find1,u,sbatch -p short -n 1 -t 50:0
    grep -H John $u >> John.txt; grep -H Mike $u >> Mike.txt

    #@2,0,find2,u,sbatch -p short -n 1 -t 50:0
    grep -H Nick $u >> Nick.txt; grep -H Julia $u >> Julia.txt

    #loopEnd
    done

    #@3,1.2,merge
    cat John.txt Mike.txt Nick.txt Julia.txt > all.txt

...
Notice that there are a few things added to the script here:
Before the loop starts, #loopStart,i was added (line 7 above). Here the variable i is the looping variable, which will be recognized by the pipeline runner.
Before the loop ends, #loopEnd was added (line 17 above). This will be recognized by the pipeline runner.
Step 1 is denoted by #@1,0,find1,u,sbatch -p short -n 1 -t 50:0 (line 11 above), which means this is step 1, it depends on no other step, it is named find1, and the file $u needs to be copied to the /tmp directory. The sbatch part tells the pipeline runner which sbatch command to use to run this step.
Step 2 is denoted by #@2,0,find2,u,sbatch -p short -n 1 -t 50:0 (line 14 above), which means this is step 2, it depends on no other step, it is named find2, and the file $u needs to be copied to the /tmp directory. The sbatch part tells the pipeline runner which sbatch command to use to run this step.
Step 3 is denoted by #@3,1.2,merge, which means this is step 3, it depends on step 1 and step 2, and it is named merge. Notice that there is no sbatch here, so the pipeline runner will use the default sbatch command (see below).
Notice that the format of the step annotation is #@stepID,dependIDs,stepName,reference,sbatchOptions. reference is optional; it allows the pipeline runner to copy data (a file or folder) to the local /tmp folder on the compute node to speed up the software. sbatchOptions is also optional; when it is missing, the pipeline runner uses the default sbatch command given on the command line (see below).
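To make the annotation format concrete, here is a minimal sketch that splits one annotation line into its five fields. It is only an illustration: parse_step_marker is a hypothetical helper, not part of rcbio, and the real runAsPipeline parser may work differently.

```shell
# Hypothetical helper (not part of rcbio): split one step annotation into its fields.
parse_step_marker() {
    marker=${1#'#@'}    # drop the leading "#@"
    # Split on commas only; everything after the 4th comma stays together in sbatch_opts
    IFS=, read -r step_id depend_ids step_name reference sbatch_opts <<EOF
$marker
EOF
    echo "step=$step_id depends=$depend_ids name=$step_name reference=${reference:-none} sbatch=${sbatch_opts:-default}"
}

parse_step_marker '#@1,0,find1,u,sbatch -p short -n 1 -t 50:0'
parse_step_marker '#@3,1.2,merge'
```

The second call shows the two optional fields left empty, which is the case where the default sbatch command from the command line would apply.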
...
Test run the modified bash script as a pipeline

    runAsPipeline bash_script_v2.sh "sbatch -p short -t 10:0 -n 1" useTmp

...
Note that in the output below only step 2 used -t 50:0, and all other steps used the default -t 10:0. The default walltime limit was set in the runAsPipeline command, and the walltime parameter for step 2 was set in the bash_script_v2.sh script.

    runAsPipeline bash_script_v2.sh "sbatch -p short -t 10:0 -n 1" useTmp

    # Below is the output:
    converting bash_script_v2.sh to flag/slurmPipeLine.201801161424.sh

    find loopStart: #loopStart,i

    find job marker:
    #@1,0,find1,u:
    find job:
    grep -H John $u >> John.txt; grep -H Mike $u >> Mike.txt

    find job marker:
    #@2,0,find2,u,sbatch -p short -n 1 -t 50:0
    sbatch options: sbatch -p short -n 1 -t 50:0
    find job:
    grep -H Nick $u >> Nick.txt; grep -H Julia $u >> Julia.txt
    find loopend: #loopEnd

    find job marker:
    #@3,1.2,merge:
    find job:
    cat John.txt Mike.txt Nick.txt Julia.txt > all.txt
    flag/slurmPipeLine.201801161424.sh bash_script_v2.sh is ready to run. Starting to run ...
    Running flag/slurmPipeLine.201801161424.sh bash_script_v2.sh
    ---------------------------------------------------------
    step: 1, depends on: 0, job name: find1, flag: find1.A reference: .u
    depend on no job
    sbatch -p short -t 10:0 -n 1 --nodes=1 -J 1.0.find1.A -o /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/1.0.find1.A.out -e /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/1.0.find1.A.out /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/1.0.find1.A.sh
    # Submitted batch job 123

    step: 2, depends on: 0, job name: find2, flag: find2.A reference: .u
    depend on no job
    sbatch -p short -n 1 -t 50:0 --nodes=1 -J 2.0.find2.A -o /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/2.0.find2.A.out -e /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/2.0.find2.A.out /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/2.0.find2.A.sh
    # Submitted batch job 123

    step: 1, depends on: 0, job name: find1, flag: find1.B reference: .u
    depend on no job
    sbatch -p short -t 10:0 -n 1 --nodes=1 -J 1.0.find1.B -o /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/1.0.find1.B.out -e /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/1.0.find1.B.out /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/1.0.find1.B.sh
    # Submitted batch job 123

    step: 2, depends on: 0, job name: find2, flag: find2.B reference: .u
    depend on no job
    sbatch -p short -n 1 -t 50:0 --nodes=1 -J 2.0.find2.B -o /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/2.0.find2.B.out -e /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/2.0.find2.B.out /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/2.0.find2.B.sh
    # Submitted batch job 123

    step: 3, depends on: 1.2, job name: merge, flag: merge reference:
    depend on multiple jobs
    sbatch -p short -t 10:0 -n 1 --nodes=1 --dependency=afterok:123:123:123:123 -J 3.1.2.merge -o /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/3.1.2.merge.out -e /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/3.1.2.merge.out /n/scratch2/kmk34/testRunBashScriptAsSlurmPipeline/flag/3.1.2.merge.sh
    # Submitted batch job 123

    all submitted jobs:
    job_id       depend_on              job_flag
    123          null                   1.0.find1.A
    123          null                   2.0.find2.A
    123          null                   1.0.find1.B
    123          null                   2.0.find2.B
    123          ..123.123..123.123     3.1.2.merge
    ---------------------------------------------------------

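The merge step in the output above is submitted with --dependency=afterok:..., which tells Slurm to hold that job until every listed job has finished successfully. As a rough sketch of how such an option could be assembled from a list of job IDs (build_dependency_flag and the job IDs are made up for illustration; this is not the runner's actual code):

```shell
# Hypothetical helper: join the given job IDs into an sbatch dependency option.
build_dependency_flag() {
    [ "$#" -eq 0 ] && return 0    # no upstream jobs: print nothing
    flag="--dependency=afterok"
    for job_id in "$@"; do
        flag="$flag:$job_id"
    done
    echo "$flag"
}

# Made-up job IDs standing in for the IDs reported by "Submitted batch job ..."
build_dependency_flag 101 102 103 104
```

A step that depends on no other step would get no dependency option at all, which matches the "depend on no job" lines in the output.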
...
You can use the command:

    squeue -u $USER

to see the job status (running, pending, etc.). You also get two emails for each step, one at the start of the step and one at the end.
...
You can use the command:

    ls -l flag

This command lists all the logs created by the pipeline runner. *.sh files are the Slurm scripts for each step, *.out files are the output files for each step, *.success files mean the job for that step finished successfully, and *.failed files mean the job for that step failed.
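For a quick per-step overview, the flag files can be summarized with a small loop. This is a sketch under assumptions: summarize_flags is a hypothetical helper, it assumes each step's .success or .failed file shares its base name with the step's .sh file, and it skips the converted pipeline script (slurmPipeLine.*.sh) that also lives in the flag folder. The demo folder and step files are created only for illustration.

```shell
# Hypothetical helper: report the state of each step from the files in a flag folder.
summarize_flags() {
    flag_dir=$1
    for script in "$flag_dir"/*.sh; do
        [ -e "$script" ] || continue              # folder has no .sh files
        step=$(basename "$script" .sh)
        case "$step" in
            slurmPipeLine.*) continue ;;          # converted pipeline script, not a step
        esac
        if [ -e "$flag_dir/$step.success" ]; then
            echo "$step success"
        elif [ -e "$flag_dir/$step.failed" ]; then
            echo "$step failed"
        else
            echo "$step unfinished"
        fi
    done
}

# Demo on a throwaway folder with made-up step files (not real pipeline output)
demo_dir=$(mktemp -d)
touch "$demo_dir/1.0.find1.A.sh" "$demo_dir/1.0.find1.A.success"
touch "$demo_dir/2.0.find2.A.sh" "$demo_dir/2.0.find2.A.failed"
touch "$demo_dir/3.1.2.merge.sh"
summarize_flags "$demo_dir"
rm -rf "$demo_dir"
```

In a real working folder the call would be summarize_flags flag.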
...
You can rerun this command in the same folder:

    runAsPipeline bash_script_v2.sh "sbatch -p short -t 10:0 -n 1" useTmp run

...
If you have a bash script with multiple steps and you wish to run it as a Slurm pipeline, modify your old script and add the notation to mark the start and end of any loops, and the start of any step that you want to submit as an sbatch job. Then you can use runAsPipeline with your modified bash script, as detailed above.
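Before handing an annotated script to runAsPipeline, it can help to list the annotation lines and check that every loop has both markers and every step has an ID. The following is a minimal sketch using grep; list_markers is a hypothetical helper, the demo script is made up, and the pattern assumes the markers start at the beginning of a line.

```shell
# Hypothetical check: list the annotation lines of a script with their line numbers.
list_markers() {
    grep -n -E '^#(@[0-9]+,|loopStart,|loopEnd)' "$1"
}

# Made-up demo script, written to a temporary file for illustration only
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
#!/bin/sh
#loopStart,i
for i in A B; do
u=university$i.txt
#@1,0,find1,u
grep -H John $u >> John.txt
#loopEnd
done
#@2,1,merge
cat John.txt > all.txt
EOF
list_markers "$demo_script"
rm -f "$demo_script"
```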
How does it work
In case you wonder how it works, here is a simple example to explain:
For each step in each loop iteration, the pipeline runner creates a file that looks like this (here it is named flag.sh):

    #!/bin/bash
    srun -n 1 bash -c "{ echo I am running...; hostname; otherCommands; } && touch flag.success"
    sleep 5
    export SLURM_TIME_FORMAT=relative
    echo Job done. Summary:
    sacct --format=JobID,Submit,Start,End,State,Partition,ReqTRES%30,CPUTime,MaxRSS,NodeList%30 --units=M -j $SLURM_JOBID
    sendJobFinishEmail.sh flag
    [ -f flag.success ] && exit 0 || exit 1

Then submit with:

    sbatch -p short -t 10:0 -o flag.out -e flag.out flag.sh

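The key idea in flag.sh is that success is recorded in a file and the script's exit status is derived from that file at the very end. One plausible reason is that the summary and email commands run after the main command, so the last command's status would otherwise hide a failure. The pattern can be tried locally without Slurm; in this sketch run_step is a hypothetical stand-in that leaves out srun, sacct, and the email script.

```shell
# Hypothetical local stand-in for one pipeline step: no srun, sacct, or email.
run_step() {
    flag=$1
    shift
    rm -f "$flag.success"
    # Same idea as flag.sh: create the flag file only if the command succeeds
    { "$@"; } && touch "$flag.success"
    # Any reporting commands could run here without affecting the result
    # The return status depends only on whether the flag file exists
    [ -f "$flag.success" ]
}

work_dir=$(mktemp -d)
run_step "$work_dir/good" true && echo "good: success"
run_step "$work_dir/bad" false || echo "bad: failed"
rm -rf "$work_dir"
```

Because a failed step ends with a non-zero status, a downstream job submitted with an afterok dependency on it would not be started by Slurm.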
...