Here is an example. Note that '-n 1' is required on the srun command: it ensures that only one copy of the commands runs.
Create a file myJob.sh with the following content (you can replace "echo firstCommand; echo secondCommand" with your own commands):
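```bash
#!/bin/bash
# Remove any marker left over from a previous run so a stale file cannot report success.
rm -f myJob.success
# Run the user commands as a single task; the marker is created only if the command group succeeds.
srun -n 1 -t "$SRUNTIME" --mem "$SRUNMEM" bash -c "{ echo I am running on:; hostname; echo firstCommand; echo secondCommand; } && touch myJob.success"
sleep 5 # give Slurm time to record the job status in its database
echo Job done. Summary:
sacct --format=JobID,Submit,Start,End,State,Partition,ReqTRES%30,CPUTime,MaxRSS,NodeList%30 --units=M -j "$SLURM_JOBID"
# Run the email script with bash because it uses [[ ]].
bash sendJobFinishEmail.sh myJob
[ -f myJob.success ] && exit 0 || exit 1
```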
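One detail of the success marker is worth checking before you substitute your own commands. In bash, a group of commands separated by ';' returns the exit status of its last command only, so an earlier failure does not stop the marker from being written. The following is a minimal sketch of that behavior, not part of the job script itself; the scratch directory and the marker names semicolon.success and chained.success are made up for the demonstration.

```bash
#!/bin/bash
# Sketch: how the exit status of a command group decides whether the marker is written.
workdir=$(mktemp -d)   # scratch directory (hypothetical), removed at the end

# ';' separators: only the last command's status counts, so the marker
# is written even though 'false' failed in the middle of the group.
bash -c "{ false; echo secondCommand; } && touch $workdir/semicolon.success"

# '&&' separators: the chain stops at the first failure, so no marker is written.
bash -c "{ false && echo secondCommand; } && touch $workdir/chained.success"

[ -f "$workdir/semicolon.success" ] && semicolon_marker=yes || semicolon_marker=no
[ -f "$workdir/chained.success" ] && chained_marker=yes || chained_marker=no
echo "semicolon: $semicolon_marker, chained: $chained_marker"

rm -rf "$workdir"
```

If every command must succeed for the job to count as successful, joining your commands with '&&' instead of ';' is one way to get that behavior.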
Create a file sendJobFinishEmail.sh with:
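```bash
#!/bin/bash
# Usage: bash sendJobFinishEmail.sh <flag>
# Expects <flag>.sh and <flag>.out, plus <flag>.success if the job succeeded.
to=$(cat ~/.forward)      # recipient address
flag=$1
minimumsize=9000          # output of at least this many bytes is not included in full
actualsize=$(wc -c < "$flag.out")

# The subject line depends on whether the success marker exists.
if [ -f "$flag.success" ]; then
    s="Subject: Success: job id:$SLURM_JOBID name:$SLURM_JOB_NAME\n"
else
    s="Subject: Failed: job id:$SLURM_JOBID name:$SLURM_JOB_NAME\n"
fi

# The last line of the output is the final row of the sacct report.
stat=$(tail -n 1 "$flag.out")
[[ "$stat" == *COMPLETED* ]] && echo "*Notice the sacct report above: the main job is still shown as running because sacct ran inside it, but the user task is completed." >> "$flag.out"

if [ "$actualsize" -ge "$minimumsize" ]; then
    # Large output: send the job script and only the last lines of the output.
    toSend=$(echo Job script content:; cat "$flag.sh")
    toSend="$s\n$toSend\nOutput is too big for email. Please find output in: $flag.out"
    toSend="$toSend\n...\n$(tail -n 6 "$flag.out")"
else
    toSend=$(echo Job script content:; cat "$flag.sh"; echo Job output:; cat "$flag.out")
    toSend="$s\n$toSend"
fi

echo -e "$toSend" | sendmail "$to"
```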
Then submit the job. Note that SRUNTIME is 1 minute less than the sbatch time limit and SRUNMEM is 1M less than the sbatch memory limit. This ensures that srun does not use up all of the job's resources, so the sacct and email commands can still run:
```bash
export SRUNTIME=0:9:0
export SRUNMEM=500M
sbatch -p short -t 0:10:0 --mem 501M -o myJob.out -e myJob.out myJob.sh
```
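The relationship between the sbatch limits and the srun limits can also be computed rather than typed by hand. The sketch below is only an illustration: it assumes the time limit is written as H:M:S and the memory limit as a whole number of megabytes ending in "M", and the variable names (sbatch_time, sbatch_mem, srun_time, srun_mem) are hypothetical.

```bash
#!/bin/bash
# Sketch: derive the srun limits from the sbatch limits.
# Assumes H:M:S for the time and "<number>M" for the memory.
sbatch_time=0:10:0
sbatch_mem=501M

# Convert H:M:S to seconds, subtract one minute, and convert back.
# The 10# prefix forces base 10 so that values such as "09" are accepted.
IFS=: read -r h m s <<< "$sbatch_time"
total=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s - 60 ))
srun_time=$(printf '%d:%02d:%02d' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 )))

# Strip the trailing "M", subtract 1 MB, and put the unit back.
srun_mem="$(( ${sbatch_mem%M} - 1 ))M"

echo "SRUNTIME=$srun_time SRUNMEM=$srun_mem"
```

With the values above this prints SRUNTIME=0:09:00 SRUNMEM=500M, which matches the limits used in the submit command. Exporting srun_time and srun_mem as SRUNTIME and SRUNMEM before calling sbatch would then keep the two sets of limits in step.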
...