Get more informative Slurm email notifications

Here is an example. Note: '-n 1' is needed for the srun command. It makes sure that only one copy of the commands is run.

Create a file with (you can replace "echo firstCommand; echo secondCommand" with your own commands):

#!/bin/bash
srun -n 1 -t $SRUNTIME --mem $SRUNMEM bash -c "{ echo I am running on:; hostname; echo firstCommand; echo secondCommand; } && touch myJob.success"
sleep 5 # wait for Slurm to record the job status in its database
echo Job done. Summary:
sacct --format=JobID,Submit,Start,End,State,Partition,ReqTRES%30,CPUTime,MaxRSS,NodeList%30 --units=M -j $SLURM_JOBID
sh myJob
[ -f myJob.success ] && exit 0 || exit 1
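The success-marker idea used in the job script can be tried without Slurm. The following is only a minimal sketch: the helper name run_with_marker and the scratch directory are assumptions made for illustration and are not part of the original scripts. It runs a command string in a child bash, touches a marker file only if every command succeeds, and returns an exit status based on the marker, much like the last line of the job script.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts): run a command
# string and leave "<name>.success" behind only if the commands succeed.
run_with_marker() {
    local name=$1 commands=$2
    rm -f "$name.success"                 # clear any marker from an earlier run
    bash -c "{ $commands; } && touch '$name.success'"
    [ -f "$name.success" ]                # status 0 only if the marker exists
}

workdir=$(mktemp -d)                      # scratch directory for the demo
good_status=0
bad_status=0
run_with_marker "$workdir/good" "echo firstCommand; echo secondCommand" || good_status=$?
run_with_marker "$workdir/bad" "echo firstCommand; false" || bad_status=$?
echo "good=$good_status bad=$bad_status"
```

The marker file, rather than the exit status of srun alone, is what the email script later checks, so the two scripts only need to agree on the file name.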

Create a second file with:

#!/bin/bash
to=`cat ~/.forward`
flag=$1
minimumsize=9000
actualsize=`wc -c $flag.out`
[ ! -f $flag.success ] && s="Subject: Failed: job id:$SLURM_JOBID name:$SLURM_JOB_NAME\n" || s="Subject: Success: job id:$SLURM_JOBID name:$SLURM_JOB_NAME\n"
stat=`tail -n 1 $flag.out`
# The message is quoted so the shell does not expand the leading * as a glob.
[[ "$stat" == *COMPLETED* ]] && echo "*Notice the sacct report above: while the main job is still running the sacct command, the user task is completed." >> $flag.out
if [ "${actualsize% *}" -ge "$minimumsize" ]; then
    toSend=`echo Job script content:; cat $`
    toSend="$s\n$toSend\nOutput is too big for email. Please find output in: $flag.out"
    toSend="$toSend\n...\n`tail -n 6 $flag.out`"
else
    toSend=`echo Job script content:; cat $; echo Job output:; cat $flag.out`
    toSend="$s\n$toSend"
fi
echo -e "$toSend" | sendmail $to
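To check the size threshold before sending real mail, a dry run can help. The sketch below is an illustration only: build_message, its arguments, and the sample file are hypothetical names, and it prints the message instead of piping it to sendmail. It follows the same rule as the script above: include the full output when it is small, otherwise include a pointer to the file plus the last six lines.

```shell
#!/bin/bash
# Hypothetical dry-run helper: print the email text instead of sending it,
# so the size threshold can be checked by hand.
build_message() {
    local out_file=$1 limit=$2 subject=$3
    local size
    size=$(wc -c < "$out_file")           # byte count only, no file name
    printf 'Subject: %s\n\n' "$subject"   # blank line ends the mail headers
    if [ "$size" -ge "$limit" ]; then
        printf 'Output is too big for email. Please find output in: %s\n...\n' "$out_file"
        tail -n 6 "$out_file"             # keep only the last few lines
    else
        printf 'Job output:\n'
        cat "$out_file"
    fi
}

demo=$(mktemp)                            # sample output file for the demo
seq 1 20 > "$demo"
small=$(build_message "$demo" 9000 "Success: demo")   # under the limit
large=$(build_message "$demo" 10 "Success: demo")     # over the limit
echo "$large"
```

Once the printed text looks right, the same output could be piped to sendmail as in the script above.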

Then submit with the command below. Notice that SRUNTIME is 1 minute less than the sbatch time limit and SRUNMEM is 1M less than the sbatch memory. This makes sure srun does not use up all of the job's resources, so the sacct and email commands can still run:

export SRUNTIME=0:9:0; export SRUNMEM=500M; sbatch -p short -t 0:10:0 --mem 501M -o myJob.out -e myJob.out
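The srun limits can also be derived from the sbatch limits instead of being typed by hand. This is a small sketch under stated assumptions: srun_limits is a hypothetical helper, and it assumes the sbatch time is given in whole minutes and the memory in megabytes. It subtracts one minute and 1M, as described above, and prints the time in hours:minutes:seconds form.

```shell
#!/bin/bash
# Hypothetical helper: given the sbatch time limit in minutes and the memory
# in megabytes, print srun limits that leave 1 minute and 1M of headroom.
srun_limits() {
    local sbatch_minutes=$1 sbatch_mem_mb=$2
    local minutes=$(( sbatch_minutes - 1 ))
    echo "$(( minutes / 60 )):$(( minutes % 60 )):0 $(( sbatch_mem_mb - 1 ))M"
}

# Matches the example above: sbatch -t 0:10:0 --mem 501M
read -r SRUNTIME SRUNMEM <<< "$(srun_limits 10 501)"
export SRUNTIME SRUNMEM
echo "SRUNTIME=$SRUNTIME SRUNMEM=$SRUNMEM"   # prints SRUNTIME=0:9:0 SRUNMEM=500M
```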

Let us know if you have any questions. Please include your working folder and the commands you used in your email. Any comments and suggestions are welcome!