Get more informative Slurm email notifications
Here is an example. (Note: '-n 1' is needed for the srun command; it makes sure only one copy of the commands is run.)
Create a file myJob.sh with the following content (you can replace 'echo firstCommand; echo secondCommand' with your own commands):
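#!/bin/bash
rm -f myJob.success # remove a flag file left over from an earlier run
# set -e stops at the first failing command, so myJob.success is created only if all commands succeed
srun -n 1 -t $SRUNTIME --mem $SRUNMEM bash -c "set -e; echo I am running on:; hostname; echo firstCommand; echo secondCommand; touch myJob.success"
sleep 5 # give Slurm time to record the job status in its database
echo Job done. Summary:
sacct --format=JobID,Submit,Start,End,State,Partition,ReqTRES%30,CPUTime,MaxRSS,NodeList%30 --units=M -j $SLURM_JOBID
bash sendJobFinishEmail.sh myJob # bash, not sh: the email script uses [[ ]]
[ -f myJob.success ] && exit 0 || exit 1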
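The success-flag idea can be tried locally without Slurm. A minimal sketch, assuming bash; `run_with_flag` and `demo.success` are hypothetical names used only for illustration, and plain `bash -c` stands in for `srun -n 1 ... bash -c`. It uses `set -e` inside the inner shell so that the flag file is created only if every command succeeds.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts): mimics the srun line
# with plain "bash -c", so the success-flag logic can be tried without Slurm.
run_with_flag() {
    local flag=$1 cmds=$2
    rm -f "$flag.success"    # drop a flag left over from an earlier run
    # set -e makes the inner bash exit at the first failing command,
    # so touch is reached only if every command succeeded.
    bash -c "set -e; $cmds; touch '$flag.success'"
    # Like the last line of myJob.sh: the flag file decides the exit status.
    [ -f "$flag.success" ]
}

run_with_flag demo "echo firstCommand; echo secondCommand" && echo "demo: success"
run_with_flag demo "false; echo secondCommand" || echo "demo: failed"
rm -f demo.success
```

Without `set -e`, a group such as `{ false; echo secondCommand; } && touch demo.success` would still create the flag, because a brace group reports only the exit status of its last command.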
Create a file sendJobFinishEmail.sh with:
#!/bin/bash
to=$(cat ~/.forward)   # email address(es) from your ~/.forward file
flag=$1                # job name: the script reads $flag.sh, $flag.out and $flag.success
maxsize=9000           # bytes; larger output is not included in full
actualsize=$(wc -c < "$flag.out")
[ -f "$flag.success" ] && status=Success || status=Failed
subject="$status: job id:$SLURM_JOBID name:$SLURM_JOB_NAME"
lastLine=$(tail -n 1 "$flag.out")
[[ "$lastLine" == *COMPLETED* ]] && echo "*Note on the sacct report above: the main job is still listed as RUNNING because sacct runs inside it, but the user task (the srun step) is COMPLETED." >> "$flag.out"
{
  printf 'Subject: %s\n\n' "$subject"   # the empty line ends the mail header
  echo "Job script content:"
  cat "$flag.sh"
  if [ "$actualsize" -ge "$maxsize" ]; then
    echo "Output is too big for email. Please find output in: $flag.out"
    echo "..."
    tail -n 6 "$flag.out"
  else
    echo "Job output:"
    cat "$flag.out"
  fi
} | sendmail $to   # printf/cat instead of echo -e, so backslashes in the output are sent unchanged
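The email is a raw message piped to sendmail: header lines first, then an empty line, then the body. A minimal sketch of that structure as a standalone function, assuming bash and coreutils; `compose_mail` and its arguments are hypothetical names, and the job script listing is left out to keep it short.

```shell
#!/bin/bash
# Hypothetical sketch: print a message that sendmail can read on stdin.
# The header block ends at the first empty line, so "Subject:" must be
# followed by a blank line before the body starts.
compose_mail() {
    local subject=$1 outfile=$2 maxsize=$3
    local size
    size=$(wc -c < "$outfile")      # byte count only, without the file name
    printf 'Subject: %s\n\n' "$subject"
    if [ "$size" -ge "$maxsize" ]; then
        # Too big: point to the file and include only the last lines.
        printf 'Output is too big for email. Please find output in: %s\n...\n' "$outfile"
        tail -n 6 "$outfile"
    else
        cat "$outfile"
    fi
}
```

For example, `compose_mail "Failed: job id:$SLURM_JOBID" myJob.out 9000 | sendmail $(cat ~/.forward)` would send the same kind of message from inside a job.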
Then submit the job as shown below. Notice that SRUNTIME is 1 minute less than the sbatch time limit and SRUNMEM is 1 MB less than the sbatch memory. This makes sure srun cannot use up all of the job's resources, so the sacct and email commands can still run after it:
export SRUNTIME=0:9:0; export SRUNMEM=500M; sbatch -p short -t 0:10:0 --mem 501M -o myJob.out -e myJob.out myJob.sh
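The headroom arithmetic can be scripted instead of done by hand. A minimal sketch, assuming the sbatch time limit is given in whole minutes and the memory in MB; `srun_limits` is a hypothetical helper, and it prints the values in the same `hours:minutes:seconds` and `M` forms used in the command above.

```shell
#!/bin/bash
# Hypothetical helper: derive srun limits that are 1 minute and 1 MB below
# the sbatch limits, leaving headroom for sacct and the email step.
srun_limits() {
    local minutes=$1 mem_mb=$2
    if [ "$minutes" -le 1 ] || [ "$mem_mb" -le 1 ]; then
        echo "srun_limits: need more than 1 minute and 1 MB" >&2
        return 1
    fi
    local m=$(( minutes - 1 ))
    # Format as hours:minutes:seconds, e.g. 9 minutes -> 0:9:0
    printf 'SRUNTIME=%d:%d:0 SRUNMEM=%dM\n' $(( m / 60 )) $(( m % 60 )) $(( mem_mb - 1 ))
}

srun_limits 10 501    # prints SRUNTIME=0:9:0 SRUNMEM=500M
```

Its output can be exported directly, e.g. `export $(srun_limits 10 501)` before the `sbatch` call, which sets the same two variables as the command above.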
Let us know if you have any questions. Please include your working folder and the commands you used in your email. Any comments and suggestions are welcome!