Here is an example. Note: '-n 1' is needed for the srun command. It makes sure that only one copy of the commands is run.

Create a file myJob.sh with the following content (you can replace "echo firstCommand; echo secondCommand" with your own commands):

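#!/bin/bash
rm -f myJob.success # remove any marker left over from a previous run
# Run the user commands as a single task; myJob.success is created only if they succeed
srun -n 1 -t "$SRUNTIME" --mem "$SRUNMEM" bash -c "{ echo I am running on:; hostname; echo firstCommand; echo secondCommand; } && touch myJob.success"
sleep 5 # wait for Slurm to record the job status in its database
echo Job done. Summary:
sacct --format=JobID,Submit,Start,End,State,Partition,ReqTRES%30,CPUTime,MaxRSS,NodeList%30 --units=M -j "$SLURM_JOBID"
bash sendJobFinishEmail.sh myJob # the email script uses bash-only syntax, so run it with bash
[ -f myJob.success ] && exit 0 || exit 1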
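
The success marker in myJob.sh relies on `&&`: touch runs only when the `{ ...; }` group exits with status 0. The sketch below shows that behaviour without Slurm, so it can be tried on any machine. The function name run_with_marker and the names good and bad are made up for illustration and are not part of the scripts on this page.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only: run a command string and
# create <name>.success only if the command group exits with status 0.
run_with_marker() {
    local name=$1 cmds=$2
    rm -f "$name.success"
    bash -c "{ $cmds; } && touch $name.success"
}

# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

run_with_marker good "echo firstCommand; echo secondCommand"
run_with_marker bad "echo firstCommand; false"

# Same check that myJob.sh uses on its last line.
[ -f good.success ] && good_state=success || good_state=failed
[ -f bad.success ] && bad_state=success || bad_state=failed
echo "good: $good_state"
echo "bad: $bad_state"
```

The exit status of a `{ ...; }` group is the status of its last command, so a failure in an earlier command is not detected unless the commands are chained with `&&`.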

Create a file sendJobFinishEmail.sh with: 

#!/bin/bash
# Usage: sendJobFinishEmail.sh <flag>, where <flag>.sh is the job script and <flag>.out is its output
to=$(cat ~/.forward)
flag=$1
minimumsize=9000 # outputs of this many bytes or more are not included in full
actualsize=$(wc -c < "$flag.out")
# The job counts as successful only if the job script created $flag.success
[ ! -f "$flag.success" ] && s="Subject: Failed: job id:$SLURM_JOBID name:$SLURM_JOB_NAME\n" || s="Subject: Success: job id:$SLURM_JOBID name:$SLURM_JOB_NAME\n"
stat=$(tail -n 1 "$flag.out")
[[ "$stat" == *COMPLETED* ]] && echo "*Notice the sacct report above: the main job is still running (it is running the sacct command), but the user task is completed." >> "$flag.out"

if [ "$actualsize" -ge "$minimumsize" ]; then
   toSend=$(echo Job script content:; cat "$flag.sh")
   toSend="$s\n$toSend\nOutput is too big for email. Please find output in: $flag.out"
   toSend="$toSend\n...\n$(tail -n 6 "$flag.out")"
else
   toSend=$(echo Job script content:; cat "$flag.sh"; echo Job output:; cat "$flag.out")
   toSend="$s\n$toSend"
fi
# $s ends with \n and is followed by another \n, giving the blank line between mail header and body
echo -e "$toSend" | sendmail $to
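
To check what the email would look like before relying on sendmail, the message can be built and printed instead of sent. The following is a minimal dry-run sketch of the same size-threshold logic, not a replacement for sendJobFinishEmail.sh. The function build_message, the demo file names and the threshold of 10 bytes are assumptions made for the example, and the Slurm job id and job name are left out of the subject line.

```shell
#!/bin/bash
# Dry-run sketch: builds a message in the same way as sendJobFinishEmail.sh
# but prints it instead of piping it to sendmail.
# build_message and the demo.* file names are hypothetical.
build_message() {
    local flag=$1 limit=$2 subject size body
    [ -f "$flag.success" ] && subject="Subject: Success: $flag" || subject="Subject: Failed: $flag"
    size=$(wc -c < "$flag.out")
    if [ "$size" -ge "$limit" ]; then
        # Output too large: include only a pointer and the last 6 lines.
        body="Output is too big for email. Please find output in: $flag.out
...
$(tail -n 6 "$flag.out")"
    else
        body="Job output:
$(cat "$flag.out")"
    fi
    # A blank line separates the mail header from the body.
    printf '%s\n\n%s\n' "$subject" "$body"
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 20 > demo.out   # a short fake job output
touch demo.success    # pretend the job succeeded

small=$(build_message demo 9000)   # under the limit: full output included
large=$(build_message demo 10)     # over the limit: only the tail included
echo "$small" | head -n 3
echo "$large" | tail -n 2
```

Once the printed message looks right, the same text can be piped to sendmail as in the script above.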


Then submit with the command below. Notice that SRUNTIME is 1 minute less than the sbatch time and SRUNMEM is 1M less than the sbatch mem. This makes sure srun does not use up all the resources, so that the sacct and email commands can still run:

export SRUNTIME=0:9:0; export SRUNMEM=500M; sbatch -p short -t 0:10:0  --mem 501M -o myJob.out -e myJob.out myJob.sh
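
The two srun values can also be derived from the sbatch limits rather than typed by hand. The sketch below assumes the limits are given in whole minutes and whole megabytes. The variable names batch_minutes and batch_mem_mb are made up for illustration, and the submit command is only printed, not run.

```shell
#!/bin/bash
# Hypothetical inputs: the sbatch limits in whole minutes and megabytes.
batch_minutes=10
batch_mem_mb=501

# Leave 1 minute and 1M of headroom for the sacct and email steps.
srun_minutes=$((batch_minutes - 1))
srun_mem_mb=$((batch_mem_mb - 1))

# Format both time limits as hours:minutes:seconds, as in the command above.
SRUNTIME=$(printf '%d:%d:0' $((srun_minutes / 60)) $((srun_minutes % 60)))
SRUNMEM="${srun_mem_mb}M"
export SRUNTIME SRUNMEM
batch_time=$(printf '%d:%d:0' $((batch_minutes / 60)) $((batch_minutes % 60)))

# Print the submit command so it can be checked before running it.
echo "SRUNTIME=$SRUNTIME SRUNMEM=$SRUNMEM"
echo "sbatch -p short -t $batch_time --mem ${batch_mem_mb}M -o myJob.out -e myJob.out myJob.sh"
```

With the values shown, this gives the same SRUNTIME and SRUNMEM as the submit command above.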

...