This page shows you how to run GATK4 using our recently installed Singularity GATK4 container. The runAsPipeline script, accessible through the rcbio/1.0 module, converts a bash script into a pipeline and submits its jobs to the Slurm scheduler for you.
...
The workflows were downloaded from https://github.com/gatk-workflows/gatk4-data-processing and https://github.com/gatk-workflows/gatk4-somatic-snvs-indels
Jumpstart
Here are the commands to test out the workflow using example data. The whole run needs a few hours if the cluster is not busy.
Code Block |
---|
ssh user123@o2.hms.harvard.edu
# set up screen software: https://wiki.rc.hms.harvard.edu/pages/viewpage.action?pageId=20676715
cp /n/shared_db/misc/rcbio/data/screenrc.template.txt ~/.screenrc
screen
srun --pty -p interactive -t 0-12:0:0 --mem 16000MB -n 2 /bin/bash
mkdir -p /n/scratch3/users/${USER:0:1}/$USER/testGATK4
cd /n/scratch3/users/${USER:0:1}/$USER/testGATK4
module load gcc/6.2.0 python/2.7.12 rcbio/1.3.3
export PATH=/n/shared_db/singularity/hmsrc-gatk/bin:/home/ld32/rcbioDev/bin:/opt/singularity/bin:$PATH
# Set up the database. You only need to run this once. It sets up the database in your home directory, so make sure you have at least 5 GB of free space there.
setupDB.sh
cp /n/shared_db/singularity/hmsrc-gatk/scripts/* .
buildSampleFoldersFromSampleSheet.py sampleSheet.xlsx
runAsPipeline fastqToBam.sh "sbatch -p short --mem 4G -t 1:0:0 -n 1" noTmp run 2>&1 | tee output.log
# check email or use this command to see the workflow progress
squeue -u $USER -o "%.18i %.9P %.28j %.8u %.8T %.10M %.9l %.6D %R %S"
# After all jobs have finished, run this command to start the database, and keep it running in the background
runDB.sh &
java -XX:+UseSerialGC -Dconfig.file=your.conf -jar /n/shared_db/singularity/hmsrc-gatk/cromwell-43.jar run processing-for-variant-discovery-gatk4.wdl -i unmappedBams/group1/in.json 2>&1 | tee -a group1.log
java -XX:+UseSerialGC -Dconfig.file=your.conf -jar /n/shared_db/singularity/hmsrc-gatk/cromwell-43.jar run processing-for-variant-discovery-gatk4.wdl -i unmappedBams/group2/in.json 2>&1 | tee -a group2.log
setupJson.sh
java -XX:+UseSerialGC -Dconfig.file=your.conf -jar /n/shared_db/singularity/hmsrc-gatk/cromwell-43.jar run mutect2.wdl -i unmappedBams/exon.json && findVCF.sh
# Stop database
killall runDB.sh |
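
The two per-group Cromwell commands above differ only in the group name, so with more groups a loop may be more convenient than typing each one. The following is a minimal sketch, not part of the installed rcbio or hmsrc-gatk scripts: the function name `print_group_commands` is hypothetical, and it assumes that every group folder under `unmappedBams/` contains an `in.json`, as in the example data. It only prints the commands so that you can review them before running anything.

```shell
#!/bin/bash
# Hypothetical helper (an illustration, not one of the installed scripts).
# Prints one Cromwell command per group folder that contains an in.json,
# mirroring the two hand-written per-group commands in the Jumpstart block.
print_group_commands() {
    local base_dir="${1:-unmappedBams}"
    local jar=/n/shared_db/singularity/hmsrc-gatk/cromwell-43.jar
    local json group
    for json in "$base_dir"/*/in.json; do
        # Skip the unexpanded glob pattern when no group folder matches.
        [ -e "$json" ] || continue
        # The group name is the folder that holds in.json, e.g. group1.
        group=$(basename "$(dirname "$json")")
        echo "java -XX:+UseSerialGC -Dconfig.file=your.conf -jar $jar run processing-for-variant-discovery-gatk4.wdl -i $json 2>&1 | tee -a $group.log"
    done
}

# Print the commands for review.
print_group_commands unmappedBams
```

If the printed commands look right, one way to run them one after another is to pipe them to bash: `print_group_commands unmappedBams | bash`. As in the Jumpstart, this assumes the database started by `runDB.sh` is already running.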
...