Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

This page shows you how to run GATK4 using our recently installed Singularity GATK4 container. The runAsPipeline script, accessible through the rcbio/1.0 module, converts the bash script into a pipeline that easily submits jobs to the Slurm scheduler for you.

...

The workflows are downloaded from: https://github.com/gatk-workflows/gatk4-data-processing and  https://github.com/gatk-workflows/gatk4-somatic-snvs-indels

Jumpstart

Here are the commands to test out the workflow using example data. The whole run needs a few hours if the cluster is not busy. 

Code Block
ssh user123@o2.hms.harvard.edu

# set up screen software: https://wiki.rc.hms.harvard.edu/pages/viewpage.action?pageId=20676715
cp /n/shared_db/misc/rcbio/data/screenrc.template.txt ~/.screenrc

screen

srun --pty -p interactive -t 0-12:0:0 --mem 16000MB -n 2 /bin/bash

mkdir -p /n/scratch3scratch/users/${USER:0:1}/$USER/testGATK4  

cd /n/scratch3scratch/users/${USER:0:1}/$USER/testGATK4

module load gcc/6.2.0 python/2.7.12 rcbio/1.3.3

export PATH=/n/shared_db/singularity/hmsrc-gatk/bin:/opt/singularity/bin:$PATH

# setup database. Only need run this once. It will setup database in home, so make sure you have at least 5G free space at home.
setupDB.sh


cp /n/shared_db/singularity/hmsrc-gatk/scripts/* .

buildSampleFoldersFromSampleSheet.py sampleSheet.xlsx

runAsPipeline fastqToBam.sh "sbatch -p short --mem 4G -t 1:0:0 -n 1" noTmp run 2>&1 | tee output.log

# check email or use this command to see the workflow progress
squeue -u $USER -o "%.18i %.9P %.28j %.8u %.8T %.10M %.9l %.6D %R %S"

# After all jobs finish run, run this command to start database, and keep it running in background
runDB.sh  &     

java -XX:+UseSerialGC -Dconfig.file=your.conf -jar /n/shared_db/singularity/hmsrc-gatk/cromwell-43.jar run processing-for-variant-discovery-gatk4.wdl -i unmappedBams/group1/in.json 2>&1 | tee -a group1.log

java -XX:+UseSerialGC -Dconfig.file=your.conf -jar /n/shared_db/singularity/hmsrc-gatk/cromwell-43.jar run processing-for-variant-discovery-gatk4.wdl -i unmappedBams/group2/in.json 2>&1 | tee -a group2.log

setupJson.sh

java -XX:+UseSerialGC -Dconfig.file=your.conf -jar /n/shared_db/singularity/hmsrc-gatk/cromwell-43.jar run mutect2.wdl -i unmappedBams/exon.json && findVCF.sh

# Stop database
killall runDB.sh

...

Code Block
srun --pty -p interactive -t 0-12:0:0 --mem 16000MB -n 2 /bin/bash

mkdir -p /n/scratch3scratch/users/${USER:0:1}/$USER/testGATK4  
cd /n/scratch3scratch/users/${USER:0:1}/$USER/testGATK4

...