This page shows you how to run GATK4 using our recently installed Singularity GATK4 container. The runAsPipeline script, accessible through the rcbio/1.0 module, converts a bash script into a pipeline and submits its jobs to the Slurm scheduler for you.

...

The workflows were downloaded from https://github.com/gatk-workflows/gatk4-data-processing and https://github.com/gatk-workflows/gatk4-somatic-snvs-indels

Jumpstart

Here are the commands to test out the workflow using example data. The whole run needs a few hours if the cluster is not busy.

Code Block

ssh user123@o2.hms.harvard.edu

# set up screen software: https://wiki.rc.hms.harvard.edu/pages/viewpage.action?pageId=20676715
cp /n/shared_db/misc/rcbio/data/screenrc.template.txt ~/.screenrc

screen

srun --pty -p interactive -t 0-12:0:0 --mem 16000MB -n 2 /bin/bash

mkdir -p /n/scratch/users/${USER:0:1}/$USER/testGATK4

cd /n/scratch/users/${USER:0:1}/$USER/testGATK4

module load gcc/6.2.0 python/2.7.12 rcbio/1.3.3

export PATH=/n/shared_db/singularity/hmsrc-gatk/bin:/home/ld32/rcbioDev/bin:/opt/singularity/bin:$PATH

# Set up the database. You only need to run this once. The database is created in your home directory, so make sure you have at least 5 GB of free space there.
setupDB.sh


cp /n/shared_db/singularity/hmsrc-gatk/scripts/* .

buildSampleFoldersFromSampleSheet.py sampleSheet.xlsx

runAsPipeline fastqToBam.sh "sbatch -p short --mem 4G -t 1:0:0 -n 1" noTmp run 2>&1 | tee output.log

# check email or use this command to see the workflow progress
squeue -u $USER -o "%.18i %.9P %.28j %.8u %.8T %.10M %.9l %.6D %R %S"

# After all jobs have finished, run this command to start the database, and keep it running in the background
runDB.sh &

java -XX:+UseSerialGC -Dconfig.file=your.conf -jar /n/shared_db/singularity/hmsrc-gatk/cromwell-43.jar run processing-for-variant-discovery-gatk4.wdl -i unmappedBams/group1/in.json 2>&1 | tee -a group1.log

java -XX:+UseSerialGC -Dconfig.file=your.conf -jar /n/shared_db/singularity/hmsrc-gatk/cromwell-43.jar run processing-for-variant-discovery-gatk4.wdl -i unmappedBams/group2/in.json 2>&1 | tee -a group2.log

setupJson.sh

java -XX:+UseSerialGC -Dconfig.file=your.conf -jar /n/shared_db/singularity/hmsrc-gatk/cromwell-43.jar run mutect2.wdl -i unmappedBams/exon.json && findVCF.sh

# Stop database
killall runDB.sh
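The two per-group Cromwell commands in the Jumpstart differ only in the group folder, so they can be driven by a loop. The following is a minimal sketch, assuming every group folder sits under unmappedBams/ and contains an in.json as in the commands above. The function name process_groups and the DRY_RUN switch are hypothetical helpers for illustration, not part of rcbio, Cromwell or GATK.

```shell
#!/bin/bash
# Sketch: run the data-processing workflow once per group folder.
# CROMWELL_JAR is the jar path used in the Jumpstart commands.
CROMWELL_JAR=/n/shared_db/singularity/hmsrc-gatk/cromwell-43.jar

# process_groups [BASE_DIR]
# Hypothetical helper: loops over BASE_DIR/group*/in.json.
# With DRY_RUN=1 it only prints the commands instead of running them.
process_groups() {
    local base="${1:-unmappedBams}" json group
    for json in "$base"/group*/in.json; do
        [ -e "$json" ] || continue      # glob matched nothing, skip the literal pattern
        group=$(basename "$(dirname "$json")")
        local cmd=(java -XX:+UseSerialGC -Dconfig.file=your.conf -jar "$CROMWELL_JAR"
                   run processing-for-variant-discovery-gatk4.wdl -i "$json")
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "${cmd[*]} 2>&1 | tee -a $group.log"
        else
            "${cmd[@]}" 2>&1 | tee -a "$group.log"
        fi
    done
}

# Usage (print the commands first, then run them for real):
#   DRY_RUN=1 process_groups unmappedBams
#   process_groups unmappedBams
```

Printing the commands with DRY_RUN=1 before a real run is a cheap way to confirm that every group folder was picked up.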

...

From Windows, use the graphical PuTTY program to connect to o2.hms.harvard.edu and make sure the port is set to the default value of 22.

From a Mac Terminal, use the ssh command, inserting your eCommons HMS ID instead of user123:

Code Block

ssh user123@o2.hms.harvard.edu

# set up screen software: https://wiki.rc.hms.harvard.edu/pages/viewpage.action?pageId=20676715
cp /n/shared_db/misc/rcbio/data/screenrc.template.txt ~/.screenrc

screen  # start screen session. For detail: https://wiki.rc.hms.harvard.edu/pages/viewpage.action?pageId=20676715

...

Code Block
srun --pty -p interactive -t 0-12:0:0 --mem 16000MB -n 2 /bin/bash
mkdir -p /n/scratch/users/${USER:0:1}/$USER/testGATK4
cd /n/scratch/users/${USER:0:1}/$USER/testGATK4
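The working directory path relies on the bash substring expansion ${USER:0:1}, which takes the first character of your user name, so scratch folders are grouped by first letter. A small sketch of how the path is built, assuming the scratch root is /n/scratch; the function name scratch_dir is hypothetical and exists only for illustration.

```shell
#!/bin/bash
# scratch_dir USER [ROOT]
# Hypothetical helper: builds the test directory path for a user.
# ${user:0:1} is the substring of $user starting at offset 0 with length 1.
scratch_dir() {
    local user="$1" root="${2:-/n/scratch}"   # /n/scratch is an assumed default
    echo "$root/users/${user:0:1}/$user/testGATK4"
}

scratch_dir user123   # prints /n/scratch/users/u/user123/testGATK4
```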

...

Code Block

# This sets up the path and environment variables for the pipeline
module load gcc/6.2.0 python/2.7.12 rcbio/1.3.3
export PATH=/n/shared_db/singularity/hmsrc-gatk/bin:/home/ld32/rcbioDev/bin:$PATH



# Set up the database. You only need to run this once. The database is created in your home directory, so make sure you have at least 5 GB of free space there.
setupDB.sh
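Because setupDB.sh needs about 5 GB in your home directory, it can help to check the free space first. The sketch below uses df for this. The function names free_gb and has_free_space are hypothetical, and df reports free space on the file system, which may not match a per-user quota if your home directory has one.

```shell
#!/bin/bash
# free_gb DIR
# Prints the free space, in whole GB, of the file system holding DIR.
# df -P gives one line per file system; -k reports 1024-byte blocks,
# and column 4 of the data line is the available space.
free_gb() {
    df -Pk "$1" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# has_free_space DIR MIN_GB
# Succeeds when DIR has at least MIN_GB of free space.
has_free_space() {
    local dir="$1" min_gb="$2"
    [ "$(free_gb "$dir")" -ge "$min_gb" ]
}

if has_free_space "$HOME" 5; then
    echo "OK: at least 5 GB free in $HOME"
else
    echo "Less than 5 GB free in $HOME; free up space before running setupDB.sh" >&2
fi
```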

...

Note that only step 2 used -t 50:0, and all other steps used the default -t 10:0. The default walltime limit was set in the runAsPipeline command, and the walltime parameter for step 2 was set in bash_script_v2.sh.
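Slurm reads a two-field time value as minutes:seconds, so -t 50:0 means 50 minutes and -t 10:0 means 10 minutes, while the interactive job in the Jumpstart uses the days-hours:minutes:seconds form (0-12:0:0). The sketch below converts these forms to seconds to make the comparison explicit. The function name slurm_time_to_seconds is hypothetical, and it only handles well-formed values.

```shell
#!/bin/bash
# slurm_time_to_seconds SPEC
# Hypothetical helper: converts a Slurm time limit to seconds.
# Handles M, M:S, H:M:S, D-H, D-H:M and D-H:M:S.
slurm_time_to_seconds() {
    local spec="$1" days=0 hours=0 mins=0 secs=0 rest
    local -a parts
    if [[ "$spec" == *-* ]]; then
        # A dash separates days from hours[:minutes[:seconds]]
        days="${spec%%-*}"
        rest="${spec#*-}"
        IFS=: read -r -a parts <<< "$rest"
        hours="${parts[0]:-0}"; mins="${parts[1]:-0}"; secs="${parts[2]:-0}"
    else
        IFS=: read -r -a parts <<< "$spec"
        case "${#parts[@]}" in
            1) mins="${parts[0]}" ;;
            2) mins="${parts[0]}"; secs="${parts[1]}" ;;
            3) hours="${parts[0]}"; mins="${parts[1]}"; secs="${parts[2]}" ;;
        esac
    fi
    # 10# forces base 10 so a field such as "08" is not read as octal
    echo $(( 10#$days * 86400 + 10#$hours * 3600 + 10#$mins * 60 + 10#$secs ))
}

slurm_time_to_seconds 50:0      # prints 3000  (50 minutes)
slurm_time_to_seconds 10:0      # prints 600   (10 minutes)
slurm_time_to_seconds 0-12:0:0  # prints 43200 (12 hours)
```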

Run the pipeline

Thus far in the example, we have not actually submitted any jobs to the scheduler. To submit the pipeline, you will need to append the run parameter to the command. If run is not specified, test mode is used, which does not submit any jobs and prints the placeholder 123 in place of job IDs in the command's output.
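The only difference between the two modes is the trailing run argument. The sketch below prints both command lines side by side, reusing the runAsPipeline command from the Jumpstart (fastqToBam.sh) as an assumed stand-in for the example command of this section. The function name pipeline_cmd is hypothetical and only prints the command; it does not execute anything.

```shell
#!/bin/bash
# pipeline_cmd MODE
# Hypothetical helper: prints the runAsPipeline command line for
# MODE "test" (no jobs submitted) or "run" (jobs submitted to Slurm).
pipeline_cmd() {
    local mode="$1"
    local cmd='runAsPipeline fastqToBam.sh "sbatch -p short --mem 4G -t 1:0:0 -n 1" noTmp'
    if [ "$mode" = "run" ]; then
        cmd="$cmd run"      # appending run is what actually submits the jobs
    fi
    printf '%s\n' "$cmd"
}

pipeline_cmd test   # test mode: review the planned jobs first
pipeline_cmd run    # run mode: same command with run appended
```

Checking the test-mode output before adding run makes it easier to catch a wrong partition, memory or walltime setting before any job reaches the scheduler.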

...
