...

The workflows were downloaded from https://github.com/gatk-workflows/gatk4-data-processing and https://github.com/gatk-workflows/gatk4-somatic-snvs-indels.

Jumpstart

The commands below test the workflow on example data. The whole run takes a few hours when the cluster is not busy.

Code Block

ssh user123@o2.hms.harvard.edu

# set up screen software: https://wiki.rc.hms.harvard.edu/pages/viewpage.action?pageId=20676715
cp /n/shared_db/misc/rcbio/data/screenrc.template.txt ~/.screenrc

screen

srun --pty -p interactive -t 0-12:0:0 --mem 16000MB -n 2 /bin/bash

mkdir -p /n/scratch3/users/${USER:0:1}/$USER/testGATK4  

cd /n/scratch3/users/${USER:0:1}/$USER/testGATK4

module load gcc/6.2.0 python/2.7.12 rcbio/1.3.3

export PATH=/n/shared_db/singularity/hmsrc-gatk/bin:/home/ld32/rcbioDev/bin:/opt/singularity/bin:$PATH

# Set up the database. This only needs to be run once. The database is created in your home directory, so make sure you have at least 5G of free space there.
setupDB.sh


cp /n/shared_db/singularity/hmsrc-gatk/scripts/* .

buildSampleFoldersFromSampleSheet.py sampleSheet.xlsx

runAsPipeline fastqToBam.sh "sbatch -p short --mem 4G -t 1:0:0 -n 1" noTmp run 2>&1 | tee output.log

# Check your email, or use this command to follow the workflow's progress
squeue -u $USER -o "%.18i %.9P %.28j %.8u %.8T %.10M %.9l %.6D %R %S"

# After all jobs have finished, start the database and keep it running in the background
runDB.sh &

java -XX:+UseSerialGC -Dconfig.file=your.conf -jar /n/shared_db/singularity/hmsrc-gatk/cromwell-43.jar run processing-for-variant-discovery-gatk4.wdl -i unmappedBams/group1/in.json 2>&1 | tee -a group1.log

java -XX:+UseSerialGC -Dconfig.file=your.conf -jar /n/shared_db/singularity/hmsrc-gatk/cromwell-43.jar run processing-for-variant-discovery-gatk4.wdl -i unmappedBams/group2/in.json 2>&1 | tee -a group2.log

setupJson.sh

java -XX:+UseSerialGC -Dconfig.file=your.conf -jar /n/shared_db/singularity/hmsrc-gatk/cromwell-43.jar run mutect2.wdl -i unmappedBams/exon.json && findVCF.sh

# Stop database
killall runDB.sh
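
The jumpstart starts the database only after all pipeline jobs have finished. A minimal sketch of that wait is shown below. It assumes the pipeline jobs run in the short partition, as in the sbatch options above; wait_for_jobs is a hypothetical helper and is not part of rcbio. Filtering by partition keeps the interactive srun session itself from being counted as a pending job.

```shell
# Hypothetical helper (not part of rcbio): block until the given user has
# no jobs left in the given partition.
wait_for_jobs() {
    user="$1"
    partition="${2:-short}"   # assumption: pipeline jobs run in "short"
    interval="${3:-60}"       # seconds between checks
    # squeue -h suppresses the header, so any output means jobs remain
    while [ -n "$(squeue -h -u "$user" -p "$partition")" ]; do
        sleep "$interval"
    done
}

# Example use, once the pipeline has been submitted:
#   wait_for_jobs "$USER" short 60 && runDB.sh &
```

If some steps are submitted to a different partition, the partition filter would need to be adjusted accordingly.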

...

From Windows, use the graphical PuTTY program to connect to o2.hms.harvard.edu and make sure the port is set to the default value of 22.

...

Code Block

# Set up the PATH and environment variables for the pipeline
module load gcc/6.2.0 python/2.7.12 rcbio/1.3.3
export PATH=/n/shared_db/singularity/hmsrc-gatk/bin:/home/ld32/rcbioDev/bin:$PATH



# Set up the database. This only needs to be run once. The database is created in your home directory, so make sure you have at least 5G of free space there.
setupDB.sh
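
Because setupDB.sh needs about 5G in the home directory, it can help to check the available space first. The sketch below uses df; has_free_gb is a hypothetical helper, not part of the pipeline. Note the assumption: df reports free space on the filesystem, which may differ from a per-user quota, so treat the result as a rough check only.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# MIN_GB gigabytes available. Does not account for per-user quotas.
has_free_gb() {
    dir="$1"
    min_gb="$2"
    # df -P gives one line per filesystem; -k reports sizes in 1K blocks
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    need_kb=$((min_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example use:
#   has_free_gb "$HOME" 5 && setupDB.sh
```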

...

Note that only step 2 used -t 50:0; all other steps used the default -t 10:0. The default walltime limit was set in the runAsPipeline command, and the walltime for step 2 was set in bash_script_v2.sh.
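
The precedence described above (a per-step walltime overrides the command-line default) can be sketched as follows. This illustrates the rule only and is not how runAsPipeline is implemented; effective_walltime is a hypothetical name.

```shell
# Hypothetical illustration of the precedence rule: use the per-step
# walltime when one is given, otherwise fall back to the default.
effective_walltime() {
    default="$1"
    step_override="$2"
    printf '%s\n' "${step_override:-$default}"
}

effective_walltime 10:0 ""      # a step with no override of its own
effective_walltime 10:0 50:0    # step 2, which sets its own walltime
```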

Run the pipeline

Thus far in the example, we have not actually submitted any jobs to the scheduler. To submit the pipeline, append the run parameter to the command. If run is not specified, test mode is used: no jobs are submitted, and the command's output shows the placeholder 123 in place of real job IDs.
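
As a sketch of the difference, the helper below prints the two forms of the command, reusing the arguments from the Jumpstart section. It only builds and prints the command line and does not call runAsPipeline; pipeline_cmd is a hypothetical name introduced for illustration.

```shell
# Hypothetical helper: print the runAsPipeline command for test mode or
# run mode. Nothing is executed or submitted here.
pipeline_cmd() {
    mode="$1"
    cmd='runAsPipeline fastqToBam.sh "sbatch -p short --mem 4G -t 1:0:0 -n 1" noTmp'
    if [ "$mode" = "run" ]; then
        # Appending "run" is what makes the pipeline actually submit jobs
        cmd="$cmd run"
    fi
    printf '%s\n' "$cmd"
}

pipeline_cmd test   # test mode: same command without the trailing "run"
pipeline_cmd run    # run mode: the command as used in the Jumpstart
```

Checking the test-mode output before adding run is a cheap way to catch mistakes in the script before any jobs reach the scheduler.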

...
