Page Comparison

...

Table of Contents

This page shows you how to run GATK4 using our recently installed Singularity GATK4 container. The runAsPipeline script, accessible through the rcbio/1.0 module, converts the bash script into a pipeline that easily submits jobs to the Slurm scheduler for you.

Features of the this pipeline:

Given a sample sheet, generate folder structure for data processing
Submit each step as a cluster job using sbatch.
Automatically arrange dependencies among jobs.
Email notifications are sent when each job fails or succeeds.
If a job fails, all its downstream jobs automatically are killed.
When re-running the pipeline on the same data folder, if there are any unfinished jobs, the user is asked to kill them or not.
When re-running the pipeline on the same data folder, the user is asked to confirm to re-run or not if a step was done successfully earlier.

The workflows are downloaded from: https://github.com/gatk-workflows/gatk4-rnaseq-germline-snps-indels and modified to work on O2 slurm cluster.

Notice the original workflow uses reference and annotation files listed in this file:

https://github.com/gatk-workflows/gatk4-rnaseq-germline-snps-indels/blob/master/gatk4-rna-germline-variant-calling.inputs.json

We download the genome reference and all annotation files from:

https://console.cloud.google.com/storage/browser/genomics-public-data/references/Homo_sapiens_assembly19_1000genomes_decoy/ except for the gtf file, which is downloaded from here: https://console.cloud.google.com/storage/browser/gatk-test-data/intervals?project=broad-dsde-outreach

We then modified the json file to this one:

/n/shared_db/singularity/hmsrc-gatk/scripts/gatk4-rna-germline-variant-calling.inputs.template.json

...

Versions Compared

Old Version 19

New Version 20

Key