Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

...

...

...

...

...

...

...

...

...

...

...


Table of Contents

This page shows you how to run GATK4 using our recently installed Singularity GATK4 container. The runAsPipeline script, accessible through the rcbio/1.0 module, converts the bash script into a pipeline that easily submits jobs to the Slurm scheduler for you.

Features of the this pipeline:

  • Given a sample sheet, generate folder structure for data processing
  • Submit each step as a cluster job using sbatch.
  • Automatically arrange dependencies among jobs.
  • Email notifications are sent when each job fails or succeeds.
  • If a job fails, all its downstream jobs automatically are killed.
  • When re-running the pipeline on the same data folder, if there are any unfinished jobs, the user is asked to kill them or not.
  • When re-running the pipeline on the same data folder, the user is asked to confirm to re-run or not if a step was done successfully earlier.

The workflows are downloaded from: https://github.com/gatk-workflows/gatk4-rnaseq-germline-snps-indels and modified to work on O2 slurm cluster.

Notice the original workflow uses reference and annotation files listed in this file: 

https://github.com/gatk-workflows/gatk4-rnaseq-germline-snps-indels/blob/master/gatk4-rna-germline-variant-calling.inputs.json 

We download the genome reference and all annotation files from: 

https://console.cloud.google.com/storage/browser/genomics-public-data/references/Homo_sapiens_assembly19_1000genomes_decoy/ except for the gtf file, which is downloaded from here: https://console.cloud.google.com/storage/browser/gatk-test-data/intervals?project=broad-dsde-outreach

We then modified the json file to this one: 

/n/shared_db/singularity/hmsrc-gatk/scripts/gatk4-rna-germline-variant-calling.inputs.template.json

...