...
...
...
...
...
...
...
...
...
...
...
...
|
This page shows you how to run GATK4 using our recently installed Singularity GATK4 container. The runAsPipeline script, accessible through the rcbio/1.0 module, converts the bash script into a pipeline that easily submits jobs to the Slurm scheduler for you.
Features of this pipeline:
- Given a sample sheet, generate folder structure for data processing
- Submit each step as a cluster job using sbatch.
- Automatically arrange dependencies among jobs.
- Email notifications are sent when each job fails or succeeds.
- If a job fails, all its downstream jobs are automatically killed.
- When re-running the pipeline on the same data folder, the user is asked whether to kill any unfinished jobs.
- When re-running the pipeline on the same data folder, the user is asked to confirm whether to re-run any step that completed successfully earlier.
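The dependency and notification features listed above map onto standard sbatch options. The following is a minimal sketch of that underlying Slurm mechanism, not the actual runAsPipeline implementation. The script names step1.sh and step2.sh, the job ID 12345, and the helper function build_sbatch_cmd are hypothetical and used only for illustration. The sketch prints the commands as a dry run instead of submitting them.

```shell
#!/usr/bin/env bash
# Sketch only: shows how a downstream job can be made to depend on an
# upstream job with sbatch. step1.sh, step2.sh and the job ID are
# hypothetical placeholders, not files or values from this page.

# Build the sbatch command line for one step.
#   $1: job script
#   $2: (optional) ID of the job this step depends on
build_sbatch_cmd() {
    local script="$1" dep="${2:-}"
    # --parsable prints only the job ID; --mail-type requests email on end/failure.
    local cmd="sbatch --parsable --mail-type=END,FAIL"
    if [ -n "$dep" ]; then
        # afterok: start only if the upstream job finished successfully.
        # --kill-on-invalid-dep=yes: cancel this job if the dependency can never be met.
        cmd="$cmd --dependency=afterok:$dep --kill-on-invalid-dep=yes"
    fi
    echo "$cmd $script"
}

# Dry run: print the commands rather than submitting them.
build_sbatch_cmd step1.sh
build_sbatch_cmd step2.sh 12345
```

On a cluster, the job ID of the first step would typically be captured from the real submission (for example `jid=$(sbatch --parsable step1.sh)`) and passed to the next step in place of the placeholder.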
The workflows were downloaded from https://github.com/gatk-workflows/gatk4-rnaseq-germline-snps-indels and modified to work on the O2 Slurm cluster.
Note that the original workflow uses the reference and annotation files listed in this file:
We downloaded the genome reference and all annotation files from:
https://console.cloud.google.com/storage/browser/genomics-public-data/references/Homo_sapiens_assembly19_1000genomes_decoy/ except for the GTF file, which we downloaded from: https://console.cloud.google.com/storage/browser/gatk-test-data/intervals?project=broad-dsde-outreach
We then modified the JSON inputs file to produce this one:
/n/shared_db/singularity/hmsrc-gatk/scripts/gatk4-rna-germline-variant-calling.inputs.template.json
...