Fastx, Bowtie2 and MACS2 for ChIP-seq

Start an interactive job with a walltime of 2 hours and 2000 MB of memory:

srun --pty -p interactive -t 0-02:0:0 --mem 2000MB -n 1 /bin/bash

Create a working directory on scratch3 and change into it. For example, for user abc123:

mkdir /n/scratch3/users/a/abc123/chipSeq
cd /n/scratch3/users/a/abc123/chipSeq
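If you prefer not to type your username, the path can be built from $USER. This is a minimal sketch that assumes scratch3 directories follow the /n/scratch3/users/<first letter of username>/<username> pattern shown in the abc123 example; the fallback name abc123 is only a placeholder.

```shell
#!/usr/bin/env bash
# Assumption: scratch3 paths follow /n/scratch3/users/<first letter>/<username>,
# as in the abc123 example. Falls back to the placeholder "abc123" if $USER is unset.
user="${USER:-abc123}"
workdir="/n/scratch3/users/${user:0:1}/${user}/chipSeq"
echo "Working directory: $workdir"

# Only create and enter the directory when scratch3 is actually mounted (on O2).
if [ -d /n/scratch3 ]; then
  mkdir -p "$workdir" && cd "$workdir"
fi
```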

Build a barcode file for demultiplexing the library:

The first column gives the sample name and the second column gives the barcode sequence. For example, 1Tr is sample 1 treatment and 1Co is sample 1 control.

nano barcode.txt 


1Tr       ATCACG
2Tr       CGATGT
3Tr       TTAGGC
4Tr       TGACCA
5Tr       ACAGTG
1Co       TAGCTT
2Co       GGCTAC
3Co       CTTGTA
4Co       ATATAGGA
5Co       AACCGTGT
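If you would rather create barcode.txt from the command line than in nano, the sketch below writes the same ten entries and then runs a quick sanity check. It puts a tab between the two columns (an assumption; the example above uses spaces). The rule that every sample name ends in Tr or Co is also an assumption, based on the naming convention the macs2 step relies on later; adjust it if your samples are named differently.

```shell
#!/usr/bin/env bash
# Write barcode.txt non-interactively: sample name <TAB> barcode.
# printf reuses its format string until all arguments are consumed,
# so each pair of arguments becomes one line.
printf '%s\t%s\n' \
  1Tr ATCACG  2Tr CGATGT  3Tr TTAGGC  4Tr TGACCA  5Tr ACAGTG \
  1Co TAGCTT  2Co GGCTAC  3Co CTTGTA  4Co ATATAGGA  5Co AACCGTGT \
  > barcode.txt

# Sanity check (assumed rules): exactly two columns, sample names ending in
# Tr or Co, and barcodes made only of A, C, G and T.
name_re='(Tr|Co)$'
code_re='^[ACGT]+$'
bad=0
while read -r name code extra; do
  [ -z "$name" ] && continue          # skip blank lines
  if [ -n "$extra" ] || ! [[ $name =~ $name_re ]] || ! [[ $code =~ $code_re ]]; then
    echo "Suspicious line: $name $code $extra" >&2
    bad=1
  fi
done < barcode.txt

if [ "$bad" -eq 0 ]; then
  echo "barcode.txt looks OK: $(wc -l < barcode.txt) samples"
fi
```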

Load the rcbio module and copy the example fastxBowtie2 and macs2 bash scripts:

module load rcbio/1.3.3
cp /n/app/rcbio/1.3.3/bin/fastxBowtie2.sh /n/app/rcbio/1.3.3/bin/macs2.sh .

Now you can modify the command options as needed. To edit the scripts:

nano fastxBowtie2.sh


nano macs2.sh

To test the pipeline, run the following command. This is a dry run: jobs will not be submitted to the scheduler.

runAsPipeline "fastxBowtie2.sh -r hg38" "sbatch -p short --mem 6G -t 2:0:0 -n 1" noTmp

# Or, if you want to use your own bowtie2 index:

runAsPipeline "fastxBowtie2.sh -b /n/scratch3/users/a/abc123/index/hg38GenomeWithChr11Report" "sbatch -p short --mem 6G -t 2:0:0 -n 1" noTmp


# this is a test run
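Before pointing -b at your own index, it can help to confirm the index is complete. bowtie2-build writes six files per index (.1, .2, .3, .4, .rev.1 and .rev.2, with a .bt2 extension, or .bt2l for large indexes). The function below, check_bt2_index, is a hypothetical pre-flight check and not part of rcbio; it assumes -b expects the index prefix (the path without those extensions), as the example path suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rcbio): check that all six bowtie2 index
# files exist and are non-empty for a given prefix. Accepts .bt2 or .bt2l.
check_bt2_index() {
  local prefix=$1 ext missing=0
  for ext in 1 2 3 4 rev.1 rev.2; do
    if [ ! -s "$prefix.$ext.bt2" ] && [ ! -s "$prefix.$ext.bt2l" ]; then
      echo "Missing: $prefix.$ext.bt2 (or .bt2l)" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Assumed prefix, taken from the example command above.
index=/n/scratch3/users/a/abc123/index/hg38GenomeWithChr11Report
if check_bt2_index "$index"; then
  echo "bowtie2 index looks complete: $index"
fi
```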

To run the fastxBowtie2 pipeline:

runAsPipeline "fastxBowtie2.sh -r hg38" "sbatch -p short --mem 6G -t 2:0:0 -n 1" noTmp run 2>&1 | tee output.log

# Or, if you want to use your own bowtie2 index:

runAsPipeline "fastxBowtie2.sh -b /n/scratch3/users/a/abc123/index/hg38GenomeWithChr11Report" "sbatch -p short --mem 6G -t 2:0:0 -n 1" noTmp run 2>&1 | tee output.log

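# Notice that 'run 2>&1 | tee output.log' is added to the command: 'run' submits the jobs instead of only testing them, and '2>&1 | tee output.log' shows all output on screen while also saving it to output.log.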
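The redirection at the end of the command is standard shell behavior rather than anything specific to rcbio. The sketch below uses a stand-in function, fake_pipeline (hypothetical), that writes one line to stdout and one to stderr, to show what 2>&1 | tee captures. It writes to demo_*.log files so it does not overwrite a real output.log.

```shell
#!/usr/bin/env bash
# fake_pipeline is a hypothetical stand-in for runAsPipeline:
# it writes one line to stdout and one line to stderr.
fake_pipeline() {
  echo "submitted job for sample 1Tr"
  echo "warning: example message on stderr" >&2
}

# 2>&1 sends stderr to the same place as stdout (the pipe), so tee receives
# both lines, prints them to the terminal and saves them to the log file.
fake_pipeline 2>&1 | tee demo_output.log

# Without 2>&1, the stderr line bypasses the pipe: it still appears on the
# terminal but is not written to the log file.
fake_pipeline | tee demo_stdout_only.log
```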

After the fastxBowtie2 workflow finishes, do a dry run of macs2 to make sure it is set up correctly:

runAsPipeline macs2.sh "sbatch -p short --mem 6G -t 2:0:0 -n 1" noTmp

# this is a test run
# The script looks for bam/sampleNameTr.sorted.bam and the matching bam/sampleNameCo.sorted.bam. If both exist, it runs macs2 on them; if either is missing, macs2 will not run for that sample. To use a different control, for example anotherSampleNameCo.sorted.bam as the control for bam/sampleNameTr.sorted.bam, create a symbolic link:
cd bam
ln -s anotherSampleNameCo.sorted.bam sampleNameCo.sorted.bam
cd ..
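Before the real macs2 run, you may want to see which treatment BAM files will find a control. The function below, check_pairs, is a hypothetical helper and not part of rcbio; it only re-implements the matching rule described above (bam/<sample>Tr.sorted.bam with bam/<sample>Co.sorted.bam) and may not reflect every detail of macs2.sh. Run it from the working directory, the parent of bam/.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rcbio): report which treatment BAMs have a
# matching control under the assumed naming rule
#   <dir>/<sample>Tr.sorted.bam  <->  <dir>/<sample>Co.sorted.bam
check_pairs() {
  local dir=${1:-bam} tr sample co
  shopt -s nullglob                   # no matches -> loop body never runs
  for tr in "$dir"/*Tr.sorted.bam; do
    sample=$(basename "$tr" Tr.sorted.bam)
    co="$dir/${sample}Co.sorted.bam"
    # -e follows symbolic links, so a link created with ln -s counts only
    # if the file it points to exists.
    if [ -e "$co" ]; then
      echo "$sample: paired with $co"
    else
      echo "$sample: NO control ($co missing); macs2 will not run for it" >&2
    fi
  done
}

check_pairs bam
```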

To run the macs2 pipeline:

runAsPipeline macs2.sh "sbatch -p short --mem 6G -t 2:0:0 -n 1" noTmp run 2>&1 | tee output.log

To understand how runAsPipeline works, how to check the output and how to re-run the pipeline, please visit the wiki page: Run Bash Script As Slurm Pipeline

Now you are ready to run an rcbio workflow.

To run the workflow on your own data instead, transfer the sample sheet to your local machine following this wiki page and modify it. Then transfer it back to your account on O2 and go to the build folder structure step.