spp (1.15.2) for ChIP-Seq

1 Log on to O2
2 Start an interactive job with 4G memory and 2 CPU cores, and create a working folder
3 Load the bowtie2 module and run mapping (note: I have already run the mapping; to skip re-running bowtie2, copy the results from /n/groups/shared_databases/rcbio/spp/input)
4 Start R and load the spp and biomaRt packages
5 Read in bowtie alignments
6 Identify binding characteristics
7 Select informative tags based on the binding characteristics
8 Remove singular positions with very high tag counts
9 Calculate genome-wide tag density and tag enrichment/depletion profiles
10 Detecting point binding positions
11 Comparing Binding Sites to Annotations Using the biomaRt package

SPP is an R package designed specifically for the analysis of ChIP-Seq data. The package was developed by Peter Park's group at Harvard Medical School. Here is the original Nature Biotechnology paper: http://www.nature.com/nbt/journal/v26/n12/full/nbt.1508.html

Here is the original tutorial created by Peter Kharchenko from the same group: http://compbio.med.harvard.edu/Supplements/ChIP-seq/tutorial.html

Based on the paper and tutorial above, we created this guide to using the package on O2. Please note: parts of this guide are taken from the material mentioned above.

We will calculate a smoothed read enrichment profile and identify sites that are significantly enriched compared to the control. IGB or IGV can be used to browse the data.

We will also use the biomaRt package to download annotations and identify genes with a binding site within 2 kb of the transcription start site (TSS).

Note: You can copy and paste the commands below into your Linux command line to run them. Anything following "#" is a comment and will be ignored by Linux or R. I have run the pipeline on test data; all the output files can be found in /n/groups/shared_databases/rcbio/spp/output

1 Log on to O2

If you need help connecting to O2, please review the How to login to O2 wiki page.

From Windows, use MobaXterm (preferred) or PuTTY to connect to o2.hms.harvard.edu and make sure the port is set to the default value of 22.

From a Mac Terminal, use the ssh command, inserting your HMS ID instead of user123:

ssh user123@o2.hms.harvard.edu

2 Start an interactive job with 4G memory and 2 CPU cores, and create a working folder

srun --pty -p interactive -t 0-12:0:0 --mem 4G -n 2 /bin/bash
mkdir /n/scratch/users/${USER:0:1}/$USER/sppTest && cd /n/scratch/users/${USER:0:1}/$USER/sppTest
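The `${USER:0:1}` part of the path above is bash substring expansion: it takes the first character of your username, which is how the scratch path in this guide is laid out. The following is a minimal sketch of how that path is assembled; the function name `scratch_dir` is hypothetical and only used for illustration, and the fallback name user123 is the placeholder from step 1.

```shell
#!/bin/bash
# Hypothetical helper: build the scratch working-folder path for a given user.
scratch_dir() {
    local user="$1"
    # ${user:0:1} = substring starting at offset 0 with length 1 (the first letter)
    printf '/n/scratch/users/%s/%s/sppTest\n' "${user:0:1}" "$user"
}

# Print the working folder for the current user (falls back to the user123 placeholder)
scratch_dir "${USER:-user123}"
```

On O2 you could then run `mkdir -p "$(scratch_dir "$USER")" && cd "$(scratch_dir "$USER")"`. Using `mkdir -p` instead of plain `mkdir` means the command does not fail if the folder already exists from an earlier session.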

3 Load the bowtie2 module and run mapping (note: I have already run the mapping; to skip re-running bowtie2, copy the results from /n/groups/shared_databases/rcbio/spp/input)

# This will set up the path and environment variables for the pipeline
module load gcc/6.2.0 bowtie2/2.2.9 samtools/0.1.19

read_file=/n/groups/shared_databases/rcbio/spp/input/SRR1002328_1.fastq
bowtie2 -p 2 -x /n/groups/shared_databases/bowtie2_indexes/d_melanogaster_fb5_22 -U $read_file | samtools view -bS - > SRR1002328.bam

read_file=/n/groups/shared_databases/rcbio/spp/input/SRR1002329_1.fastq
bowtie2 -p 2 -x /n/groups/shared_databases/bowtie2_indexes/d_melanogaster_fb5_22 -U $read_file | samtools view -bS - > SRR1002329.bam
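The two mapping commands above differ only in the sample ID, so they can be written as a loop. The following is a sketch, not part of the original pipeline: the helper `bam_name` is a hypothetical name, and the `echo` turns the loop into a dry run that only prints each command so you can check it before running anything.

```shell
#!/bin/bash
# Paths copied from the mapping commands above
input_dir=/n/groups/shared_databases/rcbio/spp/input
index=/n/groups/shared_databases/bowtie2_indexes/d_melanogaster_fb5_22

# Hypothetical helper: derive the BAM file name from a FASTQ path,
# e.g. .../SRR1002328_1.fastq -> SRR1002328.bam
bam_name() {
    local base
    base=$(basename "$1" .fastq)     # strip the directory and the .fastq suffix
    printf '%s.bam\n' "${base%_1}"   # drop the trailing _1 read suffix
}

for sample in SRR1002328 SRR1002329; do
    read_file="$input_dir/${sample}_1.fastq"
    out=$(bam_name "$read_file")
    # Dry run: print the command instead of executing it
    echo "bowtie2 -p 2 -x $index -U $read_file | samtools view -bS - > $out"
done
```

Once the printed commands look right, remove the `echo` and the surrounding quotes to run the mapping for real on O2.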

4 Start R and load the spp and biomaRt packages

5 Read in bowtie alignments

6 Identify binding characteristics

7 Select informative tags based on the binding characteristics

8 Remove singular positions with very high tag counts

9 Calculate genome-wide tag density and tag enrichment/depletion profiles

10 Detecting point binding positions

11 Comparing Binding Sites to Annotations Using the biomaRt package

Let us know if you have any questions. Please include your working folder and the commands you used in your email. Comments and suggestions are welcome!