spp (1.15.2) for ChIP-seq



SPP is an R package designed for the analysis of ChIP-seq data. The package was developed by Peter Park's group at Harvard Medical School. Here is the original Nature Biotechnology paper: http://www.nature.com/nbt/journal/v26/n12/full/nbt.1508.html

Here is the original tutorial created by Peter Kharchenko from the same group: http://compbio.med.harvard.edu/Supplements/ChIP-seq/tutorial.html

Based on the paper and tutorial above, we created this guide to using the package on O2. Please note: parts of this guide are taken from the material mentioned above.

We will calculate a smoothed read enrichment profile and identify sites that are significantly enriched compared to the control. IGB or IGV can be used to browse the data.

We will also use the biomaRt package to download annotations and identify genes whose transcription start site (TSS) lies within 2 kb of a binding site.

Note: You can copy and paste the commands below into your Linux command line to run them. Anything after "#" is a comment and will be ignored by Linux or R. I have run the pipeline on test data. All the output files can be found in /n/groups/shared_databases/rcbio/spp/output

1 Log on to O2

If you need help connecting to O2, please review the How to login to O2 wiki page.

From Windows, use MobaXterm (preferred) or PuTTY to connect to o2.hms.harvard.edu and make sure the port is set to the default value of 22.

From a Mac Terminal, use the ssh command, inserting your HMS ID instead of user123:

ssh user123@o2.hms.harvard.edu

2 Start an interactive job with 4G memory and 2 CPU cores, and create a working folder

srun --pty -p interactive -t 0-12:0:0 --mem 4G -n 2 /bin/bash
mkdir /n/scratch/users/${USER:0:1}/$USER/sppTest && cd /n/scratch/users/${USER:0:1}/$USER/sppTest
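The scratch path in the command above is built from your HMS ID: the first letter of the ID, then the full ID, then the working folder name. The sketch below shows that construction on its own. The helper name scratch_dir_for is hypothetical and not part of the original guide, and user123 is the placeholder ID from the login step.

```shell
# Hypothetical helper (not part of the original guide): build the per-user
# scratch path used above from an HMS ID.
scratch_dir_for() {
    # First character of the ID; in bash, ${USER:0:1} yields the same thing.
    first_char=$(printf '%s' "$1" | cut -c1)
    printf '/n/scratch/users/%s/%s/sppTest\n' "$first_char" "$1"
}

# user123 is the placeholder ID from the login step.
scratch_dir_for user123

# On O2 itself you might then run:
#   mkdir -p "$(scratch_dir_for "$USER")" && cd "$(scratch_dir_for "$USER")"
```

For user123 this prints /n/scratch/users/u/user123/sppTest. Using mkdir -p here is a suggestion rather than what the guide does: it avoids an error if the folder already exists from an earlier session.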

3 Load the bowtie2 module and run mapping. (Note: I have already run the mapping. If you don't want to re-run bowtie2, copy the results from /n/groups/shared_databases/rcbio/spp/input)

# This will setup the path and environmental variables for the pipeline
module load gcc/6.2.0 bowtie2/2.2.9 samtools/0.1.19

read_file=/n/groups/shared_databases/rcbio/spp/input/SRR1002328_1.fastq
bowtie2 -p 2 -x /n/groups/shared_databases/bowtie2_indexes/d_melanogaster_fb5_22 -U $read_file | samtools view -bS - > SRR1002328.bam

read_file=/n/groups/shared_databases/rcbio/spp/input/SRR1002329_1.fastq
bowtie2 -p 2 -x /n/groups/shared_databases/bowtie2_indexes/d_melanogaster_fb5_22 -U $read_file | samtools view -bS - > SRR1002329.bam
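The two mapping commands differ only in the sample accession, so they could also be generated from a loop. The following is a minimal sketch of that idea, using only the paths and flags shown above. The helper name map_cmd is hypothetical. The loop is a dry run that prints each command instead of executing it, so you can check it before running anything.

```shell
# Paths taken from the mapping commands in this step.
input_dir=/n/groups/shared_databases/rcbio/spp/input
index=/n/groups/shared_databases/bowtie2_indexes/d_melanogaster_fb5_22

# Hypothetical helper: print the mapping command for one sample accession.
map_cmd() {
    # $1 is a sample accession such as SRR1002328
    printf 'bowtie2 -p 2 -x %s -U %s/%s_1.fastq | samtools view -bS - > %s.bam\n' \
        "$index" "$input_dir" "$1" "$1"
}

# Dry run: print one command per sample without executing anything.
for sample in SRR1002328 SRR1002329; do
    map_cmd "$sample"
done
```

Once the printed commands look right, they can be pasted into the interactive session after the module load line. With more samples, only the list in the for loop needs to change.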

4 Start R and load the spp and biomaRt packages

5 Read in bowtie alignments

6 Identify binding characteristics

7 Select informative tags based on the binding characteristics

8 Remove singular positions with very high tag counts

9 Calculate genome-wide tag density and tag enrichment/depletion profiles

10 Detect point binding positions

11 Compare binding sites to annotations using the biomaRt package

Let us know if you have any questions. Please include your working folder and the commands you used in your email. Any comments and suggestions are welcome!