Public Databases


We started using /n/shared_db for public databases with the launch of the O2 cluster. The databases are organized in this folder structure:


For example:



  • Some databases were assembled by developers and their original folder structure was kept, as in:

    • igenome/03032016/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
  • bcbio databases are maintained by the Harvard Chan Bioinformatics Core and are installed this way:

    • bcbio/biodata/genomes/Hsapiens


This folder was created for an earlier cluster (Orchestra), but is still in use. The structure is like this: 



  • There are some exceptions to the structure, such as igenome and blastdb.

A quick way to find the desired databases: 

We have created a text file containing all the database names and paths. You can search it directly by species and software:

$ grep -i bowtie2  /n/shared_db/ | grep -i hg19 | less

You should see (uk means unknown here): 
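If you run this kind of search often, the chained grep calls can be wrapped in a small shell function. The following is only a sketch: the name find_db is hypothetical, and you have to pass the path of the database index file yourself as the first argument. It prints every line of the file that contains all of the given terms, ignoring case.

```shell
#!/usr/bin/env bash
# find_db is a hypothetical helper, not an installed command.
# Usage: find_db INDEX_FILE TERM [TERM...]
# Prints the lines of INDEX_FILE that contain every TERM (case-insensitive).
find_db() {
    local index_file=$1
    shift
    if [ ! -f "$index_file" ]; then
        echo "find_db: index file not found: $index_file" >&2
        return 2
    fi
    if [ "$#" -eq 0 ]; then
        echo "usage: find_db INDEX_FILE TERM [TERM...]" >&2
        return 2
    fi
    local matches term
    matches=$(cat -- "$index_file")
    for term in "$@"; do
        # -i ignores case; -F treats the term as a literal string, not a regex.
        # Stop with status 1 as soon as one term filters out every line.
        matches=$(printf '%s\n' "$matches" | grep -iF -- "$term") || return 1
    done
    printf '%s\n' "$matches"
}
```

For example, with DB_INDEX set to the path of the text file, find_db "$DB_INDEX" bowtie2 hg19 | less performs the same search as the two chained grep commands above.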

Many of our modules also point to the relevant databases. You can use the module spider command to identify a module of interest and then find the appropriate databases for it. Here is an example for cellranger: