Public Databases


We started using /n/shared_db for public databases with the launch of the O2 cluster. The databases are organized in this folder structure:


For example:



  • Some databases were assembled by developers and their original folder structure was kept, as in:

    • igenome/03032016/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
  • bcbio databases are maintained by the Harvard Chan Bioinformatics Core and are installed this way:

    • bcbio/biodata/genomes/Hsapiens


This folder was created for an earlier cluster (Orchestra), but is still in use. The structure is like this: 



  • There are some exceptions to the structure, such as igenome and blastdb.

A quick way to find the desired databases:

We have created a text file containing all the database names and paths. You can search it directly by species and software:

$ grep bowtie2  /n/shared_db/ | grep hg19 | less

You should see (uk means unknown here): 

/n/groups/shared_databases/rsem_indexes_bowtie2/hg19.ump
/n/groups/shared_databases/rsem_indexes_bowtie2/hg19.chrlist
/n/groups/shared_databases/rsem_indexes_bowtie2/hg19.rev.2.bt2
/n/groups/shared_databases/rsem_indexes_bowtie2/hg19.fa
/n/groups/shared_databases/rsem_indexes_bowtie2/hg19.n2g.idx.fa
/n/groups/shared_databases/rsem_indexes_bowtie2/hg19.3.bt2
/n/groups/shared_databases/rsem_indexes_bowtie2/hg19.transcripts.fa.fai
...
/n/groups/shared_databases/bowtie2_indexes/hg19.rev.2.bt2
/n/groups/shared_databases/bowtie2_indexes/hg19.fa
/n/groups/shared_databases/bowtie2_indexes/hg19.3.bt2
...
/n/groups/shared_databases/bowtie2_indexes/hg19.1.bt2
/n/groups/shared_databases/bowtie2_indexes/hg19.fai
/n/shared_db/hg19/uk/bowtie2
/n/shared_db/hg19/uk/bowtie2/2.2.9
/n/shared_db/hg19/uk/bowtie2/2.2.9/chrUn_gl000233.fa
...
/n/shared_db/hg19/uk/bowtie2/2.2.9/chr9_gl000198_random.fa
/n/shared_db/hg19/uk/bowtie2/2.2.9/chr6_apd_hap1.fa
/n/shared_db/hg19/uk/bowtie2/2.2.9/chr3.fa
/n/shared_db/hg19/uk/bowtie2/2.2.9/chr2.fa
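If you run this kind of search often, the chained grep calls can be wrapped in a small shell function. The following is only a sketch: the function name find_db is hypothetical, and the listing file is passed as an argument because its exact location is not assumed here. It prints the lines of the listing that contain every keyword, matching each keyword as a literal string.

```shell
# find_db LISTING KEYWORD...
# Print every line of LISTING that contains all of the given keywords.
# find_db is a hypothetical helper name, not an O2-provided command.
find_db() {
    listing=$1
    shift
    if [ ! -r "$listing" ]; then
        echo "find_db: cannot read $listing" >&2
        return 2
    fi
    matches=$(cat "$listing")
    for keyword in "$@"; do
        # -F treats the keyword as a literal string, not a regular expression
        matches=$(printf '%s\n' "$matches" | grep -F -- "$keyword" || true)
    done
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches"
    else
        return 1   # like grep: non-zero status when nothing matches
    fi
}
```

For example, find_db followed by the path of the listing file and the keywords bowtie2 and hg19 returns the same lines as the two chained grep calls, and the result can still be piped to less.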

Many of our modules also point to the relevant databases. You can use the module spider command to identify a module of interest and then query that specific module version to find the appropriate databases. Here is an example for cellranger:

# look for available cellranger modules
$ module spider cellranger
# will report all cellranger modules on O2

# get more information on a given cellranger module,
# including where to find the relevant databases:
$ module spider cellranger/6.0.0
...
    Help:
      For detailed instructions, go to:

      Currently available references:

      Human reference (GRCh38) is currently found at:
      /n/shared_db/GRCh38/uk/cellranger/6.0.0/6.0.0/refdata-gex-GRCh38-2020-A

      Human reference (hg19) is currently found at:
      /n/shared_db/hg19/uk/cellranger/6.0.0/6.0.0/refdata-cellranger-hg19-3.0.0

      Mouse reference (mm10) is currently found at:
      /n/shared_db/mm10/uk/cellranger/6.0.0/6.0.0/refdata-gex-mm10-2020-A

      Human/Mouse hybrid reference (GRCh38/mm10) is currently found at:
      /n/shared_db/GRCh38-mm10/uk/cellranger/6.0.0/6.0.0/refdata-gex-GRCh38_and_mm10-2020-A

      Human/Mouse hybrid reference (hg19/mm10) is currently found at:
      /n/shared_db/hg19-mm10/uk/cellranger/6.0.0/6.0.0/refdata-cellranger-hg19-and-mm10-3.0.0

      ERCC reference is currently found at:
      /n/shared_db/misc/cellranger/6.0.0/6.0.0/refdata-cellranger-ercc92-1.2.0

      Other references will be added as they release. These paths may be subject
      to change, so please ensure that they are valid paths before running
      cellranger against them. If any are missing, please contact
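Because the module help text warns that these paths may change, it can be worth checking a reference directory before submitting a job. The sketch below is one way to do that; check_refs is a hypothetical helper name, and it only tests that each path exists as a directory, not that the reference inside is complete.

```shell
# check_refs PATH...
# Report whether each reference directory exists; return non-zero if any is missing.
# check_refs is a hypothetical helper name, not part of cellranger or the module system.
check_refs() {
    missing=0
    for ref in "$@"; do
        if [ -d "$ref" ]; then
            echo "ok: $ref"
        else
            echo "missing: $ref" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_refs with a path copied from the module spider output, followed by && and your cellranger command, runs cellranger only if the directory is still present.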