Public Databases



/n/shared_db

We started using /n/shared_db for public databases with the launch of the O2 cluster. The databases are organized in this folder structure:

Genome/software/Version/database

For example:

mm10/rsem/1.3.0/mm10
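The layout above can be sketched as a small shell helper. This is only an illustration: `db_path` and `DB_ROOT` are hypothetical names, not something provided on the cluster, and the sketch assumes the four components are joined directly under /n/shared_db. For some tools the last component may be a file prefix rather than a directory.

```shell
# Hypothetical helper (assumption, not cluster tooling): build a path that
# follows the Genome/software/Version/database layout.
DB_ROOT="${DB_ROOT:-/n/shared_db}"

db_path() {
    # Expect exactly four components: genome, software, version, database.
    if [ "$#" -ne 4 ]; then
        echo "usage: db_path GENOME SOFTWARE VERSION DATABASE" >&2
        return 1
    fi
    printf '%s/%s/%s/%s/%s\n' "$DB_ROOT" "$1" "$2" "$3" "$4"
}

# Path for the mm10 / rsem 1.3.0 example above.
db_path mm10 rsem 1.3.0 mm10
```

The helper only formats a string; it does not check that the path exists, so the exceptions listed below still need to be looked up by hand.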

Exceptions:

  • Some databases were assembled by developers and their original folder structure was kept, as in:

    • igenome/03032016/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
  • bcbio databases are maintained by the Harvard Chan Bioinformatics Core, and are installed this way:

    • bcbio/biodata/genomes/Hsapiens

/n/groups/shared_databases

This folder was created for an earlier cluster (Orchestra), but is still in use. The structure is like this: 

software/genomeVersion/database

Exceptions:

  • Some folders do not follow this structure, such as igenome and blastdb.

A quick way to find the desired databases: 

We have created a text file that lists all the database names and paths. You can search it directly by species and software:

$ grep -i bowtie2  /n/shared_db/allDatabases.no.bcbio.singularity.rcbio.txt | grep -i hg19 | less

In the matching lines, uk means unknown.
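For repeated searches, the grep pipeline can be wrapped in a small function. The sketch below is an assumption-laden illustration: `find_db` and `DB_INDEX` are hypothetical names, and the only thing assumed about the text file is that each database sits on its own line. It keeps the lines that contain every search term, ignoring case.

```shell
# Hypothetical helper (assumption, not cluster tooling): print the lines of
# the database list that contain every given term, case-insensitively.
DB_INDEX="${DB_INDEX:-/n/shared_db/allDatabases.no.bcbio.singularity.rcbio.txt}"

find_db() {
    if [ "$#" -eq 0 ]; then
        echo "usage: find_db TERM [TERM...]" >&2
        return 1
    fi
    # Start from the whole list, then narrow it down one term at a time.
    _matches=$(cat "$DB_INDEX") || return 1
    for _term in "$@"; do
        # -F treats the term as a literal string; stop early if nothing is left.
        _matches=$(printf '%s\n' "$_matches" | grep -i -F -e "$_term") || return 1
    done
    printf '%s\n' "$_matches"
}
```

With the function defined, `find_db bowtie2 hg19 | less` performs the same search as the command above, and further terms can be appended to narrow the results.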


Many of our modules also point to the relevant databases. You can use the module spider command to identify a module of interest and then to locate the appropriate databases. Here is an example for cellranger: