Public Databases



/n/shared_db

We started using /n/shared_db for public databases with the launch of the O2 cluster. The databases are organized in this folder structure:

Genome/software/Version/database

For example:

mm10/rsem/1.3.0/mm10
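As a minimal sketch of how this layout translates into a full path, the snippet below joins the four components under /n/shared_db. The function name db_path and the SHARED_DB_ROOT variable are hypothetical helpers introduced here for illustration; they are not commands provided on O2. Whether the last component is a directory or a file-name prefix depends on the tool, so the check accepts either.

```shell
# Root of the database tree described above; override if needed.
SHARED_DB_ROOT="${SHARED_DB_ROOT:-/n/shared_db}"

# db_path GENOME SOFTWARE VERSION DATABASE
# Prints the path implied by the Genome/software/Version/database layout.
# Hypothetical helper, not an O2-provided command.
db_path() {
    if [ "$#" -ne 4 ]; then
        echo "usage: db_path GENOME SOFTWARE VERSION DATABASE" >&2
        return 1
    fi
    printf '%s/%s/%s/%s/%s\n' "$SHARED_DB_ROOT" "$1" "$2" "$3" "$4"
}

# Build the path for the example above and check whether anything
# with that name or prefix exists on this machine.
prefix=$(db_path mm10 rsem 1.3.0 mm10)
if ls "${prefix}"* >/dev/null 2>&1; then
    echo "found: ${prefix}"
else
    echo "not found: ${prefix}"
fi
```

The exceptions listed below do not follow this layout, so a helper like this only applies to the regularly organized folders.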

Exceptions:

  • Some databases were assembled by developers and their original folder structure was kept, as in:

    • igenome/03032016/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
  • bcbio databases are maintained by the Harvard Chan Bioinformatics Core and are installed this way:

    • bcbio/biodata/genomes/Hsapiens

/n/groups/shared_databases

This folder was created for the earlier Orchestra cluster but is still in use. Its structure is:

software/genomeVersion/database

Exceptions:

  • A few folders do not follow this structure, such as igenome and blastdb.

A quick way to find the desired databases: 

We have created a text file that lists all database names and paths. You can search it directly by species and software:

$ grep -i bowtie2  /n/shared_db/allDatabases.no.bcbio.singularity.rcbio.txt | grep -i hg19 | less

In the output, uk means unknown.
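If you search the list often, the chained grep calls can be wrapped in a small function. The sketch below is one possible way to do that; find_db and DB_INDEX are hypothetical names introduced here, and the only thing assumed about the list is that each database sits on its own line.

```shell
# Index file named in the text above; override for testing.
DB_INDEX="${DB_INDEX:-/n/shared_db/allDatabases.no.bcbio.singularity.rcbio.txt}"

# find_db TERM [TERM...]
# Prints the index lines that contain every TERM, ignoring case.
# Returns 1 when nothing matches and 2 on a usage or file error.
# Hypothetical helper, not an O2-provided command.
find_db() {
    local matches term
    if [ "$#" -eq 0 ]; then
        echo "usage: find_db TERM [TERM...]" >&2
        return 2
    fi
    if [ ! -r "$DB_INDEX" ]; then
        echo "cannot read index file: $DB_INDEX" >&2
        return 2
    fi
    matches=$(cat "$DB_INDEX")
    for term in "$@"; do
        # -F treats the term as a literal string rather than a regex.
        matches=$(printf '%s\n' "$matches" | grep -iF -e "$term" || true)
    done
    if [ -z "$matches" ]; then
        return 1
    fi
    printf '%s\n' "$matches"
}

# Example, equivalent to the grep pipeline above:
#   find_db bowtie2 hg19 | less
```

Because each term is applied as a separate filter, the order of the terms does not matter.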


Many of our modules also point to the relevant databases. You can use the module spider command to identify a module of interest, such as cellranger, and then to find the appropriate databases.
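One way to pull database locations out of a module description is sketched below. It assumes the Lmod module command is available in your shell and that the module's description mentions a path under one of the two database folders described on this page, which will not be true for every module. The function names extract_db_paths and module_db_paths are hypothetical and introduced here for illustration.

```shell
# extract_db_paths
# Reads text on stdin and prints each distinct path that starts with
# one of the two shared database roots described on this page.
extract_db_paths() {
    grep -oE "/n/(shared_db|groups/shared_databases)[^[:space:]\"'),]*" |
        sed 's/[.,;:]*$//' |   # drop trailing sentence punctuation
        sort -u
}

# module_db_paths MODULE/VERSION
# Hypothetical helper: scans the output of `module spider` for paths.
# Lmod normally writes to stderr, so stderr is merged into stdout.
module_db_paths() {
    module spider "$1" 2>&1 | extract_db_paths
}

# Typical session (use a version that `module spider cellranger` lists):
#   module spider cellranger
#   module_db_paths cellranger/<version>
```

If the function prints nothing, read the full module spider output for that version directly, since the module may describe its databases in another form.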




