/n/shared_db

We started using /n/shared_db for public databases with the launch of the O2 cluster. The databases are organized in this folder structure:

Genome/software/Version/database

For example:

mm10/rsem/1.3.0/mm10
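Several paths shown later on this page, such as /n/shared_db/hg19/uk/bowtie2/2.2.9, have an extra level (uk, for unknown) between the genome and the software name. The sketch below assumes both layouts can occur. It uses a hypothetical helper, list_db_versions, which is not an O2-provided command. The helper finds the software directory for a genome at either depth and lists the version directories inside it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an O2-provided command): list the version
# directories available for SOFTWARE under GENOME. It follows the
# Genome/software/Version layout, and it also allows one extra level
# (such as uk) between the genome and the software name.
# Usage: list_db_versions GENOME SOFTWARE [ROOT]
list_db_versions() {
  local genome="$1" software="$2" root="${3:-/n/shared_db}"
  local swdir
  # Depth 1 matches GENOME/SOFTWARE; depth 2 matches GENOME/<extra>/SOFTWARE.
  # -iname makes the software name match case-insensitive.
  find "$root/$genome" -mindepth 1 -maxdepth 2 -type d -iname "$software" 2>/dev/null \
    | sort \
    | while IFS= read -r swdir; do
        # Each immediate subdirectory of the software directory is a version.
        find "$swdir" -mindepth 1 -maxdepth 1 -type d | sort
      done
}
```

For example, list_db_versions mm10 rsem would print the version directories under /n/shared_db/mm10/rsem, if that directory exists. The database files themselves sit one level further down.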

Exceptions:

/n/groups/shared_databases

This folder was created for the earlier Orchestra cluster but is still in use. Its structure is:

software/genomeVersion/database

Exceptions:

A quick way to find the desired databases: 

We have created a text file that lists the names and paths of all the databases. You can search it directly for the species (genome) and software you need:

$ grep -i bowtie2 /n/shared_db/allDatabases.no.bcbio.singularity.rcbio.txt | grep -i hg19 | less

You should see output similar to the following (uk means unknown here):

/n/groups/shared_databases/rsem_indexes_bowtie2/hg19.ump
/n/groups/shared_databases/rsem_indexes_bowtie2/hg19.chrlist
/n/groups/shared_databases/rsem_indexes_bowtie2/hg19.rev.2.bt2
/n/groups/shared_databases/rsem_indexes_bowtie2/hg19.fa
/n/groups/shared_databases/rsem_indexes_bowtie2/hg19.n2g.idx.fa
/n/groups/shared_databases/rsem_indexes_bowtie2/hg19.3.bt2
/n/groups/shared_databases/rsem_indexes_bowtie2/hg19.transcripts.fa.fai
...
/n/groups/shared_databases/bowtie2_indexes/hg19.rev.2.bt2
/n/groups/shared_databases/bowtie2_indexes/hg19.fa
/n/groups/shared_databases/bowtie2_indexes/hg19.3.bt2
...
/n/groups/shared_databases/bowtie2_indexes/hg19.1.bt2
/n/groups/shared_databases/bowtie2_indexes/hg19.fai
/n/shared_db/hg19/uk/bowtie2
/n/shared_db/hg19/uk/bowtie2/2.2.9
/n/shared_db/hg19/uk/bowtie2/2.2.9/chrUn_gl000233.fa
...
/n/shared_db/hg19/uk/bowtie2/2.2.9/chr9_gl000198_random.fa
/n/shared_db/hg19/uk/bowtie2/2.2.9/chr6_apd_hap1.fa
/n/shared_db/hg19/uk/bowtie2/2.2.9/chr3.fa
/n/shared_db/hg19/uk/bowtie2/2.2.9/chr2.fa
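The listing has one line per file, so a search like the one above can return many lines for each index. The sketch below uses a hypothetical helper, find_db_dirs, which is not an O2-provided command. It applies the same two case-insensitive filters. It then reduces the matches to their distinct parent directories, which gives a shorter overview of where each index lives.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an O2-provided command): search the database
# listing for SOFTWARE and GENOME, then print each distinct parent directory.
# Usage: find_db_dirs SOFTWARE GENOME [LISTING_FILE]
find_db_dirs() {
  local software="$1" genome="$2"
  local listing="${3:-/n/shared_db/allDatabases.no.bcbio.singularity.rcbio.txt}"
  # 1. The same two case-insensitive filters as the grep pipeline above.
  # 2. sed strips the last path component (like dirname).
  # 3. sort -u removes duplicate directories.
  grep -i -- "$software" "$listing" \
    | grep -i -- "$genome" \
    | sed 's|/[^/]*$||' \
    | sort -u
}
```

Based on the sample output above, find_db_dirs bowtie2 hg19 should include directories such as /n/groups/shared_databases/bowtie2_indexes and /n/shared_db/hg19/uk/bowtie2/2.2.9.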


Many of our modules also point to the relevant databases. Use the module spider command to identify a module of interest, then run it again on a specific module version to find the appropriate databases. Here is an example for cellranger:

# look for available cellranger modules
$ module spider cellranger
# will report all cellranger modules on O2

# get more information on a given cellranger module,
# including where to find the relevant databases:
$ module spider cellranger/6.0.0
...
    Help:
      For detailed instructions, go to:
         https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger

         Currently available references:
         Human reference (GRCh38) is currently found at: /n/shared_db/GRCh38/uk/cellranger/6.0.0/6.0.0/refdata-gex-GRCh38-2020-A
         Human reference (hg19) is currently found at:   /n/shared_db/hg19/uk/cellranger/6.0.0/6.0.0/refdata-cellranger-hg19-3.0.0
         Mouse reference (mm10) is currently found at:   /n/shared_db/mm10/uk/cellranger/6.0.0/6.0.0/refdata-gex-mm10-2020-A
         Human/Mouse hybrid reference (GRCh38/mm10)
               is currently found at:                    /n/shared_db/GRCh38-mm10/uk/cellranger/6.0.0/6.0.0/refdata-gex-GRCh38_and_mm10-2020-A
         Human/Mouse hybrid reference (hg19/mm10)
               is currently found at:                    /n/shared_db/hg19-mm10/uk/cellranger/6.0.0/6.0.0/refdata-cellranger-hg19-and-mm10-3.0.0
         ERCC reference is currently found at:           /n/shared_db/misc/cellranger/6.0.0/6.0.0/refdata-cellranger-ercc92-1.2.0

         Other references will be added as they release.

         These paths may be subject to change, so please ensure that they are valid paths before running cellranger against them. If any are missing,
         please contact rchelp@hms.harvard.edu.
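The help text above asks you to confirm that the reference paths are valid before running cellranger. The sketch below uses a hypothetical helper, check_db_paths, which is not an O2-provided command. It pulls every /n/shared_db path out of text on standard input and reports whether each one exists as a directory. The usage line assumes that Lmod prints module spider output to standard error, which is why it uses 2>&1.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an O2-provided command): extract every path under
# PREFIX (default /n/shared_db) from text on stdin, such as module spider
# help text, and report whether each path exists as a directory.
# Usage: module spider cellranger/6.0.0 2>&1 | check_db_paths
check_db_paths() {
  local prefix="${1:-/n/shared_db}"
  local paths path status=0
  # grep -o prints each match on its own line; a path ends at whitespace.
  paths=$(grep -oE "${prefix}/[^[:space:]]+") || true
  while IFS= read -r path; do
    # Skip the single empty line produced when nothing matched.
    [ -z "$path" ] && continue
    if [ -d "$path" ]; then
      echo "OK       $path"
    else
      echo "MISSING  $path"
      status=1
    fi
  done <<< "$paths"
  # Return non-zero if any path is missing, so a job script can stop early.
  return "$status"
}
```

In a batch script, you might run this check before calling cellranger and stop if it reports a missing path. If a path is missing, contact rchelp@hms.harvard.edu as the help text suggests.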