ColabFold (https://github.com/sokrypton/ColabFold) is an emerging protein structure prediction tool based on Google DeepMind’s AlphaFold (see Using AlphaFold on O2). LocalColabFold (https://github.com/YoshitakaMo/localcolabfold) is a packaging of ColabFold for use on local machines; we provide instructions on how to use LocalColabFold on O2 below. LocalColabFold builds its multiple sequence alignments with MMseqs2, which is often faster than the jackhmmer searches used by AlphaFold, and runs AlphaFold2 for single-protein modeling and AlphaFold-Multimer for protein complex modeling. If you are unsure whether AlphaFold or LocalColabFold better suits your project, feel free to try both tools and compare the results.
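
With its default model type, ColabFold chooses the model from the input itself: a single-chain FASTA record is modeled with AlphaFold2, while a record whose chains are joined with ":" in one sequence line is treated as a complex and modeled with AlphaFold-Multimer. The sketch below writes one record of each kind. The sequences, the function name write_queries, and the file name complex.fasta are hypothetical placeholders, not part of the O2 module.

```shell
#!/bin/bash
# Hypothetical placeholder chains; replace them with your own sequences.
CHAIN_A="MKTAYIAKQRQISFVKSHFSRQ"
CHAIN_B="GSHMSLFDKLLKEVK"

# Hypothetical helper: write a query file with one single-chain record
# and one two-chain complex record (chains joined with ':').
write_queries() {
    local out="$1"
    {
        printf '>monomer_A\n%s\n' "$CHAIN_A"
        printf '>complex_AB\n%s:%s\n' "$CHAIN_A" "$CHAIN_B"
    } > "$out"
}

write_queries complex.fasta
```

The resulting file can be passed as the input FASTA in the job script; each record becomes a separate prediction.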

...

#!/bin/bash

#SBATCH -c 4                                 # Requested cores
#SBATCH --time=0-12:00                       # Runtime in D-HH:MM format
#SBATCH --partition=short                    # Partition to run in
#SBATCH --mem=24GB                           # Requested memory
#SBATCH -o %j.out                            # File to which STDOUT will be written, including job ID (%j)
#SBATCH -e %j.err                            # File to which STDERR will be written, including job ID (%j)
#SBATCH --mail-type=ALL                      # ALL email notification type
#SBATCH --mail-user=<email_address>          # Email to which notifications will be sent

# Load the GCC module first, then LocalColabFold
module load gcc/9.2.0 localcolabfold/1.5.2

# Generate MSAs locally with MMseqs2 against the shared database copy on O2
# (positional arguments: input FASTA, database directory, output directory)
colabfold_search \
--db-load-mode 2 \
--mmseqs mmseqs \
--use-env 1 \
--use-templates 0 \
--threads 4 \
/PATH/TO/INPUT.fasta /n/shared_db/misc/mmseqs2/14-7e284 /PATH/TO/OUTPUT/DIRECTORY
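
The script above hard-codes --threads 4 to match #SBATCH -c 4, so the two values can drift apart if you change the core request. A minimal sketch, assuming you want the MMseqs2 thread count to follow the allocation, reads Slurm's SLURM_CPUS_PER_TASK variable (set when the job requests cores with -c) and falls back to one thread outside a job. The function name colabfold_threads is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: use the number of cores Slurm allocated with -c.
# Outside a Slurm job (e.g. a quick test on a login node) SLURM_CPUS_PER_TASK
# is unset, so fall back to a single thread.
colabfold_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Example use in the job script, replacing the hard-coded value:
#   colabfold_search --threads "$(colabfold_threads)" ...
```

Reading the value from the environment means only the #SBATCH -c line needs editing when you change the core count.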

...

LocalColabFold is a repackaging of ColabFold for local use. This means that LocalColabFold has the same hardware and network requirements as ColabFold, minus the soft dependency on Google Colab. In particular, by default the alignment step sends your protein sequences to a remote server maintained by the ColabFold developers. This server is shared by all ColabFold users and, to our knowledge, is not an HPC environment. Therefore, LARGE BATCHES OF PROTEIN ALIGNMENTS MUST BE GENERATED LOCALLY USING MMSEQS2, regardless of whether you are using the O2 module or your own installation on O2. At this time, the developers define large as “a few thousand” sequences; this threshold could change and is at the discretion of the administrators maintaining the remote server. Please also be considerate of other users' ongoing analyses on O2 when submitting large searches.
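
Before deciding where to run the alignment step, it can help to check how many query records a file contains. A minimal sketch, assuming a standard multi-record FASTA in which every record starts with a ">" header line; the function names and the cutoff of 1000 records are hypothetical, since the developers only describe large batches as “a few thousand” sequences and that limit may change.

```shell
#!/bin/bash
# Count FASTA records: each record begins with a header line starting with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Hypothetical, deliberately conservative cutoff; not an official limit.
MAX_REMOTE_QUERIES=1000

# Hypothetical helper: print "local" when the file exceeds the cutoff,
# meaning the MSAs should be built on O2 with colabfold_search.
needs_local_search() {
    local n
    n=$(count_fasta_records "$1")
    if [ "$n" -gt "$MAX_REMOTE_QUERIES" ]; then
        echo "local"
    else
        echo "remote-ok"
    fi
}
```

For example, needs_local_search /PATH/TO/INPUT.fasta prints "local" for a file with more records than the cutoff.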

...