ColabFold (https://github.com/sokrypton/ColabFold) is an emerging protein folding prediction tool based on Google DeepMind’s AlphaFold (see Using AlphaFold on O2). LocalColabFold (https://github.com/YoshitakaMo/localcolabfold) is a packaging of ColabFold for use on local machines; we provide instructions on how to leverage LocalColabFold on O2 below. LocalColabFold uses MMseqs2 for the alignment step (faster than jackhmmer under some conditions), and runs AlphaFold2 for single protein modeling and AlphaFold-Multimer for protein complex modeling. If you are unsure whether to use AlphaFold or LocalColabFold, feel free to try both tools and compare results.

...

Code Block
$ module load localcolabfold/.latest

A snapshot of the help text follows:

Code Block
$ module help localcolabfold/.latest

--------------------------- Module Specific Help for "localcolabfold/.latest" ----------------------------
For detailed instructions, go to:
https://github.com/YoshitakaMo/localcolabfold

This module was last installed on February 9, 2022 using the latest colabfold commit as of ~4:30pm ET.

Due to frequency in development updates, this module may be reinstalled any time.
Please refer to the timestamp above for most recent installation time.

This module currently requires gcc/9.2.0 to be loaded due to requiring external cuda libraries.
If you are working under a different compiler stack (e.g. gcc/6.2.0), you may want to install this yourself
until we offer an updated version of the cuda module under a different compiler. Visit the repository website
for more information about how to install this yourself.

...

Code Block
module load gcc/9.2.0 cuda/11.2 localcolabfold/.latest

or, if LocalColabFold has been installed locally, please make sure it is on your PATH. Once you have loaded these modules, submit your job to the gpu partition (or gpu_quad, if you have access) so that you can leverage GPU resources (see Using O2 GPU resources). Parameters for using LocalColabFold through the command colabfold_batch can be shown by loading the modules above in an interactive session (https://harvardmed.atlassian.net/wiki/spaces/O2/pages/1586793632/Using+Slurm+Basic#Interactive-Sessions) and running:

...

Code Block
#!/bin/bash

#SBATCH --partition=gpu                      # Partition to run in
#SBATCH --gres=gpu:1                         # GPU resources requested
#SBATCH -c 1                                 # Requested cores
#SBATCH --time=0-12:00                       # Runtime in D-HH:MM format
#SBATCH --mem=25GB                           # Requested Memory
#SBATCH -o %j.out                            # File to which STDOUT will be written, including job ID (%j)
#SBATCH -e %j.err                            # File to which STDERR will be written, including job ID (%j)
#SBATCH --mail-type=ALL                      # ALL email notification type
#SBATCH --mail-user=<email_address>          # Email to which notifications will be sent

module load gcc/9.2.0 cuda/11.2 localcolabfold/.latest

colabfold_batch --num-recycle 5 \
--model-type AlphaFold2-multimer \
--rank ptmscore \
/PATH/TO/INPUT.fasta \
/PATH/TO/OUTPUT/DIRECTORY
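
Once the job script above is saved, it is submitted with sbatch, and the `-o %j.out` and `-e %j.err` directives determine where its logs end up. The following is a minimal sketch of two helper functions for keeping track of those logs; the function names are made up for illustration, and the sketch assumes sbatch prints its usual "Submitted batch job <id>" confirmation message.

```shell
#!/bin/bash

# Read sbatch's confirmation message on stdin and print only the job ID.
# Assumes the usual message format: "Submitted batch job <id>"
job_id_from_sbatch_output() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Print the STDOUT and STDERR log file names the job will write,
# matching the "-o %j.out" and "-e %j.err" directives in the job script.
log_files_for_job() {
    local job_id="$1"
    printf '%s.out\n%s.err\n' "$job_id" "$job_id"
}
```

For example, if the script were saved as colabfold_job.sh (a hypothetical file name), `job_id=$(sbatch colabfold_job.sh | job_id_from_sbatch_output)` would capture the job ID, and `log_files_for_job "$job_id"` would list the two log files to check.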

...

LocalColabFold is a repackaging of ColabFold for local use. It therefore requires the same hardware resources and network connections that ColabFold requires (but without the soft dependency on Google Colab). In particular, during the alignment step it sends the protein sequence to a remote server maintained by the ColabFold developers for processing. This server is shared by all users of ColabFold and, to our knowledge, is not an HPC environment. For this reason, SUBMITTING LARGE BATCHES OF PROTEINS IS NOT RECOMMENDED AT THIS TIME, regardless of whether you are using the O2 module or your own installation on O2. Do note that we are unable to quantify “large”, as this is at the discretion of the system administrators maintaining the remote server.
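
Because batch size matters for the shared alignment server, it can help to check how many records an input file contains before submitting it. The following is a minimal sketch; the function name is made up for illustration, and it assumes the input is a standard FASTA file in which every record starts with a `>` header line. It does not define what counts as "large"; that judgment is left to you.

```shell
#!/bin/bash

# Count the records in a FASTA file by counting header lines (lines starting with '>').
# Prints 0 if the file contains no header lines.
count_fasta_records() {
    grep -c '^>' "$1"
}
```

Running `count_fasta_records /PATH/TO/INPUT.fasta` prints the number of records, which gives a rough idea of how many sequences a single job would send for alignment.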

...

Research Computing is working on a method to perform the alignment step locally, but it may result in longer processing times per protein. This page will be updated when this method is ready for consumption.

Troubleshooting/FAQ

...

Errors with using --amber or --templates

As noted above, jobs will occasionally fail if either of these flags is enabled. This is a known issue and requires action from the ColabFold developers. For now, simply resubmit the job without these flags. If these functions are required for your work, you can also try submitting your sequences to AlphaFold instead (and adjust your resource requirements accordingly).
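
One way to make resubmitting without these flags easy is to add them only when explicitly requested. The following is a minimal sketch, not part of the module: the function name and the USE_AMBER and USE_TEMPLATES variables are hypothetical, and the remaining options simply mirror the example job script on this page. It requires bash, since it uses arrays.

```shell
#!/bin/bash

# Print the colabfold_batch arguments, one per line.
# --amber and --templates are added only when USE_AMBER=1 / USE_TEMPLATES=1
# (hypothetical variable names chosen for this sketch).
build_colabfold_args() {
    local input="$1" outdir="$2"
    local args=(--num-recycle 5 --model-type AlphaFold2-multimer --rank ptmscore)
    if [ "${USE_AMBER:-0}" = "1" ]; then
        args+=(--amber)
    fi
    if [ "${USE_TEMPLATES:-0}" = "1" ]; then
        args+=(--templates)
    fi
    args+=("$input" "$outdir")
    printf '%s\n' "${args[@]}"
}
```

In a job script, the arguments could then be read back with `mapfile -t args < <(build_colabfold_args /PATH/TO/INPUT.fasta /PATH/TO/OUTPUT/DIRECTORY)` and passed on with `colabfold_batch "${args[@]}"`. If a run with USE_AMBER=1 or USE_TEMPLATES=1 fails, resubmitting with both left unset drops the flags without editing the command itself.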

Please contact rchelp@hms.harvard.edu with any questions about the module, or about troubleshooting the installation process, that this section does not address or addresses insufficiently. Depending on the question, we may need to refer you to the developers, but we will do our best to assist.