ColabFold (https://github.com/sokrypton/ColabFold) is an emerging protein structure prediction tool based on Google DeepMind’s AlphaFold (see Using AlphaFold on O2). LocalColabFold (https://github.com/YoshitakaMo/localcolabfold) packages ColabFold for use on local machines; instructions for using LocalColabFold on O2 are provided below. LocalColabFold uses MMseqs2 for sequence alignment (which can be faster than jackhmmer, depending on the workload) and runs AlphaFold2 for single-protein modeling and AlphaFold-Multimer for protein complex modeling. If you are unsure which tool to use, feel free to try both and compare the results.

...

These commands can be combined in an sbatch script (https://harvardmed.atlassian.net/wiki/spaces/O2/pages/1586793632/Using+Slurm+Basic#The-sbatch-command). The resources required to complete a LocalColabFold job vary with the size and complexity of the structure. It is generally best to start with a relatively conservative resource request, then increase it as needed based on information from past jobs. This information can be found using commands like O2sacct or O2_jobs_report (see Get information about current and past jobs: https://harvardmed.atlassian.net/wiki/spaces/O2/pages/1601699912/Get+information+about+current+and+past+jobs#O2_jobs_report). Below is a simplified example of an sbatch script that runs colabfold_search on the file INPUT.fasta in the short partition:

...
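As a rough sketch of the "start conservative, then adjust" approach, the Python snippet below turns a finished job's peak memory (MaxRSS) and run time (Elapsed), in the formats printed by Slurm's `sacct --format=MaxRSS,Elapsed`, into a padded request for the next submission. The helper names and the 25% headroom factor are illustrative assumptions, not an O2 tool or policy, and O2sacct or O2_jobs_report may present these values differently.

```python
"""Suggest the next resource request from a past job's peak usage.

Hypothetical helper: inputs are assumed to be MaxRSS and Elapsed strings
as printed by `sacct --format=MaxRSS,Elapsed`; the 25% headroom factor
is an arbitrary illustrative choice, not an O2 recommendation.
"""
import math

# Binary unit suffixes used in sacct's MaxRSS column.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_maxrss(value: str) -> int:
    """Convert a MaxRSS string such as '3.5G' or '512000K' to bytes."""
    value = value.strip()
    if value and value[-1].upper() in _UNITS:
        return int(float(value[:-1]) * _UNITS[value[-1].upper()])
    return int(value)  # plain number: assume bytes


def parse_elapsed(value: str) -> int:
    """Convert an Elapsed string ([D-]HH:MM:SS or MM:SS) to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in value.split(":")]
    while len(parts) < 3:  # pad MM:SS to HH:MM:SS
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def next_request(maxrss: str, elapsed: str, headroom: float = 1.25):
    """Return (memory in whole GiB, time limit in whole minutes), padded by headroom."""
    mem_gib = math.ceil(parse_maxrss(maxrss) * headroom / 1024**3)
    minutes = math.ceil(parse_elapsed(elapsed) * headroom / 60)
    return max(mem_gib, 1), max(minutes, 1)


if __name__ == "__main__":
    # Example values only; substitute the numbers from your own past job.
    mem, minutes = next_request("6.2G", "01:30:00")
    print(f"--mem={mem}G --time={minutes}")
```

For example, a job that peaked at 6.2G and ran for 1.5 hours yields `--mem=8G --time=113` (Slurm reads a bare `--time` value as minutes). Rounding up keeps the padded request from falling below observed usage; make sure the result still fits within the limits of the partition you submit to.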

LocalColabFold is a repackaging of ColabFold for local use. This means that LocalColabFold has the same local hardware and network requirements as ColabFold (but without the soft dependency on Google Colab). This includes sending the protein sequences to a remote server maintained by the ColabFold developers for processing during the alignment step. This server is shared by all users of ColabFold and, to our knowledge, is not an HPC environment. Therefore, LARGE BATCHES OF PROTEIN ALIGNMENTS MUST BE GENERATED LOCALLY USING MMSEQS2, regardless of whether you are using the O2 module or your own installation on O2. At this time, the developers define “large” as “a few thousand” sequences. This threshold could change and is at the discretion of the system administrators maintaining the remote server. Please be considerate of other ongoing analyses on O2 when submitting large queries.
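Before submitting a batch, it can help to check how many sequences an input FASTA file contains. The Python sketch below counts FASTA records (header lines starting with `>`) and compares the total against a cutoff. The 1,000-sequence value is a hypothetical, deliberately conservative stand-in for the developers' "a few thousand" guidance, not an official limit, and the script name used in the usage note is likewise illustrative.

```python
"""Check whether a FASTA batch is large enough to warrant a local MSA search.

Sketch only: LOCAL_MSA_THRESHOLD is a hypothetical, conservative stand-in
for the developers' "a few thousand" guidance, which may change.
"""
import sys

LOCAL_MSA_THRESHOLD = 1000  # hypothetical cutoff, not an official limit


def count_fasta_records(lines) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def should_search_locally(n_sequences: int,
                          threshold: int = LOCAL_MSA_THRESHOLD) -> bool:
    """Return True if the batch meets or exceeds the (hypothetical) cutoff."""
    return n_sequences >= threshold


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only runs when a FASTA path is passed on the command line.
    with open(sys.argv[1]) as handle:
        n = count_fasta_records(handle)
    if should_search_locally(n):
        print(f"{n} sequences: generate alignments locally with MMseqs2 "
              "(e.g. colabfold_search).")
    else:
        print(f"{n} sequences: below the illustrative cutoff.")
```

Saved under a hypothetical name such as `check_batch_size.py`, it could be run as `python check_batch_size.py INPUT.fasta` before deciding how to generate alignments. Counting header lines treats each FASTA record as one query, so a complex written as a single record counts once.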

...