...
Note: If you’re new to Slurm or O2, please see Using Slurm Basic for lots of information on submitting jobs.
AlphaFold is also offered by BioGrids as part of their software suite (Using Software Provided by BioGrids ). If you need assistance with using the BioGrids offering, please contact help@biogrids.org.
Before starting: if you are working with a single protein, check whether its structure has already been computed. The full database of precomputed structures can be found at https://alphafold.ebi.ac.uk/ . It may save you a lot of time!
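If you work from UniProt accessions, precomputed models can be fetched directly from the database's download links. The URL pattern below (`AF-<accession>-F1-model_v4.pdb`) is an assumption based on the site's current download links, not something this page guarantees; verify it at https://alphafold.ebi.ac.uk/ before scripting around it:

```shell
# Sketch: build the AlphaFold DB download URL for a UniProt accession.
# The AF-<ACCESSION>-F1-model_v4.pdb naming scheme is an assumption drawn
# from the public site's download links; confirm it before relying on it.
afdb_url() {
  printf 'https://alphafold.ebi.ac.uk/files/AF-%s-F1-model_v4.pdb\n' "$1"
}

afdb_url P69905   # human hemoglobin subunit alpha
```

You can then pass the resulting URL to `curl -O` or `wget` to retrieve the model.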
How to load and use the AlphaFold module
Here are the instructions to run AlphaFold in an interactive session. Since AlphaFold takes hours to run, you will more likely want to submit a batch job, which is described later.
The following flags are mandatory for invoking AlphaFold:
--fasta_paths
specifies the location of your fasta files (this cannot be a directory, but it can be a comma-separated list of full paths).
--max_template_date
specifies the latest date to reference when matching against templates. As of 2.2.0, there is no way to "turn off" templates; you can instead provide a very early date so that no templates survive the date filter, e.g. --max_template_date=1950-01-01.
--output_dir
specifies the directory where your output will be written.
The --data_dir flag is not mandatory; by default (for versions before 2.3.1) it points to /n/shared_db/alphafold, where RC has centrally downloaded the databases. If you would rather use your own copy (not recommended, as it requires approximately 2 TB of free space), feel free to set this flag to the corresponding location.
Note: If you are using version 2.3.1, please use --data_dir=/n/shared_db/alphafold-2.3.
You can invoke alphafold.py -h for more information about these and other optional flags and their options.
The following is an example invocation of alphafold.py
with a placeholder output path, including the module load step:
```
$ module load alphafold/2.2.0
$ alphafold.py --fasta_paths=/path/to/fastafile --max_template_date=2020-05-14 --db_preset=full_dbs --output_dir=/path/to/output --data_dir=/n/shared_db/alphafold/
```
Note: You MUST provide full paths for any fasta files passed to alphafold.py.
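As a quick sketch of what "full paths" means in practice, you can expand a relative path with realpath before passing it to alphafold.py (query.fasta is a hypothetical input file created here just for the example):

```shell
# alphafold.py needs absolute paths; expand a relative fasta path with realpath.
touch query.fasta                    # placeholder input for the example
fasta_full="$(realpath query.fasta)"
echo "$fasta_full"                   # an absolute path such as /home/you/query.fasta
```

You would then pass `--fasta_paths="$fasta_full"` rather than the bare relative name.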
Example Submission Template
...
```
#!/bin/bash
#SBATCH --partition=<INSERT NAME OF GPU PARTITION HERE>
#SBATCH --gres=gpu:1
#SBATCH -c 8
#SBATCH --time=5-0:00:00
#SBATCH --mem=50G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<YOUR_EMAIL_ADDRESS>
###
module load alphafold/2.1.1 gcc/6.2.0 cuda/11.2

alphafold.py --fasta_paths=</PATH/TO/>INPUT.fasta \
  --is_prokaryote_list=false \
  --max_template_date=2022-01-01 \
  --db_preset=full_dbs \
  --model_preset=multimer \
  --output_dir=</PATH/TO/OUTPUT/DIRECTORY> \
  --data_dir=/n/shared_db/alphafold/
```
Note: AlphaFold does NOT support multiple GPUs. Please refrain from requesting more than one GPU per job.
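Before submitting, a quick sanity check can catch an accidental multi-GPU request. This sketch creates a minimal stand-in SUBMIT.sh so the example is self-contained; your real script will have the full header from the template above:

```shell
# Sanity check before submitting: ensure the script requests exactly one GPU.
# This minimal SUBMIT.sh is a stand-in for the real submission script.
cat > SUBMIT.sh <<'EOF'
#!/bin/bash
#SBATCH --gres=gpu:1
EOF

if grep -q -- '--gres=gpu:1$' SUBMIT.sh; then
  echo "GPU request OK"
else
  echo "check --gres: AlphaFold can only use one GPU" >&2
fi
```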
For more sbatch customization options, you can refer to https://harvardmed.atlassian.net/wiki/spaces/O2/pages/1586793632/Using+Slurm+Basic#sbatch-options-quick-reference . You can then submit this script via sbatch SUBMIT.sh from the terminal on O2.
...
The number of threads/cores is hardcoded within AlphaFold's internal code, so you should request ~8 CPU cores when submitting AlphaFold jobs. Requesting more cores will not improve performance, and it will make your job pend longer before it starts.
The required databases have been downloaded to a centralized location (/n/shared_db/alphafold/ for versions before 2.3.1, and /n/shared_db/alphafold-2.3 for 2.3.1) for the benefit of all O2 users. Please don't download the ~2 TB of data yourself, as that would be a waste of space.

Submit your AlphaFold jobs to a GPU partition. For more about the GPU partitions, please visit our wiki page - Using O2 GPU resources.
For version 2.2.0, submissions will assume you are running on a GPU by default. If for some reason you desire to run explicitly on CPU, specify the --use_cpu flag and submit to a partition that does not have GPUs (e.g. short).

As of version 2.2.0, the base implementation has changed how the amber relaxation step is requested. If you would like to run your analysis WITHOUT the relaxed models with the 2.2.0 module, please include the --no_run_relax flag.

There is a known issue with AlphaFold (all versions) failing the relaxation step when it is requested via GPU (the default option); we can only recommend that users refrain from using the relaxation option until the developers address this. However, we have received user reports that CPU relaxation is successful, so users who require relaxed models can attempt to run the relaxation step on CPU.

AlphaFold jobs may fail with Out of Memory in the .out or .err file of the job. This refers to VRAM, the memory on the GPU card itself, not the memory requested from Slurm. You can try running these larger complexes CPU-only; they will run slower, but they won't be bottlenecked by the maximum VRAM available on a GPU node. To view the amount of VRAM available on any one card, try a command like:

```
$ sinfo --Format=nodehost,available,memory,statelong,gres:40 -p gpu,gpu_quad
```
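If a large complex keeps hitting VRAM limits, one approach is to derive a CPU-only copy of the submission script. In this sketch the file names and the short partition are placeholders, and a minimal SUBMIT.sh is created so the example runs on its own; review the result and keep ~8 cores and the CPU-mode flag appropriate for your AlphaFold version:

```shell
# Sketch: derive a CPU-only submission script from the GPU one.
# SUBMIT.sh here is a minimal stand-in for the real script.
cat > SUBMIT.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=<INSERT NAME OF GPU PARTITION HERE>
#SBATCH --gres=gpu:1
#SBATCH -c 8
EOF

# Drop the GPU request and point the partition at a CPU partition ("short").
sed -e '/--gres=gpu/d' \
    -e 's/<INSERT NAME OF GPU PARTITION HERE>/short/' \
    SUBMIT.sh > SUBMIT_cpu.sh
cat SUBMIT_cpu.sh
```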
If you have any questions, you can email us at rchelp@hms.harvard.edu.