...
Note: If you’re new to Slurm or O2, please see Using Slurm Basic for lots of information on submitting jobs.
AlphaFold is also offered by BioGrids as part of their software suite (Using Software Provided by BioGrids ). If you need assistance with using the BioGrids offering, please contact help@biogrids.org.
Before starting: if you are working with a single protein, check whether its structure has already been computed. The full database of precomputed structures can be found at https://alphafold.ebi.ac.uk/ . It may save you a lot of time!
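If you work from UniProt accessions, precomputed models can be fetched directly from the database's download links. The URL pattern below (`AF-<accession>-F1-model_v4.pdb`) is an assumption based on the site's current download links, not something this page guarantees; verify it at https://alphafold.ebi.ac.uk/ before scripting around it:

```shell
# Sketch: build the AlphaFold DB download URL for a UniProt accession.
# The AF-<ACCESSION>-F1-model_v4.pdb naming scheme is an assumption drawn
# from the public site's download links; confirm it before relying on it.
afdb_url() {
  printf 'https://alphafold.ebi.ac.uk/files/AF-%s-F1-model_v4.pdb\n' "$1"
}

afdb_url P69905   # human hemoglobin subunit alpha
```

You can then pass the resulting URL to `curl -O` or `wget` to retrieve the model.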
How to load and use the AlphaFold module
Here are the instructions to run AlphaFold in an interactive session. Since AlphaFold takes hours to run, you will more likely want to submit a batch job, which is described later.
The following flags are mandatory for invoking AlphaFold:
--fasta_paths
specifies the location of your fasta files (this cannot be a directory, but it can be a comma-separated list of full paths).
--max_template_date
specifies the latest date to reference when matching against templates. As of 2.2.0, there is no way to "turn off" templates; you can instead provide a very early date so that no templates survive the date filter, e.g. --max_template_date=1950-01-01.
--output_dir
specifies the directory where your output will be written.
The --data_dir flag is not mandatory; by default (for versions before 2.3.1) it points to /n/shared_db/alphafold, where RC has centrally downloaded the databases. If you would rather use your own copy (not recommended, as it requires approximately 2 TB of free space), feel free to set this flag to the corresponding location.
Note: If you are using version 2.3.1, please use --data_dir=/n/shared_db/alphafold-2.3.
You can invoke alphafold.py -h for more information about these and other optional flags and their options.
The following is an example invocation of alphafold.py
with a placeholder output path, including the module load step:
```
$ module load alphafold/2.2.0
$ alphafold.py --fasta_paths=/path/to/fastafile --max_template_date=2020-05-14 --db_preset=full_dbs --output_dir=/path/to/output --data_dir=/n/shared_db/alphafold/
```
Note: You MUST provide full paths for any fasta files passed to alphafold.py.
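As a quick sketch of what "full paths" means in practice, you can expand a relative path with realpath before passing it to alphafold.py (query.fasta is a hypothetical input file created here just for the example):

```shell
# alphafold.py needs absolute paths; expand a relative fasta path with realpath.
touch query.fasta                    # placeholder input for the example
fasta_full="$(realpath query.fasta)"
echo "$fasta_full"                   # an absolute path such as /home/you/query.fasta
```

You would then pass `--fasta_paths="$fasta_full"` rather than the bare relative name.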
Example Submission Template
...
```
#!/bin/bash
#SBATCH --partition=<INSERT NAME OF GPU PARTITION HERE>
#SBATCH --gres=gpu:1
#SBATCH -c 8
#SBATCH --time=5-0:00:00
#SBATCH --mem=50G
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<YOUR_EMAIL_ADDRESS>
###
module load alphafold/2.1.1 gcc/6.2.0 cuda/11.2

alphafold.py --fasta_paths=</PATH/TO/>INPUT.fasta \
  --is_prokaryote_list=false \
  --max_template_date=2022-01-01 \
  --db_preset=full_dbs \
  --model_preset=multimer \
  --output_dir=</PATH/TO/OUTPUT/DIRECTORY> \
  --data_dir=/n/shared_db/alphafold/
```
Note: AlphaFold does NOT support multiple GPUs. Please refrain from requesting more than one GPU per job.
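Before submitting, a quick sanity check can catch an accidental multi-GPU request. This sketch creates a minimal stand-in SUBMIT.sh so the example is self-contained; your real script will have the full header from the template above:

```shell
# Sanity check before submitting: ensure the script requests exactly one GPU.
# This minimal SUBMIT.sh is a stand-in for the real submission script.
cat > SUBMIT.sh <<'EOF'
#!/bin/bash
#SBATCH --gres=gpu:1
EOF

if grep -q -- '--gres=gpu:1$' SUBMIT.sh; then
  echo "GPU request OK"
else
  echo "check --gres: AlphaFold can only use one GPU" >&2
fi
```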
For more sbatch customization options, you can refer to https://harvardmed.atlassian.net/wiki/spaces/O2/pages/1586793632/Using+Slurm+Basic#sbatch-options-quick-reference . You can then submit this script via sbatch SUBMIT.sh from the terminal on O2.
...
The number of threads/cores is hardcoded within AlphaFold's internal code, so you should request ~8 CPU cores when submitting AlphaFold jobs. Requesting more cores will not improve performance, and it will make your job pend longer before it starts.
The required databases have been downloaded to a centralized location (/n/shared_db/alphafold/ for versions before 2.3.1, and /n/shared_db/alphafold-2.3 for 2.3.1) for the benefit of all O2 users. Please don't download the ~2 TB of data yourself, as that would be a waste of space.

Submit your AlphaFold jobs to a GPU partition. For more about the GPU partitions, please visit our wiki page - Using O2 GPU resources.
For version 2.2.0, submissions will assume you are running on a GPU by default. If for some reason you desire to run explicitly on CPU, specify the --use_cpu flag and submit to a partition that does not have GPUs (e.g. short).

As of version 2.2.0, the base implementation has changed how the amber relaxation step is requested. If you would like to run your analysis WITHOUT the relaxed models with the 2.2.0 module, please include the --no_run_relax flag.

There is a known issue with AlphaFold (all versions) failing the relaxation step when it is requested via GPU (the default option); we can only recommend that users refrain from using the relaxation option until the developers address this. However, we have received user reports that CPU relaxation is successful, so users who require relaxed models can attempt to run the relaxation step on CPU.

AlphaFold jobs may fail with Out of Memory in the .out or .err file of the job. This refers to VRAM, the memory on the GPU card itself, not the memory requested from Slurm. You can try running these larger complexes CPU-only; they will run slower, but they won't be bottlenecked by the maximum VRAM available on a GPU node. To view the amount of VRAM available on any one card, try a command like:

```
$ sinfo --Format=nodehost,available,memory,statelong,gres:40 -p gpu,gpu_quad
```
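If a large complex keeps hitting VRAM limits, one approach is to derive a CPU-only copy of the submission script. In this sketch the file names and the short partition are placeholders, and a minimal SUBMIT.sh is created so the example runs on its own; review the result and keep ~8 cores and the CPU-mode flag appropriate for your AlphaFold version:

```shell
# Sketch: derive a CPU-only submission script from the GPU one.
# SUBMIT.sh here is a minimal stand-in for the real script.
cat > SUBMIT.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=<INSERT NAME OF GPU PARTITION HERE>
#SBATCH --gres=gpu:1
#SBATCH -c 8
EOF

# Drop the GPU request and point the partition at a CPU partition ("short").
sed -e '/--gres=gpu/d' \
    -e 's/<INSERT NAME OF GPU PARTITION HERE>/short/' \
    SUBMIT.sh > SUBMIT_cpu.sh
cat SUBMIT_cpu.sh
```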
If you have any questions, you can email us at rchelp@hms.harvard.edu.