...
OpenFold has many features and modes that are more thoroughly described on the repository site. Below, we will focus on a simple (single protein) folding example to show how it can be run on O2. To access the module, run:
Code Block |
---|
$ module load gcc/9.2.0 openfold/1.0.1 |
Once you have loaded this module, you’ll want to submit your job to the gpu (or gpu_quad if you have access) partition so that you can leverage GPU resources (Using O2 GPU resources). To see all the parameters related to OpenFold’s main function, try running the following after loading the module:
Code Block |
---|
$ python3 $OPENFOLDDIR/openfold/run_pretrained_openfold.py -h |
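Every command on this page relies on $OPENFOLDDIR, so a job script that runs without the module loaded tends to fail with a confusing "file not found" error. Below is a minimal sketch of a guard, assuming that loading the module is what sets OPENFOLDDIR (as the commands here imply). The helper name require_openfold is hypothetical and not part of OpenFold or the module.

```shell
#!/bin/bash
# Guard for job scripts: stop early if the OpenFold module is not loaded.
# Assumption: the openfold module sets OPENFOLDDIR.
# require_openfold is a hypothetical helper, not part of OpenFold.
require_openfold() {
    if [ -z "${OPENFOLDDIR:-}" ]; then
        echo "OPENFOLDDIR is not set; run: module load gcc/9.2.0 openfold/1.0.1" >&2
        return 1
    fi
    if [ ! -f "$OPENFOLDDIR/openfold/run_pretrained_openfold.py" ]; then
        echo "run_pretrained_openfold.py not found under $OPENFOLDDIR" >&2
        return 1
    fi
    return 0
}
```

In a job script, calling `require_openfold || exit 1` right after the module load line would stop the job before any compute time is spent.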
...
Code Block |
---|
$ python3 $OPENFOLDDIR/openfold/scripts/precompute_alignments.py -h |
We can submit a job with sbatch to generate the MSAs from input.fasta:
Code Block |
---|
#!/bin/bash
#SBATCH -c 8 # Number of cores
#SBATCH -t 0-8:00 # Runtime in D-HH:MM format
#SBATCH -p short # Partition to run in
#SBATCH --mem=32G # Memory total (for all cores)
#SBATCH --mail-type=ALL # ALL email notification type
#SBATCH --mail-user=<email_address> # Email to which notifications will be sent
#SBATCH -o %j.out # File to which STDOUT will be written, including job ID (%j)
#SBATCH -e %j.err # File to which STDERR will be written, including job ID (%j)
module load gcc/9.2.0 openfold/1.0.1
python3 $OPENFOLDDIR/openfold/scripts/precompute_alignments.py \
--uniref90_database_path /n/shared_db/alphafold/uniref90/uniref90.fasta \
--mgnify_database_path /n/shared_db/alphafold/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path /n/shared_db/alphafold/pdb70 \
--uniclust30_database_path /n/shared_db/alphafold/uniref90/uniref90.fasta \
--bfd_database_path /n/shared_db/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--jackhmmer_binary_path $OPENFOLDDIR/openfold-conda/bin/jackhmmer \
--hhblits_binary_path $OPENFOLDDIR/openfold-conda/bin/hhblits \
--hhsearch_binary_path $OPENFOLDDIR/openfold-conda/bin/hhsearch \
--cpus_per_task 2 \
--kalign_binary_path $OPENFOLDDIR/openfold-conda/bin/kalign \
--mmcif_cache /n/shared_db/alphafold/pdb_mmcif/mmcif_files/ \
--raise_errors /PATH/TO/INPUT.FASTA /PATH/TO/OUTPUT/DIR/
|
This script should generate a series of files in the output directory, including the MSA files. The second method uses mmseqs2 to generate the MSA. Similar to the JackHMMER method, we can get help information by running:
Code Block |
---|
$ python3 $OPENFOLDDIR/openfold/scripts/precompute_alignments_mmseqs.py -h |
Below is an example of a job script that uses mmseqs2 to run the alignment step.
Code Block |
---|
#!/bin/bash
#SBATCH -c 1 # Number of cores
#SBATCH -t 0-8:00 # Runtime in D-HH:MM format
#SBATCH -p short # Partition to run in
#SBATCH --mem=128G # Memory total (for all cores)
#SBATCH --mail-type=ALL # ALL email notification type
#SBATCH --mail-user=<email_address> # Email to which notifications will be sent
#SBATCH -o %j.out # File to which STDOUT will be written, including job ID (%j)
#SBATCH -e %j.err # File to which STDERR will be written, including job ID (%j)
module load gcc/9.2.0 openfold/1.0.1 mmseqs2/14-7e284
python3 $OPENFOLDDIR/openfold/scripts/precompute_alignments_mmseqs.py /PATH/TO/INPUT.FASTA \
/n/shared_db/misc/mmseqs2/14-7e284 \
uniref30_2202_db \
/PATH/TO/OUTPUT/DIR/ \
mmseqs \
--hhsearch_binary_path $OPENFOLDDIR/openfold-conda/bin/hhsearch \
--env_db colabfold_envdb_202108_db \
--pdb70 /n/shared_db/alphafold/pdb70/pdb70
|
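Before moving on to structure prediction, it can help to confirm that the alignment step actually wrote something. The sketch below counts alignment files under an output directory. It assumes that alignment files end in .a3m, .sto, or .hhr, which is an assumption about the output naming rather than something stated on this page, and both function names are hypothetical.

```shell
#!/bin/bash
# Count alignment files anywhere under an output directory.
# Assumption: alignment files end in .a3m, .sto, or .hhr.
count_alignment_files() {
    local dir="$1"
    find "$dir" -type f \( -name '*.a3m' -o -name '*.sto' -o -name '*.hhr' \) \
        | wc -l | tr -d ' '
}

# Report the count, or fail if the directory is missing or holds no alignments.
check_alignments() {
    local dir="$1" n
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    n=$(count_alignment_files "$dir")
    if [ "$n" -eq 0 ]; then
        echo "no alignment files found in $dir" >&2
        return 1
    fi
    echo "$n alignment file(s) found in $dir"
}
```

For example, `check_alignments /PATH/TO/OUTPUT/DIR/` could be run on a login node after the alignment job finishes and before the GPU job is submitted.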
These outputs can be used in the next step by adding the option --use_precomputed_alignments with the path to your MSA directory, for example:
Code Block |
---|
#!/bin/bash
#SBATCH --partition=gpu # Partition to run in
#SBATCH --gres=gpu:1 # GPU resources requested
#SBATCH -c 4 # Requested cores
#SBATCH --time=0-8:00 # Runtime in D-HH:MM format
#SBATCH --mem=32GB # Requested Memory
#SBATCH -o %j.out # File to which STDOUT will be written, including job ID (%j)
#SBATCH -e %j.err # File to which STDERR will be written, including job ID (%j)
#SBATCH --mail-type=ALL # ALL email notification type
#SBATCH --mail-user=<email_address> # Email to which notifications will be sent
module load gcc/9.2.0 openfold/1.0.1
python3 $OPENFOLDDIR/openfold/run_pretrained_openfold.py \
--skip_relaxation \
--output_dir /PATH/TO/OUTPUT/DIR/ \
--model_device "cuda:0" \
--use_precomputed_alignments /PATH/TO/MSA/DIR \
--config_preset "model_1_ptm" \
--openfold_checkpoint_path $OPENFOLDDIR/openfold/openfold/resources/openfold_params/finetuning_ptm_2.pt \
--cpus 4 \
/PATH/TO/INPUT/DIR \
/n/shared_db/alphafold/pdb_mmcif/mmcif_files/ \
--uniref90_database_path /n/shared_db/alphafold/uniref90/uniref90.fasta \
--mgnify_database_path /n/shared_db/alphafold/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path /n/shared_db/alphafold/pdb70 \
--uniclust30_database_path /n/shared_db/alphafold/uniclust30/uniclust30_2018_08 \
--bfd_database_path /n/shared_db/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--jackhmmer_binary_path $OPENFOLDDIR/openfold-conda/bin/jackhmmer \
--hhblits_binary_path $OPENFOLDDIR/openfold-conda/bin/hhblits \
--hhsearch_binary_path $OPENFOLDDIR/openfold-conda/bin/hhsearch \
--kalign_binary_path $OPENFOLDDIR/openfold-conda/bin/kalign |
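Since the prediction job only makes sense once the alignment job has succeeded, the two can be submitted together with a Slurm dependency. The following is a minimal sketch, not an O2-specific recipe: align.sh and fold.sh are hypothetical file names for the alignment and prediction job scripts, and the SBATCH variable exists only so the function can be tried out with a stand-in command.

```shell
#!/bin/bash
# Submit an alignment job, then a prediction job that waits for it.
# Assumptions: the two script names passed in are hypothetical files holding
# the job scripts; SBATCH can be overridden for a dry run.
SBATCH="${SBATCH:-sbatch}"

submit_chain() {
    local align_script="$1" fold_script="$2" align_id
    # --parsable makes sbatch print only the job ID (optionally "ID;cluster")
    align_id=$("$SBATCH" --parsable "$align_script") || return 1
    align_id="${align_id%%;*}"
    # afterok: the second job starts only if the first exits with code 0
    "$SBATCH" --dependency="afterok:${align_id}" "$fold_script"
}
```

Running `submit_chain align.sh fold.sh` would queue both jobs at once. If the alignment job fails, the dependent job never starts and can be removed with scancel.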
Running OpenFold
We have public databases available in /n/shared_db/ (Public Databases), and we can use these in the OpenFold run command:
Note |
---|
Note: The input line |
Monomer (single protein) Job Example:
Code Block |
---|
#!/bin/bash
#SBATCH --partition=gpu # Partition to run in
#SBATCH --gres=gpu:1 # GPU resources requested
#SBATCH -c 4 # Requested cores
#SBATCH --time=0-8:00 # Runtime in D-HH:MM format
#SBATCH --mem=32GB # Requested Memory
#SBATCH -o %j.out # File to which STDOUT will be written, including job ID (%j)
#SBATCH -e %j.err # File to which STDERR will be written, including job ID (%j)
#SBATCH --mail-type=ALL # ALL email notification type
#SBATCH --mail-user=<email_address> # Email to which notifications will be sent
module load gcc/9.2.0 openfold/1.0.1
python3 $OPENFOLDDIR/openfold/run_pretrained_openfold.py \
--skip_relaxation \
--output_dir /PATH/TO/OUTPUT/DIR/ \
--model_device "cuda:0" \
--config_preset "model_1_ptm" \
--openfold_checkpoint_path $OPENFOLDDIR/openfold/openfold/resources/openfold_params/finetuning_ptm_2.pt \
--cpus 4 \
/PATH/TO/INPUT/DIR/ \
/n/shared_db/alphafold/pdb_mmcif/mmcif_files/ \
--uniref90_database_path /n/shared_db/alphafold/uniref90/uniref90.fasta \
--mgnify_database_path /n/shared_db/alphafold/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path /n/shared_db/alphafold/pdb70/pdb70 \
--uniclust30_database_path /n/shared_db/alphafold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--bfd_database_path /n/shared_db/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--jackhmmer_binary_path $OPENFOLDDIR/openfold-conda/bin/jackhmmer \
--hhblits_binary_path $OPENFOLDDIR/openfold-conda/bin/hhblits \
--hhsearch_binary_path $OPENFOLDDIR/openfold-conda/bin/hhsearch \
--kalign_binary_path $OPENFOLDDIR/openfold-conda/bin/kalign
|
Multimer (protein complex) Job Example:
Code Block |
---|
#!/bin/bash
#SBATCH --partition=gpu # Partition to run in
#SBATCH --gres=gpu:1 # GPU resources requested
#SBATCH -c 4 # Requested cores
#SBATCH --time=0-8:00 # Runtime in D-HH:MM format
#SBATCH --mem=32GB # Requested Memory
#SBATCH -o %j.out # File to which STDOUT will be written, including job ID (%j)
#SBATCH -e %j.err # File to which STDERR will be written, including job ID (%j)
#SBATCH --mail-type=ALL # ALL email notification type
#SBATCH --mail-user=<email_address> # Email to which notifications will be sent
module load gcc/9.2.0 openfold/1.0.1
python3 $OPENFOLDDIR/openfold/run_pretrained_openfold.py \
--skip_relaxation \
--output_dir /PATH/TO/OUTPUT/DIR/ \
--model_device "cuda:0" \
--config_preset "model_1_ptm" \
--openfold_checkpoint_path $OPENFOLDDIR/openfold/openfold/resources/openfold_params/finetuning_ptm_2.pt \
--cpus 4 \
--multimer_ri_gap 200 \
/PATH/TO/INPUT/DIR/ \
/n/shared_db/alphafold/pdb_mmcif/mmcif_files/ \
--uniref90_database_path /n/shared_db/alphafold/uniref90/uniref90.fasta \
--mgnify_database_path /n/shared_db/alphafold/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path /n/shared_db/alphafold/pdb70/pdb70 \
--uniclust30_database_path /n/shared_db/alphafold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--bfd_database_path /n/shared_db/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--jackhmmer_binary_path $OPENFOLDDIR/openfold-conda/bin/jackhmmer \
--hhblits_binary_path $OPENFOLDDIR/openfold-conda/bin/hhblits \
--hhsearch_binary_path $OPENFOLDDIR/openfold-conda/bin/hhsearch \
--kalign_binary_path $OPENFOLDDIR/openfold-conda/bin/kalign
|
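The run commands take an input directory (/PATH/TO/INPUT/DIR/) rather than a single file, so the FASTA file has to be placed in a directory of its own first. Below is a minimal sketch of that staging step, assuming the input directory should contain only the FASTA files you want processed. Both helper names are hypothetical, and the record count is only a sanity check that the file holds the number of sequences you expect.

```shell
#!/bin/bash
# Count FASTA records (header lines starting with '>').
count_fasta_records() {
    # grep -c prints 0 but exits non-zero when nothing matches; keep the count
    grep -c '^>' "$1" || true
}

# Copy a FASTA file into an input directory, refusing empty or missing files.
# Assumption: the run script is pointed at this directory afterwards.
stage_fasta() {
    local fasta="$1" input_dir="$2" n
    if [ ! -f "$fasta" ]; then
        echo "no such file: $fasta" >&2
        return 1
    fi
    n=$(count_fasta_records "$fasta")
    if [ "$n" -eq 0 ]; then
        echo "no FASTA records in $fasta" >&2
        return 1
    fi
    mkdir -p "$input_dir" && cp "$fasta" "$input_dir/" || return 1
    echo "staged $(basename "$fasta") with $n record(s)"
}
```

For example, `stage_fasta my_protein.fasta /PATH/TO/INPUT/DIR/` (with my_protein.fasta as a placeholder name) would create the directory if needed and report how many records were copied.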
...