...

OpenFold has many features and modes that are more thoroughly described on the repository site. Below, we will focus on a simple (single protein) folding example to show how it can be run on O2. To access the module, run:

Code Block
$ module load gcc/9.2.0 openfold/1.0.1

Once you have loaded this module, you’ll want to submit your job to the gpu partition (or gpu_quad, if you have access) so that it can use GPU resources (see Using O2 GPU resources). To see all the parameters of OpenFold’s main script, run the following after loading the module:

Code Block
$ python3 $OPENFOLDDIR/openfold/run_pretrained_openfold.py -h

...

Code Block
$ python3 $OPENFOLDDIR/openfold/scripts/precompute_alignments.py -h

We can submit an sbatch job to generate the MSAs from the input FASTA file:

Code Block
#!/bin/bash
#SBATCH -c 8                                # Number of cores
#SBATCH -t 0-8:00                           # Runtime in D-HH:MM format
#SBATCH -p short                            # Partition to run in
#SBATCH --mem=32G                           # Memory total (for all cores)
#SBATCH --mail-type=ALL                     # ALL email notification type
#SBATCH --mail-user=<email_address>         # Email to which notifications will be sent
#SBATCH -o %j.out                           # File to which STDOUT will be written, including job ID (%j)
#SBATCH -e %j.err                           # File to which STDERR will be written, including job ID (%j)

module load gcc/9.2.0 openfold/1.0.1

python3 $OPENFOLDDIR/openfold/scripts/precompute_alignments.py \
    --uniref90_database_path /n/shared_db/alphafold/uniref90/uniref90.fasta \
    --mgnify_database_path /n/shared_db/alphafold/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path /n/shared_db/alphafold/pdb70/pdb70 \
    --uniclust30_database_path /n/shared_db/alphafold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --bfd_database_path /n/shared_db/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --jackhmmer_binary_path $OPENFOLDDIR/openfold-conda/bin/jackhmmer \
    --hhblits_binary_path $OPENFOLDDIR/openfold-conda/bin/hhblits \
    --hhsearch_binary_path $OPENFOLDDIR/openfold-conda/bin/hhsearch \
    --cpus_per_task 8 \
    --kalign_binary_path $OPENFOLDDIR/openfold-conda/bin/kalign \
    --mmcif_cache /n/shared_db/alphafold/pdb_mmcif/mmcif_files/ \
    --raise_errors \
    /PATH/TO/INPUT.FASTA /PATH/TO/OUTPUT/DIR/
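The core count appears twice in these job scripts: once in `#SBATCH -c` and again in the value passed to OpenFold (`--cpus_per_task` here, `--cpus` for `run_pretrained_openfold.py`). A minimal sketch for keeping the two in sync is to read Slurm's `SLURM_CPUS_PER_TASK` variable, which Slurm sets inside a job that requests cores with `-c`. The helper name `job_cpus` and the fallback of 1 core are our own assumptions, not part of OpenFold or the O2 module.

```bash
# Hypothetical helper: print the number of cores Slurm allocated to this job.
# SLURM_CPUS_PER_TASK is set by Slurm when the job requests cores with -c;
# outside a job (or without -c) fall back to 1 core.
job_cpus() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Example use inside the alignment job script, in place of a hard-coded value:
#   --cpus_per_task "$(job_cpus)" \
```

With this, changing `#SBATCH -c` is the only edit needed when you want more or fewer cores.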

This script should generate a series of files in the output directory, including the MSA files. The second method uses MMseqs2 to generate the MSAs. As with the JackHMMER method, we can get help information by running:

Code Block
$ python3 $OPENFOLDDIR/openfold/scripts/precompute_alignments_mmseqs.py -h

Below is an example of a job script that uses mmseqs2 to run the alignment step.

Code Block
#!/bin/bash
#SBATCH -c 1                                # Number of cores
#SBATCH -t 0-8:00                           # Runtime in D-HH:MM format
#SBATCH -p short                            # Partition to run in
#SBATCH --mem=128G                          # Memory total (for all cores)
#SBATCH --mail-type=ALL                     # ALL email notification type
#SBATCH --mail-user=<email_address>         # Email to which notifications will be sent
#SBATCH -o %j.out                           # File to which STDOUT will be written, including job ID (%j)
#SBATCH -e %j.err                           # File to which STDERR will be written, including job ID (%j)

module load gcc/9.2.0 openfold/1.0.1 mmseqs2/14-7e284

python3 $OPENFOLDDIR/openfold/scripts/precompute_alignments_mmseqs.py /PATH/TO/INPUT.FASTA \
    /n/shared_db/misc/mmseqs2/14-7e284 \
    uniref30_2202_db \
    /PATH/TO/OUTPUT/DIR/ \
    mmseqs \
    --hhsearch_binary_path $OPENFOLDDIR/openfold-conda/bin/hhsearch \
    --env_db colabfold_envdb_202108_db \
    --pdb70 /n/shared_db/alphafold/pdb70/pdb70
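Before handing these alignments to the structure-prediction step, it can help to confirm that the output directory holds one alignment subdirectory per input sequence. The sketch below assumes, based on our reading of the OpenFold 1.x scripts (please confirm against your own output), that each subdirectory is named after the first word of the corresponding FASTA header. The function name `check_msa_dirs` is hypothetical.

```bash
# Hypothetical helper: report FASTA tags in $1 (input directory) that have no
# alignment subdirectory in $2 (MSA directory). Returns 1 if any are missing.
# Assumption: alignment subdirectories are named after the first word of
# each FASTA header line.
check_msa_dirs() {
    local fasta_dir="$1" msa_dir="$2" missing=0 f tag
    for f in "$fasta_dir"/*.fasta; do
        [ -e "$f" ] || continue    # glob matched nothing
        # Print the first word of every header line, without the leading '>'.
        for tag in $(awk '/^>/ { sub(/^>/, ""); print $1 }' "$f"); do
            if [ ! -d "$msa_dir/$tag" ]; then
                echo "missing alignments for: $tag" >&2
                missing=1
            fi
        done
    done
    return "$missing"
}

# Example:
#   check_msa_dirs /PATH/TO/INPUT/DIR /PATH/TO/MSA/DIR && echo "all alignments present"
```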

These outputs can be used in the structure-prediction step by passing --use_precomputed_alignments with the path to your MSA directory, for example:

Code Block
#!/bin/bash

#SBATCH --partition=gpu                      # Partition to run in
#SBATCH --gres=gpu:1                         # GPU resources requested
#SBATCH -c 4                                 # Requested cores
#SBATCH --time=0-8:00                        # Runtime in D-HH:MM format
#SBATCH --mem=32GB                           # Requested Memory
#SBATCH -o %j.out                            # File to which STDOUT will be written, including job ID (%j)
#SBATCH -e %j.err                            # File to which STDERR will be written, including job ID (%j)
#SBATCH --mail-type=ALL                      # ALL email notification type
#SBATCH --mail-user=<email_address>          # Email to which notifications will be sent

module load gcc/9.2.0 openfold/1.0.1

python3 $OPENFOLDDIR/openfold/run_pretrained_openfold.py \
    --skip_relaxation \
    --output_dir /PATH/TO/OUTPUT/DIR/ \
    --model_device "cuda:0" \
    --use_precomputed_alignments /PATH/TO/MSA/DIR \
    --config_preset "model_1_ptm" \
    --openfold_checkpoint_path $OPENFOLDDIR/openfold/openfold/resources/openfold_params/finetuning_ptm_2.pt \
    --cpus 4 \
    /PATH/TO/INPUT/DIR \
    /n/shared_db/alphafold/pdb_mmcif/mmcif_files/ \
    --uniref90_database_path /n/shared_db/alphafold/uniref90/uniref90.fasta \
    --mgnify_database_path /n/shared_db/alphafold/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path /n/shared_db/alphafold/pdb70/pdb70 \
    --uniclust30_database_path /n/shared_db/alphafold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --bfd_database_path /n/shared_db/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --jackhmmer_binary_path $OPENFOLDDIR/openfold-conda/bin/jackhmmer \
    --hhblits_binary_path $OPENFOLDDIR/openfold-conda/bin/hhblits \
    --hhsearch_binary_path $OPENFOLDDIR/openfold-conda/bin/hhsearch \
    --kalign_binary_path $OPENFOLDDIR/openfold-conda/bin/kalign
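Because the alignment step only needs CPUs while the prediction step needs a GPU, it can be convenient to submit them as two jobs and let Slurm start the GPU job only after the alignment job finishes successfully. A minimal sketch using the standard `sbatch --parsable` and `--dependency=afterok:<jobid>` options; the function name `submit_pipeline` and the script names `align.sh` and `fold.sh` are hypothetical placeholders for the alignment and prediction job scripts above.

```bash
# Hypothetical helper: submit the alignment job, then a prediction job that
# starts only if the alignment job exits successfully (afterok).
submit_pipeline() {
    local align_script="$1" fold_script="$2" align_id
    align_id=$(sbatch --parsable "$align_script") || return 1
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    align_id="${align_id%%;*}"
    sbatch --dependency=afterok:"$align_id" "$fold_script"
}

# Example:
#   submit_pipeline align.sh fold.sh
```

If the alignment job fails, the dependent GPU job never becomes eligible to run (you may need to cancel it with `scancel`), so no GPU time is spent on missing alignments.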

Running OpenFold

We have public databases available in /n/shared_db/ (Public Databases), and we can use these in the OpenFold run command:

Note

The input directory argument (/PATH/TO/INPUT/DIR/ in the examples below) must point to a directory that contains only .fasta files.
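One way to satisfy this requirement is to create a fresh input directory for each run and copy only your FASTA file(s) into it, so that logs, alignment outputs, or editor backups never end up there. A minimal sketch; the function name `make_input_dir` is hypothetical.

```bash
# Hypothetical helper: build a clean input directory for run_pretrained_openfold.py
# that holds only copies of the given .fasta files.
make_input_dir() {
    local dest="$1" f
    shift
    mkdir -p "$dest" || return 1
    # Refuse to reuse a directory that already contains non-.fasta entries.
    for f in "$dest"/*; do
        [ -e "$f" ] || continue    # directory is empty
        case "$f" in
            *.fasta) ;;
            *) echo "not a .fasta file: $f" >&2; return 1 ;;
        esac
    done
    # Copy only the arguments that end in .fasta.
    for f in "$@"; do
        case "$f" in
            *.fasta) cp "$f" "$dest"/ || return 1 ;;
            *) echo "skipping non-.fasta input: $f" >&2 ;;
        esac
    done
    return 0
}

# Example:
#   make_input_dir /PATH/TO/INPUT/DIR my_protein.fasta
```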

Monomer (single protein) Job Example:

Code Block
#!/bin/bash

#SBATCH --partition=gpu                      # Partition to run in
#SBATCH --gres=gpu:1                         # GPU resources requested
#SBATCH -c 4                                 # Requested cores
#SBATCH --time=0-8:00                        # Runtime in D-HH:MM format
#SBATCH --mem=32GB                           # Requested Memory
#SBATCH -o %j.out                            # File to which STDOUT will be written, including job ID (%j)
#SBATCH -e %j.err                            # File to which STDERR will be written, including job ID (%j)
#SBATCH --mail-type=ALL                      # ALL email notification type
#SBATCH --mail-user=<email_address>          # Email to which notifications will be sent

module load gcc/9.2.0 openfold/1.0.1

python3 $OPENFOLDDIR/openfold/run_pretrained_openfold.py \
    --skip_relaxation \
    --output_dir /PATH/TO/OUTPUT/DIR/ \
    --model_device "cuda:0" \
    --config_preset "model_1_ptm" \
    --openfold_checkpoint_path $OPENFOLDDIR/openfold/openfold/resources/openfold_params/finetuning_ptm_2.pt \
    --cpus 4 \
    /PATH/TO/INPUT/DIR/ \
    /n/shared_db/alphafold/pdb_mmcif/mmcif_files/ \
    --uniref90_database_path /n/shared_db/alphafold/uniref90/uniref90.fasta \
    --mgnify_database_path /n/shared_db/alphafold/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path /n/shared_db/alphafold/pdb70/pdb70 \
    --uniclust30_database_path /n/shared_db/alphafold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --bfd_database_path /n/shared_db/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --jackhmmer_binary_path $OPENFOLDDIR/openfold-conda/bin/jackhmmer \
    --hhblits_binary_path $OPENFOLDDIR/openfold-conda/bin/hhblits \
    --hhsearch_binary_path $OPENFOLDDIR/openfold-conda/bin/hhsearch \
    --kalign_binary_path $OPENFOLDDIR/openfold-conda/bin/kalign

Multimer (protein complex) Job Example:

Code Block
#!/bin/bash

#SBATCH --partition=gpu                      # Partition to run in
#SBATCH --gres=gpu:1                         # GPU resources requested
#SBATCH -c 4                                 # Requested cores
#SBATCH --time=0-8:00                        # Runtime in D-HH:MM format
#SBATCH --mem=32GB                           # Requested Memory
#SBATCH -o %j.out                            # File to which STDOUT will be written, including job ID (%j)
#SBATCH -e %j.err                            # File to which STDERR will be written, including job ID (%j)
#SBATCH --mail-type=ALL                      # ALL email notification type
#SBATCH --mail-user=<email_address>          # Email to which notifications will be sent

module load gcc/9.2.0 openfold/1.0.1

python3 $OPENFOLDDIR/openfold/run_pretrained_openfold.py \
    --skip_relaxation \
    --output_dir /PATH/TO/OUTPUT/DIR/ \
    --model_device "cuda:0" \
    --config_preset "model_1_ptm" \
    --openfold_checkpoint_path $OPENFOLDDIR/openfold/openfold/resources/openfold_params/finetuning_ptm_2.pt \
    --cpus 4 \
    --multimer_ri_gap 200 \
    /PATH/TO/INPUT/DIR/ \
    /n/shared_db/alphafold/pdb_mmcif/mmcif_files/ \
    --uniref90_database_path /n/shared_db/alphafold/uniref90/uniref90.fasta \
    --mgnify_database_path /n/shared_db/alphafold/mgnify/mgy_clusters_2018_12.fa \
    --pdb70_database_path /n/shared_db/alphafold/pdb70/pdb70 \
    --uniclust30_database_path /n/shared_db/alphafold/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
    --bfd_database_path /n/shared_db/alphafold/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --jackhmmer_binary_path $OPENFOLDDIR/openfold-conda/bin/jackhmmer \
    --hhblits_binary_path $OPENFOLDDIR/openfold-conda/bin/hhblits \
    --hhsearch_binary_path $OPENFOLDDIR/openfold-conda/bin/hhsearch \
    --kalign_binary_path $OPENFOLDDIR/openfold-conda/bin/kalign
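For the complex example, our understanding of OpenFold 1.x (please confirm with `run_pretrained_openfold.py -h` and the repository documentation) is that a single FASTA file holding several sequences is folded as one complex, with consecutive chains separated by the residue-index gap set by `--multimer_ri_gap`. A minimal sketch of preparing such an input directory; the directory name `complex_input`, the chain names, and the sequences are made-up placeholders.

```bash
# Hypothetical example: put a two-chain complex in its own input directory.
# Assumption: OpenFold uses the first word of each header as the chain tag,
# so keep the tags unique. The sequences below are placeholders.
input_dir=complex_input
mkdir -p "$input_dir"
cat > "$input_dir/complex.fasta" <<'EOF'
>chainA
MKTAYIAKQRQISFVKSHFSRQ
>chainB
GSHMLEDPVDAFQLGHKSQE
EOF

# Sanity check: count the sequences in the file (prints 2).
grep -c '^>' "$input_dir/complex.fasta"
```

Pass this directory (by its full path) as the input directory argument in the multimer job script.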

...