RoseTTAFold All-Atom on O2
HMS IT has received numerous requests for RoseTTAFold All-Atom to be installed as a global module, accessible via the module load system. Unfortunately, because of the way this application is configured and its general unfriendliness to shared environments, we are unable to offer it at this time.
Below is an outline for how to install RoseTTAFold All-Atom locally (i.e., into a home directory or group directory). We can make no guarantees as to runtime correctness (or even that it will run at all), but these instructions are confirmed to result in a “complete” installation of RoseTTAFold All-Atom. Any errors related to post-installation configuration may need to be raised with the developers, but users are welcome to contact rchelp@hms.harvard.edu to initially triage any issues with the process.
This document is designed to be read and followed from top to bottom. If you skip around, troubleshooting may become difficult. If you get stuck midway, we recommend deleting everything, starting over from the very beginning of the process, and checking whether the error persists. (The Table of Contents above is provided as a courtesy.)
Prerequisites to the prerequisites
These instructions are adapted from those provided via the RoseTTAFold All-Atom README file.
First, determine where to install RoseTTAFold All-Atom. This location will hold two main components:
- the conda environment
- a git clone of the repository
We will use /path/to/install as a placeholder for this location. First, create (if necessary) and navigate to this directory:
mkdir -p /path/to/install
cd /path/to/install
From here, clone the repository and navigate into it:
git clone https://github.com/baker-laboratory/RoseTTAFold-All-Atom.git
cd RoseTTAFold-All-Atom
Now, make sure you have a conda distribution available, with mamba installed. We will use our miniconda3/23.1.0 module here.
module load miniconda3/23.1.0
Note that if you have additional modules loaded (such as modules that load on login), you may encounter conflicts and errors, so we recommend running module purge to unload all modules before loading the miniconda module.
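Before moving on, it can help to confirm that the tools this guide relies on are actually visible in your shell. The following is a minimal sketch, not part of the official instructions; check_tools is a hypothetical helper name, and the tool list in the usage comment is our assumption about what the later steps need.

```shell
# Hypothetical helper: report which of the named tools are visible in this shell.
# Returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage on O2, after "module purge" and "module load miniconda3/23.1.0":
#   check_tools git conda mamba
```

If anything is reported as missing, revisit the module purge and module load steps before continuing.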
Installation of the conda environment
You should currently be in /path/to/install/RoseTTAFold-All-Atom.
Before creating the environment, open the environment.yaml file in a text editor, such as nano, and comment out all instances of tensorflow. This includes:
- tensorflow-base
- tensorflow-estimator
- tensorflow
Navigate to the lines in the file where these three packages are specified, insert a # at the front of each line, then save and exit the text editor.
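If you prefer not to edit the file by hand, the same change can be sketched with sed. This is an illustration under assumptions: comment_out_tensorflow is a hypothetical helper name, and the pattern assumes the packages appear as YAML list items of the form "- tensorflow...". Check the result against your copy of environment.yaml before creating the environment.

```shell
# Hypothetical helper: comment out every dependency line that starts with
# "- tensorflow" (this covers tensorflow, tensorflow-base, tensorflow-estimator).
# Assumption: dependencies are written as YAML list items, one per line.
comment_out_tensorflow() {
    yaml_file="$1"
    # Keep a backup of the original file next to it.
    cp "$yaml_file" "$yaml_file.bak"
    # Prefix each matching line with '#', leaving all other lines unchanged.
    sed -E 's/^([[:space:]]*-[[:space:]]*tensorflow)/#\1/' "$yaml_file.bak" > "$yaml_file"
}

# Example usage, from inside the cloned repository:
#   comment_out_tensorflow environment.yaml
#   grep -n tensorflow environment.yaml   # every hit should now start with '#'
```

The original file is preserved as environment.yaml.bak, so the change is easy to undo.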
Now, with access to a mamba-enabled conda distribution, run the following command:
mamba env create -f environment.yaml
By default, this creates a conda environment named RFAA in $HOME/.conda/envs/RFAA. If you would rather the environment live elsewhere (such as /path/to/install), you can run this command instead:
mamba env create -p /path/to/install/RFAA -f environment.yaml
This creates the conda environment at /path/to/install/RFAA instead.
Activate the environment. If you installed to /path/to/install (via the -p flag), run:
source activate /path/to/install/RFAA
If you installed to the default location (using the first command), run:
source activate RFAA
Next, manually pip3 install the tensorflow components that were commented out of the environment.yaml file. At the time of writing, these packages are at version 2.11.0, so that’s what we install:
pip3 install tensorflow==2.11.0 tensorflow-estimator==2.11.0
You may see that they’ve already been installed (along with the requisite dependencies). You can confirm this by running the following:
python3 -c "from tensorflow import estimator"
This command verifies that 1) tensorflow is installed, and 2) the estimator module is accessible. tensorflow-base is just an Anaconda-specific method of packaging only the primary components of tensorflow; since we installed tensorflow via pip3 instead, we don’t need to worry about it.
Installation of the prerequisites
Now we install the separate dependencies that are specified in the README.md file.
We follow the example (at the time of writing) and grab the “fast” variant of the SignalP 6.0 tarball at this location:
https://services.healthtech.dtu.dk/services/SignalP-6.0/
(Click “Downloads”, click “Fast”, then accept the license to download the package.)
Move this package onto O2, into the /path/to/install location, using whichever file transfer method you are familiar with. We have a page that outlines popular methods of file transfer. In the terminal session, navigate back to this folder:
cd ..
(If you have been doing other things in this terminal, you may need to run cd /path/to/install explicitly.)
As instructed, you should have a file here that is called something like signalp-6.0h.fast.tar.gz. Run the following command:
signalp-register signalp-6.0h.fast.tar.gz
Then rename the weights:
mv $CONDA_PREFIX/lib/python3.10/site-packages/signalp/model_weights/distilled_model_signalp6.pt $CONDA_PREFIX/lib/python3.10/site-packages/signalp/model_weights/ensemble_model_signalp6.pt
Finally, you can run the following:
bash RoseTTAFold-All-Atom/install_dependencies.sh
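If you end up repeating these steps, note that the mv command used to rename the SignalP weights fails on a second run because the source file no longer exists. A more forgiving variant might look like the sketch below. rename_signalp_weights is a hypothetical helper name, and the directory in the usage comment is simply the one from the mv command in this guide.

```shell
# Hypothetical helper: rename the distilled SignalP weights to the ensemble
# file name, tolerating the case where the rename was already done.
# $1 is the signalp model_weights directory.
rename_signalp_weights() {
    weights_dir="$1"
    src="$weights_dir/distilled_model_signalp6.pt"
    dst="$weights_dir/ensemble_model_signalp6.pt"
    if [ -f "$dst" ]; then
        echo "already renamed: $dst"
    elif [ -f "$src" ]; then
        mv "$src" "$dst" && echo "renamed to: $dst"
    else
        echo "not found: $src" >&2
        return 1
    fi
}

# Example usage, with the RFAA environment activated:
#   rename_signalp_weights "$CONDA_PREFIX/lib/python3.10/site-packages/signalp/model_weights"
```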
The model weights are already available centrally at:
/n/shared_db/RoseTTAFold/All-Atom_weights
You may choose to download them yourself anyway, with the following command:
wget http://files.ipd.uw.edu/pub/RF-All-Atom/weights/RFAA_paper_weights.pt
See the Databases section regarding databases.
Finally, follow the instructions to install the specified version of BLAST:
wget https://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/blast-2.2.26-x64-linux.tar.gz
mkdir -p blast-2.2.26
tar -xf blast-2.2.26-x64-linux.tar.gz -C blast-2.2.26
cp -r blast-2.2.26/blast-2.2.26/ blast-2.2.26_bk
rm -r blast-2.2.26
mv blast-2.2.26_bk/ blast-2.2.26
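The cp, rm, and mv steps above exist only to remove the extra blast-2.2.26 directory level that the tarball creates when unpacked. As an alternative sketch (not what the upstream README does), tar can drop that top-level directory during extraction with --strip-components. extract_flat is a hypothetical helper name.

```shell
# Hypothetical helper: extract a tarball whose contents sit under a single
# top-level directory, dropping that directory so files land directly in $2.
extract_flat() {
    tarball="$1"
    dest="$2"
    mkdir -p "$dest"
    # --strip-components=1 removes the first path element of every entry.
    tar -xf "$tarball" -C "$dest" --strip-components=1
}

# Example usage, equivalent in effect to the mkdir/tar/cp/rm/mv sequence above:
#   extract_flat blast-2.2.26-x64-linux.tar.gz blast-2.2.26
```

Either approach should leave the BLAST files directly inside /path/to/install/blast-2.2.26.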
At this point, /path/to/install should have several directories/files:
- the conda environment (RFAA), if installed using the -p flag (otherwise, this lives at $HOME/.conda/envs/RFAA instead)
- the git repository (RoseTTAFold-All-Atom)
- a cs-blast-2.2.3 directory, created by install_dependencies.sh above
- a blast-2.2.26 directory, created by following the instructions above
- RFAA_paper_weights.pt (if you chose to download this yourself instead of using the one in /n/shared_db)
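The layout listed above can be checked quickly from the command line. The following is a minimal sketch; check_layout is a hypothetical helper name, and it only tests the three directories that should always be present, since the conda environment and the weights file are optional at this location.

```shell
# Hypothetical helper: confirm that the directories this guide expects exist
# under the installation root given as $1. Returns non-zero if any is missing.
check_layout() {
    root="$1"
    status=0
    for item in RoseTTAFold-All-Atom cs-blast-2.2.3 blast-2.2.26; do
        if [ -d "$root/$item" ]; then
            echo "ok:      $item"
        else
            echo "missing: $item"
            status=1
        fi
    done
    return "$status"
}

# Example usage:
#   check_layout /path/to/install
```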
Databases
Databases do not need to be manually installed - they are accessible at:
/n/shared_db/RoseTTAFold
Feel free to browse this folder to confirm specific database types.
Post-installation configuration
HMS IT will not be able to offer much guidance here - configuration details will be dependent on the user workflow and data to be processed.
We can provide some suggestions for defaults, however. The main configuration files are located at /path/to/install/RoseTTAFold-All-Atom/rf2aa/config/inference.
We provide the following suggested parameters for the base.yaml file specifically:
checkpoint_path: RFAA_paper_weights.pt
should become:
checkpoint_path: "/path/to/install/RFAA_paper_weights.pt"
if downloaded locally, or
checkpoint_path: "/n/shared_db/RoseTTAFold/All-Atom_weights/RFAA_paper_weights.pt"
if using the central file.
sequencedb:
should point to the appropriate database in /n/shared_db/RoseTTAFold.
hhdb: "pdb100_2021Mar03/pdb100_2021Mar03"
should become:
hhdb: "/n/shared_db/RoseTTAFold/pdb100_2021Mar03/pdb100_2021Mar03"
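The checkpoint_path and hhdb edits above can also be applied with sed rather than by hand. This is a sketch under assumptions: set_shared_paths is a hypothetical helper name, the pattern assumes each key sits on its own line (at any indentation), and it points checkpoint_path at the central weights file. sequencedb is deliberately left untouched, because the right value depends on which database you need.

```shell
# Hypothetical helper: point checkpoint_path and hhdb in a base.yaml-style
# file at the shared copies under /n/shared_db/RoseTTAFold.
# Assumption: each key appears on its own line as "key: value".
set_shared_paths() {
    cfg="$1"
    shared="/n/shared_db/RoseTTAFold"
    # Keep a backup of the original configuration.
    cp "$cfg" "$cfg.bak"
    sed -E \
        -e "s|^([[:space:]]*checkpoint_path:).*|\1 \"$shared/All-Atom_weights/RFAA_paper_weights.pt\"|" \
        -e "s|^([[:space:]]*hhdb:).*|\1 \"$shared/pdb100_2021Mar03/pdb100_2021Mar03\"|" \
        "$cfg.bak" > "$cfg"
}

# Example usage:
#   set_shared_paths /path/to/install/RoseTTAFold-All-Atom/rf2aa/config/inference/base.yaml
```

Compare the edited file against base.yaml.bak afterwards to confirm that only the intended lines changed.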
It should be noted that the default working directory for invoking the All-Atom workflow appears to be the root of the GitHub repository (that is, /path/to/install/RoseTTAFold-All-Atom). The other configuration files may need to be modified accordingly. The above changes are necessary because the default instructions set up everything inside this root level, whereas this guide installs the dependencies one level up, in /path/to/install. If you would rather follow the default layout, you may cd directly into this folder before executing the commands specified in the Installation of the prerequisites section. This will result in the following directories and files existing within /path/to/install/RoseTTAFold-All-Atom instead of at /path/to/install as specified previously:
- the cs-blast-2.2.3 directory
- the blast-2.2.26 directory
- RFAA_paper_weights.pt
The databases would live here as well, but due to the large storage footprint required, we strongly recommend using the files living at /n/shared_db/RoseTTAFold instead.
Execution
Note the specifications in base.yaml for num_cpus and mem; these will dictate the SLURM resources requested for O2 jobs. You may wish to change these values accordingly.
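To see the current values before submitting anything, a quick grep is usually enough. The sketch below assumes that num_cpus and mem each appear on their own line in the configuration file; show_resources is a hypothetical helper name.

```shell
# Hypothetical helper: print the num_cpus and mem lines (with line numbers)
# from the configuration file given as $1.
show_resources() {
    grep -nE '^[[:space:]]*(num_cpus|mem):' "$1"
}

# Example usage:
#   show_resources /path/to/install/RoseTTAFold-All-Atom/rf2aa/config/inference/base.yaml
```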
At this point, there are any number of ways for things to go horribly wrong; please contact rchelp@hms.harvard.edu with your questions (and please provide terminal output as well as installation locations). We can attempt to assist, but may ultimately point you toward creating an issue on the GitHub repository.