Conda on O2
On O2, we encourage cluster users to install the packages and software they need. One way to install packages and manage environments is conda, which is available through our conda modules. Conda resolves dependencies by default when you install packages, which can make software easier to install. Packages that can be installed with conda include Python modules, libraries, and executable programs.
Commonly used commands, examples
| Command | Meaning |
|---|---|
| module spider conda | shows the versions of conda installed on O2 |
| module load conda/flavor/version | loads an individual conda module (replace flavor and version with an actual flavor and version) |
| conda info --envs | see available conda environments |
| conda create -n test_env | create conda environment named test_env (name the environment whatever you'd like) |
| conda create -n test_env bwa bowtie star | create conda environment, and install some packages (bwa, bowtie, and star) on the fly |
| conda activate test_env | "activate" the conda environment named test_env |
| conda deactivate | exit current conda environment |
| conda-env remove -n test_env | delete a conda environment named test_env (may also require full path if installed to nonstandard location) |
| conda search numpy | search for a package (replace numpy with the package of your choice) |
| conda install numpy | install a package; you must be within a conda environment or this command will fail (replace numpy with the package of your choice) |
Setup
To install packages on O2 using conda, you must first create a conda environment. Environments are simply directories in ~/.conda/envs/ that contain the packages you installed. You "activate" an environment to use those packages, and "deactivate" it to exit. You can have multiple environments and switch between them.
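Because environments are ordinary directories, they can be inspected with standard shell tools. The following is a minimal sketch, not an O2-provided utility: list_conda_env_dirs is a hypothetical helper name, and it assumes your environments live in the default ~/.conda/envs/ location and that each one contains the conda-meta subdirectory conda creates inside an environment.

```shell
# Hypothetical helper: print the names of conda environments found under a
# directory (default: ~/.conda/envs, the location described above).
list_conda_env_dirs() {
    env_root="${1:-$HOME/.conda/envs}"
    if [ ! -d "$env_root" ]; then
        echo "no environments directory at $env_root" >&2
        return 1
    fi
    for env_dir in "$env_root"/*/; do
        # Assumption: conda keeps its metadata in conda-meta/ inside each
        # environment, so directories without it are skipped.
        if [ -d "${env_dir}conda-meta" ]; then
            basename "$env_dir"
        fi
    done
    return 0
}
```

Running list_conda_env_dirs with no arguments checks ~/.conda/envs; pass a different directory as the first argument if your environments live elsewhere.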
First let's get into an interactive session, as installing conda packages is resource intensive and should not be done on the login nodes.
mfk8@login01:~$ srun --pty -p interactive -t 0-2 bash
Next, load the conda module:
mfk8@compute-a-01-01:~$ module load conda/miniforge3/24.11.3-0
Then the conda command will be available:
mfk8@compute-a-01-01:~$ which conda
(with the conda/miniforge3/24.11.3-0 module loaded, this will return a bash function)
Running conda info will return information about the current conda installation:
mfk8@compute-a-01-01:~$ conda info
...(snip)
base environment : /n/app/conda/miniforge3/24.11.3-0 (read only)
...(snip)
You can see available conda environments with conda info --envs. If you have not created any conda environments yet, the only listing you will see is the base environment at /n/app/conda/miniforge3/24.11.3-0 (as reported by conda info). Cluster users cannot modify it.
mfk8@compute-a-01-01:~$ conda info --envs
You can create your own environment to install packages to. You can change the environment name (specified after -n):
mfk8@compute-a-01-01:~$ conda create -n test_env
If you no longer want an environment, use conda-env remove to delete the environment and any packages installed to it:
mfk8@compute-a-01-01:~$ conda-env remove -n test_env
mfk8@compute-a-01-01:~$ conda info --envs
# test_env will no longer be listed
Basic usage
To use the conda environment, it must be activated. Note that your prompt will change:
mfk8@compute-a-01-01:~$ conda activate test_env
(test_env) mfk8@compute-a-01-01:~$
To exit an environment, run conda deactivate, and your prompt will return to normal:
(test_env) mfk8@compute-a-01-01:~$ conda deactivate
mfk8@compute-a-01-01:~$
Because you have exited the environment, packages installed in it are no longer available in your shell.
You can create as many conda environments as you need. Environments are independent (changing one environment won't affect another). They can be used for different analyses, or perhaps if you need more than one version of the same tool. You can run conda info --envs to list all of your conda environments.
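In a script or batch job there is no prompt to look at, so it can help to check the active environment explicitly. The sketch below relies on the CONDA_DEFAULT_ENV variable, which conda activate sets for the active environment (the environment name, when it was activated by name). require_conda_env is a hypothetical helper name, and my_env in the usage note is a placeholder.

```shell
# Hypothetical helper: succeed only if the named conda environment is active.
require_conda_env() {
    wanted="$1"
    # conda activate exports CONDA_DEFAULT_ENV; it is unset when no
    # environment is active.
    active="${CONDA_DEFAULT_ENV:-}"
    if [ "$active" != "$wanted" ]; then
        echo "expected conda environment '$wanted', but active is '${active:-none}'" >&2
        return 1
    fi
}
```

A script could call require_conda_env my_env || exit 1 near the top, so it stops early instead of running against the wrong set of packages.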
Usage of conda init
At some point in their interactions with conda, users may be prompted to execute conda init. Executing conda init adds a block of initialization code to the bottom of the executing user’s $HOME/.bashrc file, causing that conda distribution’s (base) environment to be initialized upon login to O2. This block also enables the use of the conda activate command.
The block looks something like this (depending on distribution):
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/n/app/miniconda3/23.1.0/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/n/app/miniconda3/23.1.0/etc/profile.d/conda.sh" ]; then
        . "/n/app/miniconda3/23.1.0/etc/profile.d/conda.sh"
    else
        export PATH="/n/app/miniconda3/23.1.0/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
Research Computing does not recommend the use of conda init on O2. There are two primary reasons for this:
This pins the specified version of conda (and all that entails) for the user at login, which can cause package dependency mismatches and errors when the user later needs a different conda or application version for other projects and environments.
Users have reported high latency when logging in to O2 while conda init blocks are present in their $HOME/.bashrc files. This is due to an interaction between the login initialization processes and Research Computing’s login node process watchdog, combined with our existing authentication procedures; this interaction can cause logins to take upwards of minutes to resolve.
As of the conda/miniforge3/24.11.3-0 module, the conda activate/deactivate commands are available upon loading the module.
Research Computing recommends users either comment out or outright delete the conda init block from their $HOME/.bashrc files entirely.
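Commenting the block out can be done by hand in a text editor, or with sed. The following is a minimal sketch, assuming the block is delimited by the same "# >>> conda initialize >>>" and "# <<< conda initialize <<<" marker lines shown above; comment_out_conda_init is a hypothetical helper name, not an O2-provided command.

```shell
# Hypothetical helper: comment out the block that 'conda init' added.
# Usage: comment_out_conda_init [file]   (defaults to ~/.bashrc)
comment_out_conda_init() {
    rc_file="${1:-$HOME/.bashrc}"
    # Keep a backup copy so the change is easy to undo.
    cp "$rc_file" "$rc_file.bak" || return 1
    # Prefix every line from the opening marker through the closing marker
    # with "# "; lines outside the block are left untouched.
    sed '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</s/^/# /' \
        "$rc_file.bak" > "$rc_file"
}
```

The backup is written to the same path with a .bak suffix. Running the helper a second time overwrites that backup with the already-edited file, so check the result before running it again.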
If a user is leveraging their own personal anaconda/miniconda distribution (i.e., not available via O2’s module system), they can choose to ignore this section at their peril. Users that would like to request assistance in maintaining access to their local distributions without the use of conda init can contact Research Computing at rchelp@hms.harvard.edu.
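For a personal distribution, one common approach is to source the distribution's etc/profile.d/conda.sh only in the sessions that need it, which is the same file the conda init block shown above falls back to. This is a sketch under assumptions and not an O2-specific recommendation: use_personal_conda is a hypothetical helper name, and the default prefix $HOME/miniconda3 is a placeholder for wherever your distribution is installed.

```shell
# Hypothetical helper: enable a personal conda distribution for the current
# shell session only, without adding anything to ~/.bashrc.
# The default prefix ($HOME/miniconda3) is an assumption; pass your own.
use_personal_conda() {
    conda_prefix="${1:-$HOME/miniconda3}"
    conda_sh="$conda_prefix/etc/profile.d/conda.sh"
    if [ ! -f "$conda_sh" ]; then
        echo "no conda.sh found under $conda_prefix" >&2
        return 1
    fi
    # Sourcing conda.sh defines the conda shell function, which is what
    # makes 'conda activate' work in the current shell.
    . "$conda_sh"
}
```

The helper would need to be defined or sourced in each session where the personal distribution is wanted, which keeps logins free of the initialization block.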
Installing Packages
To search for available versions of a package that can be installed, use conda search:
(test_env) mfk8@compute-a-01-01:~$ conda search nameofpackage
With your conda environment activated, you can install a package with conda install. Conda will handle dependencies by default. Make sure that you do not install conda packages when on a login node. Only install packages when you have requested dedicated resources beforehand (i.e., you are on a compute node in an interactive job).
(test_env) mfk8@compute-a-01-01:~$ conda install nameofpackage
Conda and Python versions
If you want to use a specific version of Python with conda (strongly recommended), you can create a conda environment and install it. For example, to create an environment using Python 3.6.5:
mfk8@compute-a-01-01:~$ conda create -n python_3.6.5 python=3.6.5
mfk8@compute-a-01-01:~$ conda activate python_3.6.5
(python_3.6.5) mfk8@compute-a-01-01:~$ which python
~/.conda/envs/python_3.6.5/bin/python
(python_3.6.5) mfk8@compute-a-01-01:~$ python --version
Python 3.6.5
A full example
mfk8@login01:~$ srun --pty -p interactive -t 0-2 bash
mfk8@compute-a-01-01:~$ module load conda/miniforge3/24.11.3-0
mfk8@compute-a-01-01:~$ module list
Currently Loaded Modules:
1) miniconda3/23.1.0 (E)
Where:
E: Experimental
mfk8@compute-a-01-01:~$ conda create -n my_env
# truncated
mfk8@compute-a-01-01:~$ conda activate my_env
# install example python package, scipy, which is available through conda:
(my_env) mfk8@compute-a-01-01:~$ conda install scipy
# truncated
# see list of packages available in this conda environment:
(my_env) mfk8@compute-a-01-01:~$ conda list
# truncated
# will report scipy in the list
# test importing scipy in python to verify it is installed correctly
(my_env) mfk8@compute-a-01-01:~$ python -c "import scipy"
(my_env) mfk8@compute-a-01-01:~$
# exit environment
(my_env) mfk8@compute-a-01-01:~$ conda deactivate
mfk8@compute-a-01-01:~$
Supported channels
The centralized conda installation, available through the conda modules, includes several channels that we support. Channels are repositories where conda looks for packages. This is done with a centralized .condarc file that contains:
conda-forge
bioconda
The order here matters, as conda pulls packages from channels according to channel "priority": the channel listed first in .condarc has the highest priority, and the channel listed last has the lowest. If the package you want to install is found in multiple channels in your .condarc, conda will default to installing the version found in the highest priority channel. See the conda documentation on channel management for more information.
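To see which channels conda will actually search, and in what order, you can ask conda for its effective configuration. The sketch below wraps two conda config options (--show channels and --show-sources) in a hypothetical helper named show_conda_channels; it assumes a conda module has been loaded and only prints a hint otherwise.

```shell
# Hypothetical helper: show the channels conda will search, in priority order.
show_conda_channels() {
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found; load a conda module first" >&2
        return 1
    fi
    # Effective channel list; the first entry has the highest priority.
    conda config --show channels
    # Which configuration files (.condarc) each setting came from.
    conda config --show-sources
}
```

The second command is useful for telling apart settings that come from the centralized .condarc and settings that come from a personal one.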
Conda-forge is a repository of recipes, which are used to build conda packages. Bioconda is a channel geared for bioinformatics packages.
If you wish, you can still maintain your own ~/.condarc file, but we may be unable to assist when using unsupported channels.
In light of recent Terms of Use controversies with the Anaconda company, we caution users against using the defaults channel, and have tried to mitigate its use in the module.
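If you maintain your own ~/.condarc, a quick text search can tell you whether it still lists the defaults channel. This is a minimal sketch: condarc_lists_defaults is a hypothetical helper name, and it is a plain pattern match on list items such as "- defaults", not a YAML parser, so unusual formatting may be missed.

```shell
# Hypothetical helper: succeed if a .condarc file lists the "defaults" channel.
# Usage: condarc_lists_defaults [file]   (defaults to ~/.condarc)
condarc_lists_defaults() {
    condarc="${1:-$HOME/.condarc}"
    # No personal .condarc means there is nothing to find.
    [ -f "$condarc" ] || return 1
    # Match a list item of the form "- defaults", allowing leading and
    # trailing whitespace.
    grep -Eq '^[[:space:]]*-[[:space:]]*defaults[[:space:]]*$' "$condarc"
}
```

For example, if condarc_lists_defaults; then echo "defaults is listed in ~/.condarc"; fi prints a reminder only when the entry is present.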
Installing difficult packages
We have created a page to document particularly difficult packages, as observed from interacting with the user community, located at https://harvardmed.atlassian.net/wiki/spaces/O2/pages/2557575188. We will update this page as we encounter more.