Tensorflow on O2
Due to recent developments in deep learning and related fields, Tensorflow and its components have been widely requested by the user community. Because of the nature of the package, however, we have decided it is best for users to manage their own installations, so that they can quickly modify or upgrade Tensorflow to suit their needs without waiting for Research Computing to handle version changes. This page therefore provides basic instructions on how to install Tensorflow, without elevated privileges, into a local directory owned by the user.
Basic Installation
Early on, Tensorflow was quite difficult to install into a shared computing environment such as O2. However, the developers have since made it far friendlier to set up for users who are not installing on their own machines. All that needs to be done is to invoke pip to complete the installation.
Tensorflow is compatible with Python 3 as of the writing of this document. In order to install it, it is strongly recommended to set up a virtual environment.
First, request an interactive session:
$ srun --pty -t 2:0:0 --mem=2G -p interactive bash
If you are planning to use Tensorflow immediately after installing it inside of the interactive session, it may be wise to increase the memory requirement when submitting the request.
Once you're on a compute node, load the prerequisite module:
$ module load gcc/9.2.0
This will expose the available python 3 modules to you. It is strongly recommended to use a version that is at least 3.8; we will use 3.9.14 in this example:
$ module load python/3.9.14
If you are planning to use GPU resources, you also need to load the (latest) cuda module. For example:
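(The CUDA version below is only illustrative; after loading gcc, run module avail to see which CUDA modules are currently provided and substitute accordingly.)
$ module load cuda/11.7    # illustrative version; use one listed by module avail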
Once you have confirmed that gcc, python, and possibly CUDA, are loaded, create a virtual environment (instructions replicated here from Personal Python Packages):
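A minimal sketch using Python's built-in venv module (the path below is a placeholder):
$ python3 -m venv /path/to/nameofenv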
where /path/to/nameofenv is your chosen name and location for the environment you'd like to install to. (/path/to should already exist.) Once the environment is created, you'll want to activate it:
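$ source /path/to/nameofenv/bin/activate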
After this, your prompt should look something like this:
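(nameofenv) [user@compute-node ~]$
(The username and hostname shown are placeholders for your own username and the node you landed on.)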
From here, you should be ready to install Tensorflow.
Recent builds of tensorflow package both CPU and GPU components.
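For example, with the environment active:
$ pip3 install --upgrade pip
$ pip3 install tensorflow
$ python3 -c "import tensorflow as tf; print(tf.__version__)"    # quick sanity check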
If you plan to use the gpu (or related GPU-enabled) partition, a couple more steps are required for you to set up your code to leverage GPUs.
Tensorflow and the gpu partition
If you installed a GPU-enabled build of Tensorflow (e.g., tensorflow-gpu), your first order of business should be to familiarize yourself with Using O2 GPU resources. That page explains how to submit jobs to the gpu partition and request GPUs for your jobs. Once you are ready to submit your job, say, model.py, you need to make sure a couple of additional resources are loaded: namely, CUDA and CuDNN. These are libraries that allow Tensorflow to interface with the GPU and leverage its capabilities. Currently, O2's GPU nodes support CUDA 9.0. Once Tensorflow supports CUDA 10.0, we will upgrade the drivers on the GPU nodes to work with it.
When you submit your job to the gpu partition, make sure GCC and the correct python module are loaded and your virtual environment is active. If not, run the corresponding commands, in this exact order:
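(using the same module versions and environment path from the installation steps above)
$ module load gcc/9.2.0
$ module load python/3.9.14
$ source /path/to/nameofenv/bin/activate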
Then, you also need to load the same CUDA module that was loaded when you installed Tensorflow, e.g.:
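$ module load cuda/11.7    # illustrative; load whichever CUDA module you used at install time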
(CuDNN is included with each of our CUDA modules.)
If you are in an interactive session (on a GPU node), you can now start running your code. If you plan to submit a batch job, place all of the above commands (with your choice of python module) into your submission script. From here, you should be all set! Make sure you fully understand the Using O2 GPU resources page before submitting to the partition, as there are a limited number of resources available, and pending time is highly variable depending on current demand.
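As a quick sanity check that your job can see the GPU(s) it was allocated, you can list the devices Tensorflow detects (tf.config.list_physical_devices is part of the Tensorflow 2.x API):
$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"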
Installation with (ana)conda
If you are familiar with the conda package manager (or the Anaconda environment system), it is also possible to install Tensorflow this way (and depending on your project requirements, this may be the best way to handle your Tensorflow installation). If you have your own ana/conda installation, skip to after the module load command.
First, create an interactive session as before:
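$ srun --pty -t 2:0:0 --mem=2G -p interactive bash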
To access the O2 conda module, simply type
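(The module name and version below are placeholders; run module avail to find the conda or miniconda module currently provided and substitute it.)
$ module load miniconda3/<version>    # placeholder; use the module shown by module avail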
Then, it is highly recommended that you create a new environment for Tensorflow (or your project that happens to use Tensorflow):
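(The environment name and Python version below are just examples.)
$ conda create --name tensorflowenv python=3.9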
Now, we activate the environment:
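$ conda activate tensorflowenv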
You should see your terminal prompt get modified with the environment name in parentheses, just as above with virtualenv. If this command fails, you may need to specify the full path to the environment (especially if you used --prefix instead of --name to create it). From here, you can either use pip as above, or install Tensorflow with conda:
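$ conda install tensorflow    # tensorflow is also available from the conda-forge channel (-c conda-forge)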
Once this completes, you should be ready to use Tensorflow! Keep in mind that all of the above stipulations about leveraging hardware resources with the virtualenv process still apply to this method of installation and usage.
Basic Troubleshooting
Depending on when you installed your copy of Tensorflow, you may see something like this when you run your code:
Loaded runtime CuDNN library: 7.0.4 but source was compiled with: 7.2.1. CuDNN ... Segmentation Fault
This is because the version of Tensorflow you installed used a newer version of CuDNN than the one that was found on the cluster. To fix this, you'll need to download a newer version of CuDNN and place it somewhere in a directory you own. You can submit a ticket with us if you'd like assistance with this issue.
Newer CUDA modules include their corresponding newer CuDNN libraries.