Available Software
Introduction
Installed software is available via the Lmod environment module system. We maintain two distinct sets of applications, as Intel and ARM installations are generally incompatible with each other. Please keep this in mind when submitting jobs to one architecture or the other.
There are two sets of modules available, which can be identified via module avail. The modules located under /cm are applications provided by the NVIDIA DGX repositories; HMS Research Computing was not responsible for installing these applications. However, users may request additional applications from the NVIDIA repositories; newly added applications will be listed under the /cm path headings (i.e., /cm/local or /cm/shared).
Software installed by Research Computing is organized under the /n/lmod header, with the entry point being /n/lmod/architecture. Depending on the target architecture jobs are being submitted to, users should first load either dgx (DGX/Intel) or grace (Grace Hopper/ARM).
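As a minimal sketch of this workflow (the module names are as documented above):

```
# Load exactly one architecture module before loading anything from /n/lmod:
module load dgx      # targeting DGX (Intel) nodes / gpu_dgx partition
# module load grace  # targeting Grace Hopper (ARM) nodes / gpu_grace partition

# The Research Computing modules for that architecture are now visible:
module avail
```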
Users wishing to combine NVIDIA and Research Computing offerings (or even use NVIDIA offerings alone) should exercise caution, as the NVIDIA-provided modules have no built-in hierarchy or dependency resolution. This means it is possible to load multiple versions of the same application, which may leave the environment in an indeterminate state.
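One defensive habit, sketched below with standard Lmod commands: inspect what is loaded before mixing module sources, and reset if anything looks inconsistent.

```
# Show everything currently loaded:
module list

# If two versions of the same application appear, reset and start over:
module purge
module load grace   # or dgx, depending on your target architecture
```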
Research Computing-installed software cannot actually be run from the login nodes (login0X in your terminal prompt by default). The modules are visible on the login nodes so that users can prepare environment modifications in advance of submitting jobs, but to actually use the installed applications, users must first be on the appropriate compute node (e.g., by requesting an interactive session or submitting a batch job). NVIDIA-provided software can be run from the login nodes, but we strongly recommend accessing applications from a compute node, where far more resources are available.
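For illustration, one common way onto a compute node is an interactive Slurm session. The partition names below come from this page; the time value is purely an example, and your site may require additional options (e.g., an account):

```
# Hypothetical example: request an interactive shell on a Grace Hopper node
srun --partition=gpu_grace --time=1:00:00 --pty /bin/bash

# Once on the compute node, load the matching architecture tree:
module load grace
module avail
```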
One final note: the login nodes present as Intel, so the modules provided by NVIDIA there will be DGX/Intel-based applications. This means that to get an accurate ARM-based module list, users MUST make sure they are on a Grace Hopper node (gh0X) before running module avail.
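A quick way to confirm which architecture you are currently on (uname is standard on Linux; the node hostnames are this cluster's convention):

```
# Prints x86_64 on the login and DGX nodes, aarch64 on Grace Hopper nodes:
uname -m
```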
dgx: Intel-based Modules
This software is only compatible with the DGX compute nodes (dgx0X), reached by submitting to the gpu_dgx partition.
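A minimal batch-script sketch targeting this partition (only the partition name and module names come from this page; the other directives are illustrative and may need site-specific additions):

```
#!/bin/bash
#SBATCH --partition=gpu_dgx   # DGX (Intel) nodes
#SBATCH --time=00:10:00       # illustrative value

module load dgx               # select the Intel module tree
module load gcc/13.2.0        # version from the snapshot below
gcc --version
```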
Recall that modules under /cm headers are NVIDIA-provided, while modules under /n/lmod are provided by Research Computing.
A snapshot of module avail after having loaded the dgx module is as follows (current as of 4/24/2024):
```
--------------------------- /n/lmod/dgx/Core ---------------------------
gcc/13.2.0
-------------------------- /n/lmod/dgx/Linux ---------------------------
R/4.3.3 miniconda3/24.1.2
------------------------ /cm/local/modulefiles -------------------------
apptainer/1.1.9 freeipmi/1.6.10 module-info
boost/1.81.0 gcc/13.1.0 null
cluster-tools/10.0 ipmitool/1.8.19 openldap
cm-bios-tools lua/5.4.6 python3
cmd luajit python39
cmjob mariadb-libs shared (L)
dot module-git slurm/slurm/23.02.7 (L)
------------------------ /usr/share/modulefiles ------------------------
DefaultModules (L)
------------------------ /cm/shared/modulefiles ------------------------
blacs/openmpi/gcc/64/1.1patch03 hdf5_18/1.8.21
blas/gcc/64/3.11.0 hwloc/1.11.13
bonnie++/2.00a hwloc2/2.8.0
cm-pmix3/3.1.7 iozone/3.494
cm-pmix4/4.1.3 jupyter-eg-kernel-wlm-py39/3.0.2
cuda11.8/blas/11.8.0 jupyter/15.1.2
cuda11.8/fft/11.8.0 lapack/gcc/64/3.11.0
cuda11.8/toolkit/11.8.0 mpich/ge/gcc/64/4.1.1
cuda12.1/blas/12.1.1 mvapich2/gcc/64/2.3.7
cuda12.1/fft/12.1.1 netcdf/gcc/64/gcc/64/4.9.2
cuda12.1/toolkit/12.1.1 netperf/2.7.0
cuda12.3/blas/12.3.1 nvhpc-byo-compiler/23.11
cuda12.3/fft/12.3.1 nvhpc-hpcx-cuda11/23.11
cuda12.3/toolkit/12.3.1 nvhpc-hpcx-cuda12/23.11
cudnn8.6-cuda11.8/8.6.0.163 nvhpc-hpcx/23.11
cudnn8.9-cuda12.1/8.9.6.50 nvhpc-nompi/23.11
default-environment nvhpc-openmpi3/23.11
fftw3/openmpi/gcc/64/3.3.10 nvhpc/23.11
gcc12/12.2.0 openblas/dynamic/0.3.18
gdb/13.1 openmpi/gcc/64/4.1.5
git/2.40.0 openmpi4/gcc/4.1.5
globalarrays/openmpi/gcc/64/5.8 ucx/1.10.1
hdf5/1.14.0
------------------------- /n/lmod/architecture -------------------------
dgx (L) grace
Where:
L: Module is loaded
Module defaults are chosen based on Find First Rules due to Name/Version/Version modules found in the module tree.
See https://lmod.readthedocs.io/en/latest/060_locating.html for details.
Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules
   matching any of the "keys".
```
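As a usage sketch based on the snapshot above (versions taken from the listing):

```
module load dgx
module load gcc/13.2.0 R/4.3.3   # Research Computing builds under /n/lmod/dgx
```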
grace: ARM-based Modules
This software is only compatible with the Grace Hopper compute nodes (gh0X), reached by submitting to the gpu_grace partition.
Recall that modules under /cm headers are NVIDIA-provided, while modules under /n/lmod are provided by Research Computing.
A snapshot of module avail after having loaded the grace module is as follows (current as of 4/24/2024):
```
-------------------------- /n/lmod/grace/Core --------------------------
gcc/13.2.0
------------------------- /n/lmod/grace/Linux --------------------------
miniconda3/24.1.2
------------------------- /n/lmod/architecture -------------------------
dgx grace (L)
------------------------ /cm/local/modulefiles -------------------------
apptainer/1.1.9 dot null
boost/1.81.0 freeipmi/1.6.10 openldap
cluster-tools/10.0 gcc/13.1.0 python3
cm-bios-tools ipmitool/1.8.19 python311
cmake/3.26.3 lua/5.4.6 python39
cmd mariadb-libs shared
cmjob module-git slurm/slurm/23.02.7
cuda-dcgm/3.1.8.1 module-info
------------------------ /cm/shared/modulefiles ------------------------
cm-pmix3/3.1.7 hwloc2/2.8.0
cm-pmix4/4.1.3 jupyter-eg-kernel-wlm-py39/3.0.2
cuda11.8/blas/11.8.0 jupyter/15.1.2
cuda11.8/fft/11.8.0 lapack/gcc/64/3.11.0
cuda11.8/toolkit/11.8.0 mvapich2/gcc/64/2.3.7
cuda12.1/blas/12.1.1 nvhpc-byo-compiler/23.11
cuda12.1/fft/12.1.1 nvhpc-hpcx-cuda11/23.11
cuda12.1/toolkit/12.1.1 nvhpc-hpcx-cuda12/23.11
cuda12.3/blas/12.3.1 nvhpc-hpcx/23.11
cuda12.3/fft/12.3.1 nvhpc-nompi/23.11
cuda12.3/toolkit/12.3.1 nvhpc-openmpi3/23.11
gcc12/12.2.0 nvhpc/23.11
hdf5/1.14.0 ucx/1.10.1
hwloc/1.11.13
------------------------ /usr/share/modulefiles ------------------------
DefaultModules (L)
Where:
L: Module is loaded
Module defaults are chosen based on Find First Rules due to Name/Version/Version modules found in the module tree.
See https://lmod.readthedocs.io/en/latest/060_locating.html for details.
Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules
   matching any of the "keys".
```
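And a matching usage sketch for the ARM tree (version taken from the snapshot above; remember to run this on a gh0X node):

```
module load grace
module load miniconda3/24.1.2   # Research Computing build under /n/lmod/grace
```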