Longwood is the newest High-Performance Compute Cluster at HMS. It is located at the Massachusetts Green High Performance Computing Center.
Specifications
Longwood contains a total of 64 H100 GPUs, plus 2 Grace Hopper nodes:
...
This provides a heterogeneous environment with both Intel (DGX) and ARM (Grace Hopper) architectures. Module management is supported through LMOD, allowing easy loading of software suites like the NVIDIA NeMo deep learning toolkit and more.
How to connect
Note |
---|
The cluster is currently accessible only through the secure shell (ssh) command line, and only from the HMS network.
Two-factor authentication (DUO) is not required for logins because all connections must originate from an HMS network. Currently, the login server hostname is: login.dgx.rc.hms.harvard.edu |
...
Code Block |
---|
ssh ab123@login.dgx.rc.hms.harvard.edu |
Filesystems
/home
Max: 100GiB
/n/scratch
Created automatically
Max: 25TiB or 2.5 million files
Path: /n/scratch/users/<first_hms_id_char>/<hms_id>
/n/groups
HMS-RC will be creating group folders in the future
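The scratch path pattern listed above can be derived from an HMS ID with a small shell helper. This is a minimal sketch: the function name `scratch_path` and the example ID `ab123` are illustrative only, and it assumes `<first_hms_id_char>` is simply the first character of the HMS ID as written.

```shell
# Hypothetical helper: build the per-user scratch path from an HMS ID.
# Assumption: <first_hms_id_char> is the first character of the ID.
scratch_path() {
    hms_id="$1"
    first_char=$(printf '%s' "$hms_id" | cut -c1)
    printf '/n/scratch/users/%s/%s\n' "$first_char" "$hms_id"
}

# Example call: scratch_path ab123
```

For the ID `ab123`, the helper prints `/n/scratch/users/a/ab123`. Because the directory is created automatically, `cd "$(scratch_path ab123)"` should work once the account is active.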
Snapshots
.snapshot is a feature available on Longwood. It enables recovery of data accidentally deleted by users. Daily snapshots are kept for 14 days and weekly snapshots for 60 days.
...
Code Block |
---|
cp ~/project1/.snapshot/<select-a-snapshot-directory>/foo.txt ~/project1 |
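The restore step above can be wrapped in a small function that checks the file exists in the chosen snapshot before copying. This is a sketch, not an HMS-provided tool: the name `restore_from_snapshot` is hypothetical, and the snapshot directory name has to be taken from whatever `ls <directory>/.snapshot` actually lists.

```shell
# Hypothetical helper: copy one file out of a snapshot back into its directory.
#   $1 = directory that contains .snapshot (for example ~/project1)
#   $2 = snapshot directory name, as listed by: ls "$1/.snapshot"
#   $3 = name of the file to restore
restore_from_snapshot() {
    src="$1/.snapshot/$2/$3"
    if [ ! -e "$src" ]; then
        echo "restore_from_snapshot: $src not found" >&2
        return 1
    fi
    # -p preserves the original timestamps and permissions
    cp -p "$src" "$1/"
}

# Example call (snapshot name is a placeholder):
#   restore_from_snapshot ~/project1 <select-a-snapshot-directory> foo.txt
```

Note that the copy overwrites any current file of the same name in the target directory, so it may be worth renaming the existing file first.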
Scheduler
MGC uses Slurm
The slurm/23.02.7 module is loaded by default and is required to submit jobs
Software and Tools
Several popular tools are available as modules. Use the module -t spider command for a list of all modules. Modules are available in two stacks, one tailored for each architecture:
Intel: module load dgx
ARM: module load grace
Modules automatically loaded: DefaultModules and slurm
NVIDIA NeMo™ and BioNeMo™ are available on Longwood
Users can also install additional custom tools locally
Singularity Containers are also supported
Containers are located at /n/app/containers/
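Since the two module stacks are tied to the node architecture, the right stack can be chosen from the output of `uname -m`. The sketch below assumes the DGX nodes report `x86_64` and the Grace Hopper nodes report `aarch64`, which is the usual Linux naming; the function name `stack_for_arch` is hypothetical.

```shell
# Hypothetical helper: map a machine architecture (as printed by `uname -m`)
# to the module stack named in this section.
stack_for_arch() {
    case "$1" in
        x86_64)        echo dgx ;;
        aarch64|arm64) echo grace ;;
        *)             echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# On a cluster node one might then run (assumes the Lmod `module` command):
#   module load "$(stack_for_arch "$(uname -m)")"
```

This can be handy in job scripts or shell startup files that are shared between both kinds of nodes.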
Partitions
gpu_dgx - the standard partition
gpu_grace - targets the Grace Hopper nodes. Jobs on this partition need software compiled for ARM
TimeLimit is up to 5 days for both partitions
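To illustrate how the partitions and module stacks fit together, the sketch below writes a minimal Slurm batch script for a chosen partition. Several details are assumptions rather than Longwood policy: the helper name `write_job_script`, the one-hour time request, the `--gres=gpu:1` request (the GPU resource name configured on the cluster may differ), and `nvidia-smi` as a placeholder workload.

```shell
# Hypothetical helper: write a minimal Slurm batch script for one partition.
#   $1 = partition (gpu_dgx or gpu_grace), $2 = output file
write_job_script() {
    case "$1" in
        gpu_dgx)   stack=dgx ;;    # Intel stack
        gpu_grace) stack=grace ;;  # ARM stack
        *) echo "unknown partition: $1" >&2; return 1 ;;
    esac
    # Time format is days-hours:minutes:seconds; the limit is up to 5 days.
    cat > "$2" <<EOF
#!/bin/bash
#SBATCH --partition=$1
#SBATCH --time=0-01:00:00
#SBATCH --gres=gpu:1
module load $stack
nvidia-smi
EOF
}

# Example: write_job_script gpu_dgx job.sh, then submit with: sbatch job.sh
```

Generating the script from the partition name keeps the module stack and the target architecture in step, which avoids loading the Intel stack on an ARM node by mistake.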
Info |
---|
At this time it is possible to submit a job to either |
Questions?
Contact Research Computing for support with software and tools - rchelp@hms.harvard.edu
...