Longwood Cluster Status

This page captures all service outages for Longwood, including planned and unplanned events.

Everyone with a Longwood account is subscribed to the HMS email list longwood-cluster-announce, which is used to announce service outages and share other Longwood cluster information. Subscription to this list is mandatory.

Status:

Operational

Scheduled Maintenance and Current Outages:

Date

Service

Issue

2025-06-04

Maintenance is complete

Two nodes are still offline:

  • Longwood Transfer node (transfer.dgx.rc.hms.harvard.edu)

  • A Grace Hopper compute node

Previous Service Outages:

Date

Service

Issue

2025-06-02 to 2025-06-06

The Longwood High-Performance Compute Cluster will be offline

The Longwood High-Performance Compute Cluster at Massachusetts Green High-Performance Computing Center (MGHPCC) will be offline from Monday, June 2, 2025, at 9:00 AM through Friday, June 6, 2025, at 5:00 PM EDT (UTC-4) for annual facility maintenance.

2025-04-27

Longwood (Gen AI) Cluster

At approximately 1:05 AM EDT on Sunday, April 27, all non-UPS power to the MGHPCC computer room had to be shut down because of a chiller failure.
Any jobs running when the outage occurred would have been lost.
The cluster is back up and running.

2025-04-25

Longwood (Gen AI) Cluster

DGX nodes were unavailable due to configuration issues on the controllers.
The configuration has been updated, and the nodes are now accepting new jobs.

2025-04-16 to 2025-04-24

Longwood (Gen AI) Cluster

A hardware issue on the headnode caused many users to experience intermittent 'permission denied' errors.
The vendor forced a switch to the secondary headnode while the primary was rebooted. The authentication issues have been resolved.

2024-05-20 to 2024-05-25

Longwood (Gen AI) cluster will be offline from Monday, May 20, 2024, at 9:00 AM through Saturday, May 25, 2024, at 9:00 AM EDT

  • Maintenance was completed by 1:00 PM on May 24.

Complete data center power-down cycle and cluster updates.