MATLAB Parallel jobs using the custom O2 cluster profile
It is possible to configure MATLAB so that it interacts with the SLURM scheduler. This allows MATLAB to directly submit parallel jobs to the SLURM scheduler and enables it to leverage CPU and memory resources across different nodes (distributed memory).
To do so, you first need to configure the O2 cluster profile in the MATLAB version being used, which is done by running the command configCluster
NOTE:
It is strongly recommended to use MATLAB version 2019a or later when submitting multi-node jobs (mpi partition) with the MATLAB O2 cluster profile. Earlier versions of MATLAB use a mechanism to start the MATLAB workers that is not fully compatible with our existing SLURM epilog and could cause jobs to be killed.
Setting up the O2 MATLAB Cluster Profile
>> configCluster
Must set QueueName and WallTime before submitting jobs to O2. E.g.
>> c = parcluster;
>> c.AdditionalProperties.QueueName = 'queue-name';
>> % 5 hours
>> c.AdditionalProperties.WallTime = '05:00:00';
>> c.saveProfile
Complete. Default cluster profile set to "o2 R2023a".
Your default cluster profile is now set to o2 R2023a, and you should be able to verify it by running the command parcluster:
>> parcluster
ans =
Generic Cluster
Properties:
Profile: o2 R2023a
Modified: false
Host: compute-a-16-161
NumWorkers: 100000
NumThreads: 1
JobStorageLocation: /home/abc/.matlab/3p_cluster_jobs/o2/R2023a/shared
ClusterMatlabRoot: /n/app/matlab/2023a-v2
OperatingSystem: unix
RequiresOnlineLicensing: false
PreferredPoolNumWorkers: 32
PluginScriptsLocation: /n/app/matlab/support-packages/matlab-parallel-server/scripts/IntegrationScripts/o2
AdditionalProperties: List properties
Associated Jobs:
Number Pending: 0
Number Queued: 0
Number Running: 0
Number Finished: 0
>>
Note 1: The configCluster command needs to be executed only one time.
Note 2: After running the configCluster command, the default cluster profile is set to the O2 cluster; if you want to go back and use the "local" cluster profile, you can change the default profile using the command parallel.defaultClusterProfile('local')
Note 3: Running the configCluster command sets the cluster profile only for the MATLAB version currently in use. If you later use a different version of MATLAB, you will need to run configCluster again.
Setting the submission parameters for the O2 MATLAB cluster profile
Define job submission flags for Version ≥ R2017a
In order to use the O2 MATLAB cluster profile, it is required to define at least two submission parameters: the partition to be used and the desired wall time. This can be done by assigning the properties directly to a parcluster object c, as shown in the example below:
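For example (mirroring the configCluster message above; the partition name 'short' is just an illustrative placeholder, use whichever O2 partition fits your job):
>> c = parcluster;
>> % partition (queue) the jobs will be submitted to
>> c.AdditionalProperties.QueueName = 'short';
>> % requested wall time, here 5 hours
>> c.AdditionalProperties.WallTime = '05:00:00';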
Note that the parameters you set will not be retained by default and must be re-entered if the c object is deleted. To save the submission parameters permanently, you must execute the command c.saveProfile
Important: Use --mem-per-cpu (or the property c.AdditionalProperties.MemPerCPU) instead of --mem to request a custom amount of memory when using the mpi partition. The SLURM flag --mem requests a given amount of memory per node, so unless you are enforcing a balanced distribution of tasks (MATLAB workers) per node, you might end up with too much or not enough memory on a given node, depending on how the tasks are allocated.
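For example, a minimal sketch of a per-CPU memory request (the '4G' value is an illustrative assumption; check the value format expected by your MATLAB version):
>> c = parcluster;
>> % request memory per CPU rather than per node ('4G' is an assumed example value)
>> c.AdditionalProperties.MemPerCPU = '4G';
>> c.saveProfile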
Using the O2 MATLAB Cluster Profile
Parpool() command
One way to use the O2 MATLAB cluster profile is to request a parallel pool of N_c workers with the command parpool(N_c). MATLAB will submit a SLURM job request for the requested number of cores N_c and will start the parallel pool once the requested cores are allocated. For example, requesting a parallel pool of 3 cores would look like:
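A minimal sketch of such a request (assuming the O2 profile is the default, as set by configCluster above):
>> % request a pool of 3 workers; MATLAB submits a SLURM job and waits for the allocation
>> p = parpool(3);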
Then any parallel part of a script will be executed on the parallel workers allocated with the parpool command, for example:
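For instance, a simple illustrative parfor loop (a sketch, not part of the original page) whose iterations are distributed across the pool workers:
>> parfor i = 1:10
       a(i) = i^2;   % each iteration runs on one of the allocated workers
   end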
Note 1: If you run a non-interactive parallel job using the parpool() command with the O2 cluster profile, you will actually be dispatching two jobs: first a serial job (1 core) to start your MATLAB script (i.e. matlab -nodesktop -r "my_function"), and then a second parallel job submitted directly from within MATLAB once the execution of my_function reaches the parpool() command. To avoid this double-job condition you can use the batch command described later on this page.
Note 2: The command parpool cannot be executed if MATLAB has been started on a login node. Always make sure to start interactive MATLAB sessions from within interactive jobs (on actual compute nodes instead of login nodes).
Batch Command
Similarly, it is also possible to dispatch a batch of parallel jobs directly from within MATLAB using the command batch. In the example below we will submit to the cluster the simple parallel sleep function:
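A sketch of what such a function could look like (this sleep.m listing is a reconstruction; the iteration count of 10 is an assumption):
function t = sleep(seconds)
% sleep.m - each parfor iteration pauses for "seconds" seconds;
% iterations are distributed across the available pool workers
tic
parfor i = 1:10
    pause(seconds)
end
t = toc;
end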
We then start 3 parallel jobs with 2, 5, and 10 cores, in each job running the above function with an input parameter of 20 (i.e. sleep for 20 seconds), and measure the real elapsed time:
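A sketch of those three submissions (note that 'Pool',N requests N pool workers plus one additional worker that runs the function itself):
>> c = parcluster;
>> % run sleep(20) on pools of 2, 5 and 10 workers; 1 = number of output arguments
>> j2  = batch(c, @sleep, 1, {20}, 'Pool', 2);
>> j5  = batch(c, @sleep, 1, {20}, 'Pool', 5);
>> j10 = batch(c, @sleep, 1, {20}, 'Pool', 10);
>> % wait for a job to finish and retrieve the measured elapsed time
>> wait(j2); t = fetchOutputs(j2)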
Note 1: When using the function batch to run a parallel function, you do not need to explicitly add the command parpool inside the parallel function being executed (see the sleep.m example above).
Note 2: The function batch can also be used to submit non-parallel jobs, for example:
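A minimal sketch of a single-core submission (using the built-in rand as a stand-in function):
>> c = parcluster;
>> % no 'Pool' argument: the job runs as a serial (1 core) job
>> j = batch(c, @rand, 1, {3,3});
>> wait(j); out = fetchOutputs(j);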