
Wait! If you're breaking your job up into 100 pieces and running each one as a totally separate job (aka "embarrassingly parallel"), you don't need the information on this page; just run 100 sbatch commands. If you would like advice on writing a script to do those 100 sbatch submissions, or information on job arrays (one sbatch command that submits 100 separate jobs), contact Research Computing.
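
For reference, a minimal sketch of both approaches is shown below; the program name, input files, partition, and run time are placeholders:

# Submit 100 independent jobs with a simple loop
for i in $(seq 1 100); do
    sbatch -p short -t 1:00:00 --wrap="my_program input_${i}.txt"
done

# Or submit them as a single job array; each task reads its index from $SLURM_ARRAY_TASK_ID at run time
sbatch -p short -t 1:00:00 --array=1-100 --wrap='my_program input_${SLURM_ARRAY_TASK_ID}.txt'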

O2 supports:

  • shared memory parallelization on a single node, typically multithreaded, such as OpenMP

  • distributed memory parallelization across multiple nodes, typically multitask, such as MPI

  • a combination of the two.

Very Important: Submitting a parallel job will not make that job run in parallel unless the application you are running explicitly supports parallel execution. If you try to run a serial application as a parallel job, you will always waste resources and might end up with corrupted output.

...

Many standard applications now have flags to run multithreaded. Threaded applications can use multiple cores on a node, but each multithreaded process/job requires access to the same physical memory on the node and is therefore limited to running on a single node (i.e. shared memory). The most common type of multithreaded application uses the OpenMP application programming interface: http://www.openmp.org

To submit a job that uses multiple cores, you must specify the number of cores you want with the sbatch flag -c Ncores. The maximum number of cores that can be requested with this approach is 20.

The following example submits a job requesting 4 cores (all on the same node), a total of 10 GB of memory, and a WallTime of 24 hours, using the sbatch --wrap="" flag to pass the desired executable. The same job submission can be done using a script to pass Slurm's flags and the commands to be executed; see the main docs for more information.


sbatch -p medium -t 24:00:00 -c 4 --mem=10000 --wrap="your_multithreaded_application"
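
The same request can also be written as a batch script; here is a minimal sketch (the application name is a placeholder, and the OMP_NUM_THREADS line assumes the program is threaded with OpenMP):

#!/bin/bash
#SBATCH -p medium
#SBATCH -t 24:00:00
#SBATCH -c 4
#SBATCH --mem=10000

# For OpenMP programs, match the thread count to the allocated cores
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

your_multithreaded_application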

...

If your workflow does not contain any type of multithreaded or multitask parallelization but it starts multiple independent, CPU-consuming processes within the same job, you should still request an appropriate number of cores with the -c Ncores flag described above (typically 1 core for every process). For example:

sbatch -p short -t 4:00:00 -c 2 --wrap="application_1 & application_2; wait"
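
The same two-process job can also be submitted with a script; a sketch, where application_1 and application_2 stand in for your actual executables and the memory request is only illustrative:

#!/bin/bash
#SBATCH -p short
#SBATCH -t 4:00:00
#SBATCH -c 2
#SBATCH --mem=4000        # illustrative total memory for both processes

application_1 &           # start the first process in the background
application_2 &           # start the second process in the background
wait                      # wait for both processes to finish before the job exits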

...

To submit an MPI job on the O2 cluster, you first need to load the gcc/6.2.0 module and either the openmpi/2.0.1 or openmpi/3.1.0 module:

module load gcc/6.2.0
module load openmpi/2.0.1    # or: module load openmpi/3.1.0

Similarly to the previous case, to submit a multitask job you must specify the number of required tasks (i.e. cores) you want by using the flags sbatch -n Ntasks -p mpi ... . The maximum number of cores that can be requested with this approach is 640.

The following example submits an MPI job requesting 64 cores and 3 days of WallTime.

$ sbatch -n 64 -p mpi -t 3-00:00:00 --wrap="ulimit -l unlimited; mpirun -np 64 your_mpi_application"

In a similar way, it is possible to submit the same job by passing the flags and the command with a script, as in the example below:

$ sbatch myjob.sh

where myjob.sh contains:

#!/bin/bash
#SBATCH -n 64
#SBATCH -p mpi
#SBATCH -t 3-00:00:00
#SBATCH --mem-per-cpu=4000  ### Request ~4GB of memory for each core, total of ~256GB (64 x ~4GB) 

ulimit -l unlimited
mpirun -np 64 your_mpi_application

...

It is also possible to request a specific number of cores per node using the flag --ntasks-per-node. For example, the following sbatch submission requests 40 CPU cores distributed over 10 nodes (4 cores per node).

$ sbatch -n 40 --ntasks-per-node=4 --mem-per-cpu=4000 -p mpi -t 3-00:00:00 --wrap="ulimit -l unlimited; mpirun -np 40 your_mpi_application"

It is also possible to run MPI jobs on any of the other non-MPI partitions. In that case the maximum number of cores that can be requested is 20.
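
For example, a smaller MPI run on the short partition might look like the following (core count, time, and memory are illustrative):

$ sbatch -n 8 -p short -t 12:00:00 --mem-per-cpu=2000 --wrap="ulimit -l unlimited; mpirun -np 8 your_mpi_application"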


Important: The flag --mem=XYZ requests the given amount of memory XYZ on each node used, independently of the number of cores dispatched on that node. Using --mem=XYZ when submitting jobs to the mpi partition can therefore create an uneven memory allocation and consequently a long pending time and/or job failure due to insufficient memory. It is highly recommended to request the desired amount of memory on a per-core basis using the flag --mem-per-cpu= when submitting distributed memory jobs, as shown in the examples above.
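
For illustration, compare the two forms below; the memory values are hypothetical:

# Discouraged on the mpi partition: requests 8 GB on every node used,
# regardless of how many of the 64 tasks land on each node
$ sbatch -n 64 -p mpi -t 1-00:00:00 --mem=8000 --wrap="ulimit -l unlimited; mpirun -np 64 your_mpi_application"

# Recommended: 2 GB per core, so each node's allocation scales with the tasks placed on it
$ sbatch -n 64 -p mpi -t 1-00:00:00 --mem-per-cpu=2000 --wrap="ulimit -l unlimited; mpirun -np 64 your_mpi_application"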

It is also possible to combine the above flags with the flag --mincpus=<n>, which specifies the minimum number of CPU cores that must be allocated on each node. For example, the command:

$ sbatch -n 40 --mincpus=4 --mem-per-cpu=4000 -p mpi -t 3-00:00:00 --wrap="ulimit -l unlimited; mpirun -np 40 your_mpi_application"

would submit a job requesting 40 tasks (CPU cores) with a minimum of 4 CPUs per node.


Note: It is now recommended to add the command ulimit -l unlimited before the actual MPI command in order to fully enable InfiniBand (IB) connectivity.

...

A limited number of applications support a hybrid parallelization model, where the MPI approach is used to distribute a number of tasks Ntasks across different compute nodes, and each of these tasks is then capable of running as a multithreaded process using Nthreads cores. In this scenario the total number of cores used by a job is the product Ntasks x Nthreads, and cores must be allocated accordingly. The following example shows how to submit a combined MPI + OpenMP job that initiates 10 MPI tasks, where each task runs as a multithreaded application using 4 cores, for a total of 40 cores.

$ sbatch -n 10 -c 4 -p mpi -t 3-00:00:00 --wrap="ulimit -l unlimited; mpirun -np 10 your_mpi+openmp_application"
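
The same hybrid job can also be submitted with a batch script; a sketch, assuming the threaded part of the application is controlled through OpenMP and the memory request is illustrative:

#!/bin/bash
#SBATCH -n 10                   # 10 MPI tasks
#SBATCH -c 4                    # 4 cores (threads) per task
#SBATCH -p mpi
#SBATCH -t 3-00:00:00
#SBATCH --mem-per-cpu=4000      # illustrative per-core memory request

ulimit -l unlimited
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # 4 threads per MPI task
mpirun -np 10 your_mpi+openmp_application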

...