How to choose a partition in O2

In O2 we minimized the number of available partitions (or queues) to simplify the job submission process.

You can use the flow chart below to determine which partition is best for the jobs you plan to run.

[Flowchart: partition selection decision tree; a text version is given in the Flowchart Description section further down this page.]

Note 1: all partitions can currently be used to request interactive sessions; the interactive partition has a dedicated set of nodes and higher priority.
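As a minimal illustration (the 2-hour, 1-core, 4G values are placeholders, not requirements), an interactive session on the interactive partition can be requested like this:

```
# Request an interactive bash session on the interactive partition.
# Runtime, core count, and memory below are example values only.
srun --pty -p interactive -t 0-02:00 -c 1 --mem 4G /bin/bash
```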

Note 2: all partitions can run single- or multi-core jobs. For more information about parallel jobs in O2, read our dedicated wiki page, How To Submit Parallel Jobs in O2.
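For example, a multi-core batch submission differs from a single-core one mainly in the -c flag; the sketch below uses placeholder values and a placeholder script name:

```
#!/bin/bash
#SBATCH -p short            # partition
#SBATCH -t 0-06:00          # runtime (D-HH:MM)
#SBATCH -c 4                # number of cores on one node
#SBATCH --mem 8G            # memory for the job

# my_analysis.sh is a placeholder for your own multi-threaded command
./my_analysis.sh --threads 4
```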





Details about the available partitions are reported in the table below.

| Partition | Job Type | Priority | Max cores per job | Max runtime limit | Min runtime limit | Notes |
|---|---|---|---|---|---|---|
| interactive | interactive | 14 | 20 | 12 hours | n/a | 2 job limit, 20 core/job limit, 250GiB/job memory limit, default memory 4GB |
| short | batch & interactive | 12 | 20 | 12 hours | n/a | 20 core/job limit, 250GiB/job memory limit |
| medium | batch & interactive | 6 | 20 | 5 days | 12 hours | 20 core/job limit, 250GiB/job memory limit |
| long | batch & interactive | 4 | 20 | 30 days | 5 days | 20 core/job limit, 250GiB/job memory limit |
| mpi | batch | 12 | 640 | 5 days | n/a | invite-only; email rchelp |
| priority | batch & interactive | 14 | 20 | 30 days | n/a | limit of 2 jobs running at once, 20 core/job limit, 250GiB/job memory limit |
| transfer |  | n/a | 4 | 5 days | n/a | limit of 5 concurrently used cores per user; for transfers between O2 and /n/files (see File Transfer for more information); invite-only, email rchelp |
| gpu, gpu_quad |  | n/a | 20 | 5 days | n/a | additional limits apply to the GPU partitions; see the Using O2 GPU resources page for more details |
| gpu_mpi_quad |  | n/a | 240 | 5 days | n/a | invite-only; email rchelp |
| gpu_requeue |  | n/a | 20 | 5 days | n/a | see O2 GPU Re-Queue Partition for additional details |
| highmem |  | n/a | 16 | 5 days | n/a | invite-only; email rchelp |
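To connect the table to actual submissions: the partition corresponds to the -p flag, the runtime limits constrain -t, and the per-job core limit caps -c. The values below are illustrative only, not recommendations:

```
# A job of 12 hours or less fits the short partition.
sbatch -p short -t 0-10:00 -c 2 --mem 16G my_job.sh

# The same job requested for 3 days exceeds short's 12-hour limit,
# so it would be submitted to medium instead (runtime between 12 hours and 5 days).
sbatch -p medium -t 3-00:00 -c 2 --mem 16G my_job.sh
```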

Although Slurm interprets memory units as KiB, MiB, and GiB in the background, the valid units for the memory (--mem) argument on O2 are K, M, or G; KB, MB, and GB are accepted as equivalent alternatives.
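For instance, the following two requests ask for the same amount of memory (the script name is a placeholder):

```
# K/M/G (or KB/MB/GB) are accepted; Slurm treats them as KiB/MiB/GiB.
sbatch -p short -t 0-01:00 --mem 8G    my_job.sh
sbatch -p short -t 0-01:00 --mem 8192M my_job.sh
```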

 

Flowchart Description

Decision Pathways (a shell sketch of this logic follows the list):

  1. Does your job need access to /n/files?

    • Yes: Use the dedicated transfer cluster (transfer.rc.hms.harvard.edu) for copying/moving data.

    • No: Proceed to the next question.

  2. Does your job need more than 200GiB of memory?

    • Yes: Use the highmem partition.

    • No: Proceed to the next question.

  3. Does your job need access to GPU resources?

    • Yes: Choose from:

      • gpu

      • gpu_quad

      • gpu_requeue

      • gpu_mpi_quad

    • No: Proceed to the next question.

  4. Are your jobs MPI-jobs or capable of running on distributed memory systems?

    • Yes: Use the mpi partition.

    • No: Proceed to the next question.

  5. Are you trying to run 1 or 2 interactive jobs?

    • Yes: Use the interactive partition.

    • No: Proceed to the next question.

  6. Do you need to run only 1 or 2 high-priority jobs?

    • Yes: Use the priority partition.

    • No: Proceed to the next question.

  7. Are your jobs going to run for less than 12 hours?

    • Yes: Use the short partition.

    • No: Proceed to the next question.

  8. Are your jobs going to run for less than 5 days?

    • Yes: Use the medium partition.

    • No: Use the long partition.
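The runtime-based part of this decision pathway can be sketched as a small helper. This is only an illustration of the logic above, not an official O2 tool: it ignores the transfer, GPU, invite-only, interactive, and priority branches and assumes you already know your expected runtime in hours.

```
#!/bin/bash
# choose_partition.sh -- hypothetical helper, illustrative only.
# Usage: ./choose_partition.sh <expected_runtime_hours>
hours=$1

if (( hours <= 12 )); then
    echo "short"        # up to 12 hours
elif (( hours <= 120 )); then
    echo "medium"       # between 12 hours and 5 days (120 hours)
else
    echo "long"         # between 5 and 30 days
fi
```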

 


Partitions and Their Characteristics:

  1. “interactive”

    • Job Type: Interactive

    • Priority: 14

    • Max Cores per Job: 20

    • Max Runtime: 12 hours

    • Notes: 2 job limit, 20 cores per job, 250GiB memory limit, default memory 4GB

  2. “short”

    • Priority: 12

    • Max Cores per Job: 20

    • Max Runtime: 12 hours

    • Notes: 20 core/job limit, 250GiB memory limit

  3. “medium”

    • Priority: 6

    • Max Cores per Job: 20

    • Max Runtime: 5 days

    • Min Runtime: 12 hours

    • Notes: 20 core/job limit, 250GiB memory limit

  4. “long”

    • Priority: 4

    • Max Cores per Job: 20

    • Max Runtime: 30 days

    • Min Runtime: 5 days

    • Notes: 20 core/job limit, 250GiB memory limit

  5. “mpi”

    • Job Type: Batch

    • Priority: 12

    • Max Cores per Job: 640

    • Max Runtime: 5 days

    • Notes: Invite-only, email rchelp

  6. “priority”

    • Job Type: Batch & Interactive

    • Priority: 14

    • Max Cores per Job: 20

    • Max Runtime: 30 days

    • Notes: Limit 2 jobs running at once, 20 core/job limit, 250GiB memory limit

  7. “transfer”

    • Priority: n/a

    • Max Cores per Job: 4

    • Max Runtime: 5 days

    • Notes: Limit of 5 concurrent cores per user, for file transfers between O2 and /n/files

  8. “gpu” and “gpu_quad”

    • Priority: n/a

    • Max Cores per Job: 20

    • Max Runtime: 5 days

    • Notes: Additional limits apply, see the Using O2 GPU resources page

  9. “gpu_mpi_quad”

    • Priority: n/a

    • Max Cores per Job: 240

    • Max Runtime: 5 days

    • Notes: Invite-only, email rchelp

  10. “gpu_requeue”

    • Priority: n/a

    • Max Cores per Job: 20

    • Max Runtime: 5 days

    • Notes: See O2 GPU Re-Queue Partition for details

  11. “highmem”

    • Priority: n/a

    • Max Cores per Job: 16

    • Max Runtime: 5 days

    • Notes: Invite-only, email rchelp


Important

  • The interactive and short partitions allow jobs up to 12 hours, while medium and mpi allow jobs up to 5 days and long allows jobs up to 30 days.

  • The mpi partition supports up to 640 cores per job, making it suitable for large-scale parallel jobs (see the sketch after this list).

  • The priority partition allows long jobs but has a 2-job concurrent limit.

  • The transfer partition is dedicated to data transfer operations, with a limit of 5 concurrent cores per user.

  • GPU partitions have additional constraints and require checking separate documentation.

  • Some partitions, like mpi, gpu_mpi_quad, and highmem, are invite-only.
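For the mpi partition specifically, a distributed-memory job is typically described by a number of tasks rather than cores on a single node. The sketch below is illustrative only; the module name, task count, and binary are placeholders, and the supported setup is described on the How To Submit Parallel Jobs in O2 page.

```
#!/bin/bash
#SBATCH -p mpi              # invite-only partition for distributed-memory jobs
#SBATCH -t 2-00:00          # runtime (up to 5 days on mpi)
#SBATCH -n 128              # number of MPI tasks (mpi allows up to 640 cores per job)

# The module name and the binary are placeholders for your own MPI setup.
module load openmpi
srun ./my_mpi_app
```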

 

There is a limit on the total CPU-hours that can be reserved by a single lab at any given time. This limit was introduced to prevent a single lab from locking down a large portion of the cluster for extended periods of time. It becomes active only if multiple users in the same lab are allocating a large portion of the O2 cluster resources; this can happen, for example, if a few users have thousands of multi-day jobs or hundreds of multi-week jobs running. When this limit becomes active, the remaining pending jobs will display the message AssocGrpCPURunMinute.
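To see whether your pending jobs are being held by this limit, you can check the reason Slurm reports for them; the format string below is just one way to display it:

```
# List your jobs with their state; for pending jobs, the NODELIST(REASON)
# column shows AssocGrpCPURunMinute when they are held by the lab-wide limit.
squeue -u $USER -o "%.18i %.9P %.12j %.8T %.10M %.20R"
```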

Note: The "gpu" partition has additional limits that might trigger the above message; for more details about the "gpu" partition, please refer to the "Using O2 GPU resources" wiki page.











Related content

  • Using O2 GPU resources
  • How To Submit Parallel Jobs in O2
  • Using Slurm Basic
  • Graphical User Interface App
  • O2 Command CheatSheet
  • Optimizing O2 Jobs