
...

  • Login Nodes: Servers that users connect to remotely and from which they can submit jobs to the cluster. No memory- or CPU-intensive process should ever be executed on the login nodes. On O2, CPU and memory usage on the login nodes is strictly limited, so intensive processes executed there will most likely be killed or perform very poorly.

  • Computing Nodes: Servers designed specifically to support memory- and CPU-intensive processes as well as special resources (GPUs, TiB of memory, etc.). Any job correctly submitted to the cluster is eventually dispatched by the scheduler to the first available compute node.

  • Storage Servers: A system of servers storing the data used on the cluster. These are usually accessible from both login and compute nodes.

  • Scheduler: The scheduler's main task is to manage the cluster's computing resources efficiently and to dispatch jobs to computing nodes according to their priorities while maximizing cluster utilization.
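As a concrete illustration of the workflow above, here is a minimal job-script sketch, assuming a SLURM-based scheduler (the partition name "short" and resource values are hypothetical; consult the cluster documentation for actual partitions and limits):

```shell
# Write a minimal SLURM batch script (partition name is hypothetical).
cat > myjob.sh <<'EOF'
#!/bin/bash
#SBATCH -p short          # partition (queue) name; hypothetical example
#SBATCH -c 1              # number of cores requested
#SBATCH --mem=2G          # memory requested
#SBATCH -t 0-01:00        # wall-time limit (1 hour)
echo "Running on $(hostname)"
EOF

# From a login node, submit the script; the scheduler then dispatches
# it to the first available compute node:
# sbatch myjob.sh
```

The key point is that the script is only *submitted* from the login node; the work itself runs on a compute node chosen by the scheduler.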

...

O2 currently includes 390 computing nodes, for a total of 12260 cores and ~106TiB of memory:
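Note that memory figures here use binary units: 1 GiB is 2^30 bytes, whereas 1 GB (decimal) is 10^9 bytes. A quick sketch of the difference:

```python
# GiB (binary, 2**30 bytes) vs GB (decimal, 10**9 bytes).
GIB = 2**30   # 1073741824 bytes
GB = 10**9    # 1000000000 bytes

# 256 GiB of node memory expressed in decimal GB:
node_mem_gb = 256 * GIB / GB
print(round(node_mem_gb, 1))  # 274.9
```

So a "256GiB" node holds roughly 275 GB in decimal terms; the ~7% gap is why the two unit families should not be mixed.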

  • 232 nodes, each with a hostname composed of the prefix compute-a-16- or compute-a-17- and the node number, for example compute-a-16-28, compute-a-16-29, ..., compute-a-16-171. Each node has 32 physical compute cores, 256GiB of memory, and is connected to the network with a 10Gb Ethernet card and, in addition, a 40Gb InfiniBand card.
  • 69 nodes, each with a hostname composed of the prefix compute-e-16- and the node number. Each node has 28 physical compute cores, 256GiB of memory, and is connected to the network with a 10Gb Ethernet card.
  • 17 nodes, each with a hostname composed of the prefix compute-f-16- and the node number. Each node has 20 physical compute cores, 188GiB of memory, and is connected to the network with a 10Gb Ethernet card.
  • 11 heterogeneous high-memory nodes, each with a hostname composed of the prefix compute-h-16- and the node number; 7 nodes have 750GiB of memory, 1 node has 300GiB, and the remaining nodes have 1TiB.
  • 27 GPU compute nodes, each with a hostname composed of the prefix compute-[g,gc]- and the node number, for a total of 133 GPU cards, including Tesla K80, M40, V100, V100S, and RTX 6000/8000.
  • 3 transfer nodes, each with a hostname composed of the prefix compute-t-16- and the node number. Each node is a VM with 4 cores and 6GiB of memory; these nodes are intended for data transfers to/from the /n/files filesystem.
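Since every node class above is identified by its hostname prefix, a small helper can classify a node from its name alone. This is an illustrative sketch (the function name and the mapping labels are our own, derived from the list above):

```python
# Map hostname prefixes (from the node list above) to node classes.
# Labels are illustrative summaries, not official names.
NODE_CLASSES = {
    "compute-a-": "general (32-core, 256GiB)",
    "compute-e-": "general (28-core, 256GiB)",
    "compute-f-": "general (20-core, 188GiB)",
    "compute-h-": "high-memory",
    "compute-g": "GPU",        # matches both compute-g- and compute-gc-
    "compute-t-": "transfer",
}

def node_class(hostname: str) -> str:
    """Return the node class for a hostname, or 'unknown' if unrecognized."""
    for prefix, cls in NODE_CLASSES.items():
        if hostname.startswith(prefix):
            return cls
    return "unknown"

print(node_class("compute-a-16-28"))  # general (32-core, 256GiB)
print(node_class("compute-gc-07"))    # GPU
```

A prefix scheme like this makes it easy to tell at a glance (or in a script) which hardware class a job landed on.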


Detailed Node Hardware Information

...