Tuesday, 2 December 2014

Upgrades to Iceberg cluster

The latest upgrades to the Iceberg cluster have seen the introduction of more Intel-based nodes, the addition of graphics processing units (GPUs) for cutting-edge acceleration of computations, increased capacity for the fastdata parallel file system, and an expansion of the fast InfiniBand interconnect between all of the compute nodes.

The upgrade was a major piece of work. A number of research groups have purchased equipment, which has been added to the cluster. One consequence of this was that we had used all of the available capacity of our fast InfiniBand network, so we had to reconfigure the network before further hardware could be added to the cluster.

The campus HPC framework agreement with Dell has been running for four years and allows any group or department on campus to purchase high-performance computing equipment from Dell. The agreement comes to an end in June 2015, and the final piece of work to upgrade the Iceberg facility needed to be completed by that date. An upgrade is normally completed every two years.

Before the upgrade the cluster occupied four racks; the work has added a further four cabinets. The new updates include:
  • The fastdata file store has been increased in capacity from 80 TB to 260 TB.
  • The Infiniband network has been reconfigured and expanded to allow for the possibility of adding further servers to the cluster as requested by research groups.
  • The addition of 96 compute nodes using the Intel Xeon ‘Ivy Bridge’ architecture.
  • The addition of 8 NVIDIA GPUs based on the Kepler architecture; each GPU has 12 GB of graphics memory.
The expansion of fastdata provides both an increase in performance and a very large temporary storage area. Users of the HPC service generate a lot of temporary data, which we can’t keep on the main data storage system; a special area called fastdata allows this temporary data to be stored. Files are normally deleted after 90 days, but capacity was exhausted at the start of the year, so the retention period had to be reduced to 60 days. The upgrade has expanded fastdata to 260 TB, which should be sufficient for the next two years.
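To illustrate the kind of retention policy described above, the sketch below shows one simple way a scratch-area purge might work. This is a hypothetical example, not the actual script used on Iceberg; the function name and the modification-time criterion are assumptions for illustration.

```python
# Illustrative sketch only (not the Iceberg cleanup script): remove files
# in a scratch area whose modification time is older than a retention window.
import os
import time

def purge_older_than(root, days):
    """Delete files under `root` not modified for more than `days` days.

    Returns the list of paths that were removed.
    """
    cutoff = time.time() - days * 86400  # retention window in seconds
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```

Dropping the retention period from 90 to 60 days is then a one-argument change, e.g. `purge_older_than("/fastdata", 60)`.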

A benefit of the additional compute nodes is that they cater for research groups that need to run larger problems using more memory. With the new Intel Ivy Bridge nodes we have increased available memory from 24 GB per node to 64 GB per node, and a number of nodes have 256 GB of memory, giving 16 GB per core.

The research data storage on Iceberg is also in the process of being upgraded. Although the old hardware had run well for many years, it was past its sell-by date and in need of refreshing.

The older AMD-based compute nodes date back to 2008 and will soon be taken out of service. Although these nodes are still used for some high-throughput tasks, they are less power efficient, they take up space in the data centre, and they are now unsupported.

All the hard work carried out by the team on these upgrades will ensure that researchers at the University can continue to undertake research projects with intensive computing requirements.