Tel: +44(0)1223 422600
Email: web@cosl.co.uk

© Copyright Cambridge Online
Systems Limited



CLUSTERS

The concept of computer clusters has been established in many guises over many years. The degree of cluster sophistication varies greatly, from established proprietary OpenVMS clusters, noted in part for their maturity and high degree of fault tolerance, to “do-it-yourself” high availability clusters based on commodity hardware components. Clustering, in its simplest form, joins two or more separate systems in order to pool available system resources (CPU, memory, I/O etc.), to provide a scalable architecture for evolving capacity management, or to build in redundancy that raises availability and reduces unplanned downtime. Clusters are commonly categorised into three types, although the boundaries between them are often indistinct:

High Availability (HA) - HA typically provides a fail-safe environment through redundancies in hardware, software and middleware.

High Performance Computing (HPC) - HPC typically embraces large-scale parallel applications that aggregate processing power, memory or I/O capacity.

Logical Compute Farms (LCF) - LCFs typically consist of many identical compute nodes whose numbers can vary with demand over time, with jobs allocated through load balancing.

High Availability

Great competitive demands are being placed on corporate IT resources, as most research, product development and mission critical business applications rely heavily on the availability of computational resources, project data and business databases. Failure of IT systems can quickly cascade into an operational failure across an entire business. Moreover, servers and applications are expected to be available 24 hours a day, seven days a week, with no room for downtime.

All high availability solutions rely on some amount of built-in redundancy. At the simplest level this redundancy might involve replication of business critical data. At the other extreme, high availability involves complete duplication of the solution stack at a physically separate location. Many other high availability solutions lie between these two extremes and involve redundancies in hardware, software, storage and network components. High availability clusters usually involve two or more systems that are connected via a common interconnect (or heartbeat), share a common storage subsystem and have equal access to available resources. In the event of a component failure, a high availability cluster fails over the affected component, or a subset of the solution stack, onto one or more alternative systems.
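
As a rough illustration of heartbeat-based failover, the following Python sketch marks a node as failed when its last heartbeat is older than a timeout and moves its services onto the least-loaded surviving node. The names `Node`, `fail_over`, the service labels and the timeout value are all hypothetical and chosen for illustration; real cluster software also has to handle shared storage, fencing and split-brain conditions, none of which is modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster member (hypothetical model for illustration only)."""
    name: str
    last_heartbeat: float                 # time in seconds of the last heartbeat received
    services: list = field(default_factory=list)


def fail_over(nodes, now, timeout):
    """Move services off nodes whose heartbeat is older than `timeout` seconds.

    Returns the names of the nodes judged to have failed.
    """
    dead = [n for n in nodes if now - n.last_heartbeat > timeout]
    alive = [n for n in nodes if now - n.last_heartbeat <= timeout]
    if dead and not alive:
        raise RuntimeError("no surviving node to fail over to")
    for node in dead:
        while node.services:
            service = node.services.pop(0)
            # Restart the service on the surviving node with the fewest services.
            target = min(alive, key=lambda n: len(n.services))
            target.services.append(service)
    return [n.name for n in dead]


# Assumed scenario: node-a last reported 10 s ago, node-b 1 s ago, timeout is 5 s.
nodes = [
    Node("node-a", last_heartbeat=100.0, services=["database"]),
    Node("node-b", last_heartbeat=109.0, services=["web"]),
]
failed = fail_over(nodes, now=110.0, timeout=5.0)
```

In this scenario only `node-a` exceeds the timeout, so its `database` service is moved onto `node-b`.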

High Performance Computing
For the last three decades, supercomputer design has focused on expensive, specially designed vector computing and massively parallel symmetric multiprocessor computing platforms. Recently there has been a shift toward parallel cluster computing that uses commodity “off-the-shelf” components connected together by a high speed internal network. HPC clusters are typically deployed for parallel computing, to aggregate more processing power or effective memory for solving problems within the scientific or research and development arenas. The trend is clearly demonstrated by the number of clusters in the “TOP500 Supercomputer Sites” list of the 500 most powerful supercomputers in the world, where clustered systems are the most common high performance computer architecture.

HPC clusters are typically made up of a large number of compute nodes connected through a high speed cluster interconnect. Clusters numbering several hundred compute nodes are not uncommon. There are two dominant architectures in parallel computing, shared-memory systems and distributed-memory systems:

  • Shared Memory Systems (SMS) - SMS provide symmetric multiprocessing with a common shared memory address space. Parallel computing takes place through the use of shared data structures or application threads.
  • Distributed Memory Systems (DMS) - DMS comprise separate compute nodes that do not share memory directly; they exchange data only through message passing in software.
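
The difference between the two programming models can be sketched on a single machine in Python. This is only an analogy: both halves run as threads in one process, so the second half imitates message passing rather than using separate nodes, and a real distributed-memory cluster would use a message-passing library such as MPI over the interconnect. The variable and function names are illustrative assumptions.

```python
import queue
import threading

data = list(range(1, 101))
chunks = [data[i::4] for i in range(4)]        # split the work across four workers

# Shared-memory style: every worker updates one shared structure under a lock.
shared = {"total": 0}
lock = threading.Lock()


def shared_worker(chunk):
    partial = sum(chunk)
    with lock:                                  # protect the shared data structure
        shared["total"] += partial


threads = [threading.Thread(target=shared_worker, args=(c,)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Message-passing style: workers keep their data private and send results
# back as messages; nothing is updated in place by more than one worker.
mailbox = queue.Queue()


def message_worker(chunk):
    mailbox.put(sum(chunk))                     # "send" the partial result


threads = [threading.Thread(target=message_worker, args=(c,)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

message_total = sum(mailbox.get() for _ in chunks)   # "receive" and combine
```

Both approaches arrive at the same sum; what differs is whether workers coordinate through shared state or through explicit messages.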

HPC clusters are typically distributed memory systems that may use SMP systems as a building block. Very large application data sets may need to be distributed across the HPC cluster, and every compute node must interact with the others to move data between processors. Moving large blocks of contiguous memory requires a high bandwidth interconnect, whereas exchanging small message packets requires a low latency interconnect to accelerate parallel program execution.
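
The trade-off between the two interconnect properties can be illustrated with a simple first-order model in which the time to deliver a message is the fixed latency plus the message size divided by the bandwidth. The Python sketch below applies that model to two hypothetical interconnects; the latency and bandwidth figures are invented for illustration and do not describe any particular product.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order estimate: fixed start-up latency plus time on the wire."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Hypothetical interconnects (illustrative figures only).
low_latency = {"latency_s": 2e-6, "bandwidth_bytes_per_s": 1e9}
high_bandwidth = {"latency_s": 50e-6, "bandwidth_bytes_per_s": 10e9}

small_message = 64                      # bytes, e.g. a synchronisation message
large_block = 100 * 1024 * 1024         # bytes, e.g. a block of array data

small_on_low_latency = transfer_time(small_message, **low_latency)
small_on_high_bandwidth = transfer_time(small_message, **high_bandwidth)
large_on_low_latency = transfer_time(large_block, **low_latency)
large_on_high_bandwidth = transfer_time(large_block, **high_bandwidth)
```

Under these assumed figures the small message is delivered sooner on the low latency link, because the start-up cost dominates, while the large block is delivered sooner on the high bandwidth link, because the time on the wire dominates.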

Logical Compute Farm
Logical Compute Farms (LCF) provide a single interface to a loosely coupled set of commodity compute nodes whose number can dynamically increase or decrease in response to application demand. Common examples of Logical Compute Farms include Web server farms and Internet messaging services. Jobs are typically allocated onto the LCF through batch queues or server load balancing network switches. This kind of cluster also provides significant and transparent redundancy through the horizontal scalability of compute nodes, and so has many attributes of high availability clusters. Equally, the aggregated computational power of an LCF gives it many attributes of a high performance compute cluster.
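
A minimal Python sketch of the idea follows: a single submission interface places each job on the least-loaded node, and nodes can be added or removed while the farm is running. The class `ComputeFarm`, its methods and the node names are hypothetical, and least-loaded placement is only one of several possible policies (round-robin is another common choice).

```python
class ComputeFarm:
    """Single entry point to a pool of interchangeable nodes (illustrative only)."""

    def __init__(self, node_names):
        self.load = {name: 0 for name in node_names}   # running jobs per node

    def add_node(self, name):
        """Grow the farm; a new node starts with no jobs."""
        self.load.setdefault(name, 0)

    def remove_node(self, name):
        """Shrink the farm; returns how many jobs would need resubmitting."""
        return self.load.pop(name, 0)

    def submit(self, job):
        """Place a job on the least-loaded node (ties broken by node name)."""
        if not self.load:
            raise RuntimeError("no compute nodes available")
        node = min(self.load, key=lambda n: (self.load[n], n))
        self.load[node] += 1        # `job` is only a label in this sketch
        return node


farm = ComputeFarm(["node1", "node2"])
placements = [farm.submit(f"job{i}") for i in range(4)]

farm.add_node("node3")                  # demand rises: add capacity
after_growth = farm.submit("job4")      # the idle new node takes the next job

displaced = farm.remove_node("node1")   # a node leaves: its jobs need resubmitting
```

Because callers only ever see `submit`, nodes can join or leave without any change on the client side, which is also where the redundancy of this kind of cluster comes from.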
