Linux Cluster
Introduction

ECMWF installed a Linux Cluster in 2004 to establish whether this type of system could be employed as a general-purpose server, replacing the IBM Nighthawk-based servers. By providing balanced interactive login together with a suitable shared filesystem, a cluster of small Linux servers can be made to resemble a single large system.

Configuration of the Linux Cluster

(Image: Linux Networx Cluster)

The Linux Cluster comprises 32 nodes, each with 2 AMD Opteron processors and 4 GB of memory. The Cluster, which was bought from Linux Networx (www.linuxnetworx.com) in spring 2004, currently runs SUSE 9.0 but is being upgraded to SLES (SUSE Linux Enterprise Server) 9.2 at the time of writing (March 2006). It includes an InfiniBand high-speed, low-latency network for message passing, which can be used by programs using MPI. It also has a separate "master node", which is used to configure, boot and manage the cluster. Since cluster nodes cannot be booted without a master node, the system also includes a backup master node.

The Cluster has two separate disk storage subsystems, the FAStT600 and the Panasas Storage.
The FAStT600 is used for serving scratch filesystems which need to be quota-controlled. These filesystems are made available via NFS from dedicated servers.

The Panasas Storage provides a shared filesystem with significantly better performance than NFS. It is used for filesystems which do not require quotas.

The Cluster provides both interactive and batch access using Sun Grid Engine (SGE), an open-source batch subsystem (see http://gridengine.sunsource.net for further details). Within SGE, 4 of the nodes are configured as interactive nodes, 22 as batch (or compute) nodes and the remaining 6 as I/O nodes. When initiating interactive sessions or batch jobs, SGE chooses an appropriate node, taking into account the current load level of each node. This allows the workload to be distributed across the cluster.
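The load-based placement described above can be illustrated with a small Python sketch. It is not SGE's actual scheduling algorithm or configuration: the node names, the `Node` class, the functions `build_cluster`, `pick_node` and `place`, and the use of a single numeric load value are all assumptions made for illustration. Only the node counts (4 interactive, 22 batch, 6 I/O) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str    # hypothetical node name, not a real ECMWF hostname
    role: str    # "interactive", "batch" or "io"
    load: float  # assumed scalar load level (e.g. a load average)


def build_cluster():
    """Build the 32-node layout from the text: 4 interactive, 22 batch, 6 I/O."""
    roles = [("interactive", 4), ("batch", 22), ("io", 6)]
    nodes = []
    for role, count in roles:
        for i in range(count):
            nodes.append(Node(name=f"{role}{i:02d}", role=role, load=0.0))
    return nodes


def pick_node(nodes, role):
    """Return the least-loaded node with the requested role."""
    candidates = [n for n in nodes if n.role == role]
    if not candidates:
        raise ValueError(f"no nodes with role {role!r}")
    # min() returns the first node among equally loaded candidates.
    return min(candidates, key=lambda n: n.load)


def place(nodes, role, cost=1.0):
    """Place one session or job and raise the chosen node's load by `cost`."""
    node = pick_node(nodes, role)
    node.load += cost
    return node.name


if __name__ == "__main__":
    cluster = build_cluster()
    # Eight interactive logins are spread evenly over the four interactive nodes.
    for _ in range(8):
        print(place(cluster, "interactive"))
```

Because each placement goes to the currently least-loaded node of the right type, successive sessions rotate across the interactive nodes rather than piling onto one, which is the effect the text attributes to SGE's load-aware node selection.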