Clustering vs. Load Balancing
by Amy Armitage, September 16, 2009
Amy Armitage is the head of Business Development for Lunarpages. Lunarpages provides web hosting from its US-based hosting facility and offers a wide range of
services, from Linux virtual private servers and managed solutions to
shared and reseller hosting plans.
Before you can talk about differences between clustering and load
balancing, and there are more than a few, you’ve got to get the
definitions straight. Clustering is often understood to mean the
capability of some software to provide load balancing services, and
load balancing is often used as a synonym for a hardware- or
third-party-software-based solution.
In practice, clustering is
usually used with application servers like IBM WebSphere, BEA WebLogic
and Oracle AS (10g). The load balancing features found in Application
Delivery Controllers (ADCs) like BIG-IP are used in the same
environment. (For simplicity, we will talk about clustering versus ADC
approaches.)
Scalability, horizontally speaking
With hardware load
balancers, of course, we talk about pools or farms: the server
groupings across which application requests get distributed. In the
software world, the term cluster is applied to that same group.
Clustering
will typically convert one instance of an application server to a
master controller, which then processes and distributes requests to
multiple instances using such industry-standard algorithms as round
robin, weighted round robin or least connections. Clustering is similar
to load balancing in that it offers horizontal scalability: a nearly
transparent way to add instances of application servers for
increased capacity or better response times. To ensure that an
instance is actually available, clustering approaches typically use an
ICMP ping check or, sometimes, HTTP or TCP connection checks.
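To make the three algorithms named above concrete, here is a minimal Python sketch. It is an illustration only, not how any particular application server or ADC implements them; the function names and the server names (app1, big, small) are hypothetical.

```python
from itertools import cycle


def round_robin(servers):
    """Return an endless iterator that hands out servers in a fixed rotation."""
    return cycle(servers)


def weighted_round_robin(weights):
    """Return an endless iterator that repeats each server `weight` times per cycle.

    `weights` maps a server name to a positive integer weight.
    """
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


def least_connections(active):
    """Return the server with the fewest active connections.

    `active` maps a server name to its current connection count.
    On a tie, the server that was added to the mapping first wins.
    """
    return min(active, key=active.get)


# Example: four requests spread over three hypothetical instances.
rotation = round_robin(["app1", "app2", "app3"])
first_four = [next(rotation) for _ in range(4)]
```

Round robin ignores how busy each instance is, weighted round robin lets a larger machine take a bigger share, and least connections reacts to the current load. Real products usually implement smoother interleaving for the weighted case than this simple expansion.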
Health and transparency
For load balancing, ADCs support the
same industry-standard algorithms, but add more complex calculations
that take into account such parameters as per-server CPU and memory
utilization, fastest response times, etc. ADCs also support more robust
health monitoring than the simple app server clustering solutions. This
means they can verify content, and they can do passive monitoring, which
dispenses with even the low overhead that active health checks place on
app server instances.
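The difference between a content-verifying check and passive monitoring can be sketched in a few lines of Python. This is a simplified illustration under assumed rules (a fixed expected string, a sliding window of recent results); the names content_check and PassiveMonitor and the thresholds are hypothetical, not taken from any ADC product.

```python
from collections import deque


def content_check(status, body, expected_text):
    """Active check: healthy only if the status is 200 AND the body has the expected text.

    A plain connection check would pass a server that returns an error page;
    verifying content catches that case.
    """
    return status == 200 and expected_text in body


class PassiveMonitor:
    """Judge health from the outcomes of real user requests, sending no probes."""

    def __init__(self, window=10, max_failures=3):
        # Only the most recent `window` outcomes are kept (True = success).
        self.results = deque(maxlen=window)
        self.max_failures = max_failures

    def record(self, ok):
        """Record the outcome of one real request to this server."""
        self.results.append(ok)

    def healthy(self):
        """Healthy while recent failures stay below the assumed threshold."""
        failures = sum(1 for ok in self.results if not ok)
        return failures < self.max_failures
```

Because the passive monitor only observes traffic that is already flowing, it places no extra requests on the app server instances.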
For
applications that require the user to interact with the same server
during a session, clustering uses server affinity to get the user
there. This is most common during the execution of a process like order
entry, where the session is used between pages (requests) to store data
needed to close a transaction, like a shopping cart.
For the
same situation, ADCs use persistence. Clustering solutions are usually
somewhat limited as to the variables they can use, while ADCs can not
only use traditional application variables but also get other
information from the application or network-based data.
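Server affinity and persistence both come down to remembering which server a session was first sent to. A minimal Python sketch, assuming the session ID is the only variable used and that new sessions are assigned by simple rotation (the class name StickyBalancer and the server names are hypothetical):

```python
from itertools import cycle


class StickyBalancer:
    """Send every request of a session to the server that handled its first request."""

    def __init__(self, servers):
        self._rotation = cycle(servers)  # used only for sessions not seen before
        self._table = {}                 # session ID -> assigned server

    def route(self, session_id):
        """Return the server for this session, assigning one on first sight."""
        if session_id not in self._table:
            self._table[session_id] = next(self._rotation)
        return self._table[session_id]


# Example: a shopper's later requests land on the same hypothetical instance.
balancer = StickyBalancer(["app1", "app2"])
first = balancer.route("cart-session-123")
later = balancer.route("cart-session-123")  # same server as `first`
```

An ADC could key the same kind of table on other data, such as a cookie or a network-level value, which is the extra flexibility the text describes. A production version would also need to expire old entries and reassign sessions when a server fails.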
More
than a few clustering solutions need node agents deployed on each
instance of an application server that is clustered by a controller.
Deploying and managing the agent may not be a burden, since it is
often already in place, but it still means more processes running on the
servers and consuming memory and CPU resources. Of course, it also adds
another possible failure point to the data path. Since ADCs need no
server-side components, they remain completely transparent.
Making the choice
Some
would ask: why do the extra work of building a distributed software
system and clustered server setup when you can have multiple servers
fulfilling specific roles, such as separate database servers, web
servers, mail servers, etc., whenever necessary?
So, how do you
choose? That depends on the reasons you are considering this kind of
solution in the first place, and (perhaps) whether or not you have to
make an additional purchase to achieve clustering capabilities for the
particular application server you have. There is also the broader
question of whether or not you need (or want) to provide support for
multiple application server brands. Clustering, of course, is
proprietary to the application server, but ADCs can provide services
for any and all applications or web servers.
Clustering checklist
Pros:
- Typically available with the application server’s enterprise package
- Doesn't require the highest level of networking know-how
- Usually less costly than redundant ADC deployments
Cons:
- High availability not assured with clustering solutions
- Best practice is to deploy the cluster controller on separate hardware
- Node agents required on managed app server instances
- Clustering is "proprietary" (you can cluster only homogeneous servers)
ADC checklist
Pros:
- Provides high availability and load balancing in heterogeneous environments
- Added value of application optimization, security and acceleration
- No changes required to applications or servers where they’re deployed
Cons:
- An additional piece of infrastructure in the architecture
- Generally more costly than clustering solutions
- Could require new skill set to deploy/manage
Recommendations
Get more insight into performance, configurations and case studies
by reading some testing-based articles on ADCs, and testing-based
reviews of server clustering. Look for case studies that mirror your
own situation, as closely as possible, and talk to people who are doing
what you are planning (or thinking about). Unlike government going into
the car business or taking over health care, do not do something
quickly just to be seen doing something. Take care with this decision.