
J2EE Clustering

posted Jun 20, 2010, 7:52 PM by Kuwon Kang

Under the Hood of J2EE Clustering August 2005 



More and more mission-critical and large-scale applications now run on Java 2 Platform, Enterprise Edition (J2EE). Mission-critical applications such as banking and billing demand high availability (HA), while large-scale systems such as Google and Yahoo demand scalability. A well-known incident illustrates the importance of both in today's increasingly interconnected world: a 22-hour eBay service outage in June 1999 interrupted around 2.3 million auctions and caused a 9.2 percent drop in eBay's stock value.

J2EE clustering is a popular technology for providing highly available, scalable services with fault tolerance. But because the J2EE specification offers little support for clustering, J2EE vendors implement it differently, which causes a lot of trouble for J2EE architects and developers. The following questions are common:

  • Why are commercial J2EE server products with clustering capabilities so expensive? (10 times the price of products without clustering)
  • Why does my application, built on a stand-alone J2EE server, not run in a cluster?
  • Why does my application run very slowly in a cluster but much faster in a non-clustered environment?
  • Why does my clustered application fail to port to other vendors' servers?

The best way to understand these limitations and considerations is to study the implementations and look under the hood of J2EE clustering.

Basic Terminology

Before we discuss the different implementations, it makes sense to understand the concepts and issues that underlie clustering technology. I hope this will give you the foundation needed to understand the design issues and concepts in J2EE clustering products, and will also frame the issues that differentiate clustering implementations so that they are easier to understand.

Scalability: In some large-scale systems, it is hard to predict the number and behavior of end users. Scalability refers to a system's ability to support a rapidly increasing number of users. The intuitive way to scale up the number of concurrent sessions handled by a server is to add resources (memory, CPU, or hard disk) to it. Clustering is an alternative way to address scalability: it allows a group of servers to share heavy workloads and operate logically as a single server.

High Availability: The single-server approach to scalability (adding memory and CPU) is not robust, because the server is a single point of failure. Mission-critical applications such as banking and billing cannot tolerate a service outage of even a single minute. These services must be accessible with reasonable, predictable response times at any time. Clustering achieves this kind of high availability by providing redundant servers that take over when one server fails to provide service.

Load Balancing: Load balancing is one of the key technologies behind clustering. It is a way to obtain high availability and better performance by dispatching incoming requests to different servers. A load balancer can be anything from a simple servlet or plug-in, or a Linux box using ipchains to do the work, to expensive hardware with an embedded SSL accelerator. In addition to dispatching requests, a load balancer should perform other important tasks, such as "session stickiness", which keeps a user session entirely on one server, and "health check" (or "heartbeat"), which prevents requests from being dispatched to a failing server. Sometimes the load balancer also participates in the "failover" process, which is discussed later.
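To make the dispatching, stickiness, and health-check duties concrete, here is a minimal sketch in Java. It is not taken from any vendor's product: the class `SimpleLoadBalancer`, its methods, and the round-robin policy are all assumptions chosen for illustration, and the heartbeat task that would call `markDown` and `markUp` is not shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: round-robin dispatch with session stickiness and a health flag.
class SimpleLoadBalancer {
    private final List<String> servers;
    private final Set<String> down = new HashSet<>();            // servers that failed the health check
    private final Map<String, String> sticky = new HashMap<>();  // session id -> assigned server
    private int next = 0;                                         // round-robin cursor

    SimpleLoadBalancer(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no servers");
        }
        this.servers = new ArrayList<>(servers);
    }

    // Called by a heartbeat task (not shown) when a server stops or resumes responding.
    synchronized void markDown(String server) { down.add(server); }
    synchronized void markUp(String server) { down.remove(server); }

    // Returns the server that should handle a request belonging to the given session.
    synchronized String route(String sessionId) {
        String assigned = sticky.get(sessionId);
        if (assigned != null && !down.contains(assigned)) {
            return assigned; // stickiness: the session stays on its server
        }
        // New session, or its server is down: pick the next healthy server in rotation.
        for (int i = 0; i < servers.size(); i++) {
            String candidate = servers.get(next);
            next = (next + 1) % servers.size();
            if (!down.contains(candidate)) {
                sticky.put(sessionId, candidate);
                return candidate;
            }
        }
        throw new IllegalStateException("no healthy server available");
    }
}
```

Note that rerouting a session to another server only moves the requests. Whether the session data is also available on the new server depends on how the cluster replicates state, which this sketch does not address.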

Fault Tolerance: Highly available data is not necessarily strictly correct data. In a J2EE cluster, when a server instance fails, the service is still available, because new requests can be handled by the other, redundant server instances in the cluster. But requests that are being processed on the failing server at the moment it fails may not get the correct data. A fault-tolerant service, by contrast, always guarantees strictly correct behavior despite a certain number of faults.

Failover: Failover is another key technology behind clustering, used to achieve fault tolerance. When the original node fails, processing continues on another node chosen from the cluster. Failing over to another node can be coded explicitly, or performed automatically by the underlying platform, which transparently reroutes communication to another server.
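As an illustration of failover that is coded explicitly, the following sketch tries each node in turn until one succeeds. The class `FailoverInvoker` and its `invoke` method are hypothetical names introduced here, not part of any J2EE API, and treating every `RuntimeException` as a node failure is a simplifying assumption.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of explicit (hand-coded) failover: try each node in order.
class FailoverInvoker {
    // Invokes `call` against the nodes in order and returns the first successful result.
    // If a node throws, the next node is tried. If every node fails (or the list is
    // empty), an IllegalStateException is thrown with the last failure as its cause.
    static <N, R> R invoke(List<N> nodes, Function<N, R> call) {
        RuntimeException last = null;
        for (N node : nodes) {
            try {
                return call.apply(node);
            } catch (RuntimeException e) {
                last = e; // assume the node failed and fail over to the next one
            }
        }
        throw new IllegalStateException("all nodes failed", last);
    }
}
```

A caller might write `FailoverInvoker.invoke(nodes, node -> lookupBalance(node, accountId))`, where `lookupBalance` is a made-up remote call. Blindly repeating a call on another node is only safe when the call can be repeated without changing the outcome, which is why automatic failover has to be applied with care.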

Idempotent Methods: Pronounced "i-dim-po-tent", these are methods that can be called repeatedly with the same arguments and achieve the same results. These methods shouldn't impact the state of the system and can be called repeatedly without worry of altering the system. For example, a "getUsername()" method is idempotent, while a "deleteFile()" method isn't. Idempotency is an important concept when discussing HTTP Session failover and EJB failov