Java theory and practice: Understanding JTS -- Balancing safety and performanceGuidelines for transaction demarcation and isolationBrian Goetz (brian@quiotix.com), Principal Consultant, Quiotix Corp
Summary: In Parts 1 and 2 of his series on JTS, Brian covered the basics of what transactions are and how J2EE containers make transaction services transparent to EJB components. While being able to specify a component's transactional semantics declaratively instead of programmatically offers a great deal of flexibility in configuring enterprise applications, making poor decisions at application assembly time can impair the performance and stability of your applications. In this final installment, Brian looks at the facilities that J2EE offers for managing transaction demarcation and isolation and some guidelines for using them effectively.
Date: 01 May 2002 Level: Intermediate Activity: 2095 views Comments: 0 (Add comments)
In Part 1 ("An introduction to transactions") and Part 2 ("The magic beind the scenes") of this series, we defined what a transaction is, enumerated the basic properties of transactions, and explored how Java Transaction Service and J2EE containers work together to provide transparent support for transactions to J2EE components. In this article, we'll take on the topic of transaction demarcation and isolation.
The responsibility for defining transaction demarcation and isolation attributes for EJB components lies with the application assembler. Setting these improperly can have serious consequences for the performance, scalability, or fault-tolerance of the application. Unfortunately, there are no hard-and-fast rules for setting these attributes properly, but there are some guidelines that can help us find a balance between concurrency hazards and performance hazards.
As we discussed in Part 1, transactions are primarily an exception-handling mechanism. Transactions serve a similar purpose in programs that legal contracts do in everyday business: they help us recover if something goes wrong. But because most of the time nothing actually goes wrong, we'd like to be able to minimize their cost and intrusion the rest of the time. How we use transactions in our applications can have a big effect on application performance and scalability.
Transaction demarcation
J2EE containers provide two mechanisms for defining where transactions begin and end: bean-managed transactions and container-managed transactions. In bean-managed transactions, you begin and end a transaction explicitly in the bean methods with
UserTransaction.begin() and UserTransaction.commit() . Container-managed transactions, on the other hand, offer a lot more flexibility. By defining transactional attributes for each EJB method in the assembly descriptor, you can specify what the transactional requirements are for each method and let the container determine when to begin and end a transaction. In either case, the basic guidelines for structuring transactions are the same.
Get in, get out
The first rule of transaction demarcation is "Keep it short." Transactions provide concurrency control; this generally means that the resource manager will acquire locks on your behalf on data items you access during the course of a transaction, and it must hold them until the transaction ends. (Recall from the ACID properties discussed in Part 1 of this series that the "I" in "ACID" stands for "Isolation." That is, the effects of one transaction do not affect other transactions that are executing concurrently.) While you are holding locks, any other transaction that needs to access the data items you have locked will have to wait until you release the locks. If your transaction is very long, all those other transactions will be blocked, and your application throughput will plummet.
Rule 1: Keep transactions as short as possible.By keeping transactions short, you minimize the time you are in the way of other transactions and thereby enhance your application's scalability. The best way to keep transactions as short as possible, of course, is not to do anything that is unnecessarily time consuming in the middle of a transaction, and in particular don't wait for user input in the middle of a transaction. It may be tempting to begin a transaction, retrieve some data from a database, display the data, and then ask the user to make a choice while still in the transaction. Don't do this! Even if the user is paying attention, it will still take seconds to respond -- a long time to be holding locks in the database. And what if the user decided to step away from the computer, perhaps for lunch or even to go home for the day? The application will simply grind to a halt. Doing I/O during a transaction is a recipe for disaster. Rule 2: Don't wait for user input during a transaction.Group related operations together Because each transaction has non-trivial overhead, you might think it's best to perform as many operations in a single transaction as possible to minimize the per-operation overhead. However, Rule 1 tells us that long transactions are bad for scalability. So how do we achieve a balance between minimizing per-operation overhead and scalability? Taking Rule 1 to its logical extreme -- one operation per transaction -- would not only introduce additional overhead, but could also compromise the consistency of the application state. Transactional resource managers are supposed to maintain the consistency of the application state (recall from Part 1 that the "C" in "ACID" stands for "Consistency"), but they rely on the application to define what consistency means. In fact, the definition of consistency we used when describing transactions is somewhat circular: consistency means whatever the application says it is. The application organizes groups of changes to the application state into transactions, and the resulting application state is by definition consistent. The resource manager then ensures that if it has to recover from a failure, the application state is restored to the most recent consistent state. In Part 1, we gave an example of transferring funds from one account to another in a banking application. Listing 1 shows a possible implementation of this in SQL, which contains five SQL operations (a select, two updates, and two inserts):
Rule 3: Group related operations into a single transaction.The ideal balance Rule 1 said transactions should be as short as possible. The example from Listing 1 shows that sometimes we have to group operations together into a transaction to maintain consistency. Of course, it depends on the application to determine what constitutes "related operations." We can combine Rules 1 and 3 to give a general guideline for describing the scope of a transaction, which we will state as Rule 4: Rule 4: Group related operations into a single transaction, but put unrelated operations into separate transactions.Container-managed transactions When we use container-managed transactions, instead of explicitly stating where transactions start and end, we define transactional requirements for each EJB method. The transaction mode is defined in the trans-attribute element of thecontainer-transaction section of the bean's assembly-descriptor . (An example assembly-descriptor is shown in Listing 2.) The method's transaction mode, along with the state of whether the calling method is already enlisted in a transaction, determines which of several actions the container takes when an EJB method is called:
Required , RequiresNew , Mandatory , Supports , NotSupported , and Never . Table 1 summarizes the behavior of each mode -- both when called in an existing transaction and when called while not in a transaction -- and describes which types of EJB components support each mode. (Some containers may permit you greater flexibility in choosing transaction modes, but such use would be relying on a container-specific feature and hence would not be portable across containers.)
Table 1. Transaction modes
Required or RequiresNew . When the container creates a transaction as a result of calling a transactional method, that transaction will be closed when the method completes. If the method returns normally, the container will commit the transaction (unless the application has asked for the transaction to be rolled back). If the method exits by throwing an exception, the container will roll back the transaction and propagate the exception. If a method is called in an existing transaction T and the transaction mode specifies that the method should be run without a transaction or run in a new transaction, transaction T is suspended until the method completes, and then the previous transaction T is resumed.
Choosing a transaction mode
So which mode should we choose for our bean methods? For session and message-driven beans, you will usually want to useRequired to ensure that every call will be executed as part of a transaction, but will still allow the method to be a component of a larger transaction. Exercise care with RequiresNew ; it should only be used when you are sure that the actions of your method should be committed separately from the actions of the method that called you. RequiresNew is typically used only with objects that have little or no relation to other objects in the system, such as logging objects. (Using RequiresNew with a logging object makes sense because you would want the log message to be committed regardless of whether the enclosing transaction commits.)
Using RequiresNew in an inappropriate manner can result in a situation similar to the one described above, where the code in Listing 1 was executed in five separate transactions instead of one, which can leave your application in an inconsistent state.
For CMP (container-managed persistence) entity beans, you will usually want to use Required . Mandatory is also a reasonable option, especially for initial development; this will alert you to cases where your entity bean methods are being called outside of a transaction, which may indicate a deployment error. You almost never want to use RequiresNew with CMP entity beans.NotSupported and Never are intended for nontransactional resources, such as adapters for foreign nontransactional systems or for transactional systems that cannot be enlisted in a Java Transaction API (JTA) transaction.
When EJB applications are properly designed, applying the above guidelines for transaction modes tends to naturally yield the transaction demarcation suggested by Rule 4. The reason is that J2EE architecture encourages decomposition of the application into the smallest convenient processing chunks, and each chunk is processed as an individual request (whether in the form of an HTTP request or as the result of a message being queued to a JMS queue).
Isolation revisited
In Part 1, we defined isolation to mean that the effects of one transaction are not visible to other transactions executing concurrently; from the perspective of a transaction, it appears that transactions execute sequentially rather than in parallel. While transactional resource managers can often process many transactions simultaneously while providing the illusion of isolation, sometimes isolation constraints actually require that beginning a new transaction be deferred until an existing transaction completes. Since completing a transaction involves at least one synchronous disk I/O (to write to the transaction log), this could limit the number of transactions per second to something close to the number of disk writes per second, which would not be good for scalability.
In practice, it is common to relax the isolation requirements substantially to allow more transactions to execute concurrently and enable improved system response and greater scalability. Nearly all databases support four standard isolation levels: Read Uncommitted, Read Committed, Repeatable Read, and Serializable.
Unfortunately, managing isolation for container-managed transactions is currently outside the scope of the J2EE specification. However, many J2EE containers, such as IBM WebSphere and BEA WebLogic, provide container-specific extensions that allow you to set transaction isolation levels on a per-method basis in the same manner as transaction modes are set in the assembly-descriptor. For bean-managed transactions, you can set isolation levels via the JDBC or other resource manager connection.
To illustrate the differences between the isolation levels, let's first categorize several concurrency hazards -- cases where one transaction might interfere with another in the absence of suitable isolation. All of the following hazards have to do with the results of one transaction becoming visible to a second transaction after the second transaction has already started:
Rule 5: Use the lowest isolation level that keeps your data safe, but if in doubt, use Serializable.Even if you are planning to initially err on the side of caution and hope that the resulting performance is acceptable (the performance management technique called "denial and prayer" -- probably the most commonly employed performance strategy, though most developers will not admit it), it pays to think about isolation requirements as you are developing your components. You should strive to write transactions that are tolerant of lower isolation levels where practical, so as not to paint yourself into a corner later on if performance becomes an issue. Because you need to know what a method is doing and what consistency assumptions are buried within it to correctly set the isolation level, it is also a good idea to carefully document concurrency requirements and assumptions during development, so as to assist in making correct decisions at application assembly time. Conclusion Many of the guidelines offered in this article may appear somewhat contradictory, because issues such as transaction demarcation and isolation are inherently trade-offs. We're trying to balance safety (if we didn't care about safety, we wouldn't bother with transactions at all) against the performance overhead of the tools we're using to provide that margin of safety. The correct balance is going to depend on a host of factors, including the cost or damage associated with system failure or downtime and your organizational risk tolerance. Resources
Brian Goetz is a software consultant and has been a professional software developer for the past 15 years. He is a Principal Consultant at Quiotix, a software development and consulting firm located in Los Altos, California. See Brian's published and upcoming articles in popular industry publications.
|
j2ee >