
Using JCache to Save Money

First published in Java Developers Journal, June 2003; see the original article (registration required).

During the past 18 months, a rapidly growing number of organizations have been taking advantage of the emerging JCache standard for distributed caching to help scale application performance while at the same time reducing infrastructure costs.

This article looks at some of the strengths and weaknesses of various caching architectures, examines how they fit into the surrounding J2EE and other ecosystems, and pinpoints each one's "sweet spot." It covers both "flat" and multi-tier frameworks, and contrasts standards-based frameworks with proprietary offerings.

JCache: A Pluggable Java Temporary Caching Framework

The JCache specification (see sidebar) standardizes in-process caching of Java objects, and removes from the application programmer the burden of implementing standard cache features such as data validation, locking, eviction, and management (see Figure 1). As well as providing the basic put and get methods (a Cache extends a standard Map), the API offers a pluggable CacheLoader interface so users can add custom loaders for whatever data sources they are using.
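
To give a flavour of the API, here is a minimal read-through sketch. The class and interface names are illustrative only - the specification was still in draft at the time of writing - but the pattern is the one described above: the application asks the cache, and on a miss the cache asks its pluggable loader.

```java
import java.util.HashMap;

// Hypothetical stand-in for the draft CacheLoader interface: the cache calls
// load() on a miss, so application code never queries the datastore directly.
interface SimpleCacheLoader {
    Object load(Object key) throws Exception;
}

// A minimal read-through cache in the spirit of the draft API described above,
// where a cache behaves like a Map and delegates misses to a pluggable loader.
class SimpleCache extends HashMap<Object, Object> {
    private final SimpleCacheLoader loader;

    SimpleCache(SimpleCacheLoader loader) {
        this.loader = loader;
    }

    @Override
    public Object get(Object key) {
        Object value = super.get(key);
        if (value == null) {                  // cache miss
            try {
                value = loader.load(key);     // fetch from the underlying data source
                super.put(key, value);        // keep it for subsequent reads
            } catch (Exception e) {
                throw new RuntimeException("Load failed for key " + key, e);
            }
        }
        return value;
    }
}
```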

An increasing number of products are supporting the JCache API; of course, simply supporting JCache is only a part of the solution. Depending on the problem you and your application developers are trying to solve, you should carefully consider which cache architecture best fits your requirements and constraints.

What Are the Trade-Offs Involved in Caching?

Caching is always based on compromise; a trade-off between performance, scalability, and accuracy using the various resources available. Ease of configuration is an important secondary consideration. Let's consider how we balance these factors to achieve the best performance possible.

Accuracy - How Stale Is Your Data?

When reading data from a cache for the second and subsequent times, how do you know whether the data is "stale"? Has the underlying data been changed? A cache can make one of the following assumptions:

  • The data is valid "forever" (until the data item is ejected to make space for another, or the cache is closed down). This is easy to understand and implement but obviously only works for "static" data - today's weather forecast may be static, but a current stock price certainly isn't.
  • The data is valid for a fixed period of time (or fixed number of accesses, or some other simple algorithm that the cache can apply), after which it is invalidated. This makes an excellent choice for data whose normal change cycle is longer than the "time to live" chosen, and where the inevitable (but occasional) use of out-of-date data is not critical.

Neither of these approaches is perfect - either can leave a wide-open window of vulnerability during which stale data could be used by the application. We'll see later how active, push-based caches can reduce or remove that window.
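
The time-to-live policy, at least, is simple enough to sketch. The code below is illustrative only (not any particular product's API): each entry remembers when it was loaded, and a read that finds an entry older than the configured lifetime treats it as a miss.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: the time-to-live policy in miniature. Entries older than
// ttlMillis are treated as stale; the caller must reload them from the datastore.
class TtlCache {
    private static class Entry {
        final Object value;
        final long loadedAt;
        Entry(Object value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<Object, Entry> entries = new HashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    Object get(Object key) {
        Entry e = entries.get(key);
        if (e == null || System.currentTimeMillis() - e.loadedAt > ttlMillis) {
            return null;                      // miss or stale: reload from the datastore
        }
        return e.value;
    }

    void put(Object key, Object value) {
        entries.put(key, new Entry(value, System.currentTimeMillis()));
    }
}
```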

Scalability - Will the Cache Itself Become a Bottleneck?

Although a cache is intended to improve performance, simplistic techniques can sometimes have a counterintuitive effect. If you try to cache too much data, or if data is aged out too frequently, then the cache can add more overhead than it saves. Does your cache expand to such an extent that it is using up all your physical memory? If so, you may be spending far more time and effort paging (thrashing!) your virtual memory than you saved on data access.

Moreover, cache techniques that work well in single-user cases can break down when tens, hundreds, or thousands of users are involved. Multiple sessions serializing on synchronized accesses to the cache can be a significant drag on performance.
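
The effect is easy to picture. In the illustrative sketch below, a single lock guards the whole cache, so every reader queues on the same monitor even though concurrent reads never actually conflict.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a cache guarded by one coarse lock. Every session queues on
// the same monitor, so adding users adds contention rather than throughput.
class CoarselyLockedCache {
    private final Map<Object, Object> map = new HashMap<>();

    public synchronized Object get(Object key) {   // all sessions serialize here
        return map.get(key);
    }

    public synchronized void put(Object key, Object value) {
        map.put(key, value);
    }
}
```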

A single cache cannot grow indefinitely - sooner or later the workload somehow has to be spread across the network; thus the distributed cache is born. For read-only caches this is no big issue; each cache can operate independently, serving its own portion of users. Caches that need to support data updates meet additional problems of distributed locking and synchronization.

How Do Different Cache Architectures Measure Up?

Page Caches/Proxy Caches

Web server caches can appear within Web servers, or as stand-alone appliances in front of Web servers. Typically a Web server cache has a very simple model: supporting time-based invalidation, simple configuration policies (based on file types, filename pattern matching, etc.), and perhaps offering some degree of operator control such as the ability to flush the cache (as a whole, or by region).

The main task of a Web server is to serve pages; the cache helps that along without requiring any complex programming. These caches work best when many users are accessing the same pages; all users get the benefit of the same cached pages.

Page fragment caches add a further refinement by allowing different rules to be applied to different parts of the Web page. Static content is cached forever, volatile content has a time-to-live, and transactional content is served directly. Caching policy is associated with different page components using JSP tags, for example. These caches have to be more careful about data sharing, and typically they will have separate policies for application, session, and "global" data.

Database Caches

Most databases incorporate a data cache of some sort; some include several. Oracle, for example, includes the "shared global area" that contains a cache of recently used database blocks as well as caches of compiled stored procedure code, parsed SQL statements, data dictionary information, and more. A correctly sized cache is a crucial component of a well-tuned database. However, you should realize that the process of extracting data from the cache is still very resource hungry:

1. The client application issues an SQL statement.
2. The statement is sent across the network.
3. The statement is compared to cached statements; if found in the cache there's no need to reparse it.
4. The parsed statement - with its generated access plan - is executed; the dictionary, index, and data blocks in the cache are searched.
5. Disk reads into the cache are made if necessary.
6. Row and column data is extracted from the disk blocks.
7. This data is finally sent back across the network to the client application.

Of these, only Steps 3 and 4 are affected by the cache. The other steps may well take several milliseconds and many thousands of CPU cycles. So there's still plenty of room for caching software outside the database, cutting out unnecessary calls to the data server.

Some products let you hide a further cache in the transport layers above the database; for example, there are a number of JDBC drivers that can cache the result sets from frequently executed SQL statements. These caches can cut out repeated reads, and are an easy retrofit to existing applications. However, they often don't deal well (or at all) with the problems caused when data is being updated as well as read.
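
The underlying idea is easy to sketch. The code below is purely illustrative - not any specific driver's API - and ignores bind parameters, eviction, and invalidation, which are exactly the hard parts a real product must solve.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a result-set-caching layer above the database. Results are
// keyed by SQL text; repeated executions of the same statement are served locally.
class QueryResultCache {
    private final Map<String, List<Object[]>> resultsBySql = new HashMap<>();

    List<Object[]> get(String sql) {
        return resultsBySql.get(sql);         // null means the query must go to the database
    }

    void put(String sql, List<Object[]> rows) {
        resultsBySql.put(sql, rows);          // subsequent identical reads avoid the network
    }
}
```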

Transactional Caches

This is where a transactional cache comes in - dealing with volatile data that's being created, updated, and deleted as well as read. Often the cache is linked to a particular programming model, either proprietary - often based on an object database - or based on standards such as J2EE's Container Managed Persistence (CMP), or the Java Data Objects (JDO) specification. The programming model provides the "ground rules" for the cache, identifying sessions, the start and end of transactions, and the locking policies to be used. Updates go through the cache to ensure that all parties (client sessions, the cache, and the underlying data store) are kept in sync.

Transactional caches can be pessimistic (lock early, which can result in serialization with sessions queuing for locks) or optimistic (lock as late as possible, which improves concurrency but is more likely to allow update conflicts to develop). Unless data is carefully partitioned, distributed locking adds latency; this increases the serialization effect. Anyone used to distributed databases will recognize the symptoms: a system reduced to a crawl while processors are still underutilized and sessions are simply queuing up on data locks.
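
To make the optimistic case concrete, here is a minimal, hypothetical sketch of late conflict detection: each item carries a version number, and an update is rejected at commit time if another session has changed the item since it was read.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: optimistic concurrency in miniature. Each cached item carries a
// version number; an update commits only if the item is unchanged since it was read,
// otherwise the caller must retry or roll back.
class OptimisticCache {
    private static class VersionedEntry {
        Object value;
        long version;
    }

    private final Map<Object, VersionedEntry> entries = new HashMap<>();

    synchronized long readVersion(Object key) {
        VersionedEntry e = entries.get(key);
        return e == null ? -1 : e.version;
    }

    synchronized boolean commit(Object key, long versionRead, Object newValue) {
        VersionedEntry e = entries.get(key);
        if (e == null || e.version != versionRead) {
            return false;                     // conflict detected at commit time
        }
        e.value = newValue;
        e.version++;
        return true;
    }
}
```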

Transactional caches tend to be tightly coupled to a particular application platform or coding style; some products require use of specific development and runtime tools, and it can be fearfully difficult to retrofit transactional caches onto existing applications.

Active, Push-Based Caching

Active caches turn the data concurrency problem around. Rather than having the cache try to predict whether cached data is still valid (the time-based approach) or check against the database (using transactional "select for update" locking), data updates can be "pushed" out directly to the client caches, as well as "pulled" into the cache as a result of client application requests for data (see Figure 2).

The performance advantages are clear. Just one message is needed to notify a cache about new data values. Application threads register a listener with the cache, and the cache listens for the message. Client sessions are never at risk from stale, out-of-date data - whether the data changes once a day or every few seconds. Database access is reduced. Data is read once when cached; the "notification agent" - maybe a database trigger - fires only when the data is updated at the source. As long as updates are less frequent than queries, the push-based cache is extremely effective in reducing network traffic, and all data accesses except the first respond quickly, without network and datastore latency.
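
The listener arrangement can be sketched as follows; the interface and method names are assumptions for illustration, not the JCache or any vendor API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: the shape of an active, push-based cache. A notification agent
// (a database trigger feeding a message, for example) drives onUpdate(), so cached
// values are corrected the moment the source data changes.
interface CacheUpdateListener {
    void onUpdate(Object key, Object newValue);
}

class ActiveCache implements CacheUpdateListener {
    private final Map<Object, Object> entries =
            Collections.synchronizedMap(new HashMap<>());

    public Object get(Object key) {
        return entries.get(key);              // no staleness window: updates are pushed in
    }

    @Override
    public void onUpdate(Object key, Object newValue) {
        entries.put(key, newValue);           // invoked whenever the underlying data changes
    }
}
```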

Heterogeneous Data Sources

Often there are several different types of data source; a homogeneous transaction model based on just one data type is inappropriate. Real-world applications deal with many different styles of data - relational, structured, object oriented - using many different access techniques - SQL, ISAM, LDAP, etc. It's unusual for typical transactional caches to support more than one of these models cleanly.

An active cache based on JCache can sidestep the problem by providing a single API to cached data, across any data type, combined with the simple update notification interface. Data updates can be fed into the cache from the data server, or from its client applications.

Distributed Caches

When necessary, caches can easily be distributed across a server farm. The distributed caches can be:

  • Independent: Each cache operates without reference to the others. The same data item may be cached in many places.
  • Partitioned: Data is divided somehow between caches; clients (or the cache API) "know" which cache to address for each piece of data, which is held only once (see the sketch after this list).
  • Coordinated: The same data may appear in several caches, effectively side by side in a "flat" structure; cache misses may be served either from "peer" caches or from the underlying datastore. The cache software hides this complexity, and manages the necessary exchange and locking of data.
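
As a flavour of how the partitioned option routes requests, here is a minimal, hypothetical sketch. Real products use more robust schemes (consistent hashing, for example) so that nodes can join and leave gracefully, but the principle is the same: every client maps a given key to the same cache.

```java
// Illustrative only: one simple way a partitioned cache routes a key to the node
// that owns it - hash the key and take the remainder over the number of nodes.
class Partitioner {
    private final String[] nodes;

    Partitioner(String... nodes) {
        this.nodes = nodes;
    }

    String nodeFor(Object key) {
        int bucket = (key.hashCode() & 0x7fffffff) % nodes.length;
        return nodes[bucket];                 // every client routes the same key the same way
    }
}
```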

Multi-Tier Caching - The Flexible Solution

It is also quite easy to develop "multi-tier" caches. At the bottom layer there's a regular cache over the underlying datastore. Cache misses at this level convert to datastore lookups. To improve performance, further cache layers are added (see Figure 3); a cache miss at these higher layers converts to a cache lookup in the next layer down.

How does this help? Well, the top layer can be right up close to the application client - in the same virtual machine. This relatively small "VM cache" can be supported by a larger freestanding "local cache," which soaks up most top-tier cache misses without needing any network traffic. The middle-tier local cache passes its own cache misses down to the bottom, much larger, datastore cache. With the added degrees of flexibility offered by two or three cache tiers, it's quite easy to tune the caching framework to optimal performance for a specific application within the constraints of memory, processor, and network resources available.
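
The chaining itself is straightforward to sketch (illustrative only): each tier answers from its own store when it can, otherwise it asks the tier below and caches whatever comes back.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a cache tier that forwards its misses to the tier below
// (in-VM cache -> freestanding local cache -> datastore cache -> datastore).
class CacheTier {
    private final Map<Object, Object> local = new HashMap<>();
    private final CacheTier nextTierDown;     // null for the bottom tier

    CacheTier(CacheTier nextTierDown) {
        this.nextTierDown = nextTierDown;
    }

    Object get(Object key) {
        Object value = local.get(key);
        if (value == null && nextTierDown != null) {
            value = nextTierDown.get(key);    // a miss here becomes a lookup one tier down
            if (value != null) {
                local.put(key, value);        // populate this tier on the way back up
            }
        }
        return value;
    }
}
```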

Caching and JMS

Some distributed cache products use proprietary message formats, but increasingly JMS (Java Message Service) is recognized as being the best choice. Cache load requests are passed down the hierarchy using JMS queues, which can easily load balance requests; the data can be returned on a queue (for a specific cache) or on a topic (making it simple to organize cache clusters).

Update notifications can also be broadcast on a JMS topic; the notification agent is simply a JMS publisher client. Any kind of data source or data feed can easily be fitted into this model, and JMS guarantees to maintain the order of updates so that everything is kept consistent.
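
On the receiving side, the subscriber is simply a standard JMS MessageListener that feeds the cache. The sketch below is illustrative: the message field names and the omitted connection and subscription setup are assumptions, and a real deployment would vary by provider.

```java
import java.util.Map;

import javax.jms.JMSException;
import javax.jms.MapMessage;
import javax.jms.Message;
import javax.jms.MessageListener;

// Illustrative only: the cache-side half of a JMS update notification. The notification
// agent publishes a MapMessage on an update topic whenever source data changes; this
// listener pushes the new value straight into the local cache.
class CacheUpdateSubscriber implements MessageListener {
    private final Map<Object, Object> cache;

    CacheUpdateSubscriber(Map<Object, Object> cache) {
        this.cache = cache;
    }

    @Override
    public void onMessage(Message message) {
        try {
            MapMessage update = (MapMessage) message;
            cache.put(update.getString("key"), update.getString("value"));
        } catch (JMSException e) {
            // a real implementation would log the failure and perhaps invalidate the entry
            throw new RuntimeException(e);
        }
    }
}
```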

The alternatives to JMS are not attractive. Some use multicast, which is superficially attractive for broadcast - a single physical message reaches all intended subscribers - but does not offer guaranteed delivery, message ordering, or content-based addressing. Attempts to bolt these features on typically add more overhead than the multicast saves. Worse, deploying multicast across a WAN or the Internet is fraught with technical and administrative problems; many routers do not support (or do not allow) multicast traffic flow.

Conclusion

The bottom line is that application performance depends on efficient data distribution. Since almost all server interactions involve data access, it's crucial to ensure fast data access for maximum application performance. It pays to build a cache and avoid unnecessary round-trips to the datastore. By reducing traffic between the different layers of an application, you can substantially diminish the size and cost of the installation and greatly enhance the system's responsiveness.

Applications depending on JCache gain simplicity and flexibility in terms of configuration and performance management. By using a standard API, developers avoid the danger of being locked into proprietary caching mechanisms. JCache can sit over any type of datastore - whether "static" (relational, object, or legacy databases, for example) or "dynamic" (for instance, a financial market data feed, process control telemetry readings, or network management events).

Adding JMS and an active push-based caching model into the equation lets the architect set the quality of service required, scale up to the load demanded, and fine-tune his or her intercache traffic. JMS traffic-shaping techniques - carefully honed to support enterprise messaging architectures - can be applied to optimize and balance network load in a multi-tier caching framework. Using JMS and JCache active push-based caching, developers can choose to cache anywhere in the application or network. In other words, caching can be done close to the data source or close to the delivery destination. Business requirements can then be more closely aligned with the enterprise architecture to ensure that caching is done at the optimal level.

More About JCache

JCache has been in development under the Java Community Process (as JSR-107) since it was first proposed by Oracle in early 2001. At the time of writing, the JCache expert group has not yet released a community draft, making it one of the slower JSRs in the pack.

There's always some risk in adopting a standard before it's ratified, but developers need not worry too much about the danger of being inextricably locked into a prerelease JCache API. First, the proposed API has been stable for quite a while now - although it has not been published outside of the expert group. Second, most developers will see only the client side of the API, which is not much different from working with regular Map objects - it's very easy to use, and it's also very easy to upgrade existing "roll your own" caches based on hashmaps.

At least one major user organization is represented on the JCache expert group alongside vendors such as Oracle, Gemstone, SpiritSoft, and Tangosol - and you can be sure that they wouldn't commit so much to the standard if they didn't think it was going to be worth it in the long run.



Copyright © 2000-2022 Nigel Thomas, Preferisco Limited.