Overview of the Ehcache
HyeonSeok Choi (chois79), 2011.12.02

Transcript
Page 1: Overview of the ehcache

Overview of the Ehcache

2011.12.02

chois79

Page 2: Overview of the ehcache

Contents

• About Caches

• Why caching works

• Will an Application Benefit from Caching?

• How much will an application speed up?

• About Ehcache

• Features of Ehcache

• Key Concepts of Ehcache

• Using Ehcache

• Distributed Ehcache Architecture

• References

Page 3: Overview of the ehcache

About Caches

• In Wiktionary

– A store of things that will be required in the future and can be retrieved rapidly

• In computer science

– A collection of temporary data which either duplicates data located elsewhere or is the result of a computation

– The data can be repeatedly accessed inexpensively

Page 4: Overview of the ehcache

Why caching works

• Locality of Reference

– Data that is near other data or has just been used is more likely to be used again

• The Long Tail

– One form of a Power Law distribution is the Pareto distribution (80:20 rule)

– If 20% of objects are used 80% of the time and a way can be found to reduce the cost of obtaining that 20%, then system performance will improve

A small number of items may make up the bulk of sales. – Chris Anderson

Page 5: Overview of the ehcache

Will an Application Benefit from Caching? CPU-bound Application

• The time taken principally depends on the speed of the CPU and main memory

• Speeding up

– Improving algorithm performance

– Parallelizing the computations across multiple CPUs or multiple machines

– Upgrading the CPU speed

• The role of caching

– Temporarily store computations that may be reused again

• Ex) DB Cache, Large web pages that have a high rendering cost.

Page 6: Overview of the ehcache

Will an Application Benefit from Caching? I/O-bound Application

• The time taken to complete a computation depends principally on the rate at which data can be obtained

• Speeding up

– Hard disks are speeding up by using their own caching of blocks into memory

• There is no Moore’s Law for hard disks.

– Increase the network bandwidth

• The role of cache

– Web page caching, for pages generated from databases

– Data Access Object caching

Page 7: Overview of the ehcache

Will an Application Benefit from Caching? Increased Application Scalability

• Databases can do 100 expensive queries per second

– Caching may be able to reduce the workload required

Page 8: Overview of the ehcache

How much will an application speed up? (Amdahl’s Law)

• Depends on a multitude of factors

– How many times a cached piece of data can be and is reused by the application

– The proportion of the response time that is alleviated by caching

• Amdahl’s Law

speedup = 1 / ((1 - P) + P / S)

P: the proportion of the response time that is sped up

S: the speed up of that proportion

Page 9: Overview of the ehcache

Amdahl’s Law Example (Speed up from a Database Level Cache)

Un-cached page time: 2 seconds

Database time: 1.5 seconds

Cache retrieval time: 2ms

Proportion: 75% (1.5/2)

The expected system speedup is thus:

1 / (( 1 – 0.75) + 0.75 / (1500/2))

= 1 / (0.25 + 0.75/750)

= 3.98 times system speedup

Page 10: Overview of the ehcache

About Ehcache

• Open source, standards-based cache used to boost performance

• Basically an in-process cache

• Scales from in-process with one or more nodes through to a mixed in-process/out-of-process configuration with terabyte-sized caches

• For applications needing a coherent distributed cache, Ehcache uses the open source Terracotta Server Array

• Java-based cache, available under an Apache 2 license

• The Wikimedia Foundation uses Ehcache to improve the performance of its wiki projects

Page 11: Overview of the ehcache

Features of Ehcache (1/2)

• Fast and Light Weight

– Fast, Simple API

– Small footprint: Ehcache 2.2.3 is 668 KB, making it convenient to package

– Minimal dependencies: the only dependency is SLF4J

• Scalable

– Provides Memory and Disk store for scalability into gigabytes

– Scalable to hundreds of nodes with the Terracotta Server Array

• Flexible

– Supports Object or Serializable caching

– Provides LRU, LFU and FIFO cache eviction policies

– Provides Memory and Disk stores

Page 12: Overview of the ehcache

Features of Ehcache (2/2)

• Standards Based

– Full implementation of JSR107 JCACHE API

• Application Persistence

– Persistent disk store which stores data between VM restarts

• JMX Enabled

• Distributed Caching

– Clustered caching via Terracotta

– Replicated caching via RMI, JGroups, or JMS

• Cache Server

– RESTful and SOAP cache server

• Search

– Standalone and distributed search using a fluent query language

Page 13: Overview of the ehcache

Key Concepts of Ehcache Key Classes

• CacheManager

– Manages caches

• Ehcache

– All caches implement the Ehcache interface

– A cache has a name and attributes

– Cache elements are stored in the memory store and can optionally overflow to a disk store

• Element

– An atomic entry in a cache

– Has key and value

– Put into and removed from caches
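
As a rough illustration of how these three classes fit together, here is a minimal Java sketch; it assumes a cache named "myCache" is declared in ehcache.xml on the classpath, and the key/value used are arbitrary:

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Element;

    public class KeyClassesExample {
        public static void main(String[] args) {
            // CacheManager: loads ehcache.xml from the classpath and manages all caches
            CacheManager cacheManager = CacheManager.create();

            // Cache (implements Ehcache): looked up by name; "myCache" is assumed to exist
            Cache cache = cacheManager.getCache("myCache");

            // Element: an atomic key/value entry that is put into and removed from the cache
            cache.put(new Element("user:42", "Alice"));
            Element element = cache.get("user:42");
            if (element != null) {
                System.out.println(element.getObjectValue());
            }

            cacheManager.shutdown();
        }
    }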

Page 14: Overview of the ehcache

Key Concepts of Ehcache Usage patterns: Cache-aside

• Application code uses the cache directly

• Order

– Application code consults the cache first

– If the cache contains the data, then return it directly

– Otherwise, the application code must fetch the data from the system-of-record, store it in the cache, then return it
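
The following minimal Java sketch of the cache-aside flow assumes a "users" cache declared in ehcache.xml; loadUserFromDatabase is a hypothetical stand-in for the real system-of-record access:

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Element;

    public class CacheAsideExample {

        private final Cache cache = CacheManager.create().getCache("users");

        public Object getUser(String userId) {
            // 1. Consult the cache first
            Element element = cache.get(userId);
            if (element != null) {
                // 2. Cache hit: return the cached data directly
                return element.getObjectValue();
            }
            // 3. Cache miss: fetch from the system-of-record, store it in the cache, then return it
            Object user = loadUserFromDatabase(userId);
            cache.put(new Element(userId, user));
            return user;
        }

        // Hypothetical system-of-record read; a real DAO/JDBC call would go here
        private Object loadUserFromDatabase(String userId) {
            return "user-" + userId;
        }
    }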


Page 15: Overview of the ehcache

Key Concepts of Ehcache Usage patterns: Read-through

• Mimics the structure of the cache-aside pattern when reading data

• The difference

– Must implement the CacheEntryFactory interface to instruct the cache how to read objects on a cache miss

– Must wrap the Ehcache instance with an instance of SelfPopulatingCache
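
A minimal read-through sketch; the "users" cache name and the loadUserFromDatabase loader are assumptions of this illustration:

    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Ehcache;
    import net.sf.ehcache.Element;
    import net.sf.ehcache.constructs.blocking.CacheEntryFactory;
    import net.sf.ehcache.constructs.blocking.SelfPopulatingCache;

    public class ReadThroughExample {
        public static void main(String[] args) {
            Ehcache underlyingCache = CacheManager.create().getEhcache("users");

            // CacheEntryFactory tells the cache how to read an object on a cache miss
            CacheEntryFactory factory = new CacheEntryFactory() {
                public Object createEntry(Object key) throws Exception {
                    return loadUserFromDatabase((String) key);   // hypothetical SOR read
                }
            };

            // Wrap the Ehcache instance with a SelfPopulatingCache
            SelfPopulatingCache cache = new SelfPopulatingCache(underlyingCache, factory);

            // get() now populates the entry transparently on a miss
            Element element = cache.get("user:42");
            System.out.println(element.getObjectValue());
        }

        private static Object loadUserFromDatabase(String key) {
            return "data for " + key;
        }
    }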


Page 16: Overview of the ehcache

Key Concepts of Ehcache Usage patterns: Write-through and behind

• Mimics the structure of the cache-aside pattern when writing data

• The difference

– Must implement the CacheWriter interface and configure the cache for write-through or write-behind

– A write-through cache writes data to the system-of-record in the same thread of execution

– A write-behind cache queues the data for writing at a later time
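
A rough write-through sketch; the AbstractCacheWriter convenience base class, the "users" cache name and the database comments are assumptions of this illustration, and write-through vs. write-behind is selected in the cache's cacheWriter configuration:

    import net.sf.ehcache.CacheEntry;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Ehcache;
    import net.sf.ehcache.Element;
    import net.sf.ehcache.writer.AbstractCacheWriter;

    public class WriteThroughExample {

        // Extends AbstractCacheWriter so only the CacheWriter methods that are
        // actually needed have to be implemented
        static class UserCacheWriter extends AbstractCacheWriter {
            @Override
            public void write(Element element) {
                // Hypothetical SOR write, e.g. a JDBC update
                System.out.println("writing " + element.getObjectKey() + " to the database");
            }

            @Override
            public void delete(CacheEntry entry) {
                System.out.println("deleting " + entry.getKey() + " from the database");
            }
        }

        public static void main(String[] args) {
            Ehcache cache = CacheManager.create().getEhcache("users");
            cache.registerCacheWriter(new UserCacheWriter());

            // putWithWriter() stores the element and routes the write to the CacheWriter;
            // whether that happens in the same thread (write-through) or via a queue
            // (write-behind) depends on the cache configuration
            cache.putWithWriter(new Element("user:42", "Alice"));
        }
    }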


Page 17: Overview of the ehcache

Key Concepts of Ehcache Usage patterns: Cache-as-sor

• Delegates system-of-record (SOR) reading and writing activities to the cache

• To implement, use a combination of the following patterns

– Read-through

– Write-through or write-behind

• Advantages

– Less cluttered application code

– Easily choose between write-through or write-behind strategies

– Allows the cache to solve the “thundering-herd” problem

• Disadvantages

– Less directly visible code-path
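
A cache-as-SOR sketch that simply combines the two previous patterns: the application reads and writes only through the wrapped cache, while the hypothetical factory and the UserCacheWriter from the write-through sketch talk to the SOR:

    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Ehcache;
    import net.sf.ehcache.Element;
    import net.sf.ehcache.constructs.blocking.CacheEntryFactory;
    import net.sf.ehcache.constructs.blocking.SelfPopulatingCache;

    public class CacheAsSorExample {
        public static void main(String[] args) {
            Ehcache underlying = CacheManager.create().getEhcache("users");

            // Write path: write-through or write-behind via a registered CacheWriter
            underlying.registerCacheWriter(new WriteThroughExample.UserCacheWriter());

            // Read path: read-through via a CacheEntryFactory wrapper
            SelfPopulatingCache cache = new SelfPopulatingCache(underlying, new CacheEntryFactory() {
                public Object createEntry(Object key) throws Exception {
                    return "data for " + key;   // hypothetical SOR read on a miss
                }
            });

            // Application code now touches only the cache
            cache.putWithWriter(new Element("user:42", "Alice"));
            System.out.println(cache.get("user:7").getObjectValue());
        }
    }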

Page 18: Overview of the ehcache

Key Concepts of Ehcache Storage Options: Memory Store

• Suitable Element Types

– All Elements are suitable for placement in the Memory Store

• Characteristics

– Thread safe for use by multiple concurrent threads

– Backed by LinkedHashMap (JDK 1.4 and later)

• LinkedHashMap: Hash table and linked list implementation of the Map interface

– Fast

• Memory Use, Spooling and Expiry Strategy

– Least Recently Used (LRU): default

– Least Frequently Used (LFU)

– First In First Out (FIFO)
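
A small sketch of configuring the memory store size and eviction policy programmatically; the cache name and numbers are arbitrary, and the same settings are more commonly placed in ehcache.xml:

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.config.CacheConfiguration;

    public class MemoryStoreConfigExample {
        public static void main(String[] args) {
            // Memory store limited to 10,000 elements, evicting with LFU instead of the LRU default
            CacheConfiguration config = new CacheConfiguration("users", 10000);
            config.setMemoryStoreEvictionPolicy("LFU");
            config.setTimeToLiveSeconds(300);

            CacheManager cacheManager = CacheManager.create();
            cacheManager.addCache(new Cache(config));
        }
    }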

Page 19: Overview of the ehcache

Key Concepts of Ehcache Storage Options: Big-Memory Store

• Pure Java product from Terracotta that permits caches to use an additional type of memory store outside the object heap (packaged for use in Enterprise Ehcache)

– Not subject to Java GC

– 100 times faster than the Disk Store

– Allows very large caches to be created (tested up to 350 GB)

• Two implications

– Only Serializable cache keys and values can be placed in the store, similar to the Disk Store

– Serialization and deserialization take place on putting to and getting from the store

• Around 10 times slower than the Memory Store

• The memory store holds the hottest subset of data from the off-heap store, already in deserialized form

• Suitable Element Types

– Only Elements which are serializable can be placed in the off-heap store

– Any non-serializable Elements will be removed and a WARNING-level log message emitted

Page 20: Overview of the ehcache

Key Concepts of Ehcache Storage Options: Disk Store

• The Disk Store is optional

• Suitable Element Types

– Only Elements which are serializable can be placed in the Disk Store

– Any non-serializable Elements will be removed and a WARNING-level log message emitted

• Eviction

– The LFU algorithm is used and it is not configurable or changeable

• Persistence

– Controlled by the diskPersistent configuration option

– If false or omitted, the disk store will not persist between CacheManager restarts
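
A sketch of a disk-backed, persistent cache configured programmatically; the cache name, sizes and path are illustrative, and the overflowToDisk/diskPersistent attributes can equally be set in ehcache.xml:

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.config.CacheConfiguration;
    import net.sf.ehcache.config.Configuration;
    import net.sf.ehcache.config.DiskStoreConfiguration;

    public class DiskStoreConfigExample {
        public static void main(String[] args) {
            // Directory used by the disk store (illustrative path)
            DiskStoreConfiguration diskStoreConfig = new DiskStoreConfiguration();
            diskStoreConfig.setPath("/tmp/ehcache");

            Configuration managerConfig = new Configuration();
            managerConfig.addDiskStore(diskStoreConfig);
            CacheManager cacheManager = new CacheManager(managerConfig);

            // Overflow from the memory store to disk, and keep the disk contents
            // between CacheManager restarts (diskPersistent)
            CacheConfiguration config = new CacheConfiguration("users", 1000);
            config.setOverflowToDisk(true);
            config.setDiskPersistent(true);

            cacheManager.addCache(new Cache(config));
        }
    }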

Page 21: Overview of the ehcache

Key Concepts of Ehcache Replicated Caching

• Ehcache has a pluggable cache replication scheme

– RMI, JGroups, JMS

• Using a Cache Server

– To achieve shared data, all JVMs read from and write to a Cache Server

• Notification Strategies

– If the Element is not available anywhere else, then the element itself should form the payload of the notification


Page 22: Overview of the ehcache

Key Concepts of Ehcache Search APIs

• Allows you to execute arbitrarily complex queries against either a standalone cache or a Terracotta clustered cache with pre-built indexes

• Searchable attributes may be extracted from both keys and values

• Attribute Extractors

– Attributes are extracted from keys or values

– This is done during search or, if using Distributed Ehcache, on put() into the cache, using AttributeExtractors

– Supported types

• Boolean, Byte, Character, Double, Float, Integer, Long, Short, String, Enum, java.util.Date, java.sql.Date
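
A minimal Search API sketch; it assumes a cache that is declared searchable in ehcache.xml with an "age" search attribute defined on it:

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.search.Attribute;
    import net.sf.ehcache.search.Query;
    import net.sf.ehcache.search.Result;
    import net.sf.ehcache.search.Results;

    public class SearchExample {
        public static void main(String[] args) {
            Cache cache = CacheManager.create().getCache("users");

            // "age" is assumed to be declared as a searchable attribute for this cache
            Attribute<Integer> age = cache.getSearchAttribute("age");

            // Fluent query: keys and values of all entries whose age is between 18 and 30
            Query query = cache.createQuery()
                    .includeKeys()
                    .includeValues()
                    .addCriteria(age.between(18, 30));

            Results results = query.execute();
            for (Result result : results.all()) {
                System.out.println(result.getKey() + " -> " + result.getValue());
            }
        }
    }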

Page 23: Overview of the ehcache

Using Ehcache General-Purpose Caching

• Local Cache

• Configuration

– Place the Ehcache jar into your class-path

– Configure ehcache.xml and place it in your class-path

– Optionally, configure an appropriate logging level


[Diagram: web servers, each with a local in-process Ehcache, in front of the DB]

Page 24: Overview of the ehcache

Using Ehcache Cache Server

• Support for RESTful and SOAP APIs

• Redundant, Scalable with client hash-based routing

– The client can be implemented in any language

– The client must work out a partitioning scheme


Page 25: Overview of the ehcache

Using Ehcache Integrate with other solutions

• Hibernate

• Java EE Servlet Caching

• JCache style caching

• Spring, Cocoon, Acegi and other frameworks

Page 26: Overview of the ehcache

Distributed Ehcache Architecture (Logical View)

• Distributed Ehcache combines an in-process Ehcache with the Terracotta Server Array

• The data is split between an Ehcache node (L1) and the Terracotta Server Array (L2)

– The L1 can hold as much data as is comfortable

– The L2 always holds a complete copy of all cache data

– The L1 acts as a hot-set of recently used data

Page 27: Overview of the ehcache

Distributed Ehcache Architecture (Ehcache topologies)

• Standalone

– The cache data set is held in the application node

– Any other application nodes are independent, with no communication between them

• Distributed Ehcache

– The data is held in a Terracotta Server Array, with a subset of recently used data held in each application cache node

• Replicated

– The cached data set is held in each application node and data is copied or invalidated across the cluster without locking

– Replication can be either asynchronous or synchronous

– The only consistency mode available is weak consistency

Page 28: Overview of the ehcache

Distributed Ehcache Architecture (Network View)

• From a network topology point of view, Distributed Ehcache consists of

– Ehcache node (L1)

• The Ehcache library is present in each app

• An Ehcache instance, running in-process, sits in each JVM

– Terracotta Server Array (L2)

• Each Ehcache instance maintains a connection with one or more Terracotta Servers

• Consistent hashing is used by the Ehcache nodes to store and retrieve cache data


Page 29: Overview of the ehcache

Distributed Ehcache Architecture (Memory Hierarchy View)

• Each in-process Ehcache instance

– Heap memory

– Off-heap memory (BigMemory)

• The Terracotta Server Arrays

– Heap memory

– Off-heap memory

– Disk storage.

• This is optional (persistence).


Page 30: Overview of the ehcache

Ehcache in-process compared with Memcached

Page 31: Overview of the ehcache

References

• Ehcache User Guide

– http://ehcache.org/documentation

• Ehcache Architecture, Features and Usage Patterns

– Greg Luck, 2009 JavaOne Session 2007