Slice for Distributed Persistence (JavaOne 2010)

© 2010 IBM Corporation

Scale Java Persistence API Applications with OpenJPA Slice

Pinaki Poddar



2

Agenda

Core Features of Slice

Using Slice

Under the hood

Running on Slice



3

What is Slice?

Slice is an OpenJPA module for horizontally partitioned databases

Java Persistence API Specification [Section 3.1] says: “A persistence unit defines the set of all classes that are related or grouped by the application, and which must be colocated in their mapping to a single database.”

Slice changes that: a persistence unit can map to multiple databases instead of a single one


4

Horizontal Partitioning

A data set $D$ is said to be horizontally partitioned into $N$ partitions $D_1, D_2, \ldots, D_N$ iff $D = D_1 \cup D_2 \cup \cdots \cup D_N$ and $D_i \cap D_j = \emptyset$ for any $i \neq j$
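The two conditions of the definition (the parts cover the data set, and no two parts overlap) can be checked mechanically on small in-memory sets. The sketch below is only an illustration; the class name PartitionCheck and the sample customer names are made up for this example and are not part of Slice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Checks the two conditions of a horizontal partition on small in-memory sets. */
class PartitionCheck {

    /** True if the parts cover the data set exactly and no element is in two parts. */
    static <E> boolean isHorizontalPartition(Set<E> dataSet, List<Set<E>> parts) {
        Set<E> union = new HashSet<>();
        int totalSize = 0;
        for (Set<E> part : parts) {
            union.addAll(part);
            totalSize += part.size();
        }
        // If no element is shared, the part sizes add up to the size of the union.
        boolean disjoint = totalSize == union.size();
        boolean covers = union.equals(dataSet);
        return disjoint && covers;
    }

    public static void main(String[] args) {
        Set<String> customers = new HashSet<>(Arrays.asList("ann", "bob", "chen", "dita"));
        List<Set<String>> byRegion = new ArrayList<>();
        byRegion.add(new HashSet<>(Arrays.asList("ann", "bob")));
        byRegion.add(new HashSet<>(Arrays.asList("chen", "dita")));
        System.out.println(isHorizontalPartition(customers, byRegion)); // true
    }
}
```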

A common mathematical term is mutually disjoint sets

Google coined the term Shard for such a partition.


5

Horizontal Partitioning in realistic setup

Natural partitioning scenarios
– Customer by region (Telecom Billing)
– Transaction by month (Finance)
– Software as a Service platforms (Legal compliance)


6

History of Slice

Incubated as an Apache Lab project in Jan 2008

Integrated as an OpenJPA module in July 2008 (since version 1.1)

Available with WebSphere Application Server version 7.0 onwards


7

OpenJPA

• An implementation of JPA Specification
• Apache Project since May 2007
http://openjpa.apache.org


8

Architectural tiers of a typical JPA-based application

User Application

OpenJPA

Standard JPA API

JDBC API

400 million records


9

Architectural tiers of a Slice-based application

User Application

OpenJPA

Standard JPA API

JDBC API

Slice

OpenJPA is a pluggable platform

User-defined Data Distribution Policy


4x100 million records


10

Separate Persistence Unit configured to partitioned databases

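[Figure: three persistence units (Unit A, Unit B, Unit C), each connected to its own database (DA, DB, DC). Each unit has its own contexts: CA[1] and CA[2] for Unit A, CB[1] and CB[2] for Unit B, CC[1] and CC[2] for Unit C.]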


11

Same persistence unit switches contexts to partitioned databases

[Figure: a single Persistence Unit whose contexts CA[1], CA[2], CB[1], CB[2], CC[1], CC[2] each switch to one of the databases DA, DB, DC.]


12

Same persistence unit connected to partitioned databases

[Figure: a single Persistence Unit whose contexts C[1] and C[2] are each connected to all of the databases DA, DB, DC.]


13

Features of Slice

Slice-based User Application

OpenJPA

Standard JPA API

JDBC API

Slice

No changes to Application code

Flexible per-Slice Configuration

Parallel Query Execution

Heterogeneous Databases

Master-based Sequence

Targeted Query

No changes to Domain Model

User-defined Query Target Policy

No changes to Database Schema

4x100 million records


14

Agenda


Using Slice

Under the hood

Running on Slice



15

Using Slice

Decide partition policy to
– distribute data
– target query

Configure JPA persistence unit

No change to
– Application Code (ok, almost!)
– Domain Model
– Database Schema


16

Policy based configuration

The design goal of “no application code change” conflicts with the user application’s ability to control

– which slice(s) will store a new instance – which slice(s) will be searched for a query

The compromise solution is policy-based callback interfaces
– DistributionPolicy
– ReplicationPolicy
– QueryTargetPolicy
– FinderTargetPolicy

User application may implement these policies

Slice runtime would call the policy method


17

Data Distribution Policy determines where new records are stored

Which slice stores a new data record?

– Only the user application can decide


18

How to distribute data across slices?

User Application

EntityManager em = …;
em.getTransaction().begin();
Person person = new Person();
person.setName("John");
person.setAge(42);
Address addr = new Address();
addr.setCity("New York");
person.setAddress(addr);
em.persist(person);
em.getTransaction().commit();

Data Distribution Policy

public class MyDistributionPolicy implements DistributionPolicy {
    public String distribute(Object pc, List<String> slices, Object ctx) {
        // Pick a slice by 20-year age band: ages 0-19 go to the first slice, 20-39 to the second, and so on.
        return slices.get(((Person) pc).getAge() / 20);
    }
}

Domain Classes

@Entity
public class Person {
    private String name;
    private int age;
    @OneToOne(cascade = ALL)
    private Address address;
}

@Entity
public class Address {
    private String city;
}


19

Distribution Policy decides target slice for each instance

public interface DistributionPolicy {
    /**
     * Decide the name of the slice where the given persistent
     * instance would be stored.
     *
     * @param pc The newly persistent or to-be-merged object.
     * @param slices name of the configured slices.
     * @param context persistence context managing the given instance.
     *
     * @return identifier of the slice. This name must match one of the
     * configured slice names.
     * @see DistributedConfiguration#getSliceNames()
     */
    String distribute(Object pc, List<String> slices, Object context);
}

Slice runtime will call this method while persisting or merging a root instance. The instance and its persistent closure will be stored in the returned slice.
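Because the returned name must match a configured slice, a policy that computes an index should keep that index within the list it was given. The following is a minimal sketch under stated assumptions: the DistributionPolicy interface here is a local stand-in that copies the signature shown above so the example compiles on its own, and Person and AgeBandDistributionPolicy are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Local stand-in with the signature shown on the slide; the real interface ships with OpenJPA Slice. */
interface DistributionPolicy {
    String distribute(Object pc, List<String> slices, Object context);
}

/** Minimal stand-in for a Person entity (hypothetical, for illustration only). */
class Person {
    private final int age;
    Person(int age) { this.age = age; }
    int getAge() { return age; }
}

/** Assigns a Person to a slice by 20-year age band, clamped to the configured slices. */
class AgeBandDistributionPolicy implements DistributionPolicy {
    @Override
    public String distribute(Object pc, List<String> slices, Object context) {
        if (!(pc instanceof Person)) {
            throw new IllegalArgumentException("No policy for " + pc);
        }
        int band = Math.max(0, ((Person) pc).getAge() / 20);
        // Clamp so that ages beyond the last band still map to a configured slice.
        int index = Math.min(band, slices.size() - 1);
        return slices.get(index);
    }

    public static void main(String[] args) {
        List<String> slices = Arrays.asList("One", "Two", "Three");
        DistributionPolicy policy = new AgeBandDistributionPolicy();
        System.out.println(policy.distribute(new Person(42), slices, null)); // Three
        System.out.println(policy.distribute(new Person(95), slices, null)); // Three (clamped)
    }
}
```

Clamping is one possible design choice; a policy could equally reject out-of-range values by throwing an exception.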


20

Details on Distribution Policy

User Application:

em.persist(a); // or em.merge(a);

Slice runtime calls:

MyPolicy.distribute(a, …) { return "One"; }

<property name="openjpa.slice.DistributionPolicy" value="acme.org.MyPolicy"/>

Slice attaches a moniker to a managed instance as it enters a persistence context

[Figure: in the persistence context, instance a is related to b and c through relations cascaded with CascadeType.PERSIST and CascadeType.MERGE; a, b and c all carry the moniker One.]


21

Slice enforces Collocation Constraint on persistent closure

The persistent closure of a managed instance x is
– the set of instances Cx = {y} where y is reachable from x by traversal of a relation cascaded as PERSIST or MERGE, to an unlimited depth

The persistent closure Cx, as it stands at the time of persist() or merge(), is stored in the same slice
– because Slice cannot join across databases

Compliant domain models are referred to as Constrained Tree Schema
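The closure is an ordinary graph reachability computation. The sketch below illustrates it with a hypothetical Node class that records only the references cascaded as PERSIST or MERGE; it is not how OpenJPA represents managed instances. For convenience the returned set also contains the root itself, since the root and its closure end up in the same slice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** A managed instance reduced to a name and its PERSIST/MERGE-cascaded references. */
class Node {
    final String name;
    final List<Node> cascaded = new ArrayList<>();
    Node(String name) { this.name = name; }
    Node cascadeTo(Node target) { cascaded.add(target); return this; }
}

class PersistentClosure {
    /** Collects the root and every instance reachable from it through cascaded relations. */
    static Set<Node> of(Node root) {
        Set<Node> closure = new LinkedHashSet<>();
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            // add() returns false for instances already visited, which guards against cycles.
            if (closure.add(current)) {
                for (Node next : current.cascaded) {
                    pending.push(next);
                }
            }
        }
        return closure;
    }

    public static void main(String[] args) {
        Node person = new Node("person");
        Node address = new Node("address");
        person.cascadeTo(address);
        for (Node n : of(person)) {
            System.out.println(n.name + " is stored in the slice chosen for person");
        }
    }
}
```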


22

An example domain model


23

Instances that violate persistent closure must be replicated

Data partitioned by Stock sectors

slice.One: Stock CSCO, Ask-152, Bid-153, Trader-1, Trader-2, Trade-15
slice.Two: Stock GS, Ask-210, Bid-211, Trader-7, Trader-1, Trade-21

Trader-1 will violate the collocation constraint and must be replicated across all slices.


24

Replicate master/shared data across slices

Enumerate replicated types in configuration

By default, replicated entities are stored in all slices
– or implement ReplicationPolicy

<property name="openjpa.slice.ReplicatedTypes" value="domain.Trader"/>

Data Replication Policy

public class DefaultReplicationPolicy implements ReplicationPolicy {
    public String[] replicate(Object pc, List<String> slices, Object ctx) {
        // Replicate to every configured slice.
        return slices.toArray(new String[0]);
    }
}


25

Replication Policy

public interface ReplicationPolicy {
    /**
     * Decide the name of the slices where the given persistent
     * instance would be replicated.
     *
     * @param pc The newly persistent or to-be-merged object.
     * @param slices name of the configured slices.
     * @param context persistence context managing the given instance.
     *
     * @return identifier(s) of the slice. Each name must match one of the
     * configured slice names.
     * @see DistributedConfiguration#getSliceNames()
     */
    String[] replicate(Object pc, List<String> slices, Object context);
}

Slice runtime will call this method while persisting any replicated instance.


26

Distributed Query

Each query is executed across target slices in parallel

The performance upper bound is set by the size of the largest partition, not the size of the entire data set.
– The query will execute on 100 million instead of 400 million records

Slice attaches a moniker to each selected instance based on its origin
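The general idea of running one query per slice concurrently and appending the results can be sketched with a thread pool. This is an illustration of the technique only, not Slice's implementation; ParallelSliceQuery and the per-slice tasks are hypothetical, and each task stands in for a query against one slice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSliceQuery {
    /** Runs one task per slice concurrently and appends the results in the map's slice order. */
    static <R> List<R> executeAll(Map<String, Callable<List<R>>> queryPerSlice) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, queryPerSlice.size()));
        try {
            List<Future<List<R>>> futures = new ArrayList<>();
            for (Callable<List<R>> task : queryPerSlice.values()) {
                futures.add(pool.submit(task)); // all slices start before any result is awaited
            }
            List<R> merged = new ArrayList<>();
            for (Future<List<R>> future : futures) {
                merged.addAll(future.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a slice", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("query failed on a slice", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Callable<List<String>>> tasks = new LinkedHashMap<>();
        tasks.put("One", () -> Arrays.asList("ROB", "BILL"));
        tasks.put("Two", () -> Arrays.asList("MARY"));
        System.out.println(executeAll(tasks)); // [ROB, BILL, MARY]
    }
}
```

The total wait is governed by the slowest slice rather than by the sum of all slices, which is the point made about the largest partition.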


27

Query Target Policy decides target slice for each query

public interface QueryTargetPolicy {
    /**
     * Decide the name of the slice(s) where the given query
     * will be executed.
     *
     * @param query The query string to be executed.
     * @param params the bound parameters of the query.
     * @param language the language of the query
     * @param slices name of the configured slices
     * @param context persistence context executing the query.
     *
     * @return identifier of the target slices. Null value implies
     * all configured slices.
     */
    String[] getTargets(String query, Map params, String language,
        List<String> slices, Object context);
}

Slice runtime will call this method for every query.

Default policy targets all available slices.


28

Intrusive way to control target slices for a query

EntityManager em = …;
String jpql = "SELECT p FROM Person p WHERE p.name = :name";

TypedQuery<Person> query1 = em.createQuery(jpql, Person.class);
// Set a single slice as query target
query1.setHint(SlicePersistence.HINT_TARGET, "One");
List<Person> result1 = query1.setParameter("name", "XYZ").getResultList();

TypedQuery<Person> query2 = em.createQuery(jpql, Person.class);
// Set multiple slices as query targets
query2.setHint(SlicePersistence.HINT_TARGET, Arrays.asList(new String[]{"One", "Two"}));
List<Person> result2 = query2.setParameter("name", "ABC").getResultList();


29

Distributed Query results are appended

Results from individual slices are appended

Rows held by the three slices (slice1, slice2, slice3), one table per slice:

NAME    AGE  JOIN_YEAR
ROB      22  2008
LEUNG    37  2005
BILL     29  2001

NAME    AGE  JOIN_YEAR
HARI     31  2002
SHIVA    35  1999
JOSE     41  1987

NAME    AGE  JOIN_YEAR
JOHN     35  2001
MARY     24  2007
SANDRA   43  1975

Appended result:

NAME    AGE  JOIN_YEAR
MARY     24  2007
BILL     29  2001
ROB      22  2008

String jpql = "select e from Employee e where e.age < 30";
List<Employee> result = em.createQuery(jpql, Employee.class).getResultList();


30

Distributed Query results are sorted in-memory

For ORDER BY queries, the results from the target slices are merged and sorted in memory

Rows held by the three slices (slice1, slice2, slice3), one table per slice:

NAME    AGE  JOIN_YEAR
ROB      22  2008
LEUNG    37  2005
BILL     29  2001

NAME    AGE  JOIN_YEAR
HARI     31  2002
SHIVA    35  1999
JOSE     41  1987

NAME    AGE  JOIN_YEAR
JOHN     35  2001
MARY     24  2007
SANDRA   43  1975

Appended per-slice results:

NAME    AGE  JOIN_YEAR
MARY     24  2007
BILL     29  2001
ROB      22  2008

Result after the in-memory sort:

NAME    AGE  JOIN_YEAR
BILL     29  2001
MARY     24  2007
ROB      22  2008

String jpql = "select e from Employee e where e.age < 30 order by e.name";
List<Employee> result = em.createQuery(jpql, Employee.class).getResultList();


31

Distributed Top-N Query

For queries with a result limit, the Top-N result from each slice is merged (with ordering, if any)

Rows held by the three slices (slice1, slice2, slice3), one table per slice:

NAME    AGE  JOIN_YEAR
ROB      22  2008
LEUNG    37  2005
BILL     29  2001

NAME    AGE  JOIN_YEAR
HARI     31  2002
SHIVA    35  1999
JOSE     41  1987

NAME    AGE  JOIN_YEAR
JOHN     35  2001
MARY     24  2007
SANDRA   43  1975

Top-2 result from each slice, in the table order above:

ROB      22  2008
BILL     29  2001

HARI     31  2002
SHIVA    35  1999

MARY     24  2007
JOHN     35  2001

Merged Top-2 result:

ROB      22  2008
MARY     24  2007

String jpql = "select e from Employee e order by e.age";
List<Employee> result = em.createQuery(jpql, Employee.class)
    .setMaxResults(2).getResultList();
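The merge step for an ordered Top-N query can be sketched as: collect the Top-N list of every slice, sort the combined list with the query's ordering, and keep the first N entries. The class TopNMerge below is a hypothetical illustration of that idea and uses plain ages instead of entities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TopNMerge {
    /** Merges the Top-N lists returned by each slice, re-sorts them, and keeps the first n. */
    static <R> List<R> merge(List<List<R>> perSliceTopN, Comparator<R> order, int n) {
        List<R> all = new ArrayList<>();
        for (List<R> sliceResult : perSliceTopN) {
            all.addAll(sliceResult);
        }
        all.sort(order);
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    public static void main(String[] args) {
        // Ages of the two youngest employees reported by each of three slices.
        List<List<Integer>> perSlice = Arrays.asList(
            Arrays.asList(22, 29), Arrays.asList(31, 35), Arrays.asList(24, 35));
        System.out.println(merge(perSlice, Comparator.<Integer>naturalOrder(), 2)); // [22, 24]
    }
}
```

Each slice must return its own Top-N for this to be correct, because the global Top-N could in principle come entirely from one slice.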


32

Distributed Top-N Query

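For queries with a result limit but without an ORDER BY clause, the Top-N results from individual slices are appended. Without an ordering, which rows a slice returns depends on the database.

Rows held by the three slices (slice1, slice2, slice3), one table per slice:

NAME    AGE  JOIN_YEAR
ROB      22  2008
LEUNG    37  2005
BILL     29  2001

NAME    AGE  JOIN_YEAR
HARI     31  2002
SHIVA    35  1999
JOSE     41  1987

NAME    AGE  JOIN_YEAR
JOHN     35  2001
MARY     24  2007
SANDRA   43  1975

Top-2 result from each slice as shown on the slide, in the table order above:

ROB      22  2008
BILL     29  2001

HARI     31  2002
SHIVA    35  1999

MARY     24  2007
JOHN     35  2001

Final Top-2 result as shown on the slide:

ROB      22  2008
MARY     24  2007

List result = em.createQuery("select e from Employee e")
    .setMaxResults(2).getResultList();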


33

Targeted Query

Query and find() can be targeted to a subset of slices by hints

Rows held by the three slices (slice1, slice2, slice3), one table per slice:

NAME    AGE  JOIN_YEAR
ROB      22  2008
LEUNG    37  2005
BILL     29  2001

NAME    AGE  JOIN_YEAR
HARI     31  2002
SHIVA    35  1999
JOSE     41  1987

NAME    AGE  JOIN_YEAR
JOHN     35  2001
MARY     24  2007
SANDRA   43  1975

List result = em.createQuery("SELECT e FROM Employee e WHERE e.age > 34")
    .setHint("openjpa.slice.Targets", "slice1,slice3")
    .getResultList();

Result from the two targeted slices:

NAME    AGE  JOIN_YEAR
SANDRA   43  1975
JOHN     35  2001
JOSE     41  1987
SHIVA    35  1999


34

Aggregate Query

Aggregate results are supported when the aggregate operation is commutative to partition

Rows held by the three slices (slice1, slice2, slice3), one table per slice:

NAME    AGE  JOIN_YEAR
ROB      22  2008
LEUNG    37  2005
BILL     29  2001

NAME    AGE  JOIN_YEAR
HARI     31  2002
SHIVA    35  1999
JOSE     41  1987

NAME    AGE  JOIN_YEAR
JOHN     35  2001
MARY     24  2007
SANDRA   43  1975

Number sum = em.createQuery("select sum(e.age) from Employee e where e.age > 30",
    Number.class).getSingleResult();

Per-slice sums, in the table order above: 37, 107, 78
Merged result: 37 + 107 + 78 = 222


35

Distributed Aggregate Query Limitations

Commutativity
– the ability to change the order of operations without changing the end result.

SUM() or MAX() is commutative to partition
– SUM(D) = SUM(SUM(D1), SUM(D2), SUM(D3)) where Partition(D) = {D1, D2, D3}

But AVG() is not
– AVG(D) != AVG(AVG(D1), AVG(D2), AVG(D3)) in general
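The difference is easy to reproduce with a few numbers. The sketch below uses made-up sample ages in three slices of unequal size; DistributedAggregate is a hypothetical class for illustration. The last method shows a common general workaround (merge per-slice sums and counts, then divide once); the slides do not claim that Slice does this.

```java
import java.util.Arrays;
import java.util.List;

class DistributedAggregate {
    static long sum(int[] values) {
        long s = 0;
        for (int v : values) s += v;
        return s;
    }

    /** SUM merges correctly: the sum of the per-slice sums equals the global sum. */
    static long sumOfSums(List<int[]> slices) {
        long total = 0;
        for (int[] slice : slices) total += sum(slice);
        return total;
    }

    /** Naive AVG merge: averages the per-slice averages, which ignores slice sizes. */
    static double averageOfAverages(List<int[]> slices) {
        double total = 0;
        for (int[] slice : slices) total += (double) sum(slice) / slice.length;
        return total / slices.size();
    }

    /** Size-aware AVG merge: combine per-slice sums and counts, then divide once. */
    static double averageFromSumAndCount(List<int[]> slices) {
        long count = 0;
        for (int[] slice : slices) count += slice.length;
        return (double) sumOfSums(slices) / count;
    }

    public static void main(String[] args) {
        List<int[]> ages = Arrays.asList(
            new int[]{23, 38, 29}, new int[]{31, 35}, new int[]{35, 24, 43});
        System.out.println(sumOfSums(ages));              // 258
        System.out.println(averageOfAverages(ages));      // about 32.33, not the true average
        System.out.println(averageFromSumAndCount(ages)); // 32.25
    }
}
```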


36

Aggregate Query

Aggregate results are not supported when the aggregate operation is not commutative to partition

Rows held by the three slices (slice1, slice2, slice3), one table per slice:

NAME    AGE  JOIN_YEAR
ROB      23  2008
LEUNG    38  2005
BILL     29  2001

NAME    AGE  JOIN_YEAR
HARI     31  2002
SHIVA    35  1999

NAME    AGE  JOIN_YEAR
JOHN     35  2001
MARY     24  2007
SANDRA   43  1975

Number avg = em.createQuery("select avg(e.age) from Employee e", Number.class)
    .getSingleResult();

Per-slice averages, in the table order above: 30.0, 33.0, 34.0
Averaging the averages: [30.0 + 33.0 + 34.0] / 3 ≈ 32.33
Wrong! The average over all eight rows is 258 / 8 = 32.25.


37

Query for Replicated Entities

Replicated instances are detected and queried in a single slice

Number count = (Number) em.createQuery("SELECT COUNT(c) FROM Country c")
    .getSingleResult();

Each of the three slices (slice1, slice2, slice3) holds the same replicated rows:

CODE     POPULATION
US       300M
GERMANY  82M
INDIA    1200M

Result: 3


38

META-INF/persistence.xml configures a persistence unit

<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    version="2.0"
    xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd">
  <persistence-unit name="test" transaction-type="RESOURCE_LOCAL">
    <provider>org.apache.openjpa.persistence.PersistenceProviderImpl</provider>
    <class>domain.EntityA</class>
    <class>domain.EntityB</class>
    <properties>
      <property name="openjpa.ConnectionDriverName" value="com.mysql.jdbc.Driver"/>
      <property name="openjpa.ConnectionURL" value="jdbc:mysql://localhost/test"/>
      <property name="openjpa.jdbc.SynchronizeMappings" value="buildSchema"/>
      <property name="openjpa.Log" value="SQL=TRACE"/>
    </properties>
  </persistence-unit>
</persistence>

Governed by XML Schema

Identified by Unit Name

JPA Provider is pluggable

List of known Persistent types

Vendor-specific configuration


39

Activate Slice through configuration

<property name="openjpa.BrokerFactory" value="slice"/>

• Mandatory configuration

• Activates a specialized EntityManagerFactory


40

Each slice is referred by a moniker

<property name="openjpa.slice.Names" value="One,Two,Three"/>

• Optional (but recommended) configuration

• Associates mnemonics to physical slices


41

Identify a Master slice

<property name="openjpa.slice.Master" value="One"/>

• Optional (but recommended) configuration

• Identifies a master slice for identity generation


42

Specify physical slice connection details

<property name="openjpa.slice.One.ConnectionURL" value="jdbc:mysql://localhost/slice1"/>
<property name="openjpa.slice.Two.ConnectionURL" value="jdbc:mysql://localhost/slice2"/>

• Mandatory configuration

• Specifies physical connection for each slice

• Property name prefixed by the slice moniker (One and Two above are the monikers)


43

Slices can share common properties

<property name="openjpa.slice.Names" value="One,Two,Three"/>
<property name="openjpa.ConnectionDriverName" value="com.mysql.jdbc.Driver"/>
<property name="openjpa.slice.Three.ConnectionDriverName" value="com.ibm.db2.jcc.DB2Driver"/>

Properties can be shared

• unless overridden for a specific slice


44

Ignoring unavailable slices

<property name="openjpa.slice.Lenient" value="true"/>

• Optional configuration

• Ignores any unreachable slice


45

Configuration Rules

Each slice is identified by a moniker

All monikers should be explicitly declared in openjpa.slice.Names
– though implicit declaration is allowed
• openjpa.slice.XYZ.abc declares a slice with moniker XYZ

A master slice is either configured by the openjpa.slice.Master property
– or automatically detected by convention/heuristic as the first slice

Each slice must be configured with a database URL


46

Configuration Rules (continued)

Other properties can be shared

Each slice property defaults to the common configuration
– If openjpa.slice.XYZ.abc is unspecified, then abc defaults to the openjpa.abc property
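The fallback rule can be pictured as a two-step lookup. The sketch below is a hypothetical illustration of that rule over a plain map; SlicePropertyResolver is not an OpenJPA class, and the real configuration code is not shown in these slides.

```java
import java.util.HashMap;
import java.util.Map;

class SlicePropertyResolver {
    /** Looks up openjpa.slice.<slice>.<key> first and falls back to openjpa.<key>. */
    static String resolve(Map<String, String> props, String slice, String key) {
        String specific = props.get("openjpa.slice." + slice + "." + key);
        return specific != null ? specific : props.get("openjpa." + key);
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("openjpa.ConnectionDriverName", "com.mysql.jdbc.Driver");
        props.put("openjpa.slice.Three.ConnectionDriverName", "com.ibm.db2.jcc.DB2Driver");

        // Slice One has no driver of its own, so it inherits the common one.
        System.out.println(resolve(props, "One", "ConnectionDriverName"));   // com.mysql.jdbc.Driver
        // Slice Three overrides the common driver.
        System.out.println(resolve(props, "Three", "ConnectionDriverName")); // com.ibm.db2.jcc.DB2Driver
    }
}
```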


47

A complete example of Slice Configuration

META-INF/persistence.xml

<properties>
  <!-- Activate Slice -->
  <property name="openjpa.BrokerFactory" value="slice"/>

  <!-- Declare slices -->
  <property name="openjpa.slice.Names" value="One,Two,Three"/>
  <property name="openjpa.slice.Master" value="One"/>

  <!-- Configure each slice -->
  <property name="openjpa.ConnectionDriverName" value="com.mysql.jdbc.Driver"/>
  <property name="openjpa.slice.One.ConnectionURL" value="jdbc:mysql://mac1:3456/slice1"/>
  <property name="openjpa.slice.Two.ConnectionURL" value="jdbc:mysql://mac2:5634/slice2"/>

  <property name="openjpa.slice.Three.ConnectionDriverName" value="com.ibm.db2.jcc.DB2Driver"/>
  <property name="openjpa.slice.Three.ConnectionURL" value="jdbc:db2://mac3:50000/slice3"/>

  <!-- Define Data Distribution Policy -->
  <property name="openjpa.slice.DistributionPolicy" value="acme.org.MyDistroPolicy"/>

  <!-- Configure common behavior -->
  <property name="openjpa.jdbc.SynchronizeMappings" value="buildSchema"/>
</properties>
</persistence-unit>


48

Updates

Slice remembers the original slice of each instance.
– SlicePersistence.getSlice(Object pc) returns the logical slice name for the given argument.

If an instance is modified, then the update occurs in the original slice.

Replicated instances are updated in many slices
– SlicePersistence.isReplicated(Object pc)

Commit will not be invoked for a slice if no update exists for that slice


49

Database and Transaction

Slices can be in heterogeneous database platforms
– Each slice can use its own JDBC driver

A pseudo (weaker) 2-phase commit protocol


50

Agenda


Using Slice

Under the hood

Running on Slice



51

Core Architectural constructs of OpenJPA

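[Figure: OpenJPA in three layers, all configured by OpenJPAConfiguration.
facade: EntityManagerFactory creates EntityManager
kernel: BrokerFactory creates Broker; EntityManagerFactory delegates to BrokerFactory and EntityManager delegates to Broker; the Broker manages POJO + State manager
storage: StoreManager, implemented by JDBCStoreManager on top of the JDBC API]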


52

Slice extends OpenJPA by Distributed Template

[Figure: the same three layers with Slice plugged in, configured by DistributedConfiguration.
facade: EntityManagerFactory, EntityManager
kernel: BrokerFactory, Broker; managed instances are POJO + State manager + Slice Moniker
storage: DistributedStoreManager delegating to several JDBCStoreManager instances, each on top of the JDBC API
Callouts: "applies Distributed Template Pattern" (shown twice) and "Not aware of partitioned Databases"]


53

Distributed Template Design Pattern

// Pattern sketch: T stands for a concrete interface (for example a JDBC Statement).
// Java cannot implement a type parameter, so a real class names the interface instead.
public class DistributedTemplate<T> implements T, Iterable<T> {
    protected List<T> _delegates = new ArrayList<T>();

    public void add(T t) { _delegates.add(t); }

    public Iterator<T> iterator() { return _delegates.iterator(); }

    // execution requires operation-specific merge semantics
    public boolean execute(String arg0) {
        boolean ret = true;
        for (T t : this)
            ret = t.execute(arg0) & ret; // merge execution result
        return ret;
    }
}

• Similar to Composite
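A compilable variant of the pattern names a concrete interface in place of the type parameter. In the sketch below, Executable is a hypothetical interface standing in for a JDBC or OpenJPA artifact, and DistributedExecutable is an illustrative name, not a Slice class.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical operation interface standing in for a JDBC or OpenJPA artifact. */
interface Executable {
    boolean execute(String command);
}

/** Implements Executable by fanning each call out to its delegates and merging the results. */
class DistributedExecutable implements Executable, Iterable<Executable> {
    private final List<Executable> delegates = new ArrayList<>();

    void add(Executable delegate) { delegates.add(delegate); }

    @Override
    public Iterator<Executable> iterator() { return delegates.iterator(); }

    @Override
    public boolean execute(String command) {
        boolean result = true;
        for (Executable delegate : this) {
            // Non-short-circuit AND: every delegate runs even after one has reported failure.
            result = delegate.execute(command) & result;
        }
        return result;
    }

    public static void main(String[] args) {
        DistributedExecutable distributed = new DistributedExecutable();
        distributed.add(command -> { System.out.println("slice One: " + command); return true; });
        distributed.add(command -> { System.out.println("slice Two: " + command); return true; });
        // Callers see a single Executable and need not know how many delegates exist.
        System.out.println(distributed.execute("ping")); // true
    }
}
```

The merge rule (logical AND here) is specific to the operation; a query would append or sort results instead, as described in the earlier slides.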


54

Slice applies Distributed Template Design Pattern on OpenJPA/JDBC

• Distributed Template Design Pattern as main metaphor
• applied on JDBC artifacts (Statement, ResultSet)
• and on major OpenJPA artifacts such as StoreManager, Query


55

Agenda


Using Slice

Under the hood

Running on Slice


56

OpenTrader : OpenJPA/Slice and GWT


57

An example data distribution policy

/**
 * This distribution policy determines the sector of the stock and
 * picks the slice at ordinal index of the enumerated Sector.
 */
public class SectorDistributionPolicy implements DistributionPolicy {
    public String distribute(Object pc, List<String> slices, Object context) {
        Stock stock = null;
        if (pc instanceof Tradable) {
            stock = ((Tradable) pc).getStock();
        } else if (pc instanceof Stock) {
            stock = (Stock) pc;
        } else if (pc instanceof Trade) {
            stock = ((Trade) pc).getStock();
        } else {
            throw new IllegalArgumentException("No policy for " + pc);
        }
        return stock != null ? slices.get(stock.getSector().ordinal()) : null;
    }
}


58

An example query target policy

public static final String MATCH_BID =
    "select new Match(a,b) from Ask a, Bid b "
    + "where b = :bid and a.stock.symbol = b.stock.symbol "
    + "and a.price <= b.price and a.volume >= b.volume "
    + "and NOT(a.seller = b.buyer) "
    + "and a.trade is NULL and b.trade is NULL";

public class SectorBasedQueryTargetPolicy implements QueryTargetPolicy {
    public String[] getTargets(String query, Map<Object, Object> params,
            String language, List<String> slices, Object context) {
        Stock stock = null;
        if (TradingService.MATCH_BID.equals(query)) {
            stock = ((Tradable) params.get("bid")).getStock();
            return new String[]{slices.get(stock.getSector().ordinal())};
        }
        return null;
    }
}


59

Future Work

Support a wider notion of heterogeneity
– Different mappings to different databases
– Mixing data storage technologies


60

Future Work

Dynamic reconfiguration
– Adding slices
– Removing slices
– Availability/Consistency debate

Detection of unsupported queries

Stronger transaction guarantee


61

References

Slice Documentation

http://openjpa.apache.org/builds/latest/docs/manual/manual.html#ref_guide_slice

Article on Slice

http://www.ibm.com/developerworks/java/library/os-openjpa/index.html?ca=drs-

OpenTrader: a case-study on Slice + GWT

http://openjpa.apache.org/samples/opentrader

svn co https://svn.apache.org/repos/asf/openjpa/trunk/openjpa-examples/opentrader


62

Thank You!
