Scale Java Persistence API Applications with OpenJPA Slice
Pinaki Poddar
© 2010 IBM Corporation
May 19, 2015
Agenda
Core Features of Slice
Using Slice
Under the hood
Running on Slice
What is Slice?
Slice is an OpenJPA module for horizontally partitioned databases
Java Persistence API Specification [Section 3.1] says: “A persistence unit defines the set of all classes that are related or grouped by the application, and which must be colocated in their mapping to a single database.”
Slice changes that: from a single database to multiple databases
Horizontal Partitioning
A data set $D$ is said to be horizontally partitioned into $N$ partitions $D_1, D_2, \dots, D_N$ iff $D = D_1 \cup D_2 \cup \dots \cup D_N$ and $D_i \cap D_j = \emptyset$ for any $i \neq j$
A common mathematical term is mutually disjoint sets
Google coined the term "shard" for such a partition.
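The definition above can be checked mechanically. The following is a minimal sketch in plain Java; the class and method names are illustrative and not part of OpenJPA. It tests whether a list of subsets is a horizontal partition of a data set: the subsets must be mutually disjoint and their union must equal the whole set.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative helper, not part of OpenJPA or Slice.
final class PartitionCheck {
    private PartitionCheck() {}

    /** Returns true if parts are mutually disjoint and their union equals data. */
    static <T> boolean isHorizontalPartition(Set<T> data, List<Set<T>> parts) {
        Set<T> union = new HashSet<>();
        for (Set<T> part : parts) {
            for (T element : part) {
                // add() returns false if the element was already seen in an earlier part
                if (!union.add(element)) {
                    return false;
                }
            }
        }
        return union.equals(data);
    }
}
```

A distribution policy that assigns every record to exactly one slice produces partitions that satisfy this check by construction.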
Horizontal Partitioning in realistic setup
Natural partitioning scenarios
– Customer by region (Telecom Billing)
– Transaction by month (Finance)
– Software as a Service platforms (Legal compliance)
History of Slice
Incubated as an Apache Lab project in Jan 2008
Integrated as an OpenJPA module in July 2008 (since version 1.1)
Available with WebSphere Application Server version 7.0 onwards
OpenJPA
• An implementation of the JPA Specification
• Apache project since May 2007: http://openjpa.apache.org
Architectural tiers of a typical JPA-based application
[Diagram: User Application → Standard JPA API → OpenJPA → JDBC API → one database holding 400 million records]
Architectural tiers of a Slice-based application
[Diagram: User Application → Standard JPA API → OpenJPA with Slice → JDBC API → four databases (4x100 million records)]
OpenJPA is a pluggable platform
User-defined Data Distribution Policy
Separate Persistence Unit configured to partitioned databases
[Diagram: three persistence units, Unit A, Unit B and Unit C, each configured to its own database (DA, DB, DC), with contexts CA[1], CA[2], CB[1], CB[2], CC[1], CC[2]]
Same persistence unit switches contexts to partitioned databases
[Diagram: a single persistence unit switching between databases DA, DB and DC, with contexts CA[1], CA[2], CB[1], CB[2], CC[1], CC[2]]
Same persistence unit connected to partitioned databases
[Diagram: a single persistence unit connected to databases DA, DB and DC at the same time, with contexts C[1], C[2]]
Features of Slice
[Diagram: Slice-based User Application → Standard JPA API → OpenJPA with Slice → JDBC API → 4x100 million records]
– No changes to Application code
– No changes to Domain Model
– No changes to Database Schema
– User-defined Data Distribution Policy
– User-defined Query Target Policy
– Flexible per-Slice Configuration
– Parallel Query Execution
– Targeted Query
– Heterogeneous Databases
– Master-based Sequence
Agenda
Core Features of Slice
Using Slice
Under the hood
Running on Slice
Using Slice
Decide partition policy to
– distribute data
– target query
Configure JPA persistence unit
No change to
– Application Code (ok, almost!)
– Domain Model
– Database Schema
Policy based configuration
The design goal of "no application code change" conflicts with the user application's ability to control
– which slice(s) will store a new instance
– which slice(s) will be searched for a query
The compromise solution is policy-based callback interfaces
– DistributionPolicy
– ReplicationPolicy
– QueryTargetPolicy
– FinderTargetPolicy
User application may implement these policies
Slice runtime calls the policy methods
Data Distribution Policy determines where new records are stored
Which slice stores a new data record?
– Only the user application can decide
How to distribute data across slices?
User Application

EntityManager em = …;
em.getTransaction().begin();
Person person = new Person();
person.setName("John");
person.setAge(42);
Address addr = new Address();
addr.setCity("New York");
person.setAddress(addr);
em.persist(person);
em.getTransaction().commit();

Data Distribution Policy

public class MyDistributionPolicy implements DistributionPolicy {
    public String distribute(Object pc, List<String> slices, Object ctx) {
        // pick a slice by age bracket: 0-19 -> first slice, 20-39 -> second, ...
        return slices.get(((Person)pc).getAge()/20);
    }
}

Domain Classes

@Entity
public class Person {
    private String name;
    private int age;
    @OneToOne(cascade=ALL)
    private Address address;
}

@Entity
public class Address {
    private String city;
}
Distribution Policy decides target slice for each instance
public interface DistributionPolicy {
    /**
     * Decide the name of the slice where the given persistent
     * instance would be stored.
     *
     * @param pc The newly persistent or to-be-merged object.
     * @param slices name of the configured slices.
     * @param context persistence context managing the given instance.
     *
     * @return identifier of the slice. This name must match one of the
     * configured slice names.
     * @see DistributedConfiguration#getSliceNames()
     */
    String distribute(Object pc, List<String> slices, Object context);
}
Slice runtime will call this method while persisting or merging a root instance. The instance and its persistent closure will be stored in the returned slice.
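As an illustration of the callback, here is a minimal sketch of a policy that places customers by region, one of the natural partitioning scenarios named earlier. The Customer class and the RegionDistributionPolicy name are hypothetical. The DistributionPolicy interface is redeclared locally only so that the sketch compiles without the OpenJPA jars; a real application would presumably implement the interface shipped with Slice instead.

```java
import java.util.List;

// Local stand-in mirroring the interface shown above (an assumption made
// so that this sketch is self-contained).
interface DistributionPolicy {
    String distribute(Object pc, List<String> slices, Object context);
}

// Hypothetical domain class used only for illustration.
final class Customer {
    private final String region;
    Customer(String region) { this.region = region; }
    String getRegion() { return region; }
}

// Picks a slice from the customer's region so that all customers of one
// region land in the same slice.
final class RegionDistributionPolicy implements DistributionPolicy {
    @Override
    public String distribute(Object pc, List<String> slices, Object context) {
        if (!(pc instanceof Customer)) {
            throw new IllegalArgumentException("No policy for " + pc);
        }
        String region = ((Customer) pc).getRegion();
        // floorMod keeps the index non-negative even for negative hash codes
        int index = Math.floorMod(region.hashCode(), slices.size());
        return slices.get(index);
    }
}
```

The returned name is always one of the configured slice names, as the contract requires. Note that a hash-based choice changes if the number of slices changes, so it suits a fixed slice list.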
Details on Distribution Policy
User Application:
em.persist(a); // or em.merge(a);

Slice runtime calls the configured policy:
MyPolicy.distribute(a,…) { return "One"; }

<property name="openjpa.slice.DistributionPolicy" value="acme.org.MyPolicy"/>

Slice attaches a moniker to each managed instance as it enters a persistence context
[Diagram: instance a and the instances b and c reachable from it through CascadeType.PERSIST or CascadeType.MERGE relations are all tagged with the moniker One]
Slice enforces Collocation Constraint on persistent closure
Persistent Closure of a managed instance x is
– The set of instances Cx = {y} where y is reachable from x by traversal of a relation cascaded as PERSIST or MERGE to an unlimited depth
All instances in the persistent closure Cx, at the time of persist() or merge(), are stored in the same slice
– Because Slice cannot join across databases
Compliant domain models are referred to as Constrained Tree Schema
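The persistent closure is a graph reachability problem. The sketch below computes it over a toy model in which instances are plain strings and cascaded relations are given as an adjacency map; both are simplifying assumptions for illustration and do not reflect how Slice represents managed instances internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a real runtime would walk entity relations whose
// cascade type is PERSIST or MERGE instead of a prepared map.
final class PersistentClosure {
    private PersistentClosure() {}

    /** Returns root plus every instance reachable through cascaded relations. */
    static Set<String> closureOf(String root, Map<String, List<String>> cascades) {
        Set<String> closure = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            // visit each instance once, so cyclic relations terminate
            if (closure.add(current)) {
                for (String next : cascades.getOrDefault(current, List.of())) {
                    pending.push(next);
                }
            }
        }
        return closure;
    }
}
```

Every member of the returned set would have to be stored in the slice chosen for the root instance.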
An example domain model
Instances that violate persistent closure must be replicated
Data partitioned by Stock sectors
– slice.One: Stock CSCO, Ask-152, Bid-153, Trader-1, Trader-2, Trade-15
– slice.Two: Stock GS, Ask-210, Bid-211, Trader-7, Trader-1, Trade-21
Trader-1 appears in both slices, so it violates the collocation constraint and must be replicated across all slices.
Replicate master/shared data across slices
Enumerate replicated types in configuration
<property name="openjpa.slice.ReplicatedTypes" value="domain.Trader"/>
By default, replicated entities are stored in all slices
– or implement ReplicationPolicy
Data Replication Policy
public class DefaultReplicationPolicy implements ReplicationPolicy {
    public String[] replicate(Object pc, List<String> slices, Object ctx) {
        // List.toArray() returns Object[], so pass a typed array
        return slices.toArray(new String[0]);
    }
}
Replication Policy
public interface ReplicationPolicy {
    /**
     * Decide the name of the slices where the given persistent
     * instance would be replicated.
     *
     * @param pc The newly persistent or to-be-merged object.
     * @param slices name of the configured slices.
     * @param context persistence context managing the given instance.
     *
     * @return identifier(s) of the slice. Each name must match one of the
     * configured slice names.
     * @see DistributedConfiguration#getSliceNames()
     */
    String[] replicate(Object pc, List<String> slices, Object context);
}
Slice runtime will call this method while persisting any replicated instance.
Distributed Query
Each query is executed across target slices in parallel
Performance upper bound is the size of the largest partition, not the size of the entire dataset
– The query will execute on 100 million instead of 400 million records
Slice attaches a moniker to each selected instance based on its origin
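The parallel fan-out can be pictured with standard java.util.concurrent classes. This is a sketch under stated assumptions, not the Slice implementation: each Callable stands in for the execution of the query against one slice, and the class name ParallelSliceQuery is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: one task per target slice, one thread per task.
final class ParallelSliceQuery {
    private ParallelSliceQuery() {}

    /** Runs every per-slice query concurrently and appends results in slice order. */
    static <T> List<T> executeAll(List<Callable<List<T>>> sliceQueries) {
        List<T> merged = new ArrayList<>();
        if (sliceQueries.isEmpty()) {
            return merged; // a fixed thread pool cannot be created with zero threads
        }
        ExecutorService pool = Executors.newFixedThreadPool(sliceQueries.size());
        try {
            // invokeAll waits until every slice has finished
            List<Future<List<T>>> futures = pool.invokeAll(sliceQueries);
            for (Future<List<T>> future : futures) {
                merged.addAll(future.get()); // get() rethrows a failure of that slice
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while querying slices", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a slice query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the slices run concurrently, the elapsed time is roughly that of the slowest slice, which is why the largest partition bounds performance.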
Query Target Policy decides target slice for each query
public interface QueryTargetPolicy {
    /**
     * Decide the name of the slice(s) where the given query
     * will be executed.
     *
     * @param query The query string to be executed.
     * @param params the bound parameters of the query.
     * @param language the language of the query
     * @param slices name of the configured slices
     * @param context persistence context executing the query.
     *
     * @return identifier of the target slices. Null value implies
     * all configured slices.
     */
    String[] getTargets(String query, Map params, String language,
                        List<String> slices, Object context);
}
Slice runtime will call this method for every query.
Default policy targets all available slices.
Intrusive way to control target slices for a query
EntityManager em = …;
String jpql = "SELECT p FROM Person p WHERE p.name=:name";

TypedQuery<Person> query1 = em.createQuery(jpql, Person.class);
// Set a single slice as query target
query1.setHint(SlicePersistence.HINT_TARGET, "One");
List<Person> result1 = query1.setParameter("name", "XYZ")
                             .getResultList();

TypedQuery<Person> query2 = em.createQuery(jpql, Person.class);
// Set multiple slices as query targets
query2.setHint(SlicePersistence.HINT_TARGET, Arrays.asList(new String[]{"One", "Two"}));
List<Person> result2 = query2.setParameter("name", "ABC")
                             .getResultList();
Distributed Query results are appended
Results from individual slices are appended

Data held by the three slices (one table per slice):
NAME   AGE  JOIN_YEAR    NAME   AGE  JOIN_YEAR    NAME    AGE  JOIN_YEAR
ROB    22   2008         HARI   31   2002         JOHN    35   2001
LEUNG  37   2005         SHIVA  35   1999         MARY    24   2007
BILL   29   2001         JOSE   41   1987         SANDRA  43   1975

String jpql = "select e from Employee e where e.age < 30";
List<Employee> result = em.createQuery(jpql, Employee.class).getResultList();

Appended result:
NAME   AGE  JOIN_YEAR
MARY   24   2007
BILL   29   2001
ROB    22   2008
Distributed Query results are sorted in-memory
Results from individual slices are sorted in memory across target slices for ORDER BY queries

Data held by the three slices (one table per slice):
NAME   AGE  JOIN_YEAR    NAME   AGE  JOIN_YEAR    NAME    AGE  JOIN_YEAR
ROB    22   2008         HARI   31   2002         JOHN    35   2001
LEUNG  37   2005         SHIVA  35   1999         MARY    24   2007
BILL   29   2001         JOSE   41   1987         SANDRA  43   1975

String jpql = "select e from Employee e where e.age < 30 order by e.name";
List<Employee> result = em.createQuery(jpql, Employee.class).getResultList();

Rows selected from the slices: MARY 24 2007, BILL 29 2001, ROB 22 2008
Result after in-memory sort:
NAME   AGE  JOIN_YEAR
BILL   29   2001
MARY   24   2007
ROB    22   2008
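One way to picture the in-memory ordering step is a k-way merge: if every slice returns its own rows already sorted by the ORDER BY key, a priority queue over the head of each list yields a globally sorted result. The sketch below is an assumption about how such a merge could be written, not a description of the Slice internals, and the class name OrderedMerge is made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative k-way merge of per-slice results.
final class OrderedMerge {
    private OrderedMerge() {}

    // One cursor per slice: the current head row plus the rest of that slice's rows.
    private static final class Cursor<T> {
        T head;
        final Iterator<T> rest;
        Cursor(Iterator<T> rest) {
            this.rest = rest;
            this.head = rest.next();
        }
    }

    /** Merges lists that are each already sorted by order into one sorted list. */
    static <T> List<T> merge(List<List<T>> sortedSlices, Comparator<? super T> order) {
        Comparator<Cursor<T>> byHead = (a, b) -> order.compare(a.head, b.head);
        PriorityQueue<Cursor<T>> queue = new PriorityQueue<>(byHead);
        for (List<T> slice : sortedSlices) {
            if (!slice.isEmpty()) {
                queue.add(new Cursor<>(slice.iterator()));
            }
        }
        List<T> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor<T> smallest = queue.poll();
            merged.add(smallest.head);
            if (smallest.rest.hasNext()) {
                smallest.head = smallest.rest.next();
                queue.add(smallest); // re-insert with its next row as the new head
            }
        }
        return merged;
    }
}
```

With the employee names from the example, merging the per-slice lists (BILL, ROB), an empty list and (MARY) by natural order gives BILL, MARY, ROB.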
Distributed Top-N Query
Top-N results from each slice are merged (with ordering, if any) for queries with a result limit

Data held by the three slices (one table per slice):
NAME   AGE  JOIN_YEAR    NAME   AGE  JOIN_YEAR    NAME    AGE  JOIN_YEAR
ROB    22   2008         HARI   31   2002         JOHN    35   2001
LEUNG  37   2005         SHIVA  35   1999         MARY    24   2007
BILL   29   2001         JOSE   41   1987         SANDRA  43   1975

String jpql = "select e from Employee e order by e.age";
List<Employee> result = em.createQuery(jpql, Employee.class)
                          .setMaxResults(2).getResultList();

Top 2 from each slice:
ROB 22 2008, BILL 29 2001
HARI 31 2002, SHIVA 35 1999
MARY 24 2007, JOHN 35 2001
Merged top 2:
ROB 22 2008
MARY 24 2007
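The top-N merge works because the global top N must be contained in the union of the per-slice top N lists. A minimal sketch, with the hypothetical class name TopNMerge and ages standing in for whole rows:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: combines what each slice returned for its own top N.
final class TopNMerge {
    private TopNMerge() {}

    /** Combines per-slice top-N lists and keeps the overall first n rows under order. */
    static <T> List<T> topN(List<List<T>> perSliceTopN, Comparator<? super T> order, int n) {
        List<T> candidates = new ArrayList<>();
        for (List<T> slice : perSliceTopN) {
            candidates.addAll(slice);
        }
        candidates.sort(order); // List.sort is stable, so ties keep slice order
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }
}
```

For the example above, the per-slice candidates by age are (22, 29), (31, 35) and (24, 35), and the merged top 2 are 22 and 24, that is ROB and MARY.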
Distributed Top-N Query
Top-N results from individual slices are appended for queries with a result limit but without an ORDER BY clause.

Data held by the three slices (one table per slice):
NAME   AGE  JOIN_YEAR    NAME   AGE  JOIN_YEAR    NAME    AGE  JOIN_YEAR
ROB    22   2008         HARI   31   2002         JOHN    35   2001
LEUNG  37   2005         SHIVA  35   1999         MARY    24   2007
BILL   29   2001         JOSE   41   1987         SANDRA  43   1975

List result = em.createQuery("select e from Employee e")
                .setMaxResults(2).getResultList();

Rows returned by the slices:
ROB 22 2008, BILL 29 2001
HARI 31 2002, SHIVA 35 1999
MARY 24 2007, JOHN 35 2001
Result:
ROB 22 2008
MARY 24 2007
Targeted Query
Query and find() can be targeted to a subset of slices by hints

Data held by the three slices (one table per slice):
NAME   AGE  JOIN_YEAR    NAME   AGE  JOIN_YEAR    NAME    AGE  JOIN_YEAR
ROB    22   2008         HARI   31   2002         JOHN    35   2001
LEUNG  37   2005         SHIVA  35   1999         MARY    24   2007
BILL   29   2001         JOSE   41   1987         SANDRA  43   1975

List result = em.createQuery("SELECT e FROM Employee e WHERE e.age > 34")
                .setHint("openjpa.slice.Targets", "slice1,slice3")
                .getResultList();

Result (only slice1 and slice3 are searched):
SANDRA 43 1975
JOHN 35 2001
JOSE 41 1987
SHIVA 35 1999
Aggregate Query
Aggregate results are supported when the aggregate operation is commutative to partition

Data held by the three slices (one table per slice):
NAME   AGE  JOIN_YEAR    NAME   AGE  JOIN_YEAR    NAME    AGE  JOIN_YEAR
ROB    22   2008         HARI   31   2002         JOHN    35   2001
LEUNG  37   2005         SHIVA  35   1999         MARY    24   2007
BILL   29   2001         JOSE   41   1987         SANDRA  43   1975

Number sum = em.createQuery("select sum(e.age) from Employee e where e.age > 30",
                            Number.class).getSingleResult();

Per-slice sums: 78, 37, 107
Merged result: 78 + 37 + 107 = 222
Distributed Aggregate Query Limitations
Commutativity
– ability to change the order of operations without changing the end result
SUM() or MAX() is commutative to partition
– SUM(D) = SUM(SUM(D1), SUM(D2), SUM(D3)) where Partition(D) = {D1, D2, D3}
But AVG() is not
– AVG(D) != AVG(AVG(D1), AVG(D2), AVG(D3))
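The difference is easy to reproduce with plain arithmetic. The sketch below uses made-up class and method names and assumes every partition is non-empty. It also shows a general workaround, carrying SUM and COUNT per partition and dividing once; that is a common technique and not something these slides claim Slice does.

```java
import java.util.List;

// Illustrative arithmetic over partitions of integer values.
final class PartitionedAggregates {
    private PartitionedAggregates() {}

    static long sum(List<Integer> values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** SUM over partitions: summing the per-partition sums gives the global sum. */
    static long sumOfSums(List<List<Integer>> partitions) {
        long total = 0;
        for (List<Integer> p : partitions) {
            total += sum(p);
        }
        return total;
    }

    /** Naive AVG: averages the per-partition averages; wrong for unequal sizes. */
    static double averageOfAverages(List<List<Integer>> partitions) {
        double total = 0;
        for (List<Integer> p : partitions) {
            total += (double) sum(p) / p.size();
        }
        return total / partitions.size();
    }

    /** Correct AVG: combine SUM and COUNT across partitions and divide once. */
    static double averageFromSumAndCount(List<List<Integer>> partitions) {
        long count = 0;
        for (List<Integer> p : partitions) {
            count += p.size();
        }
        return (double) sumOfSums(partitions) / count;
    }
}
```

For partitions (23, 38, 29), (31, 35) and (35, 24, 43), the sum of sums is 258 and the true average is 258 / 8 = 32.25, while the average of the per-partition averages 30, 33 and 34 is about 32.33.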
Aggregate Query
Aggregate results are not supported when the aggregate operation is not commutative to partition

Data held by the three slices (one table per slice):
NAME   AGE  JOIN_YEAR    NAME   AGE  JOIN_YEAR    NAME    AGE  JOIN_YEAR
ROB    23   2008         HARI   31   2002         JOHN    35   2001
LEUNG  38   2005         SHIVA  35   1999         MARY    24   2007
BILL   29   2001                                  SANDRA  43   1975

Number avg = em.createQuery("select avg(e.age) from Employee e", Number.class)
               .getSingleResult();

Per-slice averages: 30.0, 33.0, 34.0
[30.0 + 33.0 + 34.0] / 3 = 32.33, but the average over all 8 rows is 258 / 8 = 32.25
Wrong!
Query for Replicated Entities
Replicated instances are detected and queried in a single slice

Number count = (Number)em.createQuery("SELECT COUNT(c) FROM Country c")
                         .getSingleResult();

Each of slice1, slice2 and slice3 holds the same replicated rows:
CODE     POPULATION
US       300M
GERMANY  82M
INDIA    1200M

Result: 3
META-INF/persistence.xml configures a persistence unit

<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             version="2.0"
             xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd">
  <persistence-unit name="test" transaction-type="RESOURCE_LOCAL">
    <provider>org.apache.openjpa.persistence.PersistenceProviderImpl</provider>
    <class>domain.EntityA</class>
    <class>domain.EntityB</class>
    <properties>
      <property name="openjpa.ConnectionDriverName" value="com.mysql.jdbc.Driver"/>
      <property name="openjpa.ConnectionURL" value="jdbc:mysql://localhost/test"/>
      <property name="openjpa.jdbc.SynchronizeMappings" value="buildSchema"/>
      <property name="openjpa.Log" value="SQL=TRACE"/>
    </properties>
  </persistence-unit>
</persistence>

– Governed by XML Schema
– Identified by Unit Name
– JPA Provider is pluggable
– List of known Persistent types
– Vendor-specific configuration
Activate Slice through configuration
<property name="openjpa.BrokerFactory" value="slice"/>
• Mandatory configuration
• Activates a specialized EntityManagerFactory
Each slice is referred by a moniker
<property name="openjpa.slice.Names" value="One,Two,Three"/>
• Optional (but recommended) configuration
• Associates mnemonics to physical slices
Identify a Master slice
<property name="openjpa.slice.Master" value="One"/>
• Optional (but recommended) configuration
• Identifies a master slice for identity generation
Specify physical slice connection details
<property name="openjpa.slice.One.ConnectionURL" value="jdbc:mysql://localhost/slice1"/>
<property name="openjpa.slice.Two.ConnectionURL" value="jdbc:mysql://localhost/slice2"/>
• Mandatory configuration
• Specifies physical connection for each slice
• Property name is prefixed by the slice moniker (here One and Two)
Slices can share common properties
<property name="openjpa.slice.Names" value="One,Two,Three"/>
<property name="openjpa.ConnectionDriverName" value="com.mysql.jdbc.Driver"/>
<property name="openjpa.slice.Three.ConnectionDriverName" value="com.ibm.db2.jcc.DB2Driver"/>
Properties can be shared
• unless overridden for a specific slice
Ignoring unavailable slices
<property name="openjpa.slice.Lenient" value="true"/>
• Optional configuration
• Ignores any unreachable slice
Configuration Rules
Each slice is identified by a moniker
All monikers should be explicitly declared in openjpa.slice.Names
– Though implicit declaration is allowed
  • openjpa.slice.XYZ.abc declares a slice with moniker XYZ
A master slice is either configured by the openjpa.slice.Master property
– Or automatically detected by convention/heuristic as the first slice
Each slice must be configured with a database URL
Configuration Rules (continued)
Other properties can be shared
Each slice property defaults to common configuration
– If openjpa.slice.XYZ.abc is unspecified, then abc defaults to the openjpa.abc property
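The defaulting rule amounts to a two-step lookup. The following sketch models the configuration as a plain Map; the class name SliceProperties and the resolve method are hypothetical and only illustrate the rule, they are not the OpenJPA configuration API.

```java
import java.util.Map;

// Illustrative lookup that mirrors the defaulting rule for slice properties.
final class SliceProperties {
    private SliceProperties() {}

    /**
     * Looks up openjpa.slice.<slice>.<key> and falls back to openjpa.<key>.
     * Returns null if neither property is set.
     */
    static String resolve(Map<String, String> props, String slice, String key) {
        String specific = props.get("openjpa.slice." + slice + "." + key);
        return specific != null ? specific : props.get("openjpa." + key);
    }
}
```

With the shared MySQL driver and the DB2 override for slice Three shown earlier, resolving ConnectionDriverName for One yields the shared driver and for Three yields the DB2 driver.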
A complete example of Slice Configuration
META-INF/persistence.xml

<properties>
  <!-- Activate Slice -->
  <property name="openjpa.BrokerFactory" value="slice"/>

  <!-- Declare slices -->
  <property name="openjpa.slice.Names" value="One,Two,Three"/>
  <property name="openjpa.slice.Master" value="One"/>

  <!-- Configure each slice -->
  <property name="openjpa.ConnectionDriverName" value="com.mysql.jdbc.Driver"/>
  <property name="openjpa.slice.One.ConnectionURL" value="jdbc:mysql://mac1:3456/slice1"/>
  <property name="openjpa.slice.Two.ConnectionURL" value="jdbc:mysql://mac2:5634/slice2"/>
  <property name="openjpa.slice.Three.ConnectionDriverName" value="com.ibm.db2.jcc.DB2Driver"/>
  <property name="openjpa.slice.Three.ConnectionURL" value="jdbc:db2://mac3:50000/slice3"/>

  <!-- Define Data Distribution Policy -->
  <property name="openjpa.slice.DistributionPolicy" value="acme.org.MyDistroPolicy"/>

  <!-- Configure common behavior -->
  <property name="openjpa.jdbc.SynchronizeMappings" value="buildSchema"/>
</properties>
</persistence-unit>
Updates
Slice remembers the original slice of each instance
– SlicePersistence.getSlice(Object pc) returns the logical slice name for the given argument
If an instance is modified then the update occurs in the original slice
Replicated instances are updated in many slices
– SlicePersistence.isReplicated(Object pc)
Commit will not be invoked for a slice if no update exists for that slice
Database and Transaction
Slices can be in heterogeneous database platforms
– Each slice can use its own JDBC driver
A pseudo (weaker) 2-phase commit protocol
Agenda
Core Features of Slice
Using Slice
Under the hood
Running on Slice
Core Architectural constructs of OpenJPA
[Diagram: three layers]
– facade: EntityManagerFactory creates EntityManager
– kernel: BrokerFactory creates Broker; EntityManagerFactory delegates to BrokerFactory and EntityManager delegates to Broker; configured by OpenJPAConfiguration; POJO + State manager
– storage: StoreManager, JDBCStoreManager, JDBC API
Slice extends OpenJPA by Distributed Template
[Diagram: the same three layers with Slice]
– facade: EntityManagerFactory, EntityManager
– kernel: BrokerFactory, Broker; POJO + State manager + Slice Moniker
– storage: DistributedStoreManager applies the Distributed Template Pattern over several JDBCStoreManager instances, each on the JDBC API
– DistributedConfiguration applies the Distributed Template Pattern over one OpenJPAConfiguration per slice
– Not aware of partitioned databases
Distributed Template Design Pattern
// The slide writes "implements T" for brevity, which is not legal Java;
// a concrete delegate type (here a hypothetical Executable) makes it compile.
interface Executable {
    boolean execute(String arg0);
}

public class DistributedTemplate<T extends Executable> implements Executable, Iterable<T> {
    protected List<T> _delegates = new ArrayList<T>();

    public void add(T t) { _delegates.add(t); }

    public Iterator<T> iterator() { return _delegates.iterator(); }

    // execution requires operation-specific merge semantics
    public boolean execute(String arg0) {
        boolean ret = true;
        for (T t : this)
            ret = t.execute(arg0) & ret; // merge execution result
        return ret;
    }
}
• Similar to Composite
Slice applies Distributed Template Design Pattern on OpenJPA/JDBC
• Distributed Template Design Pattern as main metaphor
• applied on JDBC artifacts (Statement, ResultSet)
• and on major OpenJPA artifacts such as StoreManager and Query
Agenda
Core Features of Slice
Using Slice
Under the hood
Running on Slice
OpenTrader : OpenJPA/Slice and GWT
An example data distribution policy
/**
 * This distribution policy determines the sector of the stock and
 * picks the slice at ordinal index of the enumerated Sector.
 */
public class SectorDistributionPolicy implements DistributionPolicy {
    public String distribute(Object pc, List<String> slices, Object context) {
        Stock stock = null;
        if (pc instanceof Tradable) {
            stock = ((Tradable)pc).getStock();
        } else if (pc instanceof Stock) {
            stock = (Stock)pc;
        } else if (pc instanceof Trade) {
            stock = ((Trade)pc).getStock();
        } else {
            throw new IllegalArgumentException("No policy for " + pc);
        }
        return stock != null ? slices.get(stock.getSector().ordinal()) : null;
    }
}
An example query target policy
public static final String MATCH_BID =
      "select new Match(a,b) from Ask a, Bid b "
    + "where b = :bid and a.stock.symbol = b.stock.symbol "
    + "and a.price <= b.price and a.volume >= b.volume "
    + "and NOT(a.seller = b.buyer) "
    + "and a.trade is NULL and b.trade is NULL";

public class SectorBasedQueryTargetPolicy implements QueryTargetPolicy {
    public String[] getTargets(String query, Map<Object, Object> params,
                               String language, List<String> slices, Object context) {
        Stock stock = null;
        if (TradingService.MATCH_BID.equals(query)) {
            stock = ((Tradable)params.get("bid")).getStock();
            return new String[]{slices.get(stock.getSector().ordinal())};
        }
        return null; // null targets all configured slices
    }
}
Future Work
Support a wider notion of heterogeneity
– Different mappings to different databases
– Mixing data storage technologies
Future Work
Dynamic reconfiguration
– Adding slices
– Removing slices
– Availability/Consistency debate
Detection of unsupported queries
Stronger transaction guarantees
References
Slice Documentation
http://openjpa.apache.org/builds/latest/docs/manual/manual.html#ref_guide_slice
Article on Slice
http://www.ibm.com/developerworks/java/library/os-openjpa/index.html?ca=drs-
OpenTrader: a case-study on Slice + GWT
http://openjpa.apache.org/samples/opentrader
svn co https://svn.apache.org/repos/asf/openjpa/trunk/openjpa-examples/opentrader
Thank You!