Managing “Big Data” Application Complexity using CloudGraph®
Scott Cinnamond, TerraMeta Software Inc., http://cloudgraph.org
Analysis and solutions for problems faced by HBase™ and other columnar data store client applications under the ever-increasing demand for domain model complexity
Complexity Increases With Added Data Model Entities
[Figure: Complexity (for columnar data store client applications) plotted against # Model Entities / Classes]
Why More App Complexity? (with Added Data Model Entities)
1. Column Mapping Is Difficult
2. Composite Row Key Mapping, Hashing, Salting and Formatting
3. Persistence Code Development, Refactoring and Maintenance
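Point 2 can be made concrete with a small sketch. The Java below builds a composite row key from two hypothetical fields (a tenant name and a numeric entity ID), zero-pads the ID so byte order matches numeric order, and prefixes a salt computed from a hash of the key so that sequential IDs spread across a fixed number of buckets. The class name, field layout, "|" delimiter and bucket count are all assumptions for illustration; this is not CloudGraph's key format.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical composite row key builder (not the CloudGraph API).
final class SaltedRowKey {
    static final int SALT_BUCKETS = 16; // assumed number of pre-split regions
    static final char DELIMITER = '|';  // assumed field separator

    // Builds "<2-digit salt>|<tenant>|<19-digit zero-padded id>" as UTF-8 bytes.
    // Assumes a non-negative entityId.
    static byte[] build(String tenant, long entityId) {
        // Fixed width (Long.MAX_VALUE has 19 digits) keeps lexicographic
        // byte order consistent with numeric order.
        String paddedId = String.format("%019d", entityId);
        String natural = tenant + DELIMITER + paddedId;
        // Deterministic salt: hash of the natural key mapped into [0, SALT_BUCKETS).
        int salt = Math.floorMod(natural.hashCode(), SALT_BUCKETS);
        String key = String.format("%02d", salt) + DELIMITER + natural;
        return key.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(new String(build("acme", 42L), StandardCharsets.UTF_8));
    }
}
```

The trade-off is typical of salting: writes for consecutive IDs no longer hit a single region, but a scan over one tenant's rows must now fan out across all salt buckets. Multiplied across hundreds of entity types, each with its own key layout, this is the kind of hand-written logic the slide is pointing at.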
Typical Column Mapping Strategies
• Hard-Coded Names Embedded in Source Code
  – Not good
• Column Names in Java Constants File(s)
  – Better, but still effectively hard-coded
  – Feasible with 5-10 entities and 50 attributes
  – With 500-1000 entities and 5000+ attributes? Not maintainable
• Custom XML Configuration
  – Create a “meta model” using, say, XML Schema and JAXB
  – Construct unique names and refer to them in source
  – Better, but an application-specific “one-off”
  – Does not solve “state” management challenges
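To see why the constants-file strategy stops scaling, here is a minimal sketch of it. The entity, column family and qualifiers are hypothetical, and a `Map` stands in for a real HBase write so the example runs without any client library.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical constants file: one hand-maintained entry per column.
final class PersonColumns {
    static final byte[] FAMILY = bytes("p");
    static final byte[] FIRST_NAME = bytes("fn");
    static final byte[] LAST_NAME = bytes("ln");
    static final byte[] BIRTH_DATE = bytes("bd");

    private PersonColumns() {}

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}

// Hand-written mapping code. Every attribute needs its own line here, so
// adding, renaming or removing an attribute means editing both classes.
final class PersonMapper {
    static Map<String, String> toColumns(String first, String last, String birthDate) {
        Map<String, String> row = new LinkedHashMap<>(); // stand-in for a real write
        row.put(qualifier(PersonColumns.FIRST_NAME), first);
        row.put(qualifier(PersonColumns.LAST_NAME), last);
        row.put(qualifier(PersonColumns.BIRTH_DATE), birthDate);
        return row;
    }

    // Renders "family:qualifier" for readability.
    private static String qualifier(byte[] q) {
        return new String(PersonColumns.FAMILY, StandardCharsets.UTF_8)
                + ":" + new String(q, StandardCharsets.UTF_8);
    }
}
```

For example, `PersonMapper.toColumns("Ada", "Lovelace", "1815-12-10")` yields `{p:fn=Ada, p:ln=Lovelace, p:bd=1815-12-10}`. Three attributes take three constants and three mapping lines; at 5000+ attributes the same pattern means thousands of names that must be kept in sync by hand.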
CloudGraph Column Mapping: A Standards-Based Approach Using SDO and UML
[Diagram components: UML Name “Aliases”; SDO Metadata “Repository”; Data Graph “State”; CloudGraph Stateful Column Key Factories; Business Names; Logical Names (readable); Physical Names (terse); Java byte[] accessors; Caching; Object Pooling; Sequence Management; Entity ID Mapping; Row Key Mapping; Marshalling]
Great, But How Do We Keep Column Names Entirely Out of CRUD Source Code?
• Create | Update | Delete: CloudGraph SDO API (Service Data Objects)
• Read (Query): CloudGraph Query DSL (Domain Specific Language)
CloudGraph SDO: your complex domain model as a (create | update | delete) API
• Drives All Column Mapping Transparently
• Granular Control over Data Graph Edits
• Convenient “Create Entity” Factory Methods
• Change Tracking Including History
• Rich Built-In Data Types
• 100% Compile-Time Checking
• Supports Multiple Inheritance Models
• Currently Uses PlasmaSDO™
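Two of the features listed above, compile-time checking and change tracking, can be illustrated without any SDO library. The sketch below hand-writes what a class generated from a UML model might look like: typed accessors instead of string column names, plus a log of old and new values per edit. `TrackedEntity`, `Change` and `Person` are hypothetical names; this is not the PlasmaSDO or SDO `ChangeSummary` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical change record: which property changed, from what, to what.
record Change(String property, Object oldValue, Object newValue) {}

// Hypothetical base class that records every effective property edit, in order.
abstract class TrackedEntity {
    private final List<Change> changes = new ArrayList<>();

    // Records the edit only if the value actually changes, then returns the new value.
    protected <T> T track(String property, T oldValue, T newValue) {
        if (!Objects.equals(oldValue, newValue)) {
            changes.add(new Change(property, oldValue, newValue));
        }
        return newValue;
    }

    List<Change> changes() {
        return List.copyOf(changes);
    }
}

// Stand-in for a generated entity class: a misspelled property or a wrong
// value type is a compile error, and no column name appears in caller code.
final class Person extends TrackedEntity {
    private String lastName;
    private int age;

    String getLastName() { return lastName; }
    void setLastName(String v) { lastName = track("lastName", lastName, v); }

    int getAge() { return age; }
    void setAge(int v) { age = track("age", age, v); }
}
```

A persistence layer could walk `changes()` to emit only the modified columns, which is the kind of bookkeeping the slide says CloudGraph drives transparently from the model.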
Why More Complexity? 3) Persistence Code Development, Refactoring and Maintenance
* Example: UML model converted from the BIOXSD XML Schema; see http://bioxsd.org/
** Example: UML adaptation of the HL7 POCD/HD000040 Clinical Document
*** Example: UML model converted from the Chemical Markup Language 2.4 XML Schema; see http://xml-cml.org
Small Domain Model (e.g. CML, 164 Entities): 95,000 Lines
“Average” Custom Domain Model (e.g. 300 Entities): 174,000 Lines
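Dividing each line count by its entity count shows that the two examples imply almost exactly the same cost per entity:

```latex
\frac{95{,}000\ \text{lines}}{164\ \text{entities}} \approx 579\ \text{lines/entity},
\qquad
\frac{174{,}000\ \text{lines}}{300\ \text{entities}} = 580\ \text{lines/entity}
```

If that ratio held for larger models (an assumption, since only two data points are given), a 1,000-entity model would imply on the order of 580,000 lines of code to write, refactor and maintain.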
• Project Status
  – CloudGraph® is currently in private beta testing
  – Other services for Cassandra, MongoDB and other data stores are under analysis
  – See http://cloudgraph.org for contact info and other details
• Licensing
  – CloudGraph® 0.5.5 Community Edition (CE) is open source, licensed under version 2 of the GNU General Public License
• Trademarks
  – CloudGraph® is a registered trademark of TerraMeta Software LLC
  – Java™ is a trademark of Oracle Corporation
  – HBase™ is a trademark of the Apache Software Foundation