© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Enterprise NoSQL Converging Analysis and Operations
Ken Krupa, Enterprise CTO, MarkLogic
A Brief History: Database Duality
[Timeline: ~1990 Star Schema EDW mainstream; ~2000 WWW mainstream (1st peak); ~2010 Big Data mainstream; ~2013 NoSQL mainstream. Curves: Analytical Specialization, Operational Specialization; between them: Analysis / Operations Gap]
Specialization between analysis and operations
Accelerated by disruptive IT shifts (e.g. Internet, Hadoop)
Need for greater convergence
Our world is changing… and heterogeneous data is a problem
[Diagram: OLTP, Warehouse, Data Marts, Reference Data, Archives, ?]
Data volume: 4.4 ZB (2013) to 44 ZB (2020); 12% structured, 88% unstructured (Source: IDC)
Accelerating the divide?
THE DATA WAREHOUSES
Traditional Enterprise Data Warehouse (RDBMS)
Pre-defined schemas
Complex ETL processes
Changes depend on SDLC
EDW Definition: “A subject oriented, nonvolatile,
integrated, time variant collection of data in support of
management's decisions” – Bill Inmon
Star Schema Modeling - Waterfall
1. Choose the business process
2. Declare the grain
3. Identify the dimensions
4. Identify the fact
Source: http://en.wikipedia.org/wiki/Dimensional_modeling
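The four modeling steps above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the retail sales process, the tables (dim_date, dim_product, fact_sales) and their columns are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical retail example; all table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 3: dimensions give descriptive context to each fact row
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, day TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    -- Steps 1, 2, 4: process = sales, grain = one row per product per day,
    -- facts = units_sold and revenue
    CREATE TABLE fact_sales (
        date_id    INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        units_sold INTEGER,
        revenue    REAL
    );
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                 [(1, "2015-01-05"), (2, "2015-01-06")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(10, "widget"), (11, "gadget")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 10, 3, 30.0), (1, 11, 1, 25.0), (2, 10, 2, 20.0)])


def revenue_by_product(connection):
    """Join the fact table to a dimension and aggregate the measure."""
    rows = connection.execute("""
        SELECT p.name, SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        GROUP BY p.name
        ORDER BY p.name
    """).fetchall()
    return dict(rows)
```

Every new question that needs a new dimension or a finer grain means changing these table definitions first, which is where the waterfall cost shows up.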
[Figure: Identify, Model, Integrate, Discover; axes: TIME, COST]
View from the Enterprise
[Diagram: OLTP, Warehouse, Data Marts, Archives, Reference Data]
“Unstructured”: Video, Audio; Signals, Logs, Streams; Social; Documents, Messages
Metadata; Search
WHAT ABOUT HADOOP
Hadoop – What You Get
Advantages
– HDFS provides scale and economies of scale
– File-based nature allows for greater Variety
– Raw data is fine and any shape will do
– Schema-on-read possible
– Map-reduce and YARN enable massive parallel scaling
Gaps
– Hadoop was designed for batch processing
– Does not support real-time applications on its own
– Requires expertise to configure, deploy and manage
– Has security limitations
– On its own, it is not a database
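Schema-on-read can be illustrated without Hadoop at all. The sketch below is plain Python, not a Hadoop API: raw records of any shape are kept as-is, and a schema is applied only when they are read. The sample records and the field names (user, name, amount) are invented for the example.

```python
import json

# Raw lines are stored untouched; each record may have a different shape.
RAW_LINES = [
    '{"user": "alice", "amount": "12.50", "ts": "2015-03-01"}',
    '{"user": "bob", "amount": 7}',
    '{"name": "carol", "note": "no amount recorded"}',
]


def read_with_schema(lines):
    """Apply a schema while reading; the stored records are never altered."""
    for line in lines:
        record = json.loads(line)
        yield {
            # Tolerate two spellings of the same field
            "user": record.get("user") or record.get("name"),
            # Coerce strings and integers to float, default missing to 0.0
            "amount": float(record.get("amount", 0.0)),
        }


rows = list(read_with_schema(RAW_LINES))
total = sum(row["amount"] for row in rows)
```

The cost moves from load time to read time: every consumer has to decide how to interpret the raw shapes, which is part of why this works well for batch jobs and less well as an operational database.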
RDBMS + Hadoop
Still a lot of ETL – RDBMS is still in the picture
Shortcomings in security and governance capabilities with Hadoop
Reliance on RDBMS for anything operational
A mismatch between analytical and operational aspects
ENTERPRISE NOSQL
Enterprise NoSQL
Flexible Data Model: Store and manage JSON, XML, RDF, and Geospatial data with a document-centric, schema-agnostic database. Pre-requisite modeling not required
Search and Query: Built-in search to find answers in documents, relationships, and metadata
Scalability and Elasticity: Scale out on commodity hardware, and also scale down
ACID Transactions: MVCC for data consistency and simultaneous read+write
Enterprise-Grade Security: Certified, granular security for modern data governance
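The idea behind MVCC (multi-version concurrency control) can be shown with a toy in-memory store. This is a teaching sketch, not MarkLogic's implementation; the class MVCCStore and its methods are hypothetical. Writers append new versions, and a reader keeps seeing the snapshot it started with.

```python
class MVCCStore:
    """Toy multi-version store: writers append versions, readers use snapshots."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_timestamp, value)
        self._clock = 0      # timestamp of the last commit

    def write(self, key, value):
        """Commit a new version; older versions stay readable."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Return the timestamp a reader should use for a consistent view."""
        return self._clock

    def read(self, key, at):
        """Return the newest value committed at or before timestamp `at`."""
        visible = [value for ts, value in self._versions.get(key, []) if ts <= at]
        return visible[-1] if visible else None


store = MVCCStore()
store.write("doc1", {"status": "draft"})
reader_view = store.snapshot()                 # a reader starts here
store.write("doc1", {"status": "published"})   # a later write does not disturb it
```

Reading "doc1" at reader_view still returns the draft, while a fresh snapshot returns the published version, so reads and writes do not block each other.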
Core Benefits of Enterprise NoSQL
A database more in line with today’s data processing problems and expectations
– Handle all types of data
– Minimize (or eliminate) ETL and data copying
– Scale out on commodity hardware and in the cloud
– Deliver results more quickly
A database that offers opportunities for operational convergence
– Handles mixed workloads (real-time and batch)
– Does not abandon enterprise capabilities – e.g. transactions and security
– Is built to integrate with the Big Data ecosystem (e.g. Hadoop and related)
More Than Just Query…
What if an analyst could talk back to the data warehouse…?
“I found something!”
Semantics
Enterprise triple store, document store, and database combined
Store and query billions of facts and relationships; infer new facts
Facts and relationships provide context for better search
Flexible data modeling: integrate and link data from different sources
Standards-based for ease of use and integration
– RDF, SPARQL, and standard REST interfaces
Semantics: A New Way to Organize Data
Data is stored in Triples, expressed as: Subject : Predicate : Object
Jean Dubois : livesIn : Paris
Paris : isIn : France
Query with SPARQL gives us simple lookup... and more!
Find people who live in (a place that's in) France
[Figure: RDF triples graph with nodes "Jean Dubois", "Paris", "France" and edges livesIn, isIn, livesIn]
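The "live in a place that's in France" question is a two-hop join over triples. The sketch below uses plain Python tuples rather than a real triple store, and the SPARQL text is only an approximation of how such a query might look; the ex: prefix and the helper people_living_in are assumptions made for the example.

```python
# Triples from the slide, as (subject, predicate, object) tuples.
TRIPLES = {
    ("Jean Dubois", "livesIn", "Paris"),
    ("Paris", "isIn", "France"),
}

# Roughly the same question in SPARQL (hypothetical prefix, not executed here).
SPARQL_SKETCH = """
PREFIX ex: <http://example.org/>
SELECT ?person WHERE {
  ?person ex:livesIn ?place .
  ?place  ex:isIn    ex:France .
}
"""


def people_living_in(triples, region):
    """Find subjects that live in any place located in `region`."""
    # First hop: places that are in the region
    places = {s for s, p, o in triples if p == "isIn" and o == region}
    # Second hop: people who live in one of those places
    return sorted(s for s, p, o in triples if p == "livesIn" and o in places)
```

No fact says that Jean Dubois lives in France; the answer falls out of joining the two stored facts.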
Asserting Facts with Semantics
Assert newly discovered information during analysis
– “I received an email about the date of birth”
Decorate with additional items of interest
– “Bob has an interest in art”
Assert relationships as they are discovered
– “Bob knows Alice”
Source: http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/
Data Provenance with Semantics
Data lineage & provenance
Utilize PROV-O, the PROV Ontology
– W3C Recommendation
– Expressed with RDF Triples
For example…
– prov:wasGeneratedBy
– prov:wasDerivedFrom
– prov:wasAttributedTo
– prov:wasAssociatedWith
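As a rough illustration of how these four PROV-O properties combine, the sketch below records provenance as plain triples and walks prov:wasDerivedFrom back to the original sources. The entities (ex:riskReport, ex:cleanedTrades, ex:rawTradeFeed), the activity ex:nightlyRun and the agent ex:bob are invented for the example; only the prov: property names come from the ontology.

```python
# Provenance facts as (subject, predicate, object) triples.
# All ex: names are hypothetical.
PROVENANCE = {
    ("ex:riskReport", "prov:wasDerivedFrom", "ex:cleanedTrades"),
    ("ex:cleanedTrades", "prov:wasDerivedFrom", "ex:rawTradeFeed"),
    ("ex:riskReport", "prov:wasGeneratedBy", "ex:nightlyRun"),      # entity -> activity
    ("ex:riskReport", "prov:wasAttributedTo", "ex:bob"),            # entity -> agent
    ("ex:nightlyRun", "prov:wasAssociatedWith", "ex:bob"),          # activity -> agent
}


def lineage(triples, entity):
    """Follow prov:wasDerivedFrom edges back to every upstream source."""
    found, frontier = [], [entity]
    while frontier:
        current = frontier.pop()
        for s, p, o in sorted(triples):
            if s == current and p == "prov:wasDerivedFrom" and o not in found:
                found.append(o)
                frontier.append(o)
    return found
```

Because lineage is stored as ordinary triples, it can be queried alongside the data it describes rather than kept in a separate audit system.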
Benefits of Enterprise NoSQL with Semantics
Make analysis conversational
– Using machine-readable standards
Provide even more modeling flexibility
– Ad-hoc facts and relationships
– Richer metadata
Further enable operational/analytical convergence
HANDLING TIME
Bi-Temporality
Audits – Preserved history
Regulation and compliance
Risk Management – Financial risk assessment models need to factor in all history
What were my customer’s credit ratings last Monday as I knew them last Friday?
A complete history (audit trail) of what you knew and when you knew it
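A bi-temporal question has two time arguments: when the fact was true (valid time) and when the database learned it (system time). The sketch below is a simplified, append-only model in plain Python, not a description of any product's implementation; the customer "acme", the ratings and the dates are invented.

```python
from datetime import date

FOREVER = date.max

# Each row: (customer, rating, valid_from, valid_to, recorded_on)
# valid_from/valid_to: when the rating applied in the real world
# recorded_on: when the database learned about it (system time)
HISTORY = [
    ("acme", "AA", date(2015, 1, 1), FOREVER, date(2015, 1, 2)),
    # Late correction recorded on 13 March: the rating was actually A from 9 March
    ("acme", "A", date(2015, 3, 9), FOREVER, date(2015, 3, 13)),
]


def rating_as_of(history, customer, valid_on, known_on):
    """Rating that applied on `valid_on`, using only what was recorded by `known_on`."""
    candidates = [
        row for row in history
        if row[0] == customer
        and row[4] <= known_on            # already recorded at that point
        and row[2] <= valid_on < row[3]   # applies on the valid date
    ]
    if not candidates:
        return None
    # Among matching rows, the most recently recorded one wins
    return max(candidates, key=lambda row: row[4])[1]
```

Asking about 9 March as known on 10 March gives "AA"; asking about the same day as known on 13 March gives "A". Neither answer overwrites the other, which is what makes the audit trail complete.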
Bi-temporality with Enterprise NoSQL
Others | Enterprise NoSQL
Other NoSQL: No transactions means no bi-temporal. | Full ACID transactions with MVCC
RDBMS: Bi-temporal at table level, composed objects make implementation complex. | Bi-temporal at the object/document level. Implementations much more straightforward
RDBMS: Bi-temporal data only. What happens when the schema changes? | For Enterprise NoSQL, schema is data.
RDBMS: Bi-temporal data only. What happens when the security changes? | Security may also be bi-temporally managed.
RDBMS: Inflexible implementations with respect to bi-temporal views and clocks. | Flexible implementations based on customer input and without compromising auditability. Capabilities such as multi-layered bi-temporality and use of external transaction clocks possible.
PUTTING THINGS TOGETHER
Enterprise NoSQL Operational Data Warehouse
Schema-agnostic
Straightforward data integration
Full-text indexing and search
Scale-out infrastructure
Real-time or batch load and analysis
Write back during discovery!
[Diagram: RT Events or Batch Load; RDF; Discover & Enrich]
Getting Noticed
BEYOND THE DATA WAREHOUSE
If only the EDW was the only problem…
SOA
Application Centric Architecture
Characterized by:
– Applications that own “their own” data
– Small pockets of “authoritative” sources (e.g. reference data, CRM)
– Data exchange between systems for cross-LoB operations
Resulting in:
– Multiple copies of the same data (even from authoritative sources)
– Diminishing data quality with each copy
But that’s not all…
– Accelerated by SOA (an otherwise good thing)
Operational Data Hub
A read & write Database supports more than analysis
Enables a Data Centric Architecture for the Enterprise
Brings all of the data-centric goodness beyond the Data Warehouse space
– React immediately to important events – e.g. alerts
– Create workflow based on analysis
– Make SOA better, redeem broken implementations
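Reacting to events at write time, rather than in a later batch, can be sketched as rules evaluated on every write. The DataHub class, the rule name "large-trade" and the threshold below are hypothetical and stand in for whatever alerting mechanism a real hub provides.

```python
class DataHub:
    """Toy read/write hub that evaluates alert rules as documents arrive."""

    def __init__(self):
        self.documents = {}
        self._rules = []   # list of (rule_name, predicate)
        self.alerts = []   # list of (rule_name, doc_id)

    def add_rule(self, name, predicate):
        """Register a predicate to run against every written document."""
        self._rules.append((name, predicate))

    def write(self, doc_id, document):
        """Store the document, then fire any matching rules immediately."""
        self.documents[doc_id] = document
        for name, predicate in self._rules:
            if predicate(document):
                self.alerts.append((name, doc_id))


hub = DataHub()
hub.add_rule("large-trade", lambda doc: doc.get("amount", 0) > 1_000_000)
hub.write("t1", {"amount": 250})
hub.write("t2", {"amount": 5_000_000})
```

Only the second write raises an alert, and it does so in the same step that stores the document, so analysis and operations share one write path.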
Operational Data Hub
[Diagram: Operational Applications; Bidirectional analysis of all data; Multi-channel distribution; Customers; Feedback]
Warm archives
Read+write real-time DW
Data-centric enterprise
Unified distribution
Direct external feedback
Makes use of Hadoop investment
Semantics plays a key role
Convergence
Platform for mixed workloads
– Simultaneous read & write during discovery
– Analytical and Operational functions within the same DB
– React immediately to important events – e.g. alerts
– Create workflow based on analysis
A Data Centric Architecture for the Enterprise
– Bring applications to the data
Bring the flexibility of “Big Data” beyond the Data Warehouse space
– “Three V’s” for running the business