© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. Enterprise NoSQL Converging Analysis and Operations Ken Krupa, Enterprise CTO, MarkLogic
Introducing Enterprise NoSQL - Big Data Paris 2020 Ken KRUPA.pdf · Enterprise triple store, document store, and database combined Store and query billions of facts and relationships;

May 20, 2020

© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.

Enterprise NoSQL Converging Analysis and Operations

Ken Krupa, Enterprise CTO, MarkLogic

SLIDE: 2

A Brief History: Database Duality

[Figure: timeline of database specialization]
~1990: Star Schema EDW mainstream
~2000: WWW mainstream
~2010: Big Data mainstream
~2013: NoSQL mainstream
Other figure labels: 1st peak; Analytical Specialization; Operational Specialization; Analysis / Operations Gap

Specialization between analysis and operations
Accelerated by disruptive IT shifts (e.g. Internet, Hadoop)
Need for greater convergence

SLIDE: 3

Our world is changing … and heterogeneous data is a problem

[Figure labels: OLTP; Warehouse; Data Marts; Reference Data; Archives; ?]

Data volume: 4.4 ZB (2013) to 44 ZB (2020)
12% Structured, 88% Unstructured
Source: IDC

Accelerating the divide?

THE DATA WAREHOUSES

SLIDE: 5

Traditional Enterprise Data Warehouse (RDBMS)

Pre-defined schemas
Complex ETL processes
Changes depend on SDLC

EDW Definition: “A subject oriented, nonvolatile, integrated, time variant collection of data in support of management's decisions” – Bill Inmon

SLIDE: 6

Star Schema Modeling - Waterfall

1. Choose the business process
2. Declare the grain
3. Identify the dimensions
4. Identify the fact

Source: http://en.wikipedia.org/wiki/Dimensional_modeling

[Figure: waterfall stages Identify, Model, Integrate, Discover, shown against TIME and COST]
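The four modeling steps on this slide can be sketched in a few lines of Python. This is only an illustration: the retail-sales process, the table names, and the sample rows are assumptions and do not come from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass

# Step 1: choose the business process (assumed here: retail sales).
# Step 2: declare the grain (assumed: one row per product per store per day).

# Step 3: identify the dimensions (hypothetical lookup tables keyed by id).
product_dim = {1: {"name": "Espresso", "category": "Coffee"},
               2: {"name": "Green Tea", "category": "Tea"}}
store_dim = {10: {"city": "Paris"}, 11: {"city": "Lyon"}}


# Step 4: identify the fact (a numeric measure recorded at the declared grain).
@dataclass(frozen=True)
class SalesFact:
    date: str
    product_id: int
    store_id: int
    units_sold: int


facts = [
    SalesFact("2015-03-01", 1, 10, 5),
    SalesFact("2015-03-01", 2, 10, 3),
    SalesFact("2015-03-01", 1, 11, 4),
]


def units_by(dimension, key_field, attribute):
    """Sum units_sold grouped by one attribute of one dimension table."""
    totals = defaultdict(int)
    for fact in facts:
        dim_row = dimension[getattr(fact, key_field)]
        totals[dim_row[attribute]] += fact.units_sold
    return dict(totals)


units_by_category = units_by(product_dim, "product_id", "category")
units_by_city = units_by(store_dim, "store_id", "city")
```

In this sketch, a question that needs a new dimension or a finer grain means going back through the steps and reloading the facts, which is one way to read the time and cost axes on the slide.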

SLIDE: 7

View from the Enterprise

[Figure labels: OLTP; Warehouse; Data Marts; Archives; Reference Data; “Unstructured”; Video; Audio; Signals, Logs, Streams; Social; Documents, Messages; Metadata; Search]

WHAT ABOUT HADOOP

SLIDE: 10

Hadoop – What You Get

Advantages
– HDFS provides scale and economies of scale
– File-based nature allows for greater Variety
– Raw data is fine and any shape will do
– Schema-on-read possible
– Map-reduce and YARN enable massive parallel scaling

Gaps
– Hadoop was designed for batch processing
– Does not support real-time applications on its own
– Requires expertise to configure, deploy and manage
– Has security limitations
– On its own, it is not a database

SLIDE: 11

RDBMS + Hadoop

Still a lot of ETL – RDBMS is still in the picture
Shortcomings in security and governance capabilities with Hadoop
Reliance on RDBMS for anything operational
A mismatch between analytical and operational aspects

ENTERPRISE NOSQL

SLIDE: 13

Enterprise NoSQL

Flexible Data Model: Store and manage JSON, XML, RDF, and Geospatial data with a document-centric, schema-agnostic database. Prerequisite modeling not required.
Search and Query: Built-in search to find answers in documents, relationships, and metadata
Scalability and Elasticity: Scale out on commodity hardware, and also scale down
ACID Transactions: MVCC for data consistency and simultaneous read+write
Enterprise-Grade Security: Certified, granular security for modern data governance

SLIDE: 14

Core Benefits of Enterprise NoSQL

A database more in line with today’s data processing problems and expectations
– Handle all types of data
– Minimize (or eliminate) ETL and data copying
– Scale out on commodity hardware and in the cloud
– Deliver results more quickly

A database that offers opportunities for operational convergence
– Handles mixed workloads (real-time and batch)
– Does not abandon enterprise capabilities – e.g. transactions and security
– A database that is built to integrate with the Big Data ecosystem (e.g. Hadoop and related)

SLIDE: 15

More Than Just Query…

What if an analyst could talk back to the data warehouse…?

“I found something!”

SLIDE: 16

Semantics

Enterprise triple store, document store, and database combined

Store and query billions of facts and relationships; infer new facts
Facts and relationships provide context for better search
Flexible data modeling: integrate and link data from different sources
Standards-based for ease of use and integration
– RDF, SPARQL, and standard REST interfaces

SLIDE: 17

Semantics: A New Way to Organize Data

Data is stored in Triples, expressed as: Subject : Predicate : Object

Jean Dubois : livesIn : Paris

Paris : isIn : France

SLIDE: 18

Semantics: A New Way to Organize Data

Query with SPARQL gives us simple lookup ... and more!

Find people who live in (a place that's in) France

[Figure: RDF Triples drawn as a graph with nodes "Jean Dubois", "Paris", and "France" and edges labeled livesIn, isIn, and livesIn]
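The lookup described on this slide can be illustrated with a minimal in-memory sketch. It is plain Python, not SPARQL and not MarkLogic's API; the helper names `match` and `people_living_in` are hypothetical, and only a single `isIn` hop is followed. In SPARQL the same idea would be written roughly as a two-pattern query: `?person :livesIn ?place . ?place :isIn :France`.

```python
# Triples from the slide, stored as (subject, predicate, object) tuples.
triples = {
    ("Jean Dubois", "livesIn", "Paris"),
    ("Paris", "isIn", "France"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def people_living_in(store, region):
    """People who live in `region` directly, or in a place that isIn `region`."""
    places = {region} | {s for (s, _, _) in match(store, predicate="isIn", obj=region)}
    return sorted(
        s for (s, _, o) in match(store, predicate="livesIn") if o in places
    )


residents = people_living_in(triples, "France")
```

Here `residents` contains "Jean Dubois" even though no stored triple links him to France directly; the answer comes from joining the two facts.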

SLIDE: 19

Asserting Facts with Semantics

Assert newly discovered information during analysis: “I received an email about the date of birth”
Decorate with additional items of interest: “Bob has an interest in art”
Assert relationships as they are discovered: “Bob knows Alice”

Source: http://www.w3.org/TR/2014/NOTE-rdf11-primer-20140225/

SLIDE: 20

Data Provenance with Semantics

Data lineage & provenance

Utilize PROV-O, the PROV Ontology
– W3C Recommendation
– Expressed with RDF Triples

For example…
prov:wasGeneratedBy
prov:wasDerivedFrom
prov:wasAttributedTo
prov:wasAssociatedWith
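As a rough illustration of how these PROV-O properties express lineage, the sketch below stores a handful of provenance triples and walks the `prov:wasDerivedFrom` links. The four property names are the ones listed on the slide; every `ex:` identifier, the sample data, and the `lineage` helper are made up for the example.

```python
# Hypothetical provenance triples using PROV-O property names from the slide.
# The ex: identifiers are invented for illustration.
provenance = [
    ("ex:riskReport", "prov:wasDerivedFrom", "ex:cleanedTrades"),
    ("ex:cleanedTrades", "prov:wasDerivedFrom", "ex:rawTradeFeed"),
    ("ex:cleanedTrades", "prov:wasGeneratedBy", "ex:cleaningJob"),
    ("ex:cleaningJob", "prov:wasAssociatedWith", "ex:etlService"),
    ("ex:riskReport", "prov:wasAttributedTo", "ex:analystAlice"),
]


def lineage(store, entity):
    """Follow prov:wasDerivedFrom links back to the original sources."""
    chain = []
    seen = {entity}
    current = entity
    while True:
        parents = [o for (s, p, o) in store
                   if s == current and p == "prov:wasDerivedFrom" and o not in seen]
        if not parents:
            return chain
        current = parents[0]  # this sketch assumes a single parent per entity
        seen.add(current)
        chain.append(current)


report_lineage = lineage(provenance, "ex:riskReport")
```

Because provenance is stored as ordinary triples, the same kind of query that answers business questions can also answer "where did this come from?".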

SLIDE: 21

Benefits of Enterprise NoSQL with Semantics

Make analysis conversational

– Using machine-readable standards

Provide even more modeling flexibility

– Ad-hoc facts and relationships

– Richer metadata

Further enable operational/analytical convergence

HANDLING TIME

SLIDE: 23

Bi-Temporality

Audits – Preserved history
Regulation and compliance
Risk Management – Financial risk assessment models need to factor in all history

What were my customer’s credit ratings last Monday as I knew them last Friday?

A complete history (audit trail) of what you knew and when you knew it
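The credit-rating question on this slide combines two time axes: when a fact was true (valid time) and when the system learned it (system time). The sketch below is a minimal, append-only illustration in plain Python; it is not MarkLogic's temporal API, and the `RatingVersion` type, the `rating_as_of` helper, the customer, and all dates and ratings are assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RatingVersion:
    customer: str
    rating: str
    valid_from: date     # when the rating took effect in the real world
    valid_to: date       # exclusive end of validity
    recorded_on: date    # when the system learned this version (system time)


FOREVER = date.max

# Hypothetical history: a downgrade effective Monday 2 March was only recorded
# on Saturday 7 March, so Friday's view of Monday still shows the old rating.
history = [
    RatingVersion("ACME", "AA", date(2015, 1, 1), FOREVER, date(2015, 1, 1)),
    RatingVersion("ACME", "BBB", date(2015, 3, 2), FOREVER, date(2015, 3, 7)),
]


def rating_as_of(versions, customer, valid_at, known_at):
    """Rating valid on `valid_at`, using only what was recorded by `known_at`."""
    candidates = [
        v for v in versions
        if v.customer == customer
        and v.recorded_on <= known_at
        and v.valid_from <= valid_at < v.valid_to
    ]
    if not candidates:
        return None
    # Among known versions, the most recently recorded one supersedes the rest.
    return max(candidates, key=lambda v: v.recorded_on).rating


monday, friday = date(2015, 3, 2), date(2015, 3, 6)
as_known_friday = rating_as_of(history, "ACME", monday, friday)
as_known_today = rating_as_of(history, "ACME", monday, date(2015, 3, 9))
```

Nothing is ever overwritten in this sketch: the late correction is appended as a new version, so both "what we believed on Friday" and "what we believe now" stay answerable, which is the audit-trail property the slide describes.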

SLIDE: 24

Bi-temporality with Enterprise NoSQL

Others: Other NoSQL: No transactions means no bi-temporal.
Enterprise NoSQL: Full ACID transactions with MVCC

Others: RDBMS: Bi-temporal at table level, composed objects make implementation complex.
Enterprise NoSQL: Bi-temporal at the object/document level. Implementations much more straightforward

Others: RDBMS: Bi-temporal data only. What happens when the schema changes?
Enterprise NoSQL: For Enterprise NoSQL, schema is data.

Others: RDBMS: Bi-temporal data only. What happens when the security changes?
Enterprise NoSQL: Security may also be bi-temporally managed.

Others: RDBMS: Inflexible implementations with respect to bi-temporal views and clocks.
Enterprise NoSQL: Flexible implementations based on customer input and without compromising auditability. Capabilities such as multi-layered bi-temporality and use of external transaction clocks possible.

PUTTING THINGS TOGETHER

SLIDE: 26

Enterprise NoSQL Operational Data Warehouse

Schema-agnostic
Straightforward data integration
Full-text indexing and search
Scale-out infrastructure
Real-time or batch load and analysis
Write back during discovery!

[Figure labels: RDF; Discover & Enrich; RT Events or Batch Load]

SLIDE: 28

Getting Noticed

BEYOND THE DATA WAREHOUSE

SLIDE: 30

If only the EDW was the only problem…

SOA

SLIDE: 31

Application Centric Architecture

Characterized by:

– Applications that own “their own” data

– Small pockets of “authoritative” sources (e.g. reference data, CRM)

– Data exchange between systems for cross-LoB operations

Resulting in:

– Multiple copies of the same data (even from authoritative sources)

– Diminishing data quality with each copy

But that’s not all…

– Accelerated by SOA (an otherwise good thing)

SLIDE: 32

Operational Data Hub

A read & write Database supports more than analysis
Enables a Data Centric Architecture for the Enterprise
Brings all of the data-centric goodness beyond the Data Warehouse space
– React immediately to important events – e.g. alerts
– Create workflow based on analysis
– Make SOA better, redeem broken implementations

SLIDE: 33

Operational Data Hub

[Figure labels: Operational Applications; Bidirectional analysis of all data; Multi-channel distribution; Customers; Feedback; Warm archives]

Read+write real-time DW
Data-centric enterprise
Unified distribution
Direct external feedback
Makes use of Hadoop investment
Semantics plays a key role

SLIDE: 34

Convergence

Platform for mixed workloads

– Simultaneous read & write during discovery

– Analytical and Operational functions within the same DB

– React immediately to important events – e.g. alerts

– Create workflow based on analysis

A Data Centric Architecture for the Enterprise

– Bring applications to the data

Bring the flexibility of “Big Data” beyond the Data Warehouse space

– “Three V’s” for running the business

Thank You

[email protected]

@kenkrupa

marklogic.com

world.marklogic.com