Transcript
NoSQL: The Challenges Beyond Multi-Model and Integrating into Big Data Applications
Presenters:
Matthew Aslett, Research Director, 451 Research
• NoSQL Beyond Polyglot Persistence
Peter Coppola, VP Product & Marketing, Basho Technologies
• How a Data Platform solves the challenges of integrating NoSQL into Big Data applications
NoSQL: Beyond polyglot persistence
Matthew Aslett, research director
451 Research is an information technology research & advisory company
Founded in 2000
210+ employees, including over 100 analysts
1,000+ clients: Technology & Service providers, corporate advisory, finance, professional services, and IT decision makers
12,500+ senior IT professionals in our research community
Over 52 million data points each quarter
4,500+ reports published each year covering 2,000+ innovative technology & service providers
Headquartered in New York City with offices in London, Boston, San Francisco, and Washington D.C.
451 Research and its sister company Uptime Institute comprise the two divisions of The 451 Group
Research & Data
Advisory Services
Events
Copyright (C) 2015 451 Research LLC
The birth of NoSQL
• The genesis of much, although by no means all, of the momentum behind the NoSQL database movement can be attributed to two research papers:
  • Google’s BigTable: A Distributed Storage System for Structured Data, presented at the Seventh Symposium on Operating Systems Design and Implementation, in November 2006
  • Amazon’s Dynamo: Amazon’s Highly Available Key-Value Store, presented at the 21st ACM Symposium on Operating Systems Principles, in October 2007
• The term itself was coined by Johan Oskarsson as the name for a June 2009 meeting of developers, users and others interested in a group of loosely related data technologies
SPRAINED RELATIONAL DATABASES
Photo credit: Foxtongue on Flickr http://www.flickr.com/photos/foxtongue/4844016087/
The traditional relational database has been stretched beyond its normal capacity by the needs of high-volume, highly distributed or highly complex applications.
Database SPRAIN
Scalability
Performance
Relaxed consistency
Agility
Intricacy
Necessity
Increased willingness to look towards emerging alternatives
A diverse array of NoSQL projects serving a range of use-cases
[Figure: 451 Research Data Platforms Map, June 2015, https://451research.com/dashboard/dpa. The map places a large number of database and data platform products in a relational zone, a non-relational zone and a grid/cache zone, with branches towards e-discovery, enterprise search and SIEM. Key: general purpose; specialist analytic; BigTables; graph; document; key value stores; -as-a-Service; key value direct access; Hadoop; MySQL ecosystem; advanced clustering/sharding; New SQL databases; data caching; data grid; search; appliances; in-memory; stream processing. © 2015 by 451 Research LLC. All rights reserved.]
The NoSQL database landscape
[Figure: the NoSQL subset of the Data Platforms Map. Products shown: MarkLogic, ArangoDB, Neo4J, InfiniteGraph, Apache CouchDB, Oracle NoSQL, Redis, Handlersocket, RavenDB, RethinkDB, LevelDB, Apache Accumulo, Apache Cassandra, Apache HBase, Riak, Couchbase, Voldemort, Aerospike, Urika-GD, FlockDB, Allegrograph, HypergraphDB, AffinityDB, OrientDB, Sparksee, HyperDex, DataStax Enterprise, FatDB, Google Cloud Datastore, GrapheneDB, Instacluster, Hypertable, BerkeleyDB, Sqrrl Enterprise, JumboDB, Stardog, MongoDB, Cloudant, Iris Couch, MongoLab, Compose, ObjectRocket, CloudBird, Azure DocumentDB, AWS DynamoDB, AWS SimpleDB, Redis Labs Redis Cloud, RedisGreen, AWS ElastiCache with Redis, MagnetoDB, ObjectRocket with Redis, TokuMX, CortexDB, MapR-DB, Cloudant Local, MongoDirector, Redis-to-go, GraphHost, Redis Labs Enterprise Cluster, Azure Redis Cache, Modulus, Orchestrate, Google Cloud BigTable, Titan, Trinity, Ontotext GraphDB.]
Polyglot persistence
The idea that different data storage models have their own strengths and should be used in combination to solve the various data processing needs of a complex application.
Wide-column: Data is mapped by a row key, column key and time stamp.
Key Value: Store keys and associated values.
Graph: Store data and the relationships between data.
Document: Store all data related to a specific key as a single document.
[Figure axis: data model complexity]
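To make the four data models on this slide concrete, here is a minimal sketch using plain Python structures. Nothing here comes from the presentation: the keys, field names and helper functions (`latest`, `neighbours`) are hypothetical, and real NoSQL products add distribution, persistence and query languages on top of these shapes.

```python
# Key value: an opaque value looked up by its key.
key_value = {"user:42": b'{"name": "Ada"}'}

# Wide-column: each cell is addressed by (row key, column key, timestamp).
wide_column = {
    ("user:42", "profile:name", 1): "Ada",
    ("user:42", "profile:name", 2): "Ada L.",
}


def latest(table, row, column):
    """Return the value with the highest timestamp for a row/column pair."""
    versions = {ts: v for (r, c, ts), v in table.items() if r == row and c == column}
    return versions[max(versions)] if versions else None


# Document: all data related to one key is stored as a single nested document.
documents = {
    "user:42": {
        "name": "Ada",
        "emails": ["ada@example.com"],
        "address": {"city": "London"},
    }
}

# Graph: data items (nodes) plus explicit relationships (edges) between them.
nodes = {"ada": {"kind": "person"}, "charles": {"kind": "person"}}
edges = [("ada", "KNOWS", "charles")]


def neighbours(node, relation):
    """Follow outgoing edges of one relation type from a node."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]
```

The key value store knows nothing about what it holds, while the other three models expose progressively more structure (versioned cells, nested fields, relationships) to the database itself.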
Polyglot persistence
Wide-column, Key Value, Graph, Document
Search, Analytics, Cache
Multi-model
Wide-column stores, Key Value, Graph, Document stores
Search, Analytics, Cache
Multi-model databases
Support a combination of the various individual NoSQL data models
- avoid operational complexity
- maintain developer agility
Multi-model data platform
Wide-column, Key Value, Graph, Document
Search, Analytics, Cache
Thank You!
matthew.aslett@451research.com
@maslett
www.451research.com
Delivering on a Data Platform
Peter Coppola
VP, Product & Marketing
THE EVOLUTION OF NOSQL
Unstructured Data Platforms
Multi-Model Solutions
Point Solutions
Basho Technologies | CONFIDENTIAL
“42% of database decision makers admit they struggle to manage the NoSQL solutions deployed in their environments”
COMPLEX TECHNOLOGY STACK
Riak
Spark
OUR CUSTOMERS ARE INTEGRATING
NoSQL, Caching, Real-time Analytics and Search
“Big data, hybrid cloud architectures and IoT require developers to integrate, replicate and synchronize information across functions”
Mac Devine, Vice President and CTO, IBM Cloud Services
Enterprises building Big Data, IoT and Hybrid Cloud applications are struggling with complexity
Distributed workload challenges: availability, scale and geo-location
Proliferation of data models: Key-Value, In-Memory, Document, etc.
High costs to ensure data accuracy: replication, synchronization and integration
High operational costs: architectural and management simplicity & efficiency
Lack of available developer expertise
[Figure: Big Data, Hybrid Cloud and IoT applications sit on top of Database(s), Storage, Caches, Analytics, Queues, Search and Log Mgmt.]
Current Operational Challenges
• Managing separate clusters for Riak KV, Redis and Spark
• Manually synchronizing data across the applications
• Using Zookeeper for Spark cluster management
• Manually sharding data in Redis
• Manually managing failures of Redis instances
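As a rough illustration of why manual sharding is an operational burden, the sketch below routes keys to cache nodes with a simple hash-modulo rule and counts how many keys change owner when one node is added. It does not talk to Redis: the host names are hypothetical, and `shard_for` and `moved_keys` are illustrative helpers, not part of any Redis client.

```python
import hashlib


def shard_for(key, nodes):
    """Pick a node by hashing the key and taking it modulo the node count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def moved_keys(keys, old_nodes, new_nodes):
    """Return the keys whose owning node differs between two node lists."""
    return [k for k in keys if shard_for(k, old_nodes) != shard_for(k, new_nodes)]


keys = [f"session:{i}" for i in range(1000)]
three = ["redis-a:6379", "redis-b:6379", "redis-c:6379"]  # hypothetical hosts
four = three + ["redis-d:6379"]

moved = moved_keys(keys, three, four)
print(f"{len(moved)} of {len(keys)} keys change shard after adding one node")
```

With modulo routing, most keys move whenever the node list changes, so every resize or node failure means migrating or re-warming data by hand. That is the kind of work the bullets above describe.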
Customers manually integrating
“Big data applications like ours need to integrate and then deploy many different technology components”
Martin Davies
CEO of Technology
BASHO DATA PLATFORM
[Figure: platform architecture in three layers.]
SERVICE INSTANCES: Spark, Redis (Caching), Solr, ElasticSearch, Web Services, 3rd Party Web Services & Integrations
CORE SERVICES: Replication & Synchronization, Message Routing, Cluster Management & Monitoring, Logging & Analytics, Internal Data Store
STORAGE INSTANCES: Riak Key/Value, Riak Object Storage; coming soon: Document Store, Columnar, Graph
BASHO DATA PLATFORM: CORE SERVICES
Data Replication and Synchronization
Replicate and synchronize data across and between storage instances and service instances to ensure data accuracy with no data loss and high availability.
Cluster Management
Integrated cluster management automates deployment and configuration of Riak KV, Riak S2, Spark and Redis. Once deployed in production, it auto-detects issues and restarts Redis instances or Spark clusters. Cluster management eliminates the need for Zookeeper.
Internal Data Store
A built-in, distributed data store, designed for speed, fault tolerance and ease of operations, persists static and dynamic configuration data (port number and IP address) across the Basho Data Platform.
Message Routing
A high-throughput, distributed message system for speed, scalability and high availability. This message system will have the ability to persist and route messages across platform clusters.
Logging and Analytics
Event logs provide information that helps tune clusters and accurately analyze data flow across the cluster.
BASHO DATA PLATFORM: SERVICE INSTANCES
Apache Spark Add-On: Real-Time Analytics (Zookeeper not required)
• Move data from Riak KV to Spark for batch and real-time analytics and store results back in Riak KV for future processing
• Cluster management eliminates the need for Zookeeper
Redis Add-On: Integrated Caching (Availability w/ auto-sharding)
• Redis is now enterprise grade with high availability, data synchronization with Riak KV and cluster management
• Automatic data sharding across multiple cache servers simplifies operations
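The idea of a cache kept in sync with a durable key/value store can be sketched as a read-through cache whose writes update both tiers. This is an assumption-laden illustration, not Basho's implementation: plain dictionaries stand in for Redis and Riak KV, and the `CachedStore` class and its methods are hypothetical.

```python
class CachedStore:
    """Read-through cache in front of a backing store, kept in sync on writes."""

    def __init__(self, backing, cache=None):
        self.backing = backing  # stands in for the durable key/value store
        self.cache = {} if cache is None else cache  # stands in for the cache tier
        self.hits = 0
        self.misses = 0

    def get(self, key):
        """Serve from the cache when possible, otherwise load and remember."""
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backing.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        """Write to the durable store first, then refresh the cached copy."""
        self.backing[key] = value
        self.cache[key] = value

    def delete(self, key):
        """Remove the key from both tiers so no stale copy is served."""
        self.backing.pop(key, None)
        self.cache.pop(key, None)
```

For example, `CachedStore({"a": 1})` misses on the first `get("a")`, hits on the second, and returns the new value immediately after `put("a", 2)`. In a real deployment the two tiers are separate networked services, which is where the synchronization effort comes from.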
Apache Solr Add-On: Enriched Search (Query like Solr)
• Powerful full-text search of Solr with the availability and scalability of Riak KV
• As data changes, search indexes are automatically synchronized
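A toy version of "search indexes are automatically synchronized" is a store that updates an inverted index inside every write and delete. The sketch below assumes whitespace tokenization and an in-memory index. `IndexedStore` is a hypothetical class, and neither Solr nor Riak KV is involved.

```python
from collections import defaultdict


class IndexedStore:
    """Key/value store that keeps a term -> keys index in step with its data."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)  # term -> set of keys containing it

    @staticmethod
    def _terms(text):
        return set(text.lower().split())

    def put(self, key, text):
        """Store a value and re-index it, dropping postings of any old version."""
        self.delete(key)
        self.docs[key] = text
        for term in self._terms(text):
            self.index[term].add(key)

    def delete(self, key):
        """Remove a value and every index entry that pointed at it."""
        old = self.docs.pop(key, None)
        if old is None:
            return
        for term in self._terms(old):
            self.index[term].discard(key)
            if not self.index[term]:
                del self.index[term]

    def search(self, term):
        """Return the keys whose current value contains the term."""
        return sorted(self.index.get(term.lower(), set()))
```

Because the index is updated in the same call that changes the data, a search never returns a key whose current value no longer contains the term.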
BASHO DIFFERENCE
• Ease of Scale
• Optimized for High Availability
• Data Correctness
• Solving data distribution challenge
• Operational Simplicity
“We are excited that Basho is stepping forward and simplifying our daunting technology stack”
Jason Ordway
CTO
RIAK DEPLOYED WORLDWIDE
QUESTIONS?