SQL, NoSQL, and Next Generation DBMSs Shahram Ghandeharizadeh Director of the USC Database Lab
SQL, NoSQL, and Next Generation DBMSs - dexa. · PDF fileSQL, NoSQL, and Next Generation DBMSs Shahram Ghandeharizadeh Director of the USC Database Lab. Outline ... E.g., SimpleDB,

Mar 04, 2018

SQL, NoSQL, and Next Generation DBMSs

Shahram Ghandeharizadeh

Director of the USC Database Lab

Outline

A brief history of DBMSs.

1960/70s: OSs

1980+: SQL

2000+: NoSQL

Before Computers

Database

DBMS/Data Store

Digital Era

Database

File System/Data Store

0011101000000101110101

Before DBMSs: 1960/70s

[Figure: Developer 1 and Developer 2, each with application programs and data.]

After DBMSs

[Figure: the application programs of Developer 1 and Developer 2 access data through a DBMS.]

Physical Data Independence.

SQL as a “what”-oriented language.

SQL Data Stores

Manage records/tuples

A record/tuple is a row in a table where attribute names are pre-defined in a schema.

Alternative physical designs:

Column-store versus Row-store.

Transactions with ACID properties
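As a minimal sketch of these points, the following uses Python's built-in sqlite3 module. The `account` table, its columns, and the `transfer` helper are hypothetical names chosen for illustration; they do not come from the slides. The schema pre-defines the attribute names, and the transfer either applies both updates or neither.

```python
import sqlite3

# In-memory database; the schema fixes attribute names up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "alice", 100), (2, "bob", 50)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit
        # applied by the first UPDATE has been rolled back as well.
        return False


def balances(conn):
    """Say *what* is wanted; the engine decides how to fetch it."""
    return dict(conn.execute("SELECT owner, balance FROM account ORDER BY id"))
```

A failed transfer leaves both rows unchanged, which is the atomicity part of ACID. The SELECT names the desired rows without saying how to locate them, which is the "what"-oriented style mentioned earlier.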

SQL IS OVERHYPED

Why?

Marketing campaigns have become too exaggerated!

Relational vendors claim RDBMS is the answer to all data management needs.

What are some counter examples?

Seltzer. Beyond Relational Databases. Communications of the ACM, July 2008.

Web Search

Semi-structured data

HTML pages instead of raw data.

Queries are keyword lookups and the desired response is a sorted list of possible answers.

Need for efficient inverted indices.

Bulk updates, read mostly.

Need for nontraditional indexing.
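To make the inverted-index idea concrete, here is a small sketch, not a production search engine. The function names and the term-frequency score are illustrative assumptions; real engines use far richer ranking. Each term maps to the pages that contain it, and a keyword query returns a sorted list of candidate answers.

```python
from collections import Counter, defaultdict


def build_inverted_index(pages):
    """Map each term to {page_id: number of occurrences on that page}."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][page_id] = count
    return index


def search(index, query):
    """Return page ids ranked by summed term frequency (a toy score)."""
    scores = Counter()
    for term in query.lower().split():
        for page_id, count in index.get(term, {}).items():
            scores[page_id] += count
    # Highest score first; ties broken by page id for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page_id for page_id, _ in ranked]
```

The index is built in bulk and then only read, which matches the "bulk updates, read mostly" workload described above.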

Directory Services

International organizations with distributed resources and personnel.

Requirement: fast lookup of entities arranged in a hierarchical structure that corresponds to a hierarchy of the organization.

LDAP standard.

Core of identification and authentication systems from a number of vendors, e.g., IBM Tivoli, Microsoft Active Directory Server, SUN ONE Directory Server.

Bulk updates similar to data warehousing.

Multi-valued attributes.

Queries are single-row retrieval or lookups based on attribute values.
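The access pattern described here can be sketched with a toy in-memory directory. This is not the LDAP protocol or any vendor's API; the `Directory` class, its methods, and the example names are hypothetical. Entries are keyed by a hierarchical name, attributes are multi-valued, and queries are either a single-entry lookup or an attribute match under a subtree.

```python
class Directory:
    """Toy hierarchical directory with multi-valued attributes."""

    def __init__(self):
        self._entries = {}

    def add(self, dn, attributes):
        # Each attribute holds a set of values (multi-valued attributes).
        self._entries[dn] = {name: set(vals) for name, vals in attributes.items()}

    def lookup(self, dn):
        """Single-entry retrieval by its full hierarchical name."""
        return self._entries.get(dn)

    def search(self, base, attribute, value):
        """Names of entries at or below `base` whose attribute includes value."""
        suffix = "," + base
        return sorted(
            dn
            for dn, attrs in self._entries.items()
            if (dn == base or dn.endswith(suffix))
            and value in attrs.get(attribute, ())
        )
```

The hierarchy lives in the name itself, so a subtree search is a suffix match. A real directory server would index this rather than scan every entry.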

Other Examples

Mobile device caching

Your cell phone’s directory as a transient cache of a global directory.

Stream management

Real-time filtering of streams for interesting patterns. Example: identify a hotly traded stock, or a stock that is not traded as heavily as expected.

Filters look like SQL selection predicates, causing developers to mistake an RDBMS for the right choice.

XML management
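For the stream management example above, a standing filter over a stream of trades might look like the following sketch. The tuple layout `(timestamp, symbol, shares)`, the sliding window, and the threshold are assumptions made for illustration. The predicate resembles a SQL selection, but it is evaluated continuously as tuples arrive rather than against stored rows.

```python
from collections import defaultdict, deque


def hot_symbols(trades, window, threshold):
    """Yield (timestamp, symbol) whenever a symbol's volume over the last
    `window` time units reaches `threshold` shares."""
    recent = defaultdict(deque)  # symbol -> (timestamp, shares) in the window
    volume = defaultdict(int)    # symbol -> total shares in the window
    for ts, symbol, shares in trades:
        queue = recent[symbol]
        queue.append((ts, shares))
        volume[symbol] += shares
        # Drop trades that have fallen out of the sliding window.
        while queue and queue[0][0] <= ts - window:
            _, expired = queue.popleft()
            volume[symbol] -= expired
        if volume[symbol] >= threshold:
            yield ts, symbol
```

Because it is a generator, the filter keeps only the trades inside the window in memory and emits matches as soon as they occur.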

Summary

Relational DBMSs have been designed for transaction processing and workloads consisting of ad hoc queries and a significant amount of updates.

25 years ago, there was one market for DBMSs: business data processing. This has changed to include different applications with different requirements.

Example applications are read-dominated: no need for transactional guarantees.

SQL is the wrong choice for stream processing.

One software architecture will not support the diverse needs of these applications. Possible solutions:

1) each application re-builds its own storage manager from scratch,

2) provide a flexible solution that can be tailored to the needs of a particular application.

Past 25 Years

Two trends:

1. Bloated systems.

Need for a specialist, a trained DBA, to keep a system and its applications running.

2. Few applications need all the features available in today’s RDBMSs.

The application must pay for all the features even though it requires a small subset.

NOSQL DATA STORES

NoSQL Data Stores

Scale horizontally for “simple operations” using many servers.

Replicate and distribute (partition) data across many servers.

Provide a simple call level interface or protocol.

A weaker concurrency model than ACID:

Basically Available, Soft state, Eventually consistent (BASE).

Efficient use of distributed indexes and DRAM for data storage.

Ability to dynamically add new attributes to data records.

Cattell. Scalable SQL and NoSQL Data Stores. SIGMOD Record 39(4), 2010.

Ghandeharizadeh, Boghrati, and Barahmand. An Evaluation of Graph Data Models. TPCTC 2014.

NoSQL Data Model

A “key-value” store:

A distributed hash table,

A key/value may be an arbitrary sequence of bytes,

E.g., memcached, Voldemort, Riak, Redis, Tokyo Cabinet, Membase, Membrain.

A “document” store:

A value may be a scalar, a list, or a nested document,

Attribute names might be dynamically defined at runtime,

E.g., SimpleDB, CouchDB, MongoDB, Terrastore.

An “Extensible record” store:

A hybrid between a SQL store and a document store,

Families of attributes are defined in a schema and new attributes can be added,

Attributes may be list-valued,

E.g., BigTable, HBase, HyperTable, Cassandra, PNUTS.
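The three data models can be contrasted with plain Python structures. This is only a sketch of the shapes involved, not the API of any of the systems listed; `ExtensibleRecordStore` and all sample keys and values are hypothetical.

```python
import json

# Key-value store: both key and value are opaque byte strings to the store.
kv_store = {}
kv_store[b"user:42"] = json.dumps({"name": "Ada"}).encode("utf-8")

# Document store: values are structured (scalars, lists, nested documents),
# and each document may define its own attribute names at runtime.
documents = {}
documents[42] = {
    "name": "Ada",
    "emails": ["ada@example.org"],
    "address": {"city": "London"},
}
documents[43] = {"name": "Bob", "nickname": "bobby"}  # new attribute, no schema change


class ExtensibleRecordStore:
    """Attribute families are fixed by a schema; attributes inside a
    family can be added at any time."""

    def __init__(self, families):
        self.families = frozenset(families)
        self.rows = {}

    def put(self, row_key, family, attribute, value):
        if family not in self.families:
            raise KeyError(f"unknown attribute family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[attribute] = value

    def get(self, row_key, family, attribute):
        return self.rows[row_key][family][attribute]
```

The key-value store cannot inspect its values, the document store can, and the extensible record store sits in between: the families are declared up front, while the attributes within them are open-ended and may be list-valued.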

MIDDLEWARE: CACHE AUGMENTED DATA STORES

Simple Operations

Operations that read and write a small amount of data.

Challenge: High volume of requests with a low latency requirement.

Person-to-person service providers in 1 Minute:

147K page views

100M queries

7K user visits

347K Tweets

Facebook, http://thenextweb.com/facebook/2014/10/28/facebook-1-35-billion-users/

Google, http://expandedramblings.com/index.php/google-plus-statistics/

Twitter, https://about.twitter.com/company

Wikipedia, http://stats.wikimedia.org/EN/Sitemap.htm

How?

Look up query result instead of query processing.

Ideal for applications with workloads that exhibit a high read to write ratio.

Key-value store as the cache manager.

Query result caching:

Key: query string, Value: result set

Trillions of cached key-value pairs.
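One practical detail of using the query string as the key: memcached limits keys to 250 bytes and disallows whitespace, so a raw SQL string usually cannot serve as the key directly. A common workaround, shown here as a sketch, is to hash the query text together with its parameters. The `cache_key` helper and the `q:` prefix are illustrative assumptions, not something the slides prescribe.

```python
import hashlib
import json


def cache_key(sql, params=()):
    """Derive a fixed-length cache key from a query string and its parameters."""
    payload = json.dumps([sql, list(params)], separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return "q:" + digest  # 66 characters, no whitespace
```

The same query with the same parameters always maps to the same key, so a later request can find the cached result set. Any difference in the SQL text or the parameters produces a different key.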

Cache Augmented DBMSs

1. Value = Get(Key)

2. If Value is found, go to Step 6.

3. SQL queries

4. Query results. Application constructs Value using the results.

5. Put(Key, Value)

6. Use Value to generate HTML result page

[Figure: the application, the cache server (KVS, e.g., memcached), and the RDBMS server, with arrows numbered by the steps above.]
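The six steps of this slide can be sketched as follows. A plain dictionary stands in for the key-value cache server and an in-memory SQLite database stands in for the RDBMS; the `users` table, the key format, and `render_profile` are hypothetical names used only for this illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the RDBMS server
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'Ada')")
db.commit()

cache = {}  # stand-in for the cache server (KVS)
stats = {"hits": 0, "misses": 0}


def render_profile(user_id):
    key = f"profile:{user_id}"
    value = cache.get(key)                      # Step 1: Value = Get(Key)
    if value is None:                           # Step 2: not found, fall through
        stats["misses"] += 1
        row = db.execute(                       # Step 3: SQL query
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        value = {"name": row[0]} if row else {}  # Step 4: build Value from results
        cache[key] = value                      # Step 5: Put(Key, Value)
    else:
        stats["hits"] += 1
    return f"<h1>{value.get('name', 'unknown')}</h1>"  # Step 6: generate HTML
```

The first request for a profile pays for the SQL query; later requests are served from the cache until the entry is removed.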

CADBMS: Update

1. SQL DML Command: Insert, Delete, Update

2. Invalidate key-value pairs: Delete

Alternatives to invalidate include Refill/Refresh and incremental update

[Figure: the application sends step 1 to the RDBMS server and step 2 to the cache server (KVS, e.g., memcached).]
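A sketch of the update path with invalidation, under the same stand-ins as before: a dictionary for the cache server and SQLite for the RDBMS. The `users` table, key format, and `rename_user` are hypothetical. This simple version does not address races between concurrent readers and writers.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the RDBMS server
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'Ada')")
db.commit()

# Stand-in for the cache server, already holding a cached value.
cache = {"profile:1": {"name": "Ada"}}


def rename_user(user_id, new_name):
    with db:  # Step 1: SQL DML command, committed on success
        db.execute(
            "UPDATE users SET name = ? WHERE id = ?", (new_name, user_id)
        )
    # Step 2: invalidate the affected key-value pair by deleting it.
    cache.pop(f"profile:{user_id}", None)
```

After the delete, the next read misses in the cache and rebuilds the value from the updated row. Refill/refresh would instead write the new value into the cache at this point.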

CADBMS Today

[Figure: the application programs of Developer 1 and Developer 2 access both the data store (persistent data) and a memcached cache server (in-memory copy of data); the in-memory copy is marked Stale.]

Future CADBMSs

Physical Data Independence.

A “what”-oriented language.

[Figure: the application programs of Developer 1 and Developer 2 access a CADBMS that comprises a data store and a key-value cache server.]

KOSAR

Physical Data Independence.

SQL as a “what”-oriented language.

[Figure: the application programs of Developer 1 and Developer 2 access KOSAR, which comprises an RDBMS and a key-value cache server.]

Ghandeharizadeh et al. A Demonstration of KOSAR. Middleware 2014.

Architecture

A database driven application:

Data Store Server

Data Store Client

Application

Architecture: Example

An RDBMS driven application authored using Java:

MySQL Server

JDBC

Application

SQL Result Set

KOSAR: Transparent Caching

Simply replace the client component of your application with KOSAR and see it run much faster.

Data Store Server

Data Store Client

Application

Ghandeharizadeh, Yap, and Nguyen. Strong Consistency in Cache Augmented SQL Systems. Middleware 2014.

Ghandeharizadeh, Irani, Lam, Yap. CAMP: A Multi-Queue Eviction Policy for Key-Value Stores. Middleware 2014.

How?

1. Look up query result instead of query processing.

Data Store Server

Data Store Client

Application

memcached Servers

Ideal for workloads that exhibit a high read to write ratio.

Client-Server Architecture

[Figure: bar chart of SoAR (Actions/Second), y-axis from 0 to 12000, comparing SQL-X and CADBMS under 0.1% Write and 10% Write workloads.]

SLA: 95% of actions to observe a response time faster than 100 msec.

Barahmand and Ghandeharizadeh. BG: A Social Networking Benchmark. CIDR 2013.

Barahmand and Ghandeharizadeh. Expedited Benchmarking of Social Network Actions. CIKM 2013.

BG Benchmark, http://bgbenchmark.org

BG is a macro benchmark for interactive social networking actions.

BG quantifies the Social Action Rating (SoAR) of a data store:

For a given workload, the maximum number of simultaneous actions performed by a data store while satisfying a pre-specified SLA.
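The definition can be read as a small computation. The sketch below assumes one has already measured response times at several offered loads; the data layout and function names are hypothetical, and the BG benchmark's own procedure for finding the rating is not shown in these slides. The default SLA mirrors the one quoted elsewhere in the talk: 95% of actions faster than 100 msec.

```python
def satisfies_sla(latencies_ms, fraction=0.95, bound_ms=100.0):
    """True if at least `fraction` of actions finished faster than `bound_ms`."""
    if not latencies_ms:
        return False
    fast = sum(1 for t in latencies_ms if t < bound_ms)
    return fast / len(latencies_ms) >= fraction


def soar(runs, fraction=0.95, bound_ms=100.0):
    """Highest offered load (actions/second) whose run satisfies the SLA.

    `runs` maps an offered load to the response times observed at that load.
    Returns 0 if no run satisfies the SLA.
    """
    passing = [
        rate
        for rate, latencies in runs.items()
        if satisfies_sla(latencies, fraction, bound_ms)
    ]
    return max(passing, default=0)
```

A run with 96 of 100 actions under the bound passes; one with 90 of 100 does not, so the rating is the largest load among the passing runs.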

Barahmand and Ghandeharizadeh. BG: A Social Networking Benchmark. CIDR 2013.

Barahmand and Ghandeharizadeh. D-Zipfian: A Decentralized Implementation of Zipfian. SIGMOD DBTest 2013.

Barahmand and Ghandeharizadeh. Expedited Benchmarking of Social Network Actions. CIKM 2013.

Alabdulkarim, Barahmand and Ghandeharizadeh. A Scalable Benchmark for Interactive Social Networking Actions.

Ph.D. Fellowship

Shared Address Space

1. Avoid overhead of serialization and network communication

Data Store Server

Data Store Client

Application
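The difference between the two placements of the cache can be sketched as follows. Both classes are hypothetical: `RemoteStyleCache` only simulates the serialization boundary of a client-server cache with pickle (no real network is involved), while `InProcessCache` lives in the application's address space and hands back object references directly.

```python
import pickle


class RemoteStyleCache:
    """Simulates a client-server cache: every value crosses a byte boundary."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = pickle.dumps(value)  # serialize on the way in

    def get(self, key):
        blob = self._data.get(key)
        return None if blob is None else pickle.loads(blob)  # deserialize on the way out


class InProcessCache:
    """Cache in the application's own address space: no copying, no encoding."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)
```

The in-process cache returns the very object that was stored, so a hit costs only a dictionary lookup. The trade-off is that callers must treat cached values as read-only, since a mutation would be visible to every other reader.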

Shared Address Space

[Figure: bar chart of SoAR (Actions/Second), y-axis from 0 to 140000, comparing SQL-X and CADBMS under 0.1% Write and 10% Write workloads.]

SLA: 95% of actions to observe a response time faster than 100 msec.

Why?

1. CPU overhead of query processing is more than 85% [1, 2].

Data Store Server

Data Store Client

Application

Cache Servers

Harizopoulos et al. OLTP: Through the Looking Glass and What We Found There. SIGMOD 2008.

Stonebraker and Cattell. 10 Rules for Scalable Performance in Simple Operation Datastores. CACM 2011.

Architectures

Client-Server, Shared-Address Space, and Hybrids.

[Figure: two panels labeled Client-Server and Shared-Address Space.]

Ghandeharizadeh and Yap. Cache Augmented Data Stores. SIGMOD DBSocial 2013.

NON VOLATILE MEMORY

Non Volatile Memory

[Figure: memory and storage hierarchies built from CPU, DRAM, HDD, Flash, and NVM, with panels labeled Traditional, 2010, and 2017 (late 2016).]

Non-Volatile Memory

Byte-addressable

Time to rewrite the key-value stores & database engine!

Configurable:

Time to re-design algorithms

[Figure: two configurations of CPU and NVM in which NVM emulates HDD and Flash; the second configuration also includes DRAM and emulated DRAM.]

Digital Era

Database

File System/Data Store

0011101000000101110101

Future (Biological) Computers

Database DBMS/Data Store