5/2/16
Database Systems CSE 414
Lecture 16: NoSQL and JSON
CSE 414 - Spring 2016 1
Announcements
• Current assignments:
  – Homework 4 due tonight
  – Web Quiz 6 due next Wednesday
  – [There is no Web Quiz 5]
• Today’s lecture:
  – JSON
  – The book covers XML instead (11.1-11.3, 12.1)
The New Hipster: NoSQL
NoSQL Motivation
• Originally motivated by Web 2.0 applications
• Goal is to scale simple OLTP-style workloads to thousands or millions of users
• Users are doing both updates and reads
What is the Problem?
• A single-server DBMS is too small for Web data
• Solution: scale out to multiple servers
• This is hard to do for the entire functionality of a DBMS
• NoSQL: reduce functionality for easier scale up
  – Simpler data model
  – Simpler transactions
Serverless
SQLite:
• One data file
• One user
• One DBMS application
• But only a limited number of scenarios work with such a model
[Diagram: User, DBMS Application (SQLite) on the desktop, data file on disk]
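A minimal sketch of the serverless setup, using Python's built-in sqlite3 module. The file name, table, and row are made up for illustration; the point is only that the DBMS runs inside the application and the whole database is one file.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name, chosen only for this illustration.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")

# The DBMS runs inside this process: no server to start, no network connection.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE Course (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO Course VALUES (?, ?)", (414, "Database Systems"))
conn.commit()

rows = conn.execute("SELECT id, title FROM Course").fetchall()
conn.close()

# The whole database lives in this single data file on disk.
single_file_exists = os.path.isfile(db_path)
```

Copying that one file copies the entire database, which is convenient for a single user but gives no way for many clients to share the data safely over a network.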
Client-Server
• One server running the database
• Many clients, connecting via the ODBC or JDBC (Java Database Connectivity) protocol
• Supports many apps and many users simultaneously
[Diagram: Client Applications, Connection (JDBC, ODBC), Server Machine running the DB Server over File 1, File 2, File 3]
Client-Server
• One server that runs the DBMS (or RDBMS):
  – Your own desktop, or
  – Some beefy system, or
  – A cloud service (SQL Azure)
• Many clients run apps and connect to the DBMS
  – Microsoft’s Management Studio (for SQL Server), or
  – psql (for postgres)
  – Some Java program (HW5) or some C++ program
• Clients “talk” to server using JDBC/ODBC protocol
3-Tiers DBMS Deployment
• Web-based applications
• Browser talks to the App+Web Server over HTTP/SSL
• App+Web Server talks to the DB Server over a connection (e.g., JDBC)
• Replicate App server for scaleup
• Why don’t we replicate the DB server too?
[Diagram: Browser, HTTP/SSL, three App+Web Servers, Connection (e.g., JDBC), DB Server with File 1, File 2, File 3]
Replicating the Database
• Much harder, because the state must be unique; in other words, the database must act as a whole
• Two basic approaches:
  – Scale up through partitioning
  – Scale up through replication
Scale Through Partitioning
• Partition the database across many machines in a cluster
  – Database now fits in main memory
  – Queries spread across these machines
• Can increase throughput
• Easy for reads but writes become expensive!
[Diagram: three partitions; a transaction starts at one partition and also touches data at another]
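The partitioning idea can be sketched in a few lines of Python. This is an illustration under assumptions, not how any particular system does it: the CRC32 hash, the dict-per-machine model, and every function name here are made up for the example.

```python
import zlib

NUM_PARTITIONS = 3  # matches the three partitions in the slide's diagram

def partition_of(key: str) -> int:
    """Map a key to a partition with a stable hash (CRC32 is an arbitrary choice)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

# Each dict stands in for one machine's in-memory share of the database.
partitions = [dict() for _ in range(NUM_PARTITIONS)]

def put(key: str, value) -> None:
    partitions[partition_of(key)][key] = value

def get(key: str):
    return partitions[partition_of(key)].get(key)

def partitions_touched(keys) -> set:
    """Partitions that a transaction over these keys would have to coordinate."""
    return {partition_of(k) for k in keys}

for user in ["alice", "bob", "carol", "dave", "erin", "frank"]:
    put(user, {"balance": 100})
```

A single-key operation goes to exactly one machine. A transaction over several keys, such as a transfer between two users, may get more than one partition back from `partitions_touched`, and that cross-machine coordination is where the cost comes from.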
Scale Through Replication
• Create multiple copies of each database partition
• Spread queries across these replicas
• Can increase throughput and lower latency
• Can also improve fault-tolerance
• Data remains in main memory
• One type of implementation: distributed hash table
• Most systems also offer a persistence option
• Others use replication to provide fault-tolerance
  – Asynchronous or synchronous replication
  – Tunable consistency: read/write one replica or majority
• Some offer ACID transactions, others do not
• Multiversion concurrency control or locking
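The "read/write one replica or majority" trade-off can be illustrated with a toy model. Everything below is hypothetical (the class, its methods, and the version counter are not taken from any real system); it only shows why a majority read overlaps a majority write while a single-replica read may return stale data.

```python
class ReplicatedKV:
    """Toy key-value store with N replicas and versioned values (illustrative only)."""

    def __init__(self, n_replicas=3):
        self.replicas = [dict() for _ in range(n_replicas)]
        self.version = 0

    def write(self, key, value, w):
        """Update the first w replicas synchronously; the others lag behind."""
        self.version += 1
        for replica in self.replicas[:w]:
            replica[key] = (self.version, value)

    def read(self, key, replica_ids):
        """Read the given replicas and return the value with the newest version."""
        seen = [self.replicas[i][key] for i in replica_ids if key in self.replicas[i]]
        if not seen:
            return None
        return max(seen, key=lambda pair: pair[0])[1]


store = ReplicatedKV(n_replicas=3)
store.write("x", "old", w=3)   # every replica holds "old"
store.write("x", "new", w=2)   # majority write: only replicas 0 and 1 are updated

stale = store.read("x", [2])      # one replica: cheap, but may miss the latest write
fresh = store.read("x", [1, 2])   # majority: must overlap the majority that was written
```

With three replicas, any two-replica read set shares at least one replica with any two-replica write set, so the newest version is always seen. Reading a single replica is cheaper and faster but gives up that guarantee.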
• Extensible Record Stores
  – e.g., HBase, Cassandra, PNUTS
Extensible Record Stores
• Based on Google’s BigTable
• Data model is rows and columns
• Scalability by splitting rows and columns over nodes
  – Rows partitioned through sharding on primary key
  – Columns of a table are distributed over multiple nodes by using “column groups”
• HBase is an open source implementation of BigTable
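A rough sketch of the two splitting dimensions, rows by primary key and columns by column group. The group names, the modulo sharding rule, and the function names are assumptions made for illustration; this is not the BigTable or HBase API.

```python
NUM_SHARDS = 2

# Hypothetical column groups: the columns in one group are stored together.
COLUMN_GROUPS = {
    "profile": ["name", "email"],
    "activity": ["last_login", "visits"],
}

# storage[(shard, group)] maps a primary key to that row's columns in the group.
# Each (shard, group) pair could live on a different node.
storage = {(s, g): {} for s in range(NUM_SHARDS) for g in COLUMN_GROUPS}

def shard_of(primary_key: int) -> int:
    """Pick the row's shard from its primary key (modulo is a simplification)."""
    return primary_key % NUM_SHARDS

def put_row(primary_key: int, row: dict) -> None:
    shard = shard_of(primary_key)
    for group, columns in COLUMN_GROUPS.items():
        part = {c: row[c] for c in columns if c in row}
        if part:
            storage[(shard, group)][primary_key] = part

def get_columns(primary_key: int, group: str) -> dict:
    """Fetch only one column group of a row, touching a single (shard, group) store."""
    return storage[(shard_of(primary_key), group)].get(primary_key, {})

put_row(7, {"name": "Ann", "email": "ann@example.com", "visits": 3})
```

A query that needs only the `profile` columns of row 7 reads one small store and never touches the `activity` columns, which is the benefit of grouping columns.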
JSON and Semistructured Data
The Semistructured Data Model
• So far we have studied the relational data model
  – Data is stored in tables (= relations)
  – Queries are expressions in the relational calculus (or relational algebra, or datalog, or SQL…)
• Today: Semistructured data model
  – Popular formats today: XML, JSON, protobuf
JSON - Overview
• JavaScript Object Notation = lightweight text-based open standard designed for human-readable data interchange. Interfaces in C, C++, Java, Python, Perl, etc.
• The filename extension is .json.
We will emphasize JSON as semi-structured data
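As a small illustration of JSON as a text interchange format, here is a sketch using Python's standard json module. The record contents are made up for the example.

```python
import json

# A JSON document is plain text; the field names travel with the data.
text = '{"course": "CSE 414", "lecture": 16, "topics": ["NoSQL", "JSON"]}'

record = json.loads(text)            # parse text into Python dicts and lists
lecture_number = record["lecture"]   # numbers, strings, and arrays map to native types

round_trip = json.dumps(record, sort_keys=True)  # serialize back to text
```

Parsing the serialized text again yields an equal object, which is what makes the format convenient for exchanging data between programs written in different languages.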
JSON vs Relational
• Relational data model
  – Rigid flat structure (tables)
  – Schema must be fixed in advance
  – Binary representation: good for performance, bad for exchange
  – Query language based on Relational Calculus
• Semistructured data model / JSON
  – Flexible, nested structure (trees)
  – Does not require predefined schema (“self describing”)
  – Text representation: good for exchange, bad for performance
  – Most common use: Language API; query languages emerging
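The contrast can be made concrete with a small sketch. The people, phone numbers, and the two target relations Person(id, name, email) and Phone(person_id, number) are invented for illustration.

```python
import json

# Nested, self-describing data: records need not share the same fields.
doc = json.loads("""
[
  {"name": "Ann", "phones": ["555-0100", "555-0101"]},
  {"name": "Bob", "email": "bob@example.com"}
]
""")

# The same information in flat relations with a schema fixed in advance:
#   Person(id, name, email) and Phone(person_id, number)
person_rows = []
phone_rows = []
for person_id, person in enumerate(doc):
    # A missing field becomes NULL (None) because every row must fit the schema.
    person_rows.append((person_id, person["name"], person.get("email")))
    # A nested array becomes rows in a second table linked by a foreign key.
    for number in person.get("phones", []):
        phone_rows.append((person_id, number))
```

The JSON version keeps each person's data together as one tree and tolerates missing or extra fields, while the relational version needs a second table and explicit NULLs to hold the same information.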