Oracle NoSQL Database
Dave Rubin
Director – NoSQL Database Development
The following is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.
Agenda
• NoSQL Use Case
• Oracle NoSQL Database
• Architecture
• Integration with the RDBMS
• Benchmark Results
Use Case – Online Display Advertising
• Problem
  • Very low latency requirements – Publishers require 50–60 ms response time from the ad serving platform
  • Extreme data velocity – Multi-millions of requests per second
  • Highly available – 24/7 sites
  • Revenue maximization – Deliver the most relevant ad to maximize revenue
• Solution – Where to use a NoSQL Database?
  • Cookie store – NoSQL database used to store cookies and associated behavioral segments
  • Track behavioral data – Beacons utilized during browsing to store timestamp, frequency, and behavioral segments by cookie
  • Optimize ad delivery – Recency, frequency, and behavioral segments used to determine optimal ad to deliver to user
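The cookie-store pattern above can be sketched in a few lines. This is only an illustrative in-memory model, not the ad platform's actual code: the `CookieProfile` layout, the `CookieStore` class, the one-hour recency window, and the first-match ad rule are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class CookieProfile:
    """Behavioral data kept per cookie (hypothetical record layout)."""
    last_seen: float = 0.0                      # timestamp of the latest beacon
    frequency: int = 0                          # number of beacons recorded
    segments: set = field(default_factory=set)  # behavioral segments


class CookieStore:
    """In-memory stand-in for the key-value store, keyed by cookie ID."""

    def __init__(self):
        self._profiles = {}

    def record_beacon(self, cookie_id, segment, now=None):
        """Update timestamp, frequency, and segments when a beacon fires."""
        now = time.time() if now is None else now
        profile = self._profiles.setdefault(cookie_id, CookieProfile())
        profile.last_seen = now
        profile.frequency += 1
        profile.segments.add(segment)
        return profile

    def get(self, cookie_id):
        return self._profiles.get(cookie_id)


def choose_ad(profile, ads, now, recency_window=3600.0):
    """Pick the first ad whose target segment matches a recently seen user.

    `ads` is a list of (ad_name, target_segment) pairs. The matching rule
    and the recency window are illustrative assumptions.
    """
    if profile is None or now - profile.last_seen > recency_window:
        return "default-ad"
    for ad_name, target_segment in ads:
        if target_segment in profile.segments:
            return ad_name
    return "default-ad"
```

Each beacon is a single keyed write and each ad request is a single keyed read, which is the access pattern the use case assigns to the NoSQL store.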
Online Display Advertising Overall Solution
[Diagram components: Ad Server; RDBMS; Hadoop Cluster; Real Time Reporting and Campaign Management; Multi Dimensional Reporting]
Online Display Advertising – Usage Characteristics
• NoSQL Database
  • Low latency, high volume
    • Millions of ad serving requests per minute or second
    • Stringent latency requirements from publishers
  • Loose consistency
    • Cookie data used for ad targeting – Increase probability that user will click on ad
• Relational Database
  • Campaign booking information – hundreds of users
  • Real time business metrics for publishers and advertisers
  • Business financials for ad serving company
    • Year to date revenue, quarter over quarter, etc.
  • Billing
  • SOX reporting for public companies
• Hadoop
  • Unique visits (select count(distinct)) over many terabytes of data
  • Inventory forecasting across behavioral segments
A Distributed, Scalable Key-Value Database
• Simple Data Model
  • Key-value pair with major+minor-key paradigm
  • CRUD + range scans
• Scalability
  • Dynamic data partitioning and distribution
  • Optimized data access via intelligent driver
• High availability
  • One or more replicas
  • Resilient to partition failures
  • Disaster recovery through location of replicas
  • No single point of failure
• Transparent load balancing
  • Reads from master or replicas
  • Driver is network topology & latency aware
• Elastic Expansion
  • Online addition/removal of storage nodes and automatic data redistribution
[Diagram: applications, each with a NoSQL DB Driver, connected to storage nodes in Data Center A and Data Center B]
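To make the major+minor key paradigm concrete, here is a minimal sketch of a key-value store with CRUD and range scans. It is a toy model, not the Oracle NoSQL Database API: the `ShardedKVStore` class and its method names are hypothetical, and it assumes records are placed by hashing the major key, so that all minor keys under one major key land on the same shard.

```python
import hashlib


class ShardedKVStore:
    """Toy key-value store: records are placed by hashing the major key."""

    def __init__(self, num_shards):
        # Each shard maps major key -> {minor key: value}.
        self._shards = [{} for _ in range(num_shards)]

    def shard_for(self, major):
        """Hash the major key so all of its minor keys share one shard."""
        digest = hashlib.md5(major.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self._shards)

    def put(self, major, minor, value):
        """Create or update a record."""
        shard = self._shards[self.shard_for(major)]
        shard.setdefault(major, {})[minor] = value

    def get(self, major, minor):
        """Read a record, or return None if it does not exist."""
        shard = self._shards[self.shard_for(major)]
        return shard.get(major, {}).get(minor)

    def delete(self, major, minor):
        """Delete a record; return True if it existed."""
        records = self._shards[self.shard_for(major)].get(major, {})
        if minor in records:
            del records[minor]
            return True
        return False

    def range_scan(self, major, start, end):
        """Return (minor, value) pairs with start <= minor < end, in key order."""
        records = self._shards[self.shard_for(major)].get(major, {})
        return [(k, records[k]) for k in sorted(records) if start <= k < end]


store = ShardedKVStore(num_shards=4)
store.put("cookie42", "2012-01", "sports")
store.put("cookie42", "2012-02", "travel")
store.put("cookie42", "2012-03", "autos")
january_and_february = store.range_scan("cookie42", "2012-01", "2012-03")
```

Because the scan is bounded by one major key, it touches a single shard, which is what lets a topology-aware driver route the request directly.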
Architecture – The Application’s Perspective
[Diagram: an Application with the NoSQL DB Driver connects to Shard 1, Shard 2, ... Shard N; each shard has a Master and Replicas]
Transactions
• ACID transactions at shard granularity
• Transaction Scope
  • Single API call
  • All records must have the same major key
  • Multiple operations within a transaction via collections
• Can be relaxed for increased performance on a per-operation basis
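The transaction scope described above (one call, one major key, a collection of operations) can be sketched as follows. This is a simplified model and not the product's API: the `execute` function, the tuple-based operation format, and the `MajorKeyMismatch` exception are hypothetical names chosen for the illustration.

```python
class MajorKeyMismatch(ValueError):
    """Raised when a batch mixes records with different major keys."""


def execute(store, operations):
    """Apply a batch of operations atomically (all or nothing).

    `store` is a dict mapping (major, minor) -> value.
    `operations` is a list of tuples:
        ("put", major, minor, value)
        ("put_if_absent", major, minor, value)
        ("delete", major, minor)
    """
    majors = {op[1] for op in operations}
    if len(majors) > 1:
        raise MajorKeyMismatch(f"batch spans major keys: {sorted(majors)}")

    staged = dict(store)  # work on a copy so a failure leaves store untouched
    for op in operations:
        kind, key = op[0], (op[1], op[2])
        if kind == "put":
            staged[key] = op[3]
        elif kind == "put_if_absent":
            if key in staged:
                raise KeyError(f"{key} already exists; batch aborted")
            staged[key] = op[3]
        elif kind == "delete":
            staged.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {kind}")

    store.clear()
    store.update(staged)  # commit: publish all staged changes together
```

Requiring a shared major key keeps every record in the batch on one shard, so atomicity can be provided without cross-shard coordination.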
Simple Data Model ACID Transactions – Configurability
• Configurable Durability Policy
• Configurable Consistency Policy
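As a rough illustration of what configurable durability and consistency policies control, the sketch below computes how many replica acknowledgements a write waits for and which nodes may serve a read. The function names are hypothetical, and so are several policy names: only 'Majority' and 'NONE' appear in these slides; "ALL", "ABSOLUTE", and "NONE_REQUIRED" are assumptions for the example. It also assumes the master counts as one member of the group when a majority is computed.

```python
def acks_required(policy, replication_factor):
    """Replica acknowledgements the master waits for before a write returns.

    Assumes the master counts toward the majority, so a 3-node shard
    (master + 2 replicas) needs one replica acknowledgement.
    """
    replicas = replication_factor - 1
    if policy == "NONE":
        return 0
    if policy == "MAJORITY":
        majority = replication_factor // 2 + 1  # group members, master included
        return majority - 1                     # the master already has the write
    if policy == "ALL":                         # assumed policy, not in the slides
        return replicas
    raise ValueError(f"unknown ack policy: {policy}")


def read_targets(consistency, master, replicas):
    """Nodes allowed to serve a read under a given consistency policy."""
    if consistency == "ABSOLUTE":        # assumed name: always-current reads
        return [master]
    if consistency == "NONE_REQUIRED":   # assumed name: stale reads acceptable
        return [master] + list(replicas)
    raise ValueError(f"unknown consistency policy: {consistency}")
```

Fewer required acknowledgements mean lower write latency at the cost of durability, and looser consistency lets reads spread across replicas.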
Integration with the RDBMS and Other Products
• Oracle External Tables
  • Export data directly from NoSQL database and create Oracle External Table
  • Pre-packaged utility
• Oracle Loader for Hadoop
  • Parallel MapReduce job
  • Utilizes InputFormat
• Oracle Event Processing
  • NoSQL data available through OEP query language (CQL)
Benchmarks – General Configuration
• YCSB-based QA/benchmarking
• Key ~= 10 bytes, Data = 1108 bytes
• Configurations of 6-30 nodes
• Typical Replication Factor of 3 (master + 2 replicas)
• 200m records per shard, 2 billion records in total
• 2 replication nodes per storage node
• SSDs used, two per host
• Minimal I/O overhead
  • B+Tree fits in memory => one I/O per record read
  • Writes are buffered + log-structured storage system => fast write throughput
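The configuration figures can be cross-checked with some quick arithmetic. The sketch below uses only numbers from the slide; the derived sizes cover raw key and data bytes and ignore B+Tree and log overhead, which the slides do not quantify.

```python
KEY_BYTES = 10
DATA_BYTES = 1108
TOTAL_RECORDS = 2_000_000_000
RECORDS_PER_SHARD = 200_000_000
REPLICATION_FACTOR = 3

# 2 billion records at 200m per shard gives 10 shards.
shards = TOTAL_RECORDS // RECORDS_PER_SHARD

# Each shard has a master plus 2 replicas: the "30 (10x3)" configuration.
replication_nodes = shards * REPLICATION_FACTOR

# Raw payload per record, per shard, and for the whole data set.
record_bytes = KEY_BYTES + DATA_BYTES
bytes_per_shard = RECORDS_PER_SHARD * record_bytes
raw_bytes = TOTAL_RECORDS * record_bytes
replicated_bytes = raw_bytes * REPLICATION_FACTOR

print(f"shards: {shards}, replication nodes: {replication_nodes}")
print(f"per shard: {bytes_per_shard / 1e9:.1f} GB")
print(f"raw data set: {raw_bytes / 1e12:.3f} TB")
print(f"with replication: {replicated_bytes / 1e12:.3f} TB")
```

So the largest configuration stores roughly 2.2 TB of raw records, or about 6.7 TB once every record is held three times.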
Benchmark Results
• 2 billion records
• 226K ops/sec
• HA ack. policy = ‘Majority’
• Low latency
• Highly Scalable
[Chart: Insert Throughput. Series: Throughput (insert/sec), Write Latency (ms). X-axis: Cluster Size, 6 (2x3), 12 (4x3), 24 (8x3), 30 (10x3). Y-axes: Throughput (ops/sec), 0 to 250,000; Average Latency (ms), 0 to 4.]
Benchmark Results (cont.)
[Chart: Mixed Throughput. Series: Throughput (ops/sec), Write Latency (ms), Read Latency (ms). X-axis: Cluster Size, 6 (2x3), 12 (4x3), 24 (8x3), 30 (10x3). Y-axes: Throughput (ops/sec), 0 to 1,400,000; Average Latency (ms), 0 to 4.]
• 95% read, 5% update
• 2 billion records
• 1.25M ops/sec
• HA ack. policy = ‘Majority’
• Low read/write latency
• Highly Scalable
Benchmark Results (cont.)
• Changed ack-policy from ‘MAJORITY’ to ‘NONE’
  • Throughput increased from 226K to 407K ops/sec
  • 80% improvement
[Chart: Insert Throughput on the 30 (10x3) cluster by ack policy. Series: Majority, None. Y-axis: Throughput (ops/sec), 0 to 500,000.]
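The quoted improvement follows directly from the two throughput figures. A short check, with per-node averages added on the assumption that load is spread evenly across the 30 nodes of the 10x3 cluster:

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


MAJORITY_OPS = 226_000   # inserts/sec with ack policy 'Majority'
NONE_OPS = 407_000       # inserts/sec with ack policy 'None'
NODES = 30               # the 30 (10x3) configuration

improvement = percent_change(MAJORITY_OPS, NONE_OPS)

# Average per-node insert rate, assuming an even spread of load.
per_node_majority = MAJORITY_OPS / NODES
per_node_none = NONE_OPS / NODES

print(f"improvement: {improvement:.1f}%")
print(f"per node: {per_node_majority:.0f} vs {per_node_none:.0f} inserts/sec")
```

The result rounds to the 80% stated on the slide.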
Questions