CUBRID Features Optimized for Social Networking Services
Post on 15-Jan-2015
ⓒ 2010 NHN BUSINESS PLATFORM CORPORATION
CUBRID Reference Architecture for Social Networking Service
Kieun Park
NHN Business Platform Corp.
2011.8
Copyright Notice
Copyright 2010 NHN Corporation. All Rights Reserved.
This document is an intellectual asset of NHN Corp.; it cannot be arbitrarily used for other purposes without the approval of NHN Corp. This document is offered only for the purpose of information provision. NHN Corp. has endeavored to verify the completeness and accuracy of the information contained in this document, but it does not take responsibility for possible errors or omissions in this document. Therefore, the responsibility for the usage of this document or the results of the usage falls entirely upon the user, and NHN Corp. does not make any explicit or implicit guarantee regarding this. Software products or merchandise mentioned in this document, including relevant URL information, conform to the copyright laws of their respective owners. It is the responsibility of the user to abide by the corresponding copyright law. NHN Corp. may modify the details of this document without prior notice.
Abstract
The top-ranked Facebook celebrity has 44 million fans. The top-ranked Twitter user has 11 million followers. There are over 900 million objects on the Facebook site, and people send 140 million tweets per day. Needless to say, these facts weigh heavily on the databases behind these services. Thus, best practice in database architecture is important.
Online social networking (OSN) services have rapidly proliferated and changed the way data is stored and served. Social data is an enormous graph of small objects that are tightly interconnected. The service page of an OSN is a view of those small objects customized to a specific viewer at a specific time. Typically, the view is an aggregation of events connected by a social graph that changes constantly with users' real-time interaction. Even though Dunbar's number suggests that the number of people with whom one maintains stable social relationships is relatively small, about 150, celebrities on OSN sites have a large number of followers, so the social graph is huge. These properties of the data lead to new challenges and demand a new database architecture to handle them.
The main considerations of database architecture for OSN are scale-out and performance, with high availability as mandatory. The main characteristics of OSN services in terms of data are power-law scaling, data feeding frenzy, and Zipfian distribution access. The data being delivered grows exponentially with the popularity of the service. A cost-effective database scale-out architecture matters for business requirements as well as for technical reasons.
In this presentation, the CUBRID Reference Architecture for social networking services will be shown. The presented architectures are based on best practices developed from real business cases at NHN, the biggest portal service provider in Korea. Features that help meet the database architecture demands of OSN services are described. For example, index scan with a top-k sorting technique was developed for fast feed aggregation. The HA, automatic sharding, and clustering features of CUBRID will also be explained. Finally, nStore, a distributed database system based on CUBRID, will be introduced. The concept of nStore is similar to Amazon Dynamo, but it differs in that it supports SQL.
I Am
박기은 Kieun Park
• Software/Database Architect
• Service Platform Development Center
• NHN Business Platform Corp.
• iamyaw@nhn.com
• CUBRID Open Source DBMS
• nStore Distributed Database System
Contents
Characteristics of online social networking service
• How fast is the data growing in online social networking service?
• Characteristics of OSN service: power-law scaling growth, data feeding frenzy, and Zipfian distribution access
• How does it access database? Feed aggregation
Challenges and demands on database architecture
CUBRID features
CUBRID reference architecture for social networking service
Contents
Characteristics of online social networking service
Challenges and demands on database architecture
• Business demands and system requirements
• Main considerations of database architecture for OSN service: scale-out, performance, and high availability
CUBRID features
CUBRID reference architecture for social networking service
Contents
Characteristics of online social networking service
Challenges and demands on database architecture
CUBRID unique features
• Index scan with top-k sorting technique
• High availability feature
• Automatic sharding component
• CUBRID Cluster System
• nStore, a distributed database system based on the CUBRID
CUBRID reference architecture for social networking service
Contents
Characteristics of online social networking service
Challenges and demands on database architecture
CUBRID features
CUBRID reference architecture for social networking service
• CUBRID Web Reference Architecture
• CUBRID SNS Reference Architecture
Characteristics of online social networking service
Some Infographics about Online Social Networking Service
Source http://blog.skloog.com/history-social-media-history-social-media-bookmarking/
The history and evolution of OSN have unfolded over the last 10 years.
Some Infographics about Online Social Networking Service
Source http://www.digitalsurgeons.com/facebook-vs-twitter-infographic/
500 million Facebook users, 106 million Twitter users
Social networks with user bases larger than the population of most
countries
Some Infographics about Online Social Networking Service
Source http://www.digitalbuzzblog.com/infographic-twitter-statistics-facts-figures/
The top-ranked Twitter user, Lady Gaga, has 11 million followers (http://twitaholic.com).
About 55 million Tweets are sent per day.
Twitter gets about 600 million queries every day.
Some Infographics about Online Social Networking Service
Source http://www.digitalbuzzblog.com/facebook-statistics-stats-facts-2011/
Source http://www.digitalbuzzblog.com/facebook-statistics-facts-figures-for-2010/
The most followed person, Eminem, has more than 44 million fans.
More than 5 billion pieces of content shared each week.
2,716,000 messages, 1,587,000 wall posts, 10,208,000 comments in 20 minutes on Facebook.
(http://www.independent.co.uk)
Some Infographics about Online Social Networking Service
Source http://www.flowtown.com/blog/have-we-reached-a-world-of-infinite-information
Have we reached a world of infinite information?
In a similar manner to our universe, the Internet is expanding at an incredibly rapid pace, reaching new levels of information storage and content creation every second.
• Every minute, 24 hours of video
• By 2020, roughly 25×10^18 (25 quintillion) information containers
• The growth gap between the digital contents created and the available storage
Statistics of Facebook and Twitter
Source http://blog.twitter.com/2011/03/numbers.html
Source http://www.facebook.com/press/info.php?statistics
More than 750 million active users.
There are over 900 million objects that people interact with (pages, groups, events and community pages).
140 million: the average number of Tweets people sent per day.
6,939: current TPS (Tweets per second) record.
Statistics of Me2Day
# Members
Jan/11: 4,367,861
Feb/11: 4,721,644
Mar/11: 5,010,230
Apr/11: 5,430,343
May/11: 6,019,556
Jun/11: 6,425,847
Jul/11: 6,684,905
Postings per day: 278,461
Total postings: 123,456,727
Total photos: 10,638,089
Rank Nickname Friends
1 지 ** 곤 432,186
2 산 ** 박 427,021
3 * 봄 337,414
4 아 ** 258,272
5 미투도우미 257,759
6 대 * 228,359
7 유 ** 224,226
8 민 * 223,739
9 신 ** 223,541
10 빅 ** 아 221,132
Online social networking service
Social data is an enormous graph of small objects that are tightly interconnected.
The service page of an OSN is an aggregation of events connected by a social graph that changes constantly with users' real-time interaction.
Feed Following Works
[Diagram labels]
• Application Layer: Feeds Following, News Feeds (personalized feeds)
• Content Management Layer: Delivery & Aggregation Engine
• Data Storage Layer: Database, Database Cache
• Follower contents (comment, photo, tag, …) flow through an Outbox and an Inbox
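The Outbox/Inbox flow in the diagram can be illustrated with a small in-memory sketch. This is only an illustration under assumptions: the class `FeedStore`, its method names, and the choice to deliver events to follower inboxes at post time are hypothetical and are not taken from the slides.

```python
from collections import defaultdict, deque


class FeedStore:
    """Toy in-memory stand-in for the Outbox/Inbox storage (hypothetical names)."""

    def __init__(self, inbox_limit=100):
        self.followers = defaultdict(set)   # author -> users following the author
        self.outbox = defaultdict(list)     # author -> own events, oldest first
        # user -> bounded news feed; the oldest entries fall out when it is full
        self.inbox = defaultdict(lambda: deque(maxlen=inbox_limit))

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, timestamp, content):
        """Store the event in the author's outbox, then deliver it to follower inboxes."""
        event = (timestamp, author, content)
        self.outbox[author].append(event)
        for user in self.followers[author]:
            self.inbox[user].append(event)

    def news_feed(self, user, limit=20):
        """Return the newest `limit` events from the user's inbox, newest first."""
        return sorted(self.inbox[user], reverse=True)[:limit]


store = FeedStore()
store.follow("bob", "alice")
store.post("alice", 1, "hello")
print(store.news_feed("bob"))
```

Delivery cost in this sketch grows with the number of followers of the author, which is why a very large fan-out makes feeding expensive.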
Characteristics of Online Social Networking Service
Data feeding frenzy
• Users follow activity and news of other users and entities.
• Followers get personalized feeds that aggregate the streams produced by those they follow.
• The highly variable and sometimes large fan-out of the follows graph makes data feeding difficult to implement and costly to operate.
Power-law scaling growth
• Online social networks have properties of significant clustering, small diameter, and power-law degrees.
Zipfian distribution access
• Twitter activity: 5% of users account for 75% of all activity, 10% account for 86% of activity, and the top 30% account for 97.4%.
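To see how a Zipfian distribution concentrates activity in a small share of users, the following sketch computes the share of total activity produced by the most active users. The function name `top_share`, the population size, and the exponent are assumptions chosen for illustration; the sketch is not fitted to the Twitter figures above.

```python
def top_share(num_users, top_fraction, exponent=1.0):
    """Share of total activity produced by the most active `top_fraction` of users,
    assuming the user ranked r produces activity proportional to 1 / r**exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, num_users + 1)]
    top_n = max(1, int(num_users * top_fraction))
    return sum(weights[:top_n]) / sum(weights)


for fraction in (0.05, 0.10, 0.30):
    share = top_share(1_000_000, fraction, exponent=1.0)
    print(f"top {fraction:.0%} of users -> {share:.1%} of activity")
```

A larger exponent concentrates activity further in the head of the ranking, so the database sees a few very hot keys and a long tail of rarely touched ones.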
Challenges and demands on database architecture
Challenge and Demands on Database Architecture
• Online social networking services have rapidly proliferated and changed the way data is stored and served.
• Today social media generates more information in a short period of time than was previously available in the entire world a few generations ago.
• Not only the exponential growth of Facebook, Google+, and Twitter, but also the use of more and more rich media, such as user-generated video from smart phones, is driving big data.
Source http://www.itu.int/net/itunews/issues/2010/06/35.aspx
From business demands to technology implementation.
When an application is being designed, software architects need to plan for much greater application load to avoid major redesigns in the future. While scaling out web servers can be done quite easily, properly scaling out database servers is far more challenging.
With enterprise data volumes moving past terabytes to tens of petabytes and more, business and IT leaders face significant opportunities and challenges from big data. For a large enterprise, big data may be in the petabytes or more; for a small or mid-size enterprise, data volumes that grow into tens of terabytes may become challenging to analyze and manage.
Social media now produces massive amounts of data. Facebook's network, for instance, consists of 100 million entities generating tens of millions of events per second. Twitter, meanwhile, funnels 140 million public tweets a day. [GigaOM research notes]
Challenge and Demands on Database Architecture
Managing user generated social interaction data!
Coping with explosion in data volume!
Cost-effective scale-out to meet rapidly growing demands!
CUBRID unique features
CUBRID
• Free: open source is the choice of the modern world
• Powerful: clean architecture with rich functionality for competitive performance
• Enterprise: unique features for stability and reliability
CUBRID
Timeline (2006 to 2012)
• The development of CUBRID DBMS started.
• October, 2008: First internal release CUBRID 2008 R1.0
• November, 2008: CUBRID became an open source project. CUBRID 2008 R1.1 stable was released.
• August, 2009: CUBRID 2008 R2.0 stable released.
• September, 2009: Official open source community, www.cubrid.org, opened.
• October, 2009: CUBRID Cluster Project has been started.
• October, 2010: CUBRID 3.0 stable released.
• July, 2011: CUBRID 4.0 stable released.
Features shown along the timeline
• HA feature
• Reclaim deleted space
• Fast serial data (cached)
• LFS (large file support) for database volume
• INSERT performance enhancement
• Database volume size reduced
• Multi-range scan and key limit function
• Covered index
• FBO (file-based object)
• HA monitoring
• Full SQL function support
CUBRID Index Scan with Top-k Sorting Technique
Query: my friends' newest twenty comments
SELECT post_no FROM posts
WHERE id IN (4, 15, 36, …) AND registered_date < 20000
ORDER BY registered_date DESC LIMIT 20
Index keys (id, registered_date)
(4,10001) (4,9999) (4,875) …
(15,10000) (15,9999) (15,7467) …
(36,947) (36,120) (36,3) …
Single range scan with key filter
• Filter out
• # of leaf pages accessed > # of keys of scan result
• Sort after scan
Multi-range scan
• # of leaf pages accessed = # of keys of scan result
• On-the-fly sorting during scan
CUBRID does multi-range index scan.
Disk I/O ?!
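The idea of a multi-range scan with on-the-fly top-k sorting can be sketched in a few lines. This is a minimal illustration, not CUBRID's implementation: the function `multi_range_top_k` and the dictionary that stands in for a composite (id, registered_date) index are assumptions, and the key values are the ones shown on the slide.

```python
import heapq
from itertools import dropwhile, islice


def multi_range_top_k(index, ids, before, k):
    """Return the k newest (registered_date, id) keys for the given ids.

    `index` maps id -> registered_date values in descending order; each list
    stands in for one key range of a composite (id, registered_date) index.
    """
    def scan(friend_id):
        dates = index.get(friend_id, [])
        # Skip keys that fail `registered_date < before`; a B-tree would seek
        # to this position directly instead of stepping over entries.
        for date in dropwhile(lambda d: d >= before, dates):
            yield (date, friend_id)

    # Merge the per-id ranges lazily, newest first. islice stops after k keys,
    # so the remainder of every range is never visited and no final sort is needed.
    merged = heapq.merge(*(scan(i) for i in ids), reverse=True)
    return list(islice(merged, k))


index = {
    4: [10001, 9999, 875],
    15: [10000, 9999, 7467],
    36: [947, 120, 3],
}
print(multi_range_top_k(index, ids=[4, 15, 36], before=20000, k=4))
```

Because each range is already ordered by the index, only about k keys are consumed in total, which is the property the slide summarizes as "# of leaf pages accessed = # of keys of scan result".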
CUBRID Index Scan with Top-k Sorting Technique
SELECT * FROM tbl WHERE a = 2 AND b < 'K' ORDER BY b LIMIT 3;
SELECT * FROM tbl WHERE a IN (2, 4, 5) AND b < 'K' ORDER BY b LIMIT 3;
CUBRID Test Results
Refer to http://www.cubrid.org/cubrid_mysql_sns_benchmark_test
[Bar chart: results for Test Case 1 to Test Case 4; series: M UNION, M IN, C UNION, C IN; vertical axis from 0 to 300]
User groups
• User group 1: users with 50 or fewer friends
• User group 2: users with 51~2000 friends
• User group 3: users with friends up to tens of thousands
Test cases
• Test case 1: user group 1 only
• Test case 2: user group 2 only
• Test case 3: 40% of user group 1, 50% of user group 2, 10% of user group 3
• Test case 4: 10% of user group 1, 50% of user group 2, 40% of user group 3
CUBRID High Availability Feature
[Diagram labels]
• Application: CUBRID Driver, issuing UPDATE and SELECT
• Broker: Active Broker, Backup Broker; Read-Write Mode, Read-Only Mode; automatic switch-over
• Database Server: Active Server (Master DB), Standby-1 Server (Slave DB), Standby-2 Server @ Remote IDC (Slave DB); automatic fail-over/fail-back
CUBRID HA, a highly fault-resistant DBMS, enables
• Non-stop 24x7 service
• System maintenance without shutdown
• Automatic fail-over (less than 20 sec)
• Various access modes (read-write, read-only)
CUBRID High Availability Feature
[Diagram labels]
• A-Node (Active Server Node): Master DB, receives UPDATE
• S1-Node (Standby Server Node): Slave DB, serves SELECT
• S2-Node: Slave DB, serves SELECT
• Per node: CUBRID Server, Log Writer, Log Applier, Transaction Log, Replication Log
• Between nodes: Log Shipping (synchronous), Log Shipping (asynchronous), Log Applying, Heartbeat
The HA feature is based on database replication with a transaction log multiplication technique.
Statement-based replication could cause data inconsistency.
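Why statement-based replication can diverge while log-based replication does not can be shown with a toy model. The example is an assumption made for illustration: the `NOW()` statement, the function names, and the dictionary "databases" are hypothetical and do not describe CUBRID internals.

```python
def run_statement(db, node_clock):
    """Execute 'UPDATE t SET updated_at = NOW() WHERE id = 1' on one node.

    Returns a log record: the row image that was actually written.
    """
    db[1] = {"updated_at": node_clock}
    return (1, dict(db[1]))


def replicate(master_clock, slave_clock):
    master, statement_slave, log_slave = {}, {}, {}

    # The master executes the statement and records what it wrote.
    record = run_statement(master, master_clock)

    # Statement-based: the slave re-executes the SQL text with its own clock.
    run_statement(statement_slave, slave_clock)

    # Log-based: the slave applies the shipped record without re-evaluating it.
    row_id, row = record
    log_slave[row_id] = row

    return master, statement_slave, log_slave


master, statement_slave, log_slave = replicate(master_clock=100, slave_clock=103)
print(master == log_slave)        # the applied log reproduces the master's row
print(master == statement_slave)  # re-execution produced a different value
```

Any statement whose result depends on node-local state has the same problem, which is one reason to ship and apply log records instead of SQL text.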
CUBRID Automatic Sharding Component
[Diagram labels]
• Application, Broker with Sharding Metadata, Database Server with Shard #1 to Shard #4
• Shard #1: k0001, k0005, …
• Shard #2: k0002, k0006, …
• Shard #3: k0003, k0007, …
• Shard #4: k0004, k0008, …
• Expand Shard: New Shard
Example queries
SELECT … WHERE key=k0008
UPDATE … WHERE key=k0002
Automatic sharding feature enables
• No more application logic
• Scale-out DB architecture
Features
• Multiple sharding strategies: shard by modulus, date/time range, extendible hash
• User hint-aware automatic sharding: SELECT * FROM tbl WHERE nonkey='abc' /* shard=1 */
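The routing decision a sharding broker makes can be sketched as follows. This is a hedged illustration only: `route`, `shard_by_modulus`, and the regular expression for the hint are hypothetical, and the modulus mapping is chosen so that it reproduces the key placement in the diagram (k0001 and k0005 on Shard #1, and so on).

```python
import re

NUM_SHARDS = 4
# Matches a hint written like the one on the slide: /* shard=1 */
HINT = re.compile(r"/\*\s*shard\s*=\s*(\d+)\s*\*/")


def shard_by_modulus(key_number, num_shards=NUM_SHARDS):
    """Map key number 1 -> shard 1, 2 -> shard 2, ..., 5 -> shard 1 again."""
    return (key_number - 1) % num_shards + 1


def route(sql, key_number=None):
    """Pick a shard: an explicit hint wins; otherwise use the shard key."""
    hint = HINT.search(sql)
    if hint:
        return int(hint.group(1))
    if key_number is None:
        raise ValueError("query has neither a shard key nor a shard hint")
    return shard_by_modulus(key_number)


print(route("SELECT ... WHERE key=k0008", key_number=8))
print(route("SELECT * FROM tbl WHERE nonkey='abc' /* shard=1 */"))
```

Keeping this logic in the broker is what lets the application issue plain SQL without knowing how many shards exist.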
CUBRID Cluster System
[Diagram labels]
• Application, Broker (global schema / distributed partition, load balancing), Cluster Server with Node #1 to Node #4
• Node #1: gtable part_01, part_05
• Node #2: gtable part_02, part_06
• Node #3: gtable part_03, part_07
• Node #4: gtable part_04, part_08
Example queries
SELECT * FROM gtable WHERE part_key=2 AND …
INSERT INTO gtable …
Main features of CUBRID Cluster are
• Global schema
• Distributed partition
• Load balancing
Users can get
• Single big database view
• Location transparency
• Additionally, linear scalability
CUBRID Cluster System
[Diagram labels]
• Global Schema over Local Schema #1 to Local Schema #4, on Database #1 to Database #4
• Tables: contents, info, author, code, level, local
• Global schema user and local schema user
Example queries
SELECT * FROM info, code WHERE info.id = code.id
INSERT INTO contents …
UPDATE local …
SELECT * FROM contents WHERE …
SELECT * FROM contents WHERE auth = (SELECT name FROM author WHERE …)
The global schema is a single representation, or global view, of all nodes, where each node has its own database and schema.
Users can access any database through a single schema without knowing the location of the distributed data.
CUBRID Cluster
[Diagram labels: Global Schema; Schema (Logical View) per node; Data, System Catalog, and Index (Physical View) per node]
CUBRID Cluster
The distributed partition maps the global schema onto table partitioning. Partitions reside in different nodes but are accessed through the global schema.
gtable: PARTITION BY HASH (part_key)
SELECT * FROM gtable, info WHERE gtable.part_key=02 AND info.id = gtable.id
[Diagram labels: Global Schema over Database #1 to Database #4; gtable partitions part_01 to part_08 stored as partition data across the four databases; info table]
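How a query on the partition key can be sent to a single node, while other queries must visit every node, can be sketched as below. The hash function and the helper names are assumptions for illustration; only the placement of two partitions per node (part_01 and part_05 on the first node, and so on) follows the earlier cluster diagram.

```python
NUM_PARTITIONS = 8
NUM_NODES = 4


def partition_of(part_key):
    """Hypothetical hash: an integer key is mapped to part_01 .. part_08."""
    return "part_%02d" % (part_key % NUM_PARTITIONS + 1)


def node_of(partition):
    """part_01 and part_05 live on node 1, part_02 and part_06 on node 2, ..."""
    number = int(partition.split("_")[1])
    return (number - 1) % NUM_NODES + 1


def nodes_for_query(part_key=None):
    """Nodes a query must visit: one if the partition key is given, else all."""
    if part_key is None:
        return list(range(1, NUM_NODES + 1))
    return [node_of(partition_of(part_key))]


print(nodes_for_query(part_key=2))  # a single node holds the matching partition
print(nodes_for_query())            # no key: every node has to be scanned
```

The global schema hides both mapping steps, so the application sees one table regardless of where each partition is stored.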
nStore, a distributed database system based on the CUBRID
Concept
• Container > Table > Column
Data Model
• Simplified Tabular
Query Language
• Simplified SQL
Availability
• 3-copy Replication
Distribution
• Key-based Consistent Hashing
RDB-like tabular model
• Schema, column, record
• Index on columns (ordered search)
Restricted data type
• Integer (bigint), string, timestamp (msec), id (128 bit), bool
Data partitioned by key
• E.g., user-id could be a key
SQL-like query language
• SELECT a, b, c FROM post WHERE fid IN (?, ?, ?) AND b=? ORDER BY ts LIMIT 20, CK="iamyaw"
• INSERT INTO post(no, id, date) VALUES (?, ?, ?), CK="iamyaw"
Join supported
• Between tables in one container
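Key-based consistent hashing with three copies can be sketched with a simple hash ring. This is a generic illustration and not nStore's implementation: the class `HashRing`, the virtual-node count, the SHA-256 hash, and the server names are all assumptions.

```python
import bisect
import hashlib


def _hash(value):
    """Position on the ring for a string (SHA-256 is an arbitrary choice here)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, vnodes=64, copies=3):
        unique = sorted(set(servers))
        self.copies = min(copies, len(unique))
        # Each server owns several points on the ring to even out the load.
        self.ring = sorted((_hash(f"{s}#{i}"), s) for s in unique for i in range(vnodes))
        self.points = [point for point, _ in self.ring]

    def servers_for(self, container_key):
        """Walk clockwise from the key's position and collect distinct servers."""
        start = bisect.bisect(self.points, _hash(container_key))
        chosen = []
        for offset in range(len(self.ring)):
            server = self.ring[(start + offset) % len(self.ring)][1]
            if server not in chosen:
                chosen.append(server)
                if len(chosen) == self.copies:
                    break
        return chosen


ring = HashRing(["cs1", "cs2", "cs3", "cs4", "cs5"])
print(ring.servers_for("iamyaw"))  # three distinct servers hold the container
```

When a server is added, a container's copies either stay where they were or move to the new server, so only a fraction of the data has to be rebalanced.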
nStore, a distributed database system based on the CUBRID
[Diagram labels: Applications call a REST API; nStore nodes each run on top of CUBRID]
REST API
http://server/keyspace/query?ckey=iamyaw&nsql=‘select a from tbl where k=100’&format=json
nStore layers
• Data Distribution: Replication (3-Copy), Rebalancing
• Query Processing
• Storage System
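A client-side sketch of building the query URL shown above, using only the Python standard library. The parameter names `ckey`, `nsql`, and `format` come from the slide; the helper name `build_query_url`, the percent-encoding of the query text, and the omission of the quotes around it are assumptions, since the slide does not specify how the API expects the value to be escaped.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(server, keyspace, ckey, nsql, fmt="json"):
    """Build a URL of the form http://server/keyspace/query?ckey=...&nsql=...&format=..."""
    params = urlencode({"ckey": ckey, "nsql": nsql, "format": fmt})
    return f"http://{server}/{keyspace}/query?{params}"


url = build_query_url("server", "keyspace", "iamyaw", "select a from tbl where k=100")
print(url)

# Decoding the query string gives back the original statement.
print(parse_qs(urlsplit(url).query)["nsql"][0])
```

The container key travels with every request, which lets the distribution layer locate the container server before the statement is parsed.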
nStore, a distributed database system based on the CUBRID
[Diagram labels]
• Container (ckey=iamyaw): Table A, Table B, Table C, indexed columns, equi-joins
• Container (ckey=kieun_park): Table A, Table B, Table C, indexed columns, equi-join
• Global Table G
• Architecture: Application, REST API, nStore distribution layer, Management Node, Container Servers, RDBMS, Tables
nStore Test Results
[Bar chart: results for INSERT, READ, READ w/ compaction, READ/UPDATE, and READ/INSERT; series: Cassandra, HBase, MongoDB, nStore; vertical axis from 0 to 25,000]
Tested using YCSB (http://research.yahoo.com/Web_Information_Management/YCSB)
• INSERT: 50,000,000 records (1K size)
• READ: Zipfian distribution
• READ w/ compaction: after SSTable compaction (Cassandra, HBase)
• READ/UPDATE: 50:50 (50,000,000 records DB)
• READ/INSERT: 50:50 (50,000,000 records DB)
CUBRID reference architecture for social networking service
CUBRID Web Reference Architecture
[Diagram labels]
• Small-size web service: Web Server; RW and RO access to a CUBRID HA master/slave pair
• Mid-size web service: Web Server (User Interface); Web Application Server (Business Logic); Cache Server; DB Sharding over four CUBRID HA master/slave pairs; CUNITOR
Social Networking Service Architecture
Databases: User Profile DB, Social Relation DB, Analytics DB, Feed Outbox DB, Feed Inbox DB
Cache Layer
Engines: Social Query Engine, Aggregation Engine, Delivery Engine, Search Engine, Recommendation Engine
Search Index
Web Application Servers (Business Logic)
Web Servers (User Interface)
CUBRID SNS Reference Architecture
[Diagram labels]
• User profile DB: sharded by user-id (DB Sharding broker, RW/RO, CUBRID HA master/slave pairs)
• Social relation DB: sharded by user-id (DB Sharding broker, RW/RO, CUBRID HA master/slave pairs)
• Inbox/Outbox storage: distributed according to user-id (nStore w/ CUBRID: containers, management)
• Analytic DB: partitioned for OLAP (CUBRID Cluster: node #1, node #2, … node #n)
• Cache server farm, Application servers, ETL
• OAM: CUNITOR monitoring server
Best Practices
Automatic sharding is an effective way to scale out a DB system that stores relational model data.
A highly available database architecture is a basic business requirement and no longer a technical barrier.
nStore is a solution for petabyte-scale data, with the benefits of a highly available and scalable distributed store.
End of Slides.