Migrating from PostgreSQL to MySQL at Cocolog. Naoto Yokoyama, NIFTY Corporation; Garth Webb, Six Apart; Lisa Phillips, Six Apart. Credits: Kenji Hirohama, Sumisho Computer Systems Corp.
Page 1: 5MB

Migrating from PostgreSQL to MySQL at Cocolog

Naoto Yokoyama, NIFTY Corporation
Garth Webb, Six Apart
Lisa Phillips, Six Apart

Credits: Kenji Hirohama, Sumisho Computer Systems Corp.

Page 2: 5MB

Agenda

1. What is Cocolog
2. History of Cocolog
3. DBP: Database Partitioning
4. Migration From PostgreSQL to MySQL

Page 3: 5MB

1. What is Cocolog

Page 4: 5MB

What is Cocolog

NIFTY Corporation
• Established in 1986
• A Fujitsu Group Company
• NIFTY-Serve (licensed and interconnected with CompuServe)
• One of the largest ISPs in Japan

Cocolog
• First blog community at a Japanese ISP
• Based on TypePad technology by Six Apart
• Several hundred million PV/month

History
• Dec/02/2003: Cocolog for ISP users launch
• Nov/24/2005: Cocolog Free for free launch
• April/05/2007: Cocolog for Mobile Phone launch

Page 5: 5MB

2008/04: 700 thousand users

Cocolog (Screenshot of home page)

Page 6: 5MB

Cocolog (Screenshot of home page)

[Screenshot labels: TypePad, Cocolog]

Page 7: 5MB

Cocolog template sets

Page 8: 5MB

Cocolog Growth (User)

[Chart: number of users for Cocolog and Cocolog Free, with the timeline divided into phase 1 to phase 4]

Page 9: 5MB

Cocolog Growth (Entry)

[Chart: number of entries for Cocolog and Cocolog Free, with the timeline divided into phase 1 to phase 4]

Page 10: 5MB

Technology at Cocolog

Core System
• Linux 2.4/2.6
• Apache 1.3/2.0/2.2 & mod_perl
• Perl 5.8 + CPAN
• PostgreSQL 8.1
• MySQL 5.0
• memcached / TheSchwartz / cfengine

Eco System
• LAMP, LAPP, Ruby + ActiveRecord, Capistrano, etc.

Page 11: 5MB

Monitoring Management Tool

Proprietary in-house development with PostgreSQL, PHP, and Perl

Monitoring points (order of priority)
• response time of each post
• number of spam comments/trackbacks
• number of comments/trackbacks
• source IP address of spam
• number of entries
• number of comments via mobile devices
• page views via mobile devices
• time of batch completion
• amount of API usage
• bandwidth usage

DB
• Disk I/O
• Memory and CPU usage
• time of VACUUM analyze

APP
• number of active processes
• CPU usage
• Memory usage

Hard

DB

Service

APL

Page 12: 5MB

2. History of Cocolog

Page 13: 5MB

Phase1 2003/12 ~ (Entry: 0.04Million)

[Diagram components: Register, TypePad, PostgreSQL, NAS, WEB (static contents published)]

Before DBP: 10 servers

Page 14: 5MB

Phase2 2004/12 ~ (Entry: 7Million)

[Diagram components: Register, TypePad, PostgreSQL, NAS, WEB (static contents published), Podcast, Portal, Profile, etc., Rich template, Publish Book, Tel Operator Support. Timeline labels: 2004/12 ~ and 2005/5 ~]

Before DBP: 50 servers

Page 15: 5MB

Phase2 - Problems

• The system is tightly coupled.
• The database server is accessed from multiple points.
• It is difficult to change the system design and database schema.

Page 16: 5MB

Phase3 2006/3 ~ (Entry: 12Million)

[Diagram components: Register, TypePad, PostgreSQL, memcached, Web-API, NAS, WEB (static contents published), Podcast, Portal, Profile, etc., Rich template, Publish Book, Tel Operator Support]

Before DBP: 200 servers

Page 17: 5MB

Phase4 2007/4 ~ (Entry: 16Million)

[Diagram components: Register, TypePad, PostgreSQL, memcached, Web-API, Atom, Mobile WEB, NAS, WEB (static contents published), Rich template, Publish Book, Tel Operator Support]

Before DBP: 300 servers

Page 18: 5MB

Now 2008/4 ~

[Diagram components: Register, TypePad, Multi MySQL, memcached, Web-API, Atom, Mobile WEB, NAS, WEB (static contents published), Rich template, Publish Book, Tel Operator Support]

After DBP: 150 servers

Page 19: 5MB

3. TypePad Database Partitioning

Page 20: 5MB

Steps for Transitioning

• Server Preparation: Hardware and software setup

• Global Write: Write user information to the global DB

• Global Read: Read/write user information on the global DB

• Move Sequence: Table sequences served by global DB

• User Data Move: Move user data to user partitions

• New User Partition: All new users saved directly to user partition 1

• New User Strategy: Decide on a strategy for the new user partition

• Non User Data Move: Move all non-user owned data

Page 21: 5MB

TypePad Overview (Pre-DBP)

[Diagram: Blog Readers, Blog Owners, and Mobile Blog Readers reach the system over the Internet]

Components and ports
• Web Server: https (443), http (80)
• Application Server
• TypeCast Server: dedicated server for TypeCast (via ATOM)
• ATOM Server: http (80), atom api
• MEMCACHED: data caching servers to reduce DB load, memcached (11211)
• Database (Postgres): postgres (5432)
• Storage: static content (HTML, images, etc.), nfs (2049)
• Mail Server: smtp (25) / pop (110)
• ADMIN (CRON) Server: cron server for periodic asynchronous tasks

Page 22: 5MB

Why Partition?

Current setup: All inquiries (access) go to one DB (Postgres)
[Diagram: TypePad application servers, Non-User Role, User Role (User0)]

After DBP: Inquiries (access) are divided among several DBs (MySQL)
[Diagram: TypePad application servers, Global Role, Non-User Role, User Role (User1), User Role (User2), User Role (User3)]

Page 23: 5MB

Server Preparation

Current setup: TypePad with DB (PostgreSQL)
• Non-User Role
• User Role (User0)

New expanded setup: DB (MySQL) for partitioned data
• Global Role: maintains user mapping and primary key generation
• User Role (User1), User Role (User2), User Role (User3): user information is partitioned
• Non-User Role: information that does not need to be partitioned (such as session information)

Asynchronous Job Server
• Job Server + TypePad + Schwartz: server for executing jobs
• Schwartz DB: stores job details

※ Grey areas are not used in current steps

Page 24: 5MB

Global Write: Creating the user map

Explanation
①: For new registrations only, uniquely identifying user data is written to the global DB.
②: This same data continues to be written to the existing DB.

[Diagram: TypePad; DB (PostgreSQL) with Non-User Role and User Role (User0); DB (MySQL) for partitioned data with Global Role (maintains user mapping and primary key generation), User Role (User1 to User3), and Non-User Role; Asynchronous Job Server (Job Server + TypePad + Schwartz, Schwartz DB)]

※ Grey areas are not used in current steps
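
The dual write described on this slide can be sketched as follows. This is a minimal illustration and not TypePad's actual code: the table names `users` and `user_map`, the function `register_user`, and the use of in-memory SQLite in place of PostgreSQL and MySQL are all assumptions made so the example runs on its own.

```python
import sqlite3

# Hypothetical schemas: stand-ins for the existing DB and the new global DB.
LEGACY_SCHEMA = "CREATE TABLE users (user_id INTEGER PRIMARY KEY, login TEXT NOT NULL)"
GLOBAL_SCHEMA = (
    "CREATE TABLE user_map ("
    "user_id INTEGER PRIMARY KEY, "
    "login TEXT NOT NULL UNIQUE, "
    "partition_id INTEGER NOT NULL)"
)


def open_db(schema):
    """Create an in-memory database with a single table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    return conn


def register_user(legacy_db, global_db, user_id, login):
    """Write a new registration to both databases."""
    # Step 1: uniquely identifying data goes to the global DB (partition 0 = existing DB).
    global_db.execute(
        "INSERT INTO user_map (user_id, login, partition_id) VALUES (?, ?, 0)",
        (user_id, login),
    )
    # Step 2: the same data continues to be written to the existing DB.
    legacy_db.execute(
        "INSERT INTO users (user_id, login) VALUES (?, ?)", (user_id, login)
    )
    global_db.commit()
    legacy_db.commit()


legacy_db = open_db(LEGACY_SCHEMA)
global_db = open_db(GLOBAL_SCHEMA)
register_user(legacy_db, global_db, 1, "alice")
```

Because nothing reads from the global DB at this stage, a failed global write can be repaired later without affecting users, which is presumably why the step is introduced on its own.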

Page 25: 5MB

Global Read: Use the user map to find the user partition

Explanation
①: Migrate existing user data to the global DB.
②: At the start of the request, the application queries the global DB for the location of user data.
③: The application then talks to this DB for all queries about this user. At this stage the global DB points to the user0 partition in all cases.

[Diagram: TypePad; DB (PostgreSQL) with Non-User Role and User Role (User0); DB (MySQL) for partitioned data with Global Role (maintains user mapping and primary key generation), User Role (User1 to User3), and Non-User Role; Asynchronous Job Server (Job Server + TypePad + Schwartz, Schwartz DB)]

※ Grey areas are not used in current steps
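
The lookup in steps ② and ③ can be sketched as a small router. The class name `PartitionRouter`, the partition names, and the plain dictionaries standing in for the global DB and the partitions are hypothetical; the slides do not show the real interface.

```python
class PartitionRouter:
    """Resolve which partition holds a user's data via the global user map."""

    def __init__(self, user_map, partitions):
        self.user_map = user_map      # user_id -> partition name (global DB stand-in)
        self.partitions = partitions  # partition name -> data store stand-in

    def store_for(self, user_id):
        """Return the data store to use for every query about this user."""
        partition_name = self.user_map[user_id]
        return self.partitions[partition_name]


# At this stage every user still maps to the original partition, user0.
partitions = {"user0": {}, "user1": {}}
user_map = {101: "user0", 102: "user0"}
router = PartitionRouter(user_map, partitions)

# All reads and writes for user 101 go through the resolved store.
router.store_for(101)[101] = ["first post"]
```

Moving a user later only requires changing that user's entry in the map; the application code that calls `store_for` stays the same.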

Page 26: 5MB

Move Sequence: Migrating primary key generation

Explanation
①: Postgres sequences (for generating unique primary keys) are migrated to tables on the global DB that act as “pseudo-sequences”.
②: The application requests new primary keys from the global DB rather than the user partition.

[Diagram: TypePad; DB (PostgreSQL) with Non-User Role and User Role (User0); DB (MySQL) for partitioned data with Global Role (maintains user mapping and primary key generation; sequence management is migrated here), User Role (User1 to User3), and Non-User Role; Asynchronous Job Server (Job Server + TypePad + Schwartz, Schwartz DB)]

※ Grey areas are not used in current steps
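
One way a table can act as a pseudo-sequence is sketched below. The table name `pseudo_sequence`, the sequence name `entry_id`, and the starting value are assumptions, and in-memory SQLite stands in for the global DB so the example is runnable as written.

```python
import sqlite3


def create_global_db(initial_values):
    """Create the pseudo-sequence table, seeded with each sequence's last value."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pseudo_sequence ("
        "name TEXT PRIMARY KEY, last_value INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO pseudo_sequence (name, last_value) VALUES (?, ?)",
        list(initial_values.items()),
    )
    conn.commit()
    return conn


def next_id(conn, name):
    """Increment the named pseudo-sequence and return the new value."""
    with conn:  # one transaction: commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE pseudo_sequence SET last_value = last_value + 1 WHERE name = ?",
            (name,),
        )
        if cur.rowcount != 1:
            raise KeyError(f"unknown sequence: {name}")
        row = conn.execute(
            "SELECT last_value FROM pseudo_sequence WHERE name = ?", (name,)
        ).fetchone()
    return row[0]


# Seed with the value the old sequence had reached (hypothetical number).
global_db = create_global_db({"entry_id": 1000})
first = next_id(global_db, "entry_id")
second = next_id(global_db, "entry_id")
```

Seeding each row with the last value handed out by the old sequence keeps new keys from colliding with existing ones. On MySQL itself, a commonly used variant is `UPDATE ... SET id = LAST_INSERT_ID(id + 1)` followed by `SELECT LAST_INSERT_ID()`; whether Cocolog used that form is not stated in the slides.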

Page 27: 5MB

User Data Move: Moving user data to the new user-role partitions

Explanation
①: Existing users that should be migrated by the Job Server are submitted as new Schwartz jobs. User data is then migrated asynchronously.
②: If a comment arrives while the user is being migrated, it is saved in the Schwartz DB to be published later.
③: After being migrated, all user data will exist on the user-role DB partitions.
④: Once all user data is migrated, only non-user data is on Postgres.

[Diagram: TypePad; DB (PostgreSQL) with Non-User Role and User Role (User0); DB (MySQL) for partitioned data with Global Role (maintains user mapping and primary key generation), User Role (User1 to User3, user information is partitioned), and Non-User Role; Job Server + TypePad + Schwartz (server for executing jobs, migrating each user's data); Schwartz DB (stores job details)]

※ Grey areas are not used in current steps
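
The handling of comments that arrive mid-migration (step ②) can be sketched with in-memory stand-ins. The class `UserMigration`, its method names, and the deque that plays the role of the Schwartz DB are hypothetical; this is not TheSchwartz API.

```python
from collections import deque


class UserMigration:
    """Move one user's data while deferring writes that arrive mid-migration."""

    def __init__(self, source, target):
        self.source = source      # user_id -> comments on the old DB (stand-in)
        self.target = target      # user_id -> comments on the new partition (stand-in)
        self.migrating = set()    # users whose data is currently being copied
        self.deferred = deque()   # stand-in for comments saved in the Schwartz DB

    def add_comment(self, user_id, text):
        """Store a comment, or defer it if the user is being migrated."""
        if user_id in self.migrating:
            self.deferred.append((user_id, text))
            return "deferred"
        store = self.target if user_id in self.target else self.source
        store.setdefault(user_id, []).append(text)
        return "stored"

    def start(self, user_id):
        self.migrating.add(user_id)

    def finish(self, user_id):
        """Move the data, then publish this user's deferred comments in order."""
        self.target[user_id] = self.source.pop(user_id, [])
        self.migrating.discard(user_id)
        still_waiting = deque()
        for uid, text in self.deferred:
            if uid == user_id:
                self.target[uid].append(text)
            else:
                still_waiting.append((uid, text))
        self.deferred = still_waiting


source = {7: ["old comment"]}
target = {}
job = UserMigration(source, target)
job.start(7)
status = job.add_comment(7, "arrived mid-migration")
job.finish(7)
```

Deferring the write keeps the copy consistent without locking the blog: the comment is accepted immediately and appears once the user's data is on the new partition.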

Page 28: 5MB

New User Partition: New registrations are created on one user role partition

Explanation
①: When new users register, user data is written to a user role partition.
②: Non-user data continues to be served off Postgres.

[Diagram: TypePad; DB (PostgreSQL) with Non-User Role and User Role (User0); DB (MySQL) for partitioned data with Global Role (maintains user mapping and primary key generation), User Role (User1 to User3, user information is partitioned), and Non-User Role; Asynchronous Job Server (Job Server + TypePad + Schwartz, Schwartz DB)]

※ Grey areas are not used in current steps

Page 29: 5MB

New User Strategy: Pick a scheme for distributing new users

Explanation
①: When new users register, user data is written to one of the user role partitions, depending on a set distribution method (round robin, random, etc.).
②: Non-user data continues to be served off Postgres.

[Diagram: TypePad; DB (PostgreSQL) with Non-User Role and User Role (User0); DB (MySQL) for partitioned data with Global Role (maintains user mapping and primary key generation), User Role (User1 to User3, user information is partitioned), and Non-User Role; Asynchronous Job Server (Job Server + TypePad + Schwartz, Schwartz DB)]

※ Grey areas are not used in current steps
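
The two distribution methods named on this slide, round robin and random, can be sketched as interchangeable chooser functions. The function names and partition labels are hypothetical, and a dictionary stands in for the user map on the global DB.

```python
import itertools
import random


def round_robin_chooser(partitions):
    """Return a function that cycles through the partitions in order."""
    cycle = itertools.cycle(partitions)
    return lambda: next(cycle)


def random_chooser(partitions, seed=None):
    """Return a function that picks a partition uniformly at random."""
    rng = random.Random(seed)
    return lambda: rng.choice(partitions)


def register_user(user_map, user_id, choose_partition):
    """Assign a new user to a partition and record it in the user map."""
    user_map[user_id] = choose_partition()
    return user_map[user_id]


PARTITIONS = ["user1", "user2", "user3"]
user_map = {}
choose = round_robin_chooser(PARTITIONS)
assigned = [register_user(user_map, uid, choose) for uid in range(1, 6)]
```

Because the choice is recorded in the user map at registration time, the strategy can be changed later without touching users who are already placed.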

Page 30: 5MB

Non User Data Move: Migrate data that cannot be partitioned by user

Explanation
①: Migrate non-user role data left on PostgreSQL to the MySQL side.

[Diagram: TypePad; DB (PostgreSQL) with Non-User Role and User Role (User0); DB (MySQL) for partitioned data with Global Role (maintains user mapping and primary key generation), User Role (User1 to User3, user information is partitioned), and Non-User Role (information that does not need to be partitioned, such as session information; non-user data is migrated here); Asynchronous Job Server (Job Server + TypePad + Schwartz, Schwartz DB)]

※ Grey areas are not used in current steps

Page 31: 5MB

Data migration done

Explanation
①: All data access is now done through MySQL.
②: Continue to use The Schwartz for asynchronous jobs.

[Diagram: TypePad; DB (Postgres) with Non-User Role and User Role (User0); DB (MySQL) for partitioned data with Global Role (maintains user mapping and primary key generation), User Role (User1 to User3, user information is partitioned), and Non-User Role (information that does not need to be partitioned, such as session information); ② Asynchronous Job Server: Job Server + TypePad + Schwartz (server for executing jobs) and Schwartz DB (stores job details)]

※ Grey areas are not used in current steps

Page 32: 5MB

The New TypePad configuration

[Diagram: Blog Readers, Blog Owners (management interface), and Mobile Blog Readers reach the system over the Internet]

Components and ports
• Web Server: https (443), http (80)
• Application Server
• TypeCast Server: dedicated server for TypeCast (via ATOM)
• ATOM Server: http (80), atom api
• MEMCACHED: data caching servers to reduce DB load, memcached (11211)
• Database (MySQL): MySQL (3306)
• Storage: static content (HTML, images, etc.), nfs (2049)
• Mail Server: smtp (25) / pop (110)
• ADMIN (CRON) Server: cron server for periodic asynchronous tasks
• Job Server: TheSchwartz server for running ad-hoc jobs asynchronously

Page 33: 5MB

4. Migration from PostgreSQL to MySQL

Page 34: 5MB

DB Node Spec History

Time      OS (RedHat)      CPU (Xeon)                     MEM     DiskArray
2003/12   7.4 (2.4.9)      1.8GHz/512k×1                  1GB     No
          ES2.1 (2.4.9)    3.2GHz/1M×2                    4GB     No
          ES2.1 (2.4.9)    3.2GHz/1M×2                    4GB     Yes
          AS2.1 (2.4.9)    3.2GHz/1M×4                    12GB    Yes
          AS4 (2.6.9)      3.2GHz/1M×4                    12GB    Yes
2007/11   AS4 (2.6.9)      MP 3.3GHz/1M×4 〔2Core×4〕     16GB    Yes

History of scale up PostgreSQL server, Before DBP

Page 35: 5MB

DB DiskArray Spec [FUJITSU ETERNUS8000]

• Best I/O transaction performance in the world
• 146 GB (15 krpm) × 32 disks with RAID-10
• MultiPath FibreChannel 4 Gbps
• QuickOPC (One Point Copy): OPC copy functions let you create a duplicate copy of any data from the original at any chosen time.

http://www.computers.us.fujitsu.com/www/products_storage.shtml?products/storage/fujitsu/e8000/e8000

History of scale up PostgreSQL server, Before DBP

Page 36: 5MB

Scale out MySQL servers, After DBP

A role configuration
• Each role is configured as an HA cluster
• HA software: NEC ClusterPro
• Shared storage

Page 37: 5MB

Scale out MySQL servers, After DBP

[Diagram components: TypePad Application, MySQL Role1, MySQL Role2, MySQL Role3, PostgreSQL, heart beat, FibreChannel SAN, DiskArray]

Page 38: 5MB

Scale out MySQL servers, After DBP

Backup Replication w/ Hot backup

Page 39: 5MB

Scale out MySQL servers, After DBP

[Diagram components: TypePad Application, MySQL Role1, MySQL Role2, MySQL Role3 (one mysqld each), MySQL Backup Role (three mysqld instances), rep (×3), opc, PostgreSQL, heart beat, FibreChannel SAN, DiskArray]

Page 40: 5MB

Troubles with PostgreSQL 7.4 – 8.1

• Data size: over 100 GB, of which 40% is index
• Severe data fragmentation
• VACUUM
  “VACUUM analyze” causes performance problems
  It takes too long to VACUUM large amounts of data
  dump/restore is the only solution for de-fragmentation
• Auto VACUUM
  We don’t use Auto VACUUM since we are worried about its effect on response latency

Page 41: 5MB

Troubles with PostgreSQL 7.4 – 8.1

• Character set
  PostgreSQL accepts out-of-range UTF-8 sequences (Japanese extended character sets and multibyte character sets) that should normally be rejected with an error.

Page 42: 5MB

“Cleaning” data

Removing characters that fall outside the bounds of the UTF-8 character set.

Steps
• PostgreSQL.dumpALL
• Split for Piconv
• UTF8 -> UCS2 -> UTF8 & Merge
• PostgreSQL.restore

[Diagram: dump -> Split -> UTF8->UCS2->UTF8 -> Merge -> restore]
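
The effect of the UTF8 -> UCS2 -> UTF8 round trip can be approximated in a few lines. This sketch does not call piconv. It assumes the goal is to keep only characters that are valid UTF-8 and representable in UCS-2 (the Basic Multilingual Plane) and to drop everything else; how piconv itself treats such input may differ. The function name `clean_utf8` is hypothetical.

```python
def clean_utf8(raw: bytes) -> bytes:
    """Drop invalid UTF-8 byte sequences and characters outside the BMP."""
    # Invalid or truncated byte sequences are skipped instead of raising an error.
    text = raw.decode("utf-8", errors="ignore")
    # UCS-2 can only represent code points up to U+FFFF.
    bmp_only = "".join(ch for ch in text if ord(ch) <= 0xFFFF)
    return bmp_only.encode("utf-8")


# One stray byte, one valid Japanese character, one character outside the BMP.
dirty = b"abc\xff" + "あ".encode("utf-8") + "\U0001F600".encode("utf-8")
cleaned = clean_utf8(dirty)
```

Applied to each chunk of a split dump, a filter like this leaves a file that a strict UTF-8 database will accept on restore.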

Page 43: 5MB

Migration from PostgreSQL to MySQL using TypePad script

Steps
• PostgreSQL -> PerlObject & tmp publish
• -> MySQL -> PerlObject & last publish
• diff tmp & last Object (data check)
• diff tmp & last publish (file check)

[Diagram: TypePad reads PostgreSQL into an Object and a Document (tmp); TypePad reads MySQL into an Object and a Document (last); the Objects are compared (data check) and the Documents are compared (file check)]
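
The two checks, comparing objects (data check) and comparing published output (file check), can be sketched as a diff over keyed snapshots. The helper names, the keys, and the sample data are hypothetical; the actual TypePad script is not shown in the slides.

```python
import hashlib

_MISSING = object()  # marks a key that exists on only one side


def fingerprint(data: bytes) -> str:
    """Hash published file content so large files compare cheaply."""
    return hashlib.sha256(data).hexdigest()


def diff_keys(before, after):
    """Return the sorted keys whose values differ or that exist on one side only."""
    all_keys = before.keys() | after.keys()
    return sorted(
        key
        for key in all_keys
        if before.get(key, _MISSING) != after.get(key, _MISSING)
    )


# Data check: objects loaded before (tmp) and after (last) the migration.
tmp_objects = {"entry:1": {"title": "Hello"}, "entry:2": {"title": "World"}}
last_objects = {"entry:1": {"title": "Hello"}, "entry:2": {"title": "W?rld"}}

# File check: fingerprints of pages published from each side.
tmp_files = {"index.html": fingerprint(b"<h1>Hello</h1>")}
last_files = {"index.html": fingerprint(b"<h1>Hello</h1>")}

data_mismatches = diff_keys(tmp_objects, last_objects)
file_mismatches = diff_keys(tmp_files, last_files)
```

An empty result from both checks means the migrated data loads and publishes the same as the original; any listed key points at a record or page to inspect.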

Page 44: 5MB

Troubles with MySQL

• convert_tz function
  Does not support input values outside the range of Unix time
• Sort order
  Rows come back in a different order when the query has no “order by” clause

Page 45: 5MB

Cocolog Future Plans

Dynamic Job queue

Page 46: 5MB

Consulting by Sumisho Computer Systems Corp.
• System integrator
• First and best partner of MySQL in Japan since 2003
• Provides MySQL consulting, support, and training services
• HA
• Maintenance
• Online backup
• Japanese character support

Page 47: 5MB

Questions