GUIDE TO SQL - NOSQL MIGRATION Anton Yazovskiy Solution Architect, Thumbtack Technology

Guide to SQL to NoSQL migration

Jan 15, 2015


Engineering

Anton Yazovskiy

Is your legacy database infrastructure struggling to meet the demand of customer Service Level Agreements? If you, like many companies, are discovering that your infrastructure is not robust enough to deal with the speed and scale required of today's Internet-scale applications, it may be time to consider a switch to NoSQL storage.

Changing storage systems can be a daunting process and, with all the buzz surrounding NoSQL, it can be difficult to know where to start. As a Solutions Architect at Thumbtack Technology, Anton Yazovskiy has helped many companies through the process of selecting and deploying NoSQL technologies. In this webinar, Anton will explain the main advantages of NoSQL and the common use cases in which migrating to NoSQL makes sense. You will learn the key questions to ask before migrating, as well as important differences in data modeling and architectural approaches. Finally, you will look at a typical application based on a relational database management system (RDBMS) and migrate it to NoSQL step by step.

Key topics that will be covered:

> Why you would want to migrate to NoSQL
> Conceptual differences between RDBMS and NoSQL
> Data modeling and architectural best practices
> "I got it. But what exactly do I need to do?" - Practical migration steps

ABOUT THE PRESENTER
Anton Yazovskiy is a Software Engineer at Thumbtack Technology, where he focuses on high-performance enterprise architecture. He has presented at a variety of IT conferences and “DevDays” on topics such as NoSQL and MarkLogic.
Transcript
Page 1: Guide to SQL to NoSQL migration

GUIDE TO SQL - NOSQL MIGRATION

Anton Yazovskiy Solution Architect, Thumbtack Technology

Page 2: Guide to SQL to NoSQL migration

AGENDA

• Why would you want to migrate to NoSQL

• Conceptual difference between RDBMS and NoSQL

• Data modeling and architectural best practices

• Practical migration steps / questions you have to ask

Page 3: Guide to SQL to NoSQL migration

WHY?

• scalability

• performance

• developer productivity

Page 4: Guide to SQL to NoSQL migration

CONCEPTUAL DIFFERENCE BETWEEN RDBMS AND NOSQL

RDBMS:

• a relational schema allows you to query data in many different ways in different contexts

• it is accessible to many types of applications and separate dev teams

• the schema helps enforce rules common to everybody

NoSQL:

• always remember that in most cases you run queries across the cluster

• NoSQL is about focusing on a particular need and goal

• model your data for a specific use case

• define what you are willing to sacrifice to achieve better results
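To see why "queries across the cluster" matter, here is a minimal Python sketch assuming a toy cluster that hash-partitions records by primary key over four nodes. The `Cluster` class, its methods, and the `jdoe` user are hypothetical and not any product's API. A lookup by key is routed to a single node, while a filter on any other field has to be sent to every node (scatter-gather).

```python
from zlib import crc32


class Cluster:
    """Toy hash-partitioned store (hypothetical, for illustration only)."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]
        self.nodes_contacted = 0  # node requests needed by the last query

    def _node_for(self, key):
        # Stable hash: the same key always lands on the same node.
        return self.nodes[crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key, record):
        self._node_for(key)[key] = record

    def get(self, key):
        # Primary-key lookup: routed to exactly one node.
        self.nodes_contacted = 1
        return self._node_for(key).get(key)

    def find_by(self, field, value):
        # Non-key filter: matches could live anywhere, so every node is asked.
        self.nodes_contacted = len(self.nodes)
        return [r for node in self.nodes for r in node.values() if r.get(field) == value]


cluster = Cluster(node_count=4)
cluster.put("ayazovskiy", {"user_name": "ayazovskiy", "group": "dev"})
cluster.put("jdoe", {"user_name": "jdoe", "group": "ops"})

dev_user = cluster.get("ayazovskiy")
lookup_cost = cluster.nodes_contacted   # 1 node
dev_team = cluster.find_by("group", "dev")
scan_cost = cluster.nodes_contacted     # all 4 nodes
```

The scatter-gather cost grows with the number of nodes, which is one reason NoSQL data models are built around the key you will look data up by.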

Page 5: Guide to SQL to NoSQL migration

DATA MODELING AND ARCHITECTURAL BEST PRACTICES

Page 6: Guide to SQL to NoSQL migration

POLYGLOT PERSISTENCE

• different solutions are designed to solve different problems:

• session & fast transactions

• cache

• aggregations

• analytical ad-hoc queries

• graph traversal

• the requirements for OLTP and OLAP storage are very different

Page 7: Guide to SQL to NoSQL migration

POLYGLOT PERSISTENCE

Page 8: Guide to SQL to NoSQL migration

NOSQL DATA STRUCTURES

• Key-Value: Riak, Redis, MemcacheDB, Aerospike, and Amazon DynamoDB (cloud)

• Key-Document: MongoDB, Couchbase

• Column-Family: Cassandra, HBase

• Graph: Neo4j, OrientDB
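The four families differ mainly in how much of a record's structure the store understands. As a rough illustration, the sketch below holds the same hypothetical user in each shape using plain Python structures. It shows data layout only, not the API of any product listed above, and the `neighbors` helper and `WORKS_WITH` relation are made up for the example.

```python
import json

# Key-value: the value is opaque to the store; it can only be fetched by key.
kv_store = {"user:ayazovskiy": json.dumps({"group": "dev", "level": 5})}

# Document: the store understands the document's fields and nesting.
document_store = {
    "ayazovskiy": {"user_name": "ayazovskiy", "access": {"level": 5, "group": "dev"}},
}

# Column-family: row key -> column family -> column -> value.
column_family_store = {
    "ayazovskiy": {"profile": {"user_name": "ayazovskiy"},
                   "access": {"level": 5, "group": "dev"}},
}

# Graph: vertices plus typed edges, queried by traversal.
vertices = {"ayazovskiy": {"group": "dev"}, "jdoe": {"group": "dev"}}
edges = [("ayazovskiy", "WORKS_WITH", "jdoe")]


def neighbors(vertex, relation):
    """One traversal step: follow outgoing edges of the given type."""
    return [dst for src, rel, dst in edges if src == vertex and rel == relation]
```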

Page 9: Guide to SQL to NoSQL migration

PRACTICAL MIGRATION STEPS

• what would you like to achieve

• learn your traffic

• learn your data set

• what are you willing to sacrifice

• apply polyglot persistence

• model your data

• synchronization

Page 10: Guide to SQL to NoSQL migration

WHAT WOULD YOU LIKE TO ACHIEVE

• better performance

• scale current solution

• process more and/or different data

• speed-up the development

• I heard of it

Page 11: Guide to SQL to NoSQL migration

LEARN YOUR TRAFFIC

• what does your workload look like:

• OLTP (simple lookups, short transactions)

• OLAP (aggregations, analytical queries, ad hoc scans, etc.)

• read-heavy, write-heavy

• what kinds of queries do you perform to answer the application's questions:

• simple lookups, uncertain search, inner requests, traversal, BI/Analysis
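One way to answer these questions is to sample the existing query log and count what you find. The sketch below assumes a hypothetical log already reduced to (statement type, access pattern) pairs; producing those pairs from a real RDBMS log is left out, and the pattern names are illustrative.

```python
from collections import Counter

# Hypothetical sample of a query log: (statement type, access pattern).
QUERY_LOG = [
    ("SELECT", "lookup"), ("SELECT", "lookup"), ("SELECT", "scan"),
    ("UPDATE", "lookup"), ("INSERT", "lookup"), ("SELECT", "lookup"),
    ("SELECT", "aggregation"), ("SELECT", "lookup"),
]

READ_STATEMENTS = {"SELECT"}
OLAP_PATTERNS = {"scan", "aggregation"}


def profile(log):
    """Summarize the read/write mix and access patterns of a query log."""
    reads = sum(1 for statement, _ in log if statement in READ_STATEMENTS)
    patterns = Counter(pattern for _, pattern in log)
    olap = sum(count for pattern, count in patterns.items() if pattern in OLAP_PATTERNS)
    return {
        "read_ratio": reads / len(log),
        "write_ratio": (len(log) - reads) / len(log),
        "olap_ratio": olap / len(log),
        "patterns": dict(patterns),
    }


summary = profile(QUERY_LOG)
```

For this sample the mix is 75% reads and mostly key lookups, with a quarter of the queries being OLAP-style scans or aggregations.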

Page 12: Guide to SQL to NoSQL migration

LEARN YOUR DATA SET

• what kinds of data do you operate with:

• simple key-value

• structured, semi-structured

• nested/hierarchical

• graph-oriented

• how much data of each type do you have
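A rough size estimate per data type can come from a few sample records and the row counts. The sketch below uses UTF-8 JSON length as a proxy for record size; that is an assumption, since every store has its own encoding and overhead. The sample records and row counts are hypothetical.

```python
import json
from statistics import mean


def average_size_bytes(samples):
    """Average size of sample records serialized as UTF-8 JSON."""
    return mean(len(json.dumps(record).encode("utf-8")) for record in samples)


def estimate_sizes(samples_by_type, row_counts):
    """Rough total bytes per data type: average sample size x number of rows."""
    return {
        data_type: average_size_bytes(samples) * row_counts[data_type]
        for data_type, samples in samples_by_type.items()
    }


# Hypothetical samples and row counts; in practice, sample the source tables.
samples = {
    "sessions": [{"sid": "a1", "uid": 7}, {"sid": "b2", "uid": 9}],
    "profiles": [{"uid": 7, "name": "Ann", "bio": "Likes databases", "tags": ["sql", "nosql"]}],
}
row_counts = {"sessions": 2_000_000, "profiles": 50_000}
estimates = estimate_sizes(samples, row_counts)
```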

Page 13: Guide to SQL to NoSQL migration

WHAT ARE YOU WILLING TO SACRIFICE

• what data doesn't require strong consistency

• where transactional guarantees aren't required

• what data are you willing to lose in case of hardware failure

• where are you willing to sacrifice joins
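For the consistency question, replicated stores in the Dynamo family (Riak and Cassandra, for example) let you tune how many of the N replicas a read (R) and a write (W) must reach. The usual rule of thumb is that a read is guaranteed to overlap the latest acknowledged write only when R + W > N. The sketch below encodes just that arithmetic; it is a simplification that ignores failure handling such as sloppy quorums and hinted handoff.

```python
def is_strongly_consistent(n, r, w):
    """Return True if every read quorum overlaps every write quorum (R + W > N).

    n: replicas per key; r: replicas a read waits for; w: replicas a write waits for.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    return r + w > n


# Three replicas per key and a few common trade-offs.
settings = {
    "fast, may read stale data": (3, 1, 1),
    "quorum reads and writes": (3, 2, 2),
    "cheap reads, expensive writes": (3, 1, 3),
}
for label, (n, r, w) in settings.items():
    print(f"{label}: N={n} R={r} W={w} -> consistent={is_strongly_consistent(n, r, w)}")
```

W also bears on "what data are you willing to lose": a write acknowledged by only one replica can be lost if that node fails before the data is copied elsewhere.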

Page 14: Guide to SQL to NoSQL migration

APPLY POLYGLOT PERSISTENCE

• Based on the answers you discovered, define the most obvious types of storage you may need:

• fast & simple storage for lookups, non-critical data, and short transactions

• RDBMS for data that fits on a single server

• document-oriented storage for nested/hierarchical data and aggregate-oriented reads & writes

• graph-oriented storage for traversal queries, social relations, etc.

• highly scalable storage for Big Data background processing
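The mapping above can be written down as an explicit, reviewable rule table. The sketch below is hypothetical: the need names, the rule order, and the application components are illustrative assumptions, and a real decision would weigh the traffic and data-set analysis rather than single flags.

```python
from enum import Enum


class Store(Enum):
    KEY_VALUE = "fast & simple key-value storage"
    RDBMS = "relational database"
    DOCUMENT = "document-oriented storage"
    GRAPH = "graph-oriented storage"
    BIG_DATA = "highly scalable batch storage"


# Hypothetical rules, checked in order; the first matching need wins.
RULES = [
    ("graph_traversal", Store.GRAPH),
    ("background_big_data", Store.BIG_DATA),
    ("nested_aggregates", Store.DOCUMENT),
    ("lookups_or_sessions", Store.KEY_VALUE),
    ("fits_single_server", Store.RDBMS),
]


def choose_store(needs):
    for need, store in RULES:
        if need in needs:
            return store
    return None  # no obvious fit: revisit the traffic and data-set analysis


# Hypothetical answers gathered for parts of one application.
components = {
    "sessions": {"lookups_or_sessions"},
    "user_profiles": {"nested_aggregates"},
    "social_graph": {"graph_traversal"},
    "billing": {"fits_single_server"},
}
plan = {name: choose_store(needs) for name, needs in components.items()}
```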

Page 15: Guide to SQL to NoSQL migration

DEFINE A DATA MODEL

Page 16: Guide to SQL to NoSQL migration

DATA MODELING: BEFORE YOU START

• from “what data do I have” to “what questions do I have”

• denormalization & duplication are your best friends

• hierarchical and embedded structures make your life easier, but they can also be your worst enemy

Page 17: Guide to SQL to NoSQL migration

REFERENCES

• in-application joins

• nothing to be ashamed about

• apply carefully

{ user_name: ayazovskiy, contact: {..}, access: { level: 523, group: dev } }

{ access_level: 523, rules: [...] }
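An in-application join replaces a server-side JOIN with a second lookup issued by the application. The sketch below mirrors the two documents on the slide using in-memory dicts as hypothetical collections; the rule strings and helper names are made up. The batch variant shows one way to "apply carefully": fetch each distinct access level once instead of once per user.

```python
# Two hypothetical collections, shaped like the documents on the slide.
users = {
    "ayazovskiy": {"user_name": "ayazovskiy", "access": {"level": 523, "group": "dev"}},
}
access_rules = {
    523: {"access_level": 523, "rules": ["read:reports", "write:code"]},
}


def load_user_with_rules(user_name):
    """In-application join: two key lookups instead of a server-side JOIN."""
    user = users.get(user_name)
    if user is None:
        return None
    rules = access_rules.get(user["access"]["level"], {"rules": []})
    return {**user, "rules": rules["rules"]}


def load_users_with_rules(user_names):
    """Batch variant: look up each distinct access level once (avoids N+1 lookups)."""
    found = [users[name] for name in user_names if name in users]
    levels = {u["access"]["level"] for u in found}
    rules_by_level = {lvl: access_rules.get(lvl, {"rules": []})["rules"] for lvl in levels}
    return [{**u, "rules": rules_by_level[u["access"]["level"]]} for u in found]
```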

Page 18: Guide to SQL to NoSQL migration

DUPLICATION

• Duplication is the technique of copying pieces of data between structures in order to either optimize query processing time or convert data into a particular business model.

• The main advantages of denormalization are the ability to:

1. reduce the number of I/O operations and query time

2. reduce complexity of query processing in distributed systems
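As a small illustration of duplication, the sketch below turns hypothetical normalized rows (orders referencing customers) into order documents that already carry the customer fields an order page needs. One read answers the query with no join; the cost is a copy of the customer name that must be maintained.

```python
# Hypothetical normalized rows, as they might come out of an RDBMS.
customers = {1: {"id": 1, "name": "Acme Corp", "city": "Boston"}}
orders = [
    {"id": 100, "customer_id": 1, "total": 250},
    {"id": 101, "customer_id": 1, "total": 75},
]


def denormalize_order(order, customers):
    """Build an order document that answers 'show this order' in one read."""
    customer = customers[order["customer_id"]]
    return {
        "_id": f"order#{order['id']}",
        "total": order["total"],
        # Copied from the customer row: no join needed at read time,
        # but this copy must be updated if the customer is renamed.
        "customer": {"id": customer["id"], "name": customer["name"]},
    }


order_documents = {}
for order in orders:
    document = denormalize_order(order, customers)
    order_documents[document["_id"]] = document
```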

Page 19: Guide to SQL to NoSQL migration

AGGREGATES

• simplify data processing logic

• optimize read/write time

• distribute the data across the cluster

• reduce the number of requests across the cluster

• perform atomic updates

{ user_name: ayazovskiy, contact: { phone: 123, email: @thumbtack.net }, access: { level: 5, group: dev } }
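The "atomic updates" point is that an aggregate is read and written as one unit, so related fields change together. The toy store below is hypothetical: it simulates that guarantee with one lock and copy-then-swap (real stores serialize writes per document), and the `lead` group is made up. If `mutate` raises, the stored aggregate is left untouched because only a copy was modified.

```python
import copy
import threading


class AggregateStore:
    """Toy store that reads and writes each aggregate as one unit (hypothetical)."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def put(self, key, doc):
        with self._lock:
            self._docs[key] = copy.deepcopy(doc)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._docs[key])

    def update(self, key, mutate):
        """Apply `mutate` to a private copy, then swap it in under the lock.

        Readers see either the old aggregate or the new one, never a mix.
        """
        with self._lock:
            doc = copy.deepcopy(self._docs[key])
            mutate(doc)
            self._docs[key] = doc


def promote(doc):
    # Two nested fields change together as one atomic update.
    doc["access"]["level"] += 1
    doc["access"]["group"] = "lead"


store = AggregateStore()
store.put("ayazovskiy", {"user_name": "ayazovskiy",
                         "contact": {"phone": 123},
                         "access": {"level": 5, "group": "dev"}})
store.update("ayazovskiy", promote)
```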

Page 20: Guide to SQL to NoSQL migration

AGGREGATES

• updates of duplicated data are heavy and complex

• querying across aggregates is heavy and complex

{ user_name: ayazovskiy, contact: { phone: 123, email: @thumbtack.net }, access: { level: 5, group: dev } }
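The cost side shows up as soon as a duplicated value changes. In the hypothetical sketch below, the group name is copied into every user aggregate, so renaming a group means finding and rewriting each copy. Because the group is not the key, finding them is a full scan.

```python
# Hypothetical user aggregates; the group name is duplicated in each one.
users = {
    "ayazovskiy": {"access": {"level": 5, "group": "dev"}},
    "jdoe": {"access": {"level": 3, "group": "dev"}},
    "asmith": {"access": {"level": 4, "group": "ops"}},
}


def rename_group(users, old_name, new_name):
    """Rewrite every aggregate holding a copy of the group name; return how many changed.

    The group is not the key, so this is a full scan; in a cluster each rewrite is a
    separate request, and nothing makes the whole rename atomic.
    """
    rewritten = 0
    for aggregate in users.values():
        if aggregate["access"]["group"] == old_name:
            aggregate["access"]["group"] = new_name
            rewritten += 1
    return rewritten
```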

Page 21: Guide to SQL to NoSQL migration

COUNTERS

• NoSQL auto-increment analog

• distributed consistent auto-increment is tricky

• counters aren't always reliable *
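One common workaround for distributed auto-increment, named here as a swapped-in technique rather than the presenter's, is block allocation (often called hi/lo). Each application node reserves a range of IDs from a single atomic counter, then hands them out locally, so the central counter is hit once per block. The classes below are hypothetical, and `CentralCounter` only stands in for an atomic counter in the database. IDs stay unique but are neither gap-free nor globally ordered.

```python
import threading


class CentralCounter:
    """Stands in for one atomic counter in the database (hypothetical)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def add_and_get(self, delta):
        with self._lock:
            self._value += delta
            return self._value


class BlockIdAllocator:
    """Hands out IDs from a locally reserved block; one central call per block."""

    def __init__(self, counter, block_size=100):
        self._counter = counter
        self._block_size = block_size
        self._next = 0
        self._limit = 0  # exclusive upper bound of the current block
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            if self._next >= self._limit:
                # Reserve the next block: IDs (top - block_size, top].
                top = self._counter.add_and_get(self._block_size)
                self._next = top - self._block_size + 1
                self._limit = top + 1
            value = self._next
            self._next += 1
            return value


counter = CentralCounter()
node_a = BlockIdAllocator(counter, block_size=10)
node_b = BlockIdAllocator(counter, block_size=10)
ids = [node_a.next_id(), node_b.next_id(), node_a.next_id()]
print(ids)  # [1, 11, 2]: unique, but not in global insertion order
```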

Page 22: Guide to SQL to NoSQL migration

COMPOSITE KEYS

{ "ID": "chat#user_1#user_2#december_12_2014", "messages": [ { "user_1": "hey" }, { "user_1": "how is going?" }, { "user_2": "thanks, pretty well!" } ] }
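A composite key packs the lookup criteria into the key itself, so the question "show this chat for this day" becomes a single key lookup. The sketch below builds and parses keys in the slide's `chat#user_1#user_2#...` style, with two assumptions: the day is written as an ISO date (`2014-12-12`) instead of `december_12_2014`, and user IDs never contain `#`. Sorting the two user names makes the key identical whichever user opens the conversation.

```python
from datetime import date


def chat_key(user_a, user_b, day):
    """Build the key for one conversation on one day (hypothetical format)."""
    first, second = sorted((user_a, user_b))  # same key whichever user writes first
    return f"chat#{first}#{second}#{day.isoformat()}"


def parse_chat_key(key):
    """Split a chat key back into its parts; assumes user IDs contain no '#'."""
    prefix, first, second, day = key.split("#")
    if prefix != "chat":
        raise ValueError(f"not a chat key: {key}")
    return first, second, date.fromisoformat(day)


key = chat_key("user_2", "user_1", date(2014, 12, 12))
```

ISO dates sort lexicographically in chronological order, so in stores that keep keys sorted, a range scan over the prefix `chat#user_1#user_2#` could return a conversation's history day by day.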

Page 23: Guide to SQL to NoSQL migration

APPEND

{ ID: account#User_A, account_total: $100, account_total_calculation_time: .., changes_since_last_calculation: [ 1399493200: +$10, 1399892139: -$25 ] }
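The append pattern keeps writers from rewriting the total: each change is appended, the current balance is the stored total plus the pending changes, and a periodic job folds the changes back into the total. The sketch below mirrors the slide's account with amounts in cents. Storing changes as a list of (timestamp, delta) pairs is an assumption that keeps two changes in the same second from overwriting each other, and the recalculation timestamp is hypothetical.

```python
def current_balance(account):
    """Last calculated total plus every change appended since (amounts in cents)."""
    pending = sum(delta for _, delta in account["changes_since_last_calculation"])
    return account["account_total"] + pending


def append_change(account, timestamp, delta):
    """Writers only append a change; they never rewrite the total."""
    account["changes_since_last_calculation"].append((timestamp, delta))


def recalculate(account, now):
    """Periodic job: fold the appended changes into the stored total."""
    account["account_total"] = current_balance(account)
    account["account_total_calculation_time"] = now
    account["changes_since_last_calculation"] = []


# Mirrors the slide: $100 total, then +$10 and -$25 (stored as cents).
account = {
    "ID": "account#User_A",
    "account_total": 10_000,
    "account_total_calculation_time": None,  # not given on the slide
    "changes_since_last_calculation": [],
}
append_change(account, 1399493200, 1_000)
append_change(account, 1399892139, -2_500)
balance_before = current_balance(account)  # 8500 cents = $85
recalculate(account, now=1399900000)       # hypothetical timestamp
```

In a real store, appending and recalculating must not race: a change appended between computing the new total and clearing the list would be lost, so the recalculation needs an atomic or conditional update.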

Page 24: Guide to SQL to NoSQL migration

THINK OF DATA SYNCHRONIZATION

• application-level synchronization:

• e.g., update the user profile in document-oriented storage, its social network in graph storage, and the session in a key-value cache

• regular synchronization:

• this may be an hourly/daily/weekly process that takes updated data and propagates it across the system

• incremental background synchronization:

• solutions like Tungsten Replicator allow you to track changes in an RDBMS via its transaction log and apply these changes immediately to NoSQL storage

• e.g., user profiles in MySQL synchronized with Aerospike via a properly configured Tungsten Replicator
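The core idea of incremental, log-based synchronization can be sketched in a few lines: read changes in log order, apply each one to the target store, and remember the last applied log position so a restart or an overlapping batch does not apply anything twice. This is a hypothetical illustration of the idea, not Tungsten Replicator's API or configuration; the `Change` record and the in-memory target are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    position: int               # monotonically increasing log position
    op: str                     # "upsert" or "delete"
    key: str
    row: Optional[dict] = None  # new row image for upserts


class IncrementalSync:
    """Applies source changes to a target store and remembers the last applied position."""

    def __init__(self, target):
        self.target = target
        self.applied_position = 0

    def apply(self, changes):
        for change in changes:
            if change.position <= self.applied_position:
                continue  # already applied, so replaying part of the log is harmless
            if change.op == "upsert":
                self.target[change.key] = dict(change.row)
            elif change.op == "delete":
                self.target.pop(change.key, None)
            else:
                raise ValueError(f"unknown operation: {change.op}")
            self.applied_position = change.position


log = [
    Change(1, "upsert", "user:1", {"name": "Ann"}),
    Change(2, "upsert", "user:2", {"name": "Bob"}),
    Change(3, "delete", "user:1"),
]
target = {}
sync = IncrementalSync(target)
sync.apply(log[:2])  # first batch
sync.apply(log)      # next batch overlaps the first; positions 1-2 are skipped
```

In a real deployment the applied position would be persisted together with the target writes, so that a crash between the two cannot skip or repeat changes.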

Page 25: Guide to SQL to NoSQL migration
Page 26: Guide to SQL to NoSQL migration

“Always remember that in most cases you run queries across the cluster.”

–Anton Yazovskiy

Page 27: Guide to SQL to NoSQL migration

Any questions?

Thank you

@yazovsky www.thumbtack.net

Page 28: Guide to SQL to NoSQL migration

THANKS / REFERENCES

• NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence by Pramod J. Sadalage and Martin Fowler

• NoSQL Data Modeling Techniques (http://highlyscalable.wordpress.com)

• MongoDB documentation (http://docs.mongodb.org)

• Couchbase documentation (http://docs.couchbase.com)

• FoundationDB Blog (http://blog.foundationdb.com)