Migration Best Practices: From RDBMS to Cassandra without a Hitch

#Cassandra @doanduyhai

Who am I?

DuyHai Doan

Achilles Cassandra Technical Advocate @ Datastax

Former Java Developer @ Libon

Agenda
•  Libon context
•  Migration strategy
•  Business code migration
•  Data Modeling
•  Take Away

Libon Context

What is Libon?
•  Messaging app
•  VOIP (out)
•  Custom voicemail & greetings
•  SMS/chat/file transfer
•  Contacts matching

Contact Matching
[Diagram, built up over four slides: Libon User; Friend; Contact matching; Accept link]

Project Context
•  Application grew over the years
•  Already using Cassandra to handle events
    •  messaging / file sharing / SMS / notifications
    •  Cassandra R/W latencies ≈ 0.4 ms
    •  server response time under 10 ms

Project Context
•  About contacts …
    •  stored as relational model in RDBMS (Oracle)
    •  1 user ≈ 300 contacts
    •  with millions of users ☞ billions of contacts to handle
    •  query latency unpredictable

Fixing the problem
•  Tune the RDBMS
    •  indices
    •  partitioning
    •  fewer joins, simplified relational model
    •  hardware capacity increased

That worked, but …

Back-end application
[Diagram: Back-end application; RDBMS; Cassandra]

We need to choose

Next Challenges
•  High Availability (DB failure, site failure …)
•  Predictable performance at scale
•  Going to multi data-centers
☞ Cassandra, what else?

Data Migration Strategy

Objectives
•  No downtime
•  No concurrency corner-cases
•  Safe rollback possible
•  Replay-ability & resume-ability

Strategy
•  4 phases
    •  Write contacts to both data stores
    •  Old contacts migration
    •  Switch to Cassandra (but keep RDBMS in case of…)
    •  Remove the RDBMS code

Migration Phase 1
[Diagram: Back end server; SQL cluster; C* cluster; Write. Labels: "contactUUID" and "contactId(long) + contactUUID"]

contactId   …   contactUUID
129363      …   123e4567-e89b-12d3…
834849      …

[Diagram: Back end server; SQL cluster; C* cluster; Read]
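The double-write of phase 1 can be sketched as follows. This is only an illustration under stated assumptions: `InMemoryStore` and `DualWriteContactRepository` are hypothetical names, both stores are in-memory maps, a random UUID stands in for the timeuuid, and the choice to keep reads on the RDBMS during this phase is an assumption rather than something the slides state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** In-memory stand-in for one data store (hypothetical, illustration only). */
final class InMemoryStore {
    private final Map<UUID, String> rows = new HashMap<>();

    void save(UUID contactUuid, String name) { rows.put(contactUuid, name); }

    String find(UUID contactUuid) { return rows.get(contactUuid); }
}

/** Phase 1 sketch: every write goes to both stores under the same contactUUID. */
final class DualWriteContactRepository {
    private final InMemoryStore sql;
    private final InMemoryStore cassandra;

    DualWriteContactRepository(InMemoryStore sql, InMemoryStore cassandra) {
        this.sql = sql;
        this.cassandra = cassandra;
    }

    /** Creates the contact in both stores and returns the shared identifier. */
    UUID create(String name) {
        // Assumption: a random UUID replaces the timeuuid used in production.
        UUID contactUuid = UUID.randomUUID();
        sql.save(contactUuid, name);       // the SQL row also carries contactUUID
        cassandra.save(contactUuid, name); // same identifier on the Cassandra side
        return contactUuid;
    }

    /** Assumption: reads are still served by the RDBMS in this phase. */
    String read(UUID contactUuid) { return sql.find(contactUuid); }
}
```

Because both stores share the contactUUID, a later phase can switch `read` to Cassandra without changing callers.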

Migration Phase 2
•  On live production, migrate old contacts (created before phase 1)

For each batch of users:
SELECT * FROM contacts WHERE user_id = … AND contact_uuid IS NULL

Logged batches of:
INSERT INTO contacts(..) VALUES(…) USING TIMESTAMP now() - 1 week

USING TIMESTAMP now() - 1 week 😳

Migration Phase 2
•  During data migration …
    •  … concurrent writes from the migration batch …
    •  … and updates from production for the same contact

contact_uuid   name (now - 1 week)   …   name (now)   …
               Johny                 …   Johnny       …

•  Insert from batch (to the past): name (now - 1 week)
•  Update from production: name (now)
•  Future reads pick the most up-to-date value

Last Write Wins in action
•  Case 1: t1 Batchpast(Johny), t2 Prodnow(Johnny), t3 Read(Johnny)
•  Case 2: t1 Prodnow(Johnny), t2 Batchpast(Johny), t3 Read(Johnny)
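Both cases can be replayed with a toy model of a single cell. This is a sketch, not Cassandra's actual reconciliation code: `LastWriteWinsCell` and `LastWriteWinsDemo` are hypothetical names, timestamps are plain microsecond counters, and tie-breaking on equal timestamps is not modelled.

```java
/** Toy model of one cell resolved by write timestamp (illustration only). */
final class LastWriteWinsCell {
    private String value;
    private long writeTimestamp = Long.MIN_VALUE;

    /** Keeps the incoming value only if its timestamp is newer than the stored one. */
    void write(String newValue, long timestampMicros) {
        if (timestampMicros > writeTimestamp) {
            value = newValue;
            writeTimestamp = timestampMicros;
        }
    }

    String read() { return value; }
}

final class LastWriteWinsDemo {
    static final long ONE_WEEK_MICROS = 7L * 24 * 60 * 60 * 1_000_000L;

    /** Case 1: the migration batch arrives first, production second. */
    static String batchThenProd(long nowMicros) {
        LastWriteWinsCell name = new LastWriteWinsCell();
        name.write("Johny", nowMicros - ONE_WEEK_MICROS); // batch writes to the past
        name.write("Johnny", nowMicros);                  // production writes "now"
        return name.read();
    }

    /** Case 2: production arrives first, the migration batch second. */
    static String prodThenBatch(long nowMicros) {
        LastWriteWinsCell name = new LastWriteWinsCell();
        name.write("Johnny", nowMicros);                  // production writes "now"
        name.write("Johny", nowMicros - ONE_WEEK_MICROS); // older timestamp, ignored
        return name.read();
    }
}
```

Whatever the arrival order, the read returns the production value, because the batch value always carries the older timestamp.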

Migration Phase 2

"Write to the Past… to save the Future"
Libon – 2014/10/08

Migration Phase 3
[Diagram: Back end server; SQL cluster; C* cluster; Write]

Migration Phase 4
[Diagram: Back end server; SQL cluster; C* cluster; Write]

Business Code Refactoring

Code Inventory
•  Written for RDBMS
•  Lots of joins (no surprise)
•  Designed around transactions
•  Spring @Transactional everywhere

Code Inventory cont.
•  Entities go through Services & Repositories
[Diagram: Repositories; Services; ContactEntity]
•  Hibernate is auto-magic
    •  lazy loading
    •  1st level cache
    •  N+1 select

Which options?
•  Throw away existing code …
•  … and re-design from scratch for Cassandra

No way!

Code Quality
•  Existing business code has…
    •  … ≈ 3500 unit tests
    •  and ≈ 600+ integration tests

"The code coverage is one of your most valuable technical assets"
Libon – since beginning

Refactoring Strategy
[Diagram, built up over three slides: Services layer (ContactMatchingService, ContactService, ContactSync) and Repositories layer around ContactEntity (labels n, 1, n, n); then a Proxy and ContactNoSQLEntity are added; then Denorm1, Denorm2 … DenormN]

Refactoring Strategy
•  Use CQRS
    •  ContactReadRepository
    •  ContactWriteRepository
    •  ContactUpdateRepository
    •  ContactDeleteRepository
•  ContactReadRepository
    •  direct sequential read
    •  no joins
    •  1 read ≈ 1 SELECT
•  ContactWriteRepository
    •  write to all denormalized tables
    •  using CQL logged batches
    •  use TTLs
•  ContactUpdateRepository
    •  read-before-write most of the time 😟
    •  rare updates ☞ acceptable perf penalty
•  ContactDeleteRepository
    •  delete by partition key

Outcome
•  5 months of work for 2 people
•  Many iterations to fix bugs (thanks to IT)
•  Lots of performance benchmarks using Gatling
☞ data model & code validation
•  … we are almost there for production

Gatling Output
[Figure: Gatling benchmark report, not captured in the transcript]

Data Model

Denormalization, the good
•  Support fast reads
•  1 read ≈ 1 SELECT
•  Worthwhile because mostly reads, few updates

Denormalization, the bad
•  Updating mutable data can be a nightmare
•  Data model bound by existing client-facing API
•  Update paths very error-prone without tests

Data model in detail
•  Contacts_by_id
•  Contacts_by_identifiers
•  Contacts_in_profiles
•  Contacts_by_modification_date
•  Contacts_by_firstname_lastname
•  Contacts_linked_user
☞ user_id is always a component of the partition key

Scalable design
[Diagram, built up over two slides: ring of 8 nodes n1 … n8 with token ranges A … H; partitions user_id1 … user_id5 placed on the ring]

Bloom filters in action
•  For some tables, partition key = (user_id, contact_id)
☞ fast look-up, leverages Bloom filters
☞ touches 1 SSTable most of the time

Data model in detail
[Same list of tables as above, with the annotation "Wide partition"]

A "queue" story
•  contacts_by_modification_date
•  queue-like pattern 😭
☞ buckets to the rescue

user_id:2014-12   date35   date12   …   …   date47
                  …        …        …   …
user_id:2014-11   date11   date12   …   …   date34
                  …        …        …   …
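The bucketing idea can be sketched with the JDK's `java.time` types. The class and method names (`MonthBuckets`, `bucketOf`, `bucketsBetween`) are hypothetical, and the choice of UTC for cutting months is an assumption; the slides only show that the partition combines user_id with a year-month label such as 2014-12.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

/** Sketch of month bucketing for a queue-like table (illustration only). */
final class MonthBuckets {

    /** Bucket label for a modification date, in the form "2014-12". */
    static String bucketOf(Instant modificationDate) {
        return YearMonth.from(modificationDate.atZone(ZoneOffset.UTC)).toString();
    }

    /** Buckets to query, oldest first, to read all changes between two instants. */
    static List<String> bucketsBetween(Instant from, Instant to) {
        YearMonth current = YearMonth.from(from.atZone(ZoneOffset.UTC));
        YearMonth last = YearMonth.from(to.atZone(ZoneOffset.UTC));
        List<String> buckets = new ArrayList<>();
        while (!current.isAfter(last)) {
            buckets.add(current.toString());
            current = current.plusMonths(1);
        }
        return buckets;
    }
}
```

Each bucket maps to one partition, so a reader walks a bounded number of partitions instead of scanning one ever-growing row.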

Data model summary
•  7 tables for denormalization
•  Normalize some tables because rare access
•  Read-before-write in most update scenarios 😟

Notes on contact_id
•  In SQL, auto-generated long using sequence
•  In Cassandra, auto-generated timeuuid
•  How to store both types?
    •  As text? ☞ easy solution …
    •  … but waste of space!
    •  because encoded as UTF-8 or ASCII in Cassandra

Notes on contact_id
•  Long ☞ 8 bytes
•  Long as text (UTF-8: 1 byte per digit) ☞ "digits count" bytes
•  UUID ☞ 16 bytes
    E81D4C70-A638-11E4-83CB-DEB70BF9330F
•  32 hex chars + 4 hyphens = 36 chars
•  UUID as text (UTF-8: 1 byte per char) ☞ 36 bytes
•  Bytes overhead = 36 - 16 = 20 bytes

Notes on contact_id
•  20 bytes wasted per contact uuid
•  × 7 denormalizations = 140 bytes per contact uuid
•  × 10^9 contacts = 140 GB wasted
😠 not even counting replication factor …
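The arithmetic above can be checked with a few lines of Java. `ContactIdOverhead` and its methods are hypothetical names for this sketch; the figures (7 denormalizations, 10^9 contacts) come from the slides.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Recomputes the storage overhead of keeping a UUID as text (illustration only). */
final class ContactIdOverhead {
    static final int UUID_BINARY_BYTES = 16;

    /** Size of the canonical text form once encoded as UTF-8. */
    static int uuidTextBytes(UUID uuid) {
        return uuid.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    /** Total bytes wasted across all denormalized copies of every contact. */
    static long wastedBytes(UUID sample, int denormalizations, long contacts) {
        long perCopy = uuidTextBytes(sample) - UUID_BINARY_BYTES;
        return perCopy * denormalizations * contacts;
    }
}
```

Calling `wastedBytes` with the sample UUID from the slide, 7 denormalizations and 1,000,000,000 contacts gives 140,000,000,000 bytes, the 140 GB quoted above, before replication.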

Notes on contact_id
•  ☞ just save contact id as byte[ ]
•  Achilles @TypeTransformer for automatic conversion (see later)
•  Use blobAsBigInt( ) or blobAsUUID( ) to view data

Achilles
•  Advanced "object mapper"
•  Fluent API
•  Tons of features
•  TDD friendly

Achilles
•  Dirty checking, what is it?
•  1 contact ≈ 8 mutable fields
•  × 7 denormalizations = 56 update combinations …
•  and not even counting multiple fields updates …
•  Are you going to manually generate 56+ prepared statements for all possible updates?
•  Or just use dynamic plain string statements and get some perf penalty?

Achilles
•  Dirty check in action

//No read-before-write
ContactEntity proxy = manager.forUpdate(ContactEntity.class, contactId);
proxy.setFirstName(…);
proxy.setLastName(…); //type-safe updates
proxy.setAddress(…);

manager.update(proxy);

[Diagram: Proxy wrapping an Empty Entity; Setters interception; DirtyMap; PrimaryKey]

Achilles
•  Dynamic statements generation

UPDATE contacts SET firstname=?, lastname=?,address=? WHERE contact_id=?

prepared statements are cached, of course
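The mechanism (setter interception feeding a dirty map, which then drives statement generation) can be sketched with a JDK dynamic proxy. This is not Achilles' implementation: `ContactSetters` and `DirtyTracker` are hypothetical names, and the column name is simply assumed to be the lower-cased setter suffix.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical entity contract; only setters matter for this sketch. */
interface ContactSetters {
    void setFirstname(String value);
    void setLastname(String value);
    void setAddress(String value);
}

/** Records which setters were called and builds the matching UPDATE. */
final class DirtyTracker implements InvocationHandler {
    final Map<String, Object> dirty = new LinkedHashMap<>();

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (method.getDeclaringClass() == Object.class) {
            return method.invoke(this, args); // toString/hashCode/equals go to the tracker
        }
        String name = method.getName();
        if (name.startsWith("set") && args != null && args.length == 1) {
            // Assumption: column name = setter suffix in lower case.
            dirty.put(name.substring(3).toLowerCase(Locale.ROOT), args[0]);
        }
        return null;
    }

    /** Creates the proxy handed to business code; no entity is read from the database. */
    ContactSetters newProxy() {
        return (ContactSetters) Proxy.newProxyInstance(
                ContactSetters.class.getClassLoader(),
                new Class<?>[] {ContactSetters.class},
                this);
    }

    /** UPDATE statement covering only the columns that were actually set. */
    String updateStatement() {
        StringJoiner assignments = new StringJoiner(", ");
        for (String column : dirty.keySet()) {
            assignments.add(column + "=?");
        }
        return "UPDATE contacts SET " + assignments + " WHERE contact_id=?";
    }
}
```

Setting only the first name and the address yields a statement with those two columns, which is then a natural key for a prepared-statement cache.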

Achilles
•  Insert strategy, why is it so important?
•  Simple INSERT prepared statement

INSERT INTO contacts(contact_id,name,age,address,gender,avatar,…) VALUES(?, ?, ?, ? … ?);

•  Runtime values binding
    •  some columns are optional

preparedStatement.bind(49374, "John DOE", 33, null, null, …, null);

Wait … are you saying we are inserting null in CQL??? 😳

Inserting null ≡ creating tombstones
× 7 denormalizations
× billions of contacts created
😱 not even counting replication factor …

Achilles
•  Simple annotation

@Entity(table = "contacts_by_id")
@Strategy(insert = InsertStrategy.NOT_NULL_FIELDS)
public class ContactById {

}

•  Runtime dynamic INSERT statement

INSERT INTO contacts(contact_id, name, age, address) VALUES(:contact_id, :name, :age, :address);

prepared statements are cached, of course
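What a "not null fields" insert strategy boils down to can be sketched in plain Java. This is an illustration of the idea only, not Achilles' code: `NotNullInsertBuilder` is a hypothetical helper that receives column values as a map and emits named bind markers.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Builds an INSERT that lists only the columns carrying a value (illustration only). */
final class NotNullInsertBuilder {

    /** Columns whose value is null are skipped, so no tombstone is written for them. */
    static String insertStatement(String table, Map<String, Object> columns) {
        StringJoiner names = new StringJoiner(", ");
        StringJoiner markers = new StringJoiner(", ");
        for (Map.Entry<String, Object> column : columns.entrySet()) {
            if (column.getValue() != null) {
                names.add(column.getKey());
                markers.add(":" + column.getKey());
            }
        }
        return "INSERT INTO " + table + "(" + names + ") VALUES(" + markers + ");";
    }
}
```

Pass an insertion-ordered map (for example a `LinkedHashMap`) so the column order is stable and the generated string can serve as a cache key for the prepared statement.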

Achilles
•  Remember the contactId ⇄ byte[ ] conversion?

@PartitionKey
@Column(name = "contact_id")
@TypeTransformer(valueCodecClass = ContactIdToBytes.class)
private ContactId contactId;

BYOC ☞ Bring Your Own Codec

public interface Codec<FROM, TO> {
    Class<FROM> sourceType();
    Class<TO> targetType();
    TO encode(FROM fromJava);
    FROM decode(TO fromCassandra);
}
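A possible shape for the conversion itself is sketched below. The slides do not show the body of ContactIdToBytes, so everything here is an assumption: the `ContactId` value type holding either a legacy long or a UUID, the big-endian layout, and telling the two apart by length (8 or 16 bytes). Wiring into the Achilles `Codec` interface is left out.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical value type: either a legacy SQL sequence id or a UUID. */
final class ContactId {
    final Long legacyId; // non-null for ids that came from the SQL sequence
    final UUID uuid;     // non-null for ids generated as UUIDs

    private ContactId(Long legacyId, UUID uuid) {
        this.legacyId = legacyId;
        this.uuid = uuid;
    }

    static ContactId ofLong(long id) { return new ContactId(id, null); }

    static ContactId ofUuid(UUID id) { return new ContactId(null, id); }
}

/** Sketch of the ContactId to byte[] conversion (assumed layout: big-endian). */
final class ContactIdToBytes {

    /** 8 bytes for a legacy long, 16 bytes for a UUID. */
    static byte[] encode(ContactId id) {
        if (id.uuid != null) {
            return ByteBuffer.allocate(16)
                    .putLong(id.uuid.getMostSignificantBits())
                    .putLong(id.uuid.getLeastSignificantBits())
                    .array();
        }
        return ByteBuffer.allocate(8).putLong(id.legacyId).array();
    }

    /** The length of the blob tells which kind of id it holds. */
    static ContactId decode(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        if (bytes.length == 16) {
            long mostSignificant = buffer.getLong();
            long leastSignificant = buffer.getLong();
            return ContactId.ofUuid(new UUID(mostSignificant, leastSignificant));
        }
        if (bytes.length == 8) {
            return ContactId.ofLong(buffer.getLong());
        }
        throw new IllegalArgumentException("Unexpected contact id length: " + bytes.length);
    }
}
```

With this layout the stored blob takes 8 or 16 bytes instead of the text form, which is the saving computed in the contact_id notes.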

Achilles
•  Dynamic logging in action

2014-12-01 14:25:20,554 Bound statement : [INSERT INTO contacts.contacts_by_modification_date(user_id,month_bucket,modification_date,...) VALUES (:user_id,:month_bucket,:modification_date,...) USING TTL :ttl;] with CONSISTENCY LEVEL [LOCAL_QUORUM]
2014-12-01 14:25:20,554 bound values : [222130151, 2014-12, e13d0d50-7965-11e4-af38-90b11c2549e0, ...]

2014-12-01 14:25:20,701 Bound statement : [SELECT birthday,middlename,avatar_size,... FROM contacts.contacts_by_modification_date WHERE user_id=:user_id AND month_bucket=:month_bucket AND (modification_date)>=(:modification_date) ORDER BY modification_date ASC;] with CONSISTENCY LEVEL [LOCAL_QUORUM]
2014-12-01 14:25:20,701 bound values : [222130151, 2014-10, be6bc010-6109-11e4-b385-000038377ead]

Achilles
•  Dynamic logging
    •  runtime activation
    •  no need to recompile/re-deploy
    •  saves us hours of debugging
    •  TRACE log level ☞ query tracing

Take Away

Conditions for success
•  Data modeling is crucial
•  Double-run strategy & timestamp trick FTW
•  Data type conversion can be tricky
•  Benchmark!
•  Mindset shifts for the team

Thank You