Webinar: Strongly Typed Languages and Flexible Schemas

Strongly Typed Languages and Flexible Schemas

Agenda

Strongly Typed Languages

Flexible Schema Databases

Change Management

Strategies

Tradeoffs

Strongly Typed Languages

"A programming language that requires a variable to be defined as well as the type of the variable"

Traditional RDBMS

create table users (id int, firstname text, lastname text);

Table definition

Column structure

Traditional RDBMS

Table with checks

create table cat_pictures(
    id int not null,
    size int not null,
    picture blob not null,
    user_id int,
    primary key (id),
    foreign key (user_id) references users(id));

Null checks

Foreign and Primary key checks

Traditional RDBMS

[Diagram: the users and cat_pictures tables]

Is this Flexible?

• What happens when we need to change the schema?
  – Add new fields
  – Add new relations
  – Change data types

• What happens when we need to scale out our data structure?

Flexible Schema Database

Document, Graph, Key-Value

Flexible Schema

• No mandatory schema definition
• No structure restrictions
• No schema validation process

We start from code

public class CatPicture {
    int size;
    byte[] blob;
}

public class User {
    int id;
    String firstname;
    String lastname;
    CatPicture[] cat_pictures;
}

Document Structure

{ _id: 1234, firstname: 'Juan', lastname: 'Olivo', cat_pictures: [ { size: 10, picture: BinData("0x133334299399299432"), } ]}

Rich Data Types

Embedded Documents

• Challenges
  – Different Versions of Documents
  – Different Structures of Documents
  – Different Value Types for Fields in Documents

Different Versions of Documents

The same document changes over time in how it represents data

First Version

{ "_id" : 174, "firstname": "Juan" }

Second Version

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }

Third Version

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" , "cat_pictures": [{"size": 10, picture: BinData("0x133334299399299432")}]}

Different Versions of Documents

The same document can also change its structure over time

{ "_id" : 174, "firstname": "Juan" }

{ "_id" : 174, "name": { "first": "Juan", "last": "Olivo"} }

Different Structure

Different Structures of Documents

Different documents coexisting in the same collection

{ "_id" : 175, "brand": "Ford", "model": "Mustang", "date": ISODate("XXX") }

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }

Within the same collection

Different Data Types for Fields

Different documents coexisting in the same collection

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "bdate": 1224234312}

{ "_id" : 175, "firstname": "Paco", "lastname": "Hernan", "bdate": "2015-06-27"}

{ "_id" : 176, "firstname": "Tomas", "lastname": "Marce", "bdate": ISODate("2015-06-27")}

Same field, different data type

Change Management

Versioning Class Loading

How do we version data formats correctly?

What mechanisms are out there to make this work?

Strategies

• Decoupled Architectures
• ODMs
• Versioning
• Data Migrations

Decoupled Architectures

Strongly Coupled

Quickly becomes a tangled mess…

Coupled Architectures

[Diagram: Applications A, B and C all connect directly to the Database; Application B: "Let me perform some schema changes!"]

Decoupled Architecture

[Diagram: Applications A, B and C reach the Database only through an API]

Decoupled Architectures

• Allows the business logic to evolve independently of the data layer

• Decouples the underlying storage / persistence option from the business service

• Changes are "requested" and not imposed across all applications

• Better version control of each request and its mapping
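To make the decoupling concrete, here is a minimal Java sketch. `UserApi` and `UserView` are hypothetical names, and an in-memory map stands in for the users collection. Only the API layer knows that a stored user may use either the flat `firstname`/`lastname` fields or the embedded `name` sub-document, so the applications behind it keep one stable type while the storage format changes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain type that the applications depend on.
final class UserView {
    final int id;
    final String first;
    final String last;

    UserView(int id, String first, String last) {
        this.id = id;
        this.first = first;
        this.last = last;
    }
}

// Hypothetical API layer: the only component that knows the storage layout.
final class UserApi {
    // In-memory stand-in for a "users" collection, keyed by _id.
    private final Map<Integer, Map<String, Object>> store = new HashMap<>();

    void putRaw(int id, Map<String, Object> document) {
        store.put(id, document);
    }

    // Maps either document layout to the same UserView; returns null if absent.
    @SuppressWarnings("unchecked")
    UserView findUser(int id) {
        Map<String, Object> doc = store.get(id);
        if (doc == null) {
            return null;
        }
        Object name = doc.get("name");
        if (name instanceof Map) {
            // Newer layout: { "name": { "first": ..., "last": ... } }
            Map<String, Object> n = (Map<String, Object>) name;
            return new UserView(id, (String) n.get("first"), (String) n.get("last"));
        }
        // Older layout: { "firstname": ..., "lastname": ... } (lastname may be missing)
        return new UserView(id, (String) doc.get("firstname"), (String) doc.get("lastname"));
    }
}
```

If the storage layout changes again, only `findUser` needs a new branch; callers of `UserApi` are unaffected.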

ODMs

• Reduce the impedance mismatch between code and databases
• Data management facilitator
• Hides the complexity of operators
• Tries to decouple business complexity with "magic" recipes

Spring Data

• POJO-centric model
• MongoTemplate or CrudRepository extensions to make the connection to the repositories
• Uses annotations to override default field names and even data types (data type mapping)

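import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Field;
import org.springframework.data.mongodb.repository.MongoRepository;

public interface UserRepository extends MongoRepository<User, Integer> {
}

public class User {
    @Id
    int id;
    @Field("first_name")
    String firstname;
    String lastname;
}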

Spring Data Document Structure

{ "_id": 1, "first_name": "first", "lastname": "last", "catpictures": [ { "size": 10, "blob": BinData(0, "Kr3AqmvV1R9TJQ==") }, ]}
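The field-name override can be illustrated without Spring. The sketch below is not how Spring Data is implemented; it assumes a hypothetical `@StoredAs` annotation and a hypothetical `TinyMapper` to show the general mechanism an annotation-driven ODM relies on: read the annotation at runtime through reflection and use its value as the document key.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotation playing the role of Spring Data's @Field("...").
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface StoredAs {
    String value();
}

final class MappedUser {
    int id;
    @StoredAs("first_name")
    String firstname;
    String lastname;
}

// Hypothetical mapper, not a library API.
final class TinyMapper {
    // Builds a document (as a map) from an object's instance fields, using the
    // @StoredAs name when present and the Java field name otherwise.
    static Map<String, Object> toDocument(Object pojo) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Field f : pojo.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            f.setAccessible(true);
            StoredAs alias = f.getAnnotation(StoredAs.class);
            String key = alias != null ? alias.value() : f.getName();
            try {
                doc.put(key, f.get(pojo));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return doc;
    }
}
```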

Spring Data Considerations

• Data formats, versions and types still need to be managed

• Does not solve issues like type validation out of the box
• Can make things more complicated, but also more "controllable"

@Field("first_name")
String firstname;

Morphia

• Data source centric
• Will do all the discovery of POJOs for a given package
• Also uses annotations to perform overrides and deal with object mapping

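import com.mongodb.MongoClient;
import org.mongodb.morphia.Datastore;
import org.mongodb.morphia.Morphia;
import org.mongodb.morphia.annotations.Entity;
import org.mongodb.morphia.annotations.Id;

@Entity("users")
public class User {
    @Id
    int id;
    String firstname;
    String lastname;
}

Morphia morphia = new Morphia();
morphia.mapPackage("examples.odms.morphia.pojos");
Datastore datastore = morphia.createDatastore(new MongoClient(), "morphia_example");
datastore.save(user);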

Morphia Document Structure

{ "_id": 1, "className": "examples.odms.morphia.pojos.User", "firstname": "first", "lastname": "last", "catpictures": [ { "size": 10, "blob": BinData(0, "Kr3AqmvV1R9TJQ==") }, ]}

Class Definition

Morphia Considerations

• Enables better control at class loading
• Like Spring Data, also facilitates field overriding (tags to define field keys)
• Better support for object polymorphism
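Storing a discriminator such as the `className` field in the Morphia document is what allows polymorphic loading. A minimal sketch of the idea, not Morphia's code: a hypothetical `PolymorphicReader` keeps a registry from stored class names to factories. `Picture`, `CatPic` and `DogPic` are illustrative types, and a `Map` stands in for a BSON document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative class hierarchy stored in a single collection.
interface Picture {
    String describe();
}

final class CatPic implements Picture {
    final int size;
    CatPic(int size) { this.size = size; }
    public String describe() { return "cat picture, size " + size; }
}

final class DogPic implements Picture {
    final int size;
    DogPic(int size) { this.size = size; }
    public String describe() { return "dog picture, size " + size; }
}

// Hypothetical reader: picks the concrete type from the stored "className".
final class PolymorphicReader {
    private final Map<String, Function<Map<String, Object>, Picture>> factories = new HashMap<>();

    PolymorphicReader() {
        factories.put(CatPic.class.getName(), d -> new CatPic((Integer) d.get("size")));
        factories.put(DogPic.class.getName(), d -> new DogPic((Integer) d.get("size")));
    }

    Picture read(Map<String, Object> document) {
        Object className = document.get("className");
        Function<Map<String, Object>, Picture> factory = factories.get(className);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown className: " + className);
        }
        return factory.apply(document);
    }
}
```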

Versioning

Versioning of data structures (especially documents) can be very helpful

Recreate documents over time

Flow Control

Data / Field Multiversion Requirements

Archiving and History Purposes

Versioning – Option 0

Update the existing document on each write, with a monotonically increasing version number stored inside it

{ "_id" : 174, "v" : 1, "firstname": "Juan" }

{ "_id" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }

{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }

> db.users.update( {"_id": 174 }, { "$set": { ... }, "$inc": { "v": 1 } } )

Increment field value
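Option 0 can be modeled in plain Java to show its semantics. In this sketch a `Map` stands in for the stored document, and the hypothetical `InPlaceVersioning.update` applies the changed fields and increments `v`, as `$set` plus `$inc` do. Only the latest state survives; earlier versions are overwritten.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a driver API.
final class InPlaceVersioning {
    // Mirrors { "$set": { ...changes }, "$inc": { "v": 1 } } on one document.
    // Like $inc, a missing "v" is treated as 0, so the first write yields v = 1.
    static void update(Map<String, Object> doc, Map<String, Object> changes) {
        doc.putAll(changes);
        Object v = doc.get("v");
        int current = v instanceof Integer ? (Integer) v : 0;
        doc.put("v", current + 1);
    }
}
```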

Versioning – Option 1

Store a full new document on each write, with a monotonically increasing version number inside

{ "docId" : 174, "v" : 1, "firstname": "Juan" }

{ "docId" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }

{ "docId" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }

> db.users.insert( {"docId":174 …})

> db.users.find({"docId":174}).sort({"v":-1}).limit(-1);

Always find the latest version
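The same in-memory approach works for storing one document per version. `DocumentPerVersion` is a hypothetical helper: `write` inserts a full copy with the next version number, and `latest` does what the sorted `find` query does, returning the document with the highest `v` for a given `docId`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; a List stands in for the collection.
final class DocumentPerVersion {
    // One entry per stored version.
    final List<Map<String, Object>> versions = new ArrayList<>();

    // Latest version for docId, i.e. the highest "v".
    Optional<Map<String, Object>> latest(int docId) {
        return versions.stream()
                .filter(d -> Integer.valueOf(docId).equals(d.get("docId")))
                .max(Comparator.comparingInt(d -> (Integer) d.get("v")));
    }

    // Inserts a full copy of the new state with the next version number.
    void write(int docId, Map<String, Object> fields) {
        int next = latest(docId).map(d -> (Integer) d.get("v") + 1).orElse(1);
        Map<String, Object> doc = new HashMap<>(fields);
        doc.put("docId", docId);
        doc.put("v", next);
        versions.add(doc);
    }
}
```

Reads of the current state always pay for a sort over all versions of a document, which matches the "Fetch Many" cost of this option.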

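Versioning – Option 2

Store all document versions inside a single document.

> db.users.update( {"_id": 174 }, { "$set": { "current": ... }, "$addToSet": { "prev": {... } } } )

(The new "current" value carries the incremented "v": a single update cannot both $set "current" and $inc "current.v", because the two paths conflict.)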

Current value

{ "_id" : 174, "current" : { "v" :3, "attr1": 184, "attr2" : "A-1" }, "prev" : [ { "v" : 1, "attr1": 165 }, { "v" : 2, "attr1": 165, "attr2": "A-1" } ]}

Previous values
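A sketch of option 2 under the same assumptions (plain Java maps instead of BSON, hypothetical class name `EmbeddedHistory`). Each update archives the current version into `prev` and installs the new one with the next `v`; feeding it the three states from the example reproduces that document. Because the whole history lives in one document, MongoDB's 16 MB document size limit caps how much history this option can hold.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one document with "current" and "prev" fields.
final class EmbeddedHistory {
    Map<String, Object> current = new HashMap<>();
    final List<Map<String, Object>> prev = new ArrayList<>();

    // Archives the current version into "prev" (if there is one) and installs
    // the new fields as "current" with the next version number.
    void update(Map<String, Object> newFields) {
        int v = current.containsKey("v") ? (Integer) current.get("v") : 0;
        if (v > 0) {
            prev.add(current);
        }
        Map<String, Object> next = new HashMap<>(newFields);
        next.put("v", v + 1);
        current = next;
    }
}
```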

Versioning – Option 3

Keep one collection for the "current" version and another for past versions

> db.users.find( {"_id": 174 })

> db.users_past.find( {"pid": 174 })

{ "pid" : 174, "v" : 1, "firstname": "Juan" }

{ "pid" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }

{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }

Previous versions collection

Current collection
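Option 3 splits the same operation across two collections. In this hypothetical `SeparateHistory` sketch, two in-memory structures stand in for `users` and `users_past`: an update first copies the current version into the past collection, tagged with `pid`, then replaces the current document. Against a real database these are two separate writes, which is plausibly why recovering from a failure is harder with this option.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model; not a driver API.
final class SeparateHistory {
    // Stand-ins for the "users" (current) and "users_past" collections.
    final Map<Integer, Map<String, Object>> users = new HashMap<>();
    final List<Map<String, Object>> usersPast = new ArrayList<>();

    void update(int id, Map<String, Object> newFields) {
        Map<String, Object> old = users.get(id);
        int v = 0;
        if (old != null) {
            // Copy the current version into the past collection, keyed by "pid".
            // "_id" is dropped only to mirror the example; in MongoDB each past
            // document would receive its own _id.
            Map<String, Object> archived = new HashMap<>(old);
            archived.remove("_id");
            archived.put("pid", id);
            usersPast.add(archived);
            v = (Integer) old.get("v");
        }
        // Replace the current document with the next version.
        Map<String, Object> next = new HashMap<>(newFields);
        next.put("_id", id);
        next.put("v", v + 1);
        users.put(id, next);
    }
}
```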

Versioning

Schema | Fetch 1 | Fetch Many | Update | Recover if Fail
0) Increment Version | Easy, Fast | Fast | Easy, Medium | N/A
1) New Document | Easy, Fast | Not Easy, Slow | Medium | Hard
2) Embedded in Single Doc | Easy, Fastest | Easy, Fastest | Medium | N/A
3) Separate Collection | Easy, Fastest | Easy, Fastest | Medium | Medium, Hard

Migrations

Several types of "Migrations":

Add/Remove Fields

Change Field Names

Change Field Data Type

Extract Embedded Document into Collection

Add / Remove Fields

For flexible schema databases, this is our bread and butter

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "newfield": "value" }

> db.users.update( {"_id": 174}, {"$set": { "newfield": "value" }, "$unset": {"gender":""} })
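The effect of that update on one document can be sketched in plain Java. `FieldMigration.apply` is a hypothetical helper that copies the document, adds or overwrites the `$set` fields and removes the `$unset` ones; it illustrates the semantics only and does not talk to a database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not a driver API.
final class FieldMigration {
    // Fields in `set` are added or overwritten; fields named in `unset` are
    // removed if present. Returns a new map and leaves the input untouched.
    static Map<String, Object> apply(Map<String, Object> doc,
                                     Map<String, Object> set,
                                     Set<String> unset) {
        Map<String, Object> out = new HashMap<>(doc);
        out.putAll(set);
        out.keySet().removeAll(unset);
        return out;
    }
}
```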

Change Field Names

Again, you can do it programmatically

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo",}

{ "_id" : 174, "first": "Juan", "last": "Olivo" }

> db.users.update( {"_id": 174}, {"$rename": { "firstname": "first", "lastname":"last"} })
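A sketch of the `$rename` semantics on an in-memory document, using the hypothetical helper `RenameMigration.rename`. As with `$rename`, a source field that does not exist is skipped, and an existing field with the target name is overwritten.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a driver API.
final class RenameMigration {
    // For each old -> new pair, moves the value if the old field exists.
    // Returns a new map and leaves the input untouched.
    static Map<String, Object> rename(Map<String, Object> doc, Map<String, String> renames) {
        Map<String, Object> out = new LinkedHashMap<>(doc);
        for (Map.Entry<String, String> r : renames.entrySet()) {
            if (out.containsKey(r.getKey())) {
                out.put(r.getValue(), out.remove(r.getKey()));
            }
        }
        return out;
    }
}
```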

Change Field Data Type

Align with a new code change by moving "bdate" from Int to String

{..."bdate": 1435394461522} → {..."bdate": "2015-06-27"}

1) Batch Process

2) Aggregation Framework

3) Change based on usage
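Option 3, changing data based on usage, can be sketched as an upgrade-on-read step: whenever the application loads a user, it converts a numeric `bdate` and reports that the document should be written back. `LazyBdateMigration` and `upgradeOnRead` are hypothetical names, and the sketch assumes numeric values are epoch milliseconds formatted in UTC, as in the example value.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;

// Hypothetical helper; the write-back to the database is left to the caller.
final class LazyBdateMigration {
    // Lowercase "dd" is day of month; uppercase "DD" would be day of year.
    // Assumption: dates are formatted in UTC.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    // Converts a numeric bdate (epoch milliseconds) to a "yyyy-MM-dd" string.
    // Returns true if the document changed and should be saved.
    static boolean upgradeOnRead(Map<String, Object> doc) {
        Object bdate = doc.get("bdate");
        if (bdate instanceof Number) {
            long millis = ((Number) bdate).longValue();
            doc.put("bdate", FORMAT.format(Instant.ofEpochMilli(millis)));
            return true;
        }
        return false; // already a string, or absent: nothing to do
    }
}
```

Documents that are never read keep the old type, so readers must tolerate both types until every document has been touched or a batch pass finishes the job.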

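Change Field Data Type

1) Batch Process – Bulk API

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.*;
import org.bson.Document;
import com.mongodb.client.model.UpdateOneModel;

public void migrateBulk(){
    DateFormat df = new SimpleDateFormat("yyyy-MM-dd"); // "dd" = day of month ("DD" is day of year)
    df.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: dates are formatted in UTC
    ...
    List<UpdateOneModel<Document>> toUpdate = new ArrayList<UpdateOneModel<Document>>();
    for (Document doc : coll.find()){
        Object bdate = doc.get("bdate");
        if (!(bdate instanceof Number)) {
            continue; // missing or already converted
        }
        // the numeric bdate is treated as epoch milliseconds
        String dateAsString = df.format(new Date(((Number) bdate).longValue()));
        Document filter = new Document("_id", doc.get("_id"));
        Document update = new Document("$set", new Document("bdate", dateAsString));
        toUpdate.add(new UpdateOneModel<Document>(filter, update));
    }
    if (!toUpdate.isEmpty()) {
        coll.bulkWrite(toUpdate); // bulkWrite rejects an empty list of requests
    }
}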

public void migrateBulk(){
    ...
    for (Document doc : coll.find()){
        ...
    }
    coll.bulkWrite(toUpdate);
}

Is there any problem with this?

public void migrateBulk(){
    ...
    // BSON type 16 represents the int32 data type (18 would match int64)
    Document query = new Document("bdate", new Document("$type", 16));
    for (Document doc : coll.find(query)){
        ...
    }
    coll.bulkWrite(toUpdate);
}

More efficient filtering: only documents whose bdate still has the old type are read and rewritten.

Extract Document into Collection

Normalize your schema

Before (users):

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" , "cat_pictures": [{"size": 10, picture: BinData(0, "m/lhLlLmoNiUKQ==")}]}

> db.users.aggregate( [ {$unwind: "$cat_pictures"}, {$project: { "_id":0, "uid":"$_id", "size": "$cat_pictures.size", "picture": "$cat_pictures.picture"}}, {$out:"cats"}])

After (cats):

{"uid": 174, "size": 10, "picture": BinData(0, "m/lhLlLmoNiUKQ==")}

After (users, once cat_pictures is removed):

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
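What the pipeline does can be mimicked in memory to check the expected output. `ExtractEmbedded.unwindPictures` is a hypothetical helper that emits one document per `cat_pictures` element and keeps the parent `_id` as `uid`, like the `$unwind` and `$project` stages; writing the result to a `cats` collection, which `$out` does, is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a driver API.
final class ExtractEmbedded {
    // One output document per array element, carrying the parent id as "uid".
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> unwindPictures(List<Map<String, Object>> users) {
        List<Map<String, Object>> cats = new ArrayList<>();
        for (Map<String, Object> user : users) {
            Object pics = user.get("cat_pictures");
            if (!(pics instanceof List)) {
                continue; // no cat_pictures array (missing or null): nothing to emit
            }
            for (Object p : (List<Object>) pics) {
                Map<String, Object> pic = (Map<String, Object>) p;
                Map<String, Object> cat = new HashMap<>();
                cat.put("uid", user.get("_id"));
                cat.put("size", pic.get("size"));
                cat.put("picture", pic.get("picture"));
                cats.add(cat);
            }
        }
        return cats;
    }
}
```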

Tradeoffs

Approach | Positives | Penalties
Decoupled Architecture (should be your default approach) | Clean solution; Scalable | -
Data Structures Variability | Reflects today's data structures; You can push decisions for later | More complex code base
Data Structures Strictness | Simple to maintain; Always aligned with your code base | Will eventually need migrations; Restricts your code iterations

• Flexible and Dynamic Schemas are a great tool
  – Use them wisely
  – Make sure you understand the tradeoffs
  – Make sure you understand the different strategies and options
• Works well with Strongly Typed Languages

Free Education: https://university.mongodb.com/courses/M101J/about

Thank you!
• Norberto Leite
• Technical Evangelist
• http://www.mongodb.com/norberto
• norberto@mongodb.com
• @nleite
