NoSQL Databases Introduction - UTN 2013

Jan 27, 2015

facundis

This was one of the workshops that we gave at UTN University to Computer Science students.
Page 1: NoSQL Databases Introduction - UTN 2013

NoSQL Databases

Introduction

[email protected]

October, 2013

Page 2: NoSQL Databases Introduction - UTN 2013

Agenda

Introduction

SQL overview

Why NoSQL?

Characteristics of NoSQL databases

Use Cases

A NoSQL database in action!

Summary

Page 3: NoSQL Databases Introduction - UTN 2013

Introduction

A database is an organized collection of data. The data are typically organized to model relevant aspects of reality in a way that supports processes requiring this information.

Database management systems (DBMSs) are specially designed applications that interact with the user, other applications, and the database itself to capture and analyze data.

Formally, the term database refers to the data itself and supporting data structures. Databases are created to operate on large quantities of information by inputting, storing, retrieving, and managing that information.

Page 4: NoSQL Databases Introduction - UTN 2013

SQL Databases

Page 5: NoSQL Databases Introduction - UTN 2013

Characteristics

SQL is an ANSI and ISO standard computer language for creating and manipulating databases.

SQL allows the user to create, update, delete, and retrieve data from a database.

SQL is very simple and easy to learn.

High Speed: SQL queries can be used to retrieve large numbers of records from a database quickly and efficiently.

Well-Defined Standards Exist: SQL databases use a long-established standard, which has been adopted by ANSI and ISO. Non-SQL databases do not adhere to any clear standard.

No Coding Required: With standard SQL, it is easier to manage database systems without having to write a substantial amount of code.

Transactions – ACID Properties (Atomic, Consistent, Isolated, Durable)
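
The characteristics above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration, not part of the original workshop: the students table, its columns, and the run_demo function are hypothetical names chosen for the example. It creates, updates, deletes, and retrieves data, then shows atomicity: a transaction that fails halfway leaves no partial change behind.

```python
import sqlite3


def run_demo():
    """Create, update, delete, and query rows, then show an atomic rollback."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE students ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, credits INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO students (name, credits) VALUES (?, ?)",
        [("Ana", 10), ("Luis", 20)],
    )
    conn.execute("UPDATE students SET credits = credits + 5 WHERE name = ?", ("Ana",))
    conn.execute("DELETE FROM students WHERE name = ?", ("Luis",))
    conn.commit()

    # Atomicity: the UPDATE below is undone because the INSERT in the same
    # transaction violates the NOT NULL constraint.
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("UPDATE students SET credits = 0 WHERE name = 'Ana'")
            conn.execute("INSERT INTO students (name, credits) VALUES (NULL, 1)")
    except sqlite3.IntegrityError:
        pass

    rows = conn.execute("SELECT name, credits FROM students ORDER BY name").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo())
```

After the failed transaction, the only remaining row still carries the credits value from before the transaction started.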

Page 6: NoSQL Databases Introduction - UTN 2013

What has happened?

Relational databases were introduced in the 1970s to allow applications to store data through a standard data modeling and query language (SQL). Since the rise of the web, the volume of data stored about users, objects, products and events has exploded. Data is also accessed more frequently, and is processed more intensively; for example, social networks create hundreds of millions of customized, real-time activity feeds for users based on their connections' activities.

In response to this demand, computing infrastructure and deployment strategies have also changed dramatically. Low-cost, commodity cloud hardware has emerged to replace vertical scaling on highly complex and expensive single-server deployments. And engineers now use agile development methods, which aim for continuous deployment and short development cycles, to allow for quick response to user demand for features.

Page 7: NoSQL Databases Introduction - UTN 2013

NoSQL Databases

Page 8: NoSQL Databases Introduction - UTN 2013

But... what’s NoSQL?

A NoSQL database provides a mechanism for storage and retrieval of data that employs less constrained consistency models than traditional relational databases.

NoSQL systems are also referred to as "Not only SQL" to emphasize that they do in fact allow SQL-like query languages to be used.

Page 9: NoSQL Databases Introduction - UTN 2013

Characteristics

Large data volumes (such as Google’s ‘big data’)

Scalable replication and distribution

Potentially thousands of machines

Potentially distributed around the world

Queries need to return answers quickly

Mostly query, few updates

Asynchronous Inserts & Updates

Schema-less

ACID transaction properties are relaxed in favor of BASE (Basically Available, Soft state, Eventual consistency).

CAP Theorem

Open source development

Page 10: NoSQL Databases Introduction - UTN 2013

CAP Theorem

According to the CAP theorem, a distributed system cannot satisfy all three of these guarantees at the same time: consistency, availability, and partition tolerance.

Eventual consistency guarantees that if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
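
Eventual consistency can be sketched as a toy simulation in Python. This is an illustrative assumption, not how any particular NoSQL product works: the Replica class, the sync function, and the integer version numbers are all hypothetical. One replica receives an update first, so reads disagree for a while. Once updates stop and the replicas exchange state, every read returns the last written value.

```python
class Replica:
    """A toy replica that keeps the newest (version, value) pair per key."""

    def __init__(self):
        self.data = {}

    def write(self, key, version, value):
        current = self.data.get(key)
        # Accept the write only if it is newer than what we already hold.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(replicas):
    """One propagation pass: each replica offers its entries to all the others."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, version, value)


def demo():
    a, b, c = Replica(), Replica(), Replica()
    replicas = [a, b, c]
    for replica in replicas:
        replica.write("user:1", 1, "old")
    a.write("user:1", 2, "new")  # only replica a has seen the update so far

    before = [r.read("user:1") for r in replicas]  # stale reads are possible
    sync(replicas)                                 # no new updates, state spreads
    after = [r.read("user:1") for r in replicas]   # all replicas now agree
    return before, after


if __name__ == "__main__":
    print(demo())
```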

Page 11: NoSQL Databases Introduction - UTN 2013

Taxonomy

The basic classification that most would agree on is based on data model. A few of these and their prototypes are:

Column: HBase, Accumulo

Document: MongoDB, Couchbase

Key-value: Dynamo, Riak, Redis, Cache, Project Voldemort

Graph: Neo4J, Allegro, Virtuoso
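
To make the four data models more concrete, here is a rough sketch of how the same fact (a student's grade in one course) might be shaped under each of them, using plain Python structures. The record layout, keys, and function names are hypothetical and do not reflect the storage format of any listed product.

```python
import json

# Key-value: the store sees an opaque value, so the whole value is fetched by key.
key_value = {"student:42": json.dumps({"name": "Ana", "grades": {"DB1": 9}})}

# Document: a nested structure whose fields can be inspected individually.
document = {"_id": 42, "name": "Ana", "grades": [{"course": "DB1", "grade": 9}]}

# Column: row key -> column family -> column -> value.
wide_column = {"42": {"info": {"name": "Ana"}, "grades": {"DB1": 9}}}

# Graph: nodes plus labeled edges that can carry their own properties.
graph = {
    "nodes": {"student:42": {"name": "Ana"}, "course:DB1": {"title": "Databases I"}},
    "edges": [("student:42", "ENROLLED_IN", "course:DB1", {"grade": 9})],
}


def grade_key_value():
    # The application must decode the entire value to reach one field.
    return json.loads(key_value["student:42"])["grades"]["DB1"]


def grade_document():
    return next(g["grade"] for g in document["grades"] if g["course"] == "DB1")


def grade_wide_column():
    return wide_column["42"]["grades"]["DB1"]


def grade_graph():
    return next(
        props["grade"]
        for src, label, dst, props in graph["edges"]
        if src == "student:42" and label == "ENROLLED_IN" and dst == "course:DB1"
    )
```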

Page 12: NoSQL Databases Introduction - UTN 2013

MapReduce

A MapReduce program is composed of a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies).
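
The student example above can be sketched in a few lines of single-process Python. This only imitates the shape of a MapReduce job; real frameworks distribute the phases across many machines. The function names (map_phase, shuffle, reduce_phase, name_frequencies) and the sample students are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(students):
    """Map: emit a (first_name, full_name) pair for each student."""
    for full_name in students:
        first_name = full_name.split()[0]
        yield first_name, full_name


def shuffle(pairs):
    """Group the emitted pairs into one queue per first name, sorted by name."""
    queues = defaultdict(list)
    for key, value in pairs:
        queues[key].append(value)
    return dict(sorted(queues.items()))


def reduce_phase(queues):
    """Reduce: summarize each queue by counting its members."""
    return {name: len(members) for name, members in queues.items()}


def name_frequencies(students):
    return reduce_phase(shuffle(map_phase(students)))


if __name__ == "__main__":
    print(name_frequencies(["Ana Perez", "Luis Gomez", "Ana Diaz"]))
```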

Page 13: NoSQL Databases Introduction - UTN 2013

NoSQL is not a magic solution

Inconsistent APIs between NoSQL providers.

Denormalized data requires you to maintain your own data relationships in code.

Limited operational tooling for DevOps / IT.

The lack of complex queries means joins, aggregations, and filters must be done in code (except where MapReduce is available).

Reading or writing any part of a value requires fetching the whole value by its key.
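
As a small illustration of doing joins and aggregations in application code, the sketch below combines two collections by hand. The users and checkins data and both function names are hypothetical; in a relational database the same result would come from a JOIN and a GROUP BY.

```python
users = {"u1": {"name": "Ana"}, "u2": {"name": "Luis"}}

checkins = [
    {"user_id": "u1", "venue": "Library"},
    {"user_id": "u2", "venue": "Cafe"},
    {"user_id": "u1", "venue": "Cafe"},
]


def checkins_with_names(users, checkins):
    """Manual join: look up each check-in's user and attach the name."""
    return [
        {"name": users[c["user_id"]]["name"], "venue": c["venue"]}
        for c in checkins
        if c["user_id"] in users  # skip check-ins whose user is missing
    ]


def checkins_per_venue(checkins):
    """Manual aggregation: count check-ins per venue."""
    counts = {}
    for c in checkins:
        counts[c["venue"]] = counts.get(c["venue"], 0) + 1
    return counts
```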

Page 14: NoSQL Databases Introduction - UTN 2013

NoSQL Use Cases:

SAP uses MongoDB as a core component of SAP’s platform-as-a-service (PaaS) offering.

Foursquare uses MongoDB to store venues and user ‘check-ins’ into venues, sharding the data over more than 25 machines on Amazon EC2.

MongoDB is used for back-end storage on the SourceForge front pages, project pages, and download pages for all projects.

Codecademy is the easiest way to learn to code online.

Guardian.co.uk is a leading UK-based news website.

EA Sports: MongoDB is being used for the game feeds component.

Page 15: NoSQL Databases Introduction - UTN 2013

NoSQL Use Cases:

AOL: “We selected Couchbase after evaluating several open source products to power our next-generation backend ad serving platform”.

Zynga’s FarmVille, Café World, Mafia Wars and other games have over 235 million active users per month. We rely on technology from Couchbase to make that possible.

In the PayPal Media Network Advertising Pipeline, Couchbase is used to build a scalable cross channel audience profiling, segmentation, identity mapping & frequency capping.

LinkedIn built a durable and scalable index for its metrics visualization engine using Couchbase.

Skyscanner scaled one of its flight search APIs from 100,000 searches a day to over 3 million by introducing Couchbase into its tech stack.

Page 16: NoSQL Databases Introduction - UTN 2013

Other use cases

Netflix uses Amazon SimpleDB.

Twitter uses Cassandra, Hadoop, and HBase, among others.

Facebook and Instagram both use Cassandra.

Google uses Bigtable (the design that Apache HBase is modeled on).

LinkedIn uses Voldemort.

Etc.

Page 17: NoSQL Databases Introduction - UTN 2013

Summary

This is just the tip of the iceberg.

From here on, the rest is up to you!

SQL works great, but can’t scale for large data.

NoSQL works great, but can’t fit every case.

Use SQL + NoSQL

Page 19: NoSQL Databases Introduction - UTN 2013

Thanks!

Page 20: NoSQL Databases Introduction - UTN 2013

Backup

Page 21: NoSQL Databases Introduction - UTN 2013

JSON

JSON, or JavaScript Object Notation, is a text-based open standard designed for human-readable data interchange. Derived from the JavaScript scripting language, JSON is a language for representing simple data structures and associative arrays, called objects. Despite its relationship to JavaScript, JSON is language-independent, with parsers available for many languages.

Sample:
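
A minimal sketch of what such a JSON document could look like, parsed and re-serialized with Python's standard json module. The field names and values are hypothetical, chosen only to show objects, arrays, strings, numbers, and booleans together.

```python
import json

# A JSON text containing an object with a nested object and an array.
sample_text = """
{
  "name": "Ana",
  "age": 21,
  "enrolled": true,
  "courses": ["Databases", "Algorithms"],
  "address": {"city": "Buenos Aires", "country": "Argentina"}
}
"""

# Parsing turns JSON objects into dicts, arrays into lists, true into True.
student = json.loads(sample_text)

# Serializing goes the other way; sort_keys gives a stable key order.
round_trip = json.dumps(student, sort_keys=True)

if __name__ == "__main__":
    print(student["courses"])
    print(round_trip)
```

Parsers in other languages would read the same text into their own native structures, which is what makes JSON language-independent.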