Big Data Storages

Tatyana Matvienko, Senior Java Developer, Big data storages

Jan 24, 2017

Alina Vilk
Page 1: Tatyana Matvienko, Senior Java Developer, Big data storages

Big Data Storages

Page 2

Agenda

[Big] Data Source: when does it become Big?

What is a cluster? Horizontal and vertical scaling

[Big] Data Storage challenges

Disadvantages

NoSQL = Not only SQL

Most popular and trendy

Tech Example: Apache Cassandra architecture

Demo

Page 3

Big Data Storage Concepts

Only stores facts (events), doesn’t analyze them

Immutable

Time series data (based on timestamps and, maybe, origin)

Store everything, delete nothing

Where: messages (email, Twitter), social networks, sensor data (IoT), log files, locations
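
The concepts on this slide (immutable facts, ordered by timestamp, never deleted) can be illustrated with a small in-memory sketch. This is not taken from the slides: the `Event` and `EventLog` names, the `origin` values, and the integer timestamps are all assumptions made for illustration.

```python
from bisect import insort
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True, order=True)
class Event:
    """An immutable fact: frozen=True blocks changes after creation."""
    timestamp: int   # assumed to be Unix epoch seconds
    origin: str      # hypothetical source id, such as a sensor name
    payload: str


class EventLog:
    """Append-only store: it exposes no update or delete operation."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        # insort keeps the list sorted by (timestamp, origin, payload).
        insort(self._events, event)

    def between(self, start: int, end: int) -> List[Event]:
        """Return events with start <= timestamp < end, oldest first."""
        return [e for e in self._events if start <= e.timestamp < end]
```

A real big data store would persist and distribute these events; the sketch only shows the access pattern: write once, read by time range.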

Page 4

Cluster. Horizontal and vertical scaling

What is a cluster?

Load balancer

Communication: master/slave architecture

Fault tolerance and replication factor
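
To make the link between replication factor and fault tolerance concrete, here is a minimal sketch. It assumes a naive round-robin placement, which is not how any particular product places data; the function names and node names are hypothetical.

```python
from typing import Dict, List, Set


def place_replicas(keys: List[str], nodes: List[str], rf: int) -> Dict[str, List[str]]:
    """Assign each key to `rf` distinct nodes using round-robin placement."""
    if not 1 <= rf <= len(nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    placement: Dict[str, List[str]] = {}
    for i, key in enumerate(keys):
        # Consecutive nodes, wrapping around the end of the node list.
        placement[key] = [nodes[(i + j) % len(nodes)] for j in range(rf)]
    return placement


def readable_keys(placement: Dict[str, List[str]], failed: Set[str]) -> List[str]:
    """A key stays readable while at least one of its replicas is alive."""
    return [key for key, replicas in placement.items()
            if any(node not in failed for node in replicas)]
```

With a replication factor of 2, any single node can fail without losing access to data; losing two nodes that hold the same key makes that key unreadable.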

Page 5

Big Data Storage Challenges

Size (storing and searching huge amounts of data)

Speed (data acquisition, data search)

Availability (fault tolerance, partition tolerance)

Page 6

Disadvantages of Big Data Storages

No transactions (ACID)

Less mature

Big variety of concepts, lack of standardization

No BI or analytics in queries

Administration

Page 7

Distributed File storage

Amazon

Page 8
Page 9

Storages: Key-Value

Examples: Redis, DynamoDB, MemcacheDB, Riak KV, Aerospike, OrientDB

Page 10

Storages: Document oriented

Examples: Apache CouchDB, Couchbase, MongoDB

Page 11

Storages: Graphs

Examples: AllegroGraph, Neo4j, OrientDB, Titan

Page 12

Storages: Column based

Examples: Cassandra, HBase, Accumulo, Vertica

Page 13

Why Cassandra?

Page 14

Apache Cassandra: basics

Masterless architecture with read/write anywhere design

All nodes are the same

No single point of failure

Zone support

Linear scalability

CQL - Cassandra Query Language

Availability and Partition Tolerance but Eventual Consistency
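
The trade-off in the last point can be made tangible with a little arithmetic. Cassandra lets each request choose how many replicas must acknowledge it; a read is guaranteed to overlap the latest acknowledged write when read replicas plus write replicas exceed the replication factor. The slides do not show this rule, and the helper names below are hypothetical.

```python
def quorum(rf: int) -> int:
    """Smallest majority of `rf` replicas."""
    return rf // 2 + 1


def reads_see_latest_write(rf: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return write_acks + read_acks > rf


def tolerated_failures(rf: int, required_acks: int) -> int:
    """How many replicas can be down while a request still succeeds."""
    return rf - required_acks
```

For example, with a replication factor of 3, quorum reads and writes need 2 replicas each, overlap on at least one replica, and survive one replica being down. Reading and writing with a single acknowledgement stays available with two replicas down, but a read may return stale data until the replicas converge.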

Page 15
Page 16

Partitioning and Replication
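
The slide content itself did not survive in this transcript, so the following is only a rough sketch of how a token ring can combine partitioning and replication. It makes several assumptions: one token per node (no virtual nodes), MD5 from the standard library as a stand-in hash (Cassandra's default partitioner uses Murmur3), and replicas placed on the next nodes clockwise, similar in spirit to a simple, topology-unaware strategy. The `Ring` class and `token` function are hypothetical names.

```python
import hashlib
from bisect import bisect_left
from typing import List


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is a stand-in hash here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Token ring with one token per node and clockwise replica placement."""

    def __init__(self, nodes: List[str], rf: int) -> None:
        if not 1 <= rf <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.rf = rf
        self._ring = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str) -> List[str]:
        """Return the rf nodes responsible for the given partition key."""
        # The first node whose token is >= the key's token owns the key;
        # the modulo wraps around past the largest token.
        start = bisect_left(self._tokens, token(partition_key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(self.rf)]
```

Because the placement depends only on the hash of the partition key, any node can compute where a row lives without asking a master, which fits the masterless design described earlier.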

Page 17

Data modeling

Page 18
Page 19

Demo