Databases in BIG Data: From an application perspective
9/28/2019

May 22, 2020


dariahiddleston


Dimensions of Big Data

Picture from https://www.techentice.com/the-data-veracity-big-data/

Organizing Big Data

• Database: A database is an organized collection of data, generally stored and accessed electronically from a computer system.

• DBMS: The database management system (DBMS) is the software that interacts with end users, applications, and the database itself to capture and analyze the data.

• Data ecosystem: A data ecosystem is a collection of infrastructure, analytics, and applications used to capture and analyze data. Data ecosystems provide companies with data that they rely on to understand their customers and to make better pricing, operations, and marketing decisions.

Definitions from https://www.wikipedia.org/
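As a minimal sketch of the database/DBMS distinction defined above, the following Python example uses SQLite through the standard-library sqlite3 module. SQLite is not mentioned in the slides and is only a stand-in DBMS here; the customers table and its rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite (Python's standard-library sqlite3 module) stands in for
# the DBMS; the table, columns, and rows below are hypothetical.
conn = sqlite3.connect(":memory:")  # the database: an organized collection of data

# The application sends requests to the DBMS, which manages the stored data.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "Hartford"), ("Lin", "Storrs"), ("Sam", "Storrs")],
)
conn.commit()


def customers_per_city(connection):
    """Analyze the captured data: count customers in each city."""
    rows = connection.execute(
        "SELECT city, COUNT(*) FROM customers GROUP BY city ORDER BY city"
    )
    return rows.fetchall()


print(customers_per_city(conn))
```

Here the in-memory file plays the role of the database, while the SQLite engine behind `conn.execute` plays the role of the DBMS that the application talks to.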

Different Databases (ecosystem)

MySQL (Relational Database)

Pictures from MongoDB https://www.youtube.com/watch?v=EE8ZTQxa0AM

MongoDB

Pictures from MongoDB https://www.youtube.com/watch?v=EE8ZTQxa0AM

Application

• Lots of data
  • Copies of messages, reverse indices of messages, per-user data.

• Many incoming requests resulting in a lot of random reads and random writes.
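The workload described above can be sketched as a toy in-memory store, assuming a simple word-based reverse index. The MessageStore class and its methods are hypothetical and do not correspond to any real database API; the point is only that one incoming message fans out into several scattered writes, and one search into several scattered reads.

```python
from collections import defaultdict


class MessageStore:
    """Toy in-memory store; a hypothetical stand-in, not a real database API."""

    def __init__(self):
        self.messages = {}                     # message id -> (user, text)
        self.by_user = defaultdict(list)       # per-user data: user -> message ids
        self.reverse_index = defaultdict(set)  # word -> ids of messages containing it

    def write(self, message_id, user, text):
        # One incoming message touches several unrelated locations (random writes).
        self.messages[message_id] = (user, text)
        self.by_user[user].append(message_id)
        for word in text.lower().split():
            self.reverse_index[word].add(message_id)

    def search(self, word):
        # A lookup reads the reverse index, then each matching message (random reads).
        ids = sorted(self.reverse_index.get(word.lower(), set()))
        return [self.messages[i][1] for i in ids]


store = MessageStore()
store.write(1, "alice", "lunch at noon")
store.write(2, "bob", "noon works for me")
print(store.search("noon"))
```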


Cassandra

• Lots of data
  • Copies of messages, reverse indices of messages, per-user data.

• Many incoming requests resulting in a lot of random reads and random writes.


Cassandra

Pictures from https://www.guru99.com/cassandra-tutorial.html

Cassandra

Adapted from slides by Perry Hoekstra, Jiaheng Lu, Avinash Lakshman, Prashant Malik, and Jimmy Lin

Cassandra

• Lots of data
  • Copies of messages, reverse indices of messages, per-user data.

• Many incoming, time-sensitive requests resulting in a lot of random reads and random writes.

Some applications

• Lots and lots of data
  • Copies of messages, reverse indices of messages, per-user data.

• A great deal of incoming data and time-insensitive requests, resulting in a lot of random reads and random writes.

Hadoop

• Lots and lots of data
  • Copies of messages, reverse indices of messages, per-user data.

• A great deal of incoming data and time-insensitive requests, resulting in a lot of random reads and random writes.

Hadoop

Pictures from https://www.hdfstutorial.com/blog/hadoop-1-vs-hadoop-2-differences/

Hadoop

Picture from https://blogs.msdn.microsoft.com/avkashchauhan/2012/02/24/master-slave-architecture-in-hadoop/

Hadoop

Picture from https://www.guru99.com/introduction-to-mapreduce.html
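The picture credited above comes from an introduction to MapReduce, the processing model used by Hadoop. As a rough illustration, assuming the common word-count example (the slide itself does not specify one), the map, shuffle, and reduce phases can be sketched in plain Python on a single machine. All function names are hypothetical; this is not the Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # Single-machine sketch; Hadoop would spread each phase across a cluster.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "big tables"])
print(counts)
```

In a real cluster each phase runs in parallel over partitions of the input, which is why this model suits very large, time-insensitive batch jobs.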

Hadoop

Pictures from https://www.hdfstutorial.com/blog/hadoop-1-vs-hadoop-2-differences/

Hadoop Ecosystem

Picture from http://blog.newtechways.com/2017/10/apache-hadoop-ecosystem.html

Spark

Picture from https://databricks.com/blog/2013/11/21/putting-spark-to-use.html

Spark Ecosystem

Picture from https://data-flair.training/blogs/apache-spark-ecosystem-components/

Comparison (Hadoop vs. SQL)

• Data Size

Picture from https://www.educba.com/hadoop-vs-sql/

Comparison (Hadoop vs. SQL)

• Scaling

Picture from https://www.educba.com/hadoop-vs-sql/

Comparison (Hadoop vs. SQL)

• Query Speed

Figure columns: Hadoop, SQL

Picture from https://www.educba.com/sql-vs-hadoop/

MySQL (Relational Database)

Pictures from MongoDB https://www.youtube.com/watch?v=EE8ZTQxa0AM

Comparison (Hadoop vs. SQL)

• Query Speed

Figure columns: Hadoop, SQL

Picture from https://www.educba.com/sql-vs-hadoop/

Comparison (Hadoop vs. SQL)

• Machine Learning Support

Figure columns: Hadoop, SQL

Picture from https://www.educba.com/sql-vs-hadoop/

Comparison (Hadoop vs. SQL)

• Learning Curve

Figure columns: Hadoop, SQL

Picture from https://www.educba.com/sql-vs-hadoop/

Which Database is suitable for…

Recording exam grades for students in the UConn CSE Dept.?

Which Database is suitable for…

Online AI/machine learning of all car trajectories? (Uber, Lyft)

Which Database is suitable for…

Recording and analyzing e-commerce customer behavior offline? (Amazon, Target, etc.)

Which Database is suitable for…

Real-time gaming platforms? (WoW, PUBG, LoL, Dota 2)

Which Database is suitable for…

Messaging systems? (iMessage, WeChat)