MIS 3500 Instructor: Bob Travica Newer DB Topics 2015

Dec 26, 2015


Laurence Oliver

Page 2

Big Data

The three big Vs:

Volume: terabytes (12 zeroes), petabytes (15 zeroes)

Variety: Social media, communications, sensors everywhere*, Internet of Things, video feeds, GPS… Implication: data arrive in many different formats

Velocity: wired and wireless continuous feeds
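A short check of the byte units behind the Volume bullet. It assumes the decimal (SI) units that storage vendors use; binary units such as the tebibyte (2^40 bytes) follow a different convention and are not shown.

```python
# Decimal (SI) byte units, each a power of ten.
UNITS = {
    "kilobyte": 10**3,
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
}


def trailing_zeroes(n: int) -> int:
    """Number of zeroes after the leading 1 in a power of ten."""
    return len(str(n)) - 1


for name, size in UNITS.items():
    print(f"1 {name} = 10^{trailing_zeroes(size)} bytes")

# How many 1 TB drives hold one petabyte of raw data.
print(UNITS["petabyte"] // UNITS["terabyte"])  # 1000
```

Each step up the scale multiplies by 1,000, so a petabyte is a thousand terabytes.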

Page 3

Goals and Uses

Goals:

Integrate data on the same object across sources (Customer, Citizen etc.; spatial mashups)

Analysis: finding existing patterns; predictive analysis

Application domains:

Monitoring for business & other purposes (sensors)

Marketing (relationship marketing, sentiment analysis in social media…)

Energy grid management

Transportation networks management

Health (analysis of cancer cell behavior and of patient vital signs)

Science (human genome)

Policy analysis (United Nations’ system for predicting social problems)
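The first goal, integrating data about the same object across sources, can be sketched as merging records on a shared key. The two sources, their fields, and the matching rule (same email address, ignoring case, means same customer) are all hypothetical assumptions for illustration; real integration usually needs fuzzier matching.

```python
from collections import defaultdict

# Hypothetical records about customers, coming from two different sources.
crm_records = [
    {"email": "Ana@Example.com", "name": "Ana Diaz", "city": "Winnipeg"},
    {"email": "bo@example.com", "name": "Bo Chen", "city": "Toronto"},
]
social_records = [
    {"email": "ana@example.com", "followers": 320, "sentiment": "positive"},
    {"email": "cy@example.com", "followers": 15, "sentiment": "neutral"},
]


def normalize_key(email: str) -> str:
    """Assumed matching rule: same email, case-insensitive, means same customer."""
    return email.strip().lower()


def integrate(*sources):
    """Merge records from several sources into one profile per customer."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            key = normalize_key(record["email"])
            for field, value in record.items():
                if field != "email":
                    # A later source overwrites an earlier one on the same field.
                    profiles[key][field] = value
    return dict(profiles)


profiles = integrate(crm_records, social_records)
print(profiles["ana@example.com"])
```

Ana's profile now combines her CRM data with her social-media data, while Bo and Cy each appear in only one source.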

Page 4

Big Data Tasks

Page 5

Machine-generated data (sensors); automatic creation and transfer *

Home appliances (security, energy consumption, heating, food, entertainment)

Monitoring/Control (cars, athletic equipment, machinery, appliances)*

Example: Smart power grid**

Smart meter; Internet & Wi-Fi connectivity
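A minimal sketch of what a utility might do with machine-generated smart-meter data. The message format (meter ID, hour of day, kWh) and the two summaries are assumptions for illustration, not a real meter protocol.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MeterReading:
    """One hypothetical smart-meter message."""
    meter_id: str
    hour: int    # hour of day, 0-23
    kwh: float   # energy consumed during that hour


def consumption_by_meter(readings):
    """Total kWh per meter, e.g. for billing."""
    totals = defaultdict(float)
    for r in readings:
        totals[r.meter_id] += r.kwh
    return dict(totals)


def peak_hour(readings):
    """Hour of day with the highest combined load across all meters."""
    load = defaultdict(float)
    for r in readings:
        load[r.hour] += r.kwh
    return max(load, key=load.get)


feed = [
    MeterReading("m1", 18, 1.5),
    MeterReading("m2", 18, 2.0),
    MeterReading("m1", 3, 0.25),
    MeterReading("m2", 3, 0.5),
]
print(consumption_by_meter(feed))  # {'m1': 1.75, 'm2': 2.5}
print(peak_hour(feed))             # 18
```

The same feed serves two purposes: per-customer billing and grid-wide load monitoring.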

Page 6

Technologies

Hadoop (framework for file system and processing of large datasets on server clusters)*

Machine learning – automated construction of models that fit the data (instead of hypothesis testing, as in data warehousing (DW) and analytics)

Open source

Notable developers: Yahoo!, Facebook, Google, Microsoft

Microsoft Azure-based Hadoop
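Hadoop processes large datasets with the MapReduce model: mappers run where the data blocks are stored and emit key/value pairs, the framework groups the pairs by key, and reducers combine each group. The sketch below is a single-process Python imitation of that flow for a word count. It is not Hadoop's Java API, and the input splits are made up.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str):
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: combine the values for each key (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}


# Each string stands in for a file block stored on a different cluster node.
splits = ["big data big clusters", "data on clusters", "big tables"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle_phase(mapped))
print(counts)  # {'big': 3, 'data': 2, 'clusters': 2, 'on': 1, 'tables': 1}
```

On a real cluster, the map calls run in parallel on many machines and the shuffle moves data over the network. The program logic stays this small, which is the point of the framework.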

Page 7

[Figure: diagram with two parts labeled DATA and PROCESSING]

Page 8

A database for Big Data

Distributed, non-relational, scalable

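Based on Google’s BigTable*

Row key: the reversed URL ("com.cnn.www" for www.cnn.com)

Time stamp: a cell can hold several versions, one per time stamp

Column key: column family + qualifier, written family:qualifier

Anchor columns (family "anchor", qualifier = URL of the referencing site):

"com.cnn.www" t9 anchor:cnnsi.com = "CNN"

"com.cnn.www" t8 anchor:my.look.ca = "CNN.com"

Data: how referencing sites cite CNN, i.e., the link text they use ("CNN", "CNN.com").

Contents columns (family "contents", qualifier = keyword of the tagged content, here html):

"com.cnn.www" t6 contents:html = "<html>…"

"com.cnn.www" t5 contents:html = "<html>…"

"com.cnn.www" t3 contents:html = "<html>…"

Data: compressed web pages, stored as successive time-stamped versions (t6, t5, t3) of the same cell. There can be an unbounded number of contents columns.

All columns put together make a “BigTable”.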
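Google's Bigtable paper describes the model as a sparse map indexed by row key, column key, and time stamp. The toy class below mimics only that indexing, using the CNN example above. `TinyBigTable` and `reverse_url` are hypothetical names, the page bodies are placeholders, and the real system's sorting, distribution, persistence, and compression are all left out.

```python
from collections import defaultdict
from typing import Optional


def reverse_url(host: str) -> str:
    """Row-key convention from the example: www.cnn.com -> com.cnn.www."""
    return ".".join(reversed(host.split(".")))


class TinyBigTable:
    """Sparse map: (row key, "family:qualifier", time stamp) -> value."""

    def __init__(self):
        # row -> column -> {time stamp: value}; absent cells take no space.
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, column: str, ts: int, value: str) -> None:
        self._rows[row][column][ts] = value

    def get(self, row: str, column: str, ts: Optional[int] = None):
        """Newest version, or the newest at or before `ts`; None if absent."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def family(self, row: str, family: str) -> dict:
        """Every column of one family in a row, with its newest value."""
        prefix = family + ":"
        return {col: self.get(row, col)
                for col in self._rows.get(row, {}) if col.startswith(prefix)}


table = TinyBigTable()
row = reverse_url("www.cnn.com")
table.put(row, "anchor:cnnsi.com", 9, "CNN")
table.put(row, "anchor:my.look.ca", 8, "CNN.com")
for ts, page in [(3, "<html>v3"), (5, "<html>v5"), (6, "<html>v6")]:
    table.put(row, "contents:html", ts, page)

print(row)                                # com.cnn.www
print(table.get(row, "contents:html"))    # <html>v6
print(table.get(row, "contents:html", 5)) # <html>v5
print(table.family(row, "anchor"))
```

Reversing the host name puts pages from the same domain next to each other in row-key order. Adding a new referencing site only adds a new `anchor:` column to this one row, which is why the number of columns can grow without a fixed schema.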

Page 9

NoSQL – Not Only SQL

Page 10

Modern Database environments

Page 11

Modern Database environments