Page 1: MIS 3500 Instructor: Bob Travica Newer DB Topics 2015.

MIS 3500 Instructor: Bob Travica

Newer DB Topics 2015

Big Data

The 3 big Vs:

Volume: terabytes (12 zeroes), petabytes (15 zeroes)

Variety: Social media, communications, sensors everywhere*, Internet of Things, video feeds, GPS… Implication: various formats

Velocity: wired and wireless continuous feeds
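To make the Volume figures concrete, here is a minimal sketch that counts the zeroes in each unit. It assumes decimal (SI) prefixes, where each step up multiplies by 1,000; the names UNITS, to_bytes and zeroes are illustrative and do not come from the slides.

```python
# Decimal (SI) storage units, stored as powers of ten (an assumption;
# binary units such as tebibytes use powers of two instead).
UNITS = {"kilobyte": 3, "megabyte": 6, "gigabyte": 9,
         "terabyte": 12, "petabyte": 15, "exabyte": 18}


def to_bytes(amount, unit):
    """Convert an amount in the given unit to a whole number of bytes."""
    return amount * 10 ** UNITS[unit]


def zeroes(unit):
    """Count the zeroes after the leading 1 when one unit is written in bytes."""
    return len(str(to_bytes(1, unit))) - 1


for name in ("terabyte", "petabyte"):
    print(f"1 {name} = {to_bytes(1, name):,} bytes ({zeroes(name)} zeroes)")
```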

Goals and Uses

Goals:

Integrate data on the same object across sources (Customer, Citizen etc.; spatial mashups)

Analysis: Existing patterns, Predictive analysis
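The integration goal above can be sketched as merging records about the same object from several sources. This is only an illustration: the two sources, their field names and the customer_id key are hypothetical, and real integration also has to deal with mismatched identifiers and conflicting values.

```python
from collections import defaultdict

# Hypothetical records about the same customers held in two separate sources.
crm_records = [
    {"customer_id": 1, "name": "Ann Lee", "city": "Winnipeg"},
    {"customer_id": 2, "name": "Raj Patel", "city": "Toronto"},
]
web_records = [
    {"customer_id": 1, "last_visit": "2015-03-02"},
    {"customer_id": 3, "last_visit": "2015-03-05"},
]


def integrate(*sources, key="customer_id"):
    """Merge records that share the same key into one profile per object."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            # Later sources add fields to (or overwrite fields of) the profile.
            profiles[record[key]].update(record)
    return dict(profiles)


profiles = integrate(crm_records, web_records)
for customer_id in sorted(profiles):
    print(customer_id, profiles[customer_id])
```

Customer 1 ends up with fields from both sources, while customers 2 and 3 keep only what their single source knew about them.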

Application domains:

Monitoring for business & other purposes (sensors)

Marketing (relationship marketing, sentiment analysis in social media…)

Energy grid management

Transportation networks management

Health (analysis of cancer cell behavior and of patient vital signs)

Science (human genome)

Policy analysis (United Nations’ system for predicting social problems)

Big Data Tasks

Machine-generated data (sensors); automatic creation and transfer *

Home appliances (security, energy consumption, heating, food, entertainment)

Monitoring/Control (cars, athletic equipment, machinery, appliances)*

Example: Smart power grid**

Smart meter; Internet & Wi-Fi connectivity

Technologies

Hadoop (framework for file system and processing of large datasets on server clusters)*

Machine learning – automated construction of models to fit data (instead of hypothesis testing as with DW and Analytics)

Open source

Notable developers: Yahoo!, Facebook, Google, Microsoft

Microsoft Azure-based Hadoop
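The machine learning point on this slide, building a model automatically so that it fits the data, can be illustrated with the simplest possible case: a least-squares straight line. This is a sketch under stated assumptions, not how Hadoop-based tools do it. The function fit_line and the temperature and usage readings are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical smart-meter data: outdoor temperature (C) vs. energy use (kWh).
temps = [-20, -10, 0, 10, 20]
usage = [50, 40, 30, 20, 10]

slope, intercept = fit_line(temps, usage)
predicted = slope * 5 + intercept  # model's estimate for a 5 C day
print(f"usage = {slope:.2f} * temp + {intercept:.2f}; at 5 C: {predicted:.1f} kWh")
```

The model parameters come from the data rather than from a hypothesis stated in advance, which is the contrast the slide draws with DW and Analytics.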

DATA

PROCESSING

A database for Big Data

Distributed, non-relational, scalable

Based on Google’s BigTable *

Anchor columns. Data are citations of "CNN*" on referencing sites.

Row Key (reversed URL)   Time Stamp   Column Key: "Anchor" (Family) + URL part (Qualifier)
"com.cnn.www"            t9           anchor:cnnsi.com = "CNN"
"com.cnn.www"            t8           anchor:my.look.ca = "CNN.com"

Contents columns. Data are webpages, compressed. There can be any number of Contents columns (unbounded).

Row Key                  Time Stamp   Column Key: "Contents" + keyword in tagged content
"com.cnn.www"            t6           contents:html = "<html>… "
"com.cnn.www"            t5           contents:html = "<html>… "
"com.cnn.www"            t3           contents:html = "<html>… "

All columns put together make a “BigTable”.
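The layout above can be read as a map from (row key, column key, time stamp) to a value. The sketch below is a toy in-memory version of that idea, not the API of BigTable or of any real product. The class TinyBigTable, the helper row_key, the integer time stamps and the placeholder page contents are all assumptions made for illustration.

```python
class TinyBigTable:
    """Toy in-memory map: (row key, column key, time stamp) -> value."""

    def __init__(self):
        self._cells = {}  # {row_key: {column_key: {timestamp: value}}}

    def put(self, row, column, timestamp, value):
        """Store one version of one cell; columns are created on demand."""
        self._cells.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at a time stamp, or the newest version if None."""
        versions = self._cells[row][column]
        if timestamp is None:
            timestamp = max(versions)
        return versions[timestamp]

    def columns(self, row, family):
        """List the column keys of one family (e.g. 'anchor') in a row."""
        prefix = family + ":"
        return sorted(c for c in self._cells[row] if c.startswith(prefix))


def row_key(hostname):
    """Reverse the host name parts, e.g. www.cnn.com -> com.cnn.www."""
    return ".".join(reversed(hostname.split(".")))


table = TinyBigTable()
row = row_key("www.cnn.com")
table.put(row, "anchor:cnnsi.com", 9, "CNN")
table.put(row, "anchor:my.look.ca", 8, "CNN.com")
for ts in (3, 5, 6):
    # Placeholder text stands in for the stored page contents.
    table.put(row, "contents:html", ts, f"<html>... (version t{ts})")

print(table.columns(row, "anchor"))
print(table.get(row, "contents:html"))  # newest stored version
```

Each row holds only the columns that were actually written, so a row can gain new anchor columns without any schema change. That is the sense in which the number of columns is unbounded.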

NoSQL – Not Only SQL

Modern Database environments

