MIS 3500 Instructor: Bob Travica Newer DB Topics 2015
Dec 26, 2015
Big Data
The three big Vs:
Volume: terabytes (12 zeroes), petabytes (15 zeroes)
Variety: Social media, communications, sensors everywhere*, Internet of Things, video feeds, GPS… Implication: various formats
Velocity: wired and wireless continuous feeds
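The volume figures follow from decimal (SI) byte units, where each step up multiplies by 1,000. A minimal sketch of that arithmetic; the `UNITS` table and the `zeroes` helper are illustrative names, not part of the slides:

```python
# Decimal (SI) byte units: each step up multiplies by 1,000.
UNITS = {
    "kilobyte": 10**3,
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
}

def zeroes(unit: str) -> int:
    """Count the zeroes after the leading 1 when the unit is written out in bytes."""
    return len(str(UNITS[unit])) - 1

for name in ("terabyte", "petabyte"):
    print(f"1 {name} = {UNITS[name]:,} bytes ({zeroes(name)} zeroes)")
```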
Goals and Uses
Goals:
Integrate data on the same object across sources (Customer, Citizen etc.; spatial mashups)
Analysis: Existing patterns, Predictive analysis
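The integration goal can be illustrated with a small sketch that merges records about the same object coming from different sources. The two sources (`crm`, `web`), the `customer_id` key, and the `integrate` function are hypothetical, chosen only for illustration:

```python
from collections import defaultdict

# Hypothetical records about the same customers held in two separate sources.
crm = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Raj"},
]
web = [
    {"customer_id": 1, "last_visit": "2015-12-01"},
    {"customer_id": 3, "last_visit": "2015-12-20"},
]

def integrate(key, *sources):
    """Merge records that share the same key value into one record per object."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            merged[record[key]].update(record)
    return dict(merged)

customers = integrate("customer_id", crm, web)
for customer_id, record in sorted(customers.items()):
    print(customer_id, record)
```

Customer 1 ends up with fields from both sources, while customers 2 and 3 keep only what their single source knows. In practice, the hard part is usually agreeing on a shared key across sources, which this sketch simply assumes.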
Application domains:
Monitoring for business & other purposes (sensors)
Marketing (relationship marketing, sentiment analysis in social media…)
Energy grid management
Transportation networks management
Health (analysis of cancer cell behavior and of patient vital signs)
Science (human genome)
Policy analysis (United Nations’ system for predicting social problems)
Machine-generated data (sensors); automatic creation and transfer *
Home appliances (security, energy consumption, heating, food, entertainment)
Monitoring/Control (cars, athletic equipment, machinery, appliances)*
Example: Smart power grid**
Smart meter; Internet & Wi-Fi connectivity
Technologies
Hadoop (framework for file system and processing of large datasets on server clusters)*
Machine learning – automated construction of models to fit data (instead of hypothesis testing as with DW and Analytics)
Open source
Notable developers: Yahoo!, Facebook, Google, Microsoft
Microsoft Azure-based Hadoop
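Hadoop's processing side is built around the MapReduce model: a map step emits key/value pairs, the pairs are grouped by key, and a reduce step combines each group. The sketch below imitates those three steps in a single Python process to show the idea. It does not use any Hadoop API, and the function names are illustrative assumptions:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["big data big clusters", "data on clusters"])
print(counts)
```

On a real cluster the map and reduce steps run in parallel on many servers, each working on the part of the data stored locally; here everything runs sequentially on one machine.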
A database for Big Data
Distributed, non-relational, scalable
Based on Google’s BigTable *
Column Key: “Anchor” (Family) + URL part (Qualifier). Data are the texts (“CNN*”) with which referencing sites cite the page.
Row Key (reversed URL)   Time Stamp   Column Key           Data
"com.cnn.www"            t9           anchor:cnnsi.com     "CNN"
"com.cnn.www"            t8           anchor:my.look.ca    "CNN.com"
Column Key: “Contents” (Family) + keyword in tagged content (Qualifier). Data are compressed web pages. There can be any number of Contents columns (the number is unbounded).
Row Key (reversed URL)   Time Stamp   Column Key           Data
"com.cnn.www"            t6           contents:html        "<html>… "
"com.cnn.www"            t5           contents:html        "<html>… "
"com.cnn.www"            t3           contents:html        "<html>… "
All columns put together make a “BigTable”.
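The data model above can be read as a map from (row key, column key, time stamp) to a value. A minimal in-memory sketch under that reading follows. `TinyTable`, `reverse_domain`, and the placeholder page strings are hypothetical and are not the API of any real database; the time stamps t3…t9 are represented as plain integers:

```python
def reverse_domain(host):
    """Build a row key by reversing the host name parts, e.g. www.cnn.com -> com.cnn.www."""
    return ".".join(reversed(host.split(".")))

class TinyTable:
    """Toy model: (row key, column key) -> {time stamp: value}."""

    def __init__(self):
        self.cells = {}

    def put(self, row, column, timestamp, value):
        # Each write adds a new version instead of overwriting older ones.
        self.cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version, or the newest one at or before `timestamp`."""
        versions = self.cells.get((row, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def columns(self, row, family):
        """List the column keys of one family that exist in a given row."""
        prefix = family + ":"
        return sorted(c for (r, c) in self.cells if r == row and c.startswith(prefix))

table = TinyTable()
row = reverse_domain("www.cnn.com")
table.put(row, "anchor:cnnsi.com", 9, "CNN")
table.put(row, "anchor:my.look.ca", 8, "CNN.com")
for t in (3, 5, 6):
    # Placeholder text stands in for the stored page contents.
    table.put(row, "contents:html", t, f"<html>... (version t{t})")

print(table.columns(row, "anchor"))
print(table.get(row, "contents:html"))
print(table.get(row, "contents:html", timestamp=5))
```

Reversing the URL keeps pages from the same domain next to each other when rows are sorted by key, and keeping several time-stamped versions per cell is what lets the same column hold the page as it looked at t3, t5, and t6.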