COMP9321 Web Application Engineering, Semester 2, 2015

Dr. Amin Beheshti, Service Oriented Computing Group, CSE, UNSW Australia

Week 11 (Part II)

http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411

http://www.cse.unsw.edu.au/~sbeheshti/

COMP9321, 15s2, Week 11, Tuesday, 13 October 2015


Big Data: Challenges and Opportunities


http://www.intelli3.com/


We are Generating Vast Amounts of Data!!

Healthcare: remote patient monitoring

Manufacturing: product sensors

Location-Based Services: real-time location data

Retail: social media…

Digitalization of Artefacts: books, music, videos, etc.


We are Generating Vast Amounts of Data!!

Airbus A380: generates 10 TB every 30 minutes.

Twitter: generates approximately 12 TB of data per day.

Facebook: data grows by over 500 TB daily.

New York Stock Exchange: generates 1 TB of data every day.


We are Generating Vast Amounts of Meta-data!!

Data

Versioning

Provenance

Security

Privacy


We are tracing everything: who did what? When? Where? …

e.g. Twitter handles ~1.6 billion search queries per day.
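To make "who did what, when and where" concrete, here is a minimal sketch of a provenance log in Python. The record fields, the sample users and the `query` helper are hypothetical illustrations chosen for this example, not a format used by Twitter or any other real service.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """One metadata entry: who did what, when, and where (hypothetical schema)."""
    who: str
    what: str
    when: datetime
    where: str


def query(log: List[ProvenanceRecord], who: Optional[str] = None,
          where: Optional[str] = None) -> List[ProvenanceRecord]:
    """Return the records that match every filter given; None means 'any'."""
    return [r for r in log
            if (who is None or r.who == who)
            and (where is None or r.where == where)]


# Hypothetical activity trace.
log = [
    ProvenanceRecord("alice", "search: big data", datetime(2015, 10, 13, 9, 0), "Sydney"),
    ProvenanceRecord("bob", "read: chapter 3", datetime(2015, 10, 13, 9, 5), "Melbourne"),
    ProvenanceRecord("alice", "play: track 7", datetime(2015, 10, 13, 9, 30), "Sydney"),
]

for r in query(log, who="alice"):
    print(r.when.isoformat(), r.what)
```

Even this toy shows why metadata grows so quickly: every user action produces a record, and the records are often larger than the action itself.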


We are Generating Vast Amounts of Meta-data!!

Reading a book: e.g. a Kindle tracks what you are reading, when you are reading it, how often you read it, etc.

Listening to music: e.g. an mp3 player tracks what you are listening to, when and how often, in what order, etc.

Smartphones: e.g. an iPhone tracks our location, our speed, what apps we are using, who we are ringing, etc.


Big Data and Big Meta-Data

share, comment, review, crowdsource, etc.


So, What is Big Data?

Big data refers to our ability to collect and analyse the ever-expanding amounts of data and meta-data that we are generating every second!

Challenges: capture, storage, search, sharing, transfer, analysis, visualization, etc.


What Makes it Big Data?

Volume: the vast amounts of data generated every second.

Velocity: the speed at which new data is generated and moves around.

Variety: the increasingly different types of data.

Veracity: the quality of data, e.g. its messiness; noisy and inconsistent data needs to be detected and corrected.

Value: statistical, events, correlation, hypothetical.
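Veracity is the most hands-on of these properties. The sketch below assumes a hypothetical sensor feed and a hypothetical valid temperature range; it shows the two steps named above, detecting noisy or inconsistent rows and correcting them, using plain Python rather than any particular data-cleaning tool.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical raw sensor feed with typical veracity problems.
raw: List[Row] = [
    {"id": 1, "city": "Sydney", "temp_c": 21.5},
    {"id": 1, "city": "Sydney", "temp_c": 21.5},   # exact duplicate
    {"id": 2, "city": "sydney ", "temp_c": 22.0},  # inconsistent formatting
    {"id": 3, "city": "Perth", "temp_c": None},    # missing value
    {"id": 4, "city": "Hobart", "temp_c": 999.0},  # noisy, physically implausible
]


def clean(rows: List[Row], lo: float = -60.0, hi: float = 60.0) -> List[Row]:
    """Detect and correct simple data-quality problems (assumed valid range lo..hi)."""
    seen = set()
    out: List[Row] = []
    for row in rows:
        t = row["temp_c"]
        if t is None or not (lo <= t <= hi):
            continue  # detect: drop missing or out-of-range readings
        fixed = {**row, "city": row["city"].strip().title()}  # correct: normalise names
        key = (fixed["id"], fixed["city"], fixed["temp_c"])
        if key in seen:
            continue  # detect: skip exact duplicates
        seen.add(key)
        out.append(fixed)
    return out


print(clean(raw))
```

Dropping bad rows is only one policy; at scale one might instead flag or impute them, which is a judgement call that depends on how the data will be used.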


Challenges: How to Store and Process?

Big data is high-volume, high-velocity and/or high-variety information assets that require new forms of storage and processing.

On-hand database management tools?

Traditional data processing applications?


Challenges: Big Data Storage

NoSQL databases:

• Employ less constrained consistency models.

• Support simple retrieval and appending operations.

• Offer significant performance benefits.

Examples:

• Key–value store

• Document store

• Graph database

• …
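A key–value store is the simplest of these models. The toy class below is a hypothetical, in-memory sketch of the "simple retrieval and appending" interface only; it deliberately omits persistence, partitioning and replication, which is where real NoSQL systems earn their performance and relax consistency.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ToyKeyValueStore:
    """In-memory key-value store with append-only values per key.

    Illustrative only: no persistence, partitioning, replication or
    consistency protocol, all of which a real NoSQL store would need.
    """

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = defaultdict(list)

    def append(self, key: str, value: str) -> None:
        # Appending never rewrites existing values, so no read-modify-write is needed.
        self._data[key].append(value)

    def get(self, key: str) -> Optional[str]:
        # Simple retrieval by key: the latest value, or None if the key is unknown.
        values = self._data.get(key)
        return values[-1] if values else None

    def history(self, key: str) -> List[str]:
        # All values ever appended under this key, oldest first.
        return list(self._data.get(key, []))


store = ToyKeyValueStore()
store.append("user:42:status", "online")
store.append("user:42:status", "away")
print(store.get("user:42:status"))      # away
print(store.history("user:42:status"))  # ['online', 'away']
```

Because every operation touches exactly one key, keys can be spread across many machines with no joins or cross-key transactions, which is the trade-off the slide alludes to.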


Challenges: Big Data Storage (Graphs are Everywhere)

[Figure: example graphs. User-Movie (Netflix): collaborative filtering. Docs-Words (Wiki): text analysis. Social network: probabilistic analysis.]
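As an example of treating data as a graph, the sketch below runs a very simple form of collaborative filtering over a hypothetical user-movie bipartite graph: a movie is scored by the number of two-hop paths from the user through a shared movie and a similar user. This neighbourhood-counting rule is an illustrative choice, not Netflix's algorithm.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical bipartite graph: user -> movies the user rated highly.
edges: Dict[str, Set[str]] = {
    "ann": {"Alien", "Heat", "Up"},
    "ben": {"Alien", "Heat", "Jaws"},
    "cat": {"Up", "Coco"},
    "dan": {"Heat", "Jaws"},
}


def recommend(user: str, graph: Dict[str, Set[str]], k: int = 2) -> List[str]:
    """Score unseen movies by counting paths
    user -> shared movie -> other user -> candidate movie."""
    seen = graph[user]
    scores: Counter = Counter()
    for other, movies in graph.items():
        if other == user:
            continue
        overlap = len(seen & movies)  # shared movies = number of paths via 'other'
        if overlap == 0:
            continue
        for movie in movies - seen:
            scores[movie] += overlap
    # Highest score first; ties broken by title so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie for movie, _ in ranked[:k]]


print(recommend("ann", edges))  # ['Jaws', 'Coco']
```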


Challenges: Big Data Processing

Apache Hadoop: an open-source framework that uses a simple programming model to enable distributed processing of large data sets on clusters of computers.

Who Uses Hadoop?

Amazon, Facebook, Google, IBM, New York Times, Yahoo!, …

Apache Hadoop solution:

• Distributed File System (HDFS)

• MapReduce

• Pig

• HCatalog
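The MapReduce programming model can be illustrated with the classic word count. The sketch below simulates the map, shuffle and reduce phases in a single Python process; it does not use Hadoop's Java API, and the function names are hypothetical. In a real cluster the map calls would run in parallel over HDFS blocks and the reduce calls in parallel over key groups.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(doc: str) -> List[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input split.
    return [(word.lower(), 1) for word in doc.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine all values for one key.
    return key, sum(values)


def word_count(docs: List[str]) -> Dict[str, int]:
    intermediate = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


docs = ["big data big value", "Big meta data"]
print(word_count(docs))  # {'big': 3, 'data': 2, 'value': 1, 'meta': 1}
```

The appeal of the model is that the programmer writes only the mapper and reducer; distribution, shuffling and fault tolerance are the framework's job.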


Challenges: Big Data Processing

Apache Spark: a fast and expressive cluster computing engine, compatible with Apache Hadoop.

• Efficient: in-memory storage.

• Usable: rich APIs in Java, Scala, Python.


Resilient Distributed Dataset (RDD): Spark's data storage model.
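The core RDD idea is an immutable, partitioned collection whose transformations are recorded lazily as a lineage and executed only when an action is called. The `ToyRDD` class below is a hypothetical single-process imitation of that idea, not Spark's implementation; in PySpark the equivalent pipeline would use `sc.parallelize`, `map`, `filter` and `collect` on a real cluster.

```python
import math
from typing import Any, Callable, Iterable, List, Optional

Partition = List[Any]
Step = Callable[[Partition], Partition]


class ToyRDD:
    """Single-process imitation of the RDD idea (not Spark's implementation):
    an immutable, partitioned collection whose transformations are recorded as a
    lineage and only executed when an action such as collect() is called."""

    def __init__(self, partitions: List[Partition], lineage: Optional[List[Step]] = None):
        self._partitions = partitions
        self._lineage: List[Step] = lineage or []

    @classmethod
    def parallelize(cls, data: Iterable[Any], num_partitions: int = 2) -> "ToyRDD":
        items = list(data)
        size = math.ceil(len(items) / num_partitions) or 1
        return cls([items[i * size:(i + 1) * size] for i in range(num_partitions)])

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        # Transformation: returns a new ToyRDD with one more lineage step; nothing runs yet.
        return ToyRDD(self._partitions, self._lineage + [lambda part: [f(x) for x in part]])

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(self._partitions, self._lineage + [lambda part: [x for x in part if pred(x)]])

    def collect(self) -> List[Any]:
        # Action: replay the lineage on every partition. Because each partition can be
        # rebuilt from its source data this way, a lost partition could be recomputed.
        result: List[Any] = []
        for part in self._partitions:
            for step in self._lineage:
                part = step(part)
            result.extend(part)
        return result


nums = ToyRDD.parallelize(range(1, 7), num_partitions=2)
even_squares = nums.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(even_squares.collect())  # [4, 16, 36]
print(nums.collect())          # [1, 2, 3, 4, 5, 6]; the parent RDD is unchanged
```

Recording lineage instead of copying data is what lets Spark keep working sets in memory while still recovering from node failures.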


Challenges: Big Data Integration

Example Scenario: Business Processes (BPs)

[Figure: People, Web Services, IT Systems and Workflows, with a BPs Execution Log.]


Challenges: Big Data Integration

The Big Data world is messy, schema-less and complex: less than 10% of it is genuinely relational.

e.g. Linked Data
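Linked Data sidesteps a fixed schema by expressing every fact as a (subject, predicate, object) triple and following links between them. The sketch below uses plain Python tuples and hypothetical facts; real Linked Data uses RDF with IRIs and is usually queried with SPARQL, so treat this only as an illustration of the triple model.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical facts; each is a (subject, predicate, object) triple, so new kinds
# of facts can be added without changing any schema.
triples: List[Triple] = [
    ("UNSW", "locatedIn", "Sydney"),
    ("UNSW", "offersCourse", "COMP9321"),
    ("COMP9321", "title", "Web Application Engineering"),
    ("Sydney", "country", "Australia"),
]


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Follow links: in which country is the university that offers COMP9321?
uni = match(p="offersCourse", o="COMP9321")[0][0]
city = match(s=uni, p="locatedIn")[0][2]
country = match(s=city, p="country")[0][2]
print(country)  # Australia
```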


Challenges: Big Data Integration

Big Data-as-a-Service:

• Effective processing of big data within acceptable processing time.

• Easy access to the big data and the big data analysis results.


API Engineering:

• ProgrammableWeb: APIs, Mashups and the Web as Platform (www.programmableweb.com/)

• DataSift, … open data sources


Reminder

Seminars: API Engineering and Micro-Services

When: Thursday, 15 October, 15:00-17:00. Where: UNSW, Mathews Theatre D.

Two interesting talks:

• API Engineering (Scientia Prof. Boualem Benatallah).

• Micro-services (Mr. Graham Lea).


Challenges: Big data requires a broad set of skills

• Math and Operations Research Expertise: develop analytic algorithms.

• Visualization Expertise: interpret data sets, determine correlations and present them in meaningful ways.

• Tool Developers: mask complexity and analytics to lower skills boundaries.

• Industry Vertical Domain Expertise: develop hypotheses, identify relevant business issues, ask the right questions.

• Data Experts: data architecture, management, governance, policy.

• Decision Making, Executive and Management: apply information to solve business issues.


Challenges: Big Data Analytics

Analytics can be defined in many ways, but what matters is the purpose of analytics.

Most definitions agree on the following: analytics is used to gain insights from data in order to make better decisions, using mathematical or scientific methods.

Data (manage the data) → analyse → Insight (understand the data) → decide → Action (act on the data)


Challenges: Big Data Analytics

Example:

• Beheshti et al., “Scalable Graph-based OLAP Analytics over Process Execution Data”, DAPD Journal (2015).

• Beheshti et al., “A Framework and a Language for On-Line Analytical Processing on Graphs”, WISE Conference (2012).

OLAP (On-Line Analytical Processing) is an approach to answering multi-dimensional analytical queries swiftly.

Problem:

• Extending existing OLAP techniques to the analysis of graphs is not straightforward.

• Key business insights remain hidden in the interactions among objects.

Solution:

• On-Line Analytical Processing on Graphs.
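The cited papers define their own graph OLAP framework and query language; the sketch below is not that framework. It only illustrates, on a hypothetical process-execution graph, the basic roll-up idea behind OLAP on graphs: nodes that share a value along one dimension (here, a department) are merged, and the interactions between them are counted, so patterns hidden in individual hand-offs become visible at a coarser level.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical process-execution graph: nodes are people, edges are hand-offs
# of work items, and each node carries a 'department' dimension.
department: Dict[str, str] = {
    "alice": "Sales", "bob": "Sales", "carol": "Finance", "dave": "IT",
}
handoffs: List[Tuple[str, str]] = [
    ("alice", "carol"), ("bob", "carol"), ("alice", "dave"),
    ("carol", "dave"), ("bob", "alice"),
]


def roll_up(edges: List[Tuple[str, str]],
            dim: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Aggregate the graph along one dimension: nodes with the same dimension
    value are merged, and the edges between merged nodes are counted."""
    return dict(Counter((dim[u], dim[v]) for u, v in edges))


for (src, dst), count in roll_up(handoffs, department).items():
    print(f"{src} -> {dst}: {count}")
```

Rolling up along a different dimension (for example, role or location) would only require a different `dim` mapping, which is the multi-dimensional flavour that OLAP brings to graph data.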


Challenges: Big Data Analytics

Big Data Analytics benefits from:

• NLP

• Machine Learning

• Pattern recognition, learning, extraction, classification, enrichment, linking, etc.

Examples:

• Healthcare

• Social Networks, e.g. Twitter

• Education

• Finance

• …


Beheshti et al., “Big data and cross-document coreference resolution: Current state and future opportunities”...


Big Data Leadership!!

Industry has been in the lead: Google, Amazon, Yahoo!, etc.

University researchers have been left behind due to lack of access to large-scale cluster computing facilities!!

Government agencies are making heavy investments: investments in big-data computing will have extraordinary near-term and long-term benefits, and cloud computing must be considered a strategic resource.


Big Data: Opportunities

• Varieties of Data: Text, Social Media, Networks, Multimedia, Machine Data, Sensors

• Analytics: Organizing Big Data, Navigating through data, Summarizing Big Data, Process Data Analytics, Support decision-making

• Integration: Integrating enterprise and public data, Linking data/context, Entity Extraction and Integration, Knowledge Graph

• Big Data Performance: In memory, New Benchmarks and Architecture

• User Experience: Automation and intelligent guidance, Visualizing with Analytics, Interacting with Analytics, Storytelling


Conclusion

Why is Big Data different from past Very Large Datasets? Meta-Data!!

Having the ability to analyse Big Data is of limited value if users cannot understand the analysis.

How can industry and academia collaborate towards solving Big Data challenges?

What is big today may not be big tomorrow!


Thank you!