COMP9321 Web Application Engineering, Semester 2, 2015
Dr. Amin Beheshti, Service Oriented Computing Group,
Challenges: Big Data Storage (Graphs are Everywhere)
• Collaborative Filtering: User-Movie graph (Netflix)
• Text Analysis: Docs-Words graph (Wiki)
• Probabilistic Analysis: Social Network
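As a rough illustration of the User-Movie case, the sketch below stores the graph as two adjacency maps in plain Python and recommends movies watched by users who share at least one movie with the target user. The user names, movie titles and the `build_graph` and `recommend` helpers are hypothetical; this is a toy neighbourhood heuristic, not the algorithm Netflix uses.

```python
from collections import defaultdict

# Hypothetical edges of a bipartite User-Movie graph: (user, movie).
EDGES = [
    ("alice", "Alien"), ("alice", "Brazil"),
    ("bob", "Alien"), ("bob", "Casablanca"),
    ("carol", "Brazil"), ("carol", "Casablanca"), ("carol", "Dune"),
]


def build_graph(edges):
    """Return two adjacency maps: user -> movies and movie -> users."""
    user_movies = defaultdict(set)
    movie_users = defaultdict(set)
    for user, movie in edges:
        user_movies[user].add(movie)
        movie_users[movie].add(user)
    return user_movies, movie_users


def recommend(user, user_movies, movie_users):
    """Rank unseen movies by how many neighbouring users watched them.

    A neighbour is any other user two hops away in the graph, i.e.
    someone who shares at least one movie with `user`.
    """
    seen = user_movies[user]
    neighbours = set()
    for movie in seen:
        neighbours |= movie_users[movie]
    neighbours.discard(user)

    scores = defaultdict(int)
    for other in neighbours:
        for movie in user_movies[other] - seen:
            scores[movie] += 1
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda m: (-scores[m], m))


user_movies, movie_users = build_graph(EDGES)
print(recommend("alice", user_movies, movie_users))
```

The same two-map layout would also fit the Docs-Words graph, with documents in place of users and words in place of movies.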
COMP9321, 15s2, Week 11
Challenges: Big Data Processing
Apache Hadoop: Hadoop is an open source framework that uses a simple programming model to enable distributed processing of large data sets on clusters of computers.
Who Uses Hadoop?
Amazon, Facebook, Google, IBM, New York Times, Yahoo!, …
Apache Hadoop solution:
• Distributed File System (HDFS)
• MapReduce
• Pig
• HCatalog
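The "simple programming model" behind MapReduce can be sketched on a single machine. The example below is a plain-Python word count that mimics the map, shuffle and reduce phases; it does not use Hadoop itself, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data", "Big graphs"]))
```

On a real cluster the mappers and reducers would run on different machines and the framework would handle the shuffle; only the two user-written functions (map and reduce) carry over from this sketch.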
Challenges: Big Data Processing
Apache Spark: a fast and expressive cluster computing engine, compatible with Apache Hadoop
• Efficient: in-memory storage
• Usable: rich APIs in Java, Scala, Python
• Resilient Distributed Dataset (RDD): Spark's data storage model
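To give a feel for what "resilient" and "in-memory" mean for an RDD, here is a single-machine toy in plain Python. It records transformations lazily as a lineage, keeps the result in memory when asked to, and replays the lineage if that in-memory copy is lost. `ToyRDD`, its `evict` method and the `computations` counter are hypothetical and exist only for this sketch; this is not Spark's API or implementation.

```python
class ToyRDD:
    """A single-machine stand-in for an RDD (illustrative only)."""

    def __init__(self, source, transforms=()):
        self._source = source          # function that yields the base data
        self._transforms = transforms  # lineage: ordered (kind, fn) steps
        self._use_cache = False
        self._cached = None            # in-memory copy, filled on demand
        self.computations = 0          # how often the lineage was replayed

    def map(self, fn):
        """Lazily record a map step; nothing is computed yet."""
        return ToyRDD(self._source, self._transforms + (("map", fn),))

    def filter(self, pred):
        """Lazily record a filter step; nothing is computed yet."""
        return ToyRDD(self._source, self._transforms + (("filter", pred),))

    def cache(self):
        """Ask for the result to be kept in memory after the first action."""
        self._use_cache = True
        return self

    def evict(self):
        """Simulate losing the in-memory copy (e.g. a failed worker)."""
        self._cached = None

    def _compute(self):
        """Replay the whole lineage from the base data."""
        self.computations += 1
        data = list(self._source())
        for kind, fn in self._transforms:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data

    def collect(self):
        """Action: return the result, using the cached copy if present."""
        if self._use_cache and self._cached is not None:
            return self._cached
        result = self._compute()
        if self._use_cache:
            self._cached = result
        return result


squares = ToyRDD(lambda: range(10)).map(lambda x: x * x)
even_squares = squares.filter(lambda x: x % 2 == 0).cache()
print(even_squares.collect())   # computed once, then served from memory
even_squares.evict()
print(even_squares.collect())   # rebuilt by replaying the lineage
```

The design point the toy tries to capture is that the dataset is described by how to derive it, so a lost in-memory copy can be recomputed rather than restored from a replica.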
Challenges: Big Data Integration
Example Scenario: Business Processes (BPs)
[Figure: People, Web Services, IT Systems, Workflows, ...; BPs Execution Log]
Challenges: Big Data Integration
The Big Data world is messy, schema-less and complex. Less than 10% of the Big Data world is genuinely relational.
e.g. Linked Data
Challenges: Big Data Integration
Big Data-as-a-Service:
• Effective processing of big data within acceptable processing time
• Easy access to the big data and the big data analysis results
Reminder
API Engineering
• ProgrammableWeb - APIs, Mashups and the Web as Platform;
• www.programmableweb.com/
• DataSift … open data sources
COMP9321, 15s2, Week 8
Seminars: API Engineering and Micro-Services
When: Thursday, 15 October, 15:00-17:00
Where: UNSW, Mathews Theatre D
Two interesting talks:
• API Engineering (Scientia Prof. Boualem Benatallah)
• Micro-services (Mr. Graham Lea)
Challenges: Big data requires a broad set of skills
• Math and Operations Research Expertise: develop analytic algorithms
• Visualization Expertise: interpret data sets, determine correlations and present in meaningful ways
• Tool Developers: mask complexity and analytics to lower skills boundaries
• Industry Vertical Domain Expertise: develop hypotheses, identify relevant business issues, ask the right questions
• Data Experts: data architecture, management, governance, policy
• Decision Making (Executive and Management): apply information to solve business issues
Challenges: Big Data Analytics
Analytics can be defined in many ways, but what matters is the purpose of analytics.
Most definitions agree on the following: Analytics is used to gain insights from data in order to make better decisions, using mathematical or scientific methods.
Data → Analyse → Insight → Decide → Action
Manage the Data → Understand the Data → Act on the Data
Challenges: Big Data Analytics
Example:
• Beheshti et al., “Scalable Graph-based OLAP Analytics over Process Execution Data”, DAPD Journal (2015).
• Beheshti et al., “A Framework and a Language for On-Line Analytical Processing on Graphs”, WISE Conference (2012).
OLAP is an approach to answering multi-dimensional analytical queries swiftly.
Problem:
• Extending existing OLAP techniques to the analysis of graphs is not straightforward.
• Key business insights remain hidden in the interactions among objects.
Solution:
• On-Line Analytical Processing on Graphs
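One way to picture an OLAP-style operation on a graph is a roll-up: nodes are grouped along a dimension and the edges between them are aggregated into edges between groups. The sketch below does this in plain Python for a hypothetical message graph grouped by department. The data, the `roll_up` helper and the choice of summing edge weights are all assumptions made for illustration; this is not the framework or query language from the papers cited above.

```python
from collections import Counter

# Hypothetical node attribute (the dimension): person -> department.
NODE_DEPT = {"ann": "Sales", "ben": "Sales", "cat": "IT", "dan": "IT"}

# Hypothetical interaction edges: (sender, receiver, number_of_messages).
EDGES = [
    ("ann", "cat", 3),
    ("ben", "cat", 2),
    ("cat", "dan", 5),
    ("dan", "ann", 1),
]


def roll_up(edges, node_to_group):
    """Aggregate node-level edges into group-level edges by summing weights."""
    totals = Counter()
    for src, dst, weight in edges:
        totals[(node_to_group[src], node_to_group[dst])] += weight
    return dict(totals)


# Person-to-person interactions summarised as department-to-department flows.
for (src_dept, dst_dept), messages in sorted(roll_up(EDGES, NODE_DEPT).items()):
    print(f"{src_dept} -> {dst_dept}: {messages}")
```

Swapping `NODE_DEPT` for another attribute map (for example, person to office location) would give a different view of the same interactions, which is the multi-dimensional aspect of the query.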
Challenges: Big Data Analytics
Big Data Analytics benefits from:
• NLP
• Machine Learning
• Pattern recognition, Learning, Extraction, Classification, Enrichment, Linking, etc.
Examples:
• Healthcare
• Social Networks (e.g. Twitter)
• Education
• Finance
• …
Beheshti et al., “Big data and cross-document coreference resolution: Current state and future opportunities”...
Big Data Leadership!
• Industry has been in the lead: Google, Amazon, Yahoo!, etc.
• University researchers have been left behind, due to lack of access to large-scale cluster computing facilities.
• Government agencies are making heavy investments.
• Investments in big-data computing will have extraordinary near-term and long-term benefits.
• Cloud computing must be considered a strategic resource.
Big Data: Opportunities
• Varieties of Data: Text, Social Media, Networks, Multimedia, Machine Data, Sensors
• Analytics: Organizing Big Data, Navigating through data, Summarizing Big Data, Process Data Analytics, Support decision-making
• Integration: Integrating enterprise and public data, Linking data/context, Entity Extraction and Integration, Knowledge Graph
• Big Data Performance: In memory, New Benchmarks and Architecture
• User Experience: Automation and intelligent guidance, Visualizing with Analytics, Interacting with Analytics, Storytelling
Conclusion
Why is Big Data different from past Very Large Datasets? Meta-Data!
Having the ability to analyse Big Data is of limited value if users cannot understand the analysis.
How can industry and academia collaborate towards solving Big Data challenges?