2010 Workshop on Massive Data Analytics on the Cloud (MDAC 2010) April 26, 2010 Raleigh, NC, USA In association with the 19th Annual World Wide Web Conference (WWW2010)
Mar 27, 2015
Making Sense of Mountains of Data
• Data sources: billions of mobile devices; clickstream; CRM claim data (text, picture, video); call data records; location tracking (GPS), iPhone, vehicle use data; $ transaction tracking (across borders & IP providers); feeds (Census Bureau data, market data, weather data, sensor data); online transaction processing systems; Web data (for search); Web buzz data (for reputation analysis)
• Data types: structured, semi-structured, and semi-/unstructured
• Continuous arrival of high-volume information (evolving, highly variant; structured, semi-structured, and unstructured)
• Petabytes -> Exabytes
• Auto-/cross-correlation analytics, predictive analytics
• Deep & wide analytics
• Fine grained: individual product and customer at a time and place
• Feedback/Action
• Dashboards, Embedded Analytics, Financial Planning, Mash ups, Scorecards, Search
Massive Data Analytic Platforms
• Google: Original MapReduce implementation
• Microsoft: Dryad
• Yahoo!, Facebook, and many others: Hadoop
• Ecosystems: Hive, Pig, Jaql, Zookeeper,
• Alternatives to Map/Reduce, e.g. Pregel
[Figure: MapReduce data flow: map tasks (M), partition, sort, reduce tasks (R), and nodes labeled C]
• “Easy” parallelism
• Scalability
• Fault-Tolerance
• Elastic
• Flexibility
• Cost / Performance
• 1000’s of processors
• Petabytes of data
• …and growing
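The map, partition, sort, and reduce flow behind the platforms listed above can be sketched in a few lines of single-process Python. This is only an illustration of the programming model using a word count: it is not the Hadoop, Dryad, or Google API, and the function names (`map_phase`, `partition`, `reduce_phase`, `word_count`) and the `num_reducers` parameter are hypothetical names chosen for this sketch.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
import zlib


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def partition(pairs, num_reducers):
    """Partition: route each key to one reducer bucket using a stable hash."""
    buckets = defaultdict(list)
    for key, value in pairs:
        reducer_id = zlib.crc32(key.encode("utf-8")) % num_reducers
        buckets[reducer_id].append((key, value))
    return buckets


def reduce_phase(bucket):
    """Sort one bucket by key, then reduce: sum the counts for each key."""
    ordered = sorted(bucket, key=itemgetter(0))
    return {
        key: sum(count for _, count in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def word_count(documents, num_reducers=2):
    """Run map, partition, sort, and reduce sequentially (assumed toy driver)."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    buckets = partition(mapped, num_reducers)
    result = {}
    for reducer_id in sorted(buckets):
        # Each key lives in exactly one bucket, so the partial results never collide.
        result.update(reduce_phase(buckets[reducer_id]))
    return result


if __name__ == "__main__":
    docs = ["big data on the cloud", "data analytics on the cloud"]
    print(word_count(docs))
```

In a real platform the map calls and the per-bucket reduce calls run on different machines, which is where the "easy" parallelism and fault tolerance listed above come from; here they run in one loop purely to show the data flow.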
Chairpeople Perspective
• Other parallel systems technology and customers
  – Parallel Database: enterprise data warehousing
  – Parallel ETL (extraction, transformation, load)
  – Search and text analytics
• Hadoop and related technologies
  – Finance, Telco, Healthcare, Retail, Government, …
Questions Posed in Call For Papers
• What kinds of problems are people trying to solve?
• How are existing massive scale-out platforms used, and what extensions would be helpful?
• Other kinds of platforms for different problems?
• How to integrate with existing environments such as data warehouses?
• Challenges in managing massive datasets?
• Legal/moral challenges associated with mining these data sets?
Agenda (morning)
9:00 - 10:30: Session 1
Introduction and Welcome
Invited Talk: "Hadoop: An Industry Perspective"
Dr. Amr Awadallah, CTO, VP-Engineering, Cloudera
10:30 - 11:00: Coffee Break*
11:00 - 12:30: Session 2
Distributed Indexing of Web Scale Datasets for the Cloud
Ioannis Konstantinou, Evangelos Angelou, Dimitrios Tsoumakos, Nectarios Koziris; National Technical University of Athens
Beyond Online Aggregation: Parallel and Incremental Data Mining with Online Map-Reduce
Joos-Hendrik Böse (1), Artur Andrzejak (2), Mikael Högqvist (2); (1) Intl. Comp. Sci. Institute, (2) Zuse Institute Berlin (ZIB)
Efficient Updates for a Shared Nothing Analytics Platform
Katerina Doka (3), Dimitrios Tsoumakos (4), Nectarios Koziris (3); (3) National Technical University of Athens, Greece, (4) University of Cyprus
12:30 - 1:30: Lunch*
Agenda (afternoon)
1:30 - 3:30: Session 3
Invited Talk: "Large Scale Applications on Hadoop in Yahoo"
Dr. Vijay Narayanan, Yahoo! Labs Silicon Valley
Extracting User Profiles from Large Scale Data
Michal Shmueli-Scheuer, Haggai Roitman, David Carmel, Yosi Mass, David Konopnicki; IBM Research, Haifa
A Novel Approach to Multiple Sequence Alignment using Hadoop Data Grids
Sudha Sadasivam, G. Baktavatchalam; PSG College of Technology
3:30 - 4:00: Coffee Break*
4:00 - 5:30: Session 4
Towards Scalable RDF Graph Analytics on MapReduce
Padmashree Ravindra, Vikas Deshpande, Kemafor Anyanwu; North Carolina State University
SPARQL Basic Graph Pattern Processing with Iterative MapReduce
Jaeseok Myung, Jongheum Yeon, Sang-goo Lee; Seoul National University
Parallelizing Random Walk with Restart for Large-Scale Query Recommendation
Meng-Fen Chiang, Tsung-Wei Wang, Wen-Chih Peng; National Chiao Tung University, Hsinchu, Taiwan
Acknowledgements
Workshop Chairs
Ullas Nambiar, IBM India Research Lab, New Delhi, India
John McPherson, IBM Almaden Research Center, USA
David Konopnicki, IBM Haifa Research Lab, Israel
Steering Committee
Rakesh Agrawal, Microsoft Search Labs, Mountain View, CA, USA
Alon Halevy, Google Inc., Mountain View, CA, USA
Invited Speakers
Amr Awadallah, CTO, VP-Engineering, Cloudera, "Hadoop: An Industry Perspective"
Vijay Narayanan, Yahoo! Labs Silicon Valley, "Large Scale User Modeling on Hadoop"
Program Committee
Amr Awadallah, Cloudera, USA
Andrew McCallum, University of Massachusetts Amherst, USA
Assaf Schuster, Technion - Israel Institute of Technology
Gautam Das, University of Texas, Arlington, USA
Jimeng Sun, IBM Watson Research Center, USA
John Shafer, Microsoft Search Labs, USA
Kevin Chang, University of Illinois at Urbana-Champaign, USA
Kun Liu, Yahoo! Labs, USA
Louiqa Raschid, University of Maryland, College Park, USA
Michal Shmueli-Scheuer, IBM Haifa Research Lab, Israel
Michael Sheng, University of Adelaide, Australia
Mong Li Lee, National University of Singapore, Singapore
Rajeev Gupta, IBM India Research Lab, India
Vanja Josifovski, Yahoo Research, USA
Yannis Sismanis, IBM Almaden Research Center, USA
Yi Chen, Arizona State University, USA
Wen-syan Li, SAP, China