CS226 Big-Data Management Instructor: Ahmed Eldawy 09/28/2018 1


May 20, 2020


CS226

Big-Data Management

Instructor: Ahmed Eldawy

09/28/2018 1

Welcome (back) to UCR!

Class information

Classes: Monday, Wednesday, Friday 2:10-3:00 PM at WCH 142

Instructor: Ahmed Eldawy

Office hours: Monday & Wednesday 4:00-5:00 PM @357 WCH. Conflicts?

Website: http://www.cs.ucr.edu/~eldawy/18FCS226/

iLearn (Any UCRX students?)

Email: [email protected]

Subject: “[CS226] …”

Course work

Active participation in the class (5%)

Reading and review tasks (10%)

Class presentation (15%)

Assignments (20%)

Project (50%)

Project

Groups of 3-4 students

Group Selection

Project proposal

Literature survey

Report outline

Final report

Project presentation

Course goals

What are your goals?

Understand what big data means

Identify the internal components of big data platforms

Recognize the differences between different big data platforms

Explain how a distributed query runs on big data

Big-data Expert

Understand how the big-data platforms really work

Control those thousands of processors efficiently to carry out your task

Syllabus

Overview of big data

Big-data storage

Big-data processing

Big-data indexing

Big-SQL processing

Programming packages

Introduction

Jan 2012: World Economic Forum Report

Interest in Big Data in the US

■ March 2012: Obama administration unveils BIG DATA initiative: $200 Million in R&D investment

■ June 2013: Washington Post is calling Obama “The Big Data President”

Interest in Big Data in Europe

March 2014: David Cameron and Angela Merkel talk about Big Data at a computer expo in Hannover, Germany

The Market of Big Data

Four (previously Three) V’s of Big Data

Big Data vs. Big Computation

Full scans (e.g., log processing)

Range scans

Point lookups

Iterations

Joins (self, binary, or multiway)

Proximity queries

Closures and graph traversals
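The access patterns listed above can be illustrated with a minimal single-machine sketch. Everything in it is an assumption made for illustration: the toy `records` list, the function names, and the choice of a sorted in-memory list are hypothetical and do not come from the slides. On a real big-data platform the same patterns would run over partitioned, distributed storage.

```python
from bisect import bisect_left, bisect_right
from collections import deque

# Hypothetical toy data: (key, value) records kept sorted by key.
records = [(1, "a"), (3, "b"), (5, "c"), (7, "d"), (9, "e")]
keys = [k for k, _ in records]


def full_scan(predicate):
    """Full scan: touch every record (e.g., log processing)."""
    return [r for r in records if predicate(r)]


def range_scan(lo, hi):
    """Range scan: records with lo <= key <= hi, found by binary search."""
    return records[bisect_left(keys, lo):bisect_right(keys, hi)]


def point_lookup(key):
    """Point lookup: the single record with this key, or None."""
    i = bisect_left(keys, key)
    return records[i] if i < len(keys) and keys[i] == key else None


def hash_join(left, right):
    """Binary equi-join on the first field, using a hash index on `left`."""
    index = {}
    for k, v in left:
        index.setdefault(k, []).append(v)
    return [(k, lv, rv) for k, rv in right for lv in index.get(k, [])]


def reachable(edges, start):
    """Graph traversal (breadth-first): all nodes reachable from `start`."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adj.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

The sketch is only meant to contrast how much data each pattern touches: a full scan reads every record, while a range scan or point lookup over sorted keys reads only a small slice, and joins and traversals combine many such accesses.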

Big Data Applications

Web search

Marketing and advertising

Data cleaning

Knowledge base

Information retrieval

Internet of Things (IoT)

Visualization

Behavioral studies

Publicly Available Datasets

Data.gov

Data.gov.uk

Twitter Streaming API

Yahoo! Webscope [http://webscope.sandbox.yahoo.com/]

GDELT [http://www.gdeltproject.org/]

Instagram API

Big Data Landscape 2012

[http://mattturck.com/2012/06/29/a-chart-of-the-big-data-ecosystem/]

Big Data Landscape 2014

[http://mattturck.com/2014/05/11/the-state-of-big-data-in-2014-a-chart/]

Big Data Landscape 2016

[http://mattturck.com/2016/02/01/big-data-landscape/]

Big Data Landscape 2018
