Big data computing overview

Feb 19, 2017

Young-Sung Son
Page 1: Big data computing overview

Big Data Computing Overview

2015.04.07 Youngsung Son

Page 2: Big data computing overview

Agenda

§ What am I doing?
§ Big Data Computing History
  – Supercomputer
  – Parallel Computing
  – Linux Cluster
  – Big Data Computing
§ Google File System (GFS)
§ Hadoop Map and Reduce
§ Spark Stream Processing
§ References

Page 3: Big data computing overview

What am I doing?

Page 4: Big data computing overview

1. Personal Cloud Repository Access

2. Personal Health Record Retrieval

3. Case-based Reasoning (Similar Case Search)

4. Comparison among Similar Patients (for Health Planning, Prediction, Advice)

[Figure: steps 1 to 4 marked on the Healing Platform diagram]

Page 5: Big data computing overview

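Healing Platform

[Figure: Healing Platform architecture diagram. Labels translated from Korean; grouping is approximate.]

§ Mobile platform
  – Mobile services 1..N
  – Open API, RESTful API
  – Service analysis engine
  – Within 3 seconds
§ Data providers (request / transfer / store)
  – Medical data providers 1..N: 50 million people × 17 records / 365 days ≈ 2 million records/day
  – Life-record providers 1..N: 50 million people × 5 times ≈ 300 million/day
  – Personal healing record repositories 1..N: 50 million people/day
§ Big Data Personal Data Control Service
  – Data relay (request / transfer), DC nodes
  – Load request, standard conversion
§ Targeted data / healing knowledge base (NoSQL DB)
  – TD and KB stores
  – Transform / filter
  – Stream computing (update management)
  – DB for high-speed computation
  – DW construction, TD […]
§ Analysis platform
  – Raw big data (HDFS): public clinical case big data, personal healing record case big data
  – Knowledge base construction engine (Cluster, CBR, …)
  – Similar case search, trends, planning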
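The per-day load estimates on this slide are simple products, and a short Python sketch makes the rounding visible. The constant and function names are illustrative assumptions. The inputs (50 million people, 17 medical records per person per 365 days, 5 life-record events per person per day, one repository update per person per day) are read off the slide labels.

```python
POPULATION = 50_000_000  # people, as stated on the slide


def daily_load(population=POPULATION):
    """Return the estimated number of records per day for each data source."""
    return {
        # 17 medical records per person, spread over 365 days
        "medical": population * 17 / 365,
        # 5 life-record events per person per day
        "life_record": population * 5,
        # one personal healing record update per person per day
        "healing_repository": population,
    }


if __name__ == "__main__":
    for source, per_day in daily_load().items():
        print(f"{source}: {per_day:,.0f} records/day")
```

The medical figure works out to roughly 2.3 million records per day and the life-record figure to 250 million, which the slide rounds to about 2 million and about 300 million.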

Page 6: Big data computing overview

Big Data Computing History

Page 7: Big data computing overview

Supercomputer

Page 8: Big data computing overview

Supercomputer

Page 9: Big data computing overview

Architecture of HyperCube

John P. Hayes, “Architecture of a Hypercube Supercomputer,” International Conference on Parallel Processing, 1986. http://web.eecs.umich.edu/~tnm/trev_test/papersPDF/1986.08.Architecture%20Of%20A%20Hypercube%20Supercomputer_Conf_Parallel_Processing.pdf

Page 10: Big data computing overview

Architecture of HyperCube

Page 11: Big data computing overview

Architecture of HyperCube

Page 12: Big data computing overview

Architecture of HyperCube

http://web.eecs.umich.edu/~tnm/trev_test/papersPDF/1986.08.Architecture%20Of%20A%20Hypercube%20Supercomputer_Conf_Parallel_Processing.pdf

Page 13: Big data computing overview

Parallel Computing

§ MPI – Message Passing Interface

§ PVM – Parallel Virtual Machine

Page 14: Big data computing overview

Parallel Computing

§ MPI (Message Passing Interface)
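As a rough illustration of the message-passing model that MPI standardizes, the sketch below uses Python threads and queues in place of a real MPI runtime. The `parallel_sum` function, the per-rank inboxes, and the rank numbering are hypothetical stand-ins chosen for this example; none of it is the MPI API. Each worker only sees the data it receives as a message, and rank 0 combines the partial results.

```python
import queue
import threading


def parallel_sum(data, world_size=4):
    """Sum `data` with `world_size` workers that communicate only by messages."""
    # One inbox per rank; "sending" to a rank means putting a message on its inbox.
    inboxes = [queue.Queue() for _ in range(world_size)]
    result = {}

    def worker(rank):
        chunk = inboxes[rank].get()          # receive this rank's share of the data
        partial = sum(chunk)
        if rank == 0:
            # Rank 0 collects one partial sum from every other rank.
            total = partial
            for _ in range(world_size - 1):
                total += inboxes[0].get()
            result["total"] = total
        else:
            inboxes[0].put(partial)          # send the partial sum to rank 0

    # Scatter the data before any worker starts, so each inbox holds its chunk first.
    for rank in range(world_size):
        inboxes[rank].put(data[rank::world_size])

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))
```

Even in this toy form the programmer has to decide who sends what to whom and in which order, which hints at why explicit message passing is harder to write than a single sequential loop.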

Page 15: Big data computing overview

Parallel Computing

§ PVM (Parallel Virtual Machine)

Page 16: Big data computing overview

Architecture of HyperCube

Too costly!!!!

Too difficult!!!!

Page 17: Big data computing overview

Linux Cluster

Page 18: Big data computing overview

Berkeley NOW Project (1995)

Page 19: Big data computing overview

Linux Cluster Project

CROWN System: Clustering Resources of Workstation’s Network (1997~1999)

Page 20: Big data computing overview

Page 21: Big data computing overview

Page 22: Big data computing overview

Linux Cluster Specifications

§ 16 PCs
§ PC specification
  – Pentium III
  – 16 MB
  – 20 GB
§ Myrinet (300 Mbps)

Page 23: Big data computing overview

Linux Cluster’s Goals

Page 24: Big data computing overview

Linux Cluster’s Goals

Real-time Rendering

Page 25: Big data computing overview

Limitations in achieving this goal

§ Visible Human Project
  – Data size: 40 GB (~100 GB)
§ Linux file system (ext2)
  – 16 GB per file
  – IDE bandwidth: 33 Mbps (66 Mbps)
  – Ethernet bandwidth: 100 Mbps (below 30 Mbps in practice)
  – RAM: not enough
§ Myrinet network interface
  – Too difficult to use
  – Kernel hooking required!!!
§ Programming model
  – PVM or MPI
  – Too slow and difficult!!!
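To put the bandwidth limits in perspective, the following Python sketch estimates how long it takes just to move the 40 GB Visible Human data set over the Ethernet rates quoted on this slide. The function name is an assumption made for illustration, decimal units (1 GB = 8000 megabits) are assumed, and 30 Mbps is used as an optimistic stand-in for "below 30 Mbps".

```python
def transfer_seconds(size_gb, rate_mbps):
    """Seconds needed to move size_gb gigabytes at rate_mbps megabits per second."""
    megabits = size_gb * 1000 * 8  # decimal units: 1 GB = 1000 MB = 8000 megabits
    return megabits / rate_mbps


if __name__ == "__main__":
    for label, rate in [("nominal Ethernet", 100), ("observed Ethernet", 30)]:
        minutes = transfer_seconds(40, rate) / 60
        print(f"{label}: {minutes:.0f} minutes for 40 GB")
```

Under these assumptions a single pass over the data takes close to an hour at the nominal rate and about three hours at the observed rate, which is far from the real-time rendering goal.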

Page 26: Big data computing overview

Google File System

Page 27: Big data computing overview

Google File System (GFS, 2003)

Sanjay Ghemawat, “The Google File System,” http://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf

Page 28: Big data computing overview

Google File System (GFS)

Distributed, Overlaid, Scalable File System
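One way to picture "distributed and scalable" is the chunking scheme described in the GFS paper: a file is cut into fixed-size 64 MB chunks and each chunk is stored on several chunkservers (three replicas by default). The Python sketch below is only an illustration of that idea. The `place_chunks` function and the round-robin placement are assumptions made for this example; the real GFS master chooses replicas using criteria such as disk utilization and rack location.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as in the GFS paper
REPLICAS = 3                   # default replication factor in the GFS paper


def place_chunks(file_size, servers, chunk_size=CHUNK_SIZE, replicas=REPLICAS):
    """Map each chunk index of a file to the chunkservers holding its replicas."""
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    n_chunks = -(-file_size // chunk_size)  # ceiling division
    placement = {}
    for index in range(n_chunks):
        # Round-robin placement: a simplification, not the real GFS policy.
        placement[index] = [servers[(index + r) % len(servers)] for r in range(replicas)]
    return placement


if __name__ == "__main__":
    layout = place_chunks(200 * 1024 * 1024, ["cs0", "cs1", "cs2", "cs3"])
    for index, holders in layout.items():
        print(index, holders)
```

Adding chunkservers to the list spreads the same chunks over more machines, which is the sense in which such a design scales out.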

Page 29: Big data computing overview

Hadoop System (2005)

Page 30: Big data computing overview

Map & Reduce

§ User Logs Counting

Page 31: Big data computing overview

Map & Reduce

Why do it this way?

Page 32: Big data computing overview

How about this example?

§ Count phone call logs?
  – Each user’s total phone call time
  – KT’s case: 40 TB / month
  – No exceptions allowed
§ Oracle Database
  – HW cost: ?
  – SW cost: over 400,000,000 Korean won
  – Time cost: about 1 day

Page 33: Big data computing overview

Solution?

§ Simple is best
  – Log merge

for (int i = 0; i < max_log; i++)
    user[log[i].id].usage_time += log[i].usage_time;  // add each call to its user's total

But it takes too much time!!!

Page 34: Big data computing overview

Map & Reduce

§ User Logs Counting
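The user log counting job can be sketched as a map step, a shuffle step, and a reduce step. The Python below is a single-process illustration of that structure, not Hadoop code: the function names, the `(user_id, seconds)` record shape, and the idea of passing log "splits" as lists are all assumptions made for this example. In a real cluster each split would be mapped on a different machine and the shuffle would move data over the network.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (user_id, seconds) pair per call log record."""
    user_id, seconds = record
    yield user_id, seconds


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(user_id, values):
    """Sum the call durations collected for one user."""
    return user_id, sum(values)


def total_call_time(splits):
    """Run map, shuffle, and reduce over a list of log splits."""
    mapped = [pair for split in splits for record in split for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(user, values) for user, values in grouped.items())


if __name__ == "__main__":
    splits = [[("alice", 60), ("bob", 30)], [("alice", 15)]]
    print(total_call_time(splits))
```

Because each split is mapped independently and each user is reduced independently, both phases can run in parallel, which is what the sequential log-merge loop cannot do.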

Page 35: Big data computing overview

Spark Stream Processing (2009)

Page 36: Big data computing overview

Hadoop’s Performance Problem

Page 37: Big data computing overview

Hadoop’s Performance Problem

Page 38: Big data computing overview

Spark Stream Processing

Page 39: Big data computing overview

Spark Stream Processing

Page 40: Big data computing overview

Page 41: Big data computing overview

Page 42: Big data computing overview

Spark Code Example

Page 43: Big data computing overview

Conclusion

§ Big Data Computing?
  – Of course it is needed!! But is it for us?
§ We did a lot.
  – Do we need to broaden our perspective?
§ What’s next?
  – Trends repeat!!!
  – Your major might come back into fashion?

Page 44: Big data computing overview

Page 45: Big data computing overview

Ah well, it’s all meaningless.

Page 46: Big data computing overview

References

§ John P. Hayes, “Architecture of a Hypercube Supercomputer,” International Conference on Parallel Processing, 1986.

§ MPI code example, http://mpitutorial.com/tutorials/mpi-hello-world/

§ PVM code example, http://www.netlib.org/pvm3/book/node17.html

§ Sanjay Ghemawat, “The Google File System,” SOSP 2003

§ Hadoop Code Example, http://azure.microsoft.com/en-us/documentation/articles/hdinsight-sample-wordcount/

§ Madhukara Phatak, Introduction to Apache Spark, http://blog.madhukaraphatak.com/introduction-to-spark/

Page 47: Big data computing overview

Thank you

Young-Sung Son