
Spark Summit 2017 Review + Scalable Machine Learning

Jan 21, 2018


Dongjin Lee
Transcript
Page 1: Spark Summit 2017 Review + Scalable Machine Learning

SPARK SUMMIT 2017 REVIEW & SCALABLE MACHINE LEARNING

Dongjin Lee

Page 2

ABOUT THE SPEAKER…

• Apache Committer

• Apache Horn

• Apache Spark …

• Alcacruz.com

• Senior Software Engineer

• VR Startup, Bay Area

• Improving work efficiency and optimization using Machine Learning

Page 3

OUTLINE

• Summary of the latest trends

• Topics that were dropped

• Spark Summit 2017 ML Best 3 (Spark Seoul 2017)

• Spark Internals (Core, Dataframe, …)

• Latest Spark ML technologies (GraphFrame, Dataframe Packages, R Integration, …)

Page 4

OUTLINE

• Today: not a superficial skim, literally "licking the watermelon rind" (x), but just "looking at the watermelon" (o)

1. Graph Processing

2. Streaming

3. Real-time Service

4. Blockchain

5. Deep Learning

Page 5

GRAPH PROCESSING

• Challenging Web-Scale Graph Analytics with Apache Spark [slides, video]

• Introduction to GraphFrame features

• Replaces the existing RDD-based GraphX API

• 1.x : 2.x = RDD : Dataframe = MLlib : ML = GraphX : GraphFrames

• Version 0.5 (as of July 2017)
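
The shift from RDD-based GraphX to DataFrame-based GraphFrames can be pictured as treating a graph as two tables: one for vertices and one for edges. The following is a minimal sketch of that idea in plain Python, not GraphFrames code. The column names `id`, `src`, and `dst` follow the GraphFrames convention; the toy data and the helper functions `in_degrees` and `join_names` are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy data: a graph stored as two plain tables, mirroring the
# vertices/edges table pair that a DataFrame-based graph API works with.
vertices = [
    {"id": "a", "name": "Alice"},
    {"id": "b", "name": "Bob"},
    {"id": "c", "name": "Carol"},
]
edges = [
    {"src": "a", "dst": "b", "relationship": "follows"},
    {"src": "c", "dst": "b", "relationship": "follows"},
    {"src": "b", "dst": "a", "relationship": "follows"},
]


def in_degrees(edge_rows):
    """Count incoming edges per vertex: a group-by on the dst column."""
    return dict(Counter(row["dst"] for row in edge_rows))


def join_names(vertex_rows, degrees):
    """Join the degree table back to the vertex table on id (left join)."""
    return {v["name"]: degrees.get(v["id"], 0) for v in vertex_rows}


print(join_names(vertices, in_degrees(edges)))
```

Once a graph is expressed this way, a graph query becomes an ordinary table operation (group-by, join, filter), which is the property that lets a DataFrame-based API reuse the same query engine as the rest of Spark 2.x.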

Page 6

GRAPH PROCESSING

• Related sessions

• Large-Scale Text Processing Pipeline with Spark ML and GraphFrames (Princeton Univ., Spark Summit East 2017) [slide, video]

• Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud Prevention (PayPal, DataWorks Summit 2017) [slide]

• Multi-Label Graph Analysis and Computations Using GraphX (LinkedIn, Spark Summit 2017) [slide, video]

Page 7

STREAMING

• Schema Registry (Hortonworks, DataWorks Summit Munich / San Jose 2017) [slide, video]

[Figure: Schema Registry (labels: Schema, Serde, Writer, Reader)]
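
The writer/reader arrangement around a schema registry can be sketched as follows. This is a minimal in-memory illustration of the general pattern, not the Hortonworks Schema Registry API: the class `InMemorySchemaRegistry`, the functions `serialize` and `deserialize`, and the 4-byte schema-id prefix are all assumptions made for the example.

```python
import json
import struct


class InMemorySchemaRegistry:
    """Hypothetical stand-in for a schema registry service."""

    def __init__(self):
        self._schemas = []

    def register(self, schema):
        """Return the id of the schema, registering it on first use."""
        if schema in self._schemas:
            return self._schemas.index(schema)
        self._schemas.append(schema)
        return len(self._schemas) - 1

    def lookup(self, schema_id):
        return self._schemas[schema_id]


def serialize(registry, schema, record):
    """Writer side: prefix the payload with the schema id, omit field names."""
    schema_id = registry.register(schema)
    values = [record[field] for field in schema["fields"]]
    return struct.pack(">I", schema_id) + json.dumps(values).encode("utf-8")


def deserialize(registry, message):
    """Reader side: read the schema id, fetch the schema, rebuild the record."""
    (schema_id,) = struct.unpack(">I", message[:4])
    schema = registry.lookup(schema_id)
    values = json.loads(message[4:].decode("utf-8"))
    return dict(zip(schema["fields"], values))


registry = InMemorySchemaRegistry()
click_schema = {"name": "click", "fields": ["user", "url"]}  # hypothetical schema
message = serialize(registry, click_schema, {"user": "u1", "url": "/home"})
print(deserialize(registry, message))
```

Because the message carries only a schema id and the values, the reader cannot decode it without the registry. That dependency is what gives writer and reader a single shared place to agree on the data format.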

Page 8

STREAMING

• Related sessions

• Introducing Exactly Once Semantics in Apache Kafka with Matthias J. Sax (Confluent, Spark Summit 2017) [slide, video]

• The Top Five Mistakes Made When Writing Streaming Applications (Cloudera, Spark Summit 2017) [slide, video]

• Stream All Things - Patterns of Modern Data Integration (Confluent, Spark Summit 2017) [slide, video]

• Future Architecture of Streaming Analytics: Capitalizing on the Analytics of Things (Intel, DataWorks Summit 2017) [slide, video]

Page 9

REAL-TIME SERVICE

• One-line summary: use Redis as the model storage.

• Basics

• Getting Ready to Use Redis with Apache Spark [slide, video]

• Applications

• Building a Large Scale Recommendation Engine with Spark and Redis-ML [slide, video]

• Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more [slide, video]
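
The "Redis as model storage" idea can be sketched as two decoupled steps: a training job publishes model parameters under a key, and a serving process fetches them at request time. The sketch below is an assumption-laden illustration, not code from the talks. `FakeRedis` is a hypothetical in-memory stand-in that mimics only the `set`/`get` calls of a Redis client, and the key name `model:ctr:v1` and the logistic-regression model are invented for the example.

```python
import json
import math


class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client (set/get only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)


def publish_model(store, key, weights, bias):
    """Training side: write the fitted parameters to the store as JSON."""
    store.set(key, json.dumps({"weights": weights, "bias": bias}))


def predict(store, key, features):
    """Serving side: load the parameters and score one feature vector."""
    raw = store.get(key)
    if raw is None:
        raise KeyError(f"no model stored under {key!r}")
    model = json.loads(raw)
    score = model["bias"] + sum(w * x for w, x in zip(model["weights"], features))
    return 1.0 / (1.0 + math.exp(-score))  # logistic function


store = FakeRedis()
publish_model(store, "model:ctr:v1", weights=[0.8, -0.3], bias=0.1)
print(predict(store, "model:ctr:v1", [1.0, 2.0]))
```

Keeping the parameters in a shared key-value store means the batch job that trains the model (for example, on Spark) and the low-latency service that answers requests never need to run in the same process, and a retrained model is picked up simply by overwriting the key.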

Page 10

BLOCKCHAIN

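• A set of blocks, linked in a chain and stored in a distributed manner

• Better suited than a conventional RDBMS to modeling exchanges

• Directions

• Analyze the information contained in a blockchain

• Use the blocks themselves as a kind of data storage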

[Figure: Blockchain (labels: BLOCK, FACT)]
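
The "chain of blocks" structure can be sketched as a hash-linked list in which each block stores the hash of its predecessor. This is a minimal single-process illustration under stated assumptions: it leaves out distribution and consensus entirely, and the field names (`index`, `prev_hash`, `facts`, `hash`) and the functions `block_hash`, `append_block`, and `is_valid` are hypothetical.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index, prev_hash, facts):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "facts": facts}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, facts):
    """Append a block whose prev_hash points at the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "facts": facts,
        "hash": block_hash(index, prev_hash, facts),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check that each link matches its predecessor."""
    prev_hash = GENESIS_PREV_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["facts"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, ["A gives B 5 units"])
append_block(chain, ["B gives C 2 units"])
print(is_valid(chain))
```

Because every hash covers the previous block's hash, changing a fact in an earlier block invalidates the chain from that block onward. That tamper-evidence is what makes it possible to treat the blocks as a data store and analyze the recorded exchanges afterwards.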

Page 11

DEEP LEARNING

• Is any further explanation needed?

• TensorFrames: Deep Learning with TensorFlow on Apache Spark (Databricks, Spark Summit 2016) [slide, video]

• Real-Time Image Recognition with Apache Spark (MemSQL, Spark Summit 2017) [slide, video]

Page 12

QUESTIONS?