SPARK SUMMIT 2017 REVIEW & SCALABLE MACHINE LEARNING
Dongjin Lee (이동진) ([email protected] / [email protected])
ABOUT THE SPEAKER…
• Apache Committer
• Apache Horn
• Apache Spark …
• Alcacruz.com
• Senior Software Engineer
• VR Startup, Bay Area
• Using machine learning to streamline and optimize work
OUTLINE
• Summary of the latest trends
• Topics that did not make the cut
• Spark Summit 2017 ML Best 3 (Spark Seoul 2017)
• Spark Internals (Core, Dataframe, …)
• Latest Spark ML technologies (GraphFrames, DataFrame Packages, R Integration, …)
OUTLINE
• Today: not even scratching the surface (x), just a quick look from a distance (o)
1. Graph Processing
2. Streaming
3. Real-time Service
4. Blockchain
5. Deep Learning
GRAPH PROCESSING
• Challenging Web-Scale Graph Analytics with Apache Spark [slides, video]
• Introduction to GraphFrames features
• Replaces the existing RDD-based GraphX API
• 1.x : 2.x = RDD : DataFrame = MLlib : ML = GraphX : GraphFrames
• Version 0.5 (as of July 2017)
GRAPH PROCESSING
• Related sessions
• Large-Scale Text Processing Pipeline with Spark ML and GraphFrames (Princeton Univ., Spark Summit East 2017) [slide, video]
• Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud Prevention (PayPal, DataWorks Summit 2017) [slide]
• Multi-Label Graph Analysis and Computations Using GraphX (LinkedIn, Spark Summit 2017) [slide, video]
STREAMING
• Schema Registry (Hortonworks, DataWorks Summit Munich / San Jose 2017) [slide, video]
[Diagram: Schema Registry, with labels Schema, Serde, Writer, Reader]
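The general idea behind a schema registry can be sketched in a few lines. This is a minimal illustration, not the Hortonworks implementation: the `SchemaRegistry` class, its `register` and `lookup` methods, and the JSON envelope are all hypothetical names chosen for this example. The writer registers a schema once and ships only a schema id with each message; the reader resolves the id back to the schema.

```python
import json


class SchemaRegistry:
    """In-memory stand-in for a schema registry service (hypothetical API)."""

    def __init__(self):
        self._schemas = {}   # schema id -> list of field names
        self._ids = {}       # tuple of field names -> schema id

    def register(self, fields):
        """Return the id for this schema, registering it on first use."""
        key = tuple(fields)
        if key not in self._ids:
            schema_id = len(self._schemas) + 1
            self._ids[key] = schema_id
            self._schemas[schema_id] = list(fields)
        return self._ids[key]

    def lookup(self, schema_id):
        """Return the field names stored under a schema id."""
        return self._schemas[schema_id]


def serialize(registry, fields, record):
    """Writer side: send the schema id plus values only, not field names."""
    schema_id = registry.register(fields)
    values = [record[name] for name in fields]
    return json.dumps({"schema_id": schema_id, "values": values}).encode("utf-8")


def deserialize(registry, message):
    """Reader side: fetch the schema by id and rebuild the record."""
    envelope = json.loads(message.decode("utf-8"))
    fields = registry.lookup(envelope["schema_id"])
    return dict(zip(fields, envelope["values"]))


registry = SchemaRegistry()
message = serialize(registry, ["user", "clicks"], {"user": "kim", "clicks": 3})
record = deserialize(registry, message)
print(record)  # {'user': 'kim', 'clicks': 3}
```

A real registry would also version schemas and check compatibility between versions; those parts are left out of this sketch.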
STREAMING
• Related sessions
• Introducing Exactly Once Semantics in Apache Kafka with Matthias J. Sax (Confluent, Spark Summit 2017) [slide, video]
• The Top Five Mistakes Made When Writing Streaming Applications (Cloudera, Spark Summit 2017) [slide, video]
• Stream All Things - Patterns of Modern Data Integration (Confluent, Spark Summit 2017) [slide, video]
• Future Architecture of Streaming Analytics: Capitalizing on the Analytics of Things (Intel, DataWorks Summit 2017) [slide, video]
REAL-TIME SERVICE
• One-line summary: use Redis as model storage.
• Basics
• Getting Ready to Use Redis with Apache Spark [slide, video]
• Applications
• Building a Large Scale Recommendation Engine with Spark and Redis-ML [slide, video]
• Real-Time Machine Learning with Redis, Apache Spark, Tensor Flow, and more [slide, video]
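The "Redis as model storage" pattern can be sketched as follows. This is only an illustration under stated assumptions: a plain dict-backed `KeyValueStore` stands in for Redis so the example runs without a server, and `publish_model`, `predict`, and the `model:` key prefix are hypothetical names, not part of Redis or Redis-ML. The training side writes serialized model parameters under a key, and the serving side reads them back to score requests.

```python
import json


class KeyValueStore:
    """Plain dict standing in for Redis string commands (SET / GET)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def publish_model(store, name, weights, bias):
    """Training side: serialize the model parameters and store them by name."""
    store.set("model:" + name, json.dumps({"weights": weights, "bias": bias}))


def predict(store, name, features):
    """Serving side: load the parameters and score one feature vector."""
    raw = store.get("model:" + name)
    if raw is None:
        raise KeyError("no model stored under " + name)
    model = json.loads(raw)
    score = sum(w * x for w, x in zip(model["weights"], features))
    return score + model["bias"]


store = KeyValueStore()
publish_model(store, "ctr", weights=[0.5, -1.0], bias=0.25)
print(predict(store, "ctr", [2.0, 1.0]))  # 0.25
```

Because the model lives in the store rather than in the serving process, a retrained model can be swapped in by overwriting the key, without redeploying the service.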
BLOCKCHAIN
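• A set of blocks linked in a chain and stored in a distributed fashion
• Better suited than a traditional RDBMS for modeling exchanges
• Directions
• Analyze the information contained in the blockchain
• Use the blocks themselves as a kind of data storage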
[Diagram: Blockchain as a sequence of BLOCKs, each containing FACTs]
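The chain-of-blocks structure can be sketched with a hash link between consecutive blocks. This is a minimal single-process illustration, assuming SHA-256 over a JSON payload; the function names (`block_hash`, `append_block`, `is_valid`) and the sample facts are hypothetical, and distribution and consensus across nodes are deliberately left out.

```python
import hashlib
import json


def block_hash(index, facts, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "facts": facts, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, facts):
    """Append a new block holding `facts`, linked to the current tail."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    block = {
        "index": index,
        "facts": list(facts),
        "prev_hash": prev_hash,
        "hash": block_hash(index, list(facts), prev_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Check every stored hash and every link to the previous block."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["facts"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, ["alice pays bob 5"])
append_block(chain, ["bob pays carol 2", "carol pays dave 1"])
print(is_valid(chain))  # True

# Tampering with a stored fact invalidates the chain.
chain[0]["facts"][0] = "alice pays bob 500"
print(is_valid(chain))  # False
```

Each block's hash covers the previous block's hash, so changing any earlier fact breaks validation for the chain as a whole.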
DEEP LEARNING
• Need I say more?
• TensorFrames: Deep Learning with TensorFlow on Apache Spark (Databricks, Spark Summit 2016) [slide, video]
• Real-Time Image Recognition with Apache Spark (MemSQL, Spark Summit 2017) [slide, video]
QUESTIONS?