Data-Driven Development Era and Its Technologies Developers Summit 2015 Autumn (Oct 14, 2015) Satoshi Tagomori (@tagomoris)

Apr 14, 2017

Transcript
Page 1: Data-Driven Development Era and Its Technologies

Data-Driven Development Era and Its Technologies

Developers Summit 2015 Autumn (Oct 14, 2015)

Satoshi Tagomori (@tagomoris)

Page 2: Data-Driven Development Era and Its Technologies

Satoshi "Moris" Tagomori (@tagomoris)

Fluentd, Norikra, Hadoop, ...

Treasure Data, Inc.

Page 3: Data-Driven Development Era and Its Technologies
Page 4: Data-Driven Development Era and Its Technologies

HQ

Branch

Page 5: Data-Driven Development Era and Its Technologies

http://www.treasuredata.com/

Page 6: Data-Driven Development Era and Its Technologies

Main Topics around "Data"

• Data collection
• Storage
• Data processing
  • Batch distributed processing
  • Stream processing
  • Machine Learning
  • Near real-time query & Data lake
• Visualization

Page 7: Data-Driven Development Era and Its Technologies

Data Analytics Flow

Data source → Collect → Store → Process → Visualize → Reporting / Monitoring
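
As a rough illustration of the flow above, here is a minimal Python sketch that models the four stages as plain functions. The function names, the list used as "storage", and the sample log lines are all hypothetical, chosen only to mirror the slide; they are not taken from the talk, and a real pipeline would use the kinds of tools listed later in the deck.

```python
from collections import Counter

def collect(source):
    """Collect: parse raw 'user action' lines into records."""
    return [dict(zip(("user", "action"), line.split())) for line in source]

def store(records, storage):
    """Store: append records to a storage backend (here, a plain list)."""
    storage.extend(records)
    return storage

def process(storage):
    """Process: aggregate stored records by counting events per action."""
    return Counter(record["action"] for record in storage)

def visualize(counts):
    """Visualize: render a tiny text bar chart, one line per action."""
    return [f"{action:<8}{'#' * n}" for action, n in sorted(counts.items())]

# Hypothetical sample data standing in for a real data source.
raw_log = ["alice login", "bob login", "alice purchase"]
report = visualize(process(store(collect(raw_log), [])))
print("\n".join(report))
```

Each stage only depends on the output of the previous one, which is why a single stage can be swapped for a managed service without touching the others.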

Page 8: Data-Driven Development Era and Its Technologies

Where before What

Page 9: Data-Driven Development Era and Its Technologies

Using Services or Not

• Using fully managed services:
  • Google BigQuery & Dataflow
  • Treasure Data services
• Using self-managed services:
  • Amazon EMR & Redshift
  • Google Cloud Dataproc
• Using your own environment & cluster

Page 10: Data-Driven Development Era and Its Technologies

Using Services or Not

• Using fully managed services (a bit more cost, far less effort):
  • Google BigQuery & Dataflow
  • Treasure Data services
• Using self-managed services (less cost, less effort):
  • Amazon EMR & Redshift
  • Google Cloud Dataproc
• Using your own environment & cluster (fully under your own control, far more effort)

Page 11: Data-Driven Development Era and Its Technologies

Using Services or Not:

"Use Services!"

To concentrate on DATA and analytics,

NOT tools

Page 12: Data-Driven Development Era and Its Technologies

Why should we use services?

• About distributed systems:
  • hard to operate & upgrade
  • impossible to "small-start"
  • very hard to hire professional engineers

• Data-Driven Development:
  • collect/store data first!
  • consider the output data second!
  • "before building your own environment"

Page 13: Data-Driven Development Era and Its Technologies

Really? Aren't you a TD guy?

• ...Really!

• But it requires very long discussions :P

• "スタートアップのデータ処理基盤、作るか、使うか" ("Data processing infrastructure for startups: build it or use it?") http://tsuchinoko.dmmlabs.com/?p=1770

Page 14: Data-Driven Development Era and Its Technologies

How to choose software/services in

Data-Driven Development

Page 15: Data-Driven Development Era and Its Technologies

"What" decides "How"

• Distributed systems exist to solve problems
  • There are many kinds of data
  • There are many problems

• Different systems solve different problems
  • There is no "silver bullet"!

Page 16: Data-Driven Development Era and Its Technologies

What First, How Second

• What do you want to do?
  • Reporting? Analytics? Recommendation? or ...

• What type of data do you want to process?
  • Stored large logs? Streaming sensor data? or ...

• What do you need as a result?
  • CSV? Spreadsheet? Graph? DB relation? or ...

Page 17: Data-Driven Development Era and Its Technologies

How? (just for example)

• MapReduce, Tez
  • Large batch jobs, big JOINs, high stability

• Spark
  • Small/Middle batch jobs, machine learning

• Impala, Presto, Drill, Redshift, BigQuery
  • Near-real-time search, small-to-large analytics

• Storm, Spark Streaming
  • Stream data conversion/aggregation
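
The "what decides how" idea can be sketched as a simple lookup from workload type to candidate engines. This is only a restatement of the examples on this slide in Python; the workload keys and the `candidate_engines` helper are hypothetical names, and the table is neither exhaustive nor an authoritative selection guide.

```python
# Hypothetical lookup table restating the slide's examples; keys are made-up labels.
ENGINES_BY_WORKLOAD = {
    "large_batch": ["MapReduce", "Tez"],
    "small_batch_or_ml": ["Spark"],
    "near_real_time_query": ["Impala", "Presto", "Drill", "Redshift", "BigQuery"],
    "stream": ["Storm", "Spark Streaming"],
}

def candidate_engines(workload):
    """Return the engines listed for a workload, or an empty list if unknown."""
    return ENGINES_BY_WORKLOAD.get(workload, [])

print(candidate_engines("stream"))
print(candidate_engines("graph_processing"))  # not covered by the slide
```

Starting from the workload rather than from the tool keeps the question "what do we need?" ahead of "which product do we like?".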

Page 18: Data-Driven Development Era and Its Technologies

"Processing" is just one part of the whole dataflow!

Page 19: Data-Driven Development Era and Its Technologies

Data Analytics Flow (again)

Data source → Collect → Store → Process → Visualize → Reporting / Monitoring

Page 21: Data-Driven Development Era and Its Technologies

Data Collection

• Data-Driven Development -> collect first!

• As batch: data already exists as files
  • Easily integrated with existing batch systems
  • Sqoop, Embulk, ...

• As stream: data is being generated right now
  • Easily connected with monitoring systems
  • Without bursts of network traffic
  • Flume, Logstash, Fluentd, ...
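
To make the batch/stream distinction concrete, here is a minimal Python sketch. It is an assumption-laden toy, not how Embulk or Fluentd work internally: `collect_batch`, `collect_stream`, and the `flush_size` parameter are hypothetical names, and an in-memory `io.StringIO` stands in for an existing log file.

```python
import io

def collect_batch(log_file):
    """Batch: the data already exists as a file, so read all of it in one pass."""
    return [line.strip() for line in log_file if line.strip()]

def collect_stream(events, flush_size=2):
    """Stream: forward events in small chunks as they are generated."""
    buffer = []
    for event in events:
        buffer.append(event)
        if len(buffer) >= flush_size:
            yield list(buffer)  # send a small chunk downstream
            buffer.clear()
    if buffer:
        yield list(buffer)  # flush whatever is left at the end

existing_file = io.StringIO("a\nb\nc\n")  # stand-in for a file on disk
batch = collect_batch(existing_file)
chunks = list(collect_stream(iter(["a", "b", "c"])))
print(batch)
print(chunks)
```

The batch path moves everything at once, while the stream path sends many small chunks, which is one way to read the slide's point about avoiding bursts of network traffic.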

Page 22: Data-Driven Development Era and Its Technologies

Fluentd: Support Service by SRA OSS

with Treasure Data

Released TODAY!

Page 23: Data-Driven Development Era and Its Technologies

Other Important Topics

• Storage: Performance, Availability, Schema management
  • Apache Hadoop HDFS, Apache HBase, Amazon S3, Cloudera Kudu, ...

• Visualization: Functionality, Connectivity, Visibility
  • Tableau, Pentaho, many other enterprise products, ...

• Distributed Queues: Performance, Stability, Connectivity
  • Apache Kafka, Amazon Kinesis, ...

Page 24: Data-Driven Development Era and Its Technologies

Get Familiar with the Options So You Do NOT Have to Take Pains over Technology!

Page 25: Data-Driven Development Era and Its Technologies

Concentrate on DATA and analytics,

NOT tools.

Thanks!