Seeing With Your Eyes Closed, Ellen Friedman, NoSQL matters Barcelona, 22 November 2014
Page 1: Ellen Friedman - Keynote NoSQL matters Barcelona 2014

© 2014 Ellen Friedman 1

Seeing With Your Eyes Closed

Ellen Friedman, NoSQL matters Barcelona, 22 November 2014

Page 2: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Contact Information

Ellen Friedman

Solutions Consultant and Commentator

Apache Mahout committer, Apache Drill contributor

Email [email protected]

[email protected]

Twitter @Ellen_Friedman @ApacheDrill

Hashtag today: #NoSQL14

Page 3: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Thinking With Your Eyes Closed

When some people think…

… they close their eyes in order to “see”.


Page 4: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Getting Past the Details

•  Look at your data with an open mind

•  Listen to what data tells you

•  Find the key concepts in what you do

•  Give yourself an opportunity for discovery

Page 5: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


NoSQL

•  Founded on discovery

•  Solution-driven

•  Don’t be bound by the tool

•  Flexibility is important

•  How do you keep your ability for invention?

Page 6: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Basic idea:

“Eyes open”: Details

“Eyes closed”: Discovery

Page 7: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Imagination, technology and careful reasoning

Think where this may take you.

Page 8: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Things don’t always turn out the way you predict…

With exploration into new frontiers, you may meet your goal in surprising ways.

A Perfect Red, by Amy Butler Greenfield

Spanish explorers came to the Americas in search of riches.

They were looking for gold and silver.

They found cochineal.

Red dye worth a fortune.

Page 9: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Big Data and Open Source in the 19th Century

Here’s a story with the power of vision (eyes-closed thinking) plus keen observation and attention to detail (eyes-open thinking).

It’s got:

•  Adventure on the high seas

•  Time series data (a hot topic in the NoSQL world today)

•  Clever community building for open source participation

•  World speed record

•  (but no pirates)

Page 10: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Here’s the story!

Page 11: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Matthew Fontaine Maury was a sailor in the 1830s. After an injury ended his seagoing career, the US Navy gave him a “desk job”.

Oddly, that’s where the real adventure starts.

Page 12: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Time Series Data – An Old Idea

Captain’s log book entry for the Steam Ship Bear, 1884 trip to the Arctic. From image digitized by www.oldweather.org and provided via www.naval-history.net. Image modified by Ellen Friedman and Ted Dunning.

Ship captains kept log books with various comments plus measurements recorded at specific times.

Page 13: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Time Series Data – An Old Idea

The basis of a time series is the repeated measurement of parameters over time, together with the times at which the measurements were made.
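
That definition can be sketched in a few lines of Python. The readings below are invented for illustration and are not taken from any real log book; the names `Reading` and `readings` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reading:
    """One measurement together with the time at which it was made."""
    time: datetime
    parameter: str  # what was measured, e.g. "air_temp_c"
    value: float


# Invented sample values: the same parameter measured repeatedly over time.
readings = [
    Reading(datetime(1884, 6, 1, 16, 0), "air_temp_c", 4.0),
    Reading(datetime(1884, 6, 1, 8, 0), "air_temp_c", 3.5),
    Reading(datetime(1884, 6, 1, 12, 0), "air_temp_c", 5.0),
]

# Because each value carries its time, the series can be put in order
# and readings from different times can be compared.
series = sorted(readings, key=lambda r: r.time)
change = series[-1].value - series[0].value
```

Without the timestamps, the three values would just be an unordered bag of numbers; with them, the change over the day can be computed.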

Page 14: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Time Series Data – An Old Idea

At his desk job in the U.S. Navy Office of Charts, Maury discovered boxes with hundreds of ships’ logs, largely forgotten.

Page 15: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Big data project: Bring the data together

•  Using the log data, Maury and his team built maps to indicate wind, temperature, currents

–  They extracted, transformed and aggregated this huge volume of data

–  By hand!

•  Mariners would be able to predict conditions on various routes at different times of the year

•  His theory was that this would help navigation

•  Maury published his Winds and Currents charts to be widely available
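
The extract, transform and aggregate step described above might look roughly like this today. This is a minimal sketch with invented entries; the tuple layout and the function name `build_chart` are assumptions made for illustration, not a description of how Maury's team organised its work.

```python
from collections import defaultdict
from statistics import mean

# Invented log entries: (ship, month, region, wind speed in knots).
log_entries = [
    ("Ship A", 1, "North Atlantic", 18.0),
    ("Ship B", 1, "North Atlantic", 22.0),
    ("Ship A", 7, "North Atlantic", 10.0),
    ("Ship C", 7, "North Atlantic", 14.0),
    ("Ship B", 7, "South Atlantic", 20.0),
]


def build_chart(entries):
    """Group entries by (region, month) and average the wind speed."""
    grouped = defaultdict(list)
    for _ship, month, region, wind_knots in entries:
        grouped[(region, month)].append(wind_knots)
    return {key: mean(values) for key, values in grouped.items()}


chart = build_chart(log_entries)
```

Merging logs from many ships is what makes each cell of the chart meaningful: a single voyage contributes only one or two readings per region and month.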

Page 16: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Big data project: Maury’s Wind and Currents charts

At first, nobody was interested in them…

Page 17: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Maury’s Wind and Currents charts

Using Maury’s carefully compiled data, Captain Jackson got back one month early on a trip from Baltimore in the US to Rio de Janeiro in Brazil.

Page 18: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Maury’s Wind and Currents charts

Now everybody wanted one of his charts. Here’s where the open source part comes in…

Page 19: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Maury’s Open Source Project: The Abstract Log

Maury wanted better data from the ships’ captains. To get one of Maury’s Winds and Currents charts:

•  Captains first had to fill in a special template for one of their trips

•  They returned the template, called the Abstract Log, to Maury and got a chart

•  Maury’s team collected new data that was better than before: regular and systematic time series data
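
The effect of a fixed template can be sketched as a simple completeness check. The field names in `REQUIRED_FIELDS` and the sample entries are hypothetical; the slides do not list the actual columns of the Abstract Log.

```python
from datetime import datetime

# Hypothetical template: the fields every submitted entry must contain.
REQUIRED_FIELDS = {"time", "latitude", "longitude", "wind_direction", "water_temp_c"}


def missing_fields(entry):
    """Return the sorted names of template fields absent from an entry."""
    return sorted(REQUIRED_FIELDS - set(entry))


def accept(entries):
    """Keep only entries that fill in the whole template, ordered by time."""
    complete = [e for e in entries if not missing_fields(e)]
    return sorted(complete, key=lambda e: e["time"])


# Invented submissions; the last one leaves most of the template blank.
submitted = [
    {"time": datetime(1850, 3, 2, 12), "latitude": 36.1, "longitude": -40.2,
     "wind_direction": "NE", "water_temp_c": 17.0},
    {"time": datetime(1850, 3, 1, 12), "latitude": 37.0, "longitude": -42.5,
     "wind_direction": "N", "water_temp_c": 16.5},
    {"time": datetime(1850, 3, 3, 12), "latitude": 35.4},
]
accepted = accept(submitted)
```

Asking every contributor for the same fields is what turns free-form log comments into data that can be merged and compared.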

Page 20: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Data-Driven Decisions Set a World Record

•  In 1853, the clipper ship Flying Cloud set the record for the fastest sailing from New York City to San Francisco

•  Maury’s charts played a key role in the navigator’s expert, data-driven decisions about the route

•  Surprisingly, the navigator was a woman, Eleanor Creesy

Page 21: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Key Lessons from Maury’s Work

•  Give to get

–  Give the Abstract Log to captains, get data collected in a careful way

•  Big data consortium wins

–  Merging data gives pictures nobody else can see

•  Building open source community is valuable

–  The collective effort builds the basis for exploration and discovery

•  Lessons like today: Just 150 years before everybody else

Page 22: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Where exploration is taking us now!

Page 23: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Exploration takes you to surprising places

The really scary part is knowing the amount of computing power in the Apollo 11 guidance system…

Buzz Aldrin steps onto Moon, photo by Neil Armstrong, Apollo 11, 20 July 1969. NASA photo http://1.usa.gov/1uXi53U

Page 24: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Computing power in familiar objects

For comparison: SIM chip in smart card, similar to the SIM chip in a cell phone

Has about 0.5 kilobytes RAM, 16.0 kilobytes ROM

Only a little less than Apollo…

Page 25: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Computing power in familiar objects

SIM chip in smart card, similar to the SIM chip in a cell phone

Has about 0.5 kilobytes RAM, 16.0 kilobytes ROM

Phone processor is very powerful: 1.3 GHz, dual core, 1 GB of RAM

Much more powerful than Apollo

Page 26: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Computing power in familiar objects

Arduino is a small microcontroller board with enough power to interact with sensors in the IoT

The question is, what can you use these powerful, compact technologies to do?

Page 27: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Things may not turn out the way you predict

Surprising use for a microprocessor: Family cat equipped with “smart collar” investigates neighborhood and reveals weak security for local wi-fi

Humorous glimpse at the potential for IoT

https://www.mapr.com/blog/the-internet-of-cat-toys

Page 28: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Who Needs Time Series Data?

Utility providers use smart meters to monitor very short-term changes in energy usage

Page 29: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Who Needs Time Series Data?

Manufacturers who monitor equipment on the assembly line

Manufacturers who produce “smart parts” that report back after the parts are in operation

Page 30: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Unmanned Ocean Robot: Wave Glider

•  Made by Liquid Robotics http://liquidr.com/technology/waveglider/how-it-works.html

•  Powered by wave motion

•  Onboard sensors solar powered

•  Travelled from San Francisco to Hawaii, Japan & Australia

•  Survived shark attack and typhoon

•  Cool

Page 31: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Environmental Monitoring

•  Big trend and growing

•  Companies to collect, store and analyze data

•  Example: Planet OS

–  Multi-sensor, machine data

–  Time series + spatial data

–  https://planetos.com

Page 32: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Smart Shirt

•  Sensors embedded in fabric

–  Measures heart rate & movement

–  Includes time stamp and geo data

•  Smart fabric uses smart phone as hub

•  Fabric also used for other industries

•  Made by Smart Sensing, part of Cityzen Sciences Consortium

•  Also cool.

Feb 2014 article in gizmag http://www.gizmag.com/cityzen-smart-shirt-sensing-fabric-health-monitoring/30428/

Page 33: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Cityzen Data

•  Spin-off from consortium Cityzen Sciences

•  Provides data platform for storage & analysis of sensor data, including the smart shirt

•  http://www.cityzendata.com

•  Presentation by Cityzen Data CTO Mathias Herberts, “From Thread to API” (Feb 2014) https://www.youtube.com/watch?v=RV_Wgc-0yOs

•  Presentation in Silicon Valley in June 2014 http://www.slideshare.net/Mathias-Herberts/20140611-io-tsiliconvalley

Page 34: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


When is a NoSQL time series database useful?

Build a NoSQL time series database when

•  Most of your scans are based on a time range

•  Data is at large scale
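
Why time-range scans suit this kind of store can be shown with a toy example. The sketch below keeps rows sorted by a (series id, timestamp) key, so a range scan becomes two binary searches plus one contiguous slice. The class `TimeSeriesStore` and the meter readings are hypothetical; this is an in-memory illustration of the idea, not the API of HBase, MapR-DB or any other database.

```python
import bisect


class TimeSeriesStore:
    """Toy in-memory store with rows kept sorted by (series id, timestamp)."""

    def __init__(self):
        self._keys = []    # sorted list of (series_id, timestamp) row keys
        self._values = []  # values stored in the same order as the keys

    def put(self, series_id, timestamp, value):
        """Insert a value at the position that keeps the keys sorted."""
        key = (series_id, timestamp)
        i = bisect.bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._values.insert(i, value)

    def scan(self, series_id, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        lo = bisect.bisect_left(self._keys, (series_id, start))
        hi = bisect.bisect_left(self._keys, (series_id, end))
        return [(k[1], v) for k, v in zip(self._keys[lo:hi], self._values[lo:hi])]


# Invented smart-meter readings, written out of order.
store = TimeSeriesStore()
for ts, watts in [(30, 410.0), (10, 400.0), (20, 405.0), (40, 395.0)]:
    store.put("meter-1", ts, watts)
store.put("meter-2", 20, 120.0)

window = store.scan("meter-1", 15, 40)
```

Because all readings for one series sit next to each other in time order, the scan never touches rows outside the requested window, which is the property that matters once the data is at large scale.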

Page 35: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Page 36: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Lesson: It’s scary to go to the Moon with the computing power of a credit card!

Page 37: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Lesson:

Modern computing + NoSQL methods = enormous potential!

Page 38: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Communication matters…

Page 39: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Like monkeys trying to describe a Capybara…

Seen on Twitter: https://twitter.com/rudytheelder/status/500471789042954240

Page 40: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Getting Past the Details

It’s no longer acceptable for technical and non-technical teams to be unable to communicate

•  Data science team needs to clearly exchange ideas about project goals, resources and planning with domain experts

•  Find a new language to describe your work appropriately

•  Find the key concepts in what you do

•  Describe them in a way that makes sense to your audience

Page 41: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Basic idea: Seeing key concepts leads to discovery and implementation!

Page 42: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


e-books currently available courtesy of MapR

Time Series Databases by Ted Dunning and Ellen Friedman © Oct 2014 (published by O’Reilly)

http://bit.ly/1GMk9yY

How to store & access time series data using NoSQL database (HBase or MapR-DB)

Page 43: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Innovations in Recommendation by Ted Dunning and Ellen Friedman © Feb 2014 (published by O’Reilly)

Page 44: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


A New Look at Anomaly Detection by Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)

Page 45: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Basic idea:

“Eyes open”: Present

“Eyes closed”: Future

Page 46: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Flexibility is a key aspect of NoSQL!

Page 47: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


How would you like to be able to…

•  Query multiple data types including JSON or Parquet with SQL?

•  Use directory name as a table name when you query so you don’t have to know in advance the files you’re going for?

•  Use standard SQL query on Hadoop or NoSQL, with low latency?

•  Go schema-less? (shocking!)

•  Reduce the distance to your data?

•  This is where Apache Drill comes in…
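
Two of the wishes above, treating a directory as a table and querying without a declared schema, can be illustrated with a plain-Python analogy. This is not Drill and says nothing about how Drill is implemented; the function `query_directory`, the file names and the records are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def query_directory(directory, columns, where=lambda row: True):
    """Treat every *.json file in a directory as rows of one table.

    No schema is declared up front: a column missing from a record is
    simply returned as None.
    """
    rows = []
    for path in sorted(Path(directory).glob("*.json")):
        for line in path.read_text().splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            if where(record):
                rows.append({c: record.get(c) for c in columns})
    return rows


# Invented sample data: two files whose records do not share the same fields.
with tempfile.TemporaryDirectory() as logs_dir:
    Path(logs_dir, "day1.json").write_text('{"user": "ana", "clicks": 3}\n')
    Path(logs_dir, "day2.json").write_text(
        '{"user": "ben", "clicks": 7, "country": "ES"}\n'
    )
    result = query_directory(
        logs_dir, ["user", "country"], where=lambda r: r["clicks"] > 2
    )
```

The caller names only the directory and the columns it cares about; new files or new fields appearing later would not require any change to a schema definition.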

Page 48: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Apache Drill

•  Low latency SQL query engine for Apache Hadoop and NoSQL

•  Extremely flexible:

–  1st and only distributed SQL query engine that does not require schema

–  Uses wide range of data types including nested, JSON, Parquet

•  Convenient:

–  Uses familiar ANSI SQL commands

–  Lets you continue to use standard BI tools

•  Open source community:

–  Approaching graduation

Page 49: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Real SQL instead of “SQL-like”

•  May be surprising to boast at a NoSQL conference, but flexibility is important – find solutions, not bound by one tool

•  Sample TPC-H SQL benchmark query that Drill can run “as is”:

Page 50: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Schema-less distributed SQL engine

•  Save weeks or months

–  that would have been spent on defining schema, ETL and maintaining schema

•  Drill automatically understands the structure of data

•  Simply point Drill at data and run queries

–  Works on file, directory, HBase or MapR-DB table, etc.

Page 51: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Query complex, semi-structured data “as is”

•  No need to flatten or transform data prior to query execution

•  Intuitive extensions to SQL to work with nested data

•  Here is a simple query on a JSON file:
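
The slide's own query did not survive in this transcript. As a loose stand-in, here is a plain-Python sketch of the underlying idea: reaching into nested JSON by a dotted path without flattening the document first. The helper `get_path`, the sample document and its field names are hypothetical, and the code is an analogy rather than Drill syntax.

```python
import json


def get_path(record, dotted_path):
    """Follow a dotted path such as 'user.address.city' into nested data.

    Numeric steps index into lists. Returns None when any step is missing.
    """
    current = record
    for step in dotted_path.split("."):
        if isinstance(current, list) and step.isdigit():
            current = current[int(step)] if int(step) < len(current) else None
        elif isinstance(current, dict):
            current = current.get(step)
        else:
            return None
        if current is None:
            return None
    return current


# Invented sample document; the field names are hypothetical.
doc = json.loads("""
{"user": {"name": "ana", "address": {"city": "Barcelona"}},
 "purchases": [{"item": "book", "price": 12.5}, {"item": "pen", "price": 1.5}]}
""")

city = get_path(doc, "user.address.city")
first_item = get_path(doc, "purchases.0.item")
missing = get_path(doc, "user.phone.number")
```

The nested structure stays exactly as it was stored; only the query decides which parts of it to reach into.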

Page 52: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Apache Drill

•  Open source, open opportunities

•  What would you use Drill to do?

•  Best use case will be featured in upcoming book on Drill

Page 53: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Looking Forward: Apache Drill SQL on NoSQL!

Page 54: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Big Impact on Society!

Page 55: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


What if you needed to uniquely identify every person in India?

All 1.2 billion of them?

Page 56: Ellen Friedman - Keynote NoSQL matters Barcelona 2014

Largest Biometric Database in the World

1.2 B PEOPLE

The Aadhaar Project:

•  Unique 12-digit number for each person in India

•  Proof of identity and address, authenticated anytime, anywhere

•  Runs on NoSQL database MapR-DB

Page 57: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


A Day in the Life of the Aadhaar Project

Data platform must handle:

•  1 million new enrollments/day

–  After 4 years, ~600 million of the 1.2 billion already enrolled

–  4+ PB of raw data

•  Each new enrollment needs de-duplication

–  100s of millions of transactions over billions of records doing 100s of trillions of biometric matches/day

•  Online sub-second authentications

–  as many as 100 million per day

From Pramod Varma, Chief Architect of UIDAI, at Strata / Hadoop World NYC, Oct 2014 http://strataconf.com/stratany2014/public/schedule/detail/36305

Official website of Unique Identification Authority of India (UIDAI) http://uidai.gov.in
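
A quick back-of-the-envelope check shows how the figures above fit together. The inputs are the numbers quoted on the slide; the assumption that every new enrollment is compared against every existing record is a simplification made here for the arithmetic, not a statement about how de-duplication is actually performed.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted on the slide.
new_enrollments_per_day = 1_000_000
already_enrolled = 600_000_000
population = 1_200_000_000
authentications_per_day = 100_000_000

# Assumption: each new enrollment is compared against every existing record.
matches_per_day = new_enrollments_per_day * already_enrolled

# Average rates; real traffic is bursty, so peaks would be higher.
enrollments_per_second = new_enrollments_per_day / SECONDS_PER_DAY
authentications_per_second = authentications_per_day / SECONDS_PER_DAY

# Days needed to enroll everyone else at the quoted daily rate.
days_to_finish = (population - already_enrolled) / new_enrollments_per_day
```

Under that assumption the product comes to 6 × 10^14 comparisons per day, which is consistent with the “100s of trillions of biometric matches/day” on the slide, and 100 million authentications per day averages out to a little over a thousand per second.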

Page 58: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


What does Aadhaar mean for India?

•  Better delivery of welfare services

•  More open society

–  Identification without regard to caste, creed, religion or geography

•  Reduction in embezzlement – save billions in government funds

•  NoSQL is changing society for the better

Page 59: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Basic idea:

“Eyes open”: Implementation

“Eyes closed”: Vision

Page 60: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Exploration takes you to surprising places

Buzz Aldrin steps onto Moon photo by Neil Armstrong, Apollo 11 20 July 1969 NASA photo http://1.usa.gov/1uXi53U

Page 61: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


India’s Space Program: Mission to Mars

•  India’s ISRO gets Mars orbit on 1st try

•  US NASA & India’s ISRO look forward to collaboration (while @MarsOrbiter chats with @MarsCuriosity)

•  Also cool

Page 62: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


India’s Women Engineers at ISRO

•  ISRO and NASA have many women engineers

•  Very cool

Page 63: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


European Space Agency: Rosetta Mission to Comet

•  Mission took 10 years, 8 mo, 19 days

•  Philae lander touched down on comet on 12 November 2014

•  Outrageously cool!

Page 64: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


What do I predict for the NoSQL future?

Page 65: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


What future do you want to build?

Page 66: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Contact Information

Ellen Friedman

Solutions Consultant and Commentator

Apache Mahout committer, Apache Drill contributor

Email [email protected]

[email protected]

Twitter @Ellen_Friedman @ApacheDrill

Hashtag today: #NoSQL14

Page 67: Ellen Friedman - Keynote NoSQL matters Barcelona 2014


Thank you!