© 2011 IBM Corporation 1 Big Data. New Physics. And Why Geospatial Data is Analytic SuperFood Jeff Jonas, IBM Distinguished Engineer Chief Scientist, IBM Entity Analytics [email protected] March 23rd, 2011

Big data new physics giga om structure conference ny - march 2011

Dec 02, 2014



Jeff Jonas

Opening keynote @ Structure Big Data 2011 conference.
Page 1: Big data new physics   giga om structure conference ny - march 2011

© 2011 IBM Corporation1

Big Data. New Physics. And Why Geospatial Data is Analytic SuperFood

Jeff Jonas, IBM Distinguished Engineer; Chief Scientist, IBM Entity Analytics

[email protected]

March 23rd, 2011

Page 2

Big Data. New Physics.

More data: the better the predictions
– Lower false positives
– Lower false negatives

More data: the faster
– The compute required decreases as the database gets bigger

Bonus: bad data … good
– Suddenly glad your data is not perfect

Page 3

Background

Early 80’s: Founded Systems Research & Development

1989 – 2003: Built numerous systems for Las Vegas, including NORA

Designed and deployed roughly 100 systems, at least 5 of them containing multiple billions of records and hundreds of millions of entities

2005: IBM acquires SRD

Today: Focus on ‘sensemaking on streams’ with special attention towards privacy and civil liberties protections

Page 4

[Figure: plotted against Time: Computing Power Growth, Available Observation Space, Sensemaking Algorithms, Context]

Trend: Organizations Are Getting Dumber

Enterprise Amnesia

“Every two days now we create as much information as we did from the dawn of civilization up until 2003.”

~ Eric Schmidt, CEO Google

Page 5

[Figure: plotted against Time: Computing Power Growth, Available Observation Space, Sensemaking Algorithms, Context]

Trend: Organizations Are Getting Dumber

WHY?

Page 6

Algorithms at Dead End.

You Can’t Squeeze Knowledge Out of a Pixel.

Page 7

No Context

Page 8

Context, definition

Better understanding something by taking into account the things around it.

Page 9

Information in Context … and Accumulating

Top 200 Customer

Job Applicant

Identity Thief

Criminal Investigation


Page 10

From Pixels to Pictures to Insight

Observations

Contextualization

Information in Context

Relevance

Consumer (an analyst, a system, the sensor itself, etc.)

Page 11

The Puzzle Metaphor

Imagine an ever-growing pile of puzzle pieces of varying sizes, shapes and colors

What it represents is unknown (there is no picture on hand)

Is it one puzzle, 15 puzzles, or 1,500 different puzzles?

Some pieces are duplicates, missing, incomplete, low quality, or have been misinterpreted

Some pieces may even be professionally fabricated lies

Point being: Until you take the pieces to the table and attempt assembly, you don’t know what you are dealing with

Page 12

How Context Accumulates

With each new observation … one of three assertions is made: 1) un-associated; 2) placed near like neighbors; or 3) connected

Must favor the false negative

New observations sometimes reverse earlier assertions

As the working space expands, computational effort increases

Given sufficient observations, there can come a tipping point … thereafter, confidence improves while computational effort decreases!
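The three assertions on this slide can be sketched as a toy incremental resolver. This is a minimal illustration under stated assumptions, not the G2 engine: observations are assumed to be dicts of key features, a shared strong identifier (the hypothetical `STRONG_KEYS`) connects an observation to an existing entity, and a shared weak feature such as a name only places it near like neighbors. Because a name alone never connects anything, the sketch favors the false negative. It does not merge entities that one observation bridges, nor does it reverse earlier assertions.

```python
from collections import defaultdict

# Hypothetical feature tiers (assumptions for illustration only):
# strong keys may connect records; weak keys only mark them as neighbors.
STRONG_KEYS = ("ssn", "dl", "passport")
WEAK_KEYS = ("name",)


class ContextAccumulator:
    """Toy incremental resolver: each new observation is un-associated,
    placed near like neighbors, or connected to an existing entity."""

    def __init__(self):
        self.entities = {}             # entity id -> list of observations
        self.index = defaultdict(set)  # (key, value) -> entity ids
        self.next_id = 0

    def _matches(self, obs, keys):
        """Entity ids sharing any of the given key values with obs."""
        found = set()
        for key in keys:
            if key in obs:
                found |= self.index.get((key, obs[key]), set())
        return found

    def add(self, obs):
        strong = self._matches(obs, STRONG_KEYS)
        weak = self._matches(obs, WEAK_KEYS)
        if strong:
            # Connected: attach to the lowest-numbered matching entity.
            outcome, eid = "connected", min(strong)
        else:
            # Favor the false negative: without a strong key, start a new entity.
            outcome = "near neighbors" if weak else "un-associated"
            eid = self.next_id
            self.next_id += 1
            self.entities[eid] = []
        self.entities[eid].append(obs)
        for key in STRONG_KEYS + WEAK_KEYS:
            if key in obs:
                self.index[(key, obs[key])].add(eid)
        return outcome, eid


acc = ContextAccumulator()
print(acc.add({"name": "Mark Smith", "ssn": "443-43-0000"}))    # ('un-associated', 0)
print(acc.add({"name": "Mark Smith", "dl": "00001234"}))        # ('near neighbors', 1)
print(acc.add({"name": "Mark R Smith", "ssn": "443-43-0000"}))  # ('connected', 0)
```

Keeping the "near neighbors" record as its own entity, rather than merging on the shared name, is what lets a later observation either connect it or leave it apart.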

Page 13

[Figure: Unique Identities plotted against Observations; curves: True Population, Overstated Population]

Page 14

Counting Is Difficult

File 1: Mark Smith, 6/12/1978, 443-43-0000

File 2: Mark R Smith, (707) 433-0000, DL: 00001234

Page 15

The Bigger, The More Accurate, The Faster

[Figure: Unique Identities plotted against Observations; curve: True Population]

Page 16

Data Triangulation

New Record: Mark Randy Smith, 443-43-0000, DL: 00001234

File 1: Mark Smith, 6/12/1978, 443-43-0000

File 2: Mark R Smith, (707) 433-0000, DL: 00001234
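Triangulation can be sketched with a union-find structure. The matching rule here is a deliberately simplistic, hypothetical one: two records are treated as the same person whenever they share an SSN or driver's license number. Real resolvers weigh many features and must handle conflicts. File 1 and File 2 share no key, but the new record carries both, so the entity count drops from two to one.

```python
class UnionFind:
    """Disjoint-set forest over record indices."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def count_entities(records, keys=("ssn", "dl")):
    """Count distinct entities, linking any two records that share a key value."""
    uf = UnionFind()
    first_seen = {}  # (key, value) -> index of the first record carrying it
    for i, rec in enumerate(records):
        uf.find(i)  # register the record even if it links to nothing
        for key in keys:
            if key in rec:
                kv = (key, rec[key])
                if kv in first_seen:
                    uf.union(i, first_seen[kv])
                else:
                    first_seen[kv] = i
    return len({uf.find(i) for i in range(len(records))})


file1 = {"name": "Mark Smith", "dob": "6/12/1978", "ssn": "443-43-0000"}
file2 = {"name": "Mark R Smith", "phone": "(707) 433-0000", "dl": "00001234"}
new_record = {"name": "Mark Randy Smith", "ssn": "443-43-0000", "dl": "00001234"}

print(count_entities([file1, file2]))              # 2
print(count_entities([file1, file2, new_record]))  # 1
```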

Page 17

Big Data … pile of … Big Data … in context

Page 18

One Form of Context is “Expert Counting”

Is it 5 people each with 1 account … or is it 1 person with 5 accounts?

Is it 20 cases of H1N1 in 20 cities … or one case reported 20 times?

If one cannot count … one cannot estimate vector or velocity (direction and speed).

Without vector and velocity … prediction is nearly impossible.
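To see why counting has to come before velocity, here is a sketch with a made-up report feed. It assumes each report carries a hypothetical `case_key` that identifies the underlying case; in practice, producing that key is exactly the entity-resolution problem. Counting raw reports makes the outbreak look like it is accelerating. Counting distinct cases shows it slowing.

```python
from collections import Counter

# Hypothetical report feed of (day, case_key). A case_key seen again is a
# repeat report of an already-known case, not a new case.
reports = [
    (1, "case-A"), (1, "case-B"),
    (2, "case-A"), (2, "case-B"), (2, "case-C"),
    (3, "case-A"), (3, "case-B"), (3, "case-C"), (3, "case-D"),
]


def naive_counts(reports):
    """New 'cases' per day when every report is counted."""
    return Counter(day for day, _ in reports)


def expert_counts(reports):
    """New cases per day when each case is counted once, on first sighting."""
    seen, counts = set(), Counter()
    for day, case in sorted(reports):
        if case not in seen:
            seen.add(case)
            counts[day] += 1
    return counts


print(dict(naive_counts(reports)))   # {1: 2, 2: 3, 3: 4}  looks like acceleration
print(dict(expert_counts(reports)))  # {1: 2, 2: 1, 3: 1}  actually slowing
```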

Page 19

“Key Features” Enable Expert Counting

People: Name, Address, Date of Birth, Phone, Passport, Nationality, Biometric, Etc.

Cars: Make, Model, Year, License Plate No., VIN, Owner, Etc.

Router: Device ID, Make, Model, Firmware Vers., Asset ID, Etc.

Page 20

Consider Lying Identical Twins

Passport: #123, Sue, 3/3/84, Uberstan, Exp 2011

Passport: #123, Sue, 3/3/84, Uberstan, Exp 2011

Fingerprint

DNA

Most Trusted Authority: “Same person, trust me.”

Page 21

The same thing cannot be in two places … at the same time.

Two different things cannot occupy the same space … at the same time.

Page 22

Space & Time Enables Absolute Disambiguation

People: Name, Address, Date of Birth, Phone, Passport, Nationality, Biometric, Etc., plus When and Where

Cars: Make, Model, Year, License Plate No., VIN, Owner, Etc., plus When and Where

Router: Device ID, Make, Model, Firmware Vers., Asset ID, Etc., plus When and Where
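The first principle ("the same thing cannot be in two places at the same time") can be turned into a simple feasibility check. This is a sketch under illustrative assumptions not taken from the slides: a maximum plausible travel speed (`MAX_SPEED_KMH`), a small distance tolerance for simultaneous sightings (`SAME_PLACE_KM`), and approximate airport coordinates. If one holder of passport #123 would have to outrun an airliner to produce both scans, the two scans belong to two different things. The second principle (two things cannot occupy the same space) is not modeled here.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
# Assumed plausibility bounds (illustrative, not from the slides):
MAX_SPEED_KMH = 1000.0  # roughly airliner cruising speed
SAME_PLACE_KM = 0.1     # tolerance for "the same place"


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def could_be_same(obs_a, obs_b):
    """False when one thing would have to travel faster than MAX_SPEED_KMH
    (or be in two places at once) to produce both observations."""
    dist_km = haversine_km(obs_a["lat"], obs_a["lon"], obs_b["lat"], obs_b["lon"])
    hours = abs((obs_b["when"] - obs_a["when"]).total_seconds()) / 3600
    if hours == 0:
        return dist_km <= SAME_PLACE_KM
    return dist_km / hours <= MAX_SPEED_KMH


# Passport #123 scanned at two airports (approximate coordinates) 30 minutes apart.
jfk = {"lat": 40.64, "lon": -73.78, "when": datetime(2011, 3, 23, 9, 0)}
lax = {"lat": 33.94, "lon": -118.41, "when": datetime(2011, 3, 23, 9, 30)}
print(could_be_same(jfk, lax))  # False: two different holders of passport #123
```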

Page 23

“Life Arcs” Are Also Telling

Bill Smith, 4/13/67, Salem, Oregon

Bill Smith, 4/13/67, Seattle, Washington

Address History

Tampa, FL 2008-2008

Biloxi, MS 2005-2008

NY, NY 1996-2005

Tampa, FL 1984-1996

Address History

San Diego, CA 2005-2009

San Fran, CA 2005-2005

Phoenix, AZ 1990-2005

San Jose, CA 1982-1990
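Life arcs can be compared with a simple overlap test. This sketch assumes each history is a list of (city, (start_year, end_year)) pairs. It uses a hypothetical one-year `tolerance` so that the year of a move is not counted as a conflict. If the two histories place "Bill Smith" in different cities for longer than that, the records more likely describe two people who share a name and date of birth.

```python
def overlap_years(span_a, span_b):
    """Years shared by two (start, end) ranges; 0 if they do not overlap."""
    return max(0, min(span_a[1], span_b[1]) - max(span_a[0], span_b[0]))


def arcs_conflict(history_a, history_b, tolerance=1):
    """True if the histories place the person in different cities for more
    than `tolerance` overlapping years (tolerance absorbs move years)."""
    for city_a, span_a in history_a:
        for city_b, span_b in history_b:
            if city_a != city_b and overlap_years(span_a, span_b) > tolerance:
                return True
    return False


# The two address histories from the slide, in the order shown.
history_1 = [("Tampa, FL", (2008, 2008)), ("Biloxi, MS", (2005, 2008)),
             ("NY, NY", (1996, 2005)), ("Tampa, FL", (1984, 1996))]
history_2 = [("San Diego, CA", (2005, 2009)), ("San Fran, CA", (2005, 2005)),
             ("Phoenix, AZ", (1990, 2005)), ("San Jose, CA", (1982, 1990))]

print(arcs_conflict(history_1, history_2))  # True: likely two different Bill Smiths
print(arcs_conflict(history_1, history_1))  # False: an arc agrees with itself
```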

Page 24

OMG

Page 25

Space-Time-Travel

Cell phones are generating a staggering amount of geolocational data: 600 billion transactions per day in the US alone

This data is being “de-identified” and shared with third parties, in volume and in real time

Your movement quickly reveals where you spend your time (e.g., evenings vs. working hours) and who you spend your time with

Re-identification (figuring out who is who) is somewhat trivial
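A few lines of code show why movement reveals so much. This sketch assumes hypothetical de-identified pings of the form (device_id, hour_of_day, cell_id). It also assumes time windows for evenings (20:00 to 05:59) and working hours (09:00 to 16:59). The most frequent evening cell and the most frequent working-hours cell give a likely home/work pair for each device. That pair is the kind of anchor the slide has in mind when it calls re-identification somewhat trivial. The join against a named dataset is not shown.

```python
from collections import Counter, defaultdict

# Assumed time windows (illustrative only).
WORK_HOURS = set(range(9, 17))                         # 09:00-16:59
EVENING_HOURS = set(range(20, 24)) | set(range(0, 6))  # 20:00-05:59


def anchor_locations(pings):
    """pings: iterable of (device_id, hour_of_day, cell_id).
    Returns {device_id: (likely_home_cell, likely_work_cell)}; None where a
    device has no pings in that window."""
    evening = defaultdict(Counter)
    work = defaultdict(Counter)
    for device, hour, cell in pings:
        if hour in EVENING_HOURS:
            evening[device][cell] += 1
        elif hour in WORK_HOURS:
            work[device][cell] += 1
    anchors = {}
    for device in set(evening) | set(work):
        home = evening[device].most_common(1)[0][0] if evening[device] else None
        office = work[device].most_common(1)[0][0] if work[device] else None
        anchors[device] = (home, office)
    return anchors


pings = [("d1", 22, "A"), ("d1", 23, "A"), ("d1", 2, "A"),
         ("d1", 10, "B"), ("d1", 11, "B"), ("d1", 14, "C"), ("d1", 18, "D")]
print(anchor_locations(pings))  # {'d1': ('A', 'B')}
```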

Page 26

Space-Time-Travel is Prediction Super-Food

Prediction, with 87% certainty, of where you will be next Thursday at 5:35pm

Names of the top 10 people you co-locate with, not at home and not at work

The Uberstan intelligence service preempts the next mass protest in real-time

A political opponent is crushed and resigns two days after announcing their candidacy
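At its core, the co-location item on this slide is a counting exercise. This sketch assumes pings of the form (device_id, time_bucket, cell_id) with hypothetical coarse time buckets. The target's home and work cells are passed in `exclude_cells`. Every other device seen in the same cell during the same bucket earns one co-location. Attaching names to the resulting device ids would require the re-identification step from the previous slide, which this sketch does not perform.

```python
from collections import Counter, defaultdict


def top_colocators(pings, target, exclude_cells=(), k=10):
    """pings: iterable of (device_id, time_bucket, cell_id).
    Rank other devices by how many (time_bucket, cell_id) slots they share
    with `target`, ignoring any cell in `exclude_cells` (e.g., home and work)."""
    present = defaultdict(set)  # (time_bucket, cell_id) -> devices seen there
    for device, bucket, cell in pings:
        if cell not in exclude_cells:
            present[(bucket, cell)].add(device)
    counts = Counter()
    for devices in present.values():
        if target in devices:
            for other in devices - {target}:
                counts[other] += 1
    return counts.most_common(k)


pings = [
    ("me", 1, "gym"), ("x", 1, "gym"), ("y", 1, "gym"),
    ("me", 2, "cafe"), ("x", 2, "cafe"),
    ("me", 3, "home"), ("z", 3, "home"),
]
print(top_colocators(pings, "me", exclude_cells={"home"}))  # [('x', 2), ('y', 1)]
```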

Page 27

Consequences

Space-time-travel data is the ultimate biometric

It will enable enormous opportunity

It will unravel one’s secrets

It will challenge existing notions of privacy

And it’s here now, with more to come

Page 28

Surveillance society is irresistible.

And you are doing it: location-based services (GPS), free email, Facebook, etc.

Page 29

2 Big Data Trends

Page 30

Trend: Time Is Of The Essence

The better the predictions … the faster they will be wanted.

“Why did we have to wait until the end of the day for the smart answer?”

[Figure: Willingness to Wait versus Relevance (Iffy to Totally); time scale Day, Hour, 200ms, spanning Batch to Real-Time]

Page 31

Trend: Growing Tolerance for Non-Repeatability

It appears the market is becoming more tolerant of one-time results that cannot be easily repeated or substantiated

[Figure: Accountable and Repeatable axis; timeline Yesterday, Now, Going Forward; examples Payroll, Google, Facebook]

Page 32

Trend: Be Careful What You Wish For

6:34pm Recommendation: Shoot it
6:35pm Action Taken: Bang. Dead.
6:36pm Recommendation: Oops. Send Flowers.

[Figure: Accountable and Repeatable axis; timeline Yesterday, Now, Going Forward]

Page 33

Closing Thoughts

Page 34

Wish This On The Adversary

[Figure: plotted against Time: Computing Power Growth, Available Observation Space, Sensemaking Algorithms, Context]

Page 35

Context Accumulation: The Way Forward

[Figure: plotted against Time: Computing Power Growth, Available Observation Space, Sensemaking Algorithms, Context, Context Accumulation]

Page 36

Related Blog Posts

Big Data. New Physics.

Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel

Puzzling: How Observations Are Accumulated Into Context

Smart Sensemaking Systems, First and Foremost, Must be Expert Counting Systems

Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food!

Data Finds Data

General Purpose Sensemaking Systems and Information Colocation

Sensemaking on Streams – My G2 Skunk Works Project: Privacy by Design (PbD)

Page 37

Big Data. New Physics. And Why Geospatial Data is Analytic SuperFood

Jeff Jonas, IBM Distinguished Engineer; Chief Scientist, IBM Entity Analytics

[email protected]

March 23rd, 2011

Page 38

“G2”: My R&D Skunk Works Project

Page 39

My G2 Goals

General-purpose, real-time sensemaking engine

Performs ‘information colocation’ over diverse data types, e.g., structured, unstructured, social, geospatial, queries, hypotheses, anonymized data and more

Exploits the big data, new physics phenomenon

Delivers “data finds data, relevance finds you”

Engineered for grid compute for massive scalability
– Dreaming about: 1T rows for breakfast, then sustaining 1M context-accumulating observations per second
– While new observations reverse earlier assertions

Privacy by Design (PbD) – a number of exciting privacy and civil liberties enhancing features baked-in, by design
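One reading of "data finds data" is that queries are stored as just another kind of observation, consistent with this slide listing queries among the data types. Whichever arrives second, the record or the question, then triggers the match. The sketch below is a hedged illustration of that idea only. The class and method names are hypothetical, matching is exact on single key/value pairs, and it says nothing about how G2 actually does this.

```python
class DataFindsData:
    """Toy sketch: standing queries are persisted like observations, so
    whichever arrives second (the record or the query) triggers the match."""

    def __init__(self):
        self.records = {}  # (key, value) -> list of records
        self.queries = {}  # (key, value) -> list of subscribers
        self.alerts = []   # (subscriber, record) pairs, in arrival order

    def add_record(self, record):
        for kv in record.items():
            self.records.setdefault(kv, []).append(record)
            # The new record finds queries already waiting for it.
            for who in self.queries.get(kv, []):
                self.alerts.append((who, record))

    def add_query(self, who, key, value):
        kv = (key, value)
        self.queries.setdefault(kv, []).append(who)
        # The new query is itself data: it finds records already on hand.
        for record in self.records.get(kv, []):
            self.alerts.append((who, record))


engine = DataFindsData()
engine.add_record({"name": "Mark Smith", "ssn": "443-43-0000"})
engine.add_query("analyst-1", "dl", "00001234")                # nothing on hand yet
engine.add_record({"name": "Mark R Smith", "dl": "00001234"})  # record finds the query
engine.add_query("analyst-2", "ssn", "443-43-0000")            # query finds the record
print([who for who, _ in engine.alerts])  # ['analyst-1', 'analyst-2']
```

Persisting the query is the design choice that matters. A query that is answered once and discarded cannot be found by data that arrives later.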

Page 40

Big Data. New Physics. And Why Geospatial Data is Analytic SuperFood

Jeff Jonas, IBM Distinguished Engineer; Chief Scientist, IBM Entity Analytics

[email protected]

March 23rd, 2011