
New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

Jan 20, 2015


Mario Faria

Panel I hosted at MIT for the 7th Information Quality Conference in July 2013, with J. Andrew Rogers (SpaceCurve) and Matt Piekarczyk (Cortix Systems)
Transcript
Page 1: New Trends and Directions in Data Science - MIT Information Quality Conference - July 19th 2013

The 7th Annual MIT Chief Data Officer & Information Quality Symposium (CDOIQ)

New Trends and Directions in Data Science

Moderator: Mario Faria

July 19th, 2013

July  17,  2012  

Page 2

Panelists

• J. Andrew Rogers (SpaceCurve)
• Matt Piekarczyk (Cortix Systems)

Page 3

Format

• Mario's introduction to the subject
• Each panelist will have 20 minutes to present a point of view
• Mario will ask a few questions
• Panelists will debate with each other or answer questions from the audience

Page 4

Data Science

The process of taking raw data, producing information from it, and using that information to guide actions that bring financial benefits to the business

Page 5

Quality is mandatory for Data Science to work

Page 6

Where we stand today

• Fragmented ecosystem
• Overuse of the term "Big Data"
• "How to compete on analytics" is still hard to achieve
• In the majority of companies, data is still managed with an IT mindset

Page 7

Mario Faria

The Big Data Fragmented Tech Vendors data life cycle process view

Page 8

Page 9

Page 10

Page 11

New Trends and Directions in Data Science

J. Andrew Rogers, Founder and CTO

SpaceCurve  

Page 12

www.spacecurve.com

© 2013 SpaceCurve, Inc. All rights reserved.

Five Big Data Trends and Directions In Data Science

J. Andrew Rogers Founder & CTO

July 18, 2013

Page 13

The Evolution Of Data Science

§  1st Generation

–  An organization’s structured data

–  Example: OLAP / Data Warehouse

§  2nd Generation

–  An organization’s unstructured data

–  Example: Hadoop / MapReduce

§  3rd Generation

–  Real-time context and actionability of an organization’s data

–  Example: SpaceCurve

Page 14

Capturing and Fusing In-Motion Data

§  Monetization of data-in-motion

–  Satellites, smartphones, sensors, social media, spatial, radar, …

§  Real-time processing and fusing

§  Immediate insights from multiple layers of data in motion and historical data at once

§  Immersive intelligence with real-time location analysis

Page 15

Trend #1. Use of diverse data sources for better situational awareness

§  Proliferation of inexpensive sensors creates new possibilities

–  Imagery and video: satellite, UAV, coincidental

–  GPS-tagged entities and entity motion vectors

–  Sensor networks, RF, radar

§  Many challenges

–  Integration and fusion of unrelated data sources

–  Domain expertise required to use data effectively

–  Standardization of data representation
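The integration and standardization challenges above can be illustrated with a small adapter layer that maps unrelated feeds onto one representation. This is a minimal Python sketch, not SpaceCurve's implementation: the two feed formats (`gps`, `radar`), their field names, and the `Observation` schema are hypothetical, chosen only to show heterogeneous sources converted to common units and UTC timestamps.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """Hypothetical common schema that every source is mapped onto."""
    source: str
    entity_id: str
    lat: float       # decimal degrees
    lon: float       # decimal degrees
    time: datetime   # timezone-aware, UTC


def from_gps(rec: dict) -> Observation:
    # Assumed GPS feed format: {"id", "latitude", "longitude", "epoch_s"}
    return Observation(
        source="gps",
        entity_id=str(rec["id"]),
        lat=float(rec["latitude"]),
        lon=float(rec["longitude"]),
        time=datetime.fromtimestamp(rec["epoch_s"], tz=timezone.utc),
    )


def from_radar(rec: dict) -> Observation:
    # Assumed radar feed format: {"track", "pos": (lat, lon), "ts": ISO-8601 string with offset}
    lat, lon = rec["pos"]
    return Observation(
        source="radar",
        entity_id=f"track-{rec['track']}",
        lat=float(lat),
        lon=float(lon),
        time=datetime.fromisoformat(rec["ts"]).astimezone(timezone.utc),
    )


# One adapter per source; adding a feed means adding one function here.
ADAPTERS = {"gps": from_gps, "radar": from_radar}


def normalize(feed: str, records: list[dict]) -> list[Observation]:
    """Map raw records from one named feed onto the common schema."""
    adapter = ADAPTERS[feed]
    return [adapter(r) for r in records]


gps = normalize("gps", [{"id": 7, "latitude": 42.36, "longitude": -71.09, "epoch_s": 0}])
radar = normalize("radar", [{"track": 3, "pos": (42.35, -71.10),
                             "ts": "1970-01-01T00:00:00+00:00"}])
```

Once both feeds share a schema, their timestamps and coordinates can be compared directly, which is the precondition for any of the fusion steps discussed later in the deck.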

Page 16

Trend #2. Leveraging machine-generated data to increase model quality

§  Machines continuously make measurements of reality

–  Sensor networks e.g. imaging, radar, GPS tracking, RF, seismic

–  Operational sensors on machines e.g. automotive and aircraft

–  Computer network activity and audit logs

§  The challenge is extreme data generation rates

–  Few big data platforms are designed for continuous data ingest

–  Computers and sensors are not constrained by human biology
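One common way to keep continuous, high-rate ingest tractable is to hold only a bounded window of recent readings in memory. The sketch below is an illustrative assumption, not a description of any platform named in this panel: `SlidingWindow` is a hypothetical class that evicts readings older than `window_s` seconds and maintains a running sum, so each new reading costs amortized constant work.

```python
from collections import deque


class SlidingWindow:
    """Keep only readings from the last `window_s` seconds (hypothetical helper).

    Assumes readings arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_s: float):
        self.window_s = window_s
        self._buf = deque()   # (timestamp, value) pairs, oldest first
        self._sum = 0.0       # running sum of the values currently in the window

    def ingest(self, ts: float, value: float) -> None:
        self._buf.append((ts, value))
        self._sum += value
        # Evict readings that have fallen out of the window.
        while self._buf and self._buf[0][0] <= ts - self.window_s:
            _, old = self._buf.popleft()
            self._sum -= old

    def mean(self) -> float:
        return self._sum / len(self._buf) if self._buf else float("nan")

    def __len__(self) -> int:
        return len(self._buf)


window = SlidingWindow(window_s=10)
for t in range(25):   # one reading per second for 25 seconds
    window.ingest(t, float(t))
```

After this loop only the ten most recent readings (timestamps 15 to 24) remain. Over very long streams a running floating-point sum can drift, so periodically recomputing it from the buffer is a reasonable safeguard.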

Page 17

Real-world scenario: Hurricane Sandy

Page 18

Trend #3. Real-time data ingestion concurrent with analysis (“round-trip real-time”)

§  Minimizing latency from new data availability to updated analytic models and actionable intelligence is a multi-faceted advantage

–  Leverage highly perishable contextual data before it expires

–  Identify operational risks as soon as they manifest in the data

–  Continuously evolve models to reflect operational environment

§  Challenges for traditional data science platforms

–  Moving from batch to on-line or near-line analytical models

–  Minimizing data movement in analytical processes

–  Scaling out analytic query performance with online updates
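The move from batch to on-line models can be illustrated with the simplest possible online model. This sketch uses Welford's algorithm, a standard technique that the slide does not name, to keep a running mean and variance updated per observation rather than recomputed by a batch job; the class name `OnlineStats` and the `zscore` helper are hypothetical.

```python
import math


class OnlineStats:
    """Running mean/variance via Welford's algorithm (hypothetical class name).

    Each update is O(1), so the model reflects every observation as it arrives
    instead of waiting for the next batch job.
    """

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0   # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator); NaN until two points are seen.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")

    def zscore(self, x: float) -> float:
        """How many standard deviations x lies from the running mean."""
        sd = math.sqrt(self.variance) if self.n > 1 else 0.0
        return (x - self.mean) / sd if sd > 0 else 0.0


stats = OnlineStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(reading)   # the model is current after every reading
```

A large `zscore` on a fresh reading is one simple way to flag an operational risk as soon as it shows up in the data, without re-scanning history.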

Page 19

Trend #4. Space and time relationships for data fusion and deeper insights

§  Space and time are primary keys of reality

–  Entities and events can be localized at a point in time

–  Robust method for fusing unrelated slow- and fast-moving data

–  Interactions and movement over time can be modeled as graphs

§  Powerful and unique analytical capability

–  Correlation of data by time and space relationships

–  Relationship discovery by analyzing unrelated entity vectors

–  Anomaly detection using vector analysis
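In its most basic form, correlating data by time and space relationships is a spatiotemporal join: pair events from unrelated sources that occurred close together in both place and time, for example geotagged posts and aircraft positions. A brute-force Python sketch, assuming great-circle (haversine) distance on a spherical Earth of radius 6,371 km; `Event`, `correlate`, and the thresholds `max_km` and `max_s` are hypothetical names.

```python
import math
from typing import NamedTuple


class Event(NamedTuple):
    entity: str
    lat: float   # degrees
    lon: float   # degrees
    t: float     # seconds since epoch


EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Event, b: Event) -> float:
    """Great-circle distance between two events, in kilometres."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def correlate(xs, ys, max_km: float, max_s: float):
    """Pair events from two unrelated sources that are close in both space and time.

    Brute force, O(len(xs) * len(ys)); only suitable for small inputs.
    """
    pairs = []
    for x in xs:
        for y in ys:
            # Cheap time test first, then the trigonometric distance test.
            if abs(x.t - y.t) <= max_s and haversine_km(x, y) <= max_km:
                pairs.append((x.entity, y.entity))
    return pairs


posts = [Event("post-1", 42.3601, -71.0942, 1_000.0),
         Event("post-2", 40.7128, -74.0060, 1_000.0)]
flights = [Event("flight-A", 42.3656, -71.0096, 1_300.0)]
matches = correlate(posts, flights, max_km=10.0, max_s=600.0)
```

At scale, the nested loop would be replaced by an index over space and time (for example a grid or a space-filling curve) so that only nearby candidates are compared.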

Page 20

Real-world scenario: Correlating entities on social media with flight data

Page 21

Trend #5. Layering many data sources for data quality and immersive intelligence

§  Understanding the full context in which events occur for maximum model fidelity

§  Reinforce signal and cancel out noise by overlaying different measurements of the same event

–  Fill in incomplete or missing data from single data sources

–  Corroborate similar data sources against each other to detect errors and fraud

–  Corroborate a fact analytically from dissimilar data sources

–  Identify subtle semantic and representation differences across data sets
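Overlaying several measurements of the same event can be sketched as a robust consensus. In this hypothetical `fuse` helper (an illustrative assumption, not the presenter's method), the median of the available readings serves as the consensus, sources that disagree with it by more than `tolerance` are flagged for review, and sources with missing values are filled in from the consensus. The source names and readings in the usage lines are made up.

```python
from __future__ import annotations

from statistics import median
from typing import Optional


def fuse(readings: dict[str, Optional[float]], tolerance: float):
    """Fuse several sources' measurements of the same event (hypothetical helper).

    readings maps source name -> measured value, or None if that source has no value.
    Returns (consensus, filled, suspects):
      consensus: median of the available values, robust to outliers when
                 at least three sources report
      filled:    every source's value, with gaps filled by the consensus
      suspects:  sources whose value differs from the consensus by more than tolerance
    """
    present = {s: v for s, v in readings.items() if v is not None}
    if not present:
        raise ValueError("no source reported a value")
    consensus = median(present.values())
    filled = {s: (consensus if v is None else v) for s, v in readings.items()}
    suspects = sorted(s for s, v in present.items() if abs(v - consensus) > tolerance)
    return consensus, filled, suspects


# Hypothetical water-level readings (metres) for one location and time.
consensus, filled, suspects = fuse(
    {"gauge": 3.1, "radar": 3.0, "social": 9.5, "satellite": None},
    tolerance=1.0,
)
```

Here the outlying "social" value is flagged rather than averaged in, and the missing satellite value is filled from the other sources.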

Page 22

New Big Data capabilities needed to meet future market requirements

Page 23

Delivering immediately actionable intelligence

Page 24

www.spacecurve.com

Thank You!

For More Information, Please Contact:

J. Andrew Rogers
Office: +1 206.453.2236
Email: [email protected]
Twitter: @jandrewrogers

Page 25

New Trends and Directions in Data Science

Matt Piekarczyk, President

Cortix Systems

Page 26

Matt Piekarczyk
President
(703) 740-9162 x701
[email protected]

Let knowledge flow

Page 27

Page 28

Page 29

17 hrs/week spent gathering and fusing data

Page 30

80% Effort 1/3 Cost 11% Integrated

Page 31

[Chart] Fundamental Law (y-axis: 0 to 5; x-axis: 1 to 801, ×100,000)

Page 32

Parse → Clean → Map → Find → Use
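The slide lists only the stage names, so the following is one plausible reading of a Parse, Clean, Map, Find, Use workflow, written as a Python sketch. The sample data, field names, `FIELD_MAP`, and all functions are hypothetical; each stage is a plain function so the hand-offs between stages are explicit.

```python
from __future__ import annotations

import csv
import io

# Hypothetical raw extract with messy headers, inconsistent casing, and one bad row.
RAW = """name , City ,amount
 Acme ,boston, 100
Globex,  NEW YORK ,abc
 Initech , Boston ,250
"""

# Hypothetical canonical schema: cleaned source header -> canonical field name.
FIELD_MAP = {"name": "company", "City": "city", "amount": "amount"}


def parse(text: str) -> list[dict]:
    # Parse: turn raw text into records.
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows: list[dict]) -> list[dict]:
    # Clean: trim whitespace and drop rows whose amount is not numeric.
    out = []
    for row in rows:
        row = {k.strip(): (v or "").strip() for k, v in row.items()}
        try:
            row["amount"] = float(row["amount"])
        except ValueError:
            continue
        out.append(row)
    return out


def map_fields(rows: list[dict]) -> list[dict]:
    # Map: rename fields onto the canonical schema and normalise city names.
    mapped = []
    for row in rows:
        m = {FIELD_MAP[k]: v for k, v in row.items() if k in FIELD_MAP}
        m["city"] = m["city"].title()
        mapped.append(m)
    return mapped


def find(rows: list[dict], city: str) -> list[dict]:
    # Find: select the records relevant to a question.
    return [r for r in rows if r["city"] == city]


def use(rows: list[dict]) -> float:
    # Use: reduce the selection to a figure someone can act on.
    return sum(r["amount"] for r in rows)


boston_total = use(find(map_fields(clean(parse(RAW))), "Boston"))
```

Only the last stage produces value; the first four are the gathering and fusing work whose cost the surrounding slides emphasize.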

Page 33

Page 34

Page 35

Page 36

Page 37

Page 38

There is a better way

Page 39

Learn Learn Learn Learn

Use Share

Page 40

Learning solutions

Page 41

Custom dynamic fused data go  

Data is the platform

Page 42

Page 43

Cost

Focus

Underpowered High Risk

Page 44

Cost

Focus

Optimize Resource Allocation and Focus

Page 45

The Debate

• Mario Faria (Moderator)
• J. Andrew Rogers (SpaceCurve)
• Matt Piekarczyk (Cortix Systems)