Page 1: Mac201 big data

BIG DATA #mac201

Page 2: Mac201 big data
Page 3: Mac201 big data
Page 4: Mac201 big data

100 petabytes of data

Page 5: Mac201 big data

1 petabyte = 1000 terabytes

Page 6: Mac201 big data

1 terabyte = 1000 gigabytes

Page 7: Mac201 big data

100 petabytes = 100,000,000 gigabytes
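
The conversion on these slides can be checked with a few lines of Python. This is a minimal sketch that assumes the decimal (powers of 1000) units used on the slides; the constant and function names are made up for illustration.

```python
# Decimal storage units, as used on the slides (1 PB = 1000 TB, 1 TB = 1000 GB).
TB_PER_PB = 1000
GB_PER_TB = 1000


def petabytes_to_gigabytes(petabytes):
    """Convert petabytes to gigabytes via terabytes."""
    return petabytes * TB_PER_PB * GB_PER_TB


print(f"{petabytes_to_gigabytes(100):,} gigabytes")  # 100,000,000 gigabytes
```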

Page 8: Mac201 big data
Page 9: Mac201 big data
Page 10: Mac201 big data
Page 11: Mac201 big data
Page 12: Mac201 big data

Theory free?

‘Google’s engineers didn’t bother to develop a hypothesis about what search terms – “flu symptoms” or “pharmacies near me” – might be correlated with the spread of the disease itself. The Google team just took their top 50 million search terms and let the algorithms do the work.’ - Harford, 2014

Page 13: Mac201 big data
Page 14: Mac201 big data

Identified key data points associated with pregnancy
Hit enough indicators = receive vouchers
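
The rule on this slide (hit enough indicators, receive vouchers) reads like a simple threshold test. Below is a minimal sketch of that idea. The slide does not say which data points were used or how many had to be hit, so the indicator names and the threshold of 3 are placeholders, and the retailer's actual model was presumably more involved.

```python
# Placeholder indicators: the slide does not list the real data points.
PREGNANCY_INDICATORS = {"indicator_a", "indicator_b", "indicator_c", "indicator_d"}
THRESHOLD = 3  # assumed cut-off, not taken from the source


def should_send_vouchers(observed):
    """Return True when enough tracked indicators appear in a shopper's data."""
    hits = len(PREGNANCY_INDICATORS & set(observed))
    return hits >= THRESHOLD


print(should_send_vouchers(["indicator_a", "indicator_c", "indicator_d"]))  # True
print(should_send_vouchers(["indicator_a", "unrelated_purchase"]))  # False
```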

Page 15: Mac201 big data

Data is cheap

Page 16: Mac201 big data
Page 17: Mac201 big data

As our communication, leisure and commerce have moved to the internet and the internet has moved into our phones, our cars and even our glasses, life can be recorded and quantified in a way that would have been hard to imagine just a decade ago (Harford, 2014)

Page 18: Mac201 big data

4 big claims:
1. That data analysis produces uncannily accurate results;
2. That every single data point can be captured, making old statistical sampling techniques obsolete;
3. That it is unnecessary to concentrate upon what causes what, because statistical correlation tells us what we need to know;
4. That scientific or statistical models aren’t needed

Page 19: Mac201 big data

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves. - Anderson, 2008

Page 20: Mac201 big data

Eisenhower vs Stevenson 1952

Page 21: Mac201 big data

Remington Rand UNIVAC
Cost $1m in 1952
$8m inflation adjusted

Page 22: Mac201 big data

Predicted: 438 to 93

Page 23: Mac201 big data

Predicted: 438 to 93
Actual: 442 to 89
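
To put the gap between prediction and result in proportion, the figures on this slide can be compared directly. This is a small sketch using only the numbers shown above; the variable names are illustrative.

```python
# Electoral votes (winner, loser), as given on the slide.
predicted = (438, 93)
actual = (442, 89)

total_votes = sum(actual)  # 442 + 89 = 531
error = abs(predicted[0] - actual[0])  # votes by which the forecast missed

print(f"Missed by {error} of {total_votes} votes ({error / total_votes:.2%})")
# Missed by 4 of 531 votes (0.75%)
```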

Page 24: Mac201 big data
Page 25: Mac201 big data

Richard Whittaker, Senior Vice President at SAP Labs

Page 26: Mac201 big data

The amount of likes on Facebook can predict a screening's chance of selling out, allowing film marketers to focus their attention on getting the word out amongst relevant demographics.

Eugene Hernandez, Director of Digital Strategy at the Film Society of Lincoln Center

Page 27: Mac201 big data
Page 28: Mac201 big data

three

Page 29: Mac201 big data
Page 30: Mac201 big data
Page 31: Mac201 big data
Page 32: Mac201 big data
Page 33: Mac201 big data
Page 34: Mac201 big data
Page 35: Mac201 big data
Page 36: Mac201 big data
Page 37: Mac201 big data

"With the purchase of series, we look at what does well on piracy sites … Prison Break is exceptionally popular on piracy sites." Kelly Merryman, content acquisition exec at Netflix, 2013

Page 38: Mac201 big data
Page 39: Mac201 big data
Page 40: Mac201 big data
Page 41: Mac201 big data
Page 42: Mac201 big data

“in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal)” Taleb, 2013
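
One common way to illustrate this kind of claim is to correlate a purely random target with a growing number of purely random variables and watch the strongest correlation climb. The sketch below is an illustrative simulation, not Taleb's own analysis. The function names, the sample size of 20 and the seed are arbitrary choices.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(n_variables, n_samples=20, seed=0):
    """Largest |r| between a random target and n_variables unrelated random series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(target, candidate)))
    return best


# Every series is noise, yet the best "finding" looks stronger as more are searched.
for count in (10, 100, 2000):
    print(count, round(strongest_spurious_correlation(count), 2))
```

Because the seed is fixed, each larger run includes the variables from the smaller ones, so the reported maximum can only rise as more noise is searched.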

Page 43: Mac201 big data
Page 44: Mac201 big data

Summary  

• Big data is being employed to track digital media users and their behaviours
• Offers new insights into real world practices and preferences
• Incorrect or theory-free assumptions can mean that big data = big errors

Page 45: Mac201 big data

Images CC
• JD Hancock (2012) Big Data
• Michael Donovan (2009) Large Hadron Collider
• Torkild Retvedt (2009) Server room at CERN
• Carlos Luna (2008) Google
• Peter Kirkeskov Rasmussen (2014) Social Media
• David Telford (2011) 082/365 man-flu!
• Mike Mozart (2014) Target
• R2hox (2013) data.path Ryoji.Ikeda – 4
• Sean MacEntee (2010) social media
• Kevin P Trovini (2014) Google Glass (Red)
• Sean MacEntee (2014) data is oil
• Charis Tsevis (2012) I Like Facebook
• R2hox (2013) data.path Ryoji.Ikeda - 3