Transcript

Change Point Detection with Bayesian Inference

By Frank Kelly, PyData

6th  January  2015  

Overview  

• Nigeria, oil wells & drilling
• Noisy data
• Some maths
• Python implementation
• Examples in different domains

FPSO (oil platform picture)

Mud  pulse  telemetry  

• Information is encoded digitally and transmitted via pressure pulses through the mud fluid.

• Used to alert drillers that they have reached oil, to detect rock types, and for general monitoring.

The  problem  

• Poor bit rate and resolution

• Time-consuming analysis

Approaches to statistics

• Frequentist
– Data gathered are a repeatable random sample ("frequency")
– Underlying parameters are constant
– Fisher's 0.05 significance threshold

• Bayesian
– Data are fixed, observed from the realised sample
– Parameters are unknown and described probabilistically
– Introduces "subjectivity"

 

Frequentist vs. Bayesian

The  Theory:  Bayesian  inference  

• Methodology of mathematical inference:
– Choosing between several possible models
– Extracting parameters for these models

• Bayes' Theorem (Rev. Thomas Bayes, 1702-1761):

p(w \mid D) = \frac{p(D \mid w)\, p(w)}{p(D)}

– p(w | D): posterior probability
– p(D | w): likelihood
– p(w): prior probability
– p(D): evidence

- Remove nuisance parameters by marginalisation

- Interesting ones remain
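To make Bayes' theorem and marginalisation concrete, here is a minimal sketch on a grid. It is an illustration, not code from the talk: it assumes Gaussian data with unknown mean and standard deviation, a flat prior on the mean and a 1/σ prior on σ, and it treats σ as the nuisance parameter. All variable names and the synthetic data are made up for the example.

```python
import numpy as np

# Synthetic data (assumption for illustration): Gaussian with mean 1.5, S.D. 0.3
rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=0.3, size=50)

# Grids over the interesting parameter (mean) and the nuisance parameter (sigma)
mu_grid = np.linspace(0.5, 2.5, 201)
sigma_grid = np.linspace(0.05, 1.0, 96)
MU, SIGMA = np.meshgrid(mu_grid, sigma_grid, indexing="ij")

# Log-likelihood of all samples for every (mu, sigma) pair on the grid
resid2 = ((data[None, None, :] - MU[..., None]) ** 2).sum(axis=-1)
log_like = -data.size * np.log(SIGMA) - resid2 / (2.0 * SIGMA**2)

# Log-prior: flat in mu, 1/sigma in sigma (an assumed, commonly used choice)
log_prior = -np.log(SIGMA)

# Posterior = likelihood * prior / evidence; the evidence is the normaliser
log_post = log_like + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum()

# Marginalise: sum out the nuisance parameter sigma, leaving p(mu | data)
post_mu = post.sum(axis=1)
mu_hat = mu_grid[np.argmax(post_mu)]
print("most probable mean:", mu_hat)
```

Working in log space before exponentiating avoids numerical underflow when many samples are multiplied together.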

Modelling the problem

[Figure: noisy piecewise-constant signal of N samples (x-axis 0 to 200, y-axis 0.5 to 2.5), with the changepoint m and the second-segment mean µ2 marked.]

data  =  model  +  noise  

• A sequence of N samples of data from a piecewise-constant source with added Gaussian noise.

• Noise is independent of the mean and identically distributed, with S.D. = σ.

• Heterogeneous: divide into two homogeneous segments:

d_i = \begin{cases} \mu_1 + e_i, & i \le m \\ \mu_2 + e_i, & m < i \le N \end{cases}
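The "data = model + noise" picture on this slide can be simulated in a few lines. This is a minimal sketch; the function name `simulate_step` and the default values (N = 200, m = 80, means 1.0 and 2.0, σ = 0.2) are assumptions chosen for illustration, not values from the talk.

```python
import numpy as np

def simulate_step(n=200, m=80, mu1=1.0, mu2=2.0, sigma=0.2, seed=0):
    """Return n samples: mean mu1 for samples 1..m, mean mu2 for m+1..n,
    plus independent, identically distributed Gaussian noise with S.D. sigma."""
    rng = np.random.default_rng(seed)
    i = np.arange(1, n + 1)                 # 1-based sample index
    model = np.where(i <= m, mu1, mu2)      # piecewise-constant source
    noise = rng.normal(0.0, sigma, size=n)  # noise independent of the mean
    return model + noise

d = simulate_step()
print(d[:80].mean(), d[80:].mean())
```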

Single  changepoint  detector:  How  does  it  work?  

• Substitute the likelihood into Bayes' Law
– Simple model: consider Ockham's Razor

• Interested in the changepoint location m, so integrate w.r.t. the nuisance parameters (µ1, µ2 and σ) and rearrange…

• …to get a BIG expression for p({m}|dI); code it in Python

• On running, obtain the most likely changepoint location
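The steps above can be sketched in Python as follows. This is not the code from the talk's repository. It assumes flat priors on µ1 and µ2 and a 1/σ prior on σ; under those assumptions, integrating out the three nuisance parameters gives a posterior for m proportional to [m(N-m)]^(-1/2) · [Σd² - (Σ_{i≤m} d)²/m - (Σ_{i>m} d)²/(N-m)]^(-(N-2)/2). The function name and the synthetic data are illustrative.

```python
import numpy as np

def changepoint_log_posterior(d):
    """Unnormalised log p(m | d) for a single step in the mean.

    Assumes flat priors on both segment means and a 1/sigma prior on the
    noise S.D.; m is the number of samples in the first segment (1..N-1).
    """
    d = np.asarray(d, dtype=float)
    n = d.size
    m = np.arange(1, n)
    csum = np.cumsum(d)
    sum1 = csum[:-1]             # sum of the first m samples
    sum2 = csum[-1] - sum1       # sum of the remaining N - m samples
    # Residual sum of squares after fitting one mean per segment
    resid = np.sum(d**2) - sum1**2 / m - sum2**2 / (n - m)
    log_post = -0.5 * np.log(m * (n - m)) - 0.5 * (n - 2) * np.log(resid)
    return m, log_post

# Synthetic test signal (assumed values): step from 1.0 to 2.0 after sample 60
rng = np.random.default_rng(1)
signal = np.concatenate([np.full(60, 1.0), np.full(140, 2.0)])
d = signal + rng.normal(0.0, 0.2, size=signal.size)

m, log_post = changepoint_log_posterior(d)
post = np.exp(log_post - log_post.max())   # subtract max to avoid underflow
post /= post.sum()                         # normalise over candidate m
m_hat = int(m[np.argmax(post)])
print("most likely changepoint:", m_hat)
```

Using cumulative sums evaluates every candidate m in one pass, so the whole posterior costs O(N) rather than O(N²).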

Ockham's razor: http://www.jstor.org/discover/10.2307/29774559?sid=21105568247973&uid=3738032&uid=4&uid=2

The  maths  

More  maths  

•  Integrate  w.r.t.  (and  thereby  remove)  nuisance  parameters  

Other applications…

http://moz.com/google-algorithm-change

"Google's algorithm is the 'secret sauce recipe' that has enabled it to dominate search." - FT.com, 16th Sept 2014

http://www.ft.com/cms/s/0/9615661c-3ce1-11e4-9733-00144feabdc0.html?siteedition=uk#axzz3DSwXYAW8

Any business with an online presence today often struggles to accurately evaluate:

● The quality of their website and associated linking pages, as perceived by Google
● The robustness of their website to a sudden change in Google's search algorithm

Web  traffic  

[Figure: raw daily Google search-sourced pageviews; y-axis 30,000 to 60,000.]

Web  traffic  (2)  

[Figure: data smoothed using a moving average; y-axis 30,000 to 60,000.]

Web  traffic  (3)  

[Figure: smoothed data with cyclicality removed; y-axis 30,000 to 60,000.]
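The two preprocessing steps shown in these plots, smoothing with a moving average and removing cyclicality, can be sketched as below. The talk does not say how either step was done, so this is one plausible approach under stated assumptions: a 7-day window and a purely weekly cycle. The function names and the synthetic pageview series are hypothetical.

```python
import numpy as np

def moving_average(x, window=7):
    """Simple unweighted moving average; output has len(x) - window + 1 points."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

def remove_weekly_cycle(x, period=7):
    """Subtract the average value at each position in the cycle,
    then add back the overall mean so the level is preserved."""
    x = np.asarray(x, dtype=float)
    phase = np.arange(x.size) % period
    phase_means = np.array([x[phase == p].mean() for p in range(period)])
    return x - phase_means[phase] + x.mean()

# Synthetic daily pageviews (assumed): constant level plus a weekly pattern
days = np.arange(70)
weekly = np.array([3000, 2500, 2000, 1500, 0, -4000, -5000], dtype=float)
views = 45000 + weekly[days % 7]

smoothed = moving_average(views)
deseasoned = remove_weekly_cycle(views)
print(smoothed[:3], deseasoned[:3])
```

A window equal to the cycle length averages out the weekly pattern, which is why a 7-day window is a natural choice for daily traffic. Note that per-phase means assume no strong trend, so a trend would need handling first.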

Web traffic (4)

[Figure: likelihood of change in the data plotted over time. One axis shows pageviews (30,000 to 60,000), the other the changepoint likelihood (about -838 to -833). Legend: day removed; likelihood CP.]

Number of tropical storms per year in the North Atlantic

Data obtained from the IBTrACS database: https://www.ncdc.noaa.gov/ibtracs/

"Amo timeseries 1856-present" by Rosentod, Marsupilami - http://www.cdc.noaa.gov/Correlation/amon.us.long.data. Licensed under Public Domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Amo_timeseries_1856-present.svg#mediaviewer/File:Amo_timeseries_1856-present.svg

Other applications / possibilities

• Financial markets and political events

• Combine with frequentist statistical methods:
– Use of GLR (generalised likelihood ratio) in an online (moving window) detection application

• Your own data / ideas!
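As a rough idea of what the GLR suggestion could look like, here is a minimal moving-window sketch. It is not from the talk. It assumes a single shift in the mean and a known noise S.D., in which case twice the log generalised likelihood ratio for a split after k of n samples is k(n-k)/n · (mean1 - mean2)² / σ². The function names, window length and threshold are all illustrative assumptions.

```python
import numpy as np

def glr_mean_shift(window, sigma):
    """Return (max statistic, best split k) for one window.

    The statistic is 2 * log GLR for a change in mean after k samples,
    assuming Gaussian noise with known S.D. sigma.
    """
    w = np.asarray(window, dtype=float)
    n = w.size
    k = np.arange(1, n)
    csum = np.cumsum(w)
    mean1 = csum[:-1] / k
    mean2 = (csum[-1] - csum[:-1]) / (n - k)
    stat = k * (n - k) / n * (mean1 - mean2) ** 2 / sigma**2
    best = int(np.argmax(stat))
    return float(stat[best]), int(k[best])

def online_glr(data, sigma, window=50, threshold=30.0):
    """Slide a window over the data; return (alarm index, estimated index of
    the first post-change sample), both 0-based, or None if no alarm."""
    data = np.asarray(data, dtype=float)
    for end in range(window, data.size + 1):
        stat, k = glr_mean_shift(data[end - window:end], sigma)
        if stat > threshold:
            return end - 1, end - window + k
    return None

# Synthetic stream (assumed): mean steps from 1.0 to 2.0 at index 100
rng = np.random.default_rng(2)
stream = np.concatenate([np.full(100, 1.0), np.full(100, 2.0)])
stream = stream + rng.normal(0.0, 0.2, size=stream.size)

alarm = online_glr(stream, sigma=0.2)
print(alarm)
```

The threshold trades detection delay against false alarms; a higher value waits for more post-change samples before raising an alarm.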

Thank you

• Link to Python code on GitHub: https://github.com/swhustla/pydata-bayes-changepoint
– Single changepoint detector (as seen tonight)
– Dual changepoint detector
– Ramp detector

• Further reading:
– Numerical Bayesian Methods Applied to Signal Processing (Statistics and Computing) by Fitzgerald, O'Ruanaidh, 1996: http://www.amazon.co.uk/Numerical-Bayesian-Processing-Statistics-Computing/dp/0387946292
– Bayesian Inference on Change Point Problems (2007): http://www.cs.ubc.ca/~murphyk/Students/Xuan_MSc07.pdf

Twitter: @norhustla Email: frank.kelly@cantab.net

Thank you

• Additional links:
– Google algo updates: http://moz.com/google-algorithm-change
– Mathsight -> insights into algorithm changes: http://mathsight.org
– Atlantic multi-decadal oscillation spatial pattern: http://commons.wikimedia.org/wiki/File:AMO_Pattern.png
– National Climatic Data Center: https://www.ncdc.noaa.gov/ibtracs/
– Ockham's Razor and Bayesian Inference: http://www.jstor.org/discover/10.2307/29774559?sid=21105568247973&uid=3738032&uid=4&uid=2
– Converting from Matlab to Python: http://mathesaurus.sourceforge.net/matlab-numpy.html

