Statistical Methods in Applied Computer Science
Stefan Arnborg, KTH http://www.nada.kth.se/~stefan
DD2447, DD3342, spring 2011
SYLLABUS
Common statistical models and their use:
•  Bayesian, testing, and fiducial statistical philosophy
•  Hypothesis choice
•  Parametric inference
•  Non-parametric inference
•  Elements of regression
•  Clustering
•  Graphical statistical models
•  Prediction and retrodiction
•  Chapman-Kolmogorov formulation
•  Evidence theory, estimation and combination of evidence
•  Support Vector Machines and Kernel methods
•  Vovk/Gammerman hedged prediction technology
•  Stochastic simulation, Markov Chain Monte Carlo
•  Variational Bayes

Page 3: Statistical Methods in Applied Computer Science

LEARNING GOALS
After successfully taking this course, you will be able to:
-  motivate the use of uncertainty management and statistical methodology in computer science applications, as well as the main methods in use,
-  account for algorithms used in the area and use the standard tools,
-  critically evaluate the applicability of these methods in new contexts, and design new applications of uncertainty management,
-  follow research and development in the area.

Page 4: Statistical Methods in Applied Computer Science

GRADING
DD2447: Bologna grades. Grades are E-A during 2009. Completing 70% of the homework assignments, together with a very short oral discussion of them, gives grade C; less gives F-D. For higher grades, essentially all homework assignments should be turned in on time. Alternative assignments will be substituted for those you miss. For grade B you must pass one Master's test; for grade A you must pass two Master's tests or do a project with some research content.
DD3342: Pass/Fail. Research-level project, or deeper study of part of the course.


Applications of Uncertainty everywhere

•  Medical Imaging/Research (Schizophrenia)
•  Land Use Planning
•  Environmental Surveillance and Prediction
•  Finance and Stock
•  Marketing into Google
•  Robot Navigation and Tracking
•  Security and Military
•  Performance Tuning


Some Master’s Projects using this syllabus (subset)

•  Recommender system for Spotify
•  Behavior of mobile phone users
•  Recommender system for book club
•  Recommender for job search site
•  Computations in evolutionary genetics
•  Gene hunting
•  Psychiatry: genes, anatomy, personality
•  Command and control: Situation awareness
•  Diagnosing drilling problems
•  Speech, Music, …


Aristotle: Logic
Logic as a semi-formal system was created by Aristotle, probably inspired by the practice of mathematical argument in his time. There is no record of Aristotle himself applying logic, but the Elements of Euclid probably derives from Aristotle’s illustrations of the logical method.

What role does logic play in computer science?


Visualization
•  Visualize data in such a way that the important aspects are obvious. A good visualization strikes you like a punch between the eyes (Tukey, 1970).
•  Pioneered by, among others, Florence Nightingale, the first female member of the Royal Statistical Society, known for her polar area diagrams and her statistics of hospital performance.


Probabilistic approaches

•  Bayes: Probability conditioned by observation
•  Cournot: An event with very small probability will not happen
•  Vapnik-Chervonenkis: VC-dimension and PAC, distribution-independence
•  Kolmogorov/Vovk: A sequence is random if it cannot be compressed


Peirce: Abduction and uncertainty

Aristotle’s induction, generalizing from particulars, is considered invalid by strict deductionists. Peirce made the concept clear, or at least left us confused on a higher level. Abduction is inference by finding a plausible explanation; it is a key process in scientific progress.


Sherlock Holmes: common sense inference

The techniques used by Sherlock Holmes are modeled on those of Conan Doyle’s professor at medical school, who followed the methodological tradition of Hippocrates and Galen. Abductive reasoning, first spelled out by Peirce, appears in 217 instances in the Sherlock Holmes adventures, 30 of them in the first novel, ‘A Study in Scarlet’.


Thomas Bayes, amateur mathematician

If we have a probability model of the world, we know how to compute the probabilities of events. But is it possible to learn about the world from the events we see? Bayes’ proposal was forgotten, but rediscovered by Laplace.


An alternative to Bayes’ method, hypothesis testing, is based on ‘Cournot’s Bridge’: an event with very small probability will not happen.

Antoine Augustin Cournot (1801-1877): pioneer in stochastic processes, market theory and structural post-modernism. He predicted the demise of the academic system due to discourses of administration and excellence (cf. Readings).
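Cournot’s bridge turns a probability model into a test: if the observed outcome, or anything more extreme, has very small probability under a hypothesis, the hypothesis is rejected. A minimal sketch, assuming a binomial model (k successes in n trials) and a hypothetical threshold `alpha = 0.01`; `binomial_upper_tail` and `cournot_test` are illustrative names, not taken from the course.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def cournot_test(k: int, n: int, p0: float = 0.5, alpha: float = 0.01):
    """Reject H0 (success probability p0) when an outcome at least as
    extreme as k successes has probability below alpha under H0."""
    tail = binomial_upper_tail(k, n, p0)
    return tail, tail < alpha


# Is a coin that gives 70 heads in 100 tosses fair?
tail, reject = cournot_test(k=70, n=100)
print(f"P(X >= 70 | fair coin) = {tail:.2e}, reject H0: {reject}")
```

Unlike Bayes’ rule, the test assigns no probability to the hypothesis itself; it only says that the data would be surprising if the hypothesis were true.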


Kolmogorov and randomness
Andrei Kolmogorov (1903-1987) is the mathematician best known for shaping probability theory into a modern axiomatized theory. His axioms of probability tell how probability measures are defined, also on infinite and infinite-dimensional event spaces and complex product spaces. Kolmogorov complexity characterizes a string by the length of its shortest description; a random string is one with no description much shorter than itself. It is used to explain the Vovk/Gammerman scheme of hedged prediction, and also in MDL (Minimum Description Length) inference.
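Kolmogorov complexity itself is uncomputable, but the output length of any fixed lossless compressor, plus the constant size of its decompressor, is an upper bound on it. A minimal sketch using Python’s standard `zlib` module as that stand-in compressor; the two test strings are arbitrary illustrations.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Length of zlib output at maximum compression level: a crude,
    computable upper-bound proxy for the description length of data."""
    return len(zlib.compress(data, 9))


regular = b"01" * 5000            # 10,000 bytes with a very short description
random_bytes = os.urandom(10000)  # 10,000 bytes that are almost surely incompressible

for name, s in [("regular", regular), ("random", random_bytes)]:
    print(f"{name}: {len(s)} bytes -> {compressed_size(s)} bytes compressed")
```

The patterned string shrinks to a tiny fraction of its length, while the random bytes do not shrink at all, matching the idea that a random sequence cannot be compressed.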


Normative claim of Bayesianism

•  EVERY type of uncertainty should be treated as probability

•  This claim is controversial and not universally accepted: Fisher (1922), Cramér, Zadeh, Dempster, Shafer, Walley (1999) …

•  Students encounter many approaches to uncertainty management and identify weaknesses in foundational arguments.


Foundations for Bayesian Inference
•  Bayes’ method, the first documented method based on probability: the plausibility of an event depends on observation. Bayes’ rule:

$f(\lambda \mid D) \propto f(D \mid \lambda)\, f(\lambda)$

•  Bayes’ rule is the organizing principle for uncertainty.
•  Parameter and observation spaces can be extremely complex, and so can priors and likelihoods.
•  MCMC is the current approach: often, but not always, applicable (difficult when the posterior has many local maxima separated by low-density regions).
•  Variational Bayes: approximate the posterior by a factorized function; the result is also approximate.
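Bayes’ rule only needs the posterior up to a constant factor, which is exactly what MCMC exploits. A minimal random-walk Metropolis sketch, assuming a hypothetical Poisson model for count data with an Exponential(1) prior on the rate λ; this conjugate pair is chosen only so that the exact posterior, a Gamma distribution with shape 1 + Σd and rate 1 + n, is available to check the sampler.

```python
import math
import random


def log_likelihood(lam, data):
    """Poisson log-likelihood log f(D | lam), dropping the constant -sum(log d!)."""
    return sum(d * math.log(lam) - lam for d in data)


def log_prior(lam):
    """Exponential(1) prior: log f(lam) = -lam for lam > 0."""
    return -lam


def metropolis(data, n_samples=20000, burn_in=1000, step=0.5, seed=1):
    """Random-walk Metropolis sampler for f(lam | D) ∝ f(D | lam) f(lam)."""
    rng = random.Random(seed)
    lam = 1.0
    current = log_likelihood(lam, data) + log_prior(lam)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = lam + rng.gauss(0.0, step)  # symmetric proposal
        if proposal > 0:  # posterior is zero for lam <= 0, so such proposals are rejected
            candidate = log_likelihood(proposal, data) + log_prior(proposal)
            # Accept with probability min(1, posterior ratio); the normalizing constant cancels
            if rng.random() < math.exp(min(0.0, candidate - current)):
                lam, current = proposal, candidate
        if i >= burn_in:
            samples.append(lam)
    return samples


data = [3, 5, 4, 6, 2]
samples = metropolis(data)
mcmc_mean = sum(samples) / len(samples)
exact_mean = (1 + sum(data)) / (1 + len(data))  # mean of the Gamma posterior
print(f"MCMC posterior mean {mcmc_mean:.3f}, exact {exact_mean:.3f}")
```

With a multimodal posterior the same random walk can stay trapped near one local maximum, which is the difficulty noted in the list above.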


Showcase application: PET-camera

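$f(\lambda \mid D) \propto f(D \mid \lambda)\, f(\lambda)$

Here the likelihood $f(D \mid \lambda)$ models camera geometry and noise, $D$ is the film, and the prior $f(\lambda)$ expresses scene regularity.

The same applies to any other camera or imaging device …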


PET camera

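D: film, the count for detector pair j; X: radioactivity in voxel i; a: camera geometry.

The likelihood connects the counts D to the radioactivity X through the camera geometry a; the prior expresses which images are plausible.

Inference about X gives the posterior; its mean is often a good picture.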
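The likelihood and prior can be written down directly for a toy camera. A minimal sketch, assuming (as is standard for emission tomography) that the count for detector pair j is Poisson with mean sum_i a[j][i] * x[i], and using a hypothetical Gaussian smoothness prior on neighbouring voxels with weight `beta`; the three-voxel geometry `a` and the counts `d` are invented for illustration.

```python
import math


def expected_counts(a, x):
    """mu_j = sum_i a[j][i] * x[i]: mean count for detector pair j."""
    return [sum(a_ji * x_i for a_ji, x_i in zip(row, x)) for row in a]


def log_likelihood(d, a, x):
    """Poisson log-likelihood log f(D | X) of the observed counts d."""
    total = 0.0
    for d_j, mu_j in zip(d, expected_counts(a, x)):
        if mu_j == 0:
            if d_j > 0:
                return -math.inf  # counts observed where the model predicts none
            continue
        total += d_j * math.log(mu_j) - mu_j - math.lgamma(d_j + 1)
    return total


def log_prior(x, beta=1.0):
    """Unnormalized smoothness prior: penalize jumps between neighbouring voxels."""
    if any(x_i < 0 for x_i in x):
        return -math.inf  # radioactivity cannot be negative
    return -beta * sum((x[i + 1] - x[i]) ** 2 for i in range(len(x) - 1))


def log_posterior(d, a, x, beta=1.0):
    """log f(X | D) up to an additive constant."""
    lp = log_prior(x, beta)
    if lp == -math.inf:
        return lp  # skip the likelihood for impossible images
    return log_likelihood(d, a, x) + lp


# Toy camera: 3 detector pairs, 3 voxels; each pair sees two voxels.
a = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
d = [4, 4, 4]  # observed counts
for x in ([2, 2, 2], [1, 3, 2]):
    print(x, round(log_posterior(d, a, x), 3))
```

The posterior mean would then be estimated by sampling from this log-posterior, for instance with MCMC.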


Sinogram and reconstruction

[Figure: tumour; fruit fly Drosophila family (X-ray)]


Introduction

GOMOS (Global Ozone Monitoring by Occultation of Stars)

The Royal Statistical Society, London, 10 December 2003


Markov chain Monte Carlo methods for high-dimensional inversion in remote sensing

Heikki Haario¹, Marko Laine¹, Markku Lehtinen², Eero Saksman³ and Johanna Tamminen⁴

¹ University of Helsinki, Finland
² University of Oulu, Sodankylä, Finland
³ University of Jyväskylä, Finland
⁴ Finnish Meteorological Institute, Helsinki, Finland


* WIRED on Total Information Awareness
WIRED (Dec 2, 2002) article "Total Info System Totally Touchy" discusses the Total Information Awareness system.
Quote:
"People have to move and plan before committing a terrorist act. Our hypothesis is their planning process has a signature."
Jan Walker, Pentagon spokeswoman, in Wired, Dec 2, 2002.
"What's alarming is the danger of false positives based on incorrect data."
Herb Edelstein, in Wired, Dec 2, 2002.


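Combination of evidence

$f(\lambda \mid D) \propto f(D \mid \lambda)\, f(\lambda)$

$f(\lambda \mid \{d_1, d_2\}) \propto f(d_1 \mid \lambda)\, f(d_2 \mid \lambda)\, f(\lambda)$

The second form holds when $d_1$ and $d_2$ are conditionally independent given $\lambda$. In Bayes’ method, evidence is the likelihood of an observation.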
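Because the two observations are conditionally independent given λ, evidence can be combined one observation at a time or all at once with the same result. A minimal sketch, assuming a hypothetical coin-tossing model with three candidate biases λ ∈ {0.2, 0.5, 0.8} and a uniform prior.

```python
def normalize(weights):
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def bernoulli(d, lam):
    """Likelihood f(d | lam) of one coin toss d (1 = heads) given bias lam."""
    return lam if d == 1 else 1 - lam


def update(belief, d):
    """One application of Bayes' rule: f(lam | d) ∝ f(d | lam) f(lam)."""
    return normalize({lam: bernoulli(d, lam) * p for lam, p in belief.items()})


prior = {0.2: 1 / 3, 0.5: 1 / 3, 0.8: 1 / 3}  # three candidate coin biases
d1, d2 = 1, 1                                  # two heads

# Sequential: the posterior after d1 becomes the prior for d2
sequential = update(update(prior, d1), d2)
# Batch: multiply both likelihoods with the prior, normalize once
batch = normalize({lam: bernoulli(d1, lam) * bernoulli(d2, lam) * p
                   for lam, p in prior.items()})
print(sequential)
```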


Particle filter: general tracking


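Chapman-Kolmogorov version of Bayes’ rule

$f(\lambda_t \mid D_t) \propto f(d_t \mid \lambda_t) \int f(\lambda_t \mid \lambda_{t-1})\, f(\lambda_{t-1} \mid D_{t-1})\, d\lambda_{t-1}$

where $D_t = \{d_1, \dots, d_t\}$ is the sequence of observations up to time $t$.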
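When the state takes finitely many values, the integral becomes a sum and the recursion can be coded directly. A minimal sketch, assuming a hypothetical two-state machine ("ok" or "faulty") that can break down but not repair itself, observed through a noisy alarm; all probabilities are invented for illustration, and the state is written `s` in the code. A particle filter approximates the same recursion with samples when the state space is continuous.

```python
def filter_step(belief, transition, likelihood, d):
    """f(s_t | D_t) ∝ f(d_t | s_t) * sum_r f(s_t | r) f(r | D_{t-1})."""
    states = list(belief)
    # Prediction: Chapman-Kolmogorov sum over the previous state r
    predicted = {s: sum(transition[r][s] * belief[r] for r in states) for s in states}
    # Update: weight by the likelihood of the new observation d, then normalize
    unnormalized = {s: likelihood[s][d] * predicted[s] for s in states}
    total = sum(unnormalized.values())
    return {s: w / total for s, w in unnormalized.items()}


transition = {"ok": {"ok": 0.9, "faulty": 0.1},       # f(s_t | s_{t-1})
              "faulty": {"ok": 0.0, "faulty": 1.0}}
likelihood = {"ok": {"alarm": 0.1, "quiet": 0.9},     # f(d_t | s_t)
              "faulty": {"alarm": 0.7, "quiet": 0.3}}

belief = {"ok": 1.0, "faulty": 0.0}  # f(s_0): the machine starts in working order
history = []
for d in ["quiet", "alarm", "alarm"]:
    belief = filter_step(belief, transition, likelihood, d)
    history.append(belief)
    print(d, {s: round(p, 3) for s, p in belief.items()})
```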


Berry and Linoff have eloquently stated their preferences with the often quoted sentence: "Neural networks are a good choice for most classification problems when the results of the model are more important than understanding how the model works." “Neural networks typically give the right answer.”


1950-1980: The age of rationality. Let us describe the world with a mathematical model and compute the best way to manage it!! This is a large Bayesian Network, a popular statistical model


Ed Jaynes devoted a large part of his career to promoting Bayesian inference. He also championed the use of Maximum Entropy in physics. Outside physics, he met resistance from people who had already invented other methods. Why should statistical mechanics say anything about our daily human world?


Robust Bayes
•  Priors and likelihoods are convex sets of probability distributions (Berger, de Finetti, Walley, ...): imprecise probability.
•  Every member of the posterior is a ‘parallel combination’ of one member of the likelihood and one member of the prior.
•  For decision making, Jaynes recommends using the member of the posterior with maximum entropy (Maxent estimate).

$f(\lambda \mid D) \propto f(D \mid \lambda)\, f(\lambda)$
$F(\lambda \mid D) \propto F(D \mid \lambda)\, F(\lambda)$
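Because a posterior probability is a ratio of expressions that are linear in the prior and in the likelihood, its extreme values over convex sets are reached at extreme points. A minimal sketch, assuming three hypotheses and hypothetical convex sets each spanned by two extreme points (all numbers illustrative); it scans a grid of members, reports lower and upper posterior probabilities for the first hypothesis, and picks the maximum-entropy member. The grid makes that maxent member only approximate.

```python
import math


def posterior(lik, prior):
    """One member of F(lam | D): normalize f(D | lam) f(lam) over the hypotheses."""
    unnormalized = [l * p for l, p in zip(lik, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def mix(u, v, t):
    """Point (1 - t) u + t v on the segment between two vectors."""
    return [(1 - t) * a + t * b for a, b in zip(u, v)]


def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)


def posterior_set(prior_ends, lik_ends, steps=100):
    """Posterior members for a grid of (prior, likelihood) pairs drawn from
    two convex sets, each given by its two extreme points."""
    members = []
    for i in range(steps + 1):
        prior = mix(*prior_ends, i / steps)
        for j in range(steps + 1):
            lik = mix(*lik_ends, j / steps)
            members.append(posterior(lik, prior))
    return members


# Three hypotheses; imprecise prior and imprecise likelihood (illustrative numbers)
prior_ends = ([0.6, 0.3, 0.1], [0.2, 0.3, 0.5])
lik_ends = ([0.7, 0.2, 0.1], [0.5, 0.4, 0.1])

members = posterior_set(prior_ends, lik_ends)
lower = min(m[0] for m in members)
upper = max(m[0] for m in members)
maxent = max(members, key=entropy)  # Jaynes' choice, approximated on the grid
print(f"P(H1 | D) in [{lower:.3f}, {upper:.3f}]; maxent member {[round(q, 3) for q in maxent]}")
```

The bounds coincide with the extremes over the four corner combinations, so extreme points suffice for lower and upper probabilities; the maxent member, by contrast, may lie in the interior.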


SVM and Kernel method
•  Based on Vapnik-Chervonenkis learning theory
•  Separate classes by a wide-margin hyperplane classifier, or enclose data points between close parallel hyperplanes for regression
•  Possibly after a non-linear mapping to a high-dimensional space
•  The only assumption is exchangeability of the data points
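The non-linear mapping need not be computed explicitly: a kernel returns the inner product in the high-dimensional space directly. A minimal sketch checking this for the homogeneous quadratic kernel k(x, z) = (x · z)² on two-dimensional inputs, whose explicit feature map is φ(x) = (x₁², √2 x₁x₂, x₂²); this particular kernel is only an example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quadratic_kernel(x, z):
    """k(x, z) = (x . z)^2, computed in the original 2-D space."""
    return dot(x, z) ** 2


def feature_map(x):
    """Explicit map phi: R^2 -> R^3 with phi(x) . phi(z) = (x . z)^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]


x, z = [1.0, 2.0], [3.0, -1.0]
print(quadratic_kernel(x, z), dot(feature_map(x), feature_map(z)))
```

A hyperplane in the three-dimensional feature space corresponds to a quadratic decision boundary in the original plane; kernels such as the Gaussian one are used the same way, even though their feature spaces are infinite-dimensional.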


Classify with hyperplanes

Frank Rosenblatt (1928-1971): pioneering work on classification by hyperplanes in high-dimensional spaces. Criticized by Minsky and Papert, since real classes are not normally linearly separable. ANN research was taken up again in the 1980s, with non-linear mappings to improve separation. A predecessor of SVM/kernel methods.
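Rosenblatt’s perceptron learning rule is short enough to state in full: whenever a point lies on the wrong side of the hyperplane, move the hyperplane toward it. A minimal sketch with labels +1/−1 (the helper names are illustrative), showing that the rule learns the linearly separable AND function but cannot fit XOR, the standard example of a class structure no hyperplane separates.

```python
def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def perceptron(points, labels, epochs=100):
    """Rosenblatt's rule: on each mistake, w += y x and b += y."""
    w, b = [0.0] * len(points[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:  # wrong side or on the plane
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # converged: every point correctly classified
            break
    return w, b


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [-1, -1, -1, 1]
xor_labels = [-1, 1, 1, -1]

for name, labels in [("AND", and_labels), ("XOR", xor_labels)]:
    w, b = perceptron(points, labels)
    correct = sum(predict(w, b, x) == y for x, y in zip(points, labels))
    print(f"{name}: {correct}/4 correct")
```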


Find parallel hyperplanes

[Figure: classification. Red: true separating plane. Blue: wide-margin separation in the sample.]

Classify by the plane midway between the blue planes.
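A wide-margin separating plane can be found by minimizing the regularized hinge loss of a soft-margin linear SVM. A minimal sketch using plain full-batch subgradient descent, a simple substitute for the quadratic-programming solvers normally used; the data, the regularization weight `lam`, the learning rate and the iteration count are illustrative assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(points, labels, lam=0.01, lr=0.01, iters=5000):
    """Minimize lam/2 |w|^2 + mean(max(0, 1 - y (w . x + b)))
    by full-batch subgradient descent."""
    n, dim = len(points), len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(iters):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0                     # the bias is not regularized
        for x, y in zip(points, labels):
            if y * (dot(w, x) + b) < 1:  # inside the margin: hinge term is active
                for k in range(dim):
                    grad_w[k] -= y * x[k] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from a training point to the plane w . x + b = 0."""
    norm = math.sqrt(dot(w, w))
    return min(y * (dot(w, x) + b) / norm for x, y in zip(points, labels))


points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print("w =", [round(v, 3) for v in w], "b =", round(b, 3),
      "margin =", round(geometric_margin(w, b, points, labels), 3))
```

The margin hyperplanes of the sketch are w · x + b = ±1, and the classifying plane w · x + b = 0 lies midway between them.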


SVM and Kernel method


Vovk/Gammerman Hedged predictions

•  Based on Kolmogorov complexity or a nonconformity measure

•  In classification, each prediction comes with a confidence

•  Asymptotically, misclassifications occur independently and with probability 1 - confidence.

•  The only assumption is exchangeability
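Hedged predictions can be computed with a transductive conformal predictor: try each label for the new object, score how strange every example looks, and turn the rank of the new example’s score into a p-value. A minimal sketch, assuming a nearest-neighbour nonconformity measure (distance to the nearest example with the same label divided by distance to the nearest example with a different label) and toy two-dimensional data; the function names are illustrative. The predicted label has the largest p-value, confidence is one minus the second-largest p-value, and credibility is the largest p-value.

```python
import math


def nonconformity(i, xs, ys):
    """Nearest-neighbour score: large when example i is closer to
    other classes than to its own."""
    same = min((math.dist(xs[i], xs[j]) for j in range(len(xs))
                if j != i and ys[j] == ys[i]), default=math.inf)
    other = min((math.dist(xs[i], xs[j]) for j in range(len(xs))
                 if ys[j] != ys[i]), default=math.inf)
    if other == 0:
        return math.inf
    if other == math.inf:  # no examples of any other class
        return 0.0
    return same / other


def p_values(train_x, train_y, x_new, labels):
    """Conformal p-value of each candidate label for x_new."""
    result = {}
    for y in labels:
        xs, ys = train_x + [x_new], train_y + [y]
        scores = [nonconformity(i, xs, ys) for i in range(len(xs))]
        # Fraction of examples at least as nonconforming as the new one
        result[y] = sum(s >= scores[-1] for s in scores) / len(xs)
    return result


def hedged_predict(train_x, train_y, x_new, labels):
    p = p_values(train_x, train_y, x_new, labels)
    ranked = sorted(p.values(), reverse=True)
    prediction = max(p, key=p.get)
    confidence = 1 - ranked[1]
    credibility = ranked[0]
    return prediction, confidence, credibility


train_x = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(hedged_predict(train_x, train_y, (0.5, 0.5), ["a", "b"]))
```

For the query point (0.5, 0.5), label "a" gets p-value 6/9 and label "b" gets 1/9, so "a" is predicted with confidence 8/9 and credibility 6/9. With only eight training examples no p-value can fall below 1/9, which caps the attainable confidence.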