Methods of Experimental Particle Physics

Alexei Safonov

Lecture #22

Dec 16, 2015

Transcript
Page 1:

Methods of Experimental Particle Physics

Alexei Safonov

Lecture #22

Page 2:

Maximum Likelihood

• Likelihood for N measurements x_i:

  L(θ) = ∏_i f(x_i; θ)

• It is essentially a joint p.d.f. seen as a function of the parameters θ

• While x_i could be any measurements, it’s easy to visualize them as a histogram of x with N bins; in each bin you calculate how probable it is to see x_i for a particular θ (or set of θ’s)

• In ML, the estimators are those values of θ that maximize the likelihood function

• One could use MINUIT to find the position of the minimum of -log(L)
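The minimization step above can be sketched in Python. The slides mention MINUIT; here `scipy.optimize.minimize` stands in for it, and the Gaussian toy data, parameter names, and seed are all invented for this illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Toy data: 1000 draws from a Gaussian with (unknown to the fit) mean 5, width 2.
rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=2.0, size=1000)

def neg_log_likelihood(params):
    mu, sigma = params
    if sigma <= 0:
        return np.inf  # keep the width physical
    # -log L(theta) = -sum_i log f(x_i; theta)
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

# Minimize -log(L); MINUIT (e.g. via iminuit) plays this role in practice.
result = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead")
mu_hat, sigma_hat = result.x
print(mu_hat, sigma_hat)  # should land close to 5.0 and 2.0
```

For this simple unbinned Gaussian case the ML estimators are essentially the sample mean and sample width; the numerical minimizer is what generalizes to the signal-plus-background fits discussed next.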

Page 3:

Example: Maximum Likelihood

• If you look at a distribution of the number of entries in each bin, the data (in each bin) follows a Poisson distribution:

  P(n; λ) = (λ^n / n!) e^{-λ},  n = 0, 1, …;  λ > 0

• Define the likelihood:

  L(θ) = ∏_i P(N_i; f(x_i; θ))

• Best parameters are where L is at its maximum

• f is the Gaussian signal + polynomial background

• Well defined for any size of signal and small N_i

• How do you know if your results make sense?

  • For any data, this method will find SOME minimum for any function. It gives you “best” results, not “sensible” ones.

• Does the magnitude of L tell you how likely it is that things are ok?
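The binned Poisson likelihood described above can be sketched as follows. The model shape, parameter values, bin layout, and seed are all invented for this illustration, not taken from the lecture.

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical binned model f: Gaussian signal on a linear background.
def expected_counts(centers, n_sig, mean, width, b0, b1):
    signal = n_sig * np.exp(-0.5 * ((centers - mean) / width) ** 2)
    background = b0 + b1 * centers
    return signal + background

def log_likelihood(n_obs, centers, params):
    mu = expected_counts(centers, *params)
    # log L = sum over bins of log Poisson(N_i; f(x_i; theta))
    return poisson.logpmf(n_obs, mu).sum()

# Example: 80 bins, with the observed counts drawn from the model itself.
rng = np.random.default_rng(1)
centers = np.linspace(0.0, 8.0, 80)
true_params = (30.0, 4.0, 0.5, 20.0, -1.0)
n_obs = rng.poisson(expected_counts(centers, *true_params))
print(log_likelihood(n_obs, centers, true_params))
```

Working with log L (and minimizing its negative) avoids the numerical underflow that the product form would hit with many bins.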

Page 4:

Likelihood Function Magnitude

• Let’s guesstimate (actually overestimate) Lmax for our example:

  • 80 bins, average probability for the outcome in each bin ~50%

  • Lmax = (1/2)^80, while the data and fit would be clearly compatible

• Well, we didn’t ask the right question

  • Lmax gives the probability of this EXACT outcome, which is of course very unlikely

  • We wanted to know “can something like this happen?”
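The arithmetic behind the guesstimate is worth making concrete: even for a perfect fit, the probability of one exact 80-bin outcome is astronomically small, which is why the raw magnitude of Lmax is not a useful goodness-of-fit measure.

```python
# 80 bins with ~50% probability per bin: Lmax ~ (1/2)^80, even for a good fit.
l_max = 0.5 ** 80
print(l_max)  # about 8.3e-25
```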

Page 5:

Hypothesis Testing

• We are trying to figure out if the minimum we got makes sense or not

  • Is it artificial, or do we have a reasonable degree of belief that these numbers at the minimum make sense?

• Ask a different (kind of opposite) question:

  • If the function with the parameters you picked as best is true, is it probable to see the data you actually saw?

• Studying how the data should look seems easy:

  • Take the function and generate fake “pseudo-data” using a Monte Carlo generator following the function

• Hypothetically, you would want to say that if the real data and most of the “fake data” show a similar degree of agreement with the function, you are presumably good

Page 6:

Hypothesis Testing

• But how do you quantify whether this particular set of fake data looks more function-like than the true data or not?

  • You can use the likelihood value you calculated, with all parameters taken at the position of the maximum of L

  • For any function and data it’s a single number, so it is easy to compare

• You could then build a scheme that looks like this:

  • Calculate L using real data at the minimum, call it L0

  • Generate fake data according to the function, calculate L and check if L > L0

  • Repeat the above step a million times

  • If 90% of the time the fake data gives L > L0, that’s good

  • If 99% of the time the fake data gives L > L0, that’s not good
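The pseudo-experiment scheme above can be sketched directly. The binned model, the bin contents, the number of trials, and the seed are all invented for this sketch, and the "real" data is itself simulated here for lack of an actual dataset.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(7)

# Hypothetical fitted model: expected counts per bin at the best-fit point.
mu = np.array([20.0, 35.0, 50.0, 35.0, 20.0])

def log_l(counts):
    # log L = sum over bins of log Poisson(n_i; mu_i)
    return poisson.logpmf(counts, mu).sum(axis=-1)

# Step 1: L0 from the "real" data (here itself simulated, just for the sketch).
data = rng.poisson(mu)
log_l0 = log_l(data)

# Steps 2-3: generate fake data from the model many times; the p-value is the
# fraction of pseudo-experiments whose likelihood is LOWER than the observed one.
fakes = rng.poisson(mu, size=(10_000, mu.size))
p_value = np.mean(log_l(fakes) < log_l0)
print(p_value)
```

Because the "data" here really does come from the model, the resulting p-value should usually be unremarkable; a tiny value would signal data that the model fails to describe.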

Page 7:

The p-value

• What we just calculated is called the p-value

  • The probability for the data to look even less likely than what we actually observed, assuming that the function is correct

• To calculate the p-value:

  • Pseudoexperiment: take your function (with the “best” parameters) and “simulate” “data” according to the function (allowing statistical fluctuations); every time, calculate and record the likelihood L

  • Do 1,000,000 pseudoexperiments and check how often you get L lower than the Lmax you actually observed

• A small p-value (say less than 1%) tells you that your data does not follow the function you are trying to describe it with

  • When searching for small signals, one can use the p-value calculated from pseudoexperiments following the “background only” model: a small p-value tells you that the data does not like the “background only” model

  • Caveat: as a single number, such a p-value does not tell you whether adding signal helps

Page 8:

Hypothesis Testing

• What we have described works even better if you have a fixed “hypothesis” (no free parameters) and want to check if the data is consistent with that hypothesis

  • In the initial fit we found the “best” parameters and took that as L0; in the pseudo-experiments we never had this additional flexibility to pick something that is “best”

  • But for a fixed hypothesis there is never a problem

• Example:

  • You stare at the data and you have a well defined background prediction (say from MC simulation)

  • You suspect there may be a bump in the data, but you are not sure

  • A good question to ask is “does the data look like the prediction?” – in other words, what is the p-value?

Page 9:

Signal or No Signal?

• When you start looking for the Higgs, you don’t know its mass or cross-section

  • Comparing data and the background prediction is great, but it only tells you whether the data and the background expectation look alike or not

  • If something fishy is going on, you will see a low p-value, which tells you that the data does not look like the background prediction

  • But it does not tell you whether it looks like you may have found a Higgs (maybe it’s something else in there that makes the data and predictions disagree)

• Need to answer the question “does it look more like X or more like Y?”

  • If both X and Y are fixed (X is background only, Y is background plus signal with known mass and cross-section), one could probably calculate two p-values and then make a judgment

• The caveat is that in all real searches you almost never know the mass – what do you do then?

Page 10:

Hypothesis Testing – Unknown Mass

• Say I don’t know the mass, but I think I know the cross-section for each mass

  • As if I believe the SM predictions and look for the SM Higgs

• How do you account for the unknown mass?

• Proposal #1:

  • Calculate the p-value for Background and Background+Signal for every possible value of the Higgs mass (say you scan over 100 values)

  • One p-value for Background only that tells you the data does not look like background (say p=0.001%), and 100 p-values, one for each mass, each telling you something different (most are small, like p=0.001-1%, but the one at 140 is p=40%)

• One issue is having two p-values, but there is another one too

Page 11:

P-value for Comparing Hypotheses

• In the past you were using L to say whether something is function-like or not

  • Strictly speaking, you could have picked a different metric; L is not the only possible choice

• When comparing hypotheses, a good choice is the ratio of the joint p.d.f.’s:

  λ(x) = L(x | H1) / L(x | H2)

• It tells you for each dataset whether it is more H1-like or H2-like

  • When you do pseudo-experiments in calculating the p-value, you will still generate data according to the background model (if you are determining whether something is background-like); you will just use this statistic as the metric in deciding whether the pseudo-data is more “alike” than the true data or not

• One can successfully define a p-value: p=1% will tell you that in only 1% of the cases would the data, if it’s truly following H0, look like yours

  • If 1% was your threshold, you will reject the hypothesis H0 in favor of H1

• By the way, what if both are wrong?
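A sketch of this likelihood-ratio scheme: compute the log of the ratio of joint Poisson p.d.f.'s for the observed counts, then calibrate it with pseudo-experiments generated under H0. The bin contents, the observed counts, and the seed here are all invented for this illustration.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)

# Hypothetical expected counts per bin under the two hypotheses.
mu_h0 = np.array([50.0, 50.0, 50.0, 50.0, 50.0])  # background only
mu_h1 = np.array([50.0, 55.0, 70.0, 55.0, 50.0])  # background + signal bump

def log_lr(counts):
    # log of the ratio of joint p.d.f.'s: > 0 means more H1-like than H0-like.
    return poisson.logpmf(counts, mu_h1).sum(axis=-1) \
         - poisson.logpmf(counts, mu_h0).sum(axis=-1)

# Invented "observed" data that happens to resemble H1.
observed = np.array([52, 57, 68, 54, 49])
t_obs = log_lr(observed)

# p-value under H0: generate pseudo-data from the BACKGROUND model and ask
# how often it looks at least as H1-like as the real data did.
fakes = rng.poisson(mu_h0, size=(100_000, mu_h0.size))
p_value = np.mean(log_lr(fakes) >= t_obs)
print(t_obs, p_value)
```

A small p-value here says background-only pseudo-data almost never looks this signal-like, i.e. H0 is disfavored relative to H1 for this dataset.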

Page 12:

Unknown Mass

• Say we did define this relative p-value and calculated it for each mass

  • Note: the plot below is something slightly different, in that it allows for an unknown cross-section of the signal

• There is a clear bump somewhere near 125 that has a p-value of ~1%

  • Does it mean that there is only a 1% chance that this is background and a 99% chance that this is the Higgs?

Page 13:

A Caveat: Combinatorial Trial Factor

• Also called the “Look Elsewhere Effect” or LEE:

  • The local p-value tells us how significant the deviation is at this specific point

  • This would be a correct estimate of the signal significance if we knew the Higgs mass ahead of time

  • But we didn’t

• It is like looking at 1,000 data plots:

  • Even if all of them truly came from their expected distributions, on average one of them must appear only 0.1% probable

[Figure: “From the X-Files of HEP experimentalists: a bump that turned out to not be real”]

Page 14:

Combinatorial Trial Factors

• Our p-value ignores the fact that every time something jumps up, it looks more like a Higgs and less like background

  • Need to account for that in your pseudoexperiments

  • Hence the word “local” in the bottom plot

• When you do pseudo-experiments, you should also try all sorts of masses, just like in data, to see how badly the data can deviate from the prescribed expectation even if it is truly following the expectation

  • This requires more Monte Carlo pseudo-experiments for calculating the p-value
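The size of the trial factor can be illustrated with a deliberately simplified toy: scan 100 mass points whose local significances are treated as independent standard-normal pulls (real scan points are correlated, so this overestimates the effect), and compare the local and global chance of a 3-sigma excess. The counts and seed are invented for this sketch.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)

# Toy model of the scan: 100 "mass points", each with an INDEPENDENT
# standard-normal pull in every background-only pseudo-experiment.
n_masses, n_pseudo = 100, 20_000
pulls = rng.normal(size=(n_pseudo, n_masses))

# Local p-value of a 3-sigma excess at one fixed, pre-chosen mass point.
p_local = norm.sf(3.0)

# Global p-value: the chance that ANY of the 100 scanned points fluctuates
# up to 3 sigma or more somewhere in a background-only pseudo-experiment.
p_global = np.mean(pulls.max(axis=1) >= 3.0)
print(p_local, p_global)
```

With 100 independent tries, a "1-in-700" local fluctuation somewhere in the scan happens in roughly one pseudo-experiment in eight, which is exactly why local significances must be deflated before claiming a signal.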