
Soc400/500: Applied Social Statistics

Week 1: Introduction and Probability

Brandon Stewart [1]

Princeton

August 31-September 4, 2020

[1] These slides are heavily influenced by Matt Blackwell and Adam Glynn, with contributions from Justin Grimmer and Matt Salganik. Illustrations by Shay O’Brien.

Stewart (Princeton) Week 1: Introduction and Probability August 31-September 4, 2020 1 / 70


Where We’ve Been and Where We’re Going...

Last Week
  - living that class-free, quarantine life

This Week
  - course structure
  - core ideas
  - introduction to probability
  - three big ideas in probability

Next Week
  - random variables
  - joint distributions

Long Run
  - probability → inference → regression → causal inference


1 Course Structure
  - Overview
  - Ways to Learn
  - Final Details

2 Core Ideas
  - What is Statistics?
  - Preview: Connecting Theory and Evidence

3 Introduction to Probability
  - What is Probability?
  - Sample Spaces and Events
  - Probability Functions

4 Three Big Ideas in Probability
  - Marginal, Joint and Conditional Probability
  - Bayes’ Rule
  - Independence


Welcome and Introductions

The tale of two classes: Soc400/Soc500 Applied Social Statistics

I
  - ... am an Assistant Professor in Sociology
  - ... am trained in political science and statistics
  - ... do research in methods and statistical text analysis
  - ... love doing collaborative research
  - ... talk very quickly

Your Preceptors
  - Emily Cantrell
  - Alejandro Schugurensky


Overview

Goal: train you in statistical thinking.

Fundamentally a graduate course for sociologists, but also useful for research in other fields, policy evaluation, industry, etc.

Difficult course but with many resources to support you.

When we are done you will be able to teach yourself many things.

Syllabus is a useful resource including philosophy of the class.


Specific Goals

critically read and reason about quantitative social science using linear regression techniques.

conduct, interpret, and communicate results from analysis using multiple regression.

explain the limitations of observational data for making causal claims and distinguish between identification and estimation.

understand the logic and assumptions of several modern designs for making causal claims.

write clean, reusable, and reliable R code in tidyverse style.

feel empowered working with data.


Why R?

It will give you super powers (but not at first)

It is free and open source

It is the de facto standard in many applied statistical fields

Artwork by @allison_horst


Why RMarkdown?

Artwork by @allison_horst


Mathematical Prerequisites

No formal prerequisites.

Balance of rigor and intuition.
  - no rigor for rigor’s sake.
  - we will tell you why you need the math, but also feel free to ask.
  - course focus on how to reason about statistics, not just memorize guidelines.

We will teach you any math you need as we go along.

Crucially, though, this class is not about innate statistical aptitude; it is about effort.

We all come from very different backgrounds. Please have patience with yourself and with others.


Ways to Learn

Pre-Recorded Lectures: learn broad topics (4–8 videos a week, ≈2.5 hours)

Pre-Recorded Precept: learn data analysis skills, get targeted help on assignments

Perusall: an annotation platform for videos

Course Meetings: come together and discuss material

Ed: ask questions of us and your classmates

Office Hours: ask us even more questions, but (sort of) in-person

Problem Sets: reinforce understanding of material, practice


Problem Sets

Schedule (due Friday at 5PM eastern)

Grading and solutions

Collaboration policy

You may find these difficult. Start early and seek help!

Most important part of the class


Ways to Learn

Pre-Recorded Lectures

Pre-Recorded Precept

Perusall

Course Meetings

Ed

Office Hours

Problem Sets

Instructor Office Hours

Final Exam Prep

External Consulting

Individual and Group Tutoring

Your Job: work hard and get help when you need it!


Staying in Touch


A Note on Reading

Think of the lecture slides as primary reading.

If you want material to read, come talk to me about recommendations.

Suggested Books (more in the syllabus!):
  - Angrist and Pischke. 2008. Mostly Harmless Econometrics
  - Aronow and Miller. 2019. Foundations of Agnostic Statistics
  - Blitzstein and Hwang. 2019. Introduction to Probability

A somewhat obvious tip: don’t skip the math!


Advice from Prior Generations

Ask questions if you don’t know what’s going on!

Investing a considerable amount of time in getting familiar with R and its various tools will pay off in the long run!

Go over the lecture slides each week. This can be hard when you feel like you’re treading water and just staying afloat, but I wish I had done this regularly.

It’s challenging but very doable and rewarding if you put the time in. There are plenty of resources to take advantage of for help.

I found it helpful to read through the lecture slides again after I had opened the problem set. It made it easier to create connections between what we went through and how to do it.

Go over your psets and the pset solutions the moment they are graded as a habit and figure out what you don’t know.


Outline of Topics

Outline in reverse order:

Causal Inference: inferring counterfactual effect given association.

Regression: estimate association.

Inference: estimating things we don’t know from data.

Probability: learning what data we would expect if we did know the truth.

Probability → Inference → Regression → Causal Inference


Attribution and Thanks

My philosophy on teaching: don’t reinvent the wheel; customize, refine, improve.

Huge thanks to those who have provided slides, particularly: Matt Blackwell, Adam Glynn, Justin Grimmer, Jens Hainmueller, Erin Hartman, Kevin Quinn.

Also thanks to those who have discussed with me at length, including Dalton Conley, Chad Hazlett, Gary King, Kosuke Imai, Matt Salganik and Teppei Yamamoto.

Previous generations of preceptors have also been incredibly important: Clark Bernier, Elisha Cohen, Ian Lundberg, Simone Zhang, Alex Kindel, Ziyao Tian, Shay O’Brien.

Thanks to Shay O’Brien for many hand-drawn illustrations.


Welcome To Class!

Be sure to read the syllabus for more details.


What is Statistics?

Branch of mathematics studying collection and analysis of data

The name statistic comes from the word state

The arc of developments in statistics

1) an applied scholar has a problem
2) they solve the problem by inventing a specific method
3) statisticians generalize and export the best of these methods

Relatively recent field (started at end of 19th century)

Goal: principled guesses based on stated assumptions.

In practice, an essential part of research, policy making, political campaigns, selling people things...


Why study probability?

It enables inference.


In Picture Form

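[Diagram: probability runs from the data generating process to the observed data; inference runs from the observed data back to the data generating process.]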


Statistical Thought Experiments

We start with probability.

Allows us to contemplate the world under hypothetical scenarios.
  - hypotheticals let us ask: is the observed relationship happening by chance or is it systematic?
  - it tells us what the world would look like under a certain assumption.

Most of the probability material is in the first two weeks but wewill return to these ideas periodically through the semester.


Example: The Lady Tasting Tea

The Story Setup (lady discerning about tea)

The Experiment (perform a taste test)

The Hypothetical (count possibilities)

The Result (boom she was right)

This became the Fisher Exact Test.
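
The "count possibilities" step can be sketched in a few lines. This is only an illustration under an assumption the slide does not spell out: the classic version of the experiment with 8 cups, 4 of which had milk poured first, where the taster must say which 4. The function name is made up for this sketch.

```python
from math import comb


def chance_of_perfect_guess(n_cups: int = 8, n_milk_first: int = 4) -> float:
    """Probability of naming exactly the right cups by pure guessing.

    Assumes (not stated on the slide) the classic design: the taster knows
    how many cups had milk poured first and must say which ones.
    """
    # Under pure guessing every choice of n_milk_first cups out of n_cups
    # is equally likely, and only one of those choices is fully correct.
    n_possible_choices = comb(n_cups, n_milk_first)
    return 1 / n_possible_choices


print(comb(8, 4))                 # number of possible guesses: 70
print(chance_of_perfect_guess())  # 1/70, roughly 0.014
```

If guessing alone gets everything right only about 1.4% of the time, a perfect result is hard to attribute to chance, which is the logic the hypothetical is meant to expose.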


A Note on Fisher and the History of Statistics

The statistician in that story was Sir Ronald Fisher, arguably the most influential statistician of the 20th century.

Besides founding key areas of statistics, Fisher was also one of the founders of population genetics.

He was also a eugenicist and a racist.

Statistics has been used intermittently as a force for progress and a force against progress.


Preview: Connecting Theory and Evidence

“[Variables] empirically perform as theoretically predicted, by displaying statistically significant effects net of other variables in the right direction”

Lundberg, Johnson, and Stewart. Setting the Target: Precise Estimands and the Gap Between Theory and Empirics


The target tautology:

Research goals are defined by hypotheses about model coefficients

The goal is only defined within the statistical model

It becomes impossible to reason about other estimation strategies

Solution:

State the research goal separately from the estimation strategy

Our diagnosis for the source of many methodological problems


Connecting Theory and Evidence

[Diagram, flattened in extraction; the steps read in order:]

Theory or general goal
  - Set a specific target (by argument). Example tools: target population, causal contrast

Theoretical estimands
  - Link to observable data (by assumption). Example tools: Directed Acyclic Graphs, potential outcomes

Empirical estimands
  - Learn how to estimate from data we observe (by data). Example tools: OLS regression, machine learning

Estimation strategies
  - Evidence points to new questions


We Covered. . .

Statistics as a field (the good, the bad and the ugly)

The probability and inference loop

Connecting theory and evidence through estimands

See you next time!


From ‘Probably’ to Probability


Why Probability?

Helps us envision hypotheticals

Describes uncertainty in how the data is generated

Estimates probability that something will happen

Thus: we need to know how probability gives rise to data


Intuitive Definition of Probability

While there are several interpretations of what probability is, most modern (post 1935 or so) researchers agree on an axiomatic definition of probability.

3 Axioms (Intuitive Version):

1) The probability of any particular event must be non-negative.
2) The probability of anything occurring among all possible events must be 1.
3) The probability of one of many mutually exclusive events happening is the sum of the individual probabilities.

All the rules of probability can be derived from these axioms. To state them formally, we first need some definitions.


Sample Spaces

To define probability we need to define the set of possible outcomes.

The sample space is the set of all possible outcomes, and is often written as S.

For example, if we flip a coin twice, there are four possible outcomes, each an ordered pair (first flip, second flip):

S = {(heads, heads), (heads, tails), (tails, heads), (tails, tails)}

Thus the table in Lady Tasting Tea was defining the sample space. (Note that we assigned illogical guesses probability 0.)
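
As a small illustration, the two-flip sample space can be enumerated rather than written out by hand. This is a sketch, not part of the original slides; the variable names are arbitrary.

```python
from itertools import product

# Each outcome is an ordered pair: (first flip, second flip).
sample_space = set(product(["heads", "tails"], repeat=2))

print(len(sample_space))  # 4
for outcome in sorted(sample_space):
    print(outcome)
```

Ordered pairs are used so that (heads, tails) and (tails, heads) stay distinct outcomes.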


A Running Visual Metaphor

Imagine that we sample one apple from a bag. Looking in the bag we see:

[Image: the apples in the bag; not reproduced]

The sample space is:

S = Ω = {[the apples in the bag]}

Events

Events are subsets of the sample space.

For example, if S = Ω = {[the apples in the bag]}, then {[some of the apples]} and {[a single apple]} are both events. [Apple images not reproduced.]


Events Are a Kind of Set

Sets are collections of things, in this case collections of outcomes

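One way to define an event is to describe the common property that all of the outcomes share. We write this as

{ω | ω satisfies the property they share}

Example:

If A = {ω | ω has a leaf}, then every apple with a leaf is in A and every apple without one is not. [Figure: each apple in the bag marked as in A or not in A; images not reproduced.]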
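
Defining an event by a shared property maps directly onto a set comprehension. The sketch below uses a made-up bag of four apples (the slide's images are not available), so the names, colors, and leaf flags are purely hypothetical.

```python
from typing import NamedTuple


class Apple(NamedTuple):
    name: str
    color: str
    has_leaf: bool


# Hypothetical bag: stands in for the apple images on the slide.
sample_space = {
    Apple("a1", "red", True),
    Apple("a2", "red", False),
    Apple("a3", "green", True),
    Apple("a4", "green", False),
}

# A = {w | w has a leaf}
A = {w for w in sample_space if w.has_leaf}

for w in sorted(sample_space):
    print(w.name, "in A" if w in A else "not in A")
```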


Complement

A complement of event A, denoted A^c, is also a set.

A^c is everything else not in A.

[Illustration: a set of apples and the set of all remaining apples are complements; images not reproduced.]

A^c = {ω ∈ S | ω ∉ A}.

Important complement: S^c = ∅, where ∅ is the empty set.


Unions and Intersections

The union of two events A and B is the event that A or B occurs:

A ∪ B = {ω | ω ∈ A or ω ∈ B}.

The intersection of two events A and B is the event that both A and B occur:

A ∩ B = {ω | ω ∈ A and ω ∈ B}.

[Apple illustrations of each operation not reproduced.]
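
Complement, union, and intersection correspond to ordinary set operations. A minimal sketch, assuming a hypothetical four-outcome sample space labeled by color and leaf (these labels are not from the slides):

```python
# Hypothetical outcomes standing in for the apples on the slides.
S = {"red_leaf", "red_plain", "green_leaf", "green_plain"}
A = {w for w in S if w.endswith("_leaf")}   # "has a leaf"
B = {w for w in S if w.startswith("red_")}  # "is red"

A_complement = S - A  # everything in S that is not in A
union = A | B         # A or B (or both) occurs
intersection = A & B  # both A and B occur

print(sorted(A_complement))  # ['green_plain', 'red_plain']
print(sorted(union))         # ['green_leaf', 'red_leaf', 'red_plain']
print(sorted(intersection))  # ['red_leaf']

# An event and its complement never overlap, and S^c is the empty set.
print(A & A_complement == set())  # True
print(S - S == set())             # True
```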


Operations on Events

We say that two events A and B are disjoint or mutually exclusive if they don’t share any elements, that is, if A ∩ B = ∅.

An event and its complement, A and A^c, are by definition disjoint.

Sample spaces can have infinitely many events, in which case we will often write the different events using subscripts of the same letter: A_1, A_2, ..., A_∞ (e.g. imagine events defined by the count of some object).


Probability Function

A probability function P(·) is a function defined over all subsets of a sample space S that satisfies the following three axioms:

1) Nonnegativity: P(A) ≥ 0 for all A in the set of all events.
2) Normalization: P(S) = 1.
3) Additivity: if events A_1, A_2, ... are mutually exclusive, then

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$

Illustrated with the apples (images not reproduced):

1. P([an apple]) = −0.5 is ruled out by nonnegativity.
2. P({[all the apples in the bag]}) = 1.
3. P([one apple] or [another apple]) = P([one apple]) + P([another apple]) when the two events are mutually exclusive.

All the rules of probability can be derived from these axioms. (See Blitzstein & Hwang, Def 1.6.1.)
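
For a small finite sample space the three axioms can be checked by brute force. The sketch below is an illustration only: the helper names are invented, the four-apple bag is hypothetical, and it checks additivity for pairs of disjoint events rather than the countable version in axiom 3. Because P is built here by summing per-outcome weights, additivity holds automatically; the check is included just to make the axiom concrete.

```python
from itertools import chain, combinations
from math import isclose


def prob(event, weights):
    """P(event) for a finite sample space with per-outcome weights."""
    return sum(weights[w] for w in event)


def all_events(sample_space):
    """Every subset of a finite sample space."""
    items = list(sample_space)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def satisfies_axioms(weights):
    """Check the three axioms by brute force on a small finite space."""
    sample_space = frozenset(weights)
    events = all_events(sample_space)
    nonnegative = all(prob(a, weights) >= 0 for a in events)
    normalized = isclose(prob(sample_space, weights), 1.0)
    additive = all(
        isclose(prob(a | b, weights), prob(a, weights) + prob(b, weights))
        for a in events for b in events if not (a & b)
    )
    return nonnegative and normalized and additive


# Hypothetical bag of four equally likely apples.
fair = {"a1": 0.25, "a2": 0.25, "a3": 0.25, "a4": 0.25}
# A negative "probability" breaks nonnegativity even though the total is 1.
broken = {"a1": -0.5, "a2": 0.5, "a3": 0.5, "a4": 0.5}

print(satisfies_axioms(fair))    # True
print(satisfies_axioms(broken))  # False
```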


A Brief Word on Interpretation

Massive debate on interpretation:

Subjective Interpretation
  - Example: The probability of drawing 5 red cards out of 10 drawn from a deck of cards is whatever you want it to be. But...
  - If you don’t follow the axioms, a bookie can beat you.
  - There is a correct way to update your beliefs given your assumptions about the data generating process.

Frequency Interpretation
  - Probability is the relative frequency with which an event would occur if the process were repeated a large number of times under similar conditions.
  - Example: The probability of drawing 5 red cards out of 10 drawn from a deck of cards is the frequency with which this event occurs in repeated samples of 10 cards.
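
The frequency interpretation can be made concrete by simulation. The sketch below assumes a standard 52-card deck (26 red, 26 black), draws without replacement, and reads "5 red cards out of 10" as exactly 5; none of these details are stated on the slide. It compares the relative frequency over many repeated deals with the exact counting answer.

```python
import random
from math import comb

# Exact probability of exactly 5 red cards in 10 drawn without replacement
# from a standard 52-card deck (26 red, 26 black).
exact = comb(26, 5) * comb(26, 5) / comb(52, 10)


def simulate(n_repeats: int = 20_000, seed: int = 0) -> float:
    """Relative frequency of 'exactly 5 red in 10 cards' over many deals."""
    rng = random.Random(seed)
    deck = ["red"] * 26 + ["black"] * 26
    hits = 0
    for _ in range(n_repeats):
        hand = rng.sample(deck, 10)  # draw 10 cards without replacement
        if hand.count("red") == 5:
            hits += 1
    return hits / n_repeats


print(round(exact, 4))  # exact value from counting
print(simulate())       # relative frequency; approaches the exact value
```

As the number of repeats grows, the simulated frequency settles near the exact value, which is what the frequency interpretation takes the probability to mean.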


We Covered. . .

Events and Sample Spaces

Probability Functions and Three Axioms

Next: Three Big Ideas derived from the axioms that provide the rules of working with probability.

See you next time!


Three Big Ideas

Marginal, joint, and conditional probabilities

Bayes’ rule

Independence


Marginal and Joint Probability

So far we have only considered situations where we are interested in the probability of a single event A occurring. We’ve denoted this P(A). P(A) is sometimes called a marginal probability.

Suppose we are now in a situation where we would like to express the probability that an event A and an event B occur. This quantity is written as P(A ∩ B), P(B ∩ A), P(A, B), or P(B, A) and is the joint probability of A and B.

[Illustration of the equivalent notations using apple images; not reproduced.]


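P([apple property], [apple property]) = 4/10   (joint probability; apple images not reproduced)

P([apple property]) = 7/10   (marginal probability; apple image not reproduced)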


Conditional Probability

The “soul of statistics”

If P(A) > 0 then the probability of B conditional on A is

$$P(B \mid A) = \frac{P(A, B)}{P(A)}$$

This implies that

$$P(A, B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$$

Hopefully this second formulation is intuitive!
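
A short worked sketch of the definition, using exact fractions. It assumes one particular reading of the earlier apple slide, namely that 4/10 is the joint probability P(A, B) and 7/10 is the marginal P(A) of the conditioning event; the slide's images are not available to confirm which events those numbers refer to.

```python
from fractions import Fraction


def conditional(p_joint: Fraction, p_given: Fraction) -> Fraction:
    """P(B | A) = P(A, B) / P(A), defined only when P(A) > 0."""
    if p_given <= 0:
        raise ValueError("P(A) must be positive to condition on A")
    return p_joint / p_given


# Assumed reading of the earlier apple slide: P(A, B) = 4/10 and P(A) = 7/10.
p_ab = Fraction(4, 10)
p_a = Fraction(7, 10)

p_b_given_a = conditional(p_ab, p_a)
print(p_b_given_a)                # 4/7
print(p_a * p_b_given_a == p_ab)  # True: P(A, B) = P(A) P(B | A)
```

Multiplying back by P(A) recovers the joint probability, which is the second formulation on the slide.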


Conditional Probability: A Visual Example

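P([B] | [A]) = P([A], [B]) / P([A])

[The slide shows this formula with apple images in place of A and B; images not reproduced.]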


Law of Total Probability (LTP)

With 2 events:

$$
\begin{aligned}
P(B) &= P(B, A) + P(B, A^c) \\
     &= P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)
\end{aligned}
$$

In general, if $\{A_i : i = 1, 2, 3, \ldots\}$ forms a partition of the sample space, then

$$
\begin{aligned}
P(B) &= \sum_i P(B, A_i) \\
     &= \sum_i P(B \mid A_i)\,P(A_i)
\end{aligned}
$$


Example: Voter Mobilization

Suppose that we have put together a voter mobilization campaign and we want to know what the probability of voting is after the campaign: P(vote). We know the following:

P(vote|mobilized) = 0.75

P(vote|not mobilized) = 0.15

P(mobilized) = 0.6 and so P(not mobilized) = 0.4

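Note that mobilization partitions the data. Everyone is either mobilized or not. Thus, we can apply the LTP:

$$
\begin{aligned}
P(\text{vote}) &= P(\text{vote} \mid \text{mobilized})\,P(\text{mobilized}) + P(\text{vote} \mid \text{not mobilized})\,P(\text{not mobilized}) \\
               &= 0.75 \times 0.6 + 0.15 \times 0.4 \\
               &= 0.51
\end{aligned}
$$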
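The same arithmetic can be written as a small function. This is only a sketch: the function name `total_probability` and its argument layout are illustrative choices, and the inputs are the four numbers from the example above.

```python
import math


def total_probability(cond_probs, partition_probs):
    """Law of total probability: sum over i of P(B | A_i) * P(A_i).

    cond_probs[i] is P(B | A_i); partition_probs[i] is P(A_i).
    The A_i must partition the sample space, so their probabilities sum to 1.
    """
    if len(cond_probs) != len(partition_probs):
        raise ValueError("need one conditional probability per partition cell")
    if not math.isclose(sum(partition_probs), 1.0):
        raise ValueError("partition probabilities must sum to 1")
    return sum(c * p for c, p in zip(cond_probs, partition_probs))


# P(vote | mobilized) = 0.75, P(vote | not mobilized) = 0.15,
# P(mobilized) = 0.6, P(not mobilized) = 0.4
p_vote = total_probability([0.75, 0.15], [0.6, 0.4])
print(round(p_vote, 2))
```

Because the function takes lists, it also covers the general case of a partition with more than two cells.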


Three Big Ideas

Marginal, joint, and conditional probabilities

Bayes’ rule

Independence


Bayes’ Rule

Often we have information about P(B |A), but want P(A|B).

When this happens, always think: Bayes’ rule

Bayes’ rule: if P(B) > 0, then

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$


Bayes’ Rule Mechanics

[Illustration: Bayes’ rule shown with icons.]


Example: Race and Names


Note that the Census collects information on the distribution of names by race.

For example, Washington is the most common last name among African-Americans in America:

- P(AfAm) = 0.132
- P(not AfAm) = 1 − P(AfAm) = 0.868
- P(Washington | AfAm) = 0.00378
- P(Washington | not AfAm) = 0.000061

We can now use Bayes’ Rule:

$$P(\text{AfAm} \mid \text{Wash}) = \frac{P(\text{Wash} \mid \text{AfAm})\,P(\text{AfAm})}{P(\text{Wash})}$$


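Note we don’t have the probability of the name Washington.

Remember that we can calculate it from the LTP since the sets African-American and not African-American partition the sample space:

$$
\begin{aligned}
P(\text{AfAm} \mid \text{Wash}) &= \frac{P(\text{Wash} \mid \text{AfAm})\,P(\text{AfAm})}{P(\text{Wash})} \\
&= \frac{P(\text{Wash} \mid \text{AfAm})\,P(\text{AfAm})}{P(\text{Wash} \mid \text{AfAm})\,P(\text{AfAm}) + P(\text{Wash} \mid \text{not AfAm})\,P(\text{not AfAm})} \\
&= \frac{0.00378 \times 0.132}{0.00378 \times 0.132 + 0.000061 \times 0.868} \\
&\approx 0.9
\end{aligned}
$$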
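The calculation above combines Bayes’ rule with the law of total probability for a two-cell partition. A minimal sketch of that combination follows; the function name `bayes_posterior` and its parameter names are illustrative, and the inputs are the Census-based figures quoted in this example.

```python
def bayes_posterior(likelihood, prior, likelihood_complement):
    """P(A | B) for a two-cell partition {A, not A}.

    likelihood            = P(B | A)
    prior                 = P(A)
    likelihood_complement = P(B | not A)
    The denominator P(B) comes from the law of total probability.
    """
    numerator = likelihood * prior
    evidence = numerator + likelihood_complement * (1.0 - prior)
    if evidence == 0:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    return numerator / evidence


# P(Wash | AfAm) = 0.00378, P(AfAm) = 0.132, P(Wash | not AfAm) = 0.000061
p_afam_given_wash = bayes_posterior(0.00378, 0.132, 0.000061)
print(round(p_afam_given_wash, 3))
```

Note how a small prior (0.132) still yields a posterior near 0.9, because the name is roughly 60 times more likely under one group than under the other.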


Three Big Ideas

Marginal, joint, and conditional probabilities

Bayes’ rule

Independence


Independence

Intuitive Definition

Events A and B are independent if knowing whether A occurred provides no information about whether B occurred.

Formal Definition

$$P(A, B) = P(A)\,P(B) \implies A \perp\!\!\!\perp B$$

With all the usual > 0 restrictions, this implies

$$P(A \mid B) = P(A)$$

$$P(B \mid A) = P(B)$$

Conditional Independence

$$P(A, B \mid C) = P(A \mid C)\,P(B \mid C) \implies A \perp\!\!\!\perp B \mid C$$

Independence is a massively important concept in statistics.
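To make the formal definition concrete, the sketch below checks the product condition by brute-force enumeration. The two-dice sample space and the three events are hypothetical illustrations that do not come from the slides, and the helper names are assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: all 36 equally likely rolls of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event as the exact share of outcomes satisfying it."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def independent(a, b):
    """True when the product condition P(A, B) = P(A) P(B) holds."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)


def first_is_six(o):
    return o[0] == 6


def second_is_even(o):
    return o[1] % 2 == 0


def sum_is_twelve(o):
    return o[0] + o[1] == 12


# The two dice do not inform each other, so these events are independent.
print(independent(first_is_six, second_is_even))
# Knowing the first die is a six changes the chance of a total of twelve.
print(independent(first_is_six, sum_is_twelve))
```

In applied work independence is rarely something that can be verified by enumeration like this; it usually has to be assumed and defended.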


Independence, the Heroic Assumption

Deploy with Caution!


Advanced Example: Building a Spam Filter

Suppose we have an email i (i = 1, ..., N), which we represent with a series of J indicators for whether or not it contains each of a set of words:

$$\mathbf{x}_i = (x_{1i}, x_{2i}, \ldots, x_{Ji})$$

We want to classify these into one of two categories, spam or not:

$$\{C_{\text{spam}}, C_{\text{not}}\}$$

We have a set of labeled documents $Y = (Y_1, Y_2, \ldots, Y_N)$ where $Y_i \in \{C_{\text{spam}}, C_{\text{not}}\}$.

Goal: Use what we’ve learned to build a model which can classify emails into spam and not spam.


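Example: Building a Spam Filter

For each document, we will get to see $\mathbf{x}_i$ (the words in the document), and we would like to infer the category.

In other words, what we want is $P(C_{\text{spam}} \mid \mathbf{x}_i)$.

Let’s use Bayes’ Rule!

$$
\begin{aligned}
P(C_{\text{spam}} \mid \mathbf{x}_i) &= \frac{P(\mathbf{x}_i \mid C_{\text{spam}})\,P(C_{\text{spam}})}{P(\mathbf{x}_i)} \\
&= \frac{P(\mathbf{x}_i \mid C_{\text{spam}})\,P(C_{\text{spam}})}{P(\mathbf{x}_i \mid C_{\text{spam}})\,P(C_{\text{spam}}) + P(\mathbf{x}_i \mid C_{\text{not spam}})\,P(C_{\text{not spam}})}
\end{aligned}
$$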

We used the law of total probability to work out the bottom.

Now there are only 4 pieces we need (2 for spam, 2 for not)


Estimating the Baseline Prevalence

Let’s plug in some estimates based on our labeled emails.

Intuitively, $P(C_{\text{spam}})$ is the probability that a randomly chosen email will be spam.

$$P(C_{\text{spam}}) = \frac{\text{No. Spam Emails}}{\text{No. Emails}}$$

Because ‘not spam’ is the complement of spam we know that:

$$P(C_{\text{not spam}}) = 1 - P(C_{\text{spam}})$$

Note: this estimate is only good if our labeled emails are a random sample of all emails! More on this in future weeks.


Estimating the Language Model

Now we need $P(\mathbf{x}_i \mid C_{\text{spam}})$, which we call the language model because it represents the probability of seeing any combination of the J words that we are counting from the emails.

Can we use the same strategy as before (just counting up emails)? No! Remember $\mathbf{x}_i$ is a vector of J word indicators, which gives $2^J$ possible combinations!

We will make the heroic assumption of conditional independence:

$$P(\mathbf{x}_i \mid C_{\text{spam}}) = \prod_{j=1}^{J} P(x_{ji} \mid C_{\text{spam}})$$

Intuition: count the proportion of spam emails containing each word.

This is called the Naïve Bayes classifier because the conditional independence assumption is naïve.


Estimating the Naïve Bayes Classifier

$$P(C_{\text{spam}} \mid \mathbf{x}_i) = \frac{P(\mathbf{x}_i \mid C_{\text{spam}})\,P(C_{\text{spam}})}{P(\mathbf{x}_i \mid C_{\text{spam}})\,P(C_{\text{spam}}) + P(\mathbf{x}_i \mid C_{\text{not spam}})\,P(C_{\text{not spam}})}$$

The Naïve Bayes Procedure:

Learn what spam emails look like to create a function that lets us plug in an email and get out a probability, $P(\mathbf{x}_i \mid C_{\text{spam}})$

Guess how much spam there is overall, $P(C_{\text{spam}})$

Plug in values of $\mathbf{x}_i$ for new emails to score them by whether they are spam or not.

. . . Profit?
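The procedure above can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the function names, the three-word vocabulary, and the six labeled emails are invented, and the add-one smoothing is an extra step that the slides do not mention (it keeps a word that never appears in one class from forcing a probability of exactly zero).

```python
import math


def fit_naive_bayes(X, y, smoothing=1.0):
    """Estimate P(C) and each P(x_j = 1 | C) from labeled emails.

    X: list of emails, each a list of J 0/1 word indicators.
    y: list of booleans, True for spam.
    smoothing: pseudo-count added to each word count (an assumption,
               not part of the procedure described in the slides).
    """
    n_words = len(X[0])
    model = {}
    for label in (True, False):
        rows = [x for x, lab in zip(X, y) if lab == label]
        prior = len(rows) / len(y)  # share of labeled emails in this class
        word_probs = [
            (sum(r[j] for r in rows) + smoothing) / (len(rows) + 2 * smoothing)
            for j in range(n_words)
        ]
        model[label] = (prior, word_probs)
    return model


def prob_spam(model, x):
    """P(C_spam | x) via Bayes' rule with the naive product likelihood."""
    joint = {}
    for label, (prior, word_probs) in model.items():
        # Conditional independence: multiply per-word probabilities.
        likelihood = math.prod(p if xj else 1 - p for p, xj in zip(word_probs, x))
        joint[label] = prior * likelihood
    # The denominator is the law of total probability over {spam, not spam}.
    return joint[True] / (joint[True] + joint[False])


# Hypothetical vocabulary: ["free", "winner", "meeting"]
X = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [1, 0, 1]]
y = [True, True, True, False, False, False]

model = fit_naive_bayes(X, y)
print(prob_spam(model, [1, 1, 0]))  # contains "free" and "winner"
print(prob_spam(model, [0, 0, 1]))  # contains only "meeting"
```

With many words the product of small probabilities can underflow, so practical implementations usually sum log probabilities instead; the direct product is kept here to mirror the formula.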


Example: Building a Spam Filter

This was a really advanced example (it is okay if you didn’t follow all of it!).

Draws on all the probabilistic concepts we have introduced:

- Bayes’ Rule
- Law of Total Probability
- Conditional Independence

Shares the basic structure of many models, particularly in its use of conditional independence.


This Week in Review

Course logistics

Core ideas in statistics

Foundations of probability

Three big probability concepts

Going Deeper:

Blitzstein, Joseph K., and Hwang, Jessica. (2019). Introduction to Probability. CRC Press. http://stat110.net/

Next week: random variables!


References

Enos, Ryan D. “What the demolition of public housing teaches us about the impact of racial threat on political behavior.” American Journal of Political Science (2015).

Lundberg, Ian, Rebecca Johnson, and Brandon M. Stewart. “Setting the target: Precise estimands and the gap between theory and empirics.” (2020).

Salsburg, David. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century (2002).
