Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 2: Introduction to probability basics Jason Mezey [email protected] Jan. 30, 2018 (T) 8:40-9:55
Quantitative Genomics and Genetics - Cornell University (mezeylab.cb.bscb.cornell.edu/labmembers/documents...)

Quantitative Genomics and Genetics

BTRY 4830/6830; PBSB.5201.01

Lecture 2: Introduction to probability basics

Jason Mezey [email protected]

Jan. 30, 2018 (T) 8:40-9:55

Announcements I

• Registration updates / reminders:

• You must register for both the lecture and lab

• In Ithaca, undergrads register for 4830 / grads for 6830 (please register if you can)!

• For Post-docs, Continuing Education students - fill out, scan and send me any forms you need signed - I will sign, scan, and send them back

• For students at Weill: please register through “LEARN” system

• For students at Rockefeller: email Kristen Cullen [email protected]

• For postdocs in NYC and students at MSK: Please fill out the “Application for Non-Degree Students” ASAP (!!) and get it to [email protected]

• You may take the class for a grade, S/U | P/F, or Audit (please register as an Audit if you can do so)

Announcements II

• Remember that the locations of the lecture in NYC and the computer lab can change lecture to lecture - week to week (!!) - see the class schedule now posted on the website (see next slide)

• First Computer lab is this week (!!):

• In Ithaca, regardless of your registration, computer labs will be either Thurs. 5-6PM (!!) in MNLB30A (Mann Library Basement) OR Fri. 8-9AM in 224 Weill Hall

• In Ithaca, you are welcome to go to either regardless of where you register (pending this working well for Manisha)

• In NYC, computer lab is Thurs. 4-5PM in LC-504 - 5th Floor Conference Room, 1300 York Ave

• Unless you are going to MNLB30A bring your laptop!

Announcements III

• Class website: http://mezeylab.cb.bscb.cornell.edu/

• First 2018 class materials are now posted!

• Check back often (!!)

• Jason will hold office hours every Thurs. 2-4PM (starting this Thurs.)!

• We currently plan to use zoom: https://cornell.zoom.us/j/724550601

• You must therefore sign up for a Zoom account

• If you cannot do this, please contact me

Announcements IV

• MAKE SURE YOU SIGN UP ON PIAZZA whether you officially register or not = all course communication (!!): https://piazza.com/class/jckpr075ilk5n4?cid=7

• Step 1: Sign up on Piazza (if you don’t have an account already)!

• Step 2: Enroll in BTRY 6830 (regardless of whether you are grad or undergrad)

• If you cannot enroll, email Manisha directly!!

• Question Posting Protocol:

• Public posts (Let the community of students and instructors help out)

• Private posts (To Jason, Manisha, Zijun)

• Please note that expected response times to questions will be at least 24 hrs (sometimes longer...) depending on the availability of the instructors

• We encourage public posts so that your classmates can help you out as well

• ONCE YOU ENROLL ON PIAZZA PLEASE DON’T email Jason / Manisha / Zijun at their direct email addresses (unless it’s an emergency)

Announcements V

• MAKE SURE YOU SIGN UP ON CMS if you plan to do work for the course (registered or unregistered)

• If you do not have a NetID you will need to email Manisha and she will get you signed up

• Assignments will be posted on CMS (https://cms.csuglab.cornell.edu/) for BTRY 6830

• All submissions should be made through the CMS website - DO NOT email your homework directly to Jason / Manisha / Zijun (!!)

Announcements VI

• Homework #1 is posted (!!) and will be due at 11:59PM, Feb. 5

• You must upload your homework by 11:59PM on Mon. 2/5 (otherwise it is late - no excuses!!)

• Answers must be typed (!!) - please talk to us if this is a problem...

• Homeworks are “open book” and you may work together but you MUST hand in your own work (i.e., a copy of someone’s written answer will not be accepted)

• Problems will be divided into “easy”, “medium”, and “hard”

Summary of lecture 2: Introduction to probability basics

• Last lecture, we provided a broad introduction to the field of quantitative genomics and genetics, which is concerned with modeling and the discovery of relationships between genomes (genotypes) and phenotypes

• In this class, we will be concerned with the most basic problem of the field: how to identify genotypes where differences among individual genomes produce differences in individual phenotypes (e.g. genetic association studies)

• Today, we will discuss critical foundational concepts in biology and the modeling framework for the field, which is developed from the fields of probability and statistics

Foundational biology concepts

• In this class, we will use statistical modeling to say something about biology, specifically the relationships between genotype (DNA) and phenotype

• Let’s start with the biology by asking the following question: why DNA?

• The structure of DNA has properties that make it worthwhile to focus on...

It’s the same in all cells

with a few exceptions (e.g. cancer, immune system...)

• In multicellular organisms, the structure of the genome is (almost) perfectly copied during the replication of cells.

• The genome is the same in every non-cancer cell of a multicellular organism, with just a few exceptions; so, we may refer to the genome of an individual; in cancers, the genome differs from cell to cell, such that it is more problematic to refer to the genome of a cancer.

• The genome provides instructions for how biological processes proceed (e.g., development, metabolism, environmental response); so, the genome is an important determinant of the measurable characteristics of an organism or cancer.

• In the production of a new organism or offspring, either the entire genome (e.g., bacteria) or a subset of the genome (e.g., half from each parent in humans) is copied almost perfectly from parent to offspring; the copying of genomes from parents to offspring is the primary reason why offspring tend to resemble their parents.

Figure 1: A simplified schematic showing genome organization in human cells. The DNA of a genome is located within the nucleus of a cell. The genome is organized in long strings that are tightly coiled around protein structures to form chromosomes. Each string is a double helix where the building blocks are A-T and G-C nucleotide pairs © kintalk.org.

It’s passed on to the next generation

It has a convenient structure for quantifying differences

[Figure: The Structure of DNA. Credits: Watson et al., Molecular Biology of the Gene, CSHL Press, 2004; Jones and Pevzner, An Introduction to Bioinformatics Algorithms, MIT Press, 2004]

It’s responsible for the construction and maintenance of organisms

Note: other regions of genomes can impact phenotypes...

Statistics and probability I

• Quantitative genomics is a field concerned with the modeling of the relationship between genomes and phenotypes and using these models to discover and predict

• We will use frameworks from the fields of probability and statistics for this purpose

• Note that this is not the only useful framework (!!) - and even more generally - mathematically based frameworks are not the only useful (or even necessarily “the best”) frameworks for this purpose

Statistics and probability II

• A non-technical definition of probability: a mathematical framework for modeling under uncertainty

• Such a system is particularly useful for modeling systems where we don’t know and / or cannot measure critical information for explaining the patterns we observe

• This is exactly the case we have in quantitative genomics when connecting differences in a genome to differences in phenotypes

Statistics and probability III

• We will therefore use a probability framework to model, but we are also interested in using this framework to discover and predict

• More specifically, we are interested in using a probability model to identify relationships between genomes and phenotypes using DNA sequences and phenotype measurements

• For this purpose, we will use the framework of statistics, which we can (non-technically) define as a system for interpreting data for the purposes of prediction and decision making given uncertainty

Definitions: Probability / Statistics

• Probability (non-technical def) - a mathematical framework for modeling under uncertainty

• Statistics (non-technical def) - a system for interpreting data for the purposes of prediction and decision making given uncertainty

These frameworks are particularly appropriate for modeling genetic systems, since we are missing information concerning the complete set of components and relationships among components that determine genome-phenotype relationships

Conceptual Overview

[Figure: diagram linking System, Question, Experiment, Sample, Prob. Models, Assumptions, Inference, and Statistics]

Starting point: a system

• System - a process, an object, etc. which we would like to know something about

• Example: Genetic contribution to height

[Figure: Genome -> Height. A SNP with alleles {A, T}: taller (on average) or shorter (on average)?]

Starting point: a system

• System - a process, an object, etc. which we would like to know something about

• Examples: (1) coin, (2) heights in a population

Coin - same amount of metal on both sides?

Heights - what is the average height in the US?

Experiments (general)

• To learn about a system, we generally pose a specific question that suggests an experiment, where we can extrapolate a property of the system from the results of the experiment

• Examples of “ideal” experiments (System / Experiment):

• SNP contribution to height / directly manipulate A -> T keeping all other genetic, environmental, etc. components the same and observe result on height

• Coin / cut coin in half, melt and measure the volume of each half

• Height / measure the height of every person in the US

Experiments (general)

• To learn about a system, we generally pose a specific question that suggests an experiment, where we can extrapolate a property of the system from the results of the experiment

• Examples of “non-ideal” experiments (System / Experiment):

• SNP contribution to height / measure heights of individuals that have an A and individuals that have a T

• Coin / flip the coin and observe “Heads” and “Tails”

• Height / measure some people in the US

Experiments and samples

• Experiment - a manipulation or measurement of a system that produces an outcome we can observe

• Experimental trial - one instance of an experiment

• Sample outcome - a possible outcome of the experiment

• Sample - the results of one or more experimental trials

• Example (Experiment / Sample outcomes):

• Coin flip / “Heads” or “Tails”

• Two coin flips / HH, HT, TH, TT

• Measure heights in this class / 1.5m, 1.71m, 1.85m, ...
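The vocabulary above (experiment, experimental trial, sample outcome, sample) can be made concrete with a small simulation. This is only a sketch: the function names, the fixed random seed, and the assumption that each flip picks "H" or "T" with equal chance are illustrative choices, not part of the lecture.

```python
import random

# Sample outcomes of the experiment "flip a coin once"
SAMPLE_OUTCOMES = ("H", "T")


def experimental_trial(rng):
    """One experimental trial: a single flip that yields one sample outcome."""
    return rng.choice(SAMPLE_OUTCOMES)


def collect_sample(n_trials, seed=0):
    """A sample: the results of n_trials experimental trials."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    return [experimental_trial(rng) for _ in range(n_trials)]


sample = collect_sample(10)
print(sample)                            # ten outcomes, each "H" or "T"
print(sample.count("H") / len(sample))   # observed fraction of heads in this sample
```

Each call to `experimental_trial` is one trial, and the list returned by `collect_sample` is the sample we would later use for inference about the coin.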

Modeling the results of (non-ideal) experiments

• Mathematics (while not the only approach!) provides a particularly valuable foundation for describing or modeling a system or the outcomes of an experiment

• The reason is that a considerable amount of mathematics is constructed (on purpose!) to provide a good representation of how we think about the world in a way that matches our intuition

• Once constructed, we can use this modeling approach to formalize our intuition in a manner that has currency for others and develop deeper understanding

• In general, mathematics useful for modeling (including probability) can be built on foundations from set theory

• Set theory itself rests on a number of assumptions, called axioms, which are put in place so that set theory produces logical constructions that match our intuition

Sets / Set Operations / Definitions

• Set - any collection, group, or conglomerate

• Element - a member of a set

• Set Operations:

Union ($\cup$) ≡ an operator on sets which produces a single set containing all elements of the sets.

Intersection ($\cap$) ≡ an operator on sets which produces a single set containing all elements common to all of the sets.

Note that we can think of these as ‘or’ and ‘and’. A simple example of applying the union operator is $\{5', 5'3''\} \cup \{5'3'', 5'5''\} = \{5', 5'3'', 5'5''\}$ and a simple example of intersection is $\{5', 5'3''\} \cap \{5'3'', 5'5''\} = \{5'3''\}$. Note that we can write the following generalizations of these operators:

$\bigcup_{i=1}^{\infty} A_i = A_1 \cup A_2 \cup ...$ (1)

$\bigcap_{i=1}^{\infty} A_i = A_1 \cap A_2 \cap ...$ (2)

where each $A_i$ is a set.

• Important Definitions:

Element of ($\in$) ≡ an object within a set, e.g. $H \in \{H, T\}$

Subset ($\subset$) ≡ a set that is contained within another set, e.g. $\{H\} \subset \{H, T\}$

Complement ($A^c$) ≡ the set containing all elements of the enclosing set that are not in $A$, e.g. within $\{H, T\}$, $\{H\}^c = \{T\}$.

Disjoint Sets ≡ sets with no elements in common. Note that for disjoint sets $A_i$ and $A_j$, the following holds: $A_i \cap A_j = \emptyset$.

• A Special Set:

Empty Set ($\emptyset$) ≡ the set with no elements (the empty set is unique and is sometimes represented as { }).
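The set operations and definitions above map directly onto Python's built-in `set` type. The sketch below reads the example sets as heights in feet and inches (an interpretation of the example) and adds one extra set containing `6'` purely for illustration. It works with finitely many sets, whereas equations (1) and (2) allow infinitely many.

```python
# Heights stored as strings, mirroring the union / intersection example
a = {"5'", "5'3''"}
b = {"5'3''", "5'5''"}

union = a | b            # 'or': elements in a or b
intersection = a & b     # 'and': elements in both a and b

# Generalized union / intersection over a finite list of sets
# (the third set is a hypothetical addition for illustration)
sets = [a, b, {"5'3''", "6'"}]
big_union = set().union(*sets)
big_intersection = set.intersection(*sets)

# Subset, complement, disjoint sets, and the empty set for a coin flip
sample_space = {"H", "T"}
event = {"H"}
is_subset = event <= sample_space        # {H} is a subset of {H, T}
complement = sample_space - event        # {H}^c = {T}
disjoint = event.isdisjoint(complement)  # no elements in common
empty = event & complement               # intersection of disjoint sets

print(union, intersection, big_union, big_intersection)
print(is_subset, complement, disjoint, empty == set())
```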

Some Special Sets

• The following sets have properties that align with our intuitive conception about how we represent and use groups

• The Natural Numbers and the Integers:

$\mathbb{N} = \{1, 2, 3, ...\}$ (3)

$\mathbb{Z} = \{..., -3, -2, -1, 0, 1, 2, 3, ...\}$ (4)

• The Reals:

$\mathbb{R}$ = [Figure: number line marked at -3, -2, -1, 0, 1, 2, 3] (5)

• Note that these sets are infinite (although they represent two different “sizes” of infinite: countable and uncountable), where we often make use of the following symbols in both cases:

$-\infty < x < \infty$ (6)


Sample Spaces

• Sample Space ($\Omega$) - set comprising all possible outcomes associated with an experiment

• Examples (Experiment / Sample Space):

• “Single coin flip” / {H, T}

• “Two coin flips” / {HH, HT, TH, TT}

• “Measure Heights” / any actual measurement OR we could use $\mathbb{R}$

• Events - a subset of the sample space

• Examples (Sample Space / Examples of Events):

• “Single coin flip” / $\emptyset$, {H}, {H, T}

• “Two coin flips” / {TH}, {HH, TH}, {HT, TH, TT}

• “Measure Heights” / {1.7m}, {1.5m, ..., 2.2m} OR [1.7m], (1.5m,1.8m)
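For the coin-flip experiments, the sample space and its events can be enumerated directly. The sketch below is illustrative only: the helper names `sample_space` and `is_event` are hypothetical, and outcomes are encoded as strings such as "HT".

```python
from itertools import product


def sample_space(n_flips):
    """All possible outcomes of n_flips coin flips, e.g. HH, HT, TH, TT for n_flips=2."""
    return {"".join(flips) for flips in product("HT", repeat=n_flips)}


def is_event(candidate, omega):
    """An event is any subset of the sample space, including the empty set."""
    return set(candidate).issubset(omega)


omega = sample_space(2)

# An event built from a description: "at least one tail in two flips"
at_least_one_tail = {outcome for outcome in omega if "T" in outcome}

print(sorted(omega))
print(sorted(at_least_one_tail))
print(is_event({"TH"}, omega), is_event(set(), omega), is_event({"HHH"}, omega))
```

Note that `at_least_one_tail` is the event {HT, TH, TT} listed among the examples above.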

Probability Functions

To use sample spaces in probability, we need a way to map these sets to the real numbers. To do this, we define a function. Before we consider the specifics of how we define a probability function or measure, let’s consider the intuitive definition of a function:

Function (intuitive def.) ≡ a mathematical operator that takes an input and produces an output.

This concept is often introduced to us as $Y = f(X)$ where $f()$ is the function that maps the values taken by $X$ to $Y$. For example, we can have the function $Y = X^2$ (see figure from class).

We are going to define a probability function which maps sample spaces to the real line (to numbers):

$\Pr(S) : S \to \mathbb{R}$ (10)

where $\Pr(S)$ is a function on the sample space $S$, which we could have written $f(S)$.

To be useful, we need some rules for how probability functions are defined (that is, not all functions on sample spaces are probability functions). These rules are called the axioms of probability (note that an axiom is a rule that we assume). There is some variation in how these are presented, but we will present them as three axioms:

Axioms of Probability

1. For $A \subset S$, $\Pr(A) \geq 0$.

2. $\Pr(S) = 1$.

3. For $A_1, A_2, ... \subset S$, if $A_i \cap A_j = \emptyset$ (disjoint) for each $i \neq j$: $\Pr(\bigcup_{i}^{\infty} A_i) = \sum_{i}^{\infty} \Pr(A_i)$.

These axioms are necessary for many of the logically consistent results built upon probability. Intuitively, we can think of these axioms as matching how we tend to think about ...


Functions

• Now that we have formalized the concept of a sample space, we need to define what “probability” means

• To do this, we need the concept of a mathematical function

• Function (formally) - a binary relation that associates every member of a domain with exactly one member of the codomain

• Function (informally) - ?
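The formal definition can be checked mechanically when the domain is finite. In the sketch below a relation is represented as a set of (input, output) pairs. The helper name `is_function` and the small integer domain and codomain are hypothetical choices made for illustration.

```python
def is_function(relation, domain, codomain):
    """Return True if the relation pairs every member of the domain
    with exactly one member of the codomain."""
    for x in domain:
        images = {y for (a, y) in relation if a == x}
        if len(images) != 1 or not images <= set(codomain):
            return False
    # No pair may start from a value outside the domain
    return all(a in domain for (a, _) in relation)


domain = {-2, -1, 0, 1, 2}
codomain = {0, 1, 2, 3, 4}

# Y = X^2 restricted to the small domain above
square = {(x, x ** 2) for x in domain}

# Adding a second output for the input 1 breaks the "exactly one" requirement
not_a_function = square | {(1, 3)}

print(is_function(square, domain, codomain))
print(is_function(not_a_function, domain, codomain))
```

Two different inputs may share an output (both -2 and 2 map to 4) without violating the definition; what is ruled out is one input having two outputs.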

Example of a function

[Figure: plot of the function $Y = X^2$, with axes $X$ and $Y$]

Probability functions (intuition)

• Probability Function (intuition) - we would like to construct a function that assigns a number to each event such that it matches our intuition about the “chance” the event will happen (as a result of an experiment)

• To be useful, we need to assign a number not just to each individual element of the set but to EVERY event

• To accomplish this, we will need the concept of a Sigma Algebra (or Sigma Field)

• What’s more, we need to make sure the function that we use to assign these numbers adheres to a specific set of “rules” (axioms)
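To see what "assign a number to EVERY event" and "adheres to a set of rules" look like in practice, the sketch below builds one candidate probability function for two coin flips and checks the three standard axioms: non-negativity, total probability 1 for the sample space, and additivity over disjoint events. The fair-coin model (every outcome equally likely) is an assumption made for illustration, and only pairwise additivity on a finite sample space is checked, not the general countable case.

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space for two coin flips
omega = frozenset("".join(p) for p in product("HT", repeat=2))


def all_events(space):
    """Every subset of a finite sample space (16 events for two flips)."""
    items = sorted(space)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def pr(event):
    """Assumed model: each outcome is equally likely (fair coin)."""
    return Fraction(len(event), len(omega))


events = all_events(omega)

axiom_1 = all(pr(a) >= 0 for a in events)       # non-negativity
axiom_2 = pr(omega) == 1                        # the sample space has probability 1
axiom_3 = all(pr(a | b) == pr(a) + pr(b)        # additivity for disjoint events
              for a in events for b in events if a.isdisjoint(b))

print(len(events), axiom_1, axiom_2, axiom_3)
print(pr(frozenset({"HT", "TH", "TT"})))        # Pr(at least one tail)
```

Using `Fraction` keeps the arithmetic exact, so the equality checks are not affected by floating-point rounding.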

Page 31: Quantitative Genomics and Genetics - Cornell Universitymezeylab.cb.bscb.cornell.edu/labmembers/documents... · Announcements I • Registration updates / reminders: • You must register

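Sample Spaces / Sigma Algebra

• Sigma Algebra ($\mathcal{F}$) - a collection of events (subsets) of $\Omega$ of interest with the following three properties: 1. $\emptyset \in \mathcal{F}$, 2. if $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$, 3. if $A_1, A_2, \ldots \in \mathcal{F}$ then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$

Note that we are interested in a particular Sigma Algebra for each sample space...

• Examples (Sample Space / Sigma Algebra):

• {H, T} / $\emptyset$, {H}, {T}, {H, T}

• {HH, HT, TH, TT} / $\emptyset$, {HH}, {HT}, {TH}, {TT}, {HH, HT}, {HH, TH}, {HH, TT}, {HT, TH}, {HT, TT}, {TH, TT}, {HH, HT, TH}, {HH, HT, TT}, {HH, TH, TT}, {HT, TH, TT}, {HH, HT, TH, TT}

• $\mathbb{R}$ / more complicated to define the sigma algebra of interest...

• Note that the pair $(\Omega, \mathcal{F})$ is referred to as a measurable space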
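For finite sample spaces such as {H, T} and {HH, HT, TH, TT}, one sigma algebra that can always be used is the collection of all subsets. The following is a minimal sketch, assuming that choice, which builds the collection and checks the three properties. The helper names `power_set` and `is_sigma_algebra` are hypothetical and introduced only for this example.

```python
from itertools import chain, combinations

def power_set(sample_space):
    """Return all subsets of a finite sample space as a set of frozensets."""
    items = list(sample_space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

def is_sigma_algebra(sample_space, events):
    """Check the three sigma algebra properties for a finite collection of events."""
    omega = frozenset(sample_space)
    # Property 1: the empty set is an event.
    if frozenset() not in events:
        return False
    # Property 2: the complement of every event is an event.
    if any(omega - a not in events for a in events):
        return False
    # Property 3: closed under unions (pairwise checks suffice for a finite collection).
    return all(a | b in events for a in events for b in events)

coin = {"H", "T"}
two_flips = {"HH", "HT", "TH", "TT"}

F_coin = power_set(coin)
F_two = power_set(two_flips)

print(len(F_coin), is_sigma_algebra(coin, F_coin))      # 4 True
print(len(F_two), is_sigma_algebra(two_flips, F_two))   # 16 True

# Dropping {T} breaks closure under complements, so this is not a sigma algebra.
not_sigma = F_coin - {frozenset({"T"})}
print(is_sigma_algebra(coin, not_sigma))                # False
```

For an uncountable sample space such as the real line this enumeration is not possible, which is one reason the sigma algebra of interest is harder to define there.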

5 Probability Functions

To use sample spaces in probability, we need a way to map these sets to the real numbers. To do this, we define a function. Before we consider the specifics of how we define a probability function or measure, let’s consider the intuitive definition of a function:

Function (intuitive def.) $\equiv$ a mathematical operator that takes an input and produces an output.

This concept is often introduced to us as $Y = f(X)$, where $f()$ is the function that maps the values taken by $X$ to $Y$. For example, we can have the function $Y = X^2$ (see figure from class).

We are going to define a probability function, which maps sample spaces to the real line (to numbers):

$$\Pr(S) : S \to \mathbb{R}$$

where $S$ denotes the sample space and $\Pr(S)$ is a function, which we could have written $f(S)$.

To be useful, we need some rules for how probability functions are defined (that is, not all functions on sample spaces are probability functions). These rules are called the axioms of probability (note that an axiom is a rule that we assume). There is some variation in how these are presented, but we will present them as three axioms:

Axioms of Probability

1. For $A \subset S$, $\Pr(A) \geq 0$.

2. $\Pr(S) = 1$.

3. For $A_1, A_2, \ldots \subset S$, if $A_i \cap A_j = \emptyset$ (disjoint) for each $i \neq j$: $\Pr\left(\bigcup_{i}^{\infty} A_i\right) = \sum_{i}^{\infty} \Pr(A_i)$.

These axioms are necessary for many of the logically consistent results built upon probability. Intuitively, we can think of these axioms as matching how we tend to think about ...
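The three axioms can be checked numerically on a small example. The following is a minimal sketch, assuming two flips of a fair coin so that each outcome has probability 0.25. That assignment, and the names `outcome_prob`, `pr`, and `events`, are assumptions introduced here and are not taken from the notes. Only the finite case of the third axiom is checked.

```python
from itertools import chain, combinations
import math

# Hypothetical outcome probabilities for two flips of a fair coin.
outcome_prob = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}
S = frozenset(outcome_prob)

def pr(event):
    """Probability of an event (a subset of S) as the sum over its outcomes."""
    return sum(outcome_prob[o] for o in event)

# All events: every subset of S, including the empty set and S itself.
events = [
    frozenset(c)
    for c in chain.from_iterable(
        combinations(sorted(S), r) for r in range(len(S) + 1)
    )
]

# Axiom 1: Pr(A) >= 0 for every event A.
assert all(pr(a) >= 0 for a in events)

# Axiom 2: Pr(S) = 1.
assert math.isclose(pr(S), 1.0)

# Axiom 3 (finite case): for disjoint A and B, Pr(A u B) = Pr(A) + Pr(B).
assert all(
    math.isclose(pr(a | b), pr(a) + pr(b))
    for a in events
    for b in events
    if not (a & b)
)

print(pr(frozenset({"HH", "HT", "TH"})))  # 0.75
```

Any other non-negative outcome probabilities that sum to one would pass the same checks, which illustrates that the axioms constrain a probability function without determining it uniquely.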




$\emptyset$, {HH}, {HT}, {TH}, {TT}, {HH, HT}, {HH, TH}, {HH, TT}, {HT, TH}, {HT, TT}, {TH, TT}, {HH, HT, TH}, {HH, HT, TT}, {HH, TH, TT}, {HT, TH, TT}, {HH, HT, TH, TT}

$\Pr(HH \cup HT) = 0.6$, $\Pr(HH \cup TH) = 0.5$, $\Pr(HH \cup TT) = 0.5$, $\Pr(HT \cup TH) = 0.5$, $\Pr(HT \cup TT) = 0.5$, $\Pr(TH \cup TT) = 0.4$, $\Pr(HH \cup HT \cup TH) = 0.75$, etc.

$(\Omega, \mathcal{F}, \Pr)$


Page 32: Quantitative Genomics and Genetics - Cornell Universitymezeylab.cb.bscb.cornell.edu/labmembers/documents... · Announcements I • Registration updates / reminders: • You must register

That’s it for today

• Next lecture, we will introduce random variables, random vectors, and parameterized probability models