Biostatistics 666 Statistical Models and Numerical Methods in Human Genetics

Statistical Models and Numerical Methods in …csg.sph.umich.edu/abecasis/class/2003/Lecture01.pdfStatistical Models and Numerical Methods in Human Genetics How to find me… Gonçalo

Jul 08, 2020

Page 1

Biostatistics 666

Statistical Models and Numerical Methods in Human Genetics

Page 2

How to find me…

• Gonçalo Abecasis
• Assistant Professor, Dept. of Biostatistics
• SPH II, Room M4132
• Phone: (734) 763-4901
• E-mail: [email protected]
• Office Hours: Monday 4:00pm - 5:00pm

Page 3

Course Grading

• Written Exams 50%
  • In-class midterm - 20%
  • Final exam - 30%
• Problem Sets 20%
• Research Project or Review 30%
  • In-class presentation for 10% extra credit

Page 4

Paper or Project

• Clearly written paper
• No more than 2,000 words
• Original thinking
  • Critical evaluation of literature
  • Original data analysis
  • Interesting computer simulation
  • Investigate analytical model
• Extra credit for oral presentation to class

Page 5

Paper or Project

• Review Paper
  • Critically evaluate a recently published research article
  • I will provide a list of suggested articles
• Research Project
  • Analyse your own data
  • Carry out a simple simulation project

Page 6

Class Schedule

• Wednesdays 3:00pm – 4:30pm
  • Room M4332
• Fridays 3:00pm – 4:30pm
  • Room M4332
  • Occasionally in Computing Lab, SPH II A
• Please bring your schedules this Friday

Page 7

DNA – Information Store

• Encodes the information required for cells and organisms to produce new cells and organisms and to function.
• DNA variation is responsible for many individual differences, some of which are medically important.

Page 8

DNA – A string of bases

• Each base is either a:
  • Purine
    • (A) – Adenine
    • (G) – Guanine
  • Pyrimidine
    • (C) – Cytosine
    • (T) – Thymine
• Has an orientation

Page 9

DNA Double Helix

• Pair of DNA strands
• Each strand is a sequence of A, C, T and G
• Complementary Strands
  • Facilitate replication
  • Bound by Hydrogen Bonds
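
A minimal Python sketch of what complementary strands imply: one strand determines the other through the pairing rules A with T and C with G. Because the two strands run in opposite orientations, the partner strand is conventionally written as the reverse complement. The function names are illustrative and not taken from the lecture.

```python
# Pairing rules for DNA bases: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-by-base complement, read in the same direction."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own orientation."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    strand = "ATGC"
    print(strand, "pairs with", reverse_complement(strand))
```

Applying `reverse_complement` twice returns the original strand, which is one way to see why either strand can serve as a template during replication.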

Page 10

Chromosomes

• Human DNA is protected by special proteins
  • DNA is coiled around histone proteins
  • Nucleosomes
  • Higher order structures
• Reference points in chromosomes
  • Telomeres
  • Centromeres

Page 11

Human Genome

• Multiple chromosomes
  • Each one is a DNA double helix
• 22 autosomes
  • Present in 2 copies
  • One maternal, one paternal
• 1 pair of sex chromosomes
  • Females have two X chromosomes
  • Males have one X chromosome and one Y chromosome
• Total of ~3 x 10^9 bases

Page 12

Inheritance

• Offspring inherit one copy of each chromosome from each parent
• Through meiosis, germ line cells produce haploid gametes
• These fuse to create a fertilized egg, and eventually a new human being

Page 13

Meiosis

• DNA is replicated
• Chromosomes are paired
• DNA stretches exchanged between chromosomes
• Successive cell divisions take place
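
The exchange step can be illustrated with a deliberately simplified sketch. It assumes a single chromosome pair represented as two equal-length strings of alleles and at most one crossover per gamete; real meiosis involves replication, four chromatids, and a variable number of crossovers. The function `make_gamete` and its arguments are hypothetical names chosen for this example.

```python
import random


def make_gamete(maternal: str, paternal: str, rng=None) -> str:
    """Build one haploid chromosome from a pair, with at most one crossover.

    Simplifying assumption: a single exchange point chosen uniformly at random.
    """
    if len(maternal) != len(paternal):
        raise ValueError("homologous chromosomes must have the same length")
    rng = rng or random.Random()
    # Position where the gamete switches from one parental copy to the other.
    # A value of 0 or len(maternal) means no visible exchange.
    crossover = rng.randint(0, len(maternal))
    # Either parental copy is equally likely to contribute the first segment.
    if rng.random() < 0.5:
        first, second = maternal, paternal
    else:
        first, second = paternal, maternal
    return first[:crossover] + second[crossover:]


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(3):
        print(make_gamete("MMMMMMMM", "PPPPPPPP", rng))
```

Each printed gamete is a mosaic of the two parental copies, which is the pattern that linkage analysis later exploits.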

Page 14

Some Types of DNA Sequence

• Genes (<5% of all human DNA)
  • ~30,000-35,000 in humans
  • Exons, translated into protein
  • Introns, transcribed into RNA, but not protein
• Promoters
• Enhancers
• Repeat DNA
• Pseudogenes

Page 15

Central Dogma

• Information in cells is stored in DNA.
• DNA can be transcribed into RNA.
• RNA can be translated into protein.
  • Proteins can catalyze chemical reactions.
  • Proteins receive and transmit signals.
  • Proteins constitute structural building blocks.

Page 16

Genetic Code

• DNA → RNA → Protein
  • DNA: 4 bases (A,T,C,G)
  • RNA: 4 bases (A,U,C,G)
  • Proteins: 20 amino-acids
• Universal Genetic Code
  • Translation between DNA/RNA and protein
  • Three bases code for one amino-acid
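
As a rough illustration of the triplet code, the sketch below builds the 4 x 4 x 4 = 64 codon table of the standard genetic code and translates a short DNA sequence. It is a simplification: it reads from the first base, does not search for a start codon, and stops at the first stop codon. The helper names are illustrative.

```python
from itertools import product

# Standard genetic code, listed with bases in T, C, A, G order
# (third position varies fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Write the coding-strand DNA sequence as RNA (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(dna: str) -> str:
    """Translate coding-strand DNA, three bases per amino acid."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(transcribe("ATGGCCTGA"))
    print(translate("ATGGCCTGA"))
```

With 64 codons and only 20 amino acids plus a stop signal, several codons necessarily map to the same amino acid.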

Page 17

Genetic Code

Page 18

Human Variation

• When two chromosomes are compared most of their sequence is identical
  • Consensus sequence
• About 1 per 1,000 bases differs between pairs of chromosomes in the population
  • In the same individual
  • In the same geographic location
  • Across the world
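
A back-of-the-envelope sketch of what this rate implies, assuming a genome of about 3 billion bases and treating the 1-in-1,000 figure as uniform along the sequence (a simplification). The helper `count_differences` is a hypothetical name and assumes the two sequences are already aligned.

```python
GENOME_SIZE = 3_000_000_000   # approximate number of bases in one genome copy
BASES_PER_DIFFERENCE = 1_000  # about one differing base per 1,000 compared

# Expected number of differing sites between two copies of the genome.
expected_differences = GENOME_SIZE // BASES_PER_DIFFERENCE


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    print(f"Expected differing sites: {expected_differences:,}")
    print(count_differences("ACGTACGT", "ACGTACGA"))
```

Under these assumptions, two copies of the genome differ at roughly three million sites.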

Page 19

Types of Genetic Variation

• Sequence Polymorphisms
  • A single base or a few bases differ between individuals
• Length Polymorphisms
  • In some regions of DNA, the number of copies of a particular DNA repeat varies

Page 20

Repeat Length Polymorphisms

• Variable Number Tandem Repeats
  • VNTRs
  • Typical repeat units of 10 – 100s bp
  • E.g.: ~110 bp repeat in IL1RN gene
• Microsatellites
  • Simple repeat sequences
    • Most popular are 2, 3 or 4 bp
    • E.g.: ACACACAC …
  • D naming scheme (e.g., D2S160)
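
To make the idea of a repeat length allele concrete, here is a small sketch that counts how many consecutive copies of a repeat unit appear in a sequence. The function name and the toy sequences are invented for illustration; laboratory genotyping measures fragment length rather than reading the sequence directly.

```python
import re


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(match.group(0)) // len(unit) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


if __name__ == "__main__":
    # Two hypothetical alleles at the same locus, differing in AC copy number.
    allele_1 = "GGACACACACTT"
    allele_2 = "GGACACTT"
    print(longest_tandem_run(allele_1, "AC"), longest_tandem_run(allele_2, "AC"))
```

The two toy alleles differ only in the number of AC copies, so they would also differ in length.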

Page 21

Example VNTR

• Picture of DNA on an agarose gel
• This is a repeat sequence near the IL1 gene
• Small fragments move faster towards positive pole

Page 22

Microsatellites

• Most popular markers for linkage analysis
  • Large number of alleles (10 is common)
  • Can distinguish and track individual chromosomes in families
• Relatively abundant
  • ~15,000 mapped loci
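
One way to see why many alleles help in tracking chromosomes is expected heterozygosity, the chance that a person carries two different alleles at a marker. The sketch assumes random mating and, purely for illustration, equally frequent alleles; real allele frequencies are uneven.

```python
def heterozygosity(allele_frequencies):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    if abs(sum(allele_frequencies) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_frequencies)


if __name__ == "__main__":
    ten_allele_marker = [0.1] * 10   # assumed: 10 equally frequent alleles
    two_allele_marker = [0.5, 0.5]   # assumed: 2 equally frequent alleles
    print(f"10 alleles: {heterozygosity(ten_allele_marker):.2f}")
    print(f" 2 alleles: {heterozygosity(two_allele_marker):.2f}")
```

Under these assumptions a 10-allele marker is heterozygous in about 90% of people, against at most 50% for a two-allele marker, so the two chromosomes of a parent can be told apart far more often.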

Page 23

SNPs

• SNP is usually read as Snip
• Single nucleotide polymorphisms
  • Change one nucleotide
    • Replace
    • Insert
    • Delete
• Abundant, but traditionally hard to detect
• Typically have only a few alleles (most often two)

Page 24

Single Base Changes

• Transitions
  • A/G, C/T
  • Purine ↔ Purine
  • Pyrimidine ↔ Pyrimidine
• Transversions
  • Purine ↔ Pyrimidine
  • A/T, A/C, C/G, G/T
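
The classification above can be expressed as a short function. This is only a sketch of the rule on the slide; the function name is illustrative.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_change(ref: str, alt: str) -> str:
    """Label a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("the two bases are identical, so nothing changed")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for a, b in combinations("ACGT", 2):
        print(f"{a}/{b}: {classify_change(a, b)}")
```

Enumerating all six unordered base pairs reproduces the slide's lists: two transitions and four transversions.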

Page 25

A little more on SNPs

• Most SNPs have only two alleles
• Easy to automate their scoring
• Becoming extremely popular
• Typing Methods
  • Sequencing
  • Restriction Site
  • Hybridization

Page 26

Phenotypes

• Can measure genetic variation indirectly
• E.g., Cystic Fibrosis
  • Patients must carry two mutations in CF gene
  • Parents of patients must carry at least one mutation
  • Normal individuals carry 0 or 1 mutations
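
The cystic fibrosis example follows a recessive pattern, which the sketch below works through. It assumes each parent passes on one of their two gene copies with probability 1/2 and ignores new mutations; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(mother_copies: int, father_copies: int) -> dict:
    """Probability that a child carries 0, 1 or 2 mutant copies.

    Each argument is the number of mutant copies (0, 1 or 2) a parent carries.
    """
    # Probability that the single copy a parent transmits is the mutant one.
    p_mother = Fraction(mother_copies, 2)
    p_father = Fraction(father_copies, 2)
    distribution = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
    for from_mother, from_father in product((0, 1), repeat=2):
        probability = (p_mother if from_mother else 1 - p_mother) * (
            p_father if from_father else 1 - p_father
        )
        distribution[from_mother + from_father] += probability
    return distribution


def is_affected(mutant_copies: int) -> bool:
    """Recessive trait: disease requires two mutant copies."""
    return mutant_copies == 2


if __name__ == "__main__":
    for copies, probability in offspring_distribution(1, 1).items():
        print(copies, probability, "affected" if is_affected(copies) else "unaffected")
```

For two carrier parents the child has a 1/4 chance of carrying two mutations, a 1/2 chance of being a carrier, and a 1/4 chance of carrying none, which is why observing the phenotype reveals something about the genotype of the whole family.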

Page 27

3 Stages of Genetic Mapping

• Are there genes influencing this trait?
  • Epidemiological studies
• Where are those genes?
  • Linkage analysis
• What are those genes?
  • Association analysis

Page 28

Is a trait genetic?

• Examine distribution of trait in the population and among relatives
• E.g. Inflammatory Bowel Disease (Crohn’s)
  • General population
    • 1-3 cases per 1,000 individuals
  • Twins of affected individuals
    • 44% of monozygotic twins also have Crohn’s
    • 3.8% of dizygotic twins also have Crohn’s
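
A simple way to read these numbers is to divide the risk in twins of affected individuals by the population prevalence, a ratio often written λ in the genetics literature. The sketch uses only the figures quoted on the slide; the function name is illustrative.

```python
def risk_ratio(risk_in_relatives: float, population_prevalence: float) -> float:
    """Risk to relatives of affected individuals, relative to the general population."""
    return risk_in_relatives / population_prevalence


MZ_TWIN_RISK = 0.44               # monozygotic co-twins who also have Crohn's
DZ_TWIN_RISK = 0.038              # dizygotic co-twins who also have Crohn's
PREVALENCE_RANGE = (0.001, 0.003) # 1-3 cases per 1,000 individuals

if __name__ == "__main__":
    for prevalence in PREVALENCE_RANGE:
        print(
            f"prevalence {prevalence:.3f}: "
            f"MZ ratio {risk_ratio(MZ_TWIN_RISK, prevalence):.0f}, "
            f"DZ ratio {risk_ratio(DZ_TWIN_RISK, prevalence):.0f}"
        )
```

Across the quoted prevalence range, monozygotic co-twins are at well over a hundred times the population risk and dizygotic co-twins at more than ten times. Monozygotic twins share all their DNA and dizygotic twins about half, so the gap between the two ratios points toward a genetic contribution.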

Page 29

Where are those genes?

• Find genetic markers that co-segregate with disease
• E.g. D16S3136 co-segregates with Crohn’s

Page 30

What are those genes?

• Identify genetic variants that are associated with disease…
• E.g. Disruptive mutations in NOD2 much more common in Crohn’s patients

  Variant      Crohn’s   Controls
  Arg702Trp    11%       4%
  Gly908Arg    4%        2%
  Leu1007fs    8%        4%
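
One common summary of such a comparison is the odds ratio. The sketch below computes it from the percentages on the slide, under the assumption that they are frequencies of the variant in each group (the slide does not say whether they are allele or carrier frequencies). It says nothing about statistical significance, which would need the sample sizes.

```python
def odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds of carrying the variant in cases divided by the odds in controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


# Frequencies from the slide: (Crohn's patients, controls).
NOD2_VARIANTS = {
    "Arg702Trp": (0.11, 0.04),
    "Gly908Arg": (0.04, 0.02),
    "Leu1007fs": (0.08, 0.04),
}

if __name__ == "__main__":
    for variant, (cases, controls) in NOD2_VARIANTS.items():
        print(f"{variant}: odds ratio {odds_ratio(cases, controls):.2f}")
```

All three variants come out with odds ratios of roughly 2 to 3 under this assumption.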

Page 31

Checking Assumptions…

Page 32

Take Home Reading!

• An introduction to important issues in genetics:
  • Lander and Schork (1994) Science 265:2037-48