Sect 18 a Population Genetics 1

8/11/2019 Sect 18 a Population Genetics 1


Population and Evolutionary Genetics

The Nature and Origin of Biological Diversity

The subject of evolutionary biology is the nature and origin of biological diversity:
• diversity of individuals within species
• diversity of species in the biosphere

Biological diversity matters! Population and evolutionary genetic data and theory are used intensively in conservation, agriculture, medicine, physical anthropology, and genomics.



Population and evolutionary genetics studies diversity at the level of genes:
• measure amount and kind and look for patterns. (How many genes or bp are different in two humans? Are some genes more variable than others?)
• explain it in terms of mutation, random drift, natural selection, sexual reproduction, migration, etc. (Why are the genes for fibrinopeptides more variable than the genes for cytochrome c? Why do introns evolve faster than exons?)

Population and evolutionary genetics reconstructs the history of life. (Are humans more closely related to chimpanzees or gorillas? Did Homo sapiens and H. neanderthalensis interbreed? When did the multicellular animals arise?)



Population genetics is the study of
• genetic diversity among individuals, within species (or populations)
• mechanisms that determine the amount of diversity
• short-term evolutionary changes in organisms (short-term = thousands or tens of thousands of generations)

Evolutionary genetics is the study of the
• genetic differences between species
• long-term evolution of genes and genomes

Roughly speaking
• population genetics compares different copies of a gene within a species
• evolutionary genetics compares a gene in two different species

Both are governed by the same basic forces, of which the most important are:
• mutation (universal)
• random genetic drift (universal)
• natural selection (universal)
• sexual reproduction (in sexual species)
• migration (in subdivided populations)

Although some processes are universal, they are not equally important!



[Figure: diagram with labels S, A, and S/A, ending in "Homo sapiens?"]





Charles Darwin and Alfred Russel Wallace
• Knew about evolution (as did many others at that time).
• Knew that all organisms share a common ancestor and that life can be portrayed as a tree.
• Knew that there is great variation among species, but didn’t know that variation originates by mutation.
• Knew that variation is inherited, but didn’t know about genes, and had a poor theory of inheritance.
• Knew that natural selection acts on variation to cause adaptation in organisms, but didn’t know about some other forms of selection, and didn’t know about random drift.



Darwin would be happily amazed to see what we have learned about evolutionary genetics. For example, my colleague Michael Nachman and his students, especially Hopi Hoekstra (now at Harvard University), showed that pocket mice living on lava flows are very dark while those living on sandy areas are light, and that this adaptive difference is due to one or a few mutations in a specific pigmentation gene.



Start with an example.

Marty Kreitman cloned and sequenced 11 different copies of the Adh gene encoding alcohol dehydrogenase from different strains of Drosophila melanogaster. Nowadays the genes are amplified by PCR and the sample sizes are much larger, e.g. 100 flies. The sequences must be aligned before comparison, but this is easily done because the differences between them are small, and the alignments are unambiguous.

Computers will translate the DNA sequences into protein sequences. Below is an alignment of exon 4 from three of Kreitman's sequences and from two different species, D. willistoni and D. virilis.



• The D. simulans sequence is identical to one of the D. melanogaster sequences. D. simulans is a close relative of D. melanogaster, based on similarity of other genes and of morphology. Phylogeny:

Note that the evolutionary lineages leading to melanogaster and virilis split before the ones leading to melanogaster and simulans. Differences reflect mutations that accumulated along the branches. Differences are proportional to branch lengths:

melanogaster-simulans = a + b = 2a
melanogaster-virilis = a + c + d = 2(a + c)

These equalities hold when b = a and d = a + c, i.e. when mutations accumulate at the same rate along each lineage.



Variation within a species:

• Drosophila melanogaster is polymorphic = has ≥ 2 alleles of Adh. The sequences fall into two classes, those with threonine (T) and those with lysine (K) at site #25. This difference was previously detected by using starch gel electrophoresis. All D. melanogaster populations are polymorphic, with both Fast and Slow alleles.

[Figure: electrophoresis gel with Slow (S) and Fast (F) band positions, showing genotypes AdhF/AdhF and AdhF/AdhS]



• In the entire coding sequence (exons 1 + 2 + 3 + 4), there is only one site where different copies of the Adh gene from D. melanogaster differ in amino acid sequence.





Patterns in molecular genetic variation to explain:
• DNA sequences > amino acid sequences.
• synonymous > nonsynonymous

The Adh gene has introns and flanking sequences as well as exons.
• flanking sequences and introns > exons

Conserved sequences = (nearly) invariant
e.g. ATG start codon, promoter
All mutations very detrimental or lethal.

• Functionally important regions tend to be conserved. So we can look for conserved regions, and they are likely to be important.

This illustrates a recurring theme: 3 kinds of mutations with respect to natural selection:
• Neutral: no effect on fitness (number of offspring produced by individual with mutation)
• Detrimental (= deleterious): decrease fitness, usually eliminated by natural selection
• Advantageous: increase fitness, favored by natural selection, rare

Neutral variation is the most common type, because most non-neutral mutations are detrimental and individuals carrying them reproduce less.

Mutations are more often detrimental in genes or regions that are less variable (more conserved).



We will spend the remainder of my lectures talking about biological diversity within species and about differences between species: how biological diversity is measured, and the mechanisms that govern it: mutation, random genetic drift, and natural selection.

Outline of remaining lectures:

Population genetics
• Parameters used to describe diversity within species
• More on patterns and phenomena
• Factors determining amount and patterns of diversity
    Mutation
    Random drift
    Selection (directional, balancing)
    Sex (if have time)

Evolution
• Measuring rates of evolution
• Patterns seen in rates
• How rates are determined by mutation, drift, and selection
• Genome evolution
    Variation in number and arrangement of genes
    How new genes arise by gene duplication

Why some people think this subject is hard:
1) Must be comfortable with stochastic models as well as deterministic; hopefully we have already got past that hurdle.
2) Must learn to think about populations of individuals and genes instead of individuals.
3) Mathematical models … but we will consider only simple ones and try to get intuitive understanding of them.



MEASURING DIVERSITY WITHIN SPECIES (POPULATIONS)

Defining populations:

A population is usually defined as a group of individuals of the same species. In a sexual species, the members of the population are usually able to mate with each other, at least potentially. Otherwise the definition is somewhat arbitrary, being whatever group of organisms one is studying at the moment. When my student Jody Banks studied Texas bluebonnets, the population was the entire species. Some people study the population of bacteria in a chemostat, while others analyze the people in one small religious group.

Populations may be strongly subdivided into local populations connected by infrequent migration. We won’t cover this.



Gene (Allele) and Genotype Frequencies

A gene or population is polymorphic if there are ≥ 2 alleles of the gene in the population. Otherwise it is monomorphic.

If a locus is polymorphic, then we must ignore individuals and treat genes and genotypes in a new way.

1908
G. H. Hardy (prominent British mathematician, who responded to a question raised by a geneticist)
Wilhelm Weinberg (German physician interested in human hereditary diseases)

gene frequency = frequency of a particular allele of a gene in the population

Alleles could be identified by electrophoresis or by sequencing.



Hypothetical example:

•Population of Drosophila melanogaster is examined for genetic variation at the Adh locus.

•Sample of 100 flies subjected to electrophoresis.

•Calculate genotype and gene frequencies:

                        FF        FS        SS        Total
Numbers of genotypes    40        40        20        100
Genotype frequencies    0.40      0.40      0.20      1
Numbers of genes        80 F      40 F                120 F
                                  40 S      40 S       80 S
                                                      200
Allele frequencies      0.4 F     0.2 F               0.6 F
                                  0.2 S     0.2 S     0.4 S
                                                      1

The allele frequencies could have been calculated in two ways:

(1) There are 2 × 100 = 200 genes in the sample gene pool. 80 + 40 = 120 are F and 40 + 40 = 80 are S. Frequency of F = f(F) = 120/200 = 0.6; frequency of S = f(S) = 80/200 = 0.4.
(2) f(F) = f(FF) + f(FS)/2 = 0.4 + 0.2 = 0.6; f(S) = f(SS) + f(FS)/2 = 0.2 + 0.2 = 0.4

Either way, it is absolutely crucial to check that the frequencies add up to 1: 0.6 + 0.4 = 1. If they don't, either you made an error in the calculations, or there are more than two alleles and you forgot to count some of them; i.e. you screwed up.
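The two ways of calculating allele frequencies can be checked against each other with a few lines of Python. This is only a sketch using the hypothetical Adh counts above (40 FF, 40 FS, 20 SS); the function names are made up for illustration.

```python
from fractions import Fraction


def allele_freqs_by_counting(n_FF, n_FS, n_SS):
    """Method (1): count gene copies directly in the sample gene pool."""
    total_genes = 2 * (n_FF + n_FS + n_SS)  # each diploid fly carries two copies
    n_F = 2 * n_FF + n_FS                   # FF flies carry two F, FS flies carry one
    n_S = 2 * n_SS + n_FS
    return Fraction(n_F, total_genes), Fraction(n_S, total_genes)


def allele_freqs_from_genotype_freqs(f_FF, f_FS, f_SS):
    """Method (2): homozygote frequency plus half the heterozygote frequency."""
    return f_FF + f_FS / 2, f_SS + f_FS / 2


# Hypothetical sample from the example above.
n_FF, n_FS, n_SS = 40, 40, 20
n_flies = n_FF + n_FS + n_SS

f_F, f_S = allele_freqs_by_counting(n_FF, n_FS, n_SS)
f_F2, f_S2 = allele_freqs_from_genotype_freqs(
    Fraction(n_FF, n_flies), Fraction(n_FS, n_flies), Fraction(n_SS, n_flies)
)

# The crucial check: the frequencies must add up to 1.
assert f_F + f_S == 1
print(float(f_F), float(f_S))
```

Exact fractions are used here so that the "adds up to 1" check is not blurred by floating-point rounding.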



In population and evolutionary genetics, we never think or talk about individuals or families or individual crosses or mating, only about populations and gene or genotype frequencies (and occasionally about frequencies of different types of matings or other events).

The abstraction process can be visualized as first making the flies disappear, leaving only the genes (two from each fly). Then the genes are mixed up.

This collection of genes is sometimes called the gene pool. You can visualize it as a swimming pool filled with genes if you wish.

“The trouble with the gene pool is that there is no life guard.”



Measures of Allelic Diversity

(1) The observed heterozygosity of a gene in a population is the frequency of individuals that are heterozygous for the gene.

Problem: it depends on whether the population is inbreeding or outbreeding. If Drosophila melanogaster were an extreme inbreeder, it would have mainly two genotypes, FF and SS, and observed heterozygosity would be ≈ 0.

(2) The expected heterozygosity of a gene is the probability that two copies of the gene, drawn at random from the population, are different alleles.

Terminology:
f(x) = frequency of x
P(x) = probability of x

e.g. 2 alleles: f(A in gene pool) = p, f(a in gene pool) = q

We can calculate the probability of drawing different genotypes (pairs of alleles) as follows:

P(draw A) = f(A) = p
P(draw a) = f(a) = q
P(draw A & A) = $p^2$
P(draw a & a) = $q^2$
P(draw A & a) = $2pq$

Note that this is equivalent to a real population that is random mating. (Hardy-Weinberg law)

expected heterozygosity = $h = 2pq = 1 - (p^2 + q^2)$
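The two forms of h agree because the two allele frequencies sum to 1. A short derivation, using only p and q as defined above:

```latex
\begin{align*}
p + q &= 1 \\
(p + q)^2 &= p^2 + 2pq + q^2 = 1 \\
h = 2pq &= 1 - (p^2 + q^2)
\end{align*}
```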



Inbreeding Produces Homozygotes

e.g. selfing

Each selfed Aa individual produces offspring in the proportions 1/4 AA, 1/2 Aa, 1/4 aa; selfed AA and aa individuals produce only their own genotype.

Generation 0:   all Aa
Generation 1:   1/4 AA     1/2 Aa    1/4 aa
Generation 2:   3/8 AA     1/4 Aa    3/8 aa
Generation 3:   7/16 AA    1/8 Aa    7/16 aa

How many heterozygotes after n generations?

$(1/2)^n$
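The halving of heterozygotes under selfing can be reproduced by iterating the 1/4 : 1/2 : 1/4 rule. The sketch below assumes complete selfing starting from a population that is all Aa, as in the diagram; the function name is made up for illustration.

```python
from fractions import Fraction


def self_one_generation(freqs):
    """Return (AA, Aa, aa) frequencies after one generation of complete selfing.

    Homozygotes breed true; each heterozygote yields 1/4 AA, 1/2 Aa, 1/4 aa.
    """
    AA, Aa, aa = freqs
    return (AA + Aa / 4, Aa / 2, aa + Aa / 4)


# Generation 0: all Aa.
freqs = (Fraction(0), Fraction(1), Fraction(0))
history = [freqs]
for _ in range(3):
    freqs = self_one_generation(freqs)
    history.append(freqs)

for generation, (AA, Aa, aa) in enumerate(history):
    print(f"generation {generation}: AA={AA}  Aa={Aa}  aa={aa}")
```

The allele frequencies never change in this process (p = q = 1/2 throughout); only the way the alleles are packaged into genotypes changes.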



When there are just two alleles, it is common to designate the two frequencies as p and q. But especially in molecular biology, we often deal with more than two alleles, and so we often use $x_1, x_2, \dots, x_n$ for n different alleles.

When there are > 2 alleles, it is easier to calculate h "backwards". If one has m different alleles and $x_i$ is the frequency of the ith allele,

$$h = 1 - \sum_{i=1}^{m} x_i^2 = 1 - (x_1^2 + x_2^2 + \dots + x_m^2)$$

e.g. for imaginary Adh data above: observed heterozygosity = 0.40

h = (2)(0.6)(0.4) = 0.48 or $1 - (0.6^2 + 0.4^2) = 0.48$

Note: The fact that h is greater than the observed heterozygosity in this example suggests that this population may be slightly inbred.

This is why expected heterozygosity is a better measure of diversity; a population could have many different alleles and genotypes and still have zero observed heterozygosity if it was strongly inbred.
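The "backwards" calculation works for any number of alleles. A minimal sketch, applied to the imaginary Adh data (allele frequencies 0.6 and 0.4, observed heterozygosity 0.40); the function name and the three-allele example are made up for illustration.

```python
def expected_heterozygosity(allele_freqs):
    """h = 1 - (sum of squared allele frequencies)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must add up to 1")
    return 1.0 - sum(x * x for x in allele_freqs)


# Imaginary Adh data from the example above.
h_adh = expected_heterozygosity([0.6, 0.4])
observed = 40 / 100  # frequency of FS heterozygotes in the sample
print(f"expected h = {h_adh:.2f}, observed = {observed:.2f}")

# Hypothetical three-allele case with equal frequencies.
h_three = expected_heterozygosity([1 / 3, 1 / 3, 1 / 3])
print(f"three equally common alleles: h = {h_three:.3f}")
```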



Kreitman's sample of 11 copies of the Adh gene had 6 S and 5 F alleles, so gene frequencies were f(S) = 6/11 ≈ 0.55 and f(F) = 5/11 ≈ 0.45.

h = 0.495

The human population is obviously not perfectly random mating, but is close enough so that in many cases, the observed and expected heterozygosity are very similar.

Expected heterozygosity is high for many genes, on the order of 0.1 – 0.5.
h = 0.1 means P(two random alleles differ in charge) = 0.1.



Sequence Diversity

Allelic heterozygosity based on electrophoresis or amino acid sequences or morphology actually underestimates genetic diversity. In particular, as we saw above, amino acid sequences don't detect synonymous base sequence differences.

Kreitman actually sequenced over 2.6 kb from a larger sample of 11 genes and found 8 different alleles, 7 singlets and 1 represented 3 times.

h = 0.86

cf. h = 0.50 detected with electrophoresis

Nucleotide Diversity, a Measure of Sequence Diversity

Use parameter analogous to expected heterozygosity:

π = P(a site has a different bp in 2 random copies of a gene)
  = proportion of bps different in 2 random copies of a gene
  = mean pairwise sequence difference

π is calculated by aligning the sequences of a sample of genes A, B, C, etc. and comparing all possible pairs (A and B, A and C, B and C, etc.). For each pair, determine the proportion of sites that are different. Then π is the average of these proportions.

Kreitman’s 11 Adh genes: π = 0.007 differences/bp

π is smaller than h because it is differences per bp, and there are many bp in the gene.
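The pairwise procedure for nucleotide diversity translates directly into code. The sketch below uses a made-up toy alignment of three 10-bp sequences (not Kreitman's data) and hypothetical function names; it assumes the sequences are already aligned and of equal length.

```python
from itertools import combinations


def pairwise_difference(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def nucleotide_diversity(sequences):
    """Mean pairwise proportion of differing sites over all possible pairs."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


# Toy alignment, invented for illustration only.
toy_alignment = [
    "ACGTACGTAC",  # copy A
    "ACGTACGTAT",  # copy B: differs from A at the last site
    "ACGAACGTAC",  # copy C: differs from A at the fourth site
]
pi = nucleotide_diversity(toy_alignment)
print(f"nucleotide diversity = {pi:.4f} differences/bp")
```

Here A and B differ at 1 of 10 sites, A and C at 1 of 10, and B and C at 2 of 10, so the average is (0.1 + 0.1 + 0.2)/3.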



Calculation of π is tedious by hand for large samples. It is best to use a calculator with statistical functions or an Excel spreadsheet. Example: data from a small sample of the freshwater invertebrate Keratella cochlearis. Each entry is the proportion of sites that differ between two sequences, for 590 bp of the mitochondrial cox1 gene; the mean of the ten entries is the nucleotide diversity.

        1          2          3          4          5
1       -
2       0.00338    -
3       0.00169    0.00169    -
4       0.03723    0.04061    0.03892    -
5       0.0423     0.04569    0.04399    0.00508    -

mean = 0.026058
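The averaging step for the Keratella cochlearis example can be done in a few lines instead of a spreadsheet. This sketch simply types in the ten pairwise proportions from the table; the dictionary layout and the conversion back to whole-site counts (proportion × 590 bp, rounded) are illustrative choices, not part of the original analysis.

```python
# Pairwise proportions of differing sites, keyed by (sequence, sequence).
pairwise = {
    (1, 2): 0.00338,
    (1, 3): 0.00169, (2, 3): 0.00169,
    (1, 4): 0.03723, (2, 4): 0.04061, (3, 4): 0.03892,
    (1, 5): 0.0423, (2, 5): 0.04569, (3, 5): 0.04399, (4, 5): 0.00508,
}
SITES = 590  # length of the cox1 fragment in bp

# Nucleotide diversity = mean of all pairwise proportions.
pi = sum(pairwise.values()) / len(pairwise)
print(f"nucleotide diversity = {pi:.6f} differences/bp")

# Approximate number of differing sites behind each proportion.
site_differences = {pair: round(p * SITES) for pair, p in pairwise.items()}
for pair, n_diff in sorted(site_differences.items()):
    print(pair, n_diff)
```

The counts make the structure of the sample visible: sequences 1 to 3 differ from each other at only one or two sites, as do 4 and 5 at three sites, while comparisons between the two groups differ at more than twenty sites.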

Sequence diversity is high. If diversity is high when measured at the gene level, it is not surprising that it is also high at the sequence level. For humans,

nucleotide diversity $\pi \approx 7 \times 10^{-4}$ differences/bp

Interpretation:
For any 2 random individuals:
0.07% of bp's differ
0.7 bp differs in a gene of 1 kbp
$2.1 \times 10^6$ bp's differ in a genome of $3 \times 10^9$ bp's

We have long known that no two individuals of a species are genetically identical, unless they are members of a clone (and even then they will differ in several mutations). But these data suggest that two humans chosen at random will differ in a large proportion of all genes, perhaps more than 1/3, and in about two million base pairs!

What, then, is the expected difference between the two copies of the genome that you got from Mom and Dad?




$2.1 \times 10^6$ bp's
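The interpretation of human nucleotide diversity is just a rescaling of the per-bp value to different sequence lengths. A minimal sketch of that arithmetic, using only the figures quoted in the text (the variable names are made up):

```python
PI_HUMAN = 7e-4          # differences per bp, as quoted in the text
GENE_LENGTH_BP = 1_000   # a 1 kbp gene
GENOME_LENGTH_BP = 3e9   # genome size used in the text

percent_differing = PI_HUMAN * 100
diffs_per_gene = PI_HUMAN * GENE_LENGTH_BP
diffs_per_genome = PI_HUMAN * GENOME_LENGTH_BP

print(f"{percent_differing:.2f}% of bp differ")
print(f"{diffs_per_gene:.1f} bp differ in a 1 kbp gene")
print(f"{diffs_per_genome:.2e} bp differ in the whole genome")
```

The same per-genome number applies to the two genome copies inside one person, because under random mating they are, in effect, two random copies drawn from the gene pool.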



Phenomena to be explained:
• Different species have different diversities.
  E.g. humans > cheetahs
• Different genomes have different diversities.
  E.g. human mitochondrial genes > human nuclear genes
• Different genes or regions of the genome have different diversity.
  E.g. pseudogenes > noncoding regions > genes
  E.g. 3rd codon position > 1st codon position > 2nd codon position
