How to Assemble a Human Genome? Mix generous amounts of Junk DNA and Indifferent DNA, add a Dollop of Garbage DNA and a Sprinkling of Functional DNA (Lazarus DNA optional) Dan Graur University of Houston

Update version of the SMBE/SESBE Lecture on ENCODE & junk DNA (Graur, December 2013)

May 10, 2015

Page 1

How to Assemble a Human Genome? Mix generous amounts of Junk DNA and Indifferent DNA, add a Dollop of Garbage DNA and a Sprinkling of Functional DNA (Lazarus DNA optional)

Dan Graur University of Houston

Page 2

Dan Graur (until 5 September 2012)

Page 3

Dan Graur (from 6 September 2012 to the present time)

Page 4

In September 2012, 30 papers based on thousands of data sets were simultaneously published in high-profile journals to describe the major findings from the ENCODE project.

Page 5

The main finding of the main paper was picked up by news outlets all over the world.

Page 6

And what was the main finding of the main paper?

Page 7

80% of the human genome is functional

And what was the main finding of the main paper?

Page 8

442 authors +

594 collaborators

Page 9

On the sixth day of the month of September, in the Year of our Lord 2012, it was announced that “junk DNA” is “dead.”

Page 10

An epic media spin

Page 11

An epic media spin

Page 12

An example of epic media spin

Page 13

Compiled by T. Ryan Gregory, Genomicron

An epic media spin

Page 14

An epic media spin (Una manipulación mediática épica)

Page 15

Creationists had a ball

Page 16

Three problems: (1) If the human genome is indeed devoid of junk DNA as implied by the ENCODE project, then a long, undirected evolutionary process cannot explain the human genome. If, on the other hand, organisms are designed, then all DNA, or as much as possible, is expected to exhibit function. If ENCODE is right, then Evolution is wrong.

Page 17

Three problems: (2) If ENCODE is right, then humans are the Goldilocks of the living world.

Organism | C-value (pg) | Junk | Complexity
Tetraodon fluviatilis (pufferfish) | 0.35 | No | Primitive
Hyla nana (frog) | 1.89 | No | Primitive
Homo sapiens (human) | 3.5 | No | Pinnacle of Creation
Extatosoma tiaratum (insect) | 8.0 | Yes | Primitive
Allium cepa (onion) | 16.75 | Yes | Primitive
Protopterus aethiopicus (lungfish) | 132.83 | Yes | Primitive
Paris japonica (canopy plant) | 152.20 | Yes | Primitive

Page 18

80% of the human genome is functional

Nature. 2011. 478:476-482

Evolutionary constraint indicates that the fraction of the human genome that is functional is ~5%.

Three problems: (3) If ENCODE-2012 is right, then ENCODE-2011 is wrong.

Page 19

Solution: Kill ENCODE

Page 20

We wrote a critical piece on ENCODE and got a very negative review from Trends in Genetics. “Graur is mad, and not entirely without cause.” “It would be good for Trends in Genetics to publish a reasoned and dispassionate critical essay on this topic, preferably by someone of Graur’s stature, but not him.”

192 cm, 6’2”, 115 kg, 254 lb

angry? insane?

Page 21

How did ENCODE reach the conclusion that 80% of the human genome is functional, when the evidence for selective constraint is ~5%?
•  Equating hype with science.
•  Wrong experimental systems.
•  Inappropriate statistical analyses.
•  A peculiar definition of function.
•  A peculiar definition of junk.
•  A lack of evolutionary perspective.
•  A lack of objectivity about the study organism.
•  Ignorance of everything that came before ENCODE.

Page 22

A huge chunk of ENCODE data is derived from HeLa cells and other cancer cells. Does the HeLa karyotype look human to you?

Wrong experimental systems.

Page 23

Wrong experimental systems.

Page 24

Landry et al. 2013

Wrong experimental systems.

Page 25

•  Equating hype with science.

Page 26

6 September 2012

“These data enabled us to assign biochemical functions for 80% of the genome…”

The birth of 80%

Page 27

6 September 2012

“The vast majority (80.4%) of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type. Much of the genome lies close to a regulatory event: 95% of the genome lies within 8 kilobases (kb) of a DNA–protein interaction..., and 99% is within 1.7 kb of at least one of the biochemical events measured by ENCODE.”

Implication that 80% may be 99%
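The jump from “80% participates in an event” to “99% lies within 1.7 kb of an event” is easy to see with a toy calculation. The sketch below is a hypothetical illustration, not ENCODE's pipeline: it assumes a 100-kb toy genome with a 10-bp “event” every 3.4 kb, and the function name `fraction_within` is made up for this example. The events themselves cover only 0.3% of the toy genome, yet every base lies within 1.7 kb of one.

```python
def fraction_within(genome_length, events, d):
    """Fraction of positions in [0, genome_length) lying within d bp of at
    least one event. Events are half-open (start, end) intervals."""
    # Pad every event by d bp on each side and clip the result to the genome.
    padded = sorted((max(0, start - d), min(genome_length, end + d))
                    for start, end in events)
    covered = 0
    run_start, run_end = None, None
    for start, end in padded:
        if run_end is None or start > run_end:
            # Disjoint from the current run: bank it and open a new run.
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            run_end = max(run_end, end)
    if run_end is not None:
        covered += run_end - run_start
    return covered / genome_length


# Hypothetical toy genome: a 10-bp "event" every 3,400 bp (30 events).
GENOME = 100_000
EVENTS = [(s, s + 10) for s in range(0, GENOME, 3_400)]

print(fraction_within(GENOME, EVENTS, 0))      # 0.003 (bases inside an event)
print(fraction_within(GENOME, EVENTS, 1_700))  # 1.0 (bases within 1.7 kb of one)
```

Merging padded intervals counts each base once, so the result is a true coverage fraction rather than a sum of window lengths.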

Page 28

6 September 2012

“junk DNA” is dead!

Page 29

“junk DNA” is dead!

Page 30

ENCODE researcher Ewan Birney tells Ed Yong that the 80 percent figure will increase, possibly reaching 100 percent. “We don’t really have any large chunks of redundant DNA,” Birney says. “This metaphor of junk isn't that useful.”

99% is not enough, 100% is better

Page 31

The PR machine at work: “Virtually all of the DNA passed down from generation to generation has been kept for a reason.” An intelligent God, perhaps?

Page 32

I [went] back to ENCODE biologist John Stamatoyannopoulos, who was quoted in the first wave of news. He said he thought the skeptics hadn’t fully understood the papers… He did admit that the press conference misled people by claiming that 80% of our genome was essential and useful. He puts that number at 40%. Otherwise he stands by all the ENCODE claims.

Faye Flam

99% disappears and 80% becomes 40%

Page 33

40% is actually 9%

Page 34

(The origin of 9%: 5% + 4% = 9%)

Page 35

(Oops: 9% reverts back to 5%)

Page 36

9% becomes 20%

Page 37

20% kills “junk DNA”

Page 38

•  20% of the genome is functional.
•  Ergo, 80% must be junk.
•  Yet, “junk DNA” should be “totally expunged” from the lexicon.
•  In which universe does Ewan Birney’s logic work?

20% kills “junk DNA”

Page 39

•  20% of the genome is functional.
•  Ergo, 80% must be junk.
•  Yet, “junk DNA” should be “totally expunged” from the lexicon.
•  In which universe does Ewan Birney’s logic work?
•  In a universe in which 20% >> 80%!

20% kills “junk DNA”

Page 40

At the end of 2012, 20% becomes the favorite number in Nature

Page 41

Science remains loyal to 80%.

Page 42

[Diagram: Information flow within the genome. The genome is divided into transcribed and nontranscribed DNA, and transcribed DNA into translated and nontranslated DNA.]

Page 43

[Diagram: Information flow within the genome (DNA → RNA → protein). The genome is divided into transcribed and nontranscribed DNA, and transcribed DNA into translated and nontranslated DNA.]

Page 44

[Diagram: The genome divided into transcribed and nontranscribed DNA, and transcribed DNA into translated and nontranslated DNA; each of these categories contains both functional and junk DNA.]

Page 45

[Diagram: The genome divided into functional DNA and junk (nonfunctional) DNA.]

Junk has nothing to do with being non-protein-coding. Junk is about function… actually, lack of function.

Page 46

[Diagram: The genome divided into functional and junk DNA, with two transitions between them, each labeled “ad hoc.”]

Page 47

[Diagram: The genome divided into functional and junk DNA; a transition from functional to junk = pseudogene.]

Page 48

[Diagram: The genome divided into functional and junk DNA; a transition from junk to functional = Lazarus DNA.]

Page 49

Lazarus DNA: either Emmaus DNA or zombie DNA

Page 50

If the acquired function (Lazarus DNA) lowers the fitness of the carriers, it is called zombie DNA.

Page 51

If the acquired function (Lazarus DNA) is advantageous, it is called Emmaus DNA.

Page 52

[Diagram: The genome divided into functional and junk DNA, each of which contains both transcribed and nontranscribed DNA.]

Page 53

[Diagram: The genome divided into functional and junk DNA, each of which contains both transcribed and nontranscribed DNA; the transcribed portions of both together make up the transcriptome.]

Not all the transcriptome is functional.

Page 54

[Diagram: The genome divided into functional and junk DNA, each containing transcribed and untranscribed DNA; the transcribed DNA in each is further divided into translated and untranslated DNA.]

Page 55

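[Diagram: The genome divided into functional and junk DNA, each containing transcribed and untranscribed DNA; the transcribed DNA in each is further divided into translated and untranslated DNA, and the translated portions of both together make up the proteome.]

Not all the proteome is functional.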

Page 56

THE ORIGIN OF A SPECIES (smart & elegant) Seiko Astron: Like a smartphone, the Astron is GPS-enabled, allowing it to determine accurate time from atomic clocks and automatically update to any time zone in the world. Unlike a smartphone, however, it looks nice with a suit, won’t break if you drop it and uses solar power, so it never needs to be charged. Darwin would be proud. $2,300

Hemispheres Magazine. April 2013.

Page 57

This is an intelligently designed Dining Table

This is an evolutionary functional Dining Table

Evolution does not produce “smart and elegant”

Actually, Darwin would not be proud.

Page 58

The story of the human genome: 1998 to the present

Page 59

[Slide image: a long stretch of raw DNA sequence, beginning “agaaacggggagaggtatcaaataaacataattgacacacccgg…”]

Bets: 281 Median: 61,302 Lowest: 25,947 Highest: 212,278 Pot: 1,200 US Dollars

The gene number game: Genesweep©

(started in Cold Spring Harbor, 1998)

Run by an Eton high-school boy called Ewan Birney. His guess was in the very high range!

Page 60
Page 61

1st draft

15 February 2001

Page 62
Page 63

21 October 2004

From 30,000 protein-coding genes to fewer than 25,000.

Page 64

[Slide image: a long stretch of raw DNA sequence, beginning “agaaacggggagaggtatcaaataaacataattgacacacccgg…”]

Lee Rowen (Institute for Systems Biology) won half of the pot with a guess of 25,947 genes, the lowest bet in the pool. Olivier Jaillon (26,500) and Paul Dear (27,462) shared the remaining 600 dollars.

The gene number game: Genesweep©

Bets: 281 Median: 61,302 Lowest: 25,947 Highest: 212,278 Pot: 1200 US Dollars

Page 65

Genebuild last updated/patched | May 2012 | April 2013
Total length (bp) | 3,287,209,763 | 3,320,602,130
Protein-coding genes | 21,065 | 20,774
Pseudogenes | 15,930 | 14,445
RNA-specifying genes | 12,955 | 22,493

Page 66

The “end” of the Human Genome Project in 2004 ($3.8 Billion) was a big disappointment for scientists unversed in evolutionary biology

The human genome turned out to be:
•  small in size
•  sparsely populated with genes
•  densely populated with dead genomic parasites
•  unoriginal

Page 67

3.5 billion letters in a four-letter alphabet = 7 billion bits = 0.81 GB (gigabytes) << 1 DVD (8.5 GB)

Information content

Small

Page 68

≈ 1 DVD = 8.5 GB

3.5 billion letters in a four-letter alphabet = 7 billion bits = 0.81 GB (gigabytes)

Information content

Small
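The information-content arithmetic on these slides can be checked directly. A minimal sketch, assuming each base carries log2(4) = 2 bits (the maximum, reached only if every position is an independent, equiprobable letter), that the slide's “0.81 GB” uses binary gigabytes (GiB, 2³⁰ bytes), and that the 8.5 GB DVD capacity is decimal (8.5 × 10⁹ bytes). The function name is illustrative.

```python
import math

def information_bits(length_bp, alphabet_size=4):
    """Upper bound on information content: log2(alphabet_size) bits per letter,
    assuming every position is an independent, equiprobable symbol."""
    return length_bp * math.log2(alphabet_size)

bits = information_bits(3.5e9)           # 3.5 billion letters -> 7 billion bits
gib = bits / 8 / 2**30                   # bits -> bytes -> binary gigabytes (GiB)
dvd_bytes = 8.5e9                        # dual-layer DVD capacity, decimal GB
genomes_per_dvd = dvd_bytes / (bits / 8)

print(f"{bits:.2e} bits, {gib:.2f} GiB, {genomes_per_dvd:.1f} genomes per DVD")
# 7.00e+09 bits, 0.81 GiB, 9.7 genomes per DVD
```

On these assumptions, roughly ten haploid human genomes would fit on a single dual-layer DVD.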

Page 69

Sparsely populated with genes.

Organism | Gene density (genes per Mb)
Escherichia coli (bacterium) | 911
Saccharomyces cerevisiae (yeast) | 483
Arabidopsis thaliana (mustard weed) | 221
Drosophila melanogaster (fly) | 197
Homo sapiens | 12

Page 70

Plus at most 0.1% for RNA-specifying genes (non-coding RNA), plus at most 0.1% for DNA switches.

Densely populated with dead transposable elements

45-67%

Page 71

Densely populated with dead transposable elements

Page 72

0.15% nonsynonymous differences 1.22% synonymous differences

Unoriginal

Cost of sequencing your human genome = ~$25,000. Percent genome recovery = 90%. Error rate = 1-3%. I will provide you with your genome sequence with less error for half the price (and you can haggle).

Data: http://www.plosone.org/article/info:doi/10.1371/journal.pone.0030087

Page 73

Comparing the human genome to other genomes has given rise to three complexity paradoxes.*

Genomic paradox = a lack of correspondence between a measure of genome size and the presumed amount of genetic information “needed” by the organism (its complexity).

*The paradoxes only exist under the assumption that humans are the most complex organisms and the pinnacle of creation.

Page 74


Defining complexity is difficult

The complexity of a system may be defined by the minimum number of independent characters required to describe it, where independence is defined as the ability of the character to assume any possible character state independently of any other character in the system.

Page 75


Defining complexity is difficult

Thus the wall on the right, which has a crack, is more complex than the wall on the left.

Page 76

Without doubt, [image] is more complex than [image].

However, even if we cannot quantify organismal complexity very well, in many cases, it is possible to state unequivocally that A is more complex than B.

Page 77

Without doubt, [image] is more complex than [image].

However, even if we cannot quantify organismal complexity very well, in many cases, it is possible to state unequivocally that A is more complex than B.

Without doubt, [image] is more complex than [image].

Page 78

K-value paradox: Complexity does not correlate with chromosome number.

Organism | Chromosome number
Ophioglossum reticulatum (fern) | ~1,260
Homo sapiens | 46
Lysandra atlantica (butterfly) | 250

Page 79

C-value paradox: Complexity does not correlate with genome size.

Organism | Genome size
Homo sapiens | 3.5 × 10⁹ bp
Amoeba dubia | 6.7 × 10¹¹ bp
Allium cepa | 1.5 × 10¹⁰ bp

Page 80

G-value paradox: Complexity does not correlate with protein-coding gene number.

[Figure: four organisms with ~21,000, ~21,000, ~57,000, and >94,000 protein-coding genes]

Page 81

Total Number of Protein-Coding Genes

Organism | Protein-coding genes
Drosophila melanogaster (fruitfly) | 13,917
Pan troglodytes (chimpanzee) | 18,746
Canis familiaris (dog) | 19,856
Bos taurus (cow) | 19,994
Caenorhabditis elegans (nematode) | 20,517
Homo sapiens (human) | 20,774
Arabidopsis thaliana (mustard weed) | 27,416
Physcomitrella patens (moss) | 35,938
Oryza sativa (rice) | 40,577
Populus trichocarpa (poplar) | 41,377
Manihot esculenta (cassava) | 47,164
Malus domestica (apple) | 57,386
Triticum aestivum (bread wheat) | >94,000

Page 82

Mommy, mommy, a fern has 27 times as many chromosomes as I do; an amoeba has 200 times more DNA than I do; and wheat has 5 times more genes than me.

Page 83

[Slide image: a long stretch of raw DNA sequence, beginning “agaaacggggagaggtatcaaataaacataattgacacacccgg…”]

Conclusion: The human genome is mostly “junk.”

Ohno S. 1972. So much ‘junk’ DNA in our genome. Brookhaven Symp. Biol. 23:366-370.

?

Page 84

What is and what isn’t “junk DNA”

“There are known knowns; there are things we know that we know. There are known unknowns; that is to say, there are things that we now know we don’t know. But there are also unknown unknowns: there are things we do not know we don’t know.”

Donald Rumsfeld February 12, 2002

Page 85

“Junk DNA” misrepresented as a “known unknown”

Page 86

What is and what isn’t “junk DNA”

Junk DNA is a known known: we know what it does (it takes up space). Junk DNA is any piece of DNA that has no function and does not affect fitness. NOT everything that is not translated or not transcribed is junk DNA. Junk DNA is NOT a known unknown. Dark DNA is a known unknown.

Dan Graur June 22, 2013

Page 87

In organisms with LARGE effective population sizes, the strength of natural selection is relatively strong.

In organisms with SMALL effective population sizes, the strength of natural selection is relatively weak.

Junk DNA is a consequence of population genetics considerations!

Page 88

In organisms with LARGE effective population sizes, the strength of natural selection is relatively strong.

In organisms with SMALL effective population sizes, the strength of natural selection is relatively weak.

The majority of new mutations are mildly deleterious. In humans and elephants, selection is not sufficiently strong to eliminate many such deleterious mutations.

Junk DNA is a consequence of population genetics considerations!

Page 89

In organisms with LARGE effective population sizes, the strength of natural selection is relatively strong.

In organisms with SMALL effective population sizes, the strength of natural selection is relatively weak.

Humans and elephants are expected to accumulate numerous deleterious mutations in their genome.

Junk DNA is a consequence of population genetics considerations!

Page 90

Junk DNA is a consequence of population genetics considerations!
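The link between effective population size (Ne) and the efficacy of selection can be made concrete with Kimura's diffusion approximation for the probability that a new mutation becomes fixed, u = (1 − e^(−4·Ne·s·p)) / (1 − e^(−4·Ne·s)). The sketch below is an illustration under stated assumptions, not a calculation from the lecture: it assumes a diploid population whose census size equals its effective size (N = Ne), additive selection with coefficient s, and a single new mutant at frequency p = 1/(2N). The values Ne = 10⁴ (a commonly cited order of magnitude for humans) and Ne = 10⁶ are illustrative. A mildly deleterious mutation with s = −10⁻⁵ is fixed about 0.81 times as often as a neutral one when Ne = 10⁴, but essentially never (about 10⁻¹⁶ times as often) when Ne = 10⁶.

```python
import math

def _log_abs_expm1(x):
    """log|e**x - 1|, computed without overflow for large positive x."""
    if x > 50.0:
        # e**x - 1 = e**x * (1 - e**-x), so take the log of each factor.
        return x + math.log1p(-math.exp(-x))
    return math.log(abs(math.expm1(x)))

def fixation_probability(ne, s, n=None):
    """Kimura's approximation u = (1 - exp(-4*Ne*s*p)) / (1 - exp(-4*Ne*s))
    for a new mutant at frequency p = 1/(2N); N defaults to Ne (an assumption)."""
    n = ne if n is None else n
    p = 1.0 / (2 * n)
    if s == 0:
        return p  # a neutral mutation is fixed with probability equal to its frequency
    c = 4 * ne * s
    # Numerator and denominator always share a sign, so u equals the ratio of
    # their absolute values; working in log space avoids overflow when s < 0.
    return math.exp(_log_abs_expm1(-c * p) - _log_abs_expm1(-c))

for ne in (1e4, 1e6):
    neutral = 1.0 / (2 * ne)
    ratio = fixation_probability(ne, -1e-5) / neutral
    print(f"Ne = {ne:.0e}: fixation relative to a neutral mutation = {ratio:.3g}")
```

Roughly, selection is ineffective once |4·Ne·s| is well below 1, which is why the same mutation behaves as nearly neutral in a small population and is efficiently purged in a large one.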

Page 91

“What would we expect for the number of functional elements (as ENCODE defines them) in genomes much larger than our own? If the number [of functional elements] were to stay more-or-less constant, it would seem sensible to consider the rest of the DNA of larger genomes to be junk. If on the other hand the number of functional elements were to rise significantly with genome size, then organisms with genomes larger than ours should be more complex phenotypically than we are.”

Human exceptionalism? Genomic anthropocentrism?

W. Ford Doolittle (2013)

Page 92

•  A peculiar definition of function.
•  A peculiar definition of junk.
•  A lack of evolutionary perspective.

Page 93

In biology, there are two main concepts of function:
•  A historical concept of function, also referred to as the “selected effect function” or “proper function.”
•  A non-historical concept of function, also referred to as the “causal function.”

Page 94

What is the function of the heart?

The proper function is to pump blood.

Page 95

What is the function of the heart?

The proper function is to pump blood.

The causal functions of the heart are to add 300 grams to the body weight, to produce sounds, to be encased in the pericardium, to partially fill the mediastinum, to provide an inaccurate logo for Valentine’s Day cards, etc.

Page 96

•  Evolutionary biologists use the proper or selected effect function.

•  ENCODE used the causal function.

“Operationally, we define a functional element as a discrete genome segment that encodes a defined product (for example, protein or non-coding RNA) or displays a reproducible biochemical signature (for example, protein binding, or a specific chromatin structure).”

Page 97

•  An example of a function that fits the ENCODE definition: shoes binding chewing gum.

“Operationally, we define a functional element as an entity that displays a reproducible signature (for example, chewing gum binding).”

Page 98

“By the logic employed by ENCODE, following a collision between a car and a pedestrian, a car’s bonnet would be ascribed the 'function' of projecting a pedestrian many meters and the pedestrian would have the 'function' of deforming the car’s bonnet.” Laurence Hurst 2013. BMC Biol. 11:58

Page 99

ENCODE uses a known logical fallacy called affirming the consequent.

If a functional sequence is transcribed,

then, all transcribed sequences are functional.

Moreover, ENCODE uses the logical fallacy inconsistently.
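Why the inference fails can be checked mechanically with a truth table. In this sketch, P stands for “the sequence is functional” and Q for “the sequence is transcribed” (labels chosen for illustration), and the hypothetical helper `counterexamples` searches for assignments in which all premises hold but the conclusion fails. Affirming the consequent has such a counterexample (a transcribed, nonfunctional sequence); valid modus ponens has none.

```python
from itertools import product

def implies(a, b):
    """Material implication: a -> b is false only when a is true and b is false."""
    return (not a) or b

def counterexamples(premises, conclusion):
    """All (P, Q) truth assignments where every premise holds but the conclusion fails."""
    return [(p, q) for p, q in product([False, True], repeat=2)
            if all(f(p, q) for f in premises) and not conclusion(p, q)]

# P = "sequence is functional", Q = "sequence is transcribed".
# Affirming the consequent: from (P -> Q) and Q, conclude P.
print(counterexamples([lambda p, q: implies(p, q), lambda p, q: q],
                      lambda p, q: p))   # [(False, True)]

# Modus ponens: from (P -> Q) and P, conclude Q.
print(counterexamples([lambda p, q: implies(p, q), lambda p, q: p],
                      lambda p, q: q))   # []
```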

Page 100

The ENCODE Project: 74.7% of the genome is transcribed, 56.1% is associated with modified histones, 15.2% is found in open-chromatin areas, 8.5% binds transcription factors, 4.6% consists of methylated CpGs. The fraction of the genome that is functional (the Boolean union) is 80.4%.

Page 101

Our additions to ENCODE: 74.7% of the genome is transcribed, 56.1% is associated with modified histones, 15.2% is found in open-chromatin areas, 8.5% binds transcription factors, 4.6% consists of methylated CpGs, 84.8% binds histones, and 100% of the genome is replicated. The fraction of the genome that is functional is 100%.
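The ENCODE fractions add up to well over 100% (74.7 + 56.1 + 15.2 + 8.5 + 4.6 = 159.1%) because the categories overlap; a Boolean union counts each base once. A minimal sketch on a hypothetical 1,000-bp toy genome (track names and coordinates are made up, and `union_fraction` is an illustrative helper) shows both points: two overlapping tracks covering 70% and 60% give a union of 90%, and adding any track that spans the whole genome, such as “replicated,” forces the union to 100% no matter what else was measured.

```python
def union_fraction(genome_length, tracks):
    """Fraction of a genome covered by at least one interval in any track.
    tracks maps a track name to a list of half-open (start, end) intervals."""
    covered = bytearray(genome_length)  # one byte per base: 0 = no signal, 1 = signal
    for intervals in tracks.values():
        for start, end in intervals:
            # Clip to the genome so slice assignment never resizes the array.
            start, end = max(0, start), min(genome_length, end)
            if end > start:
                covered[start:end] = b"\x01" * (end - start)
    return sum(covered) / genome_length


# Hypothetical toy genome of 1,000 bp with two overlapping tracks.
tracks = {"transcribed": [(0, 700)], "histone_mark": [(300, 900)]}
print(union_fraction(1_000, tracks))   # 0.9, not 0.7 + 0.6 = 1.3

# Adding a track that covers every base makes the union 100%.
tracks["replicated"] = [(0, 1_000)]
print(union_fraction(1_000, tracks))   # 1.0
```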

Interesting Question: Why do people have problems with DNA that has no function?

“… nothing is so alien to the human mind as the idea of randomness.” John Cohen. 1960. Chance, Skill, and Luck: The Psychology of Guessing and Gambling. Baltimore, MD: Penguin Books.

Inability to deal with randomness

Apophenia /æpɵˈfiːniə/: The experience of seeing meaningful patterns or connections in random or meaningless data. A type of mild or incipient schizophrenia. In statistics, apophenia is known as Type I error (false positives). Klaus Conrad. 1958. Die beginnende Schizophrenie. Versuch einer Gestaltanalyse des Wahns [Incipient Schizophrenia: An Attempt to Analyze Delusion]. Stuttgart: Georg Thieme Verlag.
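The statistical face of apophenia is easy to demonstrate: test enough datasets of pure noise and about α of them will look "significant." A minimal Python sketch, assuming normally distributed noise with known variance and a two-sided z-test at α = 0.05; the sample size, number of tests, and seed are arbitrary illustrative choices.

```python
import math
import random

def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean == 0, with known standard deviation sigma."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(2013)
ALPHA = 0.05
N_TESTS = 10_000

# Every dataset is pure noise, so the null hypothesis is true in all of them.
false_positives = sum(
    z_test_p_value([rng.gauss(0.0, 1.0) for _ in range(20)]) < ALPHA
    for _ in range(N_TESTS)
)
false_positive_rate = false_positives / N_TESTS
print(false_positives, false_positive_rate)
```

Roughly 5% of the noise-only datasets (on the order of 500 of 10,000) come out "significant"; without a negative control, each of them would look like a discovery.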

People like mysteries, such as hidden messages in the Bible.

If you search long enough and hard enough for patterns in random texts, you will find patterns, especially if you do not employ negative controls. This pattern, for instance, predicts on the vertical from the bottom up (in Hebrew) that MITROMNI(TAURA)NSIA, where NSIA is “president.” The 5 letters in between MITROMNI and NSIA are random. It also helps that Hebrew is written without vowels.
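"Bible code" patterns are equidistant letter sequences (ELS): letters read at a fixed skip. A minimal Python sketch of an ELS search run on random letters as a negative control; the search word, text length, skip range, and seed are arbitrary choices for illustration.

```python
import random
import string

def find_els(text, word, max_skip):
    """Find `word` as an equidistant letter sequence (ELS) in `text`.

    Returns (start, skip) pairs such that text[start + i * skip] == word[i]
    for every i. Negative skips read the text backward.
    """
    hits = []
    n, k = len(text), len(word)
    for skip in range(-max_skip, max_skip + 1):
        if skip == 0:
            continue
        for start in range(n):
            last = start + skip * (k - 1)
            if not 0 <= last < n or text[start] != word[0]:
                continue
            if all(text[start + i * skip] == word[i] for i in range(k)):
                hits.append((start, skip))
    return hits

# Negative control: 10,000 uniformly random capital letters (no hidden message by construction)
rng = random.Random(1)
random_text = "".join(rng.choice(string.ascii_uppercase) for _ in range(10_000))
hits = find_els(random_text, "DNA", max_skip=50)
print(len(hits), "hits, e.g.", hits[:3])
```

With about a million (start, skip) combinations and a 1/26³ chance per combination, a three-letter word is expected to turn up around 57 times in pure noise; longer words simply need longer texts or larger skips, which is the search freedom such claims exploit.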

The Bible Code employs no negative controls. Someone else did, and found similar “prophecies” in Moby Dick by Herman Melville.

ENCODE has no negative controls. Mike White provided them and showed, in a paper published in PNAS, that random DNA sequences cause reproducible regulatory effects on a reporter gene. Random genetic sequences have as much or as little function as the human genome sequences analyzed by ENCODE.

What is “junk”?

“Some years ago I noticed that there are two kinds of rubbish in the world and that most languages have different words to distinguish them. There is the rubbish we keep, which is junk, and the rubbish we throw away, which is garbage. The excess DNA in our genomes is junk, and it is there because it is harmless, as well as being useless, and because the molecular processes generating extra DNA outpace those getting rid of it.”

Sydney Brenner. 1998. Refuge of spandrels. Current Biology 8:R669.

“Were the extra DNA to become disadvantageous, it would become subject to selection, just as junk that takes up too much space, or is beginning to smell, is instantly converted to garbage by one’s wife, that excellent Darwinian instrument.”

Sydney Brenner. 1998. Refuge of spandrels. Current Biology 8:R669.

Graur’s garage: functional but full of junk

A garage in which junk became garbage

A garage according to ENCODE

Junk can Sometimes be Repurposed

Junk DNA can Sometimes be Repurposed

Norihiro Okada & Jürgen Brosius, specialists in the repurposing of junk DNA.

•  Functional DNA ✔
•  Junk DNA ✔
•  Garbage DNA ✔
•  Lazarus DNA ✔
•  Indifferent DNA
•  Dark DNA

Sequence-indifferent DNA, or indifferent DNA, refers to DNA sites that are functional but show no evidence of selection against point mutations. Deletion of these sites, however, is deleterious and is subject to purifying selection.

Examples of indifferent DNA are spacers and flanking elements whose presence is required but whose sequence is not important. One such case is the third position of four-fold redundant codons, which needs to be present to avoid a downstream frameshift.
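The codon example can be made concrete with a short Python sketch. The 15-nt "gene" is hypothetical, and the standard genetic code is encoded in the usual compact TCAG order. Every point substitution at the third position of the GCT (alanine) codon leaves the protein unchanged, whereas deleting that same position shifts the reading frame for everything downstream.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order for positions 1, 2, 3
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

def translate(seq):
    """Translate complete codons in frame 0; a trailing partial codon is ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

def substitute(seq, pos, base):
    """Point mutation: replace the base at `pos`."""
    return seq[:pos] + base + seq[pos + 1:]

def delete(seq, pos):
    """Single-base deletion at `pos`."""
    return seq[:pos] + seq[pos + 1:]

gene = "ATGGCTAAAGGTTGG"  # hypothetical: ATG GCT AAA GGT TGG -> M A K G W
site = 5                  # third position of GCT (alanine), a four-fold degenerate site

print(translate(gene))                                        # MAKGW
print([translate(substitute(gene, site, b)) for b in "ACG"])  # ['MAKGW', 'MAKGW', 'MAKGW']
print(translate(delete(gene, site)))                          # MAKV (frameshift)
```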

Dark DNA refers to the fraction of the genome for which no good evidence exists as to its evolutionary impact on fitness. Dark DNA is an unknown unknown. The term “dark” is borrowed from the field of astrophysics.

An astrophysicist (Dr. Or Graur) whose research deals with dark energy. Unfortunately, he has no interest in dark DNA.

Interesting Question: How can one tell if a certain genomic sequence is functional or not?

Can we make the car on the left less fit for driving? Can we make the car on the right less fit for driving?

[Figure: mutation and evolutionary change in functional DNA (almost all mutations are deleterious)]

[Figure: mutation and evolutionary change in nonfunctional DNA (all mutations are neutral)]

Since most mutations in functional regions are deleterious and likely to impair the function, these mutations will tend to be eliminated by natural selection. Thus, functional regions of the genome should evolve more slowly, and therefore be more conserved among species, than nonfunctional regions.

How do we know if a particular genomic sequence is functional?
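A toy simulation illustrates why purifying selection slows divergence. This Python sketch assumes, purely for illustration, that 90% of mutations in a functional region are deleterious and removed by selection, while every mutation in a nonfunctional region is neutral; sequence length, mutation counts, and seed are arbitrary.

```python
import random

BASES = "ACGT"

def evolve(ancestor, n_mutations, neutral_fraction, rng):
    """Apply random point mutations; each one survives selection with probability neutral_fraction."""
    seq = list(ancestor)
    for _ in range(n_mutations):
        pos = rng.randrange(len(seq))
        new_base = rng.choice([b for b in BASES if b != seq[pos]])
        if rng.random() < neutral_fraction:  # otherwise purged by purifying selection
            seq[pos] = new_base
    return "".join(seq)

def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

rng = random.Random(7)
ancestor = "".join(rng.choice(BASES) for _ in range(2_000))

divergence = {}
for region, neutral_fraction in [("functional", 0.1), ("nonfunctional", 1.0)]:
    # Two descendant lineages evolve independently from the same ancestor
    lineage_a = evolve(ancestor, 300, neutral_fraction, rng)
    lineage_b = evolve(ancestor, 300, neutral_fraction, rng)
    divergence[region] = p_distance(lineage_a, lineage_b)

print(divergence)
```

With these settings the functional region ends up with roughly a tenth of the divergence of the nonfunctional one, i.e., it is far more conserved between the two lineages.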

Another indicator of a genomic function is that losing it has some consequence for the organism. Evolution has tested the functionality of every region of the human genome through mutation over millions of years.

Is it even possible that ENCODE is right? No! The main reason is that in humans there is a huge difference between population size and effective population size.

Long-term Ne = 10,000

Is it even possible that ENCODE is right? Under such conditions selection is inefficient and most genetic variation is deleterious. Genomic “perfection” is unachievable.

Long-term Ne = 10,000
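Why a small Ne makes selection inefficient can be seen from Kimura's diffusion approximation for the fixation probability of a new mutation, u = (1 − e^(−4 Ne s p)) / (1 − e^(−4 Ne s)), with initial frequency p = 1/(2N). A minimal Python sketch; Ne = 10,000 is taken from the long-term estimate above, census size N defaults to Ne, and the selection coefficients tried are arbitrary.

```python
import math

def fixation_probability(s, ne, n=None):
    """Kimura's diffusion approximation for the fixation probability of a new mutation.

    s  : selection coefficient (negative = deleterious)
    ne : effective population size
    n  : census population size (defaults to ne); the mutation starts at frequency 1/(2n)
    """
    n = ne if n is None else n
    p = 1.0 / (2 * n)
    if s == 0:
        return p  # neutral: fixation probability equals the initial frequency
    exponent = -4 * ne * s
    if exponent > 700:  # avoid overflow in expm1; the probability is effectively zero
        return 0.0
    return math.expm1(exponent * p) / math.expm1(exponent)

NE = 10_000
neutral = fixation_probability(0.0, NE)
for s in (-1e-5, -1e-4, -1e-3):
    ratio = fixation_probability(s, NE) / neutral
    print(f"s = {s:g}: fixation probability relative to a neutral mutation = {ratio:.3g}")
```

With Ne = 10,000, a mutation with s = -10^-5 still fixes at roughly 80% of the neutral rate, whereas one with s = -10^-3 is effectively never fixed: only fairly strongly deleterious mutations are reliably purged.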

Fact 1: It has been known for more than a century that the vast majority of non-neutral mutations are deleterious (Thomas Morgan 1903). Fact 2: The mutation rate is evolvable. These facts led Alfred Sturtevant to raise the question “Why does the mutation rate not become reduced to zero?” (Sturtevant 1937).

Motoo Kimura: The mutation rate cannot reach zero because of the COST OF FIDELITY. In other words, the mutation rate in a lineage is a compromise between the benefits of complete fidelity in the replication of the genetic material and the cost of achieving complete fidelity.

The mutation rate modulation hypothesis

How did ENCODE reach such ridiculous numbers?

1.  It used methodologies encouraging biased errors in favor of inflating estimates of functionality.

2.  It consistently and excessively favored sensitivity over specificity.

3.  It paid attention to statistical significance, rather than magnitude of the effect.

1.  It used methodologies encouraging biased errors in favor of inflating estimates of functionality.

Example: Transcription factor binding sites (TFBSs): So far, almost all known TFBSs range in length from 6 to 14 nucleotides. The TFBS entries in ENCODE range in size from 457 to 824 nucleotides. Thus, the estimates of the fraction of the human genome devoted to transcription factor binding are extraordinarily inflated (sometimes by about two orders of magnitude).
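The size of the inflation follows directly from the two length ranges. A back-of-the-envelope Python sketch that assumes, for simplicity, that each ENCODE entry contains a single true binding site:

```python
# Length ranges quoted above (nucleotides)
KNOWN_TFBS = (6, 14)      # typical transcription factor binding sites
ENCODE_TFBS = (457, 824)  # ENCODE TFBS entries

# Smallest inflation: shortest ENCODE entry vs. longest known site;
# largest inflation: longest ENCODE entry vs. shortest known site.
min_inflation = ENCODE_TFBS[0] / KNOWN_TFBS[1]
max_inflation = ENCODE_TFBS[1] / KNOWN_TFBS[0]
print(f"{min_inflation:.0f}x to {max_inflation:.0f}x")  # 33x to 137x
```

An overestimate of roughly 33- to 137-fold is consistent with "sometimes by about two orders of magnitude."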

2.  It consistently and excessively favored sensitivity over specificity.

ENCODE prefers false positives over false negatives, thus inflating the proportion of positives.

Example: ENCODE used a probability-based alignment tool and mapped RNA transcripts onto DNA when the statistical confidence exceeded 90%. This means that as many as 10% of the correspondences between RNA and genome may be erroneous. The total number of RNA transcripts in ENCODE is approximately 109 million, and the mean transcript length is 564 nucleotides. Thus, a total of about 6 billion nucleotides, or two times the human genome size, are potentially misplaced (false positives).
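The arithmetic behind the 6-billion figure can be reproduced directly. A minimal Python sketch; the haploid genome size of about 3.1 × 10^9 bp is an assumption added here, and the 10% error rate is the upper bound implied by the 90% confidence threshold.

```python
TRANSCRIPTS = 109_000_000    # approximate number of RNA transcripts in ENCODE
MEAN_LENGTH = 564            # mean transcript length (nt)
ERROR_RATE_PERCENT = 10      # 90% mapping confidence -> up to 10% misplaced
GENOME_SIZE = 3_100_000_000  # assumed haploid human genome size (bp)

# Integer arithmetic keeps the nucleotide count exact
misplaced_nt = TRANSCRIPTS * MEAN_LENGTH * ERROR_RATE_PERCENT // 100
print(f"{misplaced_nt:,} nt, {misplaced_nt / GENOME_SIZE:.1f} genome equivalents")
# 6,147,600,000 nt, 2.0 genome equivalents
```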

3.  It paid attention to statistical significance, rather than magnitude of the effect.

“Derived allele frequency spectrum for primate-specific elements, with variations outside ENCODE elements in black and variations covered by ENCODE elements in red. The increase in low-frequency alleles compared to background is indicative of negative selection occurring in the set of variants annotated by the ENCODE data.”

p = 10^-37

Magnitude of effect = 0.042%
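How a negligible effect can produce an extreme p-value is easy to show: hold the effect fixed and let the sample grow. A minimal Python sketch using a two-sided, two-proportion z-test; the proportions are hypothetical, chosen only so that they differ by 0.042 percentage points to echo the figure above, and the sample sizes are arbitrary.

```python
import math

def two_proportion_p_value(p1, p2, n):
    """Two-sided z-test p-value for a difference between two proportions, each estimated from n observations."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = (p2 - p1) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical proportions differing by 0.042 percentage points
BASELINE, TREATED = 0.10, 0.10042

for n in (10**4, 10**6, 10**8, 10**9):
    print(f"n = {n:>13,}: p = {two_proportion_p_value(BASELINE, TREATED, n):.3g}")
```

The difference between the two proportions is identical in every row; only the sample size changes, and with it the p-value, which goes from unremarkable at n = 10^4 to far below 10^-37 at n = 10^9. A tiny p-value says nothing about whether an effect is large enough to matter.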

Let’s examine the rationale and the methodology for dealing with the derived allele frequency spectrum in primate-specific elements

•  If all alleles are neutral, a certain frequency distribution is expected.

•  If some alleles are under negative selection, an excess of rare derived alleles is expected.

•  This excess is expected to be detectable for only very short periods of evolutionary time.

The Why
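The neutral expectation in the first bullet has a simple closed form under the standard neutral model (constant population size, random mating, infinitely many sites, correctly polarized alleles): the expected number of sites whose derived allele appears in i of n sampled chromosomes is proportional to 1/i. A minimal Python sketch of that expectation; the sample size of 10 is arbitrary.

```python
def neutral_sfs(n):
    """Expected unfolded site frequency spectrum under the standard neutral model.

    Returns the expected proportion of segregating sites at which the derived
    allele is carried by i of n sampled chromosomes (i = 1, ..., n - 1);
    the expectation is proportional to 1/i.
    """
    weights = [1 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]

spectrum = neutral_sfs(10)
print([round(x, 3) for x in spectrum])
```

For n = 10 about 35% of segregating sites are expected to be singletons; negative selection would show up as an observed rare-allele class noticeably larger than this neutral baseline.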

•  To deal with very short periods of evolutionary time, ENCODE decided to use primate-specific sequences.

[Figure: phylogeny of three human sequences with chimpanzee, gorilla, macaque, rat, and mouse]

[Figure: phylogeny of human, chimpanzee, gorilla, macaque, rat, and mouse, with the primate-specific sequences indicated]

What is missing from the derived allele frequency spectrum of primate-specific elements in ENCODE? Genes! 3,296,458 SNPs that are in annotated coding regions are not found in the ENCODE sample.

Missing populations and their effect on estimates of derived alleles and ancestral alleles. Three human populations were available at the time ENCODE was submitted; ENCODE used only one.

[Figure: primate-specific sequences in Asian, Caucasian, and Yoruba samples and an outgroup (OUT), with derived and ancestral alleles marked; axis: derived allele frequency (%)]

[Figure: Yoruba sample and outgroup (OUT); axis: derived allele frequency (%)]

The ENCODE data includes 2,136 alleles with frequencies of exactly 0. In a miraculous feat of science, ENCODE was able to determine the frequencies of nonexistent alleles.
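How a derived allele frequency is computed makes the problem plain. In a minimal Python sketch that polarizes alleles with a single outgroup (a simplification; the site data are hypothetical), a frequency of exactly 0 arises only when every sampled chromosome carries the ancestral base, that is, when no derived allele exists in the sample at all.

```python
def derived_allele_frequency(sample, outgroup_base):
    """Frequency of the derived allele at one site.

    The outgroup base is taken as the ancestral state; any other base in the
    sample is counted as derived.
    """
    derived = sum(1 for base in sample if base != outgroup_base)
    return derived / len(sample)

# Hypothetical site: 5 sampled chromosomes, outgroup carries "A"
print(derived_allele_frequency(["A", "A", "G", "A", "G"], "A"))  # 0.4
# Hypothetical monomorphic site: every chromosome matches the outgroup
print(derived_allele_frequency(["A"] * 5, "A"))  # 0.0
```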

Frequency of derived allele = 40%

ENCODE uses multifurcated trees

Frequency of derived allele < 40%

ENCODE uses multifurcated trees

There are no derived alleles

ENCODE uses only a single primate species.

Unwarranted extrapolations: Badly trained technicians tend to “kill” junk DNA whenever they find a new function in non-coding DNA.

Even supposing that all 55,000 putative lincRNAs in this paper are functional and important, they would occupy only

55,000 × 2,000 bp = 110 Mb (less than 4% of the human genome). Showing that 4% of the genome is functional is “cool,” but it doesn’t bear on the question of “junk DNA,” which concerns the majority of the genome.
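The upper bound is simple arithmetic. A minimal Python sketch; the haploid genome size of about 3.1 × 10^9 bp is an assumption added here, and 2,000 bp per lincRNA is the generous length used above.

```python
LINCRNAS = 55_000
LENGTH_BP = 2_000            # length per lincRNA, as assumed above
GENOME_SIZE = 3_100_000_000  # assumed haploid human genome size (bp)

total_bp = LINCRNAS * LENGTH_BP
print(f"{total_bp:,} bp = {100 * total_bp / GENOME_SIZE:.1f}% of the genome")
# 110,000,000 bp = 3.5% of the genome
```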

Conclusion: Badly trained technicians who do not understand (1) population genetics, (2) the concept of effective population size, (3) random genetic drift, and (4) the limitations of selection should be forbidden to even mention “junk DNA” let alone write papers on the subject.

442 researchers + 288 million dollars. What have we learned from ENCODE?

6 September 2012

“Data is not information, information is not knowledge, knowledge is not wisdom, wisdom is not truth,”

—Robert Royar (1994) paraphrasing Frank Zappa’s (1979) anadiplosis

“The onion test is a simple reality check for anyone who thinks they have come up with a universal function for 80% of the genome, or 100% of the genome. Whatever the proposed function, ask yourself this question: Can you explain why onions need about five times more DNA than humans?”

T. Ryan Gregory

3.5 × 10^9 bp

Homo sapiens

1.5 × 10^10 bp

Allium cepa

“All science is either physics or stamp collecting.”

Ernest Rutherford

“ENCODE is stamp collecting.”

Roderic Guigó

“I can think of better uses for 288 million dollars.”

Dan Graur

Acknowledgments: The Good Guys

Coauthors: Ricardo Azevedo, Becky Zufall, Nicholas Price, and Yichen Zheng (UH), and Eran Elhaik (Johns Hopkins).

Reviewers: Giddy Landan (Heinrich Heine Universität, Germany), Michael Lynch (Indiana University, USA), Naruya Saitou (National Institute of Genetics, Japan), David Penny (Massey University, New Zealand), W. Ford Doolittle (Dalhousie University, Canada), plus 2 reviewers who think I don’t know who they are.

Editor: Bill Martin (Genome Biology and Evolution)
