Explaining the assembly model Valerie Schneider NCBI 21 September 2014

Nov 27, 2014


GRC Workshop at Churchill College on Sep. 21, 2014. This is Valerie Schneider's talk describing the assembly model.
Transcript
Page 1: Explaining the assembly model

Explaining the assembly model

Valerie Schneider, NCBI

21 September 2014

Page 2: Explaining the assembly model

Dilthey et al.; Paten et al.

Scientific Models

Page 3: Explaining the assembly model

Outline

• Differences between the reference genome assembly and other assemblies
• Features of the current reference assembly model and their relationship to genomic analyses and tools
• The changing reference genome assembly

Page 4: Explaining the assembly model
Page 5: Explaining the assembly model

Sequences from haplotype 1

Sequences from haplotype 2

Old Assembly model: compress into a consensus

New Assembly model: represent both haplotypes

GRC Assembly Model

Page 6: Explaining the assembly model

Assembly (e.g. GRCh38)

Primary Assembly Unit

Non-nuclear assembly unit (e.g. MT)

PAR

Genomic Region (MHC)

Genomic Region (UGT2B17)

Genomic Region (MAPT)

Church et al., PLoS Biol. 2011 Jul;9(7):e1001091

GRC Assembly Model

The human reference genome assembly is not a haploid model

ALT 1, ALT 2, ALT 3, ALT 4, ALT 5, ALT 6, ALT 7

Alternate loci are not synonymous with haplotypes
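One way to read the diagram on this page is as a small containment hierarchy: an assembly holds assembly units, and each alternate-loci unit holds scaffolds placed in named genomic regions. The sketch below is a toy illustration only. The class names, field names, and scaffold labels are hypothetical, and the idea that a single ALT unit groups scaffolds from unrelated regions is an assumption drawn from the diagram, not something the slide states in words.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Scaffold:
    name: str    # hypothetical scaffold label, not a real accession
    region: str  # genomic region the scaffold is placed in, e.g. "MHC"


@dataclass
class AssemblyUnit:
    name: str
    scaffolds: List[Scaffold] = field(default_factory=list)


@dataclass
class Assembly:
    name: str
    alt_units: List[AssemblyUnit] = field(default_factory=list)

    def alts_for_region(self, region: str) -> List[Tuple[str, str]]:
        """List (unit name, scaffold name) pairs for one genomic region."""
        return [
            (unit.name, scaffold.name)
            for unit in self.alt_units
            for scaffold in unit.scaffolds
            if scaffold.region == region
        ]


# Toy data (assumption): ALT 1 mixes scaffolds from unrelated regions,
# so the unit acts as a container of alternates rather than as one
# individual's haplotype.
toy = Assembly(
    name="toy assembly",
    alt_units=[
        AssemblyUnit("ALT 1", [Scaffold("mhc_alt_a", "MHC"),
                               Scaffold("mapt_alt_a", "MAPT")]),
        AssemblyUnit("ALT 2", [Scaffold("mhc_alt_b", "MHC")]),
    ],
)
mhc_alts = toy.alts_for_region("MHC")
print(mhc_alts)
```

Under this reading, asking for the alternates of one region returns scaffolds spread across several ALT units, which is one reason an ALT unit should not be treated as a haplotype.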

Page 7: Explaining the assembly model

Assembly (e.g. GRCh38.p1)

Primary Assembly Unit

Non-nuclear assembly unit (e.g. MT)

ALT 1, ALT 2, ALT 3, ALT 4, ALT 5, ALT 6, ALT 7

PAR

Genomic Region (MHC)

Genomic Region (UGT2B17)

Genomic Region (MAPT)

Church et al., PLoS Biol. 2011 Jul;9(7):e1001091

Patches

Genomic Region (ABO)

Genomic Region (FOXO6)

Genomic Region (FCGBP)

GRC Assembly Model

Patches

SCAFFOLD STATUS AT NEXT MAJOR ASSEMBLY RELEASE

FIX: -- (integrated)

NOVEL: ALT LOCI

Page 8: Explaining the assembly model

1q32 1q21 1p21

Dennis et al., 2012

GRC Assembly Model

Fix patches are different from novel patches
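The patch table on the previous page gives each patch type a different fate at the next major assembly release: a fix patch is integrated, while a novel patch becomes an alternate locus. The sketch below encodes that distinction. `PatchType` and `status_at_next_major_release` are hypothetical names chosen for illustration, and the returned strings are paraphrases, not official GRC terminology.

```python
from enum import Enum


class PatchType(Enum):
    FIX = "fix"      # corrects the existing chromosome sequence
    NOVEL = "novel"  # adds an alternate representation of a region


def status_at_next_major_release(patch_type: PatchType) -> str:
    """Return what happens to a patch scaffold at the next major release."""
    if patch_type is PatchType.FIX:
        # The scaffold disappears because its sequence is folded
        # into the chromosome.
        return "integrated into the chromosome"
    if patch_type is PatchType.NOVEL:
        # The scaffold persists, now as an alternate locus.
        return "alternate locus"
    raise ValueError(f"unknown patch type: {patch_type!r}")


for kind in PatchType:
    print(kind.name, "->", status_at_next_major_release(kind))
```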

Page 9: Explaining the assembly model

The alignments of the alternate loci scaffolds to the chromosomes are part of the assembly

Page 10: Explaining the assembly model

Anatomy of an alt

Alignment Legend

no alignment, mismatch, deletion

Page 11: Explaining the assembly model

Anatomy of an alt

AC012314.8

CU151838.1

ALT LOCI

AC012314.8

AC245052.3 CHR. 19

Alternate loci contain some sequence that is redundant to the primary assembly unit
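Because the alignment of each alt scaffold to its chromosome is part of the assembly, the split between redundant and novel sequence on a scaffold can be estimated from the alignment blocks. The sketch below is a simplification under stated assumptions: blocks are half-open `(start, end)` intervals in scaffold coordinates, any unaligned base counts as novel, and mismatches or small indels inside aligned blocks are ignored. The function name and the toy numbers are hypothetical.

```python
from typing import Iterable, Tuple


def novel_bases(scaffold_length: int,
                aligned_blocks: Iterable[Tuple[int, int]]) -> int:
    """Count scaffold bases not covered by any aligned block.

    Blocks are half-open (start, end) intervals in scaffold coordinates
    and may overlap; overlapping bases are counted only once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(aligned_blocks):
        start = max(start, last_end)  # skip bases already counted
        if end > start:
            covered += end - start
            last_end = end
    return scaffold_length - covered


# Toy scaffold of 1,000 bases whose two ends align to the chromosome
# and whose middle has no alignment (made-up numbers).
blocks = [(0, 300), (250, 400), (700, 1000)]
print(novel_bases(1000, blocks))  # bases with no alignment
```

The aligned ends are what make reads map ambiguously between the scaffold and the chromosome; the unaligned middle is the sequence that only the alt provides.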

Page 12: Explaining the assembly model

Alt Loci: Informatics Challenges

Page 13: Explaining the assembly model

Masks and alt-aware aligners reduce the incidence of ambiguous alignments observed when aligning reads to the full assembly

Mask1: mask the chromosome for fix patches, the scaffold for novel patches and alts. Mask2: mask only on scaffolds

Simulated Reads
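The two masking schemes can be sketched as a function that decides, for each scaffold placement, which copy of the redundant sequence to hide from the aligner. This is a minimal sketch, not the procedure used for the slide's experiment. It assumes that only the aligned (redundant) span is masked, which the slide does not specify, and the dictionary keys, scheme names, and sequence names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (sequence name, start, end)


def build_mask(placements: List[Dict], scheme: str) -> List[Interval]:
    """Return the intervals to mask under scheme "mask1" or "mask2".

    mask1: for fix patches mask the chromosome span, otherwise mask
           the scaffold span (novel patches and alt loci).
    mask2: always mask the scaffold span.
    """
    if scheme not in ("mask1", "mask2"):
        raise ValueError(f"unknown scheme: {scheme}")
    masked = []
    for p in placements:
        if scheme == "mask1" and p["kind"] == "fix":
            # Hide the chromosome copy so reads land on the corrected patch.
            masked.append((p["chrom"], p["chrom_start"], p["chrom_end"]))
        else:
            # Hide the redundant part of the scaffold instead.
            masked.append((p["scaffold"], p["scaf_start"], p["scaf_end"]))
    return masked


# Toy placements with made-up names and coordinates.
placements = [
    {"kind": "fix", "chrom": "chrA", "chrom_start": 100, "chrom_end": 200,
     "scaffold": "fix_patch_1", "scaf_start": 0, "scaf_end": 100},
    {"kind": "alt", "chrom": "chrB", "chrom_start": 500, "chrom_end": 900,
     "scaffold": "alt_1", "scaf_start": 0, "scaf_end": 400},
]
mask1 = build_mask(placements, "mask1")
mask2 = build_mask(placements, "mask2")
print(mask1)
print(mask2)
```

The only difference between the schemes is the treatment of fix patches: Mask1 prefers the patched sequence over the chromosome, while Mask2 leaves the chromosomes untouched.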

GRCh38: Alt Loci

Page 14: Explaining the assembly model

GRC: Assembly Model

GRCh38

• 178 regions with alt loci: 2% of chromosome sequence (61.9 Mb)

• 261 Alt Loci: 3.6 Mb novel sequence relative to chromosomes
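A quick back-of-the-envelope check puts these figures in proportion. The sketch below uses only the numbers on this slide; the derived values are approximate because the 2% figure is rounded, and the per-locus and per-region averages say nothing about the actual distribution.

```python
# Figures quoted on the slide
regions_with_alts = 178
region_span_mb = 61.9        # chromosome sequence covered by those regions
region_fraction = 0.02       # "2% of chromosome sequence" (rounded)
alt_loci = 261
novel_sequence_mb = 3.6      # novel sequence relative to chromosomes

# Chromosome total implied by "61.9 Mb is 2%"
implied_total_mb = round(region_span_mb / region_fraction)

# Average novel sequence contributed per alt locus, in kb
avg_novel_kb = round(novel_sequence_mb * 1000 / alt_loci, 1)

# Average number of alt loci per region that has any
alts_per_region = round(alt_loci / regions_with_alts, 2)

print(implied_total_mb, "Mb implied chromosome total")
print(avg_novel_kb, "kb novel sequence per alt locus on average")
print(alts_per_region, "alt loci per region on average")
```

The implied total of roughly 3.1 Gb is consistent with the size of the human genome, and the averages show that most alt sequence overlaps the chromosomes: the novel portion is small compared with the 61.9 Mb the regions span.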

Page 15: Explaining the assembly model

GRCh38: Alt Loci

Page 16: Explaining the assembly model

chromosome

alt/patch

reads

On-target alignment

Off-target alignments (n=122,922)

GRCh38: Alt Loci

Page 17: Explaining the assembly model

The Changing Reference

Page 18: Explaining the assembly model

The Changing Reference

Page 19: Explaining the assembly model

Collaborators
• NCBI RefSeq and gpipe annotation team
• Havana annotators
• Karen Miga
• David Schwartz
• Steve Goldstein
• Mario Caceres
• Giulio Genovese
• Jeff Kidd
• Peter Lansdorp
• Mark Hills
• David Page
• Jim Knight
• Stephan Schuster
• 1000 Genomes

GRC SAB
• Rick Myers
• Granger Sutton
• Evan Eichler
• Jim Kent
• Roderic Guigo
• Carol Bult
• Derek Stemple
• Matthew Hurles
• Richard Gibbs

GRC Credits

Page 20: Explaining the assembly model

Source/Recruitment of DNA Donors for Library Construction

Another implication of the fact that 99.9% of the human DNA sequence is shared by any two individuals is that the backgrounds of the individuals who donate DNA for the first human sequence will make no scientific difference in terms of the usefulness and applicability of the information that results from sequencing the human genome. At the same time, there will undoubtedly be some sensitivity about the choice of DNA sources. There are no scientific reasons why DNA donors should not be selected from diverse pools of potential donors.

http://www.genome.gov/10000921 (August 17, 1996)

Reference Composition

Page 21: Explaining the assembly model

Today’s reference assembly does not represent:
1. The most common allele
2. The longest allele
3. The ancestral allele

Page 22: Explaining the assembly model

Roles for the reference

• Getting the sequence
• Cataloging genes (and other features)
• Establishing a coordinate system
• Humans vs. other organisms