Genomics: 1
Genomics-sequencing of microbial genomes
This lecture illustrates the strategies used in microbial genome sequencing projects, compares genome content and organisation amongst microbes, and shows how to derive information on gene function across a genome.
Objectives for students:
• Expected to describe strategies involved in microbial genome sequencing and functional genomics
• Provide examples of information that can be derived from genomics
Genomics: 2
Microbial Genome Sequencing
• Genome sequencing projects
  – strategy & methods
  – annotation
• Comparative genomics
  – organisation
  – gene content
• Functional genomics
  – transcriptome
  – proteome
  – genome-wide mutation
• Concentrate on strategy & ideas
Genomics: 3
Bacterial genome projects
• Many completed:
  – Haemophilus influenzae
  – Escherichia coli
  – Bacillus subtilis
  – Mycoplasma genitalium
  – Helicobacter pylori (x2)
  – Campylobacter jejuni
  – Treponema pallidum
  – Neisseria meningitidis
  – Neisseria gonorrhoeae
  – Vibrio cholerae
  – E. coli O157
• Good links to projects:
  – http://www.tigr.org/
  – http://www.ncbi.nlm.nih.gov/
  – http://www.sanger.ac.uk/
  – http://www.genomesonline.org/
Genome sequencing progress
• Complete:
  – Archaeal: 70 (2007: 49; 2008: 55)
  – Bacterial: 945 (2007: 554; 2008: 728)
  – (Eukaryal: 121) (2007: 76; 2008: 97)
• Ongoing:
  – Prokaryotic: 3498
  – Archaeal: 111
  – (Eukaryotic: 1223)
• Metagenome projects: 200
Genomics: 4
www.genomesonline.org
Genomics: 5
Microbial eukaryote projects
• Complete
  – Yeast: Saccharomyces cerevisiae
  – Plasmodium falciparum
  – Aspergillus nidulans, A. niger, A. oryzae & A. fumigatus
  – Trypanosoma cruzi & T. brucei
  – Leishmania
  – Entamoeba histolytica
  – Giardia lamblia
  – Candida albicans & C. glabrata
  – Paramecium
• Underway
  – Pneumocystis carinii
  – Plasmodium vivax
  – some complete chromosomes finished
  – other species and isolates from completed list
Genomics: 6
Why bother? To sequence or not to sequence (considerations in the pre-genome era)
• Piecemeal collection of sequenced genes
  – slow
  – costly
  – ever complete?
• Genome project
  – rational approach
  – efficient and rapid
  – quality assurance
  – address novel questions
• Problems/issues
  – ownership
  – strain choice
  – cost
  – approach
  – data release
  – some now less relevant
• Post-genomic era
  – Comparative genomics
  – Functional genomics
Genomics: 7
Genome sequencing strategy
• Strategy choice
• Large collaborative cosmid/BAC-based projects
  – now better suited for larger genomes
  – slow
• Small-insert shotgun approach
  – centralised
  – rapid and efficient
  – choice for bacteria
• Strain choice
  – fresh isolate vs lab strain
  – clinical vs environmental
  – subsequent genetic analysis
Genomics: 8
Yeast genome sequence strategy
• Yeast chromosomes (16) individually sequenced
• Several approaches used
• Make genome library in cosmids
• Order cosmid library
  – which cosmid overlaps with which
  – link cosmid to genome map
  – produce tiled set of cosmids
  – only sequence minimum number
• Use chromosome-specific probe to identify chromosome-specific cosmids
• Sequence cosmid inserts by subcloning
• Solve problems by direct PCR sequencing, walking and other libraries (lambda)
• Telomeres
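The idea of a tiled set ("only sequence minimum number") can be sketched as an interval-covering problem. This is a minimal illustration, assuming each cosmid has already been placed on the genome map as a (left, right) interval; the clone names and coordinates below are hypothetical, and the greedy rule is one common way to pick a minimal cover, not necessarily what the yeast project used.

```python
def minimal_tiling_path(clones, start, end):
    """Pick a small set of overlapping clones that covers start..end.

    clones: dict mapping clone name -> (left, right) map coordinates.
    Greedy rule: among clones that begin at or before the covered edge,
    take the one that reaches furthest to the right.
    """
    ordered = sorted(clones.items(), key=lambda item: item[1][0])
    path = []
    covered = start
    i = 0
    while covered < end:
        best = None
        # Scan every clone whose left end lies inside the covered region.
        while i < len(ordered) and ordered[i][1][0] <= covered:
            name, (left, right) = ordered[i]
            if best is None or right > best[1]:
                best = (name, right)
            i += 1
        if best is None or best[1] <= covered:
            raise ValueError(f"no clone spans position {covered}")
        path.append(best[0])
        covered = best[1]
    return path


# Hypothetical cosmids on a 140 kb region (coordinates in kb).
clones = {
    "c1": (0, 40), "c2": (30, 75), "c3": (35, 60),
    "c4": (70, 110), "c5": (100, 140), "c6": (20, 50),
}
path = minimal_tiling_path(clones, 0, 140)
print(path)
```

Here c3 and c6 are redundant, so they would not need to be sequenced. A region with no spanning clone raises an error, which corresponds to needing another library or PCR.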
Genomics: 9
Tiled set
Genomics: 10
[Figure: Ordering clones. Cosmid clones c1 to c5 plotted against markers A to J; each clone is shown with two markers (c1: A, B; c2: C, D; c3: E, F; c4: G, H; c5: I, J).]
Genomics: 11
[Figure: clone map labelled PH011, scale 80 to 200 (units not shown), with clones 70512, 70449, 70893, 70515, 70124, 70266, 7202, 70265, 70871 and 70463.]
Genomics: 12
Whole genome/chromosome shotgun strategy (WGS)
• Rapid
• Generation of small-insert genomic library
• Library is not initially ordered
• Sequence the ends of inserts
• Depends on powerful computing to assemble sequence reads
Genomics: 13
Main steps in generating a complete genome sequence
Step                  Minimum time period (weeks)
Isolation             2
Construction          4-6
Shotgun sequencing    2-4
Finishing             12
Annotation            12
Genomics: 14
[Figure: shotgun library construction. Labels: bacterial chromosome; random shearing; size selection; plasmid vector; library of clones; individual clones; sequence end of each clone.]
Genomics: 15
Assembly
Sequencing individual clones
genome sequence with gaps
Genomics: 16
Automated sequencers: ABI 3700
• Made by Applied Biosystems
• Most widely used automated sequencers:
  – 96 capillaries
  – robot loading from 384-well plates
• Two to three hours per run
• 600–700 bases per capillary per run
[Figure labels: 96-well plate; robotic arm and syringe; 96 glass capillaries; load bar]
Genomics: 17
Automated sequencers: MegaBACE
• Made by Amersham
• 96 capillaries
• Robotic loading from 384-well plate
• Two to four hours per run
• Can read up to 800 bases
Source: GE Healthcare Life Science, Uppsala, Sweden
Genomics: 18
Automatic gel reading
• Top image: confocal detection by the MegaBACE sequencer of fluorescently labelled DNA
• Bottom image: computer image of sequence read by automated sequencer
Genomics: 19
Industrialization of sequencing
• Most genome sequencing projects divide tasks among different teams
  – Genome libraries
  – Production sequencing
  – Finishing
• Sequencing machines run 24/7
• Many tasks performed by robots
The Broad Institute of MIT and Harvard, www.genome.gov
Genomics: 20
The future is here? 454 sequencing
Reprinted by permission from Macmillan Publishers Ltd: [NATURE] (Margulies et al., 437: 376), copyright (2005)
454 sequencing: the system
Genomics: 21
Step                       Time
DNA library preparation    4.5 hours
emPCR                      8 hours
Sequencing                 7.5 hours
• Well diameter: average 44 μm
• 400,000 reads obtained in parallel
• A single clonally amplified sstDNA bead is deposited per well
• 4 bases (TACG) cycled 100 times
• Chemiluminescent signal generation
• Signal processing to determine base sequence and quality score
Source: 454 Sequencing © Roche Diagnostics
Genomics: 22
WGS: Just how much effort?
• Individual sequencing reads accumulate
  – each read about 500 bp
  – computing used to assemble reads
  – contiguous sequences called contigs
• Aim for 8–10-fold read coverage of genome for accuracy
• Example:
  – H. influenzae
    • 19,687 templates
    • 24,304 reads assembled
    • 11,631,485 bp
• 9
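The effort involved can be estimated with simple arithmetic. The sketch below uses the H. influenzae figures from this slide and the 1.830 Mbp genome size quoted later in the lecture; the function names are illustrative only.

```python
import math


def fold_coverage(total_sequenced_bp, genome_size_bp):
    """Average number of times each base of the genome was read."""
    return total_sequenced_bp / genome_size_bp


def reads_needed(genome_size_bp, read_length_bp, target_coverage):
    """Reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# H. influenzae: 11,631,485 bp of assembled reads over a 1.830 Mbp genome.
coverage = fold_coverage(11_631_485, 1_830_000)
print(f"H. influenzae coverage: {coverage:.1f}x")

# Reads of about 500 bp needed for 8x coverage of the same genome.
n_reads = reads_needed(1_830_000, 500, 8)
print(f"Reads for 8x coverage: {n_reads}")
```

Average coverage says nothing about how evenly reads are spread, which is why gaps and thin regions remain even at high coverage.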
Genomics: 23
Sequencing a genome
[Figure: shotgun assembly illustrated with text. Fragments of sequence (for example vgecisahubof, bofevaluatedgenetics, icsrelatedresourcesforteachershealth, lthprofessionalsandgeneralp, generalpublic) are aligned by their overlaps to give the contiguous sequence:]
vgecisahubofevaluatedgeneticsrelatedresourcesforteachershealthprofessionalsandgeneralpublic
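The text analogy on this slide can be run as a toy assembly. This is a minimal greedy sketch, assuming error-free fragments and a minimum overlap of three characters; real assemblers must also handle sequencing errors, repeats, both strands and reads contained inside other reads, none of which is attempted here.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to join: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs)
                   if n not in (best_i, best_j)] + [merged]


# Fragments taken from the slide.
reads = [
    "vgecisahubof",
    "bofevaluatedgenetics",
    "icsrelatedresourcesforteachershealth",
    "lthprofessionalsandgeneralp",
    "generalpublic",
]
contigs = greedy_assemble(reads)
print(contigs)
```

If one fragment were removed so that no overlap remained, the function would return two contigs separated by a gap.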
Genomics: 24
Gaps
[Figure: library clones and their sequence reads aligned against the genome and assembled into contigs; a sequence gap and a physical gap are marked.]
Genomics: 25
Bridging Gaps
• Rise in contig number as the number of reads increases
• Steady fall as accumulating sequence bridges gaps between contigs
• Levels off as new reads are more likely to fall in a known contig than in a gap
• Start finishing
[Figure: number of contigs (y-axis) against number of reads (x-axis), falling towards 1; phases labelled rapid gap bridging, difficult gap bridging, finishing.]
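The rise and fall in contig number can be reproduced with a standard idealisation, the Lander-Waterman model (not named in the lecture, so treat it as an outside assumption). It assumes reads of equal length landing uniformly at random, and gives the expected contig count as N·exp(−cσ), with coverage c = NL/G and σ = 1 − T/L for minimum detectable overlap T.

```python
import math


def expected_contigs(n_reads, read_len, genome_size, min_overlap=0):
    """Expected number of contigs under the Lander-Waterman model."""
    coverage = n_reads * read_len / genome_size
    sigma = 1 - min_overlap / read_len
    return n_reads * math.exp(-coverage * sigma)


# Illustrative values: 500 bp reads on a 1.83 Mbp genome.
genome, read_len = 1_830_000, 500
for n in (1_000, 3_660, 10_000, 20_000, 29_280):
    cov = n * read_len / genome
    print(f"{n:>6} reads ({cov:4.1f}x): "
          f"{expected_contigs(n, read_len, genome):8.1f} contigs")
```

With these numbers the count peaks near 1x coverage and falls to roughly ten contigs at 8x, after which each extra read closes very few gaps. The model value drops below 1 at very high coverage, whereas in practice a single contig remains.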
Genomics: 26
Finishing
• Why are gaps present?
• Gap bridging
  – sequence gaps
    • choose appropriate clone and walk
  – physical gaps
    • alternative libraries (which?)
    • PCR across gap
• Mistakes/poor sequence
  – areas where read coverage is less than 8–10
  – repeated sequences, e.g. rRNA
• Closure and completion
Genomics: 27
Finished Yet?atgaatccaagccaaatacttgaaaatttaaaaaaagaattaagtgaaaacgaatacgaaaactatttatcaaatttaaaattcaacgaaaaacaaagcaaagcagatcttttagtttttaatgctccaaatgaactcatggctaaattcatacaaacaaaatacggcaaaaaaatcgcgcatttttatgaagtgcaaagcggaaataaagccatcataaatatacaagcacaaagtgctaaacaaagcaacaaaagcacaaaaatcgacatagctcatataaaagcacaaagcacgattttaaatccttcttttacttttgaaagttttgttgtaggggattctaacaaatacgcttatggagcatgtaaagccatagcacataaagacaaacttggaaaactttataatccaatctttgtttatggacctacaggacttggaaaaacacatttacttcaagcagttggaaatgcaagcttagaaatgggaaaaaaagttatttacgctaccagtgaaaatttcatcaacgattttacttcaaatttaaaaaatggttctttagataaatttcatgaaaagtatagaaactgcgatgttttacttatagatgatgtacagtttttaggaaaaaccgataaaattcaagaagaatttttctttatatttaatgaaatcaaaaataacgatggacaaatcatcatgacttcagacaatccacccaacatgctaaaaggtataaccgaacgcttaaaaagtcgttttgcacatgggatcatagctgatataactccacctcaactagatacaaaaatagccatcataagaaaaaaatgtgaatttaacgatatcaatctttctaatgatattataaactatatcgctacttctttaggggataatataagagaaatcgaaggtatcatcataagtttaaatgcttatgcaaccatactaggacaagaaatcacactcgaacttgccaaaagtgtgatgaaagatcatatcaaagaaaagaaagaaaatatcactatagatgacattttatctttggtatgtaaagaatttaacatcaaaccaagcgatgtgaaatccaataaaaaaactcaaaatatagtcacagcaagacgcattgtgatttacctagctagggcacttacggctttgactatgccacaacttgcgaattattttgaaatgaaagatcatacagctatttcacataatgttaaaaaaatcacagaaatgatagaaaatgatgcttctttaaaagcaaaaatcgaagaacttaaaaacaaaattcttgttaaaagtcaaagttaagtgaaaggatgtgaaaaataaattctagagtgtgaaaaaaagaaattaagcaaagtatgataaaatacaaatttgattattttgctttgaaaaatttcacaatttcaacaagcttattattacaacgaatttaaaattaaaataaaccaaggagaaaaaatgaagttaagtatcaataaaaatactttagaatctgcagtgattttatgtaatgcttatgtagaaaaaaaagactcaagcaccattacttctcatcttttttttcatgctgatgaagataaacttcttattaaagctagtgattatgaaataggtatcaactataaaataaaaaaaatccgcgtagaatcaagtggttttgctactgcaaatgcaaaaagtattgcagatgttattaaaagcttaaacaatgaagaagttgttttagaaaccattgataattttttatttgtaagacaaaaaagtacaaaatacaaacttcctatgtttaatcatgaagattttccaaattttccaaatacagaaggaaaaaaccaatttgacattgattcaagtgatttaagccgttctcttaaaaagatattaccaagtattgatacaaataacccaaaatactccttaaatggtgcatttttagatataaaaacagataaaattaacttcg
taggaactgatacaaaacgccttgcaatctatactttagaaaaagcaaataatcaagaatttagttttagtatccctaaaaaagctattatggaaatgcaaaaacttttctatgaaaaaatagaaattttttatgatcaaaatatgcttattgccaaaaatgaaaattttgaattctttacaaaacttatcaatgataaatttccagattatgaaaaagttataccaaaaactttcaaacaagaactcagtttttcaactgaagattttatagatagtcttaaaaaaatcagcgttgtaactgaaaaaatgagacttcattttaacaaagataaaatcatctttgaaggtataagtttagacaatatggaagcaaaaacagaacttgaaattcaaacaggagtaagtgaagaatttaatcttactataaaaatcaaacatttacttgatttcttaacttctatagaagaagaaaaattcactttaagtgtaaatgaacctaattcagcatttatagtcaaatcccaaggactatcaatgattatcatgcctatgattttgtaataaaacaagtaaaagataaaggaaaaatatgcaagaaaattacggtgcgagtaatattaaagtcctaaaaggcttagaagctgttagaaaacgcccaggtatgtatataggagatacaaacataggcggacttcatcatatgatttatgaagttgtggataattctatcgatgaagctatggcaggacattgcgatactatagatgtagaaatcactactgaaggaagctgtatagttagtgataatggtcgtggtattcctgttgatatgcacccaactgaaaatatgccaactttaactgttgttttaactgtcctacatgcagggggaaaattcgataaagatacttataaagtttcaggcggtttgcacggtgttggggtttcggttgtaaatgcactctctaaaaaacttgtagctacagttgaaagaaatggagaaatttatcgtcaagaattttcagaaggtaaagttatcagtgaatttggtgtgataggaaaaagtaaaaaaacaggaacaactatagaattttggcctgatgatcaaatttttgaagtgactgaatttgattatgaaattttggctaaaagatttcgtgaacttgcatacttaaatccaaaaatcactataaattttaaagataaccgcgtaggcaaacatgaaagttttcactttgaaggtggaatttctcagtttgttacagacttaaataaaaaagaagctttaactaaagcaattttctttagtgtagatgaagaagatgtgaatgttgaagtagctttgctttacaatgatacttatagtgaaaatttactctcttttgtaaataatattaaaaccccagatggtggaacacacgaagctggttttagaatgggtttaactcgtgtgataagtaactatatagaagcaaatgcaagtgctagagaaaaggataataaaatcacgggtgatgatgtgcgtgaaggtttgatcgctattgtgagtgtaaaggtacctgaaccacaatttgaaggacaaaccaaaggaaaacttggttcaacttatgtgcgtcctatagtttcaaaagcaagttttgagtatttgactaaatattttgaagaaaatcctatcgaagctaaagctataatgaataaagctttaatggcagctagaggaagagaagcagcgaaaaaagctagagaattaacgcgcaaaaaagaaagtttaagcgtaggaactttaccagggaaattagctgattgtcaaagtaaagatccaagtgaaagtgaaatttatcttgtggaaggggattctgcaggaggttctgcaaaacaaggtagagaaagatctttccaagctatactgcctttgcgtggtaaaattttaaatgttgaaaaagcaagactagataaaattttaaaatctgagcaaattcaaaatatgattaccgcttt
tggctgtggtataggtgaagattttgatctttcaaaacttagatatcataaaatcatcatcatgacagatgcggatgttgatggatctcatatacaaaccttgcttttaactttcttcttccgttttatgaatgaacttgtggcaaatggacatatttatctagcacaaccacctttatatctttataaaaaagctaaaaagcaaatttatttaaaagatgaaaaagctttgagcgaatacctgatagaaacgggaatagaaggtttaaactatgaaggtataggaatgaatgatttaaaagattatttaaaaatcgttgcagcttatcgtgcgattttaaaagatcttgaaaagcgttttaatgtgatttctgtgatacgctatatgatagaaaattcaaatttagttaaaggaaataatgaagaattatttagtgtaatcaaacaatttttagaaacacaaggacacaatatcttaaatcattatatcaacgaaaatgaaattcgagctttcgttcaaactcaaaatggcttagaagaacttgtgatcaatgaagaacttttcactcatccactatatgaagaagcgagttatatttttgataagattaaagatagaagcttggaatttgataaagatattttagaagttcttgaagatgttgaaaccaatgctaaaaaaggtgctactatacaacgctataaaggtttaggggaaatgaatcctgagcaactttgggaaaccacaatggatccaagcgtaagaagacttttaaaaatcactattgaagatgcacaaagtgcaaatgatacctttaatctctttatgggtgatgaggttgaaccaagacgcgattatatccaagcgcacgctaaagatgtaaagcatttggatgtgtaaaaatttatcattgaagaaatcatttcttcaatgagttttgttttgtaagagtatagctagaggaattcttcttcttgtatcgtatttttctccataatatttttcaagataatttaaaattttttcttcatcttcaggttctatttcccaaagtccttcactatcttgcatccatcttatagctgctaaccaagcttttctacttgcatgcatattggtaatgagattggatccatgacaagctaaacaatttgcttccactaaaggtgaatcaggatcgataatcaatcctgtatcagggttaatttcaagattttgagcccaacttgcacttaaaaacaatgctaagatcaatataatttttttcatacttaaactccataaacattaactctatggcatgcattattgatatatcctcctggattccactgtgctaaaaccataggttgactgttaccttgactatcgatagctcttgcccaaatttcataatatccttttgttggtattgatatttgagcactccatttttgccatgctaatctatttaatggtttttctacctttgc ………………….
Genomics: 28
Sequencing a genome
[Figure: as before, fragments of sequence are aligned by their overlaps into a contiguous sequence, which is then annotated.]
contiguous sequence: vgecisahubofevaluatedgeneticsrelatedresourcesforteachershealthprofessionalsandgeneralpublic
annotation: VGEC is a hub of evaluated genetics related resources for teachers, health professionals and general public.
Genomics: 29
Genome Annotation
• Find ORFs
  – look for ATG–Stop (+ alternatives)
  – over certain size
  – overlaps
  – computer based (“Glimmer” & “Orpheus”) and trained eye
• ORF function
  – search databases with predicted translated sequences: BLASTX
  – consider level of similarity and context
  – domain comparisons
    • Pfam/Prosite
• Other features
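The "ATG to Stop, over a certain size" rule can be sketched as a naive six-frame scan. This is only an illustration of the idea: it is not how Glimmer or Orpheus work (those use statistical models), it uses the standard stop codons TAA, TAG and TGA, and alternative start codons must be passed in explicitly.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100, starts=("ATG",)):
    """Return (strand, start, end, codons) for each start..stop ORF.

    Coordinates are 0-based, end-exclusive, and for the '-' strand are
    given relative to the reverse-complemented sequence. The codon count
    includes the start codon but not the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon in starts:
                    start = pos  # first start codon opens the ORF
                elif start is not None and codon in STOPS:
                    n_codons = (pos - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, start, pos + 3, n_codons))
                    start = None
    return orfs


# Tiny made-up example: one 6-codon ORF on the forward strand.
demo = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(demo, min_codons=3))
```

The size cut-off matters: short ATG to stop stretches occur by chance, so a low `min_codons` produces many false ORFs, which is where context and the trained eye come in.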
Genomics: 30
www.yeastgenome.org
Genomics: 31
http://mips.gsf.de/genre/proj/yeast/index.jsp
http://www.yeastgenome.org/MAP/GENOMICVIEW/GenomicView.shtml
Genomics: 32
Artemis: sequence viewer and annotation tool from the Sanger Centre (http://www.sanger.ac.uk/Software/Artemis/)
Genomics: 33
Genomics: 34
Genomics: 35
http://xbase.bham.ac.uk/
xBASE is a database for comparative genome analysis of all bacterial genome sequences
Chaudhuri RR, Pallen MJ. xBASE, a collection of online databases for bacterial comparative genomics. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D335-7.
Genomics: 36
[Figure: network-based sequencing project. Nodes: Coordinator, Bioinformatics Lab and many nodes labelled S. Flows: DNA, shotgun templates, shotgun sequences, finishing instructions, finishing sequences, annotation tasks, annotations. Outputs: working draft sequence, finished sequence, finished annotated sequence.]
A conceptual diagram of the flux and information in a network-based genome-sequencing project
Genomics: 37
Post Genome Sequence
• Comparative genomics
  – comparing genome organisation and content
  – genome size
  – genome repeats/Tn/phages
  – gene content
  – minimal gene content
• Functional genomics: ascribing gene function across a genome
  – gene function: knowns
  – phenotype prediction
  – gene function: unknowns
  – investigating function
• Bacteria, then yeast
Genomics: 38
Bacteria: Does size matter?
• Link genome size to adaptive capability
  – Biosynthetic capability
    • synthesis of nutrients
  – Stress resistance
    • resist environmental insults
  – Structural complexity
    • surface structures, sporogenesis
  – Regulation: sensing signals and transcriptional responses
    • detect change or requirement and respond appropriately
    • transcriptional regulation
Genomics: 39
Not just size but how you use it…
• Small genomes
  – Mycoplasma genitalium
    • 580,070 bp
    • smallest genome for self-replicating organism
    • free living but only just: infects host cells (guess which!)
    • few biosynthesis and regulatory systems
    • has replication, transcription, translation, metabolism etc. functions
  – Borrelia burgdorferi
    • 910,725 bp
    • Lyme disease
    • few cellular biosynthetic systems
  – Mycoplasma pneumoniae (0.8 Mbp); Chlamydia trachomatis (1.0 Mbp)
Genomics: 40
Bigger genomes
• Haemophilus influenzae
  – 1.830 Mbp
  – colonises human respiratory tract
  – limited environment
• Helicobacter pylori
  – 1.667 Mbp
  – colonises human stomach
  – limited environment
• Campylobacter jejuni
  – 1.641 Mbp
  – colonises intestine
  – limited environment
Genomics: 41
And bigger…
• Escherichia coli (K-12)
  – 4.639 Mbp
• Bacillus subtilis
  – 4.214 Mbp
  – soil/plant organism
  – secondary metabolites
• Pseudomonas aeruginosa
  – incomplete (5.9 Mbp)
• Yersinia pestis (4.4 Mbp)
• Clostridium spp (4–5 Mbp)
• Mycobacterium tuberculosis
  – 4.411 Mbp
  – slow growing (doubles in 24 h)
  – large proportion of genome devoted to lipid metabolism
• Streptomyces coelicolor (~8 Mbp)
  – secondary metabolites: antibiotics!
Genomics: 42
Organisation
• Linear chromosomes
  – Borrelia burgdorferi
  – Streptomyces coelicolor
• Multiple chromosomes
  – Vibrio cholerae
• Plasmids
  – Borrelia burgdorferi
  – 17 linear & circular plasmids
  – 50% genome size
  – plasmid replication, “decaying genes”, ?Ag variation
• Transposons, IS elements, phages
  – found in most genomes
  – Campylobacter has none
• Repeats
Genomics: 43
Replication
• Origin (oriC) and termination (terC) of replication
  – oriC often near dnaA gene (replication initiation protein)
  – in Borrelia burgdorferi (linear), oriC (& dnaA) in centre
• Strand bias
  – which strand is each gene on?
  – transcription in same direction as replication is more efficient
  – variation in level of strand bias
    • M. tuberculosis (Mt) 55% vs B. subtilis (Bs) 75%
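Strand bias can be computed once oriC and terC are placed on the map. The sketch below assumes a circular chromosome, bidirectional replication from the origin, and genes given as (position, strand) pairs; the coordinates and gene list are hypothetical.

```python
def on_clockwise_replichore(pos, ori, ter, genome_len):
    """True if pos lies between ori and ter going in increasing coordinates."""
    return (pos - ori) % genome_len < (ter - ori) % genome_len


def strand_bias(genes, ori, ter, genome_len):
    """Fraction of genes transcribed in the same direction as replication.

    On the clockwise replichore the fork moves towards higher coordinates,
    so '+' strand genes are co-directional; on the other replichore
    '-' strand genes are.
    """
    co_directional = 0
    for pos, strand in genes:
        clockwise = on_clockwise_replichore(pos, ori, ter, genome_len)
        leading = "+" if clockwise else "-"
        if strand == leading:
            co_directional += 1
    return co_directional / len(genes)


# Hypothetical 1000 kb circular genome, origin at 0, terminus at 500.
genes = [(100, "+"), (200, "+"), (300, "-"), (400, "+"),
         (600, "-"), (700, "-"), (800, "+"), (900, "-")]
bias = strand_bias(genes, ori=0, ter=500, genome_len=1000)
print(f"{bias:.0%} of genes co-directional with replication")
```

A value near 50% means no preference; the 75% quoted for B. subtilis indicates a strong preference for co-directional genes.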
Genomics: 44
Gene Content
• Annotation
  – sequence similarity
    • gene families
    • regulators, transport, biosynthesis
  – domain matches
    • trans-membrane domains, DNA binding
• Paralogues and orthologues
  – Paralogues:
    • members of same family (homologous) in same genome
    • likely to have different exact function
  – Orthologues:
    • homologues (same family) in different genomes
    • may have identical function
Vibrio cholerae as predicted by genome…
Reprinted by permission from Macmillan Publishers Ltd: [NATURE] (Heidelberg et al., 406, 477-483), copyright (2000)
Genomics: 45
Genomics: 46
Gene content (cont.)
• ORFans
  – significant proportion of genome contains ORFs of unknown function
  – some may be orthologues of unknowns in other organisms
  – some unique to organism
    • important for biology of organism
  – examples:
    • H. influenzae: 42%
    • H. pylori: 33%
    • E. coli: 38%
    • M. tuberculosis: 60% to 16%
  – number decreasing
• Gene size: most about 1 kb
Genomics: 47
Genomic rearrangements
• Example comparison
• Comparison of S. enterica serovar Typhi CT18 with S. enterica serovar Typhi Ty2
• inversion that spans terminus
http://www.sanger.ac.uk/resources/software/act/
Genomics: 48
Variation by gain and/or loss
• Core regions
  – shared by closely related species
• Additional “flexible” gene pool
  – variable regions
  – acquired from mobile genetic elements
• First described as pathogenicity islands
  – in non-pathogens too
  – wider role
• Genomic islands (GI)
  – pathogens
  – commensals
  – symbionts
  – environmental
• Gain of GI sometimes associated with gene loss
  – reduction in obligate intracellular pathogens
• Genome organisation as well as genome content correlates with microbial lifestyle
[Figure: evolution from a common bacterial ancestor. Processes: genome reduction by deletion events; gene acquisition by HGT (plasmid, GEI); mutations and rearrangements. Lifestyles: intracellular bacterium, obligate intracellular pathogen, endosymbiont; extracellular bacterium, facultative pathogen, symbiont; all lifestyles.]
Genomics: 49
Other tRNA-associated elements: tRNAPProL
Black arrows = Sal + Ec; white arrows = Sal or Ec; grey = strain/serovar specific. GC is for S. Typhi.
Infection and Immunity, May 2002, p. 2351-2360, Vol. 70, No. 5
Genomics: 50
Other tRNA-associated elements: tRNAArgU
Infection and Immunity, May 2002, p. 2351-2360, Vol. 70, No. 5
The supragenome
• The distributed-genome hypothesis (DGH)
• Bacteria have a (supra)genome much larger than the genome of any single bacterium
• Core and non-core gene sets
  – Example: Hiller et al. sequenced 8 strains of Streptococcus pneumoniae + 9 already available
  – Core set of genes in all strains
  – 20–30% genes non-core (not present in all strains)
• Genetic recombination generates diversity across strains
• Also for Haemophilus influenzae (Hogg et al.)
  – ~1400 in core set and ~1300 non-core in subset of strains
Genomics: 51
• Hiller et al. Journal of Bacteriology, November 2007, p. 8186-8195, Vol. 189, No. 22
• Hogg et al. Genome Biology 2007, 8:R103 (doi:10.1186/gb-2007-8-6-r103).
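The core and non-core split is a set operation once each strain's genes have been grouped into shared families. This is a minimal sketch with made-up strain and gene names; it assumes orthologue clustering has already been done, which is the hard part in practice and is not shown.

```python
def core_and_noncore(strain_genes):
    """Split the supragenome into core and non-core gene sets.

    strain_genes: dict mapping strain name -> iterable of gene family IDs.
    Core = present in every strain; non-core = present in some but not all.
    """
    gene_sets = [set(genes) for genes in strain_genes.values()]
    supragenome = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return core, supragenome - core


# Hypothetical strains and gene families.
strains = {
    "strain_A": ["g1", "g2", "g3", "g4"],
    "strain_B": ["g1", "g2", "g3", "g5"],
    "strain_C": ["g1", "g2", "g3", "g4", "g6"],
}
core, noncore = core_and_noncore(strains)
print("core:", sorted(core))
print("non-core:", sorted(noncore))
print(f"non-core fraction of supragenome: "
      f"{len(noncore) / (len(core) + len(noncore)):.0%}")
```

Each toy strain carries four or five genes, yet the supragenome holds six, which is the point of the hypothesis: no single genome contains the whole gene pool.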
Genomics: 52
Yeast
• 16 chromosomes totalling 12.068 Mbp
• 5885 ORFs (6275 predicted, but 390 unlikely to be translated)
• Few introns (~4%)
• Average gene size 2 kb (worm ~6 kb and human >30 kb)
• GC content varies along chromosome length
  – low GC at telomere & centromere
  – GC-rich regions correlate with higher recombination
• Tn and remnants in genome
  – evidence of hotspots
• 50% of ORFs have known function
  – some exact role unclear
• http://genome-www.stanford.edu/Saccharomyces/
• http://mips.gsf.de/projects/fungi
Genomics: 53
Functional genomics
• Functional genomics: ascribing gene function across a genome
• Function and inter-relationships
• Strategy
  – [bioinformatic analysis: gene identification]
  – Transcriptome: expression pattern
  – Proteome: expression pattern
  – Mutantome: mutant phenotype
  – Interactome: protein-protein interactions
[Figure labels: GENOME; TRANSCRIPTOME: RNA copies of the active protein-coding genes; PROTEOME: the cell’s repertoire]
Genomics: 54
Arrays: micro and chip
• Microarrays
  – Glass slides with <10,000 individual samples applied in known positions
  – Use of robotics
  – Samples can be PCR products or oligos
  – example: oligo/PCR product complementary to each ORF
• Chip arrays
  – silicon based
  – >10,000 sequences
  – http://www.affymetrix.com/index.html
• Redundancy
• Fluorescent labels
Genomics: 55
[Figure: chip arrays. One cell = one specific sequence; individual sequences (AC, GT, AT) with bound complementary sample (TG, CA, TA), read by laser.]
Genomics: 56
Transcriptome
• Genome-wide determination of expression level of each ORF
• When expressed relates to role
• Also assess mutants
• Compare expression of each ORF in different conditions
• Genome-wide expression maps
• Global patterns of expression
Genomics: 57
[Figure: When expressed? An illustrative organism, "Bacillus genieae", with 2 x ORF (orf 1 and orf 2; sequences AGGCAT and AATGAA) and their mRNAs. Grow in conditions when only orf 2 is expressed; isolate mRNAs and make cDNA copy (TTACTT, which pairs with AATGAA).]
Genomics: 58
• Grow under different conditions
• Extract mRNA
• Probe array with labelled copy of mRNA
Genomics: 59
Differentially labelled probes
Red channel
Green channel
Combined
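Combining the two channels usually comes down to a ratio per spot. The sketch below assumes background-corrected red and green intensities for each ORF (the ORF names and values are made up) and reports a log2 ratio, with a small pseudocount so that empty spots do not divide by zero; normalisation between channels is deliberately left out.

```python
import math


def log2_ratios(red, green, pseudocount=1.0):
    """log2(red/green) for every ORF measured in both channels.

    Positive values: more signal in the red-labelled sample.
    Negative values: more signal in the green-labelled sample.
    """
    return {
        orf: math.log2((red[orf] + pseudocount) / (green[orf] + pseudocount))
        for orf in red
        if orf in green
    }


# Hypothetical spot intensities for three ORFs.
red = {"orfA": 799, "orfB": 99, "orfC": 199}
green = {"orfA": 99, "orfB": 99, "orfC": 799}
ratios = log2_ratios(red, green)
for orf, value in ratios.items():
    print(f"{orf}: {value:+.1f}")
```

In the combined image these three spots would appear red, yellow and green respectively.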
Genomics: 60
http://www.bio.davidson.edu/courses/genomics/chip/chip.html
Genomics: 61
Expression profiling C. jejuni in low iron
Cj1659 (P19)
Cj0177
Cj0037c
Genomics: 62
Proteome
• Genome-wide determination of protein expression
• Gives information on stimulons
• Protein expression linked to function
• Assess mutants (regulatory mutants affect several proteins)

• Grow bacteria under defined conditions
• Extract proteins
• 2D-gel electrophoresis
• Protein spot identification
• Mass spectrometry
• Peptide size predictions from genome data
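Predicting peptides from genome data amounts to digesting each predicted protein in silico. The lecture only says "protease"; the sketch below assumes trypsin, using the commonly quoted rule of cleavage after K or R unless the next residue is P. The protein sequence is made up, and converting peptides to masses would need a residue mass table, which is not included here.

```python
def tryptic_peptides(protein):
    """Cut a protein after K or R, except when the next residue is P."""
    peptides = []
    current = ""
    for i, residue in enumerate(protein):
        current += residue
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(current)
            current = ""
    if current:  # C-terminal peptide that does not end in K or R
        peptides.append(current)
    return peptides


# Hypothetical protein sequence (one-letter amino acid codes).
protein = "MKTAYIAKQRPISFVKSHFSR"
peptides = tryptic_peptides(protein)
for p in peptides:
    print(f"{p:<10} {len(p)} residues")
```

Repeating this for every predicted ORF gives a table of expected peptides per protein, against which the peptides measured from a gel spot can be matched.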
Genomics: 63
Defining the Campylobacter proteome –chasing spots
Which protein? Which conditions?
Which other proteins are co-expressed?
Genomics: 64
C. jejuni iron example
Genomics: 65
[Figure: 2D gel (axes: pI, Mol mass); selected spots (*) are digested with protease and analysed by Mass Spec.]
http://depts.washington.edu/yeastrc/pages/ms.html
Genomics: 66
Mass mutagenesis: mutantome
• Mutate every ORF in genome
  – organism-specific technology
• High-throughput analysis of phenotype
  – need to analyse many 1000s of mutants under many conditions
• Signature-tagged technology
  – enables analysis of mutant pools
  – requires array technology for genome-wide projects
• Association of ORF with mutant phenotypes
• Regulators might be pleiotropic
Genomics: 67
Arrays: micro and chip
• Microarrays
  – Glass slides with <10,000 individual samples applied in known positions
  – Use of robotics
  – Samples can be PCR products or oligos
  – example: oligos complementary to each unique Tag
  – example: oligo/PCR product complementary to each ORF
• Chip arrays
  – silicon based
  – >10,000 sequences
  – http://www.affymetrix.com/index.html
• Redundancy
• Fluorescent labels
Genomics: 68
[Figure: chip arrays. One cell = one specific sequence; individual sequences (AC, GT, AT) with bound complementary sample (TG, CA, TA), read by laser.]
Genomics: 69
Signature Tagged
• Tags are short unique DNA sequences
• Tag linked to mutation
• Each individual mutant has unique tag
• Each mutant ORF has unique Tag
ORF X
Chromosomal Mutants
Genomics: 70
ORF X
Chromosomal Mutants Mutant Pools
compare
condition ‘normal’
functional role ?
Genomics: 71
Bar coding genes
[Figure: “normal”, un-mutated Campylobacter alongside mutant 1, mutant 2, mutant 3, mutant 4 and so on to mutant 1654, each carrying a mutant-specific DNA sequence.]
Genomics: 72
Which bar codes are missing?
• Which bar coded mutants are missing?
• Gene involved in process
[Figure: bar code array (positions 1 to 100) scored for copies of barcodes present (+) or absent (-) in the mutant pool and in the post-treatment mutant pool.]
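Finding the missing bar codes is a comparison of two pools. The sketch below assumes each pool has been reduced to a signal or count per tag (the tag names and numbers are hypothetical) and reports tags that were present before treatment but absent afterwards.

```python
def missing_barcodes(input_pool, output_pool, min_signal=1):
    """Tags detected in the input pool but not in the post-treatment pool.

    input_pool / output_pool: dict mapping tag -> signal (e.g. array
    intensity or copy count). A tag counts as present if its signal is
    at least min_signal.
    """
    return sorted(
        tag
        for tag, signal in input_pool.items()
        if signal >= min_signal and output_pool.get(tag, 0) < min_signal
    )


# Hypothetical pools: tag -> signal before and after treatment.
before = {"tag01": 12, "tag02": 9, "tag03": 15, "tag04": 0, "tag05": 7}
after = {"tag01": 10, "tag02": 0, "tag03": 14, "tag05": 0}
lost = missing_barcodes(before, after)
print(lost)
```

Tag04 is not reported because it was never detected in the starting pool, so its absence afterwards says nothing about the gene. Each lost tag points back to the mutated ORF, which is then a candidate for involvement in the process tested.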
www.freedigitalphotos.net/
Reprinted by permission from Macmillan Publishers Ltd: [NATURE REVIEWS GENETICS] (Mazurkiewicz et al., 7, 929-939), copyright (2006)
Genomics: 73
Interactome: yeast two-hybrid
Genomics: 74
http://en.wikipedia.org/wiki/Two-hybrid_screening
Which proteins can interact?
• Expression library of binding-domain::protein 1 (bait)
• Expression library of activation-domain::protein 2 (prey)
• Test combinations of all genome ORFs
• Which combinations turn on the reporter gene?
Protein-protein interaction networks
Genomics: 75
Parrish et al. 2007. A proteome-wide protein interaction map for Campylobacter jejuni. Genome Biol 8:R130.
Genomics: 76
Genomotyping or Genomic indexing
• Array of all known genes in microbe
• Genes 1, 2, 3 & 14 form minimal gene set
• Hybridise array with labelled chromosomal DNA
[Figure: an array of genes 1 to 15 hybridised with labelled DNA from Isolate 1, Isolate 2 and Isolate 3; each isolate lights up a different subset of spots.]
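Reading a genomotyping array can be sketched as thresholding each spot and then comparing isolates. The signals, threshold and isolate gene contents below are hypothetical, chosen so that genes 1, 2, 3 and 14 come out as the shared set, as on the slide.

```python
def call_present(signals, threshold=0.5):
    """Genes whose hybridisation signal reaches the threshold."""
    return {gene for gene, signal in signals.items() if signal >= threshold}


def genomotype(isolate_signals, threshold=0.5):
    """Return per-isolate gene calls, the shared set and the variable set."""
    calls = {
        isolate: call_present(signals, threshold)
        for isolate, signals in isolate_signals.items()
    }
    shared = set.intersection(*calls.values())
    variable = set.union(*calls.values()) - shared
    return calls, shared, variable


def fake_signals(present_genes, n_genes=15):
    """Build made-up signals: 1.0 for genes present, 0.1 for genes absent."""
    return {g: (1.0 if g in present_genes else 0.1)
            for g in range(1, n_genes + 1)}


# Hypothetical gene content of three isolates on a 15-gene array.
isolates = {
    "Isolate 1": fake_signals({1, 2, 3, 14, 5, 6, 9}),
    "Isolate 2": fake_signals({1, 2, 3, 14, 8, 11}),
    "Isolate 3": fake_signals({1, 2, 3, 14, 4, 5, 15}),
}
calls, shared, variable = genomotype(isolates)
print("shared by all isolates:", sorted(shared))
print("variable:", sorted(variable))
```

A gene that is absent from the array cannot be detected at all, so this approach finds losses relative to the arrayed strain but not genes that an isolate has gained.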