DNA, Gene, and Genome
Jan 16, 2016
Translating Machinery for Genetic Information
Transcription factors, mRNA levels
Automated DNA Sequencing
Data Growth (from the NCBI website)
Partial Display of the Human Draft Sequence (Nature, 2001)
Human Genome Map at NCBI
MGALRPTLLPPSLPLLLLLMLGMGCWAREVLVPEGPLYRVAGTAVSISCNVTGYEGPAQQNFEWFLYRPEAPDTALGIVSTKDTQFSYAVFKSRVVAGEVQVQRLQGDAVVLKIARLQAQDQGIYECTPSTDTRYLGSYSGKVELRVLPDVLQVSAAPPGPRGRQAPTSPPRMTVHEGQELALGCLARTSTQKHTHLAVSFGRSVPEAPVGRSTLQEVVGIRSDLAVEAGAPYAERLAAGELRLGKEGTDRYRMVVGGAQAGDAGTYHCTAAEWIQDPDGSWAQIAEKRAVLAHVDVQTLSSQLAVTVGPGERRIGPGEPLELLCNVSGALPPAGRHAAYSVGWEMAPAGAPGPGRLVAQLDTEGVGSLGPGYEGRHIAMEKVASRTYRLRLEAARPGDAGTYRCLAKAYVRGSGTRLREAASARSRPLPVHVREEGVVLEAVAWLAGGTVYRGETASLLCNISVRGGPPGLRLAASWWVERPEDGELSSVPAQLVGGVGQDGVAELGVRPGGGPVSVELVGPRSHRLRLHSLGPEDEGVYHCAPSAWVQHADYSWYQAGSARSGPVTVYPYMHALDTLFVPLLVGTGVALVTGATVLGTITCCFMKRLRKR
60-70 kDa protein interacting with a prostate cancer suppressor
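As a rough sanity check on a quoted mass such as 60-70 kDa, the mass of the unmodified polypeptide can be estimated from its one-letter sequence. This is a minimal sketch under stated assumptions, not the method used for the figure above: it sums commonly tabulated average residue masses and adds one water, and it ignores glycosylation and other modifications, which can shift the apparent mass seen on a gel.

```python
# Average residue masses in daltons (amino acid minus one water), as commonly
# tabulated for average-isotope protein mass estimates.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water restores the free N- and C-termini


def estimate_mass_kda(sequence: str) -> float:
    """Estimate the mass of an unmodified polypeptide in kDa."""
    residues = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    unknown = set(residues).difference(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    daltons = sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER_MASS
    return daltons / 1000.0
```

Passing the sequence above to `estimate_mass_kda` gives a backbone-only estimate; a gap between it and the quoted range would not by itself indicate an error, since modifications are not modeled.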
Molecular biology databases: sequence databases (annotated, low-annotation, specialized), structural databases, motif databases, genome databases, proteome databases, RNA expression, literature, populations, mutations, polymorphisms, organisms, pathways
Promoters, ESTs, tissues and cells, genome maps, DNA sequences, molecular phylogeny, protein sequences, protein structures, DNA motifs, protein motifs, substrates, metabolic pathways, transcription factors, RNA expression, mutations/polymorphisms, gene family
Database formats: relational databases (GDB, GSDB, MGD, etc.; vendors: Sybase, Oracle, etc.), flat-file databases (GenBank, SWISS-PROT, etc.), object-oriented databases (ACeDB, AtDB, etc.)
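Flat-file databases such as SWISS-PROT store each entry as text lines that begin with a two-letter line code (ID, AC, DE, SQ, ...), with data starting in column 6, and end each entry with a `//` terminator. A minimal reader for that layout might look like the sketch below. The sample record is invented for illustration and simplified: a real SQ header also carries molecular-weight and checksum fields, and `parse_entries` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def parse_entries(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield one dict per '//'-terminated entry, mapping line codes to values."""
    entry: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            # End-of-entry terminator: emit the finished entry and start a new one.
            if entry:
                yield dict(entry)
            entry = defaultdict(list)
        elif line[:2].strip():
            # Two-letter line code; the data begin in column 6 (index 5).
            entry[line[:2]].append(line[5:].strip())
        elif line.strip():
            # Sequence data lines carry no line code; drop the block spacing.
            entry["sequence"].append(line.replace(" ", ""))


# Invented, simplified example record (not a real database entry).
SAMPLE = "\n".join([
    "ID   EXAMPLE_HUMAN",
    "AC   EXAMPLE1;",
    "DE   Hypothetical example protein.",
    "SQ   SEQUENCE   20 AA;",
    "     MGALRPTLLP PSLPLLLLLM",
    "//",
])

for record in parse_entries(SAMPLE.splitlines()):
    print(record["ID"][0], "".join(record["sequence"]))
```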
Molecular biology data types: organisms, genome maps, DNA sequences, RNA sequences, protein sequences, protein structures, RNA structures, DNA motifs, protein motifs, RNA expression
DNA sequence example: ...AATGGTACCGATGACCTGGAGCTTGGTTCGA...
Protein sequence example: ...TRLRPLLALLALWPPPPARAFVNQHLCGSHLVEA...
Protein structure example: PDB entry 1CIS (P. Osmark, P. Sorensen, F.M. Poulsen)
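The DNA and RNA sequence types are related by simple string operations. Assuming the example DNA fragment is written 5'->3' on the coding strand, the sketch below derives its mRNA (same sequence with U in place of T), its reverse complement (the opposite strand read 5'->3'), and its GC content. The function names are illustrative only.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5'->3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """mRNA matches the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (works for DNA or RNA)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


fragment = "AATGGTACCGATGACCTGGAGCTTGGTTCGA"  # example fragment from the slide
print(transcribe(fragment))
print(reverse_complement(fragment))
print(round(gc_content(fragment), 3))
```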
DNA microarrays measure variations in RNA levels: the full yeast genome on a chip (http://cmgm.Stanford.EDU/pbrown/; DeRisi et al., Science 278:680)
Red dots: genes whose RNA level increased
Green dots: genes whose RNA level decreased
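A spot's color reflects the ratio of its two channel intensities; in overlay images, spots with roughly equal signal in both channels appear yellow. A minimal sketch, assuming the red channel carries the experimental sample and the green channel the reference, and using a hypothetical 2-fold cutoff (|log2 ratio| >= 1):

```python
import math


def classify_spot(red: float, green: float, min_log2_ratio: float = 1.0) -> str:
    """Label a spot from background-corrected red/green intensities.

    The 2-fold default cutoff is an illustrative assumption, not a standard.
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive after background correction")
    log_ratio = math.log2(red / green)
    if log_ratio >= min_log2_ratio:
        return "increased"   # shows up red
    if log_ratio <= -min_log2_ratio:
        return "decreased"   # shows up green
    return "unchanged"       # roughly yellow


print(classify_spot(4000, 1000), classify_spot(500, 1000), classify_spot(1100, 1000))
```

Working on the log2 scale makes 2-fold increases and 2-fold decreases symmetric (+1 and -1), which a raw ratio does not.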
Substrates for High-Throughput Arrays: single label (P-33), single label (biotin/streptavidin), dual label (Cy3, Cy5)
GeneChip Probe Arrays (figure: image of hybridized probe array, 1.28 cm; hybridized probe cell, 24 µm)
Each probe cell holds millions of copies of a specific oligonucleotide probe, and the array carries >200,000 different complementary probes that bind a single-stranded, labeled RNA target.
GeneChip Expression Array Design: from each gene sequence, probes are designed in pairs, one as a perfect match (PM) and one as a mismatch (MM)
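Each mismatch probe differs from its perfect-match partner at the central base, so its signal approximates nonspecific hybridization. A simple probe-set summary, in the spirit of the "average difference" used by early Affymetrix software, is the mean of PM minus MM across a gene's probe pairs. This sketch omits any outlier handling that production software applies, and the intensities are hypothetical.

```python
from statistics import mean
from typing import Sequence


def average_difference(pm: Sequence[float], mm: Sequence[float]) -> float:
    """Mean of (PM - MM) over one probe set's probe pairs."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal, non-empty lists of PM and MM intensities")
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical intensities for a three-pair probe set.
print(average_difference([300, 500, 400], [100, 300, 200]))
```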
Procedures for Target Preparation: cells → poly(A)+ or total RNA → cDNA → in vitro transcription (IVT) with biotin-UTP and biotin-CTP → labeled transcript → fragment (heat, Mg2+) → labeled fragments → hybridize (16 hours) → wash & stain → scan
Microarray Technology
Printing Arrays on 50 slides
Ratio of expression of genes from two sources
GSI Lumonics
Cattle and Soy Controls (spot labels: beta actin, PKG, HPRT, beta-2 microglobulin, Rubisco, AB binding protein, major latex protein homologue (MSG))
Array of cattle and soy spiking controls. 50 µg of cattle brain total RNA was labeled with Cy3 (green). 1 µl each of in vitro transcribed soy Rubisco (5 ng), AB binding protein (0.5 ng), and MSG (0.05 ng) were labeled with Cy5. The two labeled samples were cohybridized on superamine slides (Telechem, Inc.). To the right of each set of spots are five negative controls (water).
Fetal spleen (Cy3) vs. adult spleen (Cy5); labeled spots: IgM heavy chain, MYLK, COL1A2
Placenta vs. Brain: 3800 cattle placenta array (Cy3, Cy5), GenePix image analysis software
Microarray Data Process: experimental design → image analysis (raw data) → normalization (clean data) → data filtering (informative data) → model building → data mining (clustering, pattern recognition, etc.) → validation
Scatterplot of Normalized Data (adult vs. fetal)
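The normalization and filtering steps of this pipeline can be sketched for two-channel ratio data. The sketch assumes, as global normalization does, that most genes are not differentially expressed, so the median log2 ratio is shifted to zero; the 2-fold filtering cutoff and the function names are hypothetical choices for illustration.

```python
import math
from statistics import median
from typing import Dict


def median_normalize(ratios: Dict[str, float]) -> Dict[str, float]:
    """Convert red/green ratios to log2 and center them on the median."""
    logs = {gene: math.log2(r) for gene, r in ratios.items()}
    center = median(logs.values())
    return {gene: value - center for gene, value in logs.items()}


def filter_informative(log_ratios: Dict[str, float], min_abs: float = 1.0) -> Dict[str, float]:
    """Keep genes whose normalized |log2 ratio| reaches the cutoff (2-fold by default)."""
    return {gene: v for gene, v in log_ratios.items() if abs(v) >= min_abs}


# Hypothetical raw ratios: a 2-fold dye bias shared by most genes.
raw = {"a": 2.0, "b": 2.0, "c": 2.0, "d": 8.0, "e": 0.5}
print(filter_informative(median_normalize(raw)))
```

Here the shared 2-fold bias is removed by normalization, leaving only the two genes that differ from the bulk.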
Complexity Levels of Microarray Experiments:
Compare genes in a control situation versus a treatment situation. Example: is the level of expression (up-regulated or down-regulated) significantly different between the two situations? (drug design application) Methods: t-test, Bayesian approach
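For one gene measured in replicate under both conditions, a t-test compares the mean expression against the replicate variability. The sketch below computes Welch's t-statistic (which does not assume equal variances) and its approximate degrees of freedom; it stops short of a p-value, which needs the t distribution (SciPy's `scipy.stats.ttest_ind(..., equal_var=False)` provides one). When thousands of genes are tested at once, a multiple-testing correction is also needed. The replicate values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(control: Sequence[float], treatment: Sequence[float]) -> Tuple[float, float]:
    """Return (t, df); positive t means higher expression under treatment."""
    n1, n2 = len(control), len(treatment)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two replicates per condition")
    v1 = variance(control) / n1      # squared standard error, control
    v2 = variance(treatment) / n2    # squared standard error, treatment
    t = (mean(treatment) - mean(control)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


print(welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The function raises `ZeroDivisionError` if both groups have zero variance, a case worth filtering out before testing.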
Find multiple genes that share common functionalities. Example: which genes are related or depend on one another? Methods: clustering (hierarchical, k-means, self-organizing maps), neural networks, support vector machines
Infer the underlying gene and protein networks that are responsible for the observed patterns and functional pathways. Example: how is gene regulation organized at the system level? Directions: mining regulatory regions, modeling regulatory networks on a global scale
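K-means groups genes whose expression profiles (one value per experiment) lie close together. This is a minimal sketch with assumed simplifications: Euclidean distance (expression studies often prefer correlation-based distances), deterministic initialization from the first k profiles, and hypothetical example data.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(profiles: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Cluster expression profiles; return (cluster label per profile, centroids)."""
    if not 0 < k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    # Deterministic initialization: the first k profiles are the starting centroids.
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # converged: no profile changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical two-experiment profiles for four genes.
genes = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]]
print(kmeans(genes, k=2))
```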
Comparing data from two experiments.
Self-Organizing Maps
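A self-organizing map places expression profiles onto a small set of nodes so that similar profiles land on the same or neighboring nodes. The following is a minimal one-dimensional sketch with hypothetical settings: nodes start evenly spaced between the per-dimension data minimum and maximum, and the learning rate and Gaussian neighborhood width both decay linearly. Practical analyses usually use 2-D grids and tuned schedules.

```python
import math
from typing import List, Sequence


def best_matching_unit(weights: List[List[float]], x: Sequence[float]) -> int:
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(data: Sequence[Sequence[float]], n_nodes: int,
              epochs: int = 50, lr0: float = 0.5) -> List[List[float]]:
    """Train a 1-D SOM and return one weight vector per node."""
    dim = len(data[0])
    lo = [min(v[d] for v in data) for d in range(dim)]
    hi = [max(v[d] for v in data) for d in range(dim)]
    steps = max(n_nodes - 1, 1)
    # Assumed initialization: nodes evenly spaced from the minimum to the maximum.
    weights = [[lo[d] + (hi[d] - lo[d]) * i / steps for d in range(dim)]
               for i in range(n_nodes)]
    sigma0 = max(n_nodes / 2, 1.0)
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 0.1)  # neighborhood shrinks but stays positive
        for x in data:
            winner = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighborhood over the 1-D node index.
                h = math.exp(-((i - winner) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Hypothetical profiles forming two groups.
profiles = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 0.9]]
som = train_som(profiles, n_nodes=2)
print([best_matching_unit(som, p) for p in profiles])
```

After training, each profile is assigned to its best-matching node; genes sharing a node form a cluster, and neighboring nodes hold similar patterns.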
Molecular Classification of Cancer
Gene Expression Profile of Aging and Its Retardation by Caloric Restriction
Cheol-Koo Lee, Roger G. Klopp, Richard Weindruch, Tomas A. Prolla
Data Mining Methods
Classification and regression (predictive modeling); clustering (segmentation); association discovery (summarization); change and deviation detection; dependency modeling; information visualization
A major consideration is how to normalize between two samples to account for variation introduced at different stages. The statistical approach assumes that most genes are expressed at similar levels in both samples; comparisons between tissues need spiking controls. Parallel soy project: soy and cattle controls (the legend explains the setup). These all appear on the 3800 placenta array. Why placenta vs. brain? More genes are expressed in brain, but on the placenta array genes are more highly expressed in placenta. There is a set of control spots on every grid.
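When the "most genes are unchanged" assumption fails, as in comparisons between tissues, the normalization factor can instead be taken from spiking-control spots. This sketch assumes a hypothetical design in which equal amounts of each control RNA were added to both labeled samples, so control spots should have a log2 ratio of zero; that differs from the cattle/soy array described earlier, where the soy controls were labeled only with Cy5. Gene names and intensities are invented.

```python
import math
from statistics import median
from typing import Dict, Iterable, Tuple


def spike_in_normalize(spots: Dict[str, Tuple[float, float]],
                       control_ids: Iterable[str]) -> Dict[str, float]:
    """Return normalized log2(red/green) per spot, centered on the spike-in controls."""
    control_logs = [math.log2(spots[c][0] / spots[c][1]) for c in control_ids]
    if not control_logs:
        raise ValueError("at least one control spot is required")
    offset = median(control_logs)  # dye/labeling bias estimated from controls only
    return {gene: math.log2(red / green) - offset
            for gene, (red, green) in spots.items()}


# Hypothetical (red, green) intensities; ctrl1 and ctrl2 are spike-in controls.
spots = {"ctrl1": (200.0, 100.0), "ctrl2": (400.0, 200.0),
         "geneA": (800.0, 100.0), "geneB": (100.0, 100.0)}
print(spike_in_normalize(spots, ["ctrl1", "ctrl2"]))
```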