Introduction to Bioinformatics: some definitions Peter K. Rogan, Ph.D. Laboratory of Human Molecular Genetics Children’s Mercy Hospital Schools of Medicine & Computer Science and Engineering, UMKC http://www.sce.umkc.edu/~roganp

Introduction to Bioinformaticsr.web.umkc.edu/roganp/course lectures/bioinformatics... · 2003-09-04 · "Classical" bioinformatics • Most biologists talk about "doing bioinformatics"

May 21, 2020

Introduction to Bioinformatics: some definitions

Peter K. Rogan, Ph.D.
Laboratory of Human Molecular Genetics, Children’s Mercy Hospital
Schools of Medicine & Computer Science and Engineering, UMKC
http://www.sce.umkc.edu/~roganp

Definition of Bioinformatics: What is bioinformatics?

• Roughly, bioinformatics describes any use of computers to handle biological information

• In practice, the definition used by most people is narrower; bioinformatics to them is a synonym for "computational molecular biology"---the use of computers to characterize the molecular components of living things.

"Classical" bioinformatics

• Most biologists talk about "doing bioinformatics" when they use computers to store, retrieve, analyze or predict the composition or the structure of biomolecules.

• As computers become more powerful you could probably add simulate to this list of bioinformatics verbs.

• "Biomolecules" include your genetic material---nucleic acids---and the products of your genes: proteins.

• These are the concerns of "classical" bioinformatics, dealing primarily with sequence analysis.

• Richard Durbin, Head of Informatics at the Wellcome Trust Sanger Institute, expressed an interesting opinion on this distinction in an interview:

– "I do not think all biological computing is bioinformatics, e.g. mathematical modelling is not bioinformatics, even when connected with biology-related problems. In my opinion, bioinformatics has to do with management and the subsequent use of biological information, particular genetic information."

Monomers and polymers

• Most large biological molecules are polymers: ordered chains of simpler molecular modules called monomers.

• Monomers that can combine in a chain are of the same general class, but each kind of monomer in that class has its own well-defined set of characteristics.

• Many monomer molecules can be joined together to form a single, far larger, macromolecule. Macromolecules can have exquisitely specific informational content and/or chemical properties.

• The monomers in a given macromolecule of DNA or protein can be treated computationally as letters of an alphabet (strings), put together in pre-programmed arrangements to carry messages or do work in a cell.
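
The idea of treating monomers as letters can be sketched in a few lines of Python. This is only an illustration: the function names and the short example sequence are assumptions made here, not part of the lecture.

```python
from collections import Counter

# The four DNA monomers (nucleotides) written as letters of an alphabet.
DNA_ALPHABET = set("ACGT")
# Translation table pairing each base with its complement (A-T, C-G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_dna(seq: str) -> bool:
    """Return True if every letter of seq belongs to the DNA alphabet."""
    return set(seq.upper()) <= DNA_ALPHABET


def base_composition(seq: str) -> Counter:
    """Count how often each monomer occurs in the polymer."""
    return Counter(seq.upper())


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Hypothetical 9-letter sequence, used only for illustration.
example = "ATGGCGTAA"
print(is_dna(example))
print(base_composition(example))
print(reverse_complement(example))
```

Once a macromolecule is held as a string, ordinary string operations (searching, counting, slicing, comparing) become the building blocks of sequence analysis.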

Living in the "post-genomic" era

• From multiple whole genome sequences we can look for differences and similarities between all the genes of multiple species. From such studies we can draw particular conclusions about species and general ones about evolution. This kind of science is often referred to as comparative genomics [EXAMPLE].

• There are now technologies designed to measure the relative number of copies of a genetic message (levels of gene expression) at different stages in development or disease or in different tissues.

• Large-scale ways of identifying gene functions and associations will grow in significance and with them the accompanying bioinformatics of functional genomics.
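
As a rough illustration of comparing "relative number of copies of a genetic message" between two conditions, one common summary is the log2 ratio of the two measurements. The sketch below assumes made-up gene names and counts, and the pseudocount of 1 is an assumption added here to avoid dividing by zero; it is not a method described in the lecture.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """Log2 ratio of expression in one condition relative to another.

    The pseudocount (an assumption of this sketch) keeps the ratio defined
    when a gene has zero measured copies in one condition.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical expression measurements: gene -> (diseased tissue, normal tissue).
measurements = {
    "geneA": (7, 1),   # more copies in diseased tissue
    "geneB": (3, 3),   # unchanged
    "geneC": (0, 3),   # fewer copies in diseased tissue
}

for gene, (diseased, normal) in measurements.items():
    print(gene, log2_fold_change(diseased, normal))
```

A positive value means more copies of the message in the first condition, a negative value means fewer, and zero means no change.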

Comparing 2 genomes vs. reference genome, with matches defined above 70% similarity

Reference Genome: E. coli K12-MG1655

This figure shows the protein matches of each comparison genome to the reference genome. Each ring of the circular display represents a genome and each tick mark represents a gene match along the length of the genome. The outer ring displays the reference genome and the inner rings display each comparison genome.

Summary statistics for reference genome

3129 reference genes match a comparison genome at least once for the given criteria

125 reference genes match all comparison genomes for the given criteria

1160 reference genes match none of the comparison genomes for the given criteria

Staphylococcus epidermidis ATCC 12228: 146 genes match reference

Salmonella typhimurium LT2 SGSC1412: 3555 genes match reference

3701 total gene matches to reference genome
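
The summary statistics above are set arithmetic: the 146 and 3555 per-genome matches sum to the 3701 total, and the 3129 matched plus 1160 unmatched reference genes imply 4289 reference genes overall. A minimal sketch of how such counts could be derived is shown below. The gene and genome identifiers are hypothetical, and this is not the tool that produced the slide.

```python
def summarize_matches(reference_genes, matches_by_genome):
    """Count reference genes matched at least once, by all genomes, and by none.

    matches_by_genome maps a comparison genome name to the set of reference
    genes that have a match in that genome above the chosen similarity cutoff.
    """
    reference = set(reference_genes)
    matched_sets = [set(m) & reference for m in matches_by_genome.values()]
    matched_any = set().union(*matched_sets)
    matched_all = set.intersection(*matched_sets) if matched_sets else set()
    return {
        "at_least_once": len(matched_any),
        "all": len(matched_all),
        "none": len(reference - matched_any),
    }


# Hypothetical toy data: five reference genes and two comparison genomes.
reference = ["g1", "g2", "g3", "g4", "g5"]
matches = {
    "genome_A": {"g1", "g2", "g3"},
    "genome_B": {"g2", "g3", "g4"},
}
print(summarize_matches(reference, matches))
```

By construction, "at_least_once" and "none" always add up to the number of reference genes, which is the same consistency check applied to the slide's figures.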

Results of Query: Gene sequences found in E. coli, S. typhimurium and S. epidermidis

% similarity | matching common name | matching locus | common name | Locus Accession Number
70.212769 | putative 2-component transcriptional regulator | b2855 | putative transcriptional regulator (LuxR/UhpA family) | STM3606
70.212769 | thiogalactoside acetyltransferase | b0342 | UDP-3-O-(3-hydroxymyristoyl)-glucosamine n-acyltransferase | STM0226
70.238098 | ATP-binding component of a transporter | b0199 | putative ABC-type transport system ATPase component/cell division | STM0511
100 | uridylate kinase | b0171 | uridylate kinase | STM0218
100 | dnaK suppressor protein | b0145 | dnaK suppressor protein | STM0186
100 | ATP-binding cell division protein, septation process, complexes with FtsZ, associated with junctions of inner and outer membranes | b0094 | ATP-binding cell division protein, septation process, complexes | STM0132
100 | cell division protein; ingrowth of wall at septum | b0083 | cell division protein ingrowth of wall at | STM0121
100 | transcriptional repressor of fru operon and others | b0080 | transcriptional repressor of fru operon and others | STM0118

Detailed analysis of conserved sequence region in Salmonella typhimurium

[Figure: annotation of the region plotted against genome coordinate, with + and - strands; gene evidence from GenBank, TIGR and Prosite]

Living in the … (continued)

• Shift in emphasis (of sequence analysis especially) from genes themselves to gene products.

– attempts to catalogue the activities and characterize interactions between all gene products (in humans): proteomics

– attempts to crystallize and/or predict the structures of all proteins (in humans): structural genomics

– fewer DNA double-helices in bad sci-fi movies!

Living in the…(continued 2)

• What some people refer to as research or medical informatics, the management of all biomedical experimental data associated with particular molecules or patients, will move into the mainstream of cell and molecular biology and migrate from the commercial and clinical to the academic sectors.

What is medical informatics?

• Biomedical Informatics is an emerging discipline that has been defined as the study, invention, and implementation of structures and algorithms to improve communication, understanding and management of medical information.

• The end objective of biomedical informatics is the coalescing of data, knowledge, and the tools necessary to apply that data and knowledge in the decision-making process, at the time and place that a decision needs to be made.

• The focus on the structures and algorithms necessary to manipulate the information separates Biomedical Informatics from other medical disciplines where information content is the focus.

The distinction

• This suggests that one difference between bioinformatics and medical informatics as disciplines lies with their approaches to the data; there are bioinformaticists interested in the theory behind the manipulation of that data and there are bioinformatics scientists concerned with the data itself and its biological implications.

• Medical informatics, for practical reasons, is more likely to deal with data obtained at "grosser" biological levels---that is, information from super-cellular systems, right up to the population level---while most bioinformatics is concerned with information about cellular and biomolecular structures and systems.

• University of Missouri-Columbia has a Ph.D. program in Medical Informatics.

Seminars this semester at MU on Health Informatics

Sam Schulz, Ph.D., Professor, Director of Health Informatics, HMI: "ROI, Integration and the Effects of Receivership upon the IT Enterprise for AMCs"

Steven Waldren, MD, NLM Postdoctoral Fellow, HMI: "Determining Family Medicine Residency Needs and Expectations of an Electronic Medical Record"

Raman Seth: "Biochemical Names Database"

Kathryn J. Nelson, Project Director, Clinical Outcomes: "Implementing an electronic medical error reporting system"

Timothy B. Patrick, Ph.D., Assistant Professor, HMI: "A Text Corpus Approach to an Analysis of the Shared Use of Core Terminology"

John Gorman, NLM Fellow, HMI: "Pursuing Best Practices for Information System Management in Academic Medical Centers"

Swetha Sridhar, NLM Fellow, HMI: "A comparison of face-to-face and virtual dermatology visits"

Jeannette Jackson-Thompson, MSPH, Ph.D., Resident Assistant Professor, HMI; Director, Missouri Cancer Registry: "Quality of Life for Cancer Survivors-A Collaborative Research Project Involving the Missouri Cancer Registry, the Department of Family and Community Medicine and the National Office of the American Cancer Society"

What is Genomics?

• Genomics is a field which existed before the completion of the sequences of genomes.

• In its crudest form it included, for example, the oft-referenced estimate of 100,000 genes in the human genome, derived from a(n) (in)famous piece of "back of an envelope" genomics: guessing the weight of chromosomes and the density of the genes they bear. [We now have evidence for ~35,000 genes in the human genome]

• Genomics is any attempt to analyze or compare the entire genetic complement of one or more species. It is, of course, possible to compare genomes by comparing more-or-less representative subsets of genes within genomes.

What is Mathematical Biology?

• Mathematical biology also tackles biological problems, but the methods it uses to tackle them need not be numerical and need not be implemented in software or hardware.

• Indeed, such methods need not "solve" anything; in mathematical biology it would be considered reasonable to publish a result which merely establishes that a biological problem belongs to a particular general class.

• According to Alex Kasman:

– bioinformatics "...seems to focus almost exclusively on specific algorithms that can be applied to large molecular biological data sets..." whereas

– mathematical biology "...includes things of theoretical interest which are not necessarily algorithmic, not necessarily molecular in nature, and are not necessarily useful in analyzing collected data."

Research at the Center for Mathematical Biology, University of Oxford

Spatial and spatiotemporal pattern formation
Specifically, partial differential equation modelling of the chemical and mechanical aspects of the generation of pattern and form in embryology and development. Applications include skeletal patterning in the vertebrate limb, primitive streak formation, somitogenesis, skin organ formation (e.g. feather germ formation, tooth initiation); tissue movement during invagination processes; tissue-tissue interactions in, for example, determining lung morphology; cell aggregation in Dictyostelium; pattern generation in Hydra. We have recently begun to investigate discrete models to understand pattern formation on a cellular level (e.g. Delta-Notch intercellular signalling).

Wound healing
Recent biological advances in the understanding of foetal wound healing have shed new light on the role of the interaction between cells and their environment in both foetal and adult wound repair. We are investigating normal and abnormal wound healing. Applications include modelling wound contraction, fibroproliferative diseases, scar tissue formation and corneal wound healing. This investigation is being carried out in collaboration with experimental colleagues in the Biology Department at Manchester University.

Mathematical Modelling to Improve Cancer Therapy
The CMB is a member of the Research Training Network project "Using Mathematical Modelling and Computer Simulation to Improve Cancer Therapy." The aim of the network is to develop the whole modelling process from phenomenological observation to simulation and validation, through the development of mathematical models and their qualitative and quantitative study, in order to simulate the different aspects of tumor dynamics within the full range of scales: sub-cellular, cellular and macroscopic. Developing mathematical models at all the scales mentioned requires making use of a wide variety of theoretical tools from a range of disciplines (e.g., continuum mechanics, kinetic theory, stochastic processes, system theory, compartmental models, multiphase systems) and developing different mathematical tools to obtain both qualitative and quantitative results.

From Individual to Collective Behaviour in Ecology
We are using various mathematical methods to investigate the relationship between the behaviour of individual animals and the dynamics of their population. A focus of this research work has been on social insects and honey bees in particular. We are interested in how insects use simple rules and local information to generate complex and functional patterns. More recent work has concentrated on applying these techniques to the dynamics of populations in ecological systems. We are working in collaboration with researchers in the Zoology department on the behaviour and dynamics of locust swarms.

What is proteomics?

• Tyers & Mann, Nature 2003 Mar 13;422(6928):193-7:

– "The term proteome was first coined to describe the set of proteins encoded by the genome.

– The study of the proteome, called proteomics, now evokes

• not only all the proteins in any given cell,
• but also the set of all protein isoforms and modifications,
• the interactions between them,
• the structural description of proteins and their higher-order complexes,
• and for that matter almost everything 'post-genomic'."

What is Pharmacogenetics?

• All individuals respond differently to drug treatments; some positively, others with little obvious change in their conditions and yet others with side effects or allergic reactions.

• Much of this variation is known to have a genetic basis.

• Pharmacogenetics is a subset of pharmacogenomics which uses genomic/bioinformatic methods to identify genomic correlates, for example SNPs (Single Nucleotide Polymorphisms), characteristic of particular patient response profiles, and uses those markers to inform the administration and development of therapies.

– Strikingly, such approaches have been used to "resurrect" drugs thought previously to be ineffective, but subsequently found to work within a subset of patients.

– They can also be used for optimizing the doses of chemotherapy for particular patients.
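
To make the idea of a "genomic correlate" of drug response concrete, the sketch below tabulates how often patients with each genotype at a single SNP respond to a drug. All data, genotype labels and function names are hypothetical, and a real study would add a statistical test and far more patients.

```python
from collections import defaultdict


def response_rate_by_genotype(patients):
    """Return the fraction of responders for each SNP genotype.

    patients is an iterable of (genotype, responded) pairs, where responded
    is True if the patient benefited from the drug.
    """
    counts = defaultdict(lambda: [0, 0])  # genotype -> [responders, total]
    for genotype, responded in patients:
        counts[genotype][1] += 1
        if responded:
            counts[genotype][0] += 1
    return {g: responders / total for g, (responders, total) in counts.items()}


# Hypothetical cohort: genotype at one SNP and whether the drug worked.
cohort = [
    ("AA", True), ("AA", True),
    ("AG", True), ("AG", False),
    ("GG", False), ("GG", False),
]
print(response_rate_by_genotype(cohort))
```

In this made-up cohort the drug looks ineffective overall (half the patients respond) yet works in every AA patient, which is the kind of pattern that could justify revisiting a drug for a genetically defined subset.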

What is Pharmacogenomics?

• Pharmacogenomics is the application of genomic approaches and technologies to the identification of drug targets.

• Examples include trawling entire genomes for potential receptors by bioinformatics means,

• investigating patterns of gene expression in both pathogens and hosts during infection,

• or examining the characteristic expression patterns found in tumors or patient samples for diagnostic purposes.

What is Bioinformatics? The loose definition

• There are other fields---for example medical imaging / image analysis---which might be considered part of bioinformatics.

• There is also a whole other discipline of biologically-inspired computation: genetic algorithms, AI, neural networks.

• Example: Neural networks, inspired by crude models of the functioning of nerve cells in the brain, are used to predict, surprisingly accurately, the secondary structures of proteins from their primary sequences.

• What almost all bioinformatics has in common is the processing of large amounts of biologically-derived information, whether DNA sequences or breast X-rays.
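
Before a neural network can predict secondary structure from a primary sequence, the letters have to be turned into numbers. One common way to do this, assumed here for illustration, is to slide a fixed-size window along the protein and one-hot encode each residue in it. The sketch covers only this input encoding, not a trained network; the window size, the padding character and the function names are all assumptions of this example.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> list:
    """Encode one residue as 20 zeros/ones; padding ('-') becomes all zeros."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def encode_windows(sequence: str, size: int = 3) -> list:
    """Return one flat input vector per residue, built from a centred window.

    The ends of the sequence are padded with '-' so that every residue,
    including the first and last, sits at the centre of a full window.
    """
    half = size // 2
    padded = "-" * half + sequence + "-" * half
    vectors = []
    for i in range(len(sequence)):
        window = padded[i:i + size]
        vectors.append([bit for residue in window for bit in one_hot(residue)])
    return vectors


# Hypothetical three-residue peptide, for illustration only.
inputs = encode_windows("ACD", size=3)
print(len(inputs), len(inputs[0]))
```

Each vector would then be fed to a network whose output is the predicted structural state (for example helix, strand or coil) of the residue at the centre of the window.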

Example: Sequence Annotation of The Prader-Willi and Angelman Syndromes Critical Region on Human Chromosome 15 (http://www.genome.ucsc.edu)