Bioinformatics: Knowledge-representation in molecular biology
Sándor Pongor
Protein Structure and Bioinformatics, ICGEB, Trieste
Representation of biological knowledge
Source: NCBI
[Figure: six panels plotted against year, 1965-1995. Panels (y-axis): BIBLIOGRAPHY (articles); NUCLEOTIDE SEQUENCES (sequences); BIBLIOGRAPHY-GENETICS (articles); PROTEIN SEQUENCES (sequences); PROTEIN 3D STRUCTURES (structures); MAPPED HUMAN GENES (genes). Database labels: SWISS-PROT, GenBank, PDB. Example sequence label: PKRRSARLSA.]
Bioinformatics milestones 1
1962 - Pauling’s theory of molecular evolution
1967 - Margaret Dayhoff's Atlas of Protein Sequences
1970 - Needleman-Wunsch algorithm
1977 - DNA sequencing and software to analyze it (Staden)
1981 - The concept of a sequence motif (Doolittle)
1982 - Phage lambda genome
1983 - Database search (Wilbur-Lipman)
1985 - FASTP/FASTN: fast sequence similarity searching
1987 - Sequence profiles
1987 - EMBL, GenBank, Swiss-Prot databases
Bioinformatics milestones 2
1988 - National Center for Biotechnology Information (US)
1988 - EMBnet network for database distribution
1990 - BLAST: fast sequence similarity searching
1991 - EST: expressed sequence tag sequencing
1993 - Sanger Centre, Hinxton, UK
1994 - EMBL European Bioinformatics Institute, Hinxton, UK
1995 - First bacterial genomes
1996 - Yeast genome
1997 - PSI-BLAST
1998 - Worm (multicellular) genome
2000+ - The rice and human genomes. Microarrays, high throughput methods, new generation sequencing…
The ingredients
Data collection techniques (DNA sequencing, protein sequencing, microarrays)
Theoretical milestones (concepts of DNA structure, protein structure, evolution)
Algorithms and programs (BLAST, FASTA)
Databases
Institutions
Complex genomic and high throughput data
NCBI, Washington DC
EBI, Hinxton, UK
The evolution of bioinformatics as seen in the 90’s
??
Bioinformatics is an approach to biology…
Systems theory
Cognitive sciences
[Diagram labels: COGNITIVE SCIENCE, BIOINFORMATICS, BIOLOGICAL DATA, INFORMATICS]
Model, description and visualization
Why is bioinformatics important?
“A paradigm shift in biology: from data collection to data processing”
“Biotechnology is the industrial use of biological information”
Lee Hood, in The Economist, 1997
Walter Gilbert, Nature, 1991
Current trends
Massive data processing
Massive generation of data: sequences (genomics), functions (functional genomics) and structures (structural genomics)
Interpretation of data: data mining, data warehousing techniques
Informatics strike back...
Structural genomics
Classify proteins (Database of protein motifs)
Choose and express representative proteins from all families
Determine structure by X-ray or NMR
Predict the rest by homology modelling
Tom Terwilliger, Los Alamos National Labs
Gu et al, 1999
Bioinformatics
Functional genomics
Sequence complete genome
Identify protein coding regions
Identify unique genes
Gene knockout
Functional analysis (phenotype, detailed functional characterization..)
Structural studies, drug development
Bioinformatics
acaattgtaataggcgaacatgttacgcaaagtggtattgaggaacattgtaacaacaattgtaataggcgaacatgtcagtacaagtggtattgaggaacattgtaacaacaattgtaataggcgaacaatatgttacaagtggtattgaggaactacattgtaacaacaattgtaataggcgaacatgttacaagtggtattgaggaacattgtaacaacaattgtaataggcgaacatgttacgcaaagtggtattgaggaacattgtaacaacaattgtaataggcgaacatgtcagtacaagtggtattgaggaacattgtaacaacaattgtaataggcgaacaatatgttacaagtggtattgaggaactacattgtaacaacaattgtaataggcattataagattat
aattgtaataggcattataagattatgcaaagtggtattgaggaacattgtaacaacaattgtaataggcgaacatgtcagtacaagtggtattgaggaacattgtaacaacaattgtaataggcgaacaatatgttacaagtggtattgaggaactacattgtaacaacaattgtaataggcgaacatgttacaagtggtattgaggaacattgtaacaacaattgtaataggcgaacatgttacgcaaagtggtattgaggaacattgtaacaacaattgtaataggcgaacatgtcagtacaagtggtattgaggaacattgtaacaacaattgtaataggcgaacaatatgttacaagtggtattgaggaactacattgtaacaacaattgtaataggcattataagattat
New data representations
Data (property) visualization
What bioinformatics is…
USER
Data Analysis Interpretation
Processing of raw sequence data &
instrument output
Database maintenance
Biocomputing, biomathematics
Data management
Infrastructure
Research
Bioinformatics: managing information for the life sciences
For: Biomedicine, Agriculture
In: Academic and Industrial Research and Development, Medical Practice
Scientific Infrastructure (service)
Advanced Informatics (research)
Education (biologists vs. informaticians)
An independent field of study but a general approach to biology
Current challenges – the three gaps
Understanding (“annotating”) new data: “annotational gap”
Translating data to practice: personalized medicine, epidemics… “translational gap”
Making users (biologists, medical doctors) aware of what is there and how to use it… (“communication gap”)
Protein Structure and Bioinformatics
Established as a resource group for protein chemistry and protein engineering
In charge of bioinformatics services since 1991
Research projects on bioinformatics, structural biology, systems modeling
Currently includes 12 researchers and students
ICGEB bioinformatics
Biological computing service for 800 users from 47 countries
2-3 training courses per year, 1400+ students in 18 years...
Methods development (classification, machine learning)
WWW-services: DNA-tools, protein domain identification
EMBnet: A world wide network of bioinformatics
32 national nodes, 35,000 registered users.
12 specialist nodes including all major European database producers.
Includes China, India, Australia
Education: High level courses organised in member countries. WWW-tutorials.
A coordinated network of bioinformatics services, a global technical and educational resource
of bioinformatics centres world wide
www.embnet.org
Bioinformatics summer course
- One of the oldest continuous teaching traditions in Europe, over 1300 students since 1991
- Introduction to theory and practice of bioinformatics
- Contact with the major centers (NCBI, EBI, SwissProt, KEGG)
Protein Structure and Bioinformatics
A take-home message
Bioinformatics is a general approach (paradigm) in biology today.
A bioinformatician has to understand
The biological question, the biological model
The data-collection technology, the data model
The mathematics/statistics of data-evaluation
This is why we have this course
Thank you for your attention…
An overview of bioinformatics
History and development
Model, description and visualization
• Sequences
• 3D structures
• Networks
• Text (abstracts)
Similarity and classification:
• similarity measures (structured, unstructured)
• database search
• consensus descriptions
Integrated resources
The subjects: Molecular structures
MARTKQTARKSTGGKAPRKQLATKAARKSA
Sequences
CIPKWNRCGPKMDGVPCCEPYTCTSDYYGNCS
Extended sequences (e.g. disulphide-topologies)
Domain-cartoons (sec. str. cartoons)
Diagrams (hydrophobicity plots, helical circles) 3D cartoons
3D structures
A structural model
Structure
Substructures Relationships
Entity-relationship model Pongor, Nature, 1987
Substructures, relations, rules = ontology
Structures As Database Records
Identification
Name of protein
Organism
Function
Cross-references
...
Domain structure
Sec. structure
Disulphides
….
Sequence (structure)qfinetdttvivtwtpprarivgyrltvgllseegdepqyldlpstatsvnipdllpgrkytvnvyeiseegeqnlilstsqttapdappdptvdqvddtsivvrwsrprapitgyrivyspsvegsstelnlpetansvtlsdlqpgvqynitiyaveenqestpvfiqqettgvprsdkvppprdlqfvevtdvkitimwtppespvtgyrvdvipvnlpgehgqrlpvsrntfaevtglspgvtyhfkv
ANNOTATIONS
SEQUENCE OR STRUCTURE
CIPKWNRCGPKMDGVPCCEPYTCTSDYYGNC
Database record, fields
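A record of this kind (annotation fields plus the sequence) can be sketched as a small data structure. This is only an illustration, assuming Python; the class name `SequenceRecord`, the field names, the identifier and the disulphide pairing are hypothetical and do not reproduce the format of any real database.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SequenceRecord:
    """A database record: annotation fields plus the sequence itself."""
    identification: str
    protein_name: str
    organism: str
    function: str = ""
    cross_references: List[str] = field(default_factory=list)
    # Disulphide bridges as pairs of 1-based residue positions.
    disulphides: List[Tuple[int, int]] = field(default_factory=list)
    sequence: str = ""

    def cysteine_positions(self) -> List[int]:
        """1-based positions of all cysteine residues in the sequence."""
        return [i for i, aa in enumerate(self.sequence.upper(), start=1)
                if aa == "C"]

    def check_disulphides(self) -> bool:
        """True if every annotated bridge connects two cysteines."""
        cys = set(self.cysteine_positions())
        return all(a in cys and b in cys for a, b in self.disulphides)


# The sequence is the example shown on the slide; every annotation value
# below is a hypothetical placeholder.
record = SequenceRecord(
    identification="EXAMPLE_1",
    protein_name="example protein",
    organism="unknown",
    disulphides=[(1, 18)],  # hypothetical pairing, for illustration only
    sequence="CIPKWNRCGPKMDGVPCCEPYTCTSDYYGNC",
)

print(record.cysteine_positions())
print(record.check_disulphides())
```

Keeping the annotations in named fields lets a program cross-check them against the sequence, as `check_disulphides` does here.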
The subjects of bioinformatics
Models (knowledge)
Stored data = descriptions for computers
Visualization, text = simplified descriptions for humans
SEQUENCES
SEQUENCES
Model: Chemical structure
Description: Series of characters
Simplified and/or extended visualization
IFPPVPGP
domain1 Binding site
Sequences as language
qfinetdttvivtwtpprarivgyrltvgllseegdepqyldlpstatsvnipdllpgrkytvnvyeiseegeqnlilstsqttapdappdptvdqvddtsivvrwsrprapitgyrivyspsvegsstelnlpetansvtlsdlqpgvqynitiyaveenqestpvfiqqettgvprsdkvppprdlqfvevtdvkitimwtppespvtgyrvdvipvnlpgehgqrlpvsrntfaevtglspgvtyhfkvfavnqgreskpltaqqatkldaptnlqfinetdttvivtwtpprarivgyrltvgltrggqpkqynvgpaasqyplrnlqpgseyavslvavkgnqqsprvtgvfttlqplgsiphyntevtettivitwtpaprigfklgvrpsqggeaprevtsesgsivvsgltpgveyvytisvlrdgqerdapivkkvvtplspptnlhleanpdtgvltvswersttpditgyritttptngqqgysleevvhadqssctfenlspgleynvsvytvkddkesvpisssfvvswvsasdtvsgfrveyelseegdepqyldlpstatsvnipdllpgrkytvnvyeisee
Query (name, length, self-score)
From To
HSP
Pattern
Score
L HSP, j
L HSP, i
From To
Subject (name, length, self-score)
LANGUAGE
Character strings, computer-languages, Chomsky et al, etc.
Alignments
3D STRUCTURES
Chimie dans l’espace
Van ’t Hoff, 1852-1911
1898
Some molecules are more equal than others…
…“This figure is purely diagrammatic. The two ribbons symbolize the two phosphate-sugar chains, and the horizontal rods the pairs of the bases holding the chains together. The vertical line marks the fibre axis”
Protein models
3D OBJECTS
3D structures
Model: 3D chemical structures
Description: 3D coordinates
Simplified and/or extended visualization
(xi, yi, zi)n
!!!??
Surface, backbone
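To make the "description = 3D coordinates" idea concrete, the sketch below stores a structure as a list of (x, y, z) triples and derives two simple quantities from it. It assumes Python; the three coordinate triples are hypothetical values chosen for easy arithmetic, not atoms of a real molecule.

```python
import math

# Hypothetical coordinates (x_i, y_i, z_i), i = 1..n, one triple per atom.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]


def distance(a, b):
    """Euclidean distance between two points in 3D."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distance_matrix(points):
    """All pairwise distances; a coordinate-frame-independent description."""
    return [[distance(a, b) for b in points] for a in points]


def centroid(points):
    """Geometric centre of the point set."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


for row in distance_matrix(coords):
    print(row)
print(centroid(coords))
```

The distance matrix does not change when the molecule is rotated or translated, which is one reason simplified views are often derived from it rather than from the raw coordinates.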
NETWORKS
Small molecules – classical graphs
Loschmidt, 1861 Kekulé, 1865
Crum Brown, 1861 Cayley, 1872
Van’t Hoff, 1898
TOPOLOGIES, GRAPHS
Genomes, assemblies
Entity-relationship models
Topological meta-models
Similarity group, Neighbourhood
Genome, Metabolic pathway
Genetic network, Food network
Tree-hierarchy, Complexes
TOPOLOGIES, GRAPHS
The transcription regulatory networks
+ (up)
- (down)
E. coli S. cerevisiae
TOPOLOGIES, GRAPHS
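A regulatory network with up (+) and down (-) regulation can be described as a directed graph with signed edges. The following is a minimal sketch, assuming Python and a plain dictionary as the graph structure; the gene names are hypothetical and are not taken from the E. coli or S. cerevisiae networks.

```python
# Each edge is (regulator, target, sign); "+" = up, "-" = down regulation.
# Gene names are hypothetical placeholders.
edges = [
    ("geneA", "geneB", "+"),
    ("geneA", "geneC", "-"),
    ("geneB", "geneC", "+"),
]


def build_network(edge_list):
    """Adjacency mapping: regulator -> {target: sign}."""
    network = {}
    for regulator, target, sign in edge_list:
        if sign not in ("+", "-"):
            raise ValueError(f"unknown sign: {sign!r}")
        network.setdefault(regulator, {})[target] = sign
        network.setdefault(target, {})  # make sure targets are nodes too
    return network


def out_degree(network):
    """Number of targets regulated by each node."""
    return {node: len(targets) for node, targets in network.items()}


def in_degree(network):
    """Number of regulators acting on each node."""
    degree = {node: 0 for node in network}
    for targets in network.values():
        for target in targets:
            degree[target] += 1
    return degree


network = build_network(edges)
print(out_degree(network))
print(in_degree(network))
```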
TEXTS (article abstracts)
The language of bibliographies
LANGUAGE
Structures As Database Records
Identification
Name of protein
Organism
Function
Cross-references
...
Domain structure
Sec. structure
Disulphides
….
Sequence (structure)qfinetdttvivtwtpprarivgyrltvgllseegdepqyldlpstatsvnipdllpgrkytvnvyeiseegeqnlilstsqttapdappdptvdqvddtsivvrwsrprapitgyrivyspsvegsstelnlpetansvtlsdlqpgvqynitiyaveenqestpvfiqqettgvprsdkvppprdlqfvevtdvkitimwtppespvtgyrvdvipvnlpgehgqrlpvsrntfaevtglspgvtyhfkv
ANNOTATIONS
SEQUENCE OR STRUCTURE
CIPKWNRCGPKMDGVPCCEPYTCTSDYYGNC
Database record, fields
Keyword collections, ontologies, etc.
Texts (abstracts)
Model: ??
Description: structured files (records, fields), standardized language
Simplified and/or extended visualization
Models
A structural model
Structure
Substructures Relationships
Entity-relationship mPongor, Nature, 19
SEQUENCES 3-D NETWORKS
tassfvvswvsasdtvsgfrveyelseegdepqyldlpstatsvnipdllpgrkytvnvyeiseegeqnlilstsqttapdappdptvdqvddtsivvrwsrprapitgyrivyspsvegsstelnlpetansvtlsdlqpgvqynitiyaveenqestpvfiqqettgvprsdkvppprdlqfvevtdvkitimwtppespvtgyrvdvipvnlpgehgqrlpvsrntfaevtglspgvtyhfkvfavnqgreskpltaqqatkldaptnlqfinetdttvivtwtpprarivgyrltvgltrggqpkqynvgpaasqyplrnlqpgseyavslvavkgnqqsprvtgvfttlqplgsiphyntevtettivitwtpaprigfklgvrpsqggeaprevtsesgsivvsgltpgveyvytisvlrdgqerdapivk
TEXT
An overview of bioinformatics
History and development
Model, description and visualization
• Sequences
• 3D structures
• Networks
• Text (abstracts)
Similarity and classification:
• similarity measures (structured, unstructured)
• database search
• consensus descriptions
Integrated resources
The concept of similarity I
...easier if modular
Shared parts Shared context
The concept of similarity II
…Easy for humans, hard for computers
Multiple Objects
Similarity groups or neighborhoods
Metabolic pathways
Subunit structures, ligands
Genomes
Evolutionary trees
Trajectories
CGPK-MDGVPCCEPY
CGGQNWSGPTCCASG
CSPTSYN---CCR--
CSRLMY---DCCT--
CIPYYL---DCCEPL
Multiple alignments
Structural similarity
Context (function)
Shared context
Similarity of molecules
Shared parts
Shared relations
Quantitative comparison
Unstructured models
Typical form: numbers, lists, vectors (x1, x2, …, xn)
Similarity score
Clustering, classification etc.
Structured models
Typical form: sequences, networks etc.
Alignment (matching)
Similarity score
Clustering, classification etc.
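For the unstructured case, a similarity score can be computed directly on vectors, with no alignment step. The sketch below assumes Python and uses amino-acid composition as the vector description and cosine similarity as the score; both are illustrative choices made for this example, not the measure prescribed by the lecture.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence, alphabet=AMINO_ACIDS):
    """Unstructured description: fraction of each residue type (order is lost)."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in alphabet)
    return [counts[a] / total if total else 0.0 for a in alphabet]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Two sequences shown earlier in the slides.
seq1 = "CIPKWNRCGPKMDGVPCCEPYTCTSDYYGNCS"
seq2 = "MARTKQTARKSTGGKAPRKQLATKAARKSA"

print(cosine_similarity(composition(seq1), composition(seq2)))
```

Because composition discards residue order, two different sequences with the same residue counts get a score of 1.0; this is the price of an unstructured description.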
Quantification of sequence similarity: sequence alignment and its scoring
Mismatch Gap
Range of Alignment
ATTGTCAAAGACTTGAGCTGATGCAT
|||| ||| ||||
GGCAGACATGA-CTGACAAGGGTATCG
Score = sum of the contributions of matches, minus penalties for mismatches and gaps
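The scoring rule can be sketched in a few lines. The example below assumes Python and a simple scheme (match +1, mismatch -1, gap -2); these values and the short test sequences are assumptions for illustration, whereas real programs use substitution matrices and more elaborate gap penalties. The second function is a score-only version of the Needleman-Wunsch global alignment listed among the milestones.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-2):
    """Score two already-aligned, equal-length strings column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap        # column containing a gap
        elif a == b:
            score += match      # identical residues
        else:
            score += mismatch   # different residues
    return score


def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b (dynamic programming)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Six matches and one gap: 6 * 1 + 1 * (-2)
print(score_alignment("GATTACA", "GA-TACA"))
# The optimal global alignment of the ungapped sequences
print(needleman_wunsch_score("GATTACA", "GATACA"))
```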
Substructure identity ~ similarity
“The similarity of objects can be best described as partial identities of components and relationships”
Erich Goldmeier, The similarity of perceived forms, 1936
Twilight zone
Using similarity: Comparing one sequence with a group (database)
Ranked list of best similarities
RANK  SEQUENCE       SCORE   DESCRIPTION
1     SWISSALL:IAAI  457.36  ALPHA-AMYLASE INHIBITOR AAI. 2/9
2     SWISSALL:O426  152.82  CELLULOSE BINDING PROTEIN
3     SWISSALL:GUX   145.77  EXOGLUCANASE I PRECURSOR
4     SWISSALL:Q126  145.66  CELLULASE (EC 3.2.1.91)
Similarities??
EXPECTation Threshold (E parameter)
    |
    V   Observed Counts-->
  10000 6336 1688 |============================================================
   6310 4648 1618 |=========================================================
   3980 3030  886 |===============================
   2510 2144  706 |=========================
   1580 1438  438 |===============
   1000 1000  272 |=========
    631  728  185 |======
    398  543  141 |=====
    251  402  103 |===
    158  299   63 |==
    100  236   43 |=
   63.1  193   15 |:
   39.8  178   18 |:
   25.1  160   17 |:
   15.8  143    7 |:
 >>>>>>>>>>>>>>>>>>>>> Expect= 10.0, Observed= 136 <<<<<<<<<<<<<<<<<
   10.0  136    2 |:
   6.31  134    3 |:
   3.98  131    2 |:
   2.51  129    2 |:
   1.58  127    0 |
   1.00  127    1 |:
   0.63  126    0 |
   0.40  126    4 |:
   0.25  122    0 |
   0.16  122    0 |
   0.10  122    0 |
  0.063  122    0 |
  0.040  122    0 |
  0.025  122    0 |
  0.016  122    1 |:
  0.010  121    0 |
 0.0063  121    1 |:
 0.0040  120    0 |
 0.0025  120    1 |:
BLAST program
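Comparing one query with a database amounts to scoring the query against every entry and sorting the results into a ranked list. The sketch below assumes Python and uses the number of shared 3-letter words as a stand-in score; this is a deliberate simplification and not the BLAST algorithm or its statistics. The entry names are hypothetical; the sequences are examples that appear elsewhere in these slides.

```python
def kmers(sequence, k=3):
    """Set of all overlapping words of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmer_score(query, subject, k=3):
    """Crude similarity: number of distinct k-letter words in common."""
    return len(kmers(query, k) & kmers(subject, k))


def search(query, database, k=3, top=None):
    """Ranked list of (name, score), best first; ties broken by name."""
    hits = [(name, shared_kmer_score(query, seq, k))
            for name, seq in database.items()]
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits[:top]


# Hypothetical entry names; sequences taken from the slides.
database = {
    "entry1": "CIPKWNRCGPKMDGVPCCEPYTCTSDYYGNC",
    "entry2": "MARTKQTARKSTGGKAPRKQLATKAARKSA",
    "entry3": "PKRRSARLSA",
}

for name, score in search("CGPKMDGVPCCEPY", database):
    print(name, score)
```

A ranked list alone does not say where real similarity ends; deciding that requires a statistical threshold such as the E parameter in the histogram above.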
Using similarity: comparing a group with itself
Similarity group or neighbourhood
CGPK-MDGVPCCEPY
CGGQNWSGPTCCASG
CSPTSYN---CCR--
CSRLMY---DCCT--
CIPYYL---DCCEPL
Multiple alignment
Mathematical consensus for database search
Regular expressions
Consensus sequence
Frequency matrix
Markov chains
Neural networks
etc.
Publish
Nature
CLUSTAL program
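Three of the consensus descriptions listed above (frequency matrix, consensus sequence, regular expression) can be derived from the multiple alignment on this slide. The sketch assumes Python; the majority threshold of 0.5, the use of "x" for variable columns and the way the regular expression is built are illustrative choices, not a published method.

```python
import re
from collections import Counter

# The five aligned rows from the slide ("-" marks a gap).
rows = [
    "CGPK-MDGVPCCEPY",
    "CGGQNWSGPTCCASG",
    "CSPTSYN---CCR--",
    "CSRLMY---DCCT--",
    "CIPYYL---DCCEPL",
]


def frequency_matrix(alignment):
    """One Counter of symbols per alignment column."""
    return [Counter(column) for column in zip(*alignment)]


def consensus(alignment, threshold=0.5):
    """Majority symbol per column, or 'x' where no symbol exceeds threshold."""
    result = []
    for counts in frequency_matrix(alignment):
        symbol, count = counts.most_common(1)[0]
        result.append(symbol if count / len(alignment) > threshold else "x")
    return "".join(result)


def conserved_columns(alignment):
    """0-based indices of columns with one identical non-gap residue."""
    return [i for i, counts in enumerate(frequency_matrix(alignment))
            if len(counts) == 1 and "-" not in counts]


def consensus_regex(alignment):
    """Conserved residues joined by variable-length wildcards."""
    parts, previous = [], None
    for col in conserved_columns(alignment):
        if previous is not None:
            lengths = [sum(1 for ch in row[previous + 1:col] if ch != "-")
                       for row in alignment]
            if max(lengths) > 0:
                parts.append(f".{{{min(lengths)},{max(lengths)}}}")
        parts.append(alignment[0][col])
        previous = col
    return "".join(parts)


print(consensus(rows))
print(consensus_regex(rows))
```

A pattern like this can then be run against ungapped database sequences with `re.search`, which is the simplest form of a consensus-based database search.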
Similarities: a practical overview
tassfvvswvsasdtvsgfrveyelseegdepqyldlpstatsvnipdllpgrkytvnvyeiseegeqnlilstsqttapdappdptvdqvddtsivvrwsrprapitgyrivyspsvegsstelnlpetansvtlsdlqpgvqynitiyaveenqestpvfiqqettgvprsdkvppprdlqfvevtdvkitimwtppespvtgyrvdvipvnlpgehgqrlpvsrntfaevtglspgvtyhfkvfavnqgreskpltaqqatkldaptnlqfinetdttvivtwtpprarivgyrltvgltrggqpkqynvgpaasqyplrnlqpgseyavslvavkgnqqsprvtgvfttlqplgsiphyntevtettivitwtpaprigfklgvrpsqggeaprevtsesgsivvsgltpgveyvytisvlrdgqerdapivk
SEQUENCES 3D NETWORKS
Bulk “Glycine-rich” “α-helical” “scale-free”
Substructure-alignment
Motifs G-RR
(metabolic pathways)
PAPERS
“genomics”
same author, common
references
“Joe Doe, folding”
An overview of bioinformatics
History and development
Model, description and visualization
• Sequences
• 3D structures
• Networks
• Text (abstracts)
Similarity and classification:
• similarity measures (structured, unstructured)
• database search
• consensus descriptions
Integrated resources
Biological knowledge as a network of data
Text (keyword) Similarity
Taxonomic Similarity
Nucleotide Sequence Similarity
Protein Sequence Similarity
Structural Similarity
Nucleotide sequences
Protein sequences
3-D Structure
3-D Structure
Bibliography
Genomes
Phylogeny (Taxonomy)
actactgagaacat
MSLLDHRGDRGD
The world according to a PC...
Source: NCBI
Search on a preprocessed, integrated database: the importance of a good neighbourhood
Unknown DNA query
+
DNA
Proteins
3D Structures
Literature, abstracts
Blast
Derived protein sequence
+
Blast Oops!
Models are human constructs...
THIS IS NOT A PIPE!
Models are human constructs...
THIS IS NOT A MOLECULE
The central dogma:
DNA, RNA, Protein
Dogma, paradigm, mythology
RNA
Metabolites
DNA Protein
Growth rateExpression
Interactions
Polymers: Initiate, elongate, terminate, fold, modify, localize, degrade
New central dogma: Self-assembly, catalysis, replication, networks
Evolution + Self assembly, Systems biology
Summary of topics discussed
History and development
Model, description and visualization
• Sequences
• 3D structures
• Networks
• Text (abstracts)
Similarity and classification:
• similarity measures (structured, unstructured)
• database search
• consensus descriptions
Integrated resources
Summary of the introduction
Bioinformatics is the science of biological information or rather a computer-based approach to biological problems.
All kinds of biological data are structures defined with entities and relationships (metabolites, genes, networks).
Typical tasks: Similarity search, categorization and clustering
Simultaneous handling of many, complex data-types
On-line help to this lecture
Bioinformatics tutorials on-line
http://www.ebi.ac.uk/2can/home.html
ICGEBnet
http://www.icgeb.org/~netsrv/
The Trieste bioinformatics course
http://www.icgeb.org/~netsrv/netcourse.html
Reading about bioinformatics
In depth introduction
Genomics research problems Math principles
Evolutionary principles
Computer methods in Molecular Biology
Trieste, June 21 - 26, 2010
Theoretical overview: Sándor Pongor
Sequence database searching, theory and practice (Dave Judge and Jack Leunissen)
Nucleic acid databases, Medline, Pubmed (David Landsman)
Functional genomics databases, KEGG (Minoru Kanehisa)
EBI Services (Jim Watson)
Protein databases, Swissprot, Prosite (Marie-Claude Blatter)
Genome analysis (Martin Bishop)