Introduction to Bioinformatics
Dr. Taysir Hassan A. Soliman
Lecturer, Information Systems Dept.
Faculty of Computer and Information Sciences
Ain Shams University
Dr. Taysir Hassan A. Soliman - Assiut University
Transcript
Agenda
•Definition of Bioinformatics
•Progress of the Human Genome Project
•The NCBI Data Model
•Protein Databases
•Useful texts
Progress of the Human Genome Project from June 2000 to April 2003
Bioinformatics
Roughly, bioinformatics describes the use of computers to handle biological information.
In practice, bioinformatics = "The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information."
Distinction between Biomedical & Bioinformatics
Bioinformatics
As technologies improve, we are able to extract more and more information encoded in a genome.
[Figure labels: biological data, proteins, genes, complexes, pathways, whole cell, community, bio-complexity, organs]
Bioinformatics
1. What is Bioinformatics?
Bioinformatics is the application of computational techniques to organize, analyze, and understand the information associated with large macromolecules, as held in biological databases.
Why should I care?
•SmartMoney ranks bioinformatics as #1 among the next HotJobs
•Business Week 50 Masters of Innovation
•Jobs available, exciting research potential
•Important information waiting to be decoded!
Why is bioinformatics important?
•Supply/demand: few people are adequately trained in both biology and computer science
•Genome sequencing, microarrays, etc. lead to large amounts of data to be analyzed
•Leads to important discoveries
•Saves time and money
What skills are needed?
Well-grounded in one of the following areas:
•Computer science
•Molecular biology
•Statistics
Working knowledge and appreciation of the others!
In other words, bioinformatics integrates these areas.
So, what do computer scientists do for bioinformatics?
Computer scientists are responsible for INTEGRATING and ANALYZING all literature from both patents and publications in PubMed (MEDLINE) at NCBI.
12 million records
What else?
Computer scientists are responsible for developing tools for performing various operations, such as BLAST at NCBI.
Annotation of genes on the Saccharomyces Genome Database
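BLAST, mentioned above, searches for regions of local similarity between sequences. As a rough illustration of what "local similarity" means, the sketch below scores the best local alignment of two sequences with the Smith-Waterman dynamic programming recurrence. This is not BLAST's algorithm (BLAST is a faster heuristic), and the function name and the match, mismatch, and gap scores are assumptions chosen only for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: a local alignment may restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared fragment "ACG" gives three matches at 2 points each.
print(local_alignment_score("TTACGTTT", "GGACGGG"))  # 6
```

Only two rows of the matrix are kept because the sketch returns the score alone; recovering the aligned residues would require the full matrix and a traceback.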
Other Computer Scientists' Jobs
Prediction of:
•Evolution
•Protein folding
•Protein function
Computer scientists build mathematical models of these processes to infer relationships between components of complex biological systems.
Protein Databases
•BIND
•INTERACT
Query secondary databases over the Internet
Computational sequence analysis
Proteins: Prediction of biochemical function
•Relationships between DNA or amino acid sequence, 3D structure, and protein function
•Use of this knowledge for prediction of function, molecular modelling, and design (e.g., new therapies)
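The first link in the chain above, from DNA sequence to amino acid sequence, is mechanical and can be sketched in a few lines. The example below translates a coding DNA strand using the standard genetic code; the function name and the choice to stop at the first stop codon are assumptions made for illustration. Predicting 3D structure or function from the resulting protein sequence is the hard part and is not attempted here.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA strand into one-letter amino acid codes.

    Reads from position 0 in steps of three and stops at the first stop
    codon. Codons with unknown letters become 'X' (an illustrative choice).
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGAAATTTGGGTAA"))  # MKFG
```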
Protein name | Items
HXA7_HUMAN | Homeobox Protein Antennapedia Type Domain, Nucleus, Regulation of Transcription DNA-Dependent
O42504 | Homeobox Protein Antennapedia Type Domain, Nucleus, Regulation of Transcription DNA-Dependent
CN4B_HUMAN | 3’,5’-Cyclic Nucleotide Phosphodiesterase Domain, Signal Transduction
CN4D_RAT | 3’,5’-Cyclic Nucleotide Phosphodiesterase Domain, Signal Transduction
Examples of Generated Rules
Rid | Description | S | C
R100 | A protein which belongs to Family: Xeroderma pigmentosum group G/yeast RAD Superfamily most likely also contains Domain: 5'-3' exonuclease N- and I-domains | 32 | 100%
R200 | A protein which belongs to Family: G-protein coupled receptors family 3 (Metabotropic glutamate receptor-like) most likely also contains Domain: Receptor Family ligand binding region | 32 | 100%
R300 | A protein which belongs to Family: Mevalonate kinase most likely contains Domain: GHMP kinases putative ATP-binding domain | 15 | 100%
R400 | If a protein is associated with Biological process: defense response, it most likely contains Domain: Selectin (CD62E/L/P antigens) | 24 | 100%
R500 | If a protein is associated with Biological process: potassium transport, it most likely contains Domain: Potassium channel | 97 | 100%
R600 | If a protein is associated with Cellular component: extracellular, it most likely contains Domain: Complement C3a, C4a and C5a anaphylatoxin | 15 | 100%
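The rules above read like association rules of the form "annotation A implies annotation B". Assuming that S is a support count (the number of proteins carrying both annotations) and C is the confidence (the fraction of proteins with A that also have B), which the slides do not state explicitly, the two figures could be computed as in the sketch below. The function name and the toy protein records are hypothetical and are not taken from the lecture's data set.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support_count, confidence) for the rule antecedent -> consequent.

    Each record is the set of annotations attached to one protein.
    """
    with_antecedent = [r for r in records if antecedent in r]
    if not with_antecedent:
        return 0, 0.0
    support = sum(1 for r in with_antecedent if consequent in r)
    return support, support / len(with_antecedent)


# Hypothetical toy annotations, for illustration only.
FAMILY = "Family: Mevalonate kinase"
DOMAIN = "Domain: GHMP kinases putative ATP-binding domain"
proteins = [
    {FAMILY, DOMAIN},
    {FAMILY, DOMAIN},
    {FAMILY, DOMAIN},
    {DOMAIN},                          # has the domain but not the family
    {"Family: Other (hypothetical)"},  # unrelated protein
]

support, confidence = rule_stats(proteins, FAMILY, DOMAIN)
print(f"S={support}, C={confidence:.0%}")  # S=3, C=100%
```

Note that the rule is directional: in this toy set the reverse rule (domain implies family) has the same support but only 75% confidence, because one protein has the domain without the family.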
How about Hardware: The IBM Blue Gene Supercomputer
•Will take on the problem of protein folding.
•Protein folding is governed by basic rules of how atoms attract and repel each other, but the size of proteins (thousands of atoms) makes it a very difficult problem.
•Will be able to fold a protein of up to 300 amino acids, but this will take a whole year of supercomputing time!
•One quadrillion calculations per second, ~1,000 times faster than Deep Blue, which beat world chess champion Garry Kasparov in 1997.
•New architecture of more than one million CPUs, with 32 CPUs per chip, bundled with computer memory.
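The figures above can be combined into a quick back-of-the-envelope calculation. The sketch below uses only the numbers quoted on the slide (one quadrillion calculations per second, one year of run time, one million CPUs at 32 per chip, a 1,000-fold speedup over Deep Blue); the variable names are illustrative, a 365-day year is assumed, and the Deep Blue rate is simply what the slide's ratio implies rather than an independent measurement.

```python
OPS_PER_SECOND = 10**15              # one quadrillion calculations per second
SECONDS_PER_YEAR = 365 * 24 * 3600   # assuming a 365-day year

# Total work for one protein fold, if it takes a full year at peak rate.
total_ops = OPS_PER_SECOND * SECONDS_PER_YEAR

# Chip count for one million CPUs at 32 CPUs per chip (a lower bound,
# since the slide says "more than" one million).
CPUS = 1_000_000
CPUS_PER_CHIP = 32
chips = CPUS // CPUS_PER_CHIP

# Deep Blue's rate as implied by the slide's "~1,000 times faster".
implied_deep_blue_ops = OPS_PER_SECOND // 1000

print(f"{total_ops:.4e} calculations per fold")      # 3.1536e+22
print(f"{chips} chips")                              # 31250
print(f"{implied_deep_blue_ops:.0e} per second")     # 1e+12
```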