Master’s course Bioinformatics Data Analysis and Tools, Lecture 1: Introduction. Centre for Integrative Bioinformatics, FEW/FALW, heringa@few.vu.nl

Dec 19, 2015

Page 1

Master’s course

Bioinformatics Data Analysis and Tools

Lecture 1: Introduction

Centre for Integrative Bioinformatics, FEW/FALW

heringa@few.vu.nl

Page 2

Course objectives

• There are two extremes in bioinformatics work:

– Tool users (biologists): know which buttons to press and know the biology, but have no clue what happens inside the program

– Tool shapers (informaticians): know the algorithms and how the tool works, but have no clue about the biology

Both extremes are dangerous; we need a breed of scientists who can do both.

Page 3

Course objectives

• How do you become a good bioinformatics problem solver?

– You need to know the basic modes of analysis and data mining

– You need to know the background of some important analysis and prediction techniques (e.g. statistical thermodynamics)

– You need to know what has been done, what can be done and what cannot

• Is this enough to become a creative tool developer?

– You need to like doing it

– Experience helps

Page 4

Contents (tentative dates)

No. | Week | Date | Lecture title | Lecturer
1 | 19 | 07/05/07 | Introduction | Jaap Heringa
2 | 19 | 10/05/07 | Microarray data analysis | Jaap Heringa
3 | 20 | 14/05/07 | Molecular simulations & sampling techniques | Anton Feenstra
4 | 21 | 22/05/07 | Introduction to Statistical Thermodynamics I | Anton Feenstra
5 | 21 | 24/05/07 | Introduction to Statistical Thermodynamics II | Anton Feenstra
6 | 23 | 05/06/07 | Machine learning | Elena Marchiori
7 | 23 | 07/06/07 | Clustering algorithms | Bart van Houte
8 | 24 | 11/06/07 | Support vector machines and feature selection in bioinformatics | Elena Marchiori
9 | 24 | 12/06/07 | Databases and parsing | Sandra Smit
10 | 24 | 14/06/07 | Ontologies | Frank van Harmelen
11 | 25 | 18/06/07 | Benchmarking, parallelisation & grid computing | Thilo Kielmann
12 | 25 | 19/06/07 | Method development I: Protein domain prediction | Jaap Heringa
13 | 25 | 21/06/07 | Method development II | Jaap Heringa

Page 5

At the end of this course…

• You will have seen a couple of algorithmic examples

• You will have an idea of the methods used in the field

• You will have a firm grounding in the physics and thermodynamics behind many processes and methods

• You will have an idea of, and some experience with, what it takes to shape a bioinformatics tool

Page 6

Bioinformatics

“Studying informatic processes in biological systems”

(Hogeweg)

Applying algorithms and mathematical formalisms to biology (genomics)

“Information technology applied to the management and analysis of biological data” (Attwood and Parry-Smith)

Page 7

This course

• General theory of crucial algorithms (GA, NN, HMM, SVM, etc.)

• Method examples

• Research projects within our own group

– Repeats

– Domain boundary prediction

• Physical basis of biological processes and of (stochastic) tools
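Of the algorithms named above, the hidden Markov model (HMM) is one of the easiest to illustrate in a few lines. Below is a minimal sketch of standard log-space Viterbi decoding, which recovers the most probable hidden-state path for an observed sequence. The two-state model (hypothetical "AT-rich" and "GC-rich" states) and all of its probabilities are illustrative assumptions, not parameters from this course or from any published tool.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for `obs` under an HMM, in log space."""
    # v[s]: best log-probability of any path ending in state s at the current position
    v = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []  # one dict per position t >= 1: state -> best predecessor
    for symbol in obs[1:]:
        new_v, ptr = {}, {}
        for s in states:
            # pick the predecessor r maximising v[r] + log P(r -> s)
            best_prev = max(states, key=lambda r: v[r] + math.log(trans_p[r][s]))
            new_v[s] = (v[best_prev] + math.log(trans_p[best_prev][s])
                        + math.log(emit_p[s][symbol]))
            ptr[s] = best_prev
        v = new_v
        backpointers.append(ptr)
    # trace back from the best final state
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, v[last]


# Hypothetical two-state model: "sticky" states, each preferring two nucleotides
STATES = ("AT", "GC")
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
EMIT = {"AT": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
        "GC": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4}}

if __name__ == "__main__":
    path, log_prob = viterbi("AAAAAGGGGG", STATES, START, TRANS, EMIT)
    print(path, log_prob)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied over long sequences.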

Page 8

[Figure: scientific disciplines ordered by scale, from large/external (integrative) to small/internal (individual): Planetary Science, Cultural Anthropology, Population Biology, Sociology, Sociobiology, Psychology, Systems Biology, Biology, Medicine, Molecular Biology, Chemistry, Physics; Bioinformatics is labelled at both ends of the scale]

Page 9

Genomic Data Sources

• DNA/protein sequence

• Expression (microarray)

• Proteome (X-ray, NMR, mass spectrometry, PPI)

• Metabolome

• Physiome (spatial, temporal)

Integrative bioinformatics

Page 10

Protein structural data explosion

Protein Data Bank (PDB): 14,500 structures (6 March 2001): 10,900 X-ray crystallography, 1,810 NMR, 278 theoretical models, others...

Page 11

Bioinformatics inspiration and cross-fertilisation

[Figure: Bioinformatics linked to Mathematics/Statistics, Computer Science/Informatics, Biology/Molecular biology, Medicine, Chemistry and Physics]

Page 12

Algorithms in bioinformatics

• string algorithms

• dynamic programming

• machine learning (NN, k-NN, SVM, GA, ...)

• Markov chain models

• hidden Markov models

• Markov chain Monte Carlo (MCMC) algorithms

• stochastic context-free grammars

• EM algorithms

• Gibbs sampling

• clustering

• tree algorithms (suffix trees)

• graph algorithms

• text analysis

• hybrid/combinatorial techniques

and more…
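Dynamic programming is the workhorse behind pairwise sequence alignment. A minimal sketch of global alignment in the Needleman–Wunsch style follows. It assumes a deliberately simple scoring scheme (match +1, mismatch -1, linear gap penalty -2); these values are illustrative assumptions, whereas practical protein aligners use substitution matrices such as BLOSUM and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # F[i][j] = best score for aligning prefix a[:i] with prefix b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                          F[i - 1][j] + gap,                           # gap in b
                          F[i][j - 1] + gap)                           # gap in a

    # traceback from the bottom-right cell to recover one optimal alignment
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    score, top, bottom = needleman_wunsch("GATTACA", "GATACA")
    print(score)
    print(top)
    print(bottom)
```

The table fill takes O(nm) time and memory; when several alignments tie, the traceback order (diagonal first) decides which one is reported.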

Page 13

Joint international programming initiatives

• Bioperl: http://www.bioperl.org/wiki/Main_Page and http://bioperl.org/wiki/How_Perl_saved_human_genome

• Biopython: http://www.biopython.org/

• BioTcl: http://wiki.tcl.tk/12367

• BioJava: www.biojava.org/wiki/Main_Page

Page 14

Integrative bioinformatics @ VU

Studying informational processes at the biological system level

• From gene sequence to intercellular processes

• Computers are necessary

• We have biology, statistics, computational intelligence (AI), HTC, ...

• VUMC: microarray facility, cancer centre, translational medicine

• Enabling technology: new glue to integrate

• New integrative algorithms

• Goals: understanding cellular networks in terms of genomes; fighting disease (VUMC)

Page 15

Bioinformatics @ VU

Progression:

• DNA: gene prediction, predicting regulatory elements, alternative splicing

• mRNA expression

• Proteins: (multiple) sequence alignment, docking, domain prediction, PPI

• Metabolic pathways: metabolic control

• Cell-cell communication

Page 16

Fold recognition by threading: THREADER and GenTHREADER

[Figure: a query sequence is compared with each fold in a library (Fold 1, Fold 2, Fold 3, …, Fold N), yielding a compatibility score per fold]
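The threading scheme on this slide (score one query against every fold in a library, then rank the folds by compatibility) can be sketched as follows. This is a toy in the spirit of environment-based 3D profile methods, not how THREADER or GenTHREADER actually score. It assumes, hypothetically, that each fold is reduced to a string of environment labels (B = buried, E = exposed), that residue/environment preferences follow a made-up hydrophobicity rule, and that the query is placed without gaps.

```python
# Hypothetical toy model: these residues are treated as hydrophobic
HYDROPHOBIC = set("AVILMFWC")


def env_score(residue, env):
    """Hypothetical residue/environment compatibility (B = buried, E = exposed)."""
    hydrophobic = residue in HYDROPHOBIC
    if env == "B":
        return 1.0 if hydrophobic else -1.0  # buried sites favour hydrophobics
    return 1.0 if not hydrophobic else -0.5  # exposed sites favour polar residues


def thread(query, fold_envs):
    """Best ungapped placement score of the query on one fold's environment string."""
    if len(query) > len(fold_envs):
        return float("-inf")  # query does not fit on this fold
    best = float("-inf")
    for offset in range(len(fold_envs) - len(query) + 1):
        window = fold_envs[offset:offset + len(query)]
        best = max(best, sum(env_score(r, e) for r, e in zip(query, window)))
    return best


def rank_folds(query, library):
    """Compatibility score of the query against every fold, best first."""
    scores = [(name, thread(query, envs)) for name, envs in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical fold library: fold name -> environment label per structural position
LIBRARY = {"Fold 1": "EEBBBBEE", "Fold 2": "BBBBBBBB", "Fold 3": "EEEEEEEE"}

if __name__ == "__main__":
    for name, score in rank_folds("DLVIAK", LIBRARY):
        print(name, score)
```

Real threading scores also reward favourable residue-residue contacts and allow gaps, which turns the placement step into a dynamic programming problem rather than the simple sliding window used here.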

Page 17

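Pollutant recognition by microarray mapping:

[Figure: a query array is compared with reference profiles for known contaminants (Contaminant 1, Contaminant 2, Contaminant 3, …, Contaminant N), measured across conditions (Cond. 1, Cond. 2, Cond. 3, …, Cond. N), yielding a compatibility score per contaminant]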
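By analogy with threading, a query expression profile can be matched against a library of reference profiles and the candidates ranked. The sketch below assumes, as an illustration only, that the compatibility score is the Pearson correlation between the query array and each contaminant's reference profile over the same conditions; the slide does not specify the score, and the contaminant profiles are hypothetical values.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no pattern to correlate with
    return cov / (sx * sy)


def rank_contaminants(query, references):
    """Score the query array against every reference profile, best first."""
    scores = [(name, pearson(query, profile)) for name, profile in references.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical log-ratio profiles over four conditions (Cond. 1..4)
REFERENCES = {
    "Contaminant 1": [2.0, -1.0, 0.5, 0.0],
    "Contaminant 2": [-1.0, 2.0, 1.5, -0.5],
    "Contaminant 3": [0.0, 0.1, -0.2, 3.0],
}

if __name__ == "__main__":
    query_array = [-0.9, 2.1, 1.4, -0.4]
    for name, r in rank_contaminants(query_array, REFERENCES):
        print(f"{name}: r = {r:.3f}")
```

Correlation rewards a matching pattern of up- and down-regulation regardless of overall scale, which is one reason it is a common first choice for comparing expression profiles.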

Page 18

ENFIN WP4

• Functional threading

• From sequence to function

– Multiple alignment

– Secondary structure prediction, solvation prediction, conservation patterns, loop enumeration

Page 19

[Figure: active-site residues D, H and S linking structure (Struct) to function (Func), with a DB of active site descriptors]

Page 20

ENFIN WP5 - BioRange (Anton Feenstra)

• Protein-protein interaction prediction

• Mesoscopic modelling

• Soft-core Molecular Dynamics (MD)

– Fuzzy residues

– Fuzzy (surface) locations

Page 21

ENFIN WP6

• Silicon Cell: database of fully parametrized pathway models (differential equations), with a solver

• Jacky Snoep (Stellenbosch, VU/IBIVU)

• Hans Westerhoff (VU, Manchester)
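Pathway models of this kind are systems of differential equations for metabolite concentrations. As a minimal sketch, the code below integrates a hypothetical two-step pathway S → I → P with Michaelis–Menten rate laws using a classical fourth-order Runge–Kutta step. The pathway, rate laws and parameter values are illustrative assumptions, not a model taken from the Silicon Cell database.

```python
def pathway_rates(state, params):
    """d[S, I, P]/dt for the hypothetical pathway S -> I -> P."""
    s, i, p = state
    v1 = params["V1"] * s / (params["Km1"] + s)  # Michaelis-Menten rate of S -> I
    v2 = params["V2"] * i / (params["Km2"] + i)  # Michaelis-Menten rate of I -> P
    return (-v1, v1 - v2, v2)


def rk4_step(f, y, dt):
    """One classical Runge-Kutta (RK4) step for dy/dt = f(y)."""
    k1 = f(y)
    k2 = f(tuple(yi + 0.5 * dt * ki for yi, ki in zip(y, k1)))
    k3 = f(tuple(yi + 0.5 * dt * ki for yi, ki in zip(y, k2)))
    k4 = f(tuple(yi + dt * ki for yi, ki in zip(y, k3)))
    return tuple(
        yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    )


def simulate(y0, params, dt=0.01, steps=2000):
    """Integrate from y0 and return the trajectory as a list of (S, I, P) states."""
    def f(y):
        return pathway_rates(y, params)

    trajectory = [tuple(y0)]
    for _ in range(steps):
        trajectory.append(rk4_step(f, trajectory[-1], dt))
    return trajectory


# Hypothetical parameters (arbitrary concentration and time units)
PARAMS = {"V1": 1.0, "Km1": 0.5, "V2": 0.8, "Km2": 0.3}

if __name__ == "__main__":
    traj = simulate((5.0, 0.0, 0.0), PARAMS)
    s, i, p = traj[-1]
    print(f"t = 20: S = {s:.3f}, I = {i:.3f}, P = {p:.3f}")
```

Because every unit of S consumed reappears as I and then P, the total S + I + P stays constant; checking that invariant is a cheap sanity test for any such solver.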

Page 22

Where are important new questions?

Page 23

New neighbouring disciplines

• Translational Medicine

A branch of medical research that attempts to connect basic research more directly to patient care. Translational medicine is growing in importance in the healthcare industry, and its precise definition is still in flux. In drug discovery and development in particular, translational medicine typically refers to the "translation" of basic research into real therapies for real patients. The emphasis is on linking the laboratory and the patient's bedside without a disconnect between the two. This is often called the "bench to bedside" definition.

• Computational Systems Biology

Computational systems biology aims to develop and use efficient algorithms, data structures and communication tools to orchestrate the integration of large quantities of biological data, with the goal of modeling the dynamic characteristics of a biological system. Modeled quantities may include steady-state metabolic flux or the time-dependent response of signaling networks. Methods used include optimization, network analysis, graph theory, linear programming, grid computing, flux balance analysis, sensitivity analysis, dynamic modeling, and others.

• Neuroinformatics

Neuroinformatics combines neuroscience and informatics research to develop and apply the advanced tools and approaches that are essential for major advances in understanding the structure and function of the brain.

Page 24

Translational Medicine

• “From bench to bedside”

• Genomics data to patient data

• Integration

Page 25

Natural progression of a gene

Page 26

Functional Genomics: from gene to function

[Figure: Genome, Expressome, Proteome, Metabolome; tertiary structure (fold)]

Page 29

Systems Biology is the study of the interactions between the components of a biological system, and how these interactions give rise to the function and behaviour of that system (for example, the enzymes and metabolites in a metabolic pathway). The aim is to understand the system quantitatively and to be able to predict how it behaves over time.

• The interactions are nonlinear

• The interactions give rise to emergent properties, i.e. properties that cannot be explained by the individual components of the system alone

Page 30

Systems Biology

Understanding is often achieved through modeling and simulation of the system’s components and interactions.

Frequently, the ‘four Ms’ cycle is adopted:

Measuring

Mining

Modeling

Manipulating

Page 33

A system response

Apoptosis: programmed cell death

Necrosis: accidental cell death

Page 34

Neuroinformatics

• Understanding the human nervous system is one of the greatest challenges of 21st century science.

• Its abilities (perception, decision-making, cognition and reasoning) dwarf those of any man-made system.

• Neuroinformatics spans many scientific disciplines, from molecular biology to anthropology.

Page 35

Neuroinformatics

• Main research question: how do the brain and nervous system work?

• Main research activity: gathering neuroscience data and knowledge, and developing computational models and analytical tools for integrating and analysing experimental data, leading to improvements in existing theories of the nervous system and brain.

• Results for the clinic: neuroinformatics provides tools, databases, models and network technologies for clinical and research purposes in the neuroscience community and related fields.

Page 37

Bioinformatics @ VU

Qualitative challenges:

• High-quality alignments (alternative splicing)

• In-silico structural genomics

• In-silico functional genomics: reliable annotation

• Protein-protein interactions

• Metabolic pathways: assign the edges in the networks

• Fluxomics: quantitative description (through time) of fluxes through metabolic networks

• New algorithms

Page 38

Bioinformatics @ VU

Quantitative challenges:

• Understanding mRNA expression levels

• Understanding resulting protein activity

• Time dependencies

• Spatial constraints, compartmentalisation

• Are classical differential equation models adequate, or do we need more individual-based modeling (e.g. macromolecular crowding and activity at the oligomolecular level)?

• Metabolic pathways: calculate fluxes through time

• Cell-cell communication: tissues, hormones, innervations

We need ‘complete’ experimental data for a good biological model system in order to learn how to integrate.

Page 39

Bioinformatics @ VU

VUMC

• Neuropeptide – addiction

• Oncogenes – disease patterns

• Rheumatic diseases

Page 40

• Integrate data sources

• Integrate methods

• Integrate data through method integration (biological model)

Integrative bioinformatics

Page 41

Integrative bioinformatics: data integration

[Figure: Data → Algorithm → Biological interpretation (model), with the label ‘tool’]

Page 42

Integrative bioinformatics: data integration

[Figure: three data sources: Data 1, Data 2, Data 3]

Page 43

Integrative bioinformatics: data integration

[Figure: Data 1, Data 2 and Data 3 feeding Algorithm 1, 2 and 3, each leading to its own biological interpretation (model) 1, 2 and 3, with the label ‘tool’]

Page 44

“Nothing in Biology makes sense except in the light of evolution” (Theodosius Dobzhansky (1900-1975))

“Nothing in Bioinformatics makes sense except in the light of Biology”

Bioinformatics