Bioinformatics History and Introduction Luce Skrabanek ICB, WMC January 28, 2010 http://chagall.med.cornell.edu/BioinfoCourse/

Bioinformatics History and Introduction - Cornell Universitychagall.med.cornell.edu/BioinfoCourse/presentations201… ·  · 2010-01-30Introduction Luce Skrabanek ... storage, manipulation

What IS bioinformatics?

•  Current definitions vary widely:

–  The term bioinformatics is used to encompass almost all computer applications in biological sciences, but was originally coined in the mid-1980s for the analysis of biological sequence data. (Attwood and Parry-Smith, 1999)

–  The use of computers in solving information problems in the life sciences. Mainly, it involves the creation of extensive electronic databases on genomes, protein sequences, etc. Secondarily, it involves techniques such as the three-dimensional modeling of biomolecules and biologic systems. (21 Mar 1998, CancerWEB)

–  “I do not think all biological computing is bioinformatics, e.g. mathematical modelling is not bioinformatics, even when connected with biology-related problems. In my opinion, bioinformatics has to do with management and the subsequent use of biological information, in particular genetic information.” (Richard Durbin, Head of Informatics at the Sanger Center)

–  The storage, manipulation and analysis of biological information via computer science. Bioinformatics is an essential infrastructure underpinning biological research. (the Roslin Institute)

–  At the beginning of the “genomic revolution”, a bioinformatics concern was the creation and maintenance of a database to store biological information, such as nucleotide and amino acid sequences. […] The field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. (NCBI)

–  The application of computational sciences (computer science, mathematics, statistics) to advance research in the life sciences (agriculture, basic biology, medicine). (U. of Trieste)

Even job titles undecided

•  A bioinformaticist is an expert who not only knows how to use bioinformatics tools, but also knows how to write interfaces for effective use of the tools.
•  A bioinformatician, on the other hand, is a trained individual who only knows how to use bioinformatics tools without a deeper understanding.
•  Thus, a bioinformaticist is to *.omics as a mechanical engineer is to an automobile. A bioinformatician is to *.omics as a technician is to an automobile.

Bioinformatics Web

Not just “informatics”

•  Bioinformatics is the field of science in which biology, computer science, mathematics and information technology merge into a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned. There are three important sub-disciplines within bioinformatics:
–  the development of new algorithms and statistics with which to assess relationships among members of large data sets
–  the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures
–  the development and implementation of tools that enable efficient access and management of different types of information

•  Need to have biological knowledge to know what questions to ask

Bag of Tools

•  Bioinformatics is interdisciplinary
•  Synthesis of tools from many fields
–  Biology
–  Computer science
–  Mathematics/Statistics
•  (Usually) don’t have to reinvent the wheel

Margaret Dayhoff (1925-1983)

Hwa A. Lim

Paul Berg (1926-)

Types of data available

•  Enormous amounts of data available publicly
–  DNA/RNA sequence
–  SNPs
–  protein sequence
–  protein structure
–  protein function
–  organism-specific databases
–  genomes
–  gene expression
–  biomolecular interactions
–  molecular pathways
–  scientific literature
–  disease information

http://www.ncbi.nlm.nih.gov/

The blue area shows the total number of bases in GenBank excluding those from whole genome shotgun (WGS) sequencing projects. The checkered area shows only the non-WGS portion.

In release 175.0, there are now over 110 billion bases in GenBank, and almost 160 billion bases in the WGS division.

Growth of PDB

http://www.rcsb.org/pdb/

The bad news

•  Huge numbers of errors in the databases
–  wrong positions of genes
–  exon-intron boundary errors
–  contaminating sequences
–  sequence discrepancies/variations
–  frameshift errors
–  annotation errors
–  spelling mistakes
–  incorrectly joined contigs

Mouse build 32

Finding bioinformatics resources

•  Google!
•  Databases:
–  http://www.expasy.org/links.html
•  Programs:
–  sites with compendia, e.g., http://www.bioinformatik.de/cgi-bin/browse/Catalog/Software/Online_Tools/
•  Literature searches

Programs

•  Must know at least the principles behind the programs
•  Don’t just treat them as a black box
•  To understand the results, the user should have some idea of:
–  how they work
–  what assumptions they make

Some common analysis tools

•  Homology searching (e.g., BLAST)
•  Sequence alignment (e.g., ClustalW)
•  Phylogenetics (e.g., PHYLIP)
•  Functional patterns (e.g., HMMER)
•  Gene prediction (e.g., GenScan)
•  Regulatory region analysis (e.g., MatInspector)
•  RNA structure (e.g., UniFold)
•  Protein structure (e.g., JPred)

Scalability

•  Huge volumes of data available to us
–  Complete genomes, NGS
•  Necessary computational resources now available to deal with these amounts of data
–  8 GB (~human genome) can be stored on an iPod
–  Tree of life can be stored in 1 TB
–  Raw data from 1 NGS experiment = 1 TB
•  Tools and techniques have to be efficient and scalable

Where do we go from here?

•  Huge amounts of ‘parts’ data
–  Sequence - nucleotide and protein
–  Structure
–  Function
–  Biochemical information
–  Protein-protein interactions, complexes
–  Protein-DNA complexes
–  Kinetics of reactions

Integrated together into “Systems Biology”
•  The study of the interactions between the components of a biological system
•  How those interactions give rise to the function and behavior that we see

Mathematical modeling

•  Biological systems can be represented by ODEs
–  compartments
–  stochastic methods for low-concentration components
•  Systems modeling can:
–  effectively integrate “parts” information
–  help reveal non-intuitive properties
–  teach us how cells store information and ‘compute’
•  Quantitative models of pathways and networks
–  predict cellular responses to external stimuli
–  model effects of perturbations on the system
–  predict how to ‘correct’ disease states
–  identify control points in the system
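As a minimal illustration of representing a biological quantity with an ODE, the sketch below integrates one hypothetical species that is produced at a constant rate and degraded in proportion to its amount, dx/dt = k_syn - k_deg·x. The function name, the rate constants and the choice of a simple forward-Euler integrator are illustrative assumptions, not a model from this course; real pathway models couple many such equations and switch to stochastic methods when copy numbers are low.

```python
from typing import List


def simulate_production_degradation(k_syn: float, k_deg: float, x0: float,
                                    t_end: float, dt: float = 0.01) -> List[float]:
    """Forward-Euler integration of dx/dt = k_syn - k_deg * x (hypothetical toy model)."""
    steps = int(round(t_end / dt))
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += dt * (k_syn - k_deg * x)  # production minus first-order degradation
        trajectory.append(x)
    return trajectory


# Illustrative parameters only; the analytic steady state is k_syn / k_deg = 4.0.
traj = simulate_production_degradation(k_syn=2.0, k_deg=0.5, x0=0.0, t_end=20.0)
print(f"x(20) = {traj[-1]:.3f}")
```

The trajectory rises monotonically toward the steady state k_syn/k_deg; perturbation experiments of the kind listed above amount to changing a rate constant and re-running the integration.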

Ravi Iyengar lab

Protein-protein interaction networks in the Drosophila melanogaster cell

Giot et al, Science, 2003

Recent example

•  miRNAs discovered in 1993 in C. elegans
–  Aberration?
–  One of those strange worm phenomena?
–  Later found to be conserved in other organisms
•  Bioinformatics methods used
–  Alignments
–  RNA secondary structure & free energy
–  Scoring
–  Concept of a pipeline
•  Note interplay between wet and dry labs

MicroRNA background

•  MicroRNAs (miRNAs)
–  Short (21-22 nt) sequences
–  Involved in regulation by translation inhibition
–  Tend to be tissue- or developmental stage-specific
–  Similar in some ways to siRNAs
•  Long primary miRNAs (pri-miRNA, possibly 1000s of nt) transcribed from the miRNA gene by RNA pol II or RNA pol III
•  Pre-miRNA (~70 nt) created from pri-miRNA by Drosha and Pasha
•  Mature miRNA created from pre-miRNA by Dicer
•  Perfect or near-perfect target complementarity leads to transcript degradation
–  lin-4, let-7: first miRNAs and targets discovered (in C. elegans)
–  Conserved across species
–  50% of cases found within introns of genes (also found in protein-coding and intergenic regions)
–  miRNA genes often found to be clustered, transcribed as polycistrons

(Figure: miRNA and 3’ UTR; adapted from Li et al, Mamm Genome 2009)

Basic research questions

1.  How can we identify new miRNAs?
–  Initially done experimentally by direct cloning of short RNA molecules
–  Results dominated by a few highly expressed miRNAs
2.  How can we find their target sites?
3.  How are miRNA genes regulated?

•  Discover miRNAs in Drosophila
•  Discover miRNA targets in Drosophila
•  Discover miRNA targets in mammals

Note general methodology

•  Formulate hypothesis
•  Develop model incorporating background knowledge
•  Run analysis
•  Validate results
•  Refine hypothesis/model

•  Discover miRNAs in Drosophila
•  Discover miRNA targets in Drosophila
•  Discover miRNA targets in mammals

1. Identifying new miRNAs in Drosophila

•  miRNAs created by Dicer from pre-miRNA
•  pre-miRNA: ~70 nt, which forms a long hairpin-shaped stem-loop
•  Pre-miRNA and miRNA conserved across species
•  High folding stability (strongly negative minimum folding free energy)
•  More differences are allowed in the loop region than in the stem

Finding new miRNAs: miRseeker

Identify conserved genomic regions (between Drosophila melanogaster and Drosophila pseudoobscura)

Identify and rank stem-loop structures (look at both forward and reverse complement of sequence)

Evaluate pattern of divergence of potential miRNAs

Add evidence from a third organism (Anopheles gambiae)

Lai et al, Genome Biology, 2003

24 Drosophila pre-miRNAs

Identifying conserved genomic regions

•  Align repeat-masked D. melanogaster genomic contigs with D. pseudoobscura contigs
•  Eliminate all annotated sequences:
–  Remove exons, transposable elements, snRNA, snoRNA, tRNA, rRNA
•  51.3/90.2 Mb of intronic and intergenic sequence aligned
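The “eliminate all annotated sequences” step is essentially interval subtraction. The sketch below assumes that aligned regions and annotations (exons, transposable elements, small RNAs and so on) are already available as half-open [start, end) coordinates on a single contig; the function name and the example coordinates are hypothetical, and the repeat masking and cross-species alignment themselves are not shown.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one contig


def subtract_intervals(aligned: List[Interval], annotated: List[Interval]) -> List[Interval]:
    """Return the parts of the aligned intervals that overlap no annotated interval."""
    annotated = sorted(annotated)
    kept: List[Interval] = []
    for start, end in sorted(aligned):
        cursor = start  # left edge of the not-yet-processed part of this aligned block
        for a_start, a_end in annotated:
            if a_end <= cursor:
                continue  # annotation ends before the remaining piece
            if a_start >= end:
                break  # this annotation (and all later ones) start after the block
            if a_start > cursor:
                kept.append((cursor, a_start))  # unannotated gap before the annotation
            cursor = max(cursor, a_end)
            if cursor >= end:
                break
        if cursor < end:
            kept.append((cursor, end))
    return kept


# Hypothetical coordinates, for illustration only.
aligned_blocks = [(0, 1000), (2000, 2600)]
annotations = [(100, 300), (250, 400), (2500, 2700)]  # e.g. exons, transposons, tRNAs
unannotated = subtract_intervals(aligned_blocks, annotations)
print(unannotated, sum(e - s for s, e in unannotated), "bp")
```

Summing the lengths of the surviving intervals over all contigs gives a total of the kind reported on the slide for aligned intronic and intergenic sequence.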

Identify stem-loop structures

•  RNA secondary structure program (MFOLD) used to detect stem-loop structures
–  Look at longest helical (paired) arm
–  Calculate free energy of arm
–  Penalize internal loops of increasing size
–  Penalize asymmetric loops and bulged nucleotides
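The idea of looking for the longest helical arm can be illustrated with a deliberately simplified scan. The sketch below is not MFOLD: it computes no free energy and allows no internal loops or bulges (exactly the features MFOLD scores and penalizes). It only finds the longest run of stacked Watson-Crick or G:U pairs that closes a loop of at least `min_loop` nucleotides. The function name and the default loop size are illustrative assumptions.

```python
from typing import Tuple

# Watson-Crick pairs plus G:U wobble pairs
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_stem(seq: str, min_loop: int = 4) -> Tuple[int, int, int]:
    """Return (stem_length, outer_start, outer_end) of the longest perfectly stacked helix.

    Toy model: every (i, j) with at least `min_loop` unpaired bases between them is
    tried as the innermost pair, and the helix is extended outward while bases pair.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    best = (0, -1, -1)
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            k = 0
            while i - k >= 0 and j + k < n and (seq[i - k], seq[j + k]) in PAIRS:
                k += 1
            if k > best[0]:
                best = (k, i - k + 1, j + k - 1)
    return best


print(longest_stem("GGGGAAAACCCC"))  # (4, 0, 11): GGGG pairs with CCCC around an AAAA loop
```

A real scan would also run on the reverse complement of each conserved region, as the miRseeker pipeline does, and would rank candidates by folding free energy rather than by raw stem length.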

Lai et al, Genome Biology, 2003

Evaluation of stem-loops

Validation

•  Using Northern blots
•  Of the 124 top hits:
–  18 are reference set members
–  24 were validated
–  14 were false positives
•  Expression profiles and abundance of computationally derived miRNAs much more heterogeneous than those discovered experimentally
•  Estimated that Drosophilid genomes may contain ~110 miRNAs

(Pie chart of the 124 top hits: reference set 15%, validated (TP) 19%, not valid (FP) 11%, untested 55%)
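The pie-chart percentages follow directly from the counts on this slide. The short check below recomputes them; the untested count (124 - 18 - 24 - 14 = 68) is derived rather than stated on the slide.

```python
top_hits = 124
counts = {"reference set": 18, "validated (TP)": 24, "not valid (FP)": 14}
counts["untested"] = top_hits - sum(counts.values())  # 124 - 56 = 68 (derived)

for category, n in counts.items():
    print(f"{category}: {n} ({100 * n / top_hits:.0f}%)")
```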

Computational approach summary

•  Sequence/structure conservation-based
–  Heavily dependent on use of conservation to filter out “uninteresting” hairpins
•  Machine-learning (SVM, HM, NB)
–  Feature classifiers that distinguish between a positive and negative training set
•  Experimental data-driven
–  Next generation “deep sequencing”

•  Discover miRNAs in Drosophila
•  Discover miRNA targets in Drosophila
•  Discover miRNA targets in mammals

2. Identifying miRNA targets

•  Background knowledge:
–  Target sites are often in the 3’ UTR (unlike in plants, where they are predominantly in the coding region)
–  The first (5’) 8-10 nt of the miRNA are more important in determining binding than the last 12-14
–  Animal miRNAs tend to be less complementary to their targets than plant miRNAs
–  Target sites tend to be conserved across species

Pipeline to identify miRNA targets in Drosophila - miRanda

Enright et al, Genome Biology, 2003

Find complementary sequence matches in 3’ UTRs (modified Smith-Waterman algorithm)

Calculate free energy (stability) of miRNA/UTR binding (ΔG, kcal/mol)

Estimate evolutionary conservation (sequence conservation; relative positioning within the 3’ UTR)

73 known Drosophila miRNAs

Sequence matching: problems

•  miRNAs are very small (21-22 nt)
–  Enormous number of potential targets with complementary sequence
–  BLAST does not scale.
•  Low-complexity sequences
–  Signal to noise problem
•  Standard sequence analysis packages generally not applicable
–  Looking for complementarity, not similarity (i.e. A:U, G:C, not A:A, G:G, etc.)
–  Wobble pairing permitted (G:U and U:G base pairs)
•  Small number of known cases to work with
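To see why similarity-based tools miss these sites, note that a perfectly paired target site is the reverse complement of the miRNA, not a copy of it, and that G:U wobbles must be accepted as (weaker) pairs. The helper names below are hypothetical and the 8-nt sequences are made up for illustration.

```python
from collections import Counter

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
WOBBLE = {("G", "U"), ("U", "G")}


def reverse_complement(rna: str) -> str:
    """Site (5'->3') that would pair perfectly with `rna` (5'->3')."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def pair_type(mirna_base: str, target_base: str) -> str:
    """Classify one miRNA:target base pair."""
    if COMPLEMENT[mirna_base] == target_base:
        return "watson-crick"
    if (mirna_base, target_base) in WOBBLE:
        return "wobble"
    return "mismatch"


def pairing_profile(mirna: str, site: str) -> Counter:
    """Count pair types for an ungapped duplex; both inputs are written 5'->3',
    so the site is reversed to line up antiparallel partners."""
    return Counter(pair_type(m, t) for m, t in zip(mirna, reversed(site)))


mirna = "UGAGGUAG"  # made-up 8-nt fragment
print(reverse_complement(mirna))           # CUACCUCA
print(pairing_profile(mirna, "CUAUCUCA"))  # one C->U change in the site gives a G:U wobble
```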

Sequence matching algorithm

•  Modified Smith-Waterman algorithm
–  Instead of looking for matching nucleotides, finds complementary nucleotides
–  Allows G:U ‘wobble’ pairs (but down-weights them)
–  Scoring system weighted so that complementarity to the first 11 bases of the miRNA is rewarded more heavily
–  Non-complementarity is also penalized more heavily in that region
–  Known miRNAs bind 3’ UTRs at multiple sites, so an additive scoring system sums over all target sites predicted in a UTR
•  Calculate free energy of binding (Vienna RNA package)
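A minimal sketch of complementarity-based Smith-Waterman scoring follows. The UTR is reversed so that antiparallel partners line up position by position; Watson-Crick pairs score positively, G:U wobbles score less, and every score (mismatches and gaps included) at miRNA positions 1-11 is multiplied by a weight, so the 5’ region is both rewarded and penalized more heavily. All numeric values (+5, +2, -3, gap -4, weight 2.0), the linear gap penalty and the function name are assumptions for illustration, not miRanda’s published parameters; the additive scoring over multiple sites and the Vienna RNA free-energy step are not shown.

```python
from typing import Tuple

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_score(m: str, t: str) -> float:
    """Hypothetical pair scores: reward complementarity, not identity."""
    if (m, t) in WATSON_CRICK:
        return 5.0
    if (m, t) in WOBBLE:
        return 2.0  # wobble allowed but down-weighted
    return -3.0


def complementarity_sw(mirna: str, utr: str, gap: float = -4.0, seed_len: int = 11,
                       seed_weight: float = 2.0) -> Tuple[float, Tuple[int, int]]:
    """Best local complementarity score and its end cell (i in miRNA, j in reversed UTR)."""
    target = utr[::-1]  # reversed UTR: miRNA 5'->3' now faces the UTR read 3'->5'
    n, m = len(mirna), len(target)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_cell = 0.0, (0, 0)
    for i in range(1, n + 1):
        w = seed_weight if i <= seed_len else 1.0  # heavier weight on miRNA positions 1..seed_len
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,  # local alignment: scores never drop below zero
                H[i - 1][j - 1] + w * pair_score(mirna[i - 1], target[j - 1]),
                H[i - 1][j] + w * gap,  # miRNA base unpaired (bulge on the miRNA side)
                H[i][j - 1] + w * gap,  # UTR base unpaired (bulge on the target side)
            )
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    return best, best_cell


utr = "AAAA" + "CUACCUCA" + "AAAA"  # made-up UTR containing one perfect site
print(complementarity_sw("UGAGGUAG", utr))  # (80.0, (8, 12)) for this toy input
```

Scoring the miRNA against its own sequence (similarity rather than complementarity) gives a much lower score, which is the point of the modification.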

Evolutionary conservation

•  Used conservation as a way of keeping only the most likely miRNA target candidates
•  Used Drosophila pseudoobscura and Anopheles gambiae as related species (D. pseudoobscura closely, A. gambiae more distantly):
–  Required ≥80% sequence similarity of target site with D. pseudoobscura
–  Required ≥60% sequence identity with A. gambiae
•  Also required that the location of the target site in the UTR is equivalent
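The two identity thresholds and the positional requirement can be written as a simple filter. The sketch assumes the target-site sequences from the three species are already aligned to equal length (gaps as '-'), treats the slide’s “similarity” and “identity” both as percent identity, and approximates “equivalent location” by comparing each site’s relative position within its UTR against a tolerance. That tolerance (0.1), the function names and the example sequences are hypothetical.

```python
from typing import Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned, equal-length sequences ('-' marks a gap)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def is_conserved_site(mel: str, pse: str, gam: str,
                      rel_pos: Tuple[float, float, float],
                      min_id_pse: float = 80.0, min_id_gam: float = 60.0,
                      max_shift: float = 0.1) -> bool:
    """Keep a D. melanogaster site only if conserved in D. pseudoobscura and A. gambiae."""
    if percent_identity(mel, pse) < min_id_pse:
        return False
    if percent_identity(mel, gam) < min_id_gam:
        return False
    mel_pos, pse_pos, gam_pos = rel_pos  # site position / UTR length in each species
    return abs(mel_pos - pse_pos) <= max_shift and abs(mel_pos - gam_pos) <= max_shift


site = "CUACCUCA"  # made-up site sequences and positions
print(is_conserved_site(site, "CUACCUCA", "GUACGUCU", (0.50, 0.55, 0.45)))  # 62.5% vs A. gambiae
```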

Control sequences

•  100 sets of 73 random miRNAs generated
–  Preserving D. melanogaster miRNA nucleotide frequencies
•  Analysis run independently for each set
•  Results and counts averaged over all 100 sets
•  Overall FP rate: 35%
–  Number of random hits / number of “real” hits
•  If only targets that have ≥2 conserved sites in a UTR are counted, the FP rate drops to 9%
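The control procedure can be sketched as follows: draw random miRNA sets whose bases follow the nucleotide frequencies of the real set, run the same target-counting step on each, and divide the average random hit count by the real hit count. Here `count_hits` is a hypothetical stand-in for the whole prediction pipeline, and giving each random sequence the same length as a real miRNA is an assumption the slide does not state.

```python
import random
from collections import Counter
from typing import Callable, Dict, List


def nucleotide_frequencies(seqs: List[str]) -> Dict[str, float]:
    """Pooled base frequencies over all sequences."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    return {base: c / total for base, c in counts.items()}


def random_mirna_set(real: List[str], rng: random.Random) -> List[str]:
    """One random set: same number and lengths as `real`, bases drawn from its frequencies."""
    bases, weights = zip(*sorted(nucleotide_frequencies(real).items()))
    return ["".join(rng.choices(bases, weights=weights, k=len(s))) for s in real]


def false_positive_rate(real: List[str], count_hits: Callable[[List[str]], float],
                        n_sets: int = 100, seed: int = 0) -> float:
    """Mean hits over n_sets random sets divided by the hits for the real set."""
    rng = random.Random(seed)
    real_hits = count_hits(real)
    random_hits = [count_hits(random_mirna_set(real, rng)) for _ in range(n_sets)]
    return (sum(random_hits) / n_sets) / real_hits
```

In practice `count_hits` would run the alignment, free-energy and conservation steps over all 3’ UTRs; restricting it to UTRs with two or more conserved sites corresponds to the stricter count on the slide.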

Validation

•  Initial validation: application to experimentally verified targets
–  9/10 known target genes for three miRNAs correctly identified
–  BUT biased in favor of this, since the method is based on the background knowledge derived from these targets
•  For 73 Drosophila miRNAs, 701 predicted target genes (out of ~9,805/13,500 genes in the genome)
–  Many transcription factors and other genes involved in development
→  One-to-many and many-to-one relationships

•  Discover miRNAs in Drosophila
•  Discover miRNA targets in Drosophila
•  Discover miRNA targets in mammals

Pipeline to identify mammalian targets - TargetScan

Lewis et al, Cell, 2003

Find “seed matches” in the 3’ UTR (exact Watson-Crick complement of bases 2-8 of the miRNA)

Extend the seed matches

Evaluate the folding free energy

79 conserved mammalian miRNAs
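Finding seed matches reduces to exact string search for the reverse complement of miRNA bases 2-8, written 5’->3’ as the site appears in the UTR. The sketch below shows only that first step; the extension and free-energy evaluation are not implemented, and the function name and example sequences are made up for illustration.

```python
from typing import List

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def seed_match_positions(mirna: str, utr: str, first: int = 2, last: int = 8) -> List[int]:
    """0-based UTR positions of exact matches to the complement of miRNA bases first..last (1-based)."""
    site = reverse_complement(mirna[first - 1:last])
    hits = []
    i = utr.find(site)
    while i != -1:
        hits.append(i)
        i = utr.find(site, i + 1)  # continue after this start, so overlapping matches count
    return hits


mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # example 22-nt miRNA sequence
utr = "AAA" + "CUACCUC" + "AAAG" + "CUACCUC" + "G"  # made-up UTR with two seed matches
print(seed_match_positions(mirna, utr))  # [3, 14]
```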

Controls

•  Shuffled sequences - have fewer matches than the real miRNA
•  Preserve all relevant compositional features
–  Expected frequency of seed matches to the UTR dataset
–  Expected frequency of matching to the 3’ end of the miRNA
–  Observed count of seed matches in the UTR dataset
–  Predicted free energy of the RNA duplex
•  Each shuffled control sequence also has the same length and base composition as the parent
•  Signal:noise ratio = 3.2:1
–  5.7 “real” targets vs. 1.8 targets found with control sequences
–  Approximately a 31% FP rate
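The two summary figures on this slide are two views of the same ratio. Recomputing them from the stated averages:

```python
real_targets = 5.7     # targets found with the real miRNAs (from the slide)
control_targets = 1.8  # targets found with shuffled control sequences (from the slide)

signal_to_noise = real_targets / control_targets  # about 3.17, reported as 3.2:1
fp_rate = control_targets / real_targets          # about 0.316

print(f"signal:noise = {signal_to_noise:.1f}:1, FP rate = {fp_rate:.1%}")
```

The result, roughly 31.6%, is consistent with the slide’s “approximately 31%”, given that 5.7 and 1.8 are themselves rounded averages.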

Validation

•  Luciferase reporter assays used to test 15 (out of >400) predicted targets
–  Experimental support for 11/15
•  Mammalian miRNA targets have diverse functions (unlike plants, where miRNAs are almost exclusively involved in developmental processes)
–  Enriched in developmental function, transcription
–  Also in nucleic acid binding and transcriptional regulator activity

Examine results

•  Added dog and chicken to the conservation analysis
•  Looked at flanking sequence of control and real matches in the UTRs

Lewis et al, Cell, 2005

(Figure label: anchoring As)

Lewis et al, Cell, 2005

Modify model - TargetScanS

•  Targets identified by conserved complementarity to nucleotides 2-7 of the miRNA
•  A conserved adenosine at nucleotide 1
•  Often, a conserved adenosine at nucleotide 8
•  Don’t look past nucleotide 8 anymore
•  Don’t calculate free energy anymore
•  Potentially, thousands of mammalian targets
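A minimal sketch of the TargetScanS site definition: search each UTR for the 6-nt complement of miRNA nucleotides 2-7, then record whether an adenosine sits in the UTR across from miRNA nucleotide 1 (immediately 3’ of the match) and across from nucleotide 8 (immediately 5’ of it). Reading the slide’s “at nucleotide 1/8” as these UTR positions is an assumption here; the conservation requirement, which needs multi-species alignments, is not checked, and the function name and example UTR are hypothetical.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def seed_sites_with_anchors(mirna: str, utr: str) -> List[Tuple[int, bool, bool]]:
    """(position, A opposite miRNA nt 1, A opposite miRNA nt 8) for each 6-nt seed match."""
    site = reverse_complement(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    results = []
    i = utr.find(site)
    while i != -1:
        a_opposite_nt1 = i + 6 < len(utr) and utr[i + 6] == "A"  # 3' side of the match
        a_opposite_nt8 = i > 0 and utr[i - 1] == "A"              # 5' side of the match
        results.append((i, a_opposite_nt1, a_opposite_nt8))
        i = utr.find(site, i + 1)
    return results


mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # example 22-nt miRNA sequence
utr = "GGA" + "UACCUC" + "A" + "GGC" + "UACCUC" + "G"  # made-up UTR
print(seed_sites_with_anchors(mirna, utr))  # [(3, True, True), (13, False, False)]
```

Dropping the free-energy step, as the slide notes, is what makes this scan cheap enough to run over every conserved UTR, which in turn is why the predicted target count can reach the thousands.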

Not the end of the story…

•  Many programs are claimed to be able to discover miRNA targets in mammals
–  TargetScanS - Lewis et al, MIT
–  miRanda - Enright et al, SKI
–  DIANA-MicroT - Hatzigeorgiou et al, UPenn
–  rna22 - Rigoutsos et al, IBM
–  PicTar - Rajewsky et al, NYU
–  RNAhybrid - Rehmsmeier et al, Bielefeld
•  Different algorithms/models give different results

User frustration

•  Anil Jegga, posting on the miRNA Nature forums, reports:
–  “I was looking at and comparing the miRNA target gene predictions from five commonly used algorithms, viz., miRanda, targetScanS, PicTar, microT and mirTarget. Surprisingly, there is so little overlap! And I also did a comparison with the entries in TarBase (that houses about 100 experimentally validated miRNA-gene pairs) and surprisingly almost all of the five prediction algorithms perform quite badly.” (from the miRNA forum on the Nature forums, 27 August, 2007)

Evaluation comparison

Alexiou et al, Bioinformatics, 2009

A single accurate algorithm is better than a combination of predictions. The better specificity of a combination is achieved at a high price in sensitivity.
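The trade-off in this conclusion follows from set arithmetic: the intersection of two predictors’ outputs is a subset of each, so it can never contain more true positives (sensitivity cannot rise), but it often contains fewer false positives (specificity can rise). The toy gene sets below are entirely hypothetical and only illustrate the calculation; they are not data from Alexiou et al.

```python
from typing import Set, Tuple


def sensitivity_specificity(predicted: Set[str], positives: Set[str],
                            universe: Set[str]) -> Tuple[float, float]:
    """Sensitivity = TP / positives; specificity = TN / negatives, within a benchmark universe."""
    negatives = universe - positives
    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    return tp / len(positives), (len(negatives) - fp) / len(negatives)


# Hypothetical benchmark: 4 validated targets and 6 validated non-targets.
validated = {"g1", "g2", "g3", "g4"}
universe = validated | {"n1", "n2", "n3", "n4", "n5", "n6"}
program_a = {"g1", "g2", "g3", "n1", "n2"}
program_b = {"g1", "g2", "g4", "n3"}

for name, preds in [("A", program_a), ("B", program_b), ("A and B", program_a & program_b)]:
    sens, spec = sensitivity_specificity(preds, validated, universe)
    print(f"{name}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

On these toy sets the intersection reaches perfect specificity but keeps only half of the validated targets, a small-scale version of the price in sensitivity described above.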

Future directions

•  Advent of experimental data gives excellent benchmarking opportunities as well as providing new data to refine hypotheses
–  SILAC: measures the levels of many proteins concurrently (Baek et al, Nature 2008; Selbach et al, Nature 2008)
–  HITS-CLIP: identification and sequencing of target sites for miRNAs (Chi et al, Nature 2009)
•  Look for target sites outside the 3’ UTR
•  Combinatorial effect of miRNAs
–  Coordinated regulation by multiple miRNAs (which may also be co-transcribed in the same pri-miRNA)
•  See review by Bartel (Cell 2009) for a discussion of other challenges

Important points

•  This type of analysis follows the same basic procedure as a ‘normal’ wet lab scientific experiment
–  Background information
–  Hypothesis/model
–  Controls
–  Validation
–  Modify model and repeat
•  Many of the techniques used here are well known; some are modified
•  Availability of complete genomes, scalable algorithms and computational resources is crucial to this type of analysis

Knowledge of the biology informs the bioinformatics