Data first vs Hypothesis first Alan Ward

Apr 02, 2015


Data first vs Hypothesis first

Alan Ward


Hypothesis driven approach
• Look at the data we have
• Formulate an hypothesis about ..
• Do experiments to test the hypothesis
• As a byproduct, collect more data

Weinberg R (2010) Point: Hypotheses first. NATURE 464, 678


Data driven approach
• Identify a system of interest
• Identify an approach to measure/describe attributes of the system
• Collect and organise the data

Golub T (2010) Counterpoint: Data first. NATURE 464, 679


“Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know that we know. There are known unknowns; that is to say, there are things that we now know we don't know. But there are also unknown unknowns – there are things we do not know we don't know.”—United States Secretary of Defense, Donald Rumsfeld


The Black Swan: The Impact of the Highly Improbable. Nassim Taleb


Hypothesis driven research (figure: examples placed on an axis from known to unknown)
• Enzyme activity
• Feedback inhibition
• Allosteric regulation
• Transcriptional regulation - inducers and repressors
• Non-coding short RNAs

Breadth first vs Depth first

A slice up and down

A slice across


Observation has always been part of biology, as in the imatinib example (Golub, 2010), but DNA sequencing technology has revolutionized observational data collection. Weinberg (2010) argues that ‘cheap sequencing’ on a massive scale means too much funding goes to data collection. And, although he does not argue it, you might spend all your time managing the data¹.

¹ Marx, V (2013) Biology: The big challenges of big data. Nature 498, 255–260


Depth first or breadth first
Two different strategies for computer search algorithms

Which is best?
That depends heavily on the structure of the search tree and on the number and location of solutions.
• If you know a solution is not far from the root of the tree, a breadth first search (BFS) might be better.
• If the tree is very deep and solutions are rare, depth first search (DFS) might rootle around forever, but BFS could be faster.
• If the tree is very wide, a BFS might need too much memory, so it might be completely impractical.
• If solutions are frequent but located deep in the tree, BFS could be impractical.
• If the search tree is very deep you will need to restrict the search depth for DFS anyway.
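The trade-off between the two search strategies can be seen on a toy tree. The following is a minimal sketch, not taken from the slides: the tree, the node names and the `max_depth` parameter are all made up for illustration. It counts how many nodes each strategy visits before reaching a solution that sits close to the root, next to a deeper branch that contains no solution.

```python
from collections import deque

def bfs(tree, root, is_goal):
    """Breadth first: visit the tree level by level.
    Returns (goal_node_or_None, number_of_nodes_visited)."""
    queue = deque([root])
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        if is_goal(node):
            return node, visited
        queue.extend(tree.get(node, []))
    return None, visited

def dfs(tree, root, is_goal, max_depth):
    """Depth first with a depth limit: follow one branch down before
    trying the next. Returns (goal_node_or_None, number_of_nodes_visited)."""
    stack = [(root, 0)]
    visited = 0
    while stack:
        node, depth = stack.pop()
        visited += 1
        if is_goal(node):
            return node, visited
        if depth < max_depth:
            # Push children in reverse so the leftmost child is explored first.
            for child in reversed(tree.get(node, [])):
                stack.append((child, depth + 1))
    return None, visited

# Hypothetical tree: a long branch with no solution, and a shallow solution.
tree = {
    "root": ["a", "b"],
    "a": ["a1"],
    "a1": ["a2"],
    "a2": ["a3"],
    "b": ["goal"],
}

def is_goal(node):
    return node == "goal"

bfs_result = bfs(tree, "root", is_goal)
dfs_result = dfs(tree, "root", is_goal, max_depth=10)
dfs_shallow = dfs(tree, "root", is_goal, max_depth=1)
print(bfs_result, dfs_result, dfs_shallow)
```

On this tree BFS reaches the solution after fewer visits than DFS, because DFS first exhausts the long empty branch. With the depth limit set too low, DFS gives up without finding the solution at all, which is the price of restricting the search depth.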


EST database
• dbEST release 130101
• Summary by Organism - 01 January 2013
• Number of public entries: 74,186,692
• Homo sapiens (human) 8,704,790
• Mus musculus + domesticus (mouse) 4,853,570
• Zea mays (maize) 2,019,137
• Sus scrofa (pig) 1,669,337
• Bos taurus (cattle) 1,559,495
• Arabidopsis thaliana (thale cress) 1,529,700
• Danio rerio (zebrafish) 1,488,275
• Glycine max (soybean) 1,461,722
• Triticum aestivum (wheat) 1,286,372
• Xenopus (Silurana) tropicalis (western clawed frog) 1,271,480
• Oryza sativa (rice) 1,253,557
• Ciona intestinalis 1,205,674
• Rattus norvegicus + sp. (rat) 1,162,136
• Drosophila melanogaster (fruit fly) 821,005
• …..
• Salmonella enterica subsp. enterica serovar Typhi 217
• Mycobacterium smegmatis str. MC2 155 30
• Mycobacterium tuberculosis 30


dbEST references
Boguski, MS, Lowe, TMJ, Tolstoshev, CM (1993) dbEST - Database for Expressed Sequence Tags. Nature Genetics 4, 332-333
Boguski, MS (1994) Gene discovery in dbEST. Science 265, 1993-4
Boguski, MS (1995) The turning point in genome research. Trends in Biochemical Sciences 20, 295-6
Nagaraj, S (2007) A hitchhiker's guide to expressed sequence tag (EST) analysis. Briefings in Bioinformatics 8, 6-21


Why DNA?
An example: Species and strain identification in prokaryotes
• DNA:DNA similarity
• MLEE (MultiLocus Enzyme Electrophoresis)
• MLST (MultiLocus Sequence Typing)
• ANI (Average Nucleotide Identity)


Defining species
The modern concept of species dates back to:

Mayr, E. (1942) Systematics and the Origin of Species (Columbia Univ. Press, New York)

Biological species concept: Species are groups of actually or potentially interbreeding natural populations, which are reproductively isolated from other such groups

de Queiroz K (2005) Ernst Mayr and the modern concept of species. Proc Natl Acad Sci U S A. 102 Suppl 1: 6600-7.


Bacterial species
Bacteria do not interbreed in the same way, so defining species in bacteria remained an exercise in clustering organisms with similar, initially phenotypic, characters

Stanier RY. Adaptation, evolutionary and physiological: Or Darwinism among the microorganisms. In: Davies R, Gale EF, editors. Adaptation in Microorganisms, Third Symposium of the Society for General Microbiology. Cambridge: Cambridge University Press; 1953

Goldner M (2007) The genius of Roger Stanier. Can J Infect Dis Med Microbiol 18, 193–194


DNA:DNA similarity
From the 1960s there was a consensus that all taxonomic information about a bacterium is incorporated in the complete nucleotide sequence of its genome.
Wayne et al. (1987) correlated measurements of the DNA similarity of two strains with the species as then defined and concluded that a DNA:DNA similarity of about 70% or more together with a ΔTm of 5°C or less (both are important) marks the boundary of a group of strains which belong to the same species.

Wayne, L. G., Brenner, D. J., Colwell, R. R., Grimont, P. A. D., Kandler, O., Krichevsky, M. I., Moore, L. H., Moore, W. E. C., Murray, R. G. E. & other authors (1987). Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. Int J Syst Bacteriol 37, 463–464.
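As a rough illustration of how the two criteria combine, the sketch below assumes the Wayne et al. (1987) recommendation is read as a DNA:DNA similarity of about 70% or more and a ΔTm of 5°C or less, with both conditions required. The function name, parameter names and the example values are hypothetical; real delineation also weighs phenotypic evidence.

```python
def within_species_boundary(dna_similarity_percent, delta_tm_celsius):
    """Return True when a pair of strains meets both criteria:
    similarity of at least 70% AND a delta-Tm of at most 5 degrees C.
    Thresholds are assumptions based on the 1987 recommendation."""
    return dna_similarity_percent >= 70.0 and delta_tm_celsius <= 5.0

# Made-up measurements for three pairs of strains.
pair_close = within_species_boundary(85.0, 2.0)     # both criteria met
pair_mixed = within_species_boundary(72.0, 6.5)     # similarity ok, delta-Tm too large
pair_distant = within_species_boundary(40.0, 12.0)  # neither criterion met
print(pair_close, pair_mixed, pair_distant)
```

Requiring both conditions means a pair with borderline similarity but a large ΔTm is not placed in the same species.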


DNA-DNA similarity
Measuring DNA similarity by hybridisation is not the same as measuring DNA sequence similarity, and it is done using a number of different techniques:
% Similarity
• De Ley – rate of renaturation
• Ezaki – microplate binding
ΔTm
• DNA melting
• Elution from hydroxyapatite

The methods are not robust and few labs can perform them:

Stackebrandt et al. (2002) Report of the Ad Hoc Committee for the re-evaluation of the species definition in bacteriology. Intl J Systematic Evol Microbiol 52, 1043-1047


Melting Temperature analysis


Using real-time PCR and SYBR Green for DNA melt curve analysis

Gonzalez, JM & Saiz-Jimenez, C (2005) A simple fluorimetric method for the estimation of DNA–DNA relatedness between closely related microorganisms by thermal denaturation temperatures. Extremophiles 9, 75–79


ΔTm determination

Exactly the same melting program, but this time the DNA from Organism 1 and Organism 2 has been mixed, denatured and then renatured at the optimum temperature for renaturation, Tor, calculated from the %GC (Tor = 0.51 × %GC + 47.0), before adding SYBR Green and melting.
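The renaturation temperature formula is easy to apply directly. The sketch below uses the formula from the slide; the function names and the example melting temperatures are made up, temperatures are assumed to be in °C, and ΔTm is taken to be the melting temperature of the homologous DNA minus that of the renatured mixture.

```python
def optimum_renaturation_temp(gc_percent):
    """Tor = 0.51 * %GC + 47.0 (formula from the slide; assumed to be in degrees C)."""
    if not 0 <= gc_percent <= 100:
        raise ValueError("gc_percent must be between 0 and 100")
    return 0.51 * gc_percent + 47.0

def delta_tm(tm_homologous, tm_hybrid):
    """Difference between the melting temperature of the homologous DNA
    and that of the mixed, renatured DNA (assumed definition)."""
    return tm_homologous - tm_hybrid

tor = optimum_renaturation_temp(50.0)  # a genome with 50 %GC
dtm = delta_tm(85.0, 81.5)             # hypothetical melt-curve readings
print(round(tor, 2), dtm)
```

A larger ΔTm indicates more mismatches in the hybrid duplexes, and so more divergent genomes.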


Disadvantages of DNA-DNA similarity
Because DNA:DNA hybridisation compares the whole genome it has remained the “Gold standard” for species delineation, but it has several disadvantages:
• It requires large amounts of high quality DNA
• The methods are difficult to do
• Different methods can give different results
• Reciprocal measurements can be very different (the amount of A binding to B is different from the amount of B binding to A)
• The experimental measurement has to be made between 2 strains, so obtaining DNA-DNA similarity for 5 strains requires 10 pairwise determinations (20 if each pair is also measured reciprocally), and if a 6th strain needs to be compared another 5 experiments (10 with reciprocals) are needed
• You can’t build an incremental database
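The growth in the number of experiments can be checked by enumerating the pairs. This is a small sketch with hypothetical strain names. It counts determinations under two conventions: each pair measured once, or each pair measured in both directions because reciprocal values can differ.

```python
from itertools import combinations, permutations

def pairwise_experiments(strains, reciprocal=False):
    """List every pairwise determination needed for a set of strains.
    reciprocal=False: each pair once (A with B).
    reciprocal=True: both directions (A to B and B to A)."""
    pairs = permutations(strains, 2) if reciprocal else combinations(strains, 2)
    return list(pairs)

five = ["S1", "S2", "S3", "S4", "S5"]  # hypothetical strain names
six = five + ["S6"]

once_5 = len(pairwise_experiments(five))
both_5 = len(pairwise_experiments(five, reciprocal=True))
extra_once = len(pairwise_experiments(six)) - once_5
extra_both = len(pairwise_experiments(six, reciprocal=True)) - both_5
print(once_5, both_5, extra_once, extra_both)
```

The count grows quadratically with the number of strains, and every new strain has to be physically hybridised against all the existing ones, which is why the results do not accumulate into a database the way sequences do.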


Multilocus Enzyme Electrophoresis

MLEE

Selander, RK, Caugant, DA, Ochman, H, Musser, JM, Gilmour, MN and Whittam, TS (1986) Methods of multilocus enzyme electrophoresis for bacterial population genetics and systematics. Appl. Environ. Microbiol 51, 873-884


Multilocus sequence typing
MLST

Maiden, MCJ, Bygraves, JA, Feil, E, Morelli, G, Russell, JE, Urwin, R, Zhang, Q, Zhou, J, Zurth, K, Caugant, DA, Feavers, IM, Achtman, M, and Spratt, BG (1998) Multilocus sequence typing: A portable approach to the identification of clones within populations of pathogenic microorganisms. Proc. Natl. Acad. Sci. USA 95, 3140–3145

Staphylococcus aureus


Multilocus sequence typing (MLST)
• Portable
• Unambiguous
• Reproducible
• Cumulative
• Scalable
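One way to see why sequence-based typing is cumulative is to sketch the bookkeeping. In MLST each distinct sequence at a locus is given an allele number, and the combination of allele numbers (the allelic profile) defines a sequence type. The class below is a toy illustration only: the class name, locus names and short sequences are invented, and real schemes use curated databases and several housekeeping gene fragments.

```python
class ToyMlstDatabase:
    """Toy allele and sequence-type registry (illustrative, not a real MLST scheme)."""

    def __init__(self, loci):
        self.loci = tuple(loci)
        # For each locus: sequence -> allele number, assigned in order of first appearance.
        self.alleles = {locus: {} for locus in self.loci}
        # Allelic profile (tuple of allele numbers) -> sequence type number.
        self.sequence_types = {}

    def allele_number(self, locus, sequence):
        known = self.alleles[locus]
        if sequence not in known:
            known[sequence] = len(known) + 1
        return known[sequence]

    def type_strain(self, sequences):
        """sequences: dict mapping locus name -> DNA sequence for one strain.
        Returns (sequence_type, allelic_profile)."""
        profile = tuple(self.allele_number(locus, sequences[locus]) for locus in self.loci)
        if profile not in self.sequence_types:
            self.sequence_types[profile] = len(self.sequence_types) + 1
        return self.sequence_types[profile], profile

db = ToyMlstDatabase(["locusA", "locusB"])  # hypothetical loci
strain1 = db.type_strain({"locusA": "ACGT", "locusB": "GGCC"})
strain2 = db.type_strain({"locusA": "ACGA", "locusB": "GGCC"})
strain3 = db.type_strain({"locusA": "ACGT", "locusB": "GGCC"})
print(strain1, strain2, strain3)
```

Each new strain is compared against the stored records rather than against every other strain in the laboratory, so adding a strain needs only its own sequences.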


The traditional method of data reduction is publication: results are summarized in peer-reviewed journals. Publications include only the most important results, from experiments that may have been performed over many years. The published paper is a concise compilation of the data, an interpretation of the results, and a comparison with results obtained by others.

A significant fraction of experiments from academic laboratories cannot be repeated in industry¹, reflecting inadequate description of experiments performed on different equipment and on biological samples that were produced with disparate methods.

¹ Begley CG & Ellis LM (2012) Drug development: Raise standards for preclinical cancer research. Nature 483, 531–3


In 1991 the GenBank On-line Service utilized a Solbourne 5/800 running OS/MP 4.0C. The database work was done on a Sun network 4/490 server and workstations running SunOS UNIX version 4.1. The GenBank database was maintained on the Sybase relational database management system (RDBMS). Software was developed in the C language.

In the 1990s NCBI scanned the literature for sequences and manually typed them into the database.


Benson, DA, Cavanaugh, M, Clark, K, Karsch-Mizrachi, I, Lipman, DJ, Ostell, J and Sayers, EW (2013) GenBank. Nucleic Acids Research 41, D36–D42