
2014 EMCSR Kenji Takeda ReproducibilityWorkshopStAndrews

Jun 02, 2018

  • 8/11/2019 2014 EMCSR Kenji Takeda ReproducibilityWorkshopStAndrews

    1/48

    Reproducible Research and the Cloud

    Dr Kenji Takeda ([email protected])

    Microsoft Research

    @azure4research @ktakeda1

  • 2/48

    Microsoft Research

  • 3/48

    Scientific Discovery

    = + +

  • 4/48

    The Research Lifecycle

    Data acquisition & modelling

    Collaboration and visualisation

    Analysis & data mining

    Dissemination & sharing

    Archiving and preserving

    fourthparadigm.org

  • 5/48

    X-Info

    The evolution of X-Info and Comp-X for each discipline X

    How to codify and represent our knowledge

    The Generic Problems

    Data ingest

    Managing a petabyte

    Common schema

    How to organize it

    How to reorganize it

    How to share with others

    Query and Vis tools

    Building and executing models

    Integrating data and Literature

    Documenting experiments

    Curation and long-term preservation

    Diagram: Experiments & Instruments, Simulations, Literature, and Other Archives each supply facts; Questions go in and Answers come out.

  • 6/48

    Data-Intensive Research

  • 7/48

    Believe it or not: how much can we rely on published data on potential drug targets?

    at least 50% of published studies, even those in top-tier academic journals, can't be repeated with the same conclusions by an industrial lab

    Osherovich, L. Hedging against academic risk. SciBX, 14 Apr 2011 (doi:10.1038/scibx.2011.416).

    http://www.nature.com/nrd/journal/v10/n9/full/nrd3439-c1.html

  • 8/48

    Cold fusion

  • 9/48

  • 10/48

    Science 2.0 EU Consultation

    http://www.consultation-science20.eu/

  • 11/48

    CLOUD COMPUTING

  • 12/48

    Cloud computing provides

    On-demand services, delivered over the network

  • 13/48

    Cloud computing is good for

    Getting what you need, when you need it

  • 14/48

    Cloud computing is good for

    Focussing on your research

  • 15/48

    The Cloud democratizes access to scale & economies of scale

  • 16/48

    Cloud Computing Patterns

    Charts: Compute against time t for each pattern; the On and Off chart marks the Inactivity Period.

    On and Off

    On & off workloads (e.g. batch job)

    Over-provisioned capacity is wasted

    Time to market can be cumbersome

    Unpredictable Bursting

    Unexpected/unplanned peak in demand

    Sudden spike impacts performance

    Can't over-provision for extreme cases

    Growing Fast

    Successful services need to grow/scale

    Keeping up w/ growth is a big IT challenge

    Cannot provision hardware fast enough

    Predictable Bursting

    Services with micro seasonality trends

    Peaks due to periodic increased demand

    IT complexity and wasted capacity
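
A rough way to see why over-provisioned capacity is wasted for on-and-off workloads: the sketch below compares capacity sized for the peak against the capacity actually used. The demand figures and the function name `provisioning_report` are illustrative assumptions, not numbers or tooling from the talk.

```python
def provisioning_report(demand):
    """Compare fixed, peak-sized capacity with what a workload actually uses.

    `demand` is a list of compute units needed in each time slot
    (hypothetical units; any consistent unit works).
    """
    if not demand:
        raise ValueError("demand must contain at least one time slot")
    peak = max(demand)
    used = sum(demand)
    # Fixed provisioning pays for the peak in every slot, busy or idle.
    fixed_capacity = peak * len(demand)
    return {
        "peak": peak,
        "used": used,
        "fixed_capacity": fixed_capacity,
        "wasted": fixed_capacity - used,
        "utilisation": used / fixed_capacity if fixed_capacity else 0.0,
    }


# Assumed on-and-off batch workload: busy for two slots, idle for two, repeated.
on_off = [0, 0, 8, 8, 0, 0, 8, 8]
report = provisioning_report(on_off)
print(report)
```

With on-demand capacity the bill tracks `used` rather than `fixed_capacity`, which is the gap the On and Off pattern points at.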

  • 17/48

    The Microsoft Cloud

    Global presence

    Map legend: Datacenter, Edge point

  • 18/48

    Cloud Computing

  • 19/48

  • 20/48

    Microsoft Azure

    Choose from multiple runtimes and languages for your applications: Python, Java, PHP, .NET, Node.js

    Run Linux on Microsoft Azure Virtual Machines (VHD)

    Support multiple frameworks and popular open source applications with Microsoft Azure Web Sites

    HDInsight: Hadoop for Big Data analysis

    http://github.com/windowsazure

  • 21/48

    Research Cloud Ecosystem

  • 22/48

  • 23/48

    http://www.phdcomics.com/comics.php?f=1689

  • 24/48

    Computational experiments should be recomputable for all time

    Recomputation of recomputable experiments should be very easy

    It should be easier to make experiments recomputable than not to

    Tools and repositories can help recomputation become standard

    The only way to ensure recomputability is to provide virtual machines

    Runtime performance is a secondary issue

    Ian Gent, Alexander Konovalov and Lars Kotthoff

    Steven Crouch, Devasena Inupakutika

  • 25/48

    Recomputation.org

  • 26/48

  • 27/48

    Zanadu.IO

    Patrick Henaff and Claude Martini

    http://zanadu.io/

  • 28/48

    Zanadu.IO

  • 29/48

    Open Science

    khmer-protocols: Effort to provide standard cheap assembly protocols for cloud machines.

    Entirely copy/paste; ~2-6 days from raw reads to assembly, annotations, and differential expression analysis. Est. ~$150 per data set.

    Open, versioned, forkable, citable.

    C. Titus Brown, @ctitusbrown

    http://ged.cse.msu.edu/

    http://ivory.idyll.org/

  • 30/48

    Some thoughts

    Explicitly a protocol: explicit steps, copy-paste, customizable, versioned; not black box.

    No requirement for computational expertise or significant computational hardware.

    ~1-5 days to teach a bench biologist to use.

    $100-150 of rental compute (cloud computing) for a $1000 data set.

    Now adding in quality control and internal validation steps.

    Reproducible computing environment (Azure)

    Publicly available data (MMETSP)

    Open and versioned protocol

    Provenance tracking and registration (Synapse?)
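
The cost figures on this slide imply that rental compute adds roughly a tenth to the cost of generating the data. A minimal sketch of that arithmetic, using only the $100-150 and $1000 figures quoted above; the function name `compute_overhead` is hypothetical.

```python
def compute_overhead(compute_cost_usd, data_cost_usd):
    """Return rental compute cost as a fraction of the data-generation cost."""
    if data_cost_usd <= 0:
        raise ValueError("data_cost_usd must be positive")
    return compute_cost_usd / data_cost_usd


# Figures from the slide: $100-150 of cloud compute for a $1000 data set.
low = compute_overhead(100, 1000)
high = compute_overhead(150, 1000)
print(f"Compute adds {low:.0%} to {high:.0%} on top of the data cost")
```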

  • 31/48

  • 32/48

    Computing Cancer

    http://biomodelanalyzer.research.microsoft.com/

  • 33/48

    Troubling Trends in Scientific Software

  • 34/48

    Azure Machine Learning

    Azure Machine Learning Awards: 15 Sep 2014

  • 35/48

    Azure Machine Learning - Sharing

  • 36/48

    www.tryfsharp.org

    http://www.tryfsharp.org/create/kenji/WorldBankeDemo.fsx

  • 37/48

    NOTES FROM THE FIELD

  • 38/48

  • 39/48

  • 40/48

    http://www.rigb.org/docs/faraday_notebooks__induction_0.pdf

  • 41/48

    21st Century Log Notebooks

  • 42/48

    Verification versus Validation

    Are you building it right?

    Are you building the right thing?

    Repeatability, Reliability

  • 43/48

    Repeatability, Replicability, Reproducibility, Reuse

    Reproducing my own results

    Replicating other people's results

    Reproducing other people's results

    reviewers have no time and no resources to reproduce data and to dig deeply into the presented work.

    Life Sci VC: Academic bias & biotech failures: http://lifescivc.com/2011/03/academic-bias-biotech-failures/

    Photo: leechantmcarthur, CC-BY

  • 44/48

    Enabling Science 2.0

    www.azure4research.com

  • 45/48

    Enabling Science 2.0

    Use laptops & desktop computers

    Overwhelmed by data

    Finding analysis ever more difficult; sharing even harder

    www.azure4research.com

  • 46/48

    Microsoft Azure for Research

    Azure Research Awards

    General: next 15 Aug

    Machine Learning: next 15 Sep

    Microsoft Azure for Research Online Training

    Webinars

    Technical papers & walkthroughs

    Research community engagements

    www.azure4research.com

  • 47/48

    THANK YOU

    [email protected]

    www.azure4research.com

    Microsoft Azure for Research Group

    @azure4research

    http://www.linkedin.com/groups/Windows-Azure-Research-6521580?home=&gid=6521580&trk=groups_most_popular-h-logo

    https://twitter.com/Azure4Research

  • 48/48