
RevolutionAnalytics - Solution for Big Data

Apr 14, 2018


dadhich00
  • 7/27/2019 RevolutionAnalytics - Solution for Big Data

    1/41

    Revolution Confidential

    Revolution Analytics

    September, 2010


    Introduction to Revolution R

  • Key challenges for data-driven organizations

    Explosion of data, and competitive imperative to analyze it
    Access to advanced analytics tools to derive knowledge
    Attracting analysts at all levels capable of performing advanced analytics
    Efficiently sharing analyses and disseminating knowledge throughout the organization

  • R and Revolution

  • The leading commercial provider of software and support for the popular open source R statistics language.

    Founded: 2007
    Offices: Palo Alto (HQ), Seattle (Eng)
    Employees: 30+
    CEO: Norman Nie

    http://www.revolutionanalytics.com/video.php

  • The Team*

    Norman H. Nie, CEO and Director: Co-founder, former CEO & Chairman of SPSS; Professor Emeritus of Political Science, University of Chicago & Stanford
    Robert Gentleman, Director: Co-creator of R; Sr. Director, Genentech
    Zack Urlocker, Director: Former EVP, MySQL; open-source expert
    David Champagne, CTO: Former principal architect, SPSS
    Jeff Erhardt, COO: Fortune 500 background in technology, operations, finance, M&A, and strategy
    David Smith, VP Marketing & Community: Statistics expert and long-time community advocate; co-author of An Introduction to R
    Mike Minelli, VP Sales: Former Sales Director at SAS; author of Partnering with the CIO
    Sheri Gilley, Principal UI Designer: Chief UI designer at SPSS; creator of 1st SPSS UI for Windows
    Tex Hull, Technical Advisor: Co-founder and 40-year employee of SPSS

    *Team rebuilt and expanded in 2009

  • What is R?

    Data analysis software
    A programming language
      Development platform designed by and for statisticians
    An environment
      Huge library of algorithms for data access, data manipulation, analysis and graphics
    An open-source software project
      Free, open, and active
    A community
      Thousands of contributors, 2 million users
      Resources and help in every domain

  • because R is the most powerful & flexible statistical programming language in the world (1)

    Capabilities
      Sophisticated statistical analyses
      Predictive analytics
      Data visualization
    Applications
      Real-time trading
      Finance
      Risk assessment
      Forecasting
      Bio-technology
      Drug development
      Social networks
      ... and more

    [Figures: Sharpe Ratio surface (fast vs. slow); MSFT price chart, 2009-01-02 to 2010-03-31, with volume and Moving Average Convergence Divergence (12,26,9) panels; scatterplot matrix of prestige, income and education; 3-D surface of Sinc(r) over X and Y]

    (1) Norman Nie, multiple interviews

  • R is exploding in popularity and functionality

    Internet Discussion: mean monthly traffic on email discussion list, 1995-2010 (R, SAS, Stata, SPSS, S-Plus)
    Web Site Popularity: number of links to main web site (Stata, S-Plus, SPSS, SAS, R; bar labels 600, 900, 1,050, 2,000, 4,000)
    Scholarly Activity: Google Scholar hits, 05-09 CAGR (Stata 10%, S-Plus 0%, SPSS -27%, SAS -11%, R 46%)
    Package Growth: number of R packages listed on CRAN, 2002-2010

    Source: http://r4stats.com/popularity

  • Revolution R Enterprise: Production-Grade Analysis for Business

    Enhanced Speed and Reliability: Revolution R Enterprise is fast, usable and practical, making it the ideal choice for real-world data analytics.
    Visual Productivity & Step Debugging: Graphical IDE enables faster, more accurate R programming. Create breakpoints and step through code with a single click.
    Scale to Terabyte-Class Datasets: Process and analyze very large data sets, with high performance, but without the need for specialized hardware.
    Enhance business applications with analytics: Deploy on-demand R analytics to spreadsheets, BI dashboards, web applications and more via Web Services.
    Parallel Processing Tools: Use multiple servers to reduce computation time for simulations, optimizations, segmented data analysis and more.
    Wide Platform Support: Workstation and Server deployments for 32-bit and 64-bit Windows and Red Hat Enterprise Linux.
    On-Call Technical Support: Revolution Analytics is there to support you when you need help or confront an issue.

  • Revolution R Enterprise has Open-Source R Engine at the core

    [Diagram components: R Engine, Language, Libraries; Community Packages; Technical Support; Multi-Threaded math libraries; Web-Based GUI; Web Services API; Big Data Analysis; Parallel Tools; IDE/Developer GUI; Build Assurance]
    [Legend: Revolution Proprietary additions; Community - Open Source; Revolution Forthcoming proprietary additions]

    www.revolutionanalytics.com/our-vision

  • Revolution and R have garnered tremendous attention in the media

  • Seven Awesome Things about R


  • Awesome Thing #1: Open Source

    Licensed under GPL (like Linux!)
    Flexible
    Open for integration
      Data (SAS, SPSS, Excel, SQL Server, Oracle, ...)
      Systems (applications, web servers, ...)
    Broad user-base
    De-facto standard for data analysis teaching

  • Awesome Thing #2: Language

    Programming, not dialogs or cell formulas
      Freedom to combine methods
      Repeatable results
      Reliable and reusable
    Language designed for data analysis
      Object-oriented: vector, matrix, model, ...
      Built-in library of algorithms
    Get more done, faster

  • Awesome Thing #2: Language = Speed

    New York Times, June 25 2009
    3 hours after Michael Jackson's death

  • Awesome Thing #3: Graphics and Data Visualization

    Functions for standard graphs
      Scatterplot, time series, histogram, smoothing, ...
      Bar plot, pie chart, dot chart, ...
      Image plot, 3-D surface, map, ...
    Influences from Cleveland, Tufte etc.
      Conditioning, small multiples, use of color
    Customize without limits
      Combine graph types
      Create entirely new graphics

  • Awesome Thing #4: Statistics

    All standard statistical methods built in
      Mean, median, covariance, distributions, ...
      Regression, ANOVA, cross-tabulations, ...
      Survival, nonlinear mixed effects, GLM, ...
      Neural networks, trees, GAM, ...
    Object-oriented functions
      Access all parts of the analysis results
      Combine analytic methods
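The "object-oriented functions" point on the statistics slide can be illustrated with base R alone. This is a minimal sketch, not material from the presentation: it assumes the built-in mtcars data set, and the model formulas are arbitrary choices made for illustration.

```r
# Fit a linear model; the result is an object of class "lm"
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Access parts of the analysis result through generic functions
coefs <- coef(fit)                # named vector of coefficient estimates
r2    <- summary(fit)$r.squared   # scalar pulled out of the summary object
res   <- residuals(fit)           # one residual per observation

# Combine analytic methods: compare against a nested model with ANOVA
fit0 <- lm(mpg ~ wt, data = mtcars)
cmp  <- anova(fit0, fit)

print(round(coefs, 3))
print(cmp)
```

Because the fitted model is an ordinary R object, the same `fit` can be passed on to `predict()`, `plot()` or further tests without rerunning the analysis.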

  • Statistics in R at Google

    Predicting the Present with Google Trends
    Hal Varian, Chief Economist, Google
    http://blog.revolutionanalytics.com/2009/09/google-uses-r-to-predict-economic-activity.html

    ## Import Google Trends Data
    google = read.csv("googletrends.csv")
    google$date = as.Date(google$date)

    ## Merge Sales Data w/ Google Trends Data
    ## (dat, the sales data, is assumed to be loaded already)
    google$month = as.Date(paste(substr(google$date, 1, 7), "01", sep="-"))
    dat = merge(dat, google)

    ### Fit Model
    ### (dat1 is used as on the slide; its construction from dat is not shown)
    fit = lm(log(sales) ~ log(s1) + log(s12) + trends1, data=dat1)
    summary(fit)

  • Awesome Thing #5: Cutting-edge analytics

    Really good domain-specific suites for R:
      Portfolio Optimization: Rmetrics (http://www.rmetrics.org/)
      Quantitative Financial Modeling: Quantmod (http://www.quantmod.com/)
      Genomics: Bioconductor (http://www.bioconductor.org/)
    Thousands of add-on packages:
      CRAN: cran.r-project.org
      Task Views (http://cran.r-project.org/web/views/)
        Machine learning, natural language processing, HPC, Econometrics, Environmetrics, ...

  • Awesome Thing #6: Community

    R Project homepage: www.r-project.org
    Find the best R package to solve a problem: crantastic.org
    Get your R question answered: Stackoverflow (R tag), http://stackoverflow.com/questions/tagged/r
    Read R blogs
      Revolutions blog (http://blog.revolutionanalytics.com/)
      R-bloggers (http://www.r-bloggers.com/)
    Find R Tweeps: #rstats hashtag on Twitter
    Find the best of R on the Web: inside-R.org

  • Revolution: Supporting the R Community

    http://www.inside-R.org

  • Awesome Thing #7: No Limits!

    Open
    Powerful
    Mashable
    Flexible
    Fun!

    San Francisco Estuary Institute: http://blog.revolutionanalytics.com/2009/04/find-a-safer-place-in-the-bay-area.html

  • Revolution R Enterprise

    www.revolutionanalytics.com/products

  • Revolution R Enterprise

    Production-Grade Statistical Analysis for the Workplace

    High-performance R for multiprocessor systems
    Statistical Analysis of Terabyte-Class Data Sets
    Parallel Programming on Clusters / Cloud
    Modern Integrated Development Environment
    Validation for use in regulated environments
    Telephone and email technical support
    Training and consulting services
    Deploy R Applications via Web Services
    Easy-to-Use Graphical User Interface (1)

    (1) Coming in 2011

  • High performance Intel MKL library

    Computation*                        R (CRAN)   Revolution R Enterprise   Speedup
    Linear Algebra (1)
      Matrix Multiply                   243 sec    5.9 sec                   41x
      Cholesky Factorization            23 sec     1.1 sec                   21x
      Linear Discriminant Analysis      142 sec    32.0 sec                  4.4x
    General R Benchmarks (2)
      R Benchmarks (Matrix Functions)   20 sec     2.1 sec                   9.5x
      R Benchmarks (Program Control)    4.7 sec    4.2 sec                   1.1x

    * 4-core laptop
    (1) http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
    (2) http://r.research.att.com/benchmarks/

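The linear-algebra rows of the benchmark table can be approximated on any R installation. The sketch below is not the script behind the slide's numbers (those are at the URLs cited on the slide); the matrix size and seed are arbitrary assumptions, and the timings depend on which BLAS/LAPACK library R is linked against.

```r
# Illustrative micro-benchmark: size and seed are assumptions, not the slide's.
set.seed(42)
n <- 500
a <- matrix(rnorm(n * n), n, n)

# Matrix multiply: crossprod(a) computes t(a) %*% a
t_mult <- system.time(s <- crossprod(a))[["elapsed"]]

# Cholesky factorization of the symmetric positive-definite matrix s
t_chol <- system.time(u <- chol(s))[["elapsed"]]

cat(sprintf("matrix multiply: %.3f sec\n", t_mult))
cat(sprintf("cholesky:        %.3f sec\n", t_chol))

# Sanity check: the upper-triangular factor reproduces s
max_err <- max(abs(crossprod(u) - s))
```

Running the same script under two R builds gives a like-for-like comparison of the kind the table reports.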
  • R Productivity Environment

    [Screenshot callouts]
    Script with type ahead and code snippets
    Solutions window for organizing code and data
    Packages installed and loaded
    Objects loaded in the R Environment
    Object details
    Sophisticated debugging with breakpoints, variable values etc.

    http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm

  • RevoScaleR: Scale R to terabyte-class data sets

    Existing Solutions
      Base R is constrained by CAPACITY and PERFORMANCE
      Existing packages address EITHER capacity OR performance
      DIFFICULT and COSTLY to implement
    RevoScaleR
      COMPREHENSIVE top-down solution
      Addresses BOTH capacity AND performance
      Plug-and-Play, EASY to implement
      CUSTOMIZABLE and EXTENSIBLE

  • Big Data Analysis with RevoScaleR: Bringing performance and capacity to R

    Components: XDF File Format; C++ Distributed Analytics Programming Framework*; Distributed Statistical Algorithms; External Memory Programming Framework

    A novel high-speed file format designed specifically to support statistical analyses
    Addresses capacity through a collection of functions for chunking through massive data files
    Extensible framework for creating new external memory algorithms
    Addresses performance by distributing computations between cores and computers

    *Coming soon
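The idea of "chunking through massive data files" can be sketched in base R. This is not RevoScaleR code and does not use the XDF format: `chunked_mean` is a hypothetical helper written for illustration, it reads a plain CSV file, and the sample data are synthetic.

```r
# Hypothetical helper (not part of RevoScaleR): mean of one CSV column,
# reading the file in fixed-size chunks so memory use stays bounded.
chunked_mean <- function(path, column, chunk_rows = 1000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # First line holds the column names; drop any quoting
  header <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]
  header <- gsub('"', "", header, fixed = TRUE)
  total <- 0
  count <- 0
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0) break
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    values <- chunk[[column]]
    values <- values[!is.na(values)]
    # Only running totals are kept between chunks
    total <- total + sum(values)
    count <- count + length(values)
  }
  total / count
}

# Usage with a small synthetic file (illustrative data only)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(ArrDelay = c(5, -3, 12, 0, 21, NA, 7)), tmp,
          row.names = FALSE)
result <- chunked_mean(tmp, "ArrDelay", chunk_rows = 3)
print(result)
```

Only the running sum and count survive from one chunk to the next, which is why a statistic like the mean suits this pattern; statistics that need all the data at once (an exact median, for example) need a different algorithm.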

  • RevoScaleR: Big Data Statistics
    www.revolutionanalytics.com/bigdata

    Every US airline departure and arrival, 1987-2008
    File: AirlineData87to08.xdf
    Rows: 123.5 million
    Variables: 29
    Size on disk: 13.2 GB

    [Figure: Average Arrival Delay by Day of Week by Departure Hour; one panel per day, Monday through Sunday; x-axis DepartureHour (0-24), y-axis ArrDelay (-20 to 40); labelled arrDelayLm2]

  • RevoDeployR: Enterprise-ready R server

    Disseminate output of analysts to decision makers
    Integrate R analytics into Web-based applications
      Data Analysis and Visualization
      Reporting Dashboards
      Interactive applications
    Revolution R Enterprise Server with RevoDeployR: a standardized collection of web services for delivering R capabilities to applications

  • Integrate R analytics with applications

  • Revolution DeployR

    Revolution R Web Services: DeployR

    [Diagram labels: R / Statistical Modeling Expert; Deployment Expert; Data Sources & creation of Analytics; Consumption of Analytics & Results; Data Analysis; Business Intelligence; Cloud / SaaS; Interactive Web Apps]

  • ParallelR: computing in the cloud

    Parallel processing on local machine, cluster, or cloud
    foreach replaces loops
    Minimal code changes
    Significant speedups

    Evolving R for Commercial Use

    # 1000 simulations
    # 8 running in parallel
    library(foreach)
    require("doNWS")
    s
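Since the slide's code is cut off in this transcript, here is a separate minimal sketch of the "foreach replaces loops" idea. It is not the slide's code: it assumes the foreach and doParallel packages are installed, doParallel stands in for the doNWS backend named on the slide, and the simulation body (the mean of 100 normal draws) is made up for illustration.

```r
# Sketch only: doParallel stands in for the slide's doNWS backend, and
# the simulation body is a made-up example.
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(2)   # 2 local workers; the slide mentions 8
registerDoParallel(cl)

# Ordinary sequential loop
seq_result <- numeric(1000)
for (i in 1:1000) seq_result[i] <- mean(rnorm(100))

# foreach equivalent: same body, iterations spread across the workers
par_result <- foreach(i = 1:1000, .combine = c) %dopar% mean(rnorm(100))

parallel::stopCluster(cl)
print(summary(par_result))
```

The loop body is unchanged between the two versions; only the loop construct and the registered backend differ, which is what makes the change "minimal". Swapping `%dopar%` for `%do%` runs the same code sequentially.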

  • Revolution R: Ready for IT

    Compliance: Repeatable build process which is documented and open to audit
    Documentation: IQ and OQ documentation for streamlined deployments
    Governance: Confidence in control over versioning and distribution
    Support: Support from a world class engineering and development organization
    Modern: R is a modern language and is object oriented, which means IT can use it too

  • Coming soon: Revolution R GUI

    Accessible
    Powerful
    Extensible

  • Conclusion

  • Why R?

    Available Talent: Embraced by the academic community, which is creating a lot of R talent
    Open Knowledge: Large innovative community sharing their ideas and models (2M+ users)
    Rapid Innovation: Its open source nature leads to rapid innovation
    Modern Language: R is an object-oriented modern programming language, which is IT friendly
    Rich & Flexible: Most complete math library, which can be broken down at a granular level

  • Revolution R Enterprise: Revolution brings R to the Enterprise

    [Diagram labels: Power; Productivity; Enterprise Readiness; ScaleR; ParallelR; Intel MKL build; Visual IDE; DeployR; GUI*; Support; Qualification; Services & Training; Control]

    *Coming in 2011

  • Revolution R Enterprise: Addressing Key Challenges

    RevoScaleR: Process and analyze large data sets
    Revolution R Enterprise: Advanced analytics
    R: The tool of choice for today's analyst
    RevoDeployR: Disseminate results

  • The leading commercial provider of software and support for the popular open source R statistics language.

    www.revolutionanalytics.com
    (650) 330 0553
    Twitter: @RevolutionR