7/27/2019 RevolutionAnalytics - Solution for Big Data
Revolution Confidential
Revolution Analytics
September, 2010
Introduction to Revolution R
Key challenges for data-driven organizations
Explosion of data, and competitive imperative to analyze it
Access to advanced analytics tools to derive knowledge
Attracting analysts at all levels capable of performing advanced analytics
Efficiently sharing analyses and disseminating knowledge throughout the organization
R and Revolution
Founded: 2007
Offices: Palo Alto (HQ)
Seattle (Eng)
Employees: 30+
CEO: Norman Nie
The leading commercial provider of software and support for the
popular open source R statistics language.
http://www.revolutionanalytics.com/video.php
The Team
Norman H. Nie, CEO, Director: Co-founder, former CEO & Chairman of SPSS; Professor Emeritus of Political Science, University of Chicago & Stanford
Robert Gentleman, Director: Co-creator of R, Sr. Director Genentech
Zack Urlocker, Director: Former EVP MySQL, open-source expert
David Champagne, CTO: Former principal architect, SPSS
Jeff Erhardt, COO: Fortune 500 background in technology, operations, finance, M&A, and strategy
David Smith, VP Marketing & Community: Statistics expert & long-time community advocate; co-author of An Introduction to R
Mike Minelli, VP Sales: Former Sales Director at SAS, author of Partnering with the CIO
Sheri Gilley, Principal UI Designer: Chief UI designer at SPSS, creator of 1st SPSS UI for Windows
Tex Hull, Technical Advisor: Co-founder and 40 year employee of SPSS
* Team rebuilt and expanded in 2009
What is R?
Data analysis software
A programming language: Development platform designed by and for statisticians
An environment: Huge library of algorithms for data access, data manipulation, analysis and graphics
An open-source software project: Free, open, and active
A community: Thousands of contributors, 2 million users; resources and help in every domain
...because R is the most powerful & flexible statistical programming language in the world (1)
Capabilities
Sophisticated statistical analyses
Predictive analytics
Data visualization
Applications
Real-time trading
Finance
Risk assessment
Forecasting
Bio-technology
Drug development
Social networks
... and more
[Figures: Sharpe Ratio surface (fast vs. slow); MSFT price chart, 2009-01-02 to 2010-03-31 (Last 29.29), with volume and MACD(12,26,9) panels; scatterplot matrix of prestige, income, and education; 3-D surface plot of Sinc(r) over X and Y]
1 Norman Nie, multiple interviews
R is exploding in popularity and functionality
Internet Discussion: mean monthly traffic on email discussion list [line chart, 1995-2010: R, SAS, Stata, SPSS, S-Plus]
Web Site Popularity: number of links to main web site: Stata 600; S-Plus 900; SPSS 1,050; SAS 2,000; R 4,000
Scholarly Activity: Google Scholar hits (05-09 CAGR): Stata 10%; S-Plus 0%; SPSS -27%; SAS -11%; R 46%
Package Growth: number of R packages listed on CRAN [chart, 2002-2010]
Source: http://r4stats.com/popularity
Revolution R Enterprise:
Production-Grade Analysis for Business
Enhanced Speed and Reliability: Revolution R Enterprise is fast, usable and practical, making it the ideal choice for real-world data analytics.
Visual Productivity & Step Debugging: Graphical IDE enables faster, more accurate R programming. Create breakpoints and step through code with a single click.
Scale to Terabyte-Class Datasets: Process and analyze very large data sets, with high performance, but without the need for specialized hardware.
Enhance business applications with analytics: Deploy on-demand R analytics to spreadsheets, BI dashboards, web applications and more via Web Services.
Parallel Processing Tools: Use multiple servers to reduce computation time for simulations, optimizations, segmented data analysis and more.
Wide Platform Support: Workstation and Server deployments for 32-bit and 64-bit Windows and Red Hat Enterprise Linux.
On-Call Technical Support: Revolution Analytics is there to support you when you need help or confront an issue.
Revolution R Enterprise has Open-Source R Engine at the core
[Diagram components: R Engine; Language Libraries; Community Packages; Technical Support; Multi-Threaded Math Libraries; Web-Based GUI; Web Services API; Big Data Analysis; Parallel Tools; IDE/Developer GUI; Build Assurance]
[Diagram legend: Community - Open Source; Revolution proprietary additions; Revolution forthcoming proprietary additions]
www.revolutionanalytics.com/our-vision
Revolution and R have garnered
tremendous attention in the media
Seven Awesome Things about R
Awesome Thing #1: Open Source
Licensed under GPL (like Linux!)
Flexible
Open for integration
Data (SAS, SPSS, Excel, SQL Server, Oracle, ...)
Systems (applications, webservers, ...)
Broad user-base
De-facto standard for data analysis teaching
Awesome Thing #2: Language
Programming, not dialogs or cell formulas
Freedom to combine methods
Repeatable results
Reliable and reusable
Language designed for data analysis
Object-oriented: vector, matrix, model, ...
Built-in library of algorithms
Get more done, faster
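A minimal sketch of what "vector, matrix, model" objects and reusable code look like in practice. The data values and the helper name standardize are made up for illustration and do not come from the slides.

```r
# Vector: arithmetic applies element-wise, with no explicit loop
heights_cm <- c(150, 160, 170, 180)
heights_m <- heights_cm / 100

# Matrix: linear algebra is built in
m <- matrix(c(2, 0, 0, 2), nrow = 2)
m_inv <- solve(m)

# Function: a reusable, repeatable analysis step (hypothetical helper)
standardize <- function(x) (x - mean(x)) / sd(x)
z <- standardize(heights_cm)
```

Because each step is code rather than a dialog or a cell formula, rerunning the script on new data repeats exactly the same analysis.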
Awesome Thing #2: Language = Speed
New York Times, June 25 2009
3 hours after Michael Jackson's death
Awesome Thing #3: Graphics and Data Visualization
Functions for standard graphs
Scatterplot, time series, histogram, smoothing, ...
Bar plot, pie chart, dot chart, ...
Image plot, 3-D surface, map, ...
Influences from Cleveland, Tufte etc.
Conditioning, small multiples, use of color
Customize without limits
Combine graph types
Create entirely new graphics
Awesome Thing #4: Statistics
All standard statistical methods built in
Mean, median, covariance, distributions, ...
Regression, ANOVA, cross-tabulations, ...
Survival, nonlinear mixed effects, GLM, ...
Neural networks, trees, GAM, ...
Object-oriented functions
Access all parts of the analysis results
Combine analytic methods
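As a small illustration of "access all parts of the analysis results", the sketch below fits a regression on R's built-in cars data set and pulls individual pieces out of the fitted model object. The choice of data set and the variable names are assumptions made for this example only.

```r
# Fit a linear regression on the built-in 'cars' data set
fit <- lm(dist ~ speed, data = cars)

# The fitted model is an object; its parts can be extracted and reused
coefs <- coef(fit)             # named vector of coefficient estimates
res <- residuals(fit)          # one residual per observation
r2 <- summary(fit)$r.squared   # goodness of fit

# Combine methods: feed the same model object into prediction
new_speeds <- data.frame(speed = c(10, 20))
pred <- predict(fit, newdata = new_speeds)
```

The same accessor functions (coef, residuals, predict, summary) work across many model types, which is what makes it practical to chain analytic methods together.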
Statistics in R at Google
Predicting the Present with Google Trends
Hal Varian, Chief Economist, Google (link)
## Import Google Trends Data
google = read.csv("googletrends.csv")
google$date = as.Date(google$date)
## Merge Sales Data w/ Google Trends Data
google$month = as.Date(paste(substr(google$date, 1, 7), "01", sep = "-"))
dat = merge(dat, google)
### Fit Model
fit = lm(log(sales) ~ log(s1) + log(s12) + trends1, data = dat1)
summary(fit)
http://blog.revolutionanalytics.com/2009/09/google-uses-r-to-predict-economic-activity.html
Awesome Thing #5: Cutting-edge analytics
Really good domain-specific suites for R:
Portfolio Optimization: Rmetrics (http://www.rmetrics.org/)
Quantitative Financial Modeling: Quantmod (http://www.quantmod.com/)
Genomics: Bioconductor (http://www.bioconductor.org/)
Thousands of add-on packages:
CRAN: http://cran.r-project.org/
Task Views (http://cran.r-project.org/web/views/): Machine learning, natural language processing, HPC, Econometrics, Environmetrics, ...
Awesome Thing #6: Community
R Project homepage: www.r-project.org
Find the best R package to solve a problem: crantastic.org
Get your R question answered: Stackoverflow (R tag), http://stackoverflow.com/questions/tagged/r
Read R blogs: Revolutions blog (http://blog.revolutionanalytics.com/), R-bloggers (http://www.r-bloggers.com/)
Find R Tweeps: #rstats hashtag on Twitter
Find the best of R on the Web: inside-R.org
Revolution: Supporting the R Community
http://www.inside-R.org
Awesome Thing #7: No Limits!
Open
Powerful
Mashable
Flexible
Fun!
San Francisco Estuary Institute: http://blog.revolutionanalytics.com/2009/04/find-a-safer-place-in-the-bay-area.html
Revolution R Enterprise
www.revolutionanalytics.com/products
Revolution R Enterprise: Production-Grade Statistical Analysis for the Workplace
High-performance R for multiprocessor systems
Statistical Analysis of Terabyte-Class Data Sets
Parallel Programming on Clusters / Cloud
Modern Integrated Development Environment
Validation for use in regulated environments
Telephone and email technical support
Training and consulting services
Deploy R Applications via Web Services
Easy-to-Use Graphical User Interface (1)
(1) Coming in 2011
High performance: Intel MKL library
Computation*                        R (CRAN)    Revolution R Enterprise    Speedup
Linear Algebra (1)
  Matrix Multiply                   243 sec     5.9 sec                    41x
  Cholesky Factorization            23 sec      1.1 sec                    21x
  Linear Discriminant Analysis      142 sec     32.0 sec                   4.4x
General R Benchmarks (2)
  R Benchmarks (Matrix Functions)   20 sec      2.1 sec                    9.5x
  R Benchmarks (Program Control)    4.7 sec     4.2 sec                    1.1x
* 4-core laptop
1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
2. http://r.research.att.com/benchmarks/
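For readers who want to time the same kinds of operations on their own R installation, here is a minimal sketch using base R's system.time. The matrix size n is an arbitrary assumption and this is not the benchmark script behind the figures quoted above; absolute timings depend entirely on the machine and on the BLAS library that R is linked against.

```r
# Timing sketch; n is an arbitrary size chosen so the example runs quickly
set.seed(1)
n <- 200
a <- matrix(rnorm(n * n), n, n)
b <- matrix(rnorm(n * n), n, n)

# Matrix multiply
t_mult <- system.time(ab <- a %*% b)[["elapsed"]]

# Cholesky factorization of a symmetric positive definite matrix
spd <- crossprod(a) + diag(n)
t_chol <- system.time(u <- chol(spd))[["elapsed"]]

# A speedup is the ratio of elapsed times for the same task on two builds,
# e.g. old_time / new_time
```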
R Productivity Environment
Script with type ahead and code snippets
Solutions window for organizing code and data
Packages installed and loaded
Objects loaded in the R Environment
Object details
Sophisticated debugging with breakpoints, variable values etc.
http://www.revolutionanalytics.com/demos/revolution-productivity-environment/demo.htm
RevoScaleR: Scale R to terabyte-class data sets
Existing Solutions
Base R is constrained by CAPACITY and PERFORMANCE
Existing packages address EITHER capacity OR performance
DIFFICULT and COSTLY to implement
RevoScaleR
COMPREHENSIVE top-down solution
Addresses BOTH capacity AND performance
Plug-and-Play: EASY to implement
CUSTOMIZABLE and EXTENSIBLE
Big Data Analysis with RevoScaleR: Bringing performance and capacity to R
Components: XDF File Format; External Memory Programming Framework; Distributed Statistical Algorithms; C++ Distributed Analytics Programming Framework*
A novel high-speed file format designed specifically to support statistical analyses
Addresses capacity through a collection of functions for chunking through massive data files
Extensible framework for creating new external memory algorithms
Addresses performance by distributing computations between cores and computers
*Coming soon
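The idea of "chunking through massive data files" can be sketched in base R. The example below is not RevoScaleR code and does not use the XDF format; it is a hypothetical helper, chunked_mean, that reads a CSV file a few rows at a time and keeps only running totals in memory. The file and column names are made up for the demonstration.

```r
# Hypothetical helper: mean of one numeric column, computed chunk by chunk
# so that the whole file never has to fit in memory at once
chunked_mean <- function(path, column, chunk_rows = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Read the header line and strip any quotes around column names
  header <- strsplit(readLines(con, n = 1L), ",", fixed = TRUE)[[1]]
  header <- gsub('"', "", header, fixed = TRUE)
  total <- 0
  n <- 0
  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break
    chunk <- read.csv(text = lines, header = FALSE, col.names = header)
    total <- total + sum(chunk[[column]])   # running sum
    n <- n + nrow(chunk)                    # running count
  }
  total / n
}

# Small demonstration file (made-up data)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(delay = 1:10, hour = rep(0:4, 2)), tmp, row.names = FALSE)
result <- chunked_mean(tmp, "delay", chunk_rows = 3L)
```

Any statistic that can be updated from per-chunk summaries (sums, counts, cross-products) can be computed this way, which is the general pattern behind external memory algorithms.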
RevoScaleR: Big Data Statistics
www.revolutionanalytics.com/bigdata
Every US airline departure and arrival, 1987-2008
File: AirlineData87to08.xdf
Rows: 123.5 million
Variables: 29
Size on disk: 13.2 GB
[Figure: Average Arrival Delay by Day of Week by Departure Hour. One panel per day, Monday through Sunday; x-axis DepartureHour (0-24), y-axis ArrDelay (-20 to 40); model object arrDelayLm2]
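The quantity plotted above, average arrival delay for each combination of day of week and departure hour, can be sketched in base R on a small simulated table. Everything below is an assumption for illustration: the data are randomly generated, not the airline file, and the delay pattern is invented. The slide's own analysis was run on the full 123.5-million-row file and its code is not shown.

```r
# Simulated stand-in for the airline data (NOT the real file)
set.seed(42)
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
flights <- expand.grid(
  DayOfWeek = factor(days, levels = days),
  DepartureHour = 0:23,
  rep = 1:50
)
# Invented pattern for illustration only: delays grow through the day
flights$ArrDelay <- 0.8 * flights$DepartureHour +
  rnorm(nrow(flights), sd = 10)

# Mean arrival delay for every day-of-week by departure-hour cell
avg_delay <- tapply(
  flights$ArrDelay,
  list(flights$DayOfWeek, flights$DepartureHour),
  mean
)
```

Each row of avg_delay corresponds to one panel of the figure and each column to one departure hour.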
RevoDeployR: Enterprise-ready R server
Disseminate output of analysts to decision makers
Integrate R analytics into Web based applications
Data Analysis and Visualization
Reporting Dashboards
Interactive applications
Revolution R Enterprise Server with RevoDeployR: a standardized collection of web services for delivering R capabilities to applications
Integrate R analytics with applications
Revolution R Web Services: DeployR
[Diagram labels: R / Statistical Modeling Expert; Deployment Expert; Data Sources & creation of Analytics; Consumption of Analytics & Results; Data Analysis; Business Intelligence; Cloud / SaaS; Interactive Web Apps]
ParallelR: computing in the cloud
Parallel processing on local machine, cluster, or cloud
foreach replaces loops
Minimal code changes
Significant speedups
# 1000 simulations
# 8 running in parallel
library(foreach)
require("doNWS")
s
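The slide's own code is cut off above, so the following is a separate, minimal sketch of how a foreach loop stands in for an ordinary for loop. It assumes the foreach package is installed and registers the sequential backend (registerDoSEQ) so that it runs anywhere; a parallel backend, such as the doNWS package named on the slide, would be registered at that point instead. The function simulate_once is a hypothetical placeholder for one simulation.

```r
library(foreach)

# Sequential backend so the sketch runs without a cluster;
# a parallel backend would be registered here instead
registerDoSEQ()

# Hypothetical simulation: mean of 100 standard normal draws
simulate_once <- function(i) {
  set.seed(i)
  mean(rnorm(100))
}

# Ordinary loop
loop_result <- numeric(1000)
for (i in 1:1000) loop_result[i] <- simulate_once(i)

# foreach equivalent: same body, results combined into a vector
sims <- foreach(i = 1:1000, .combine = c) %dopar% simulate_once(i)
```

Only the loop construct changes; the body of the loop stays the same, which is what "minimal code changes" refers to.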
Revolution R: Ready for IT
Compliance: Repeatable build process which is documented and open to audit
Documentation: IQ and OQ documentation for streamlined deployments
Governance: Confidence in control over versioning and distribution
Support: Support from a world class engineering and development organization
Modern: R is a modern language and is object oriented, which means IT can use it too
Coming soon: Revolution R GUI
Accessible
Powerful
Extensible
Conclusion
Why R?
Available Talent: Embraced by the academic community, which is creating a lot of R talent
Open Knowledge: Large innovative community sharing their ideas and models (2M+ users)
Rapid Innovation: Its open source nature leads to rapid innovation
Modern Language: R is an object-oriented, modern programming language, which is IT friendly
Rich & Flexible: Most complete math library, which can be broken down at a granular level
Revolution R Enterprise: Revolution brings R to the Enterprise
Power: ScaleR, ParallelR, Intel MKL build
Productivity: Visual IDE, DeployR, GUI*
Enterprise Readiness: Support, Qualification, Services & Training, Control
*Coming in 2011
Revolution R Enterprise:
Addressing Key Challenges
RevoScaleR: Process and Analyze large data sets
Revolution R Enterprise: Advanced Analytics
R: The tool of choice for today's analyst
RevoDeployR: Disseminate Results
The leading commercial provider of software and support for the
popular open source R statistics language.
www.revolutionanalytics.com
(650) 330 0553
Twitter: @RevolutionR