An introduction to Microsoft R Services
Microsoft R Open and Microsoft R Server
498 – Show and Tell Gregg Barrett
Introduction
This presentation will briefly cover the following:
- Why consider MRO and R Server
- R Server
- MRO
- Microsoft R Services/R Server Platform
- DistributedR
- RevoScaleR/ScaleR
- ConnectR
- DevelopR
- DeployR
- Resources
- References
Why consider MRO and R Server
- You keep the option of working with standard R and gain the added benefits of Microsoft R Open (MRO) and R Server
- Performance
- MRO is FREE
- R Server is FREE – well for students at least through DreamSpark
Why consider MRO and R Server
(Gartner, 2015)
Definition: Originally released in 1993, R is a mature, domain-specific and open-sourced language for statistical analysis workloads.
Trend Analysis: Gartner client inquiry levels for R remain light and range from exploratory to best-practice adopter themes; however, like MATLAB, the number of inquiries has increased substantially in recent years. External data sources reflect a growth in R usage across the industry as well. We expect inquiry levels to increase consistently through 2017.
Time to Next Market Phase: 2 to 5 years
Business Impact: The significant impact of "big data" analytics and real-time data analysis is driving demand for languages such as R and MATLAB beyond previous entrenched market niches and into increasingly mainstream programming workloads. In particular, adopters are turning to R as a free alternative to platforms such as SAS and SPSS.
User Advice: Consider R as a free and open-source solution for workloads that require advanced statistical computing or data mining capabilities with minimal coding and optimal maintenance costs over more general-purpose languages.
Sample Vendors: Microsoft, Oracle, TIBCO Software, IBM, Wolfram Research (Gartner, 2015)
Why consider MRO and R Server
(Microsoft, 2016)
R Server
- Revolution R Enterprise (RRE) was developed by Revolution Analytics
- RRE is intended to offer a fast, cost effective enterprise-class big data analytics platform
- Revolution Analytics was acquired by Microsoft
- RRE is now Microsoft R Server
- R Server is free for students and can be obtained through DreamSpark
- Log on or create a profile at DreamSpark using your university credentials: https://www.dreamspark.com/Product/Product.aspx?productid=105
- RRE uses an R engine called Revolution R Open
- The Revolution R Open engine is now called Microsoft R Open (MRO)
- MRO is intended to be an enhanced distribution of open source R from Microsoft Corporation. Specifically, Microsoft R Open uses high-performance, multi-threaded math libraries to deliver performance boosts. This means that R functions that rely on, for example, matrix multiplication will run faster out of the box.
- Just like R, Microsoft R Open is open source and free
- You can download MRO here: https://mran.revolutionanalytics.com/download/
- MRO is intended to support a variety of big data statistics, predictive modelling, and machine learning capabilities
- At the time of this writing the latest version of MRO is version 3.2.5
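The matrix multiplication claim above can be checked with a small timing sketch. It uses only base R, so it runs on both standard R and MRO; the matrix size of 500 is an arbitrary choice for illustration, and the elapsed time will depend on your machine and on which math libraries your R distribution is linked against.

```r
# Time a dense matrix multiplication. The size n is an arbitrary choice.
set.seed(42)
n <- 500
a <- matrix(rnorm(n * n), nrow = n)
b <- matrix(rnorm(n * n), nrow = n)

# system.time() reports the elapsed wall-clock seconds for the expression.
timing <- system.time(product <- a %*% b)
print(timing[["elapsed"]])

# R.version.string identifies the R build that produced the timing.
print(R.version.string)
```

Running the same script under standard R and under MRO on the same machine gives a rough, informal comparison; it is not a rigorous benchmark.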
MRO
- It is important to note that R Server uses a different version of MRO
- At the time of this writing the latest version of MRO for R Server is version 3.2.2
- MRO for R Server can be found here: https://mran.revolutionanalytics.com/download/mro-for-mrs/
- MRO for R Server is a prerequisite for R Server
- After downloading and installing MRO, whether or not it is the version for R Server, download and install MKL
- MKL is the Intel Math Kernel Library
- Important: Install Microsoft R Open before installing MKL
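Once both installs are done, a quick check from the R console can confirm that MKL is being picked up. The sketch below assumes the RevoUtilsMath package and its getMKLthreads() function, which accompany MRO when MKL is installed; neither is named in this presentation, so treat them as an assumption and consult the MRO documentation for your version.

```r
# RevoUtilsMath is assumed to be present only when MRO and MKL are installed.
# On other R distributions the package is absent and the sketch says so.
has_mkl <- requireNamespace("RevoUtilsMath", quietly = TRUE)

if (has_mkl) {
  # Report how many threads MKL will use for math operations.
  cat("MKL threads:", RevoUtilsMath::getMKLthreads(), "\n")
} else {
  cat("RevoUtilsMath not found; MKL does not appear to be installed.\n")
}
```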
Microsoft R Services/R Server Platform
Note: Following the Microsoft acquisition, product names have changed and the “Revo” designation is being dropped, which can make the documentation a little harder to follow.
Microsoft R Services is positioned as R for the Enterprise.
The feature set provided by the Microsoft R Services software can be categorized as follows:
- Microsoft R Open: High performance math libraries installed on top of a stable version of Open Source R
- DistributedR: Parallel and distributed computing framework for Big Data Analytics
- RevoScaleR/ScaleR: High performance, scalable, parallelized and distributable Big Data Analytics in R
- ConnectR: Data connections for Big Data Analytics
- DevelopR: An integrated development environment (IDE) for R on Windows
- DeployR: A web services software development kit for integrating R with third party products (including business intelligence, data visualization, rules engines, etc.)
DistributedR
DistributedR allows you to run the same R script on multiple platforms; you can create a model in one environment such as a workstation and then deploy it on a different environment such as an on-site Microsoft SQL Server, a Teradata platform, or a Hadoop cluster in the cloud. You just need to specify the information about where these computations should be performed and what data should be analyzed.
For information on supported computing environments, look for the “compute contexts” in the RevoScaleR package.
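As a minimal sketch of the compute context idea, the script below sets the local context and fits a model with a RevoScaleR function. It assumes R Server (and therefore the RevoScaleR package) is installed; the built-in mtcars data set and the model formula are illustrative choices, not part of the original presentation.

```r
# RevoScaleR ships with R Server and is not on CRAN, so check for it first.
have_scaler <- requireNamespace("RevoScaleR", quietly = TRUE)

if (have_scaler) {
  library(RevoScaleR)

  # "local" runs the rx functions on the current machine.
  rxSetComputeContext("local")
  local_fit <- rxLinMod(mpg ~ wt, data = mtcars)
  print(local_fit)

  # To run the same analysis elsewhere, the compute context and the data
  # source change, for example an RxHadoopMR() or RxInSqlServer() object
  # built with the details of your own cluster or database (not shown).
  print(rxGetComputeContext())
} else {
  message("RevoScaleR is not installed; compute contexts are unavailable.")
}
```

The point of the design is that the analysis call itself stays the same; only the description of where to compute and what data to use is swapped.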
RevoScaleR
RevoScaleR/ScaleR package provides efficient, scalable computational power and allows for the development of ready-to-deploy suites of data processing and analytics with R.
To learn more, look for the RevoScaleR “rx” analysis and data manipulation functions and “rxExec” for HPC functionality. If you are computing decision trees, also check out the included RevoTreeView package that allows you to interactively visualize your decision trees.
Or open the package help from the R console: ?RevoScaleR
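To give a feel for the “rx” functions and rxExec, here is a hedged sketch that fits a linear model and runs a small simulation several times. It assumes RevoScaleR is installed; where it is not, the script falls back to the base R equivalents lm() and lapply() so it still runs. The data set, formula, and the helper simulate_mean() are illustrative assumptions.

```r
have_scaler <- requireNamespace("RevoScaleR", quietly = TRUE)

# One simulation task (hypothetical helper): the mean of 1,000 normal draws.
simulate_mean <- function() mean(rnorm(1000))

if (have_scaler) {
  library(RevoScaleR)
  # rxLinMod is the RevoScaleR counterpart of lm().
  fit <- rxLinMod(mpg ~ wt + hp, data = mtcars)
  # rxExec runs the function repeatedly in the current compute context.
  sims <- rxExec(simulate_mean, timesToRun = 4)
} else {
  # Base R fallback so the sketch runs without R Server.
  fit <- lm(mpg ~ wt + hp, data = mtcars)
  sims <- lapply(1:4, function(i) simulate_mean())
}

print(fit)
print(length(sims))
```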
ConnectR
The RevoScaleR package provides a way for you to connect with the data you may have stored in a variety of formats: SAS, SPSS, Teradata, ODBC, delimited and fixed format text, and Hadoop Distributed File System (HDFS) text files. You have a choice of:
1. keeping the data as is and analyzing it directly with RevoScaleR analysis functions,
2. extracting the data you want to analyze and storing it in the efficient and higher performance .xdf file format provided with the RevoScaleR package, or
3. bringing some or all of your data into memory as an R data frame to use with any R analysis function.
To learn more, look for data sources in the RevoScaleR package.
Note: The RevoScaleR package is included with every distribution of RRE/R Server, and is automatically loaded into memory when you start the program. So all of the “rx” functions mentioned are at your fingertips.
You can get information on them by using the ? at the command line, for example: ?rxLinMod
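The three data-access choices described for RevoScaleR data sources can be sketched as follows. This is an illustration under stated assumptions: it requires RevoScaleR (installed with R Server), it writes the built-in mtcars data to a temporary CSV file to stand in for your own delimited text data, and it falls back to read.csv() when RevoScaleR is missing.

```r
# Create a small delimited text file to act as the external data.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  library(RevoScaleR)
  text_source <- RxTextData(file = csv_path)

  # Choice 1: analyze the text file in place.
  print(rxSummary(~ mpg + wt, data = text_source))

  # Choice 2: import into an .xdf file, then analyze that file.
  xdf_path <- tempfile(fileext = ".xdf")
  rxImport(inData = text_source, outFile = xdf_path, overwrite = TRUE)
  print(rxSummary(~ mpg + wt, data = RxXdfData(xdf_path)))

  # Choice 3: with no outFile, rxImport returns an in-memory data frame.
  cars_df <- rxImport(inData = text_source)
} else {
  # Base R fallback so the sketch runs without R Server.
  cars_df <- read.csv(csv_path)
}

# Once in memory, the data works with any R analysis function.
print(coef(lm(mpg ~ wt, data = cars_df)))
```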
DevelopR
Microsoft R Services provides the R Productivity Environment (RPE), a tool that lets the R developer efficiently create sets of R scripts.
Working on a Windows workstation with the RPE, the R developer has a full-featured Visual Studio-like integrated development environment for R, including an indispensable visual debugger for R. The RPE has a customizable workspace, including an enhanced Script Editor, an Object Browser, a Solution Explorer, and an R Command Console.
DeployR
The optional DeployR package provides the tools for integrating R with third party products; it is a full-featured web services software development kit for R that allows programmers to use Java, JavaScript or .NET to integrate R analysis output with a third party package.
There are now Accelerators for DeployR which are starter kits for integrating with tools including:
- Microsoft Excel
- Tableau
- Jaspersoft
- QlikView
R Server User Interface
Resources
R Services 2016 Getting Started Guide:
https://packages.revolutionanalytics.com/doc/8.0.0/win/MicrosoftRServices_Getting_Started.pdf
Webinar “Using Microsoft R Server to Address Scalability Issues in R”: https://channel9.msdn.com/blogs/Cloud-and-Enterprise-Premium/Using-Microsoft-R-Server-to-Address-Scalability-Issues-in-R
Task Views are guides on CRAN that group sets of R packages and functions by type of analysis, fields, or methodologies. You can browse and find packages organized by task view:
https://mran.microsoft.com/taskview/
Resources
Software available to NU students:
http://www.it.northwestern.edu/software/
https://northwestern.onthehub.com/WebStore/Welcome.aspx
https://www.dreamspark.com/Student/Software-Catalog.aspx
References
Gartner. (2015). IT Market Clock for Programming Languages, 2015. [Diagram]. Retrieved from https://www.gartner.com/doc/3145117/it-market-clock-programming-languages
Gartner. (2015). IT Market Clock for Programming Languages, 2015. [pdf]. Retrieved from https://www.gartner.com/doc/3145117/it-market-clock-programming-languages
Microsoft. (2016). The Benefits of Multithreaded Performance with Microsoft R Open. [webpage]. Retrieved from https://mran.microsoft.com/documents/rro/multithread/
Microsoft. (2016). R Services 2016 Getting Started Guide. [pdf]. Retrieved from https://packages.revolutionanalytics.com/doc/8.0.0/win/MicrosoftRServices_Getting_Started.pdf