R Language for Business Analytics Kamakshaiah Musunuru Dhruva College of Management, Hyderabad September 1, 2013
Page 1: Introtor

R Language for Business Analytics

Kamakshaiah Musunuru

Dhruva College of Management, Hyderabad

September 1, 2013

Page 2: Introtor

Objectives

Pre-history

About R

How to obtain R

Introduction

Speciality

Muenchen’s Survey

O’Connor’s Comparison

Help

Page 3: Introtor

Objectives

- To know about R

- To ascertain the characteristics of R

- To compare R with proprietary alternatives

- To evaluate R with the help of an example

Page 4: Introtor

- The predecessor of R is S.

- S was developed by John Chambers (earlier versions) along with Rick Becker and Allan Wilks of Bell Laboratories.

- The project was started in May 1976.

- In 1979, S was ported to UNIX.

- S-Plus and R are by-products of S. [1]

- S was available for academic and commercial purposes from AT&T Laboratories.

[1] Ironically, R ranks among the top 26 programming languages, whereas S and S-Plus appear only within the top 100.

Page 5: Introtor

- R began as a research project by Ross Ihaka and Robert Gentleman at the University of Auckland in the 1990s.

- R is a programming language meant for statistical computing.

- R is open-source software supported by volunteers all around the world, but central control is in the hands of a group called R-core.

- The base system provides:
  - an interactive language for numerical computing
  - data management
  - graphics
  - a variety of related calculations
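
As a rough illustration of the capabilities listed above (numerical computing, data management, graphics), here is a minimal sketch using only base R. The figures and the names `units`, `sales`, `busy_months` and `plot_file` are made up for this example; they do not come from the original slides.

```r
# Hypothetical data: units sold in five months (illustrative values only)
units <- c(12, 15, 9, 20, 14)

# Interactive numerical computing: summary statistics
avg_units <- mean(units)
sd_units <- sd(units)

# Data management: build a data frame and keep the rows above 12 units
sales <- data.frame(month = month.abb[1:5], units = units,
                    stringsAsFactors = FALSE)
busy_months <- subset(sales, units > 12)

# Graphics: draw a bar chart into a temporary PDF file ("hardcopy")
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
barplot(sales$units, names.arg = sales$month, main = "Units sold per month")
invisible(dev.off())

print(avg_units)
print(busy_months)
```

At an interactive prompt the `pdf()` and `dev.off()` calls can be dropped, in which case the chart should appear in a graphics window instead of a file.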

Page 6: Introtor

Please visit: http://r-project.org

R is available for Windows, Linux and MacOS.

Page 7: Introtor

Some important websites for R:

- CRAN

- Bioconductor

- Omegahat

- R-Forge

Page 8: Introtor

Introduction to R

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. Among other things it has

- an effective data handling and storage facility,

- a suite of operators for calculations on arrays, in particular matrices,

- a large, coherent, integrated collection of intermediate tools for data analysis,

- graphical facilities for data analysis and display either directly at the computer or on hardcopy, and

- a well developed, simple and effective programming language (called ’S’) which includes conditionals, loops, user defined recursive functions and input and output facilities. (Indeed most of the system supplied functions are themselves written in the S language.)
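
The language features named in the list above (matrix operators, conditionals, loops, user-defined recursive functions) can be sketched in a few lines of base R. The names `A`, `labels` and `factorial_rec` are hypothetical and chosen only for this illustration.

```r
# Operators on arrays and matrices
A <- matrix(c(2, 0, 1, 3), nrow = 2)  # values are filled column by column
I2 <- diag(2)                         # 2 x 2 identity matrix
product <- A %*% I2                   # matrix product; equals A
A_inv <- solve(A)                     # matrix inverse

# Conditionals and loops: label the numbers 1 to 4 as odd or even
labels <- character(0)
for (i in 1:4) {
  if (i %% 2 == 0) {
    labels <- c(labels, "even")
  } else {
    labels <- c(labels, "odd")
  }
}

# A user-defined recursive function
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# Output
print(product)
print(labels)
cat("5! =", factorial_rec(5), "\n")
```

Base R already ships `factorial()`; the recursive version is written out here only to show how a user-defined function can call itself.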

Page 9: Introtor

Introduction - Continued

- R is a GNU project, similar to the S language and environment developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. (Please visit http://www.r-project.org/)

- R provides a wide variety of statistical and graphical techniques, and is highly extensible. Some of them are:
  - linear modelling
  - non-linear modelling
  - classical statistical tests
  - time-series analysis
  - classification and clustering
  - neural networks
  - social network analysis
  - linear programming, integer programming, etc.
  - and many more
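
To make a few of the techniques listed above concrete, here is a small sketch of linear modelling, a classical test and clustering. It assumes only the data sets `cars`, `sleep` and `iris` that ship with R; the object names `fit`, `tt` and `km` are arbitrary, and the choice of three clusters is an assumption made for the example.

```r
# Linear modelling: stopping distance as a function of speed
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))

# Classical statistical test: Welch two-sample t-test on the sleep data
tt <- t.test(extra ~ group, data = sleep)
print(tt$p.value)

# Clustering: k-means on the four numeric columns of iris
set.seed(42)  # k-means starts from random centres; fix the seed for repeatability
km <- kmeans(iris[, 1:4], centers = 3)
print(table(km$cluster, iris$Species))
```

`summary(fit)` would print the usual regression table; it is left out here to keep the sketch short.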

Page 10: Introtor

Introduction - More...

- R is the lingua franca of statistical research

- Overall, SAS is 11 years behind R (William Revelle)

- Most importantly, R is not only free but also open source, which means much more

- R is available under the GNU copyleft licence (GPL)

- The recent R version 2.15.3 ("Security Blanket") was released on 2013-03-01

Page 11: Introtor

Speciality of R

- Tal Galili (from http://www.kdnuggets.com/) asserts that:
  - R has the largest number of email discussions
  - The number of R packages published on CRAN continues to grow (more than for Stata and SAS)
  - R has more blogs (approx. 170); second to R is SAS (only 31 blogs)
  - Even in terms of job opportunities R might not be worse:
    - 41 percent SAS
    - 15 percent SPSS
    - 14 percent R

Page 12: Introtor

Muenchen Survey - CRAN growth

Fig. 1: CRAN growth

Page 13: Introtor

Introduction - Speciality of R

- R. A. Muenchen (from http://r4stats.com/articles/popularity/) observes that:
  - R accounts for a larger number of downloads (though downloads are difficult to count)
  - TIOBE (http://www.tiobe.com, programming community index) ranked R as the 24th most popular programming language (SPSS did not make the list)
  - The Transparent Language Popularity Index (TLPI) ranked R 12th among languages, and SAS 26th
  - R is the most discussed package in online discussions:
    - Mean monthly email discussions for R are more than 3000
    - Mean monthly email discussions for Stata are more than 1000
    - Mean monthly email discussions for SAS are less than 1000
    - Mean monthly email discussions for SPSS are less than 500

- The assumption is that what you want is what you talk about

Page 14: Introtor

Introduction - Downloads

Illustration 1: Most important package downloads - Bioconductor

Page 15: Introtor

Introduction - Downloads

Fig. 2: Downloads of Bioconductor package

Page 16: Introtor

Introduction - What Muenchen Said?

His book, “R for SAS and SPSS Users”, is a great work for data miners and analysts. He studied the popularity of data analysis software with respect to certain factors (https://sites.google.com/site/r4statistics/popularity):

- Sales and downloads
- Language popularity measures
- Internet discussions
- Competition
- Usage
- Literature (books)
- Impact on scholarly activity
- Website popularity
- Growth in popularity
- IT research firms
- Job markets

Page 17: Introtor

Muenchen Survey - Number of Users

Fig. 3: Number of Users and Analytics

Page 18: Introtor

Muenchen Survey - Email Discussions

Fig. 4: Traffic on Email Discussions

Page 19: Introtor

Muenchen Survey - Scholar Hits

Fig. 5: Scholar hits on software

Page 20: Introtor

Muenchen Survey - Job Market

Fig. 6: Jobs for analytics software on Indeed.com

Page 21: Introtor

Introduction - Comparison

According to Brendan O’Connor (artificial intelligence expert and social science researcher):

- There are two big divisions of solutions:
  - programming-oriented solutions like R, Matlab, Python
  - analytic solutions like Excel, Stata, and SPSS

- Python is “immature”

- Matlab is certainly “weak”, but might be better for mathematical algorithms

- SPSS and Stata are equal in capabilities; Stata might be much cheaper than SPSS

- These two are for those who crave easy ways and short-cuts

- SAS is favoured by an older crowd

- SAS people complain that the graphical outputs are poor

- Matlab visualization, too, is somewhat controversial compared to R

So, why not try R!

Page 22: Introtor

Introduction - O’Connor’s Comparison

Name                     Advantages                              Disadvantages
R                        Library support; visualization          Steep learning curve
Matlab                   Elegant matrix support; visualization   Expensive
SciPy/NumPy/Matplotlib   Python                                  Immature
Excel                    Easy; visual; flexible                  Large datasets
SAS                      Large datasets                          Expensive; outdated programming language
Stata                    Easy statistical analysis
SPSS                     Like Stata but more expensive and worse

Illustration 2: Comparison 1

Page 23: Introtor

Introduction - O’Connor’s Comparison - Continued

Name                     Open Source   Typical Users
R                        Yes           Finance and Statistics
Matlab                   No            Engineering
SciPy/NumPy/Matplotlib   Yes           Engineering
Excel                    No            Business
SAS                      No            Business; Government
Stata                    No            Science
SPSS                     No            Business; Academics

Illustration 3: Comparison 2

Page 24: Introtor

Last but not least...

- Ista Zahn [9] says: “I am the only person in my department who uses LaTeX and R. Because Sweave simply provides a way to integrate these two programs, it follows that I am the only Sweave user as well. Why have I taken the time and effort to learn these programs instead of following the crowd and sticking with Word and SPSS? Quite simply, I made the switch because using LaTeX and R is actually easier. It took me some time to become familiar with these programs, but after using them for a couple of months I am firmly convinced that I am more productive with these programs than I ever was with Word and SPSS.”

[9] Zahn, I. (2008). Learning to Sweave in APA Style. The PracTeX Journal, No. 1.

Page 25: Introtor

R Help

- Manuals

- Help files

- help.search (?help.search)

- The wikis

- The mailing lists

- Journals
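
The built-in help facilities from the list above can be tried directly from the R prompt. The sketch below uses only functions from base R; the search phrase and the pattern are arbitrary examples, and the calls that open a help page are wrapped in `interactive()` on the assumption that the script might also be run in batch mode.

```r
# Calls that open a help page or browser: only run at an interactive prompt
if (interactive()) {
  help("lm")      # help file for a function; ?lm is the short form
  ?help.search    # help on the search function itself
  help.start()    # browse the manuals in a web browser
}

# Search the installed documentation for a phrase (here limited to 'stats')
hits <- help.search("linear models", package = "stats")
print(class(hits))

# Find objects on the search path whose names match a pattern
print(apropos("^read\\.csv"))
```

`help.search()` looks through installed help files, while `apropos()` only matches object names, so the two tend to complement each other.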

Page 26: Introtor

Thanks for Your Patient Listening

Any Questions?