Dec 24, 2015
Source Code: Tons of Code
Package: More Code, Statistical Functions, Datasets
Workspace: Fewer Lines of Code, Capability
http://www.statmethods.net/management/functions.html
Currently, how many R Packages?
At the command line, enter dim(available.packages()) to see the count (the first number returned is the number of packages), or available.packages() to list them.
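As a minimal sketch of the same idea: the installed-package count works offline, while the CRAN count needs network access. The helper name count_cran_packages and the mirror URL are assumptions made for illustration, not part of the original slides.

```r
# Count the packages installed in the local R library (no network needed).
n_installed <- nrow(installed.packages())
cat("Packages installed locally:", n_installed, "\n")

# Hypothetical helper: count the packages a CRAN mirror currently offers.
# The default mirror URL is an assumption; any CRAN mirror should work.
count_cran_packages <- function(repos = "https://cloud.r-project.org") {
  # available.packages() returns a matrix with one row per package.
  nrow(available.packages(repos = repos))
}
```

When online, calling count_cran_packages() returns the same number as the first element of dim(available.packages()) for that mirror.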
Is there an R App Store?
Two heavyweights in the statistical software market are SAS and SPSS/IBM
R packages have been created that provide functionality equivalent to that of SAS and SPSS
XLConnect
XML
rhbase
sas7bdat
Rcpp
Packages for reading and writing various file formats
RJSONIO
Hmisc
RODBC / ROracle
foreign
RMySQL
RWeka
Comma-Separated Values (CSV)
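For the simplest of these formats, CSV, no add-on package is needed. The following is a small sketch of a write/read round trip using only base R; the data frame contents and the temporary file are made up for illustration.

```r
# A small made-up data frame to write out and read back.
scores <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

# Write to a temporary CSV file, omitting the row-name column.
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Read the file back into a new data frame.
scores_back <- read.csv(csv_path)
print(scores_back)

# Clean up the temporary file.
unlink(csv_path)
```

The packages listed above follow the same read/write pattern for their own formats, each with its own function names.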
Oracle R Enterprise (ORE)
R Being Integrated Into Other Data-Related Products
http://help.sap.com/hana/hana_dev_r_emb_en.pdf
https://blogs.oracle.com/R/
http://www-142.ibm.com/software/products/us/en/spss-stats-developer/
“Both R and SAS are here to stay, and finding ways to make them work better with each other is in the best interests of our customers.”
http://support.sas.com/rnd/app/studio/Rinterface2.html
R “Machine Learning” Libraries
Analytic Technique | R Package/Library | Author | Organization
Support Vector Machines | libsvm (ksvm) | Chih-Chung Chang, Chih-Jen Lin | National Taiwan Univ. + eBay Research Labs
Neural Networks | neuralnet | Frauke Gunther, Stefan Fritsch | Epidemiology and Prevention Research
Neural Networks | nnet | Brian Ripley | University of Oxford
Neural Networks | monmlp | Alex J. Cannon | Atmospheric Science
Random Forests | randomForest | Fortran original by Leo Breiman & Adele Cutler; R port by Andy Liaw and Matthew Wiener | Merck
Decision Trees | rpart | Terry M. Therneau and Beth Atkinson; R port by Brian Ripley | Mayo Clinic; University of Oxford
Boosting | ada | Mark Culp | West Virginia University
Maximum Entropy | maxent | Yoshimasa Tsuruoka, Timothy Jurka | University of Tokyo; UC-Davis
Bagging, Bootstrap | adabag | Esteban Alfaro-Cortes | La Universidad de Castilla-La Mancha
Latent Dirichlet | slda | Jonathan Chang | Facebook
Naïve Bayes | e1071 | David Meyer, Evgenia Dimitriadou | Vienna University
Bayesian Network | bnlearn | Marco Scutari | UCL Genetics Institute
Hidden Markov | hiddenmarkov | David Harte | Statistics Research
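To give a feel for how one of these libraries is used, here is a hedged sketch that fits a decision tree with rpart on the built-in iris data. It assumes rpart is installed (it ships with most R distributions as a recommended package); the choice of dataset and the training-set accuracy check are illustrative only.

```r
library(rpart)  # assumed to be installed; a recommended package in most R setups

# Fit a classification tree predicting species from the four measurements.
fit <- rpart(Species ~ ., data = iris, method = "class")

# Predict classes for the training data and compare with the true labels.
pred <- predict(fit, newdata = iris, type = "class")
accuracy <- mean(pred == iris$Species)

# Confusion table of predicted versus actual species.
print(table(Predicted = pred, Actual = iris$Species))
cat("Training accuracy:", round(accuracy, 3), "\n")
```

Accuracy measured on the training data is optimistic; a held-out test set would give a fairer estimate.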
Industry | Pct.
Research | 24%
Higher Education | 7%
Information Technology | 9%
Computer Software | 7%
Financial Services | 6%
Banking | 2%
Pharmaceuticals | 4%
Biotechnology | 4%
Market Research | 3%
Management Consulting | 3%
Total | 69%
Hadley Wickham
Asst. Professor of Statistics at Rice University
ggplot2, plyr, reshape, rggobi, profr
Industries / Organizations Creating and Using R
Rank | Package | Title | Downloads
1 | plyr | Tools for splitting, applying and combining data | 84049
2 | digest | Create cryptographic hash digests of R objects | 83192
3 | ggplot2 | An implementation of the Grammar of Graphics | 82768
4 | colorspace | Color Space Manipulation | 81901
5 | stringr | Make it easier to work with strings | 77658
6 | RColorBrewer | ColorBrewer palettes | 66783
7 | reshape2 | Flexibly reshape data: a reboot of the reshape package | 64911
8 | zoo | S3 Infrastructure for Regular and Irregular Time Series | 60844
9 | proto | Prototype object-based programming | 59043
10 | scales | Scale functions for graphics | 58369
11 | car | Companion to Applied Regression | 57453
12 | dichromat | Color Schemes for Dichromats | 56624
13 | gtable | Arrange grobs in tables | 54431
14 | munsell | Munsell colour system | 53183
15 | labeling | Axis Labeling | 51877
16 | Hmisc | Harrell Miscellaneous | 47836
17 | rJava | Low-level R to Java interface | 47731
18 | mvtnorm | Multivariate Normal and t Distributions | 46884
19 | bitops | Bitwise Operations | 45689
20 | rgl | 3D visualization device system (OpenGL) | 41001
http://www.r-statistics.com/2013/06/top-100-r-packages-for-2013-jan-may/
Top 100 R packages for 2013 (Jan-May)
Specialized “Domain”
Beginner, Some Coverage
stats, graphics (both built-in)
Data Management: plyr, reshape
Graphics: ggplot2
Bayesian, DifferentialEquations, Econometrics, Environmetrics, ExperimentalDesign, Finance, Genetics, HighPerformanceComputing, MachineLearning, MedicalImaging, NaturalLanguageProcessing, Pharmacokinetics, Phylogenetics, Psychometrics, SocialSciences, Spatial, TimeSeries
Easy to Use
Interactive
Standard Visualizations
Steep Learning Curve
Visualization and Reporting
The R Graphics Package
Graphing Parameters
Titles, X-Axis Title, Y-Axis Title, Legend, Scales, Color, Gridlines
library(help="graphics")
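The graphing parameters listed above can be sketched with a single base-graphics scatterplot. The built-in mtcars dataset, the colors, and the axis limits are choices made for illustration; pdf(NULL) is used only so the sketch runs without opening a plot window.

```r
# Draw to a null PDF device so no window or file is created.
pdf(NULL)

# Color: one color per point, by transmission type (am: 1 = manual, 0 = automatic).
point_cols <- ifelse(mtcars$am == 1, "steelblue", "tomato")

plot(mtcars$wt, mtcars$mpg,
     main = "Fuel Economy vs. Weight",   # title
     xlab = "Weight (1000 lbs)",         # x-axis title
     ylab = "Miles per Gallon",          # y-axis title
     xlim = c(1, 6), ylim = c(10, 35),   # scales
     col = point_cols, pch = 19)

grid()                                   # gridlines

legend("topright",                       # legend
       legend = c("Manual", "Automatic"),
       col = c("steelblue", "tomato"), pch = 19)

invisible(dev.off())
```

In an interactive session, dropping the pdf(NULL) and dev.off() lines shows the plot on screen.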
Basic Chart Types
In ggplot2 a plot is made up of layers.
ggplot2
Plot
Grammar of Graphics
Layer
- Data
- Mapping
- Geom
- Stat
- Position
Scale
Coord
Facet
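The components above map fairly directly onto ggplot2 code. The sketch below assumes ggplot2 is installed and spells out the layer explicitly with layer() rather than the more common geom_point() shortcut; the mtcars dataset and the axis label are illustrative choices.

```r
library(ggplot2)  # assumed to be installed

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +          # data and mapping
  layer(geom = "point",                              # geom
        stat = "identity",                           # stat
        position = "identity") +                     # position
  scale_x_continuous(name = "Weight (1000 lbs)") +   # scale
  coord_cartesian() +                                # coord
  facet_wrap(~ cyl)                                  # facet: one panel per cylinder count

# Printing the object renders the plot on the active graphics device:
# print(p)
```

In everyday use, geom_point() creates the same kind of layer with the stat and position defaults filled in.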
Correlation Matrix: library(car); scatterplotMatrix(h)
The scatterplotMatrix() function in the car package is built on top of the pairs() function in base graphics
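A comparable view is available without car. This sketch uses only base R to compute a correlation matrix and draw the matching scatterplot matrix with pairs(); the three mtcars columns are an arbitrary choice for illustration.

```r
# Pick a few numeric columns from the built-in mtcars dataset.
vars <- mtcars[, c("mpg", "wt", "hp")]

# Pearson correlation matrix of the selected columns.
cor_matrix <- cor(vars)
print(round(cor_matrix, 2))

# Scatterplot matrix of the same columns, drawn to a null device
# so the sketch runs without opening a plot window.
pdf(NULL)
pairs(vars, main = "Scatterplot Matrix")
invisible(dev.off())
```

With car installed, scatterplotMatrix(vars) would produce a richer version of the same plot.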
The next data visual was produced with about 150 lines of R code
http://shiny.rstudio.com/gallery/movie-explorer.html
• http://statmethods.net/ : good documentation and sample code
• http://stackoverflow.com/ : helpful for troubleshooting code
• http://www.r-bloggers.com/ : helpful for hearing about new things
Additional Resources