©UFS R for Data Analysis and Data Mining Jianping Liu Mar 19, 2014


Page 1: R for Data Analysis and Data Mining

©UFS

R for Data Analysis and Data Mining

Jianping Liu

Mar 19, 2014

Outline

• R and RStudio installation

• Basics of R: data types and operators

• R for Statistical Analysis and Data Mining

What is R?

• “A language and environment for statistical computing and graphics”: a combination of statistical packages (for interactive statistical analysis) and a programming language

• A dialect of the S language, which was developed at AT&T Bell Laboratories by Rick Becker, John Chambers and Allan Wilks (S dates back to the 1970s; R itself emerged in the 1990s)

• Runs on multiple platforms and devices: MacOS, Windows, Linux, PC, iPhone …

• Frequent releases and bug fixes; active development

• Free and open source (released under the GNU General Public License)

Installation of R and Resources online

• http://www.r-project.org/ # R download & installation

• http://www.rseek.org/ # web-based R search

• http://www.rstudio.com/ # RStudio installation

• http://www.rdatamining.com/ # data mining examples

• http://www.ats.ucla.edu/stat/r/ # Stat analysis examples

• http://cran.r-project.org/doc/manuals/R-intro.html

• http://www.coursera.org # R Programming course, starting 4/7/2014

RStudio: an integrated development environment (IDE) for R

The uses of R

• R may be used as a calculator

• R provides numerical or graphical summaries of data

• R has extensive graphical abilities

• R can handle a variety of specific analyses

• R is an interactive programming language
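A short sketch of the first three uses listed above. It uses the built-in mtcars data set (our choice of example data, not from the slides), and it writes the plot to a temporary PDF file so the script also runs without a screen.

```r
# R as a calculator: the usual operator precedence applies
2 + 3 * 4          # 14
sqrt(16) / 2^2     # 1

# Numerical summaries of one variable from the built-in mtcars data set
mpg <- mtcars$mpg
summary(mpg)       # min, quartiles, median, mean, max
mean(mpg)
sd(mpg)

# Graphical summary: a histogram, written to a temporary PDF file
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
hist(mpg, main = "Miles per gallon (mtcars)", xlab = "mpg")
dev.off()
```

Calling plot functions without pdf()/dev.off() draws on the screen instead, which is the usual interactive workflow.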

• Software for Data Analysis: Programming with R (Statistics and Computing) by John M. Chambers (Springer)

• S Programming (Statistics and Computing) by William N. Venables and Brian D. Ripley (Springer)

useofR

Packages

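• install.packages("name of the package") # download and install from CRAN

• library(pkg) # load (attach) an installed package

• detach("package:pkg") # remove it from the search path

• update.packages() # update the installed packages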

Example:

install.packages("sos")

library(sos)

Alert: R is case sensitive
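A common pattern is to install a package only when it is missing and then load it. The helper below, ensure_package, is hypothetical (it is not part of base R); it relies on requireNamespace(), which returns FALSE instead of raising an error when a package is not installed. The last two lines illustrate the case-sensitivity alert.

```r
# Hypothetical helper (not part of base R): install a package only if it
# is missing, then attach it to the search path.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

ensure_package("stats")   # stats ships with R, so nothing is downloaded

# R is case sensitive: only the lowercase function name exists
exists("install.packages")   # TRUE
exists("Install.packages")   # FALSE
```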

Getting help and info

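• help(package = "sos") # documentation for a package

• ?'&&' # help on an operator (quote it)

• ??audit # search the installed help pages for "audit"

• help.search("time series") # the same kind of search, for a phrase

• library(sos)

• findFn("time series") # sos: search the help pages of contributed packages online

• example(data.frame) # run the examples from a help page

• demo(lm.glm, package = "stats", ask = TRUE) # run a demo script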

helpsearch.R
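Besides help() and ??, two base-R functions (not on the slide) help when you only half-remember a name or an argument list. A small sketch, using read.csv as an arbitrary example:

```r
# apropos() lists objects on the search path whose names match a
# regular expression; handy when you remember only part of a name
apropos("^read\\.csv")

# args() shows a function's arguments without opening the help page
args(read.csv)

# formals() returns the same arguments as a list, usable in code
names(formals(read.csv))[1:3]
```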

Data Types and Basic Operations

R has five commonly used “atomic” classes of objects (a sixth, raw, is rarely needed):

• Character

• Numeric (real numbers)

• Integer

• Complex

• Logical (TRUE/FALSE)

The most basic object is a vector:

• A vector contains objects of the same class: c()

• A list can contain objects of different classes: list()
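The sketch below checks the class of one value of each atomic type, then contrasts c() and list(). With c(), mixed inputs are silently coerced to a common class; a list keeps each element's own class.

```r
# One value of each of the five atomic classes
# (5L: the L suffix makes an integer literal; 1 + 2i is complex)
values <- list("a", 3.14, 5L, 1 + 2i, TRUE)
sapply(values, class)
# "character" "numeric" "integer" "complex" "logical"

# c() builds a vector; mixed inputs are coerced to a common class
v <- c(1, "a", TRUE)
class(v)   # "character": every element became a string
v          # "1" "a" "TRUE"

# list() keeps each element's own class
l <- list(1, "a", TRUE)
sapply(l, class)   # "numeric" "character" "logical"
```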

Data Types and Basic Operations

Matrices are vectors with a dimension attribute.

• The dimension attribute is itself an integer vector of length 2: (nrow, ncol)

• Matrices are filled column-wise by default; specify byrow = TRUE to fill them row-wise

Factors are used to represent categorical data.

• Factors can be unordered or ordered.

• One can think of a factor as an integer vector where each integer has a label.
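A short sketch of both ideas: a matrix really is a vector plus a dim attribute, and a factor is a set of integer codes with labels. The example values are arbitrary.

```r
# A matrix is a vector plus a dim attribute (nrow, ncol)
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)        # 2 3
m[1, ]        # 1 3 5: filled column by column

v <- 1:6
dim(v) <- c(2, 3)   # adding the attribute turns the vector into a matrix
identical(v, m)     # TRUE

# byrow = TRUE fills the matrix row by row instead
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_row[1, ]    # 1 2 3

# A factor: integer codes with labels (levels are sorted by default)
f <- factor(c("yes", "no", "yes", "yes"))
levels(f)       # "no" "yes"
as.integer(f)   # 2 1 2 2
table(f)

# An ordered factor supports comparisons between levels
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"), ordered = TRUE)
size[1] < size[2]   # TRUE: "small" < "large"
```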

Data Types and Basic Operations

Data frames are used to store tabular data

• They are fundamental to the use of R's modelling and graphics functions

• They are represented as a special type of list in which every element (column) must have the same length

• Unlike matrices, data frames can store a different class of object in each column (just like lists); in a matrix every element must be of the same class

• Data frames are usually created by calling read.table() or read.csv(), or directly with data.frame()

• They can be converted to a matrix by calling data.matrix()

datatypes
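A minimal sketch of the points above, reading a small made-up CSV from a string (read.csv() also accepts a file path) and converting the result with data.matrix():

```r
# Read a small CSV from a string; the data here are made up
csv_text <- "id,group,score
1,a,90.5
2,b,72
3,a,88"
df <- read.csv(text = csv_text)

str(df)            # one class per column
is.list(df)        # TRUE: a data frame is a special list...
lengths(df)        # ...whose columns all have the same length

# data.matrix() converts every column to numbers (character or factor
# columns become integer codes) and returns a numeric matrix
m <- data.matrix(df)
dim(m)             # 3 3
```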

R for Regression Analysis

http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf # Faraway, Practical Regression and Anova using R

logitRegression.R

• Regression analysis is the analysis of the relationship between a response or outcome variable and another set of variables

• The relationship is expressed through a statistical model equation that predicts a response variable (also called a dependent variable or criterion) from a function of explanatory variables (also called independent variables, predictors, factors, or carriers) and parameters
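The script logitRegression.R is not reproduced here; the sketch below is an independent illustration on the built-in mtcars data (our choice). It fits a linear model with lm() and a logistic (logit) model with glm(family = binomial), in both cases with car weight wt as the explanatory variable.

```r
# Linear regression: response mpg, explanatory variable wt (car weight)
fit_lm <- lm(mpg ~ wt, data = mtcars)
summary(fit_lm)       # coefficients, standard errors, R-squared
coef(fit_lm)

# Logistic regression: binary response am (0 = automatic, 1 = manual)
fit_logit <- glm(am ~ wt, family = binomial, data = mtcars)
summary(fit_logit)

# Predicted probability of a manual transmission for a 3000 lb car
# (wt is measured in units of 1000 lb)
predict(fit_logit, newdata = data.frame(wt = 3), type = "response")
```

The formula interface (response ~ predictors) is shared by both functions, so switching between model families mostly means changing the function and the family argument.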

R for Time series Analysis

• Introductory Time Series with R by P.S.P. Cowpertwait and A.V. Metcalfe (Springer)

• Time Series Analysis and Its Applications: With R Examples (3rd ed.) by R.H. Shumway and D.S. Stoffer. Springer Texts in Statistics, 2011 (package: astsa)

http://www.stat.pitt.edu/stoffer/tsa3/

http://elena.aut.ac.nz/~pcowpert/ts/#RScripts
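A minimal sketch of base-R time series tools on the built-in AirPassengers series. The seasonal ARIMA(0,1,1)(0,1,1)[12] "airline model" is a standard textbook choice for this series, used here only as an illustration; it is not taken from the slides or from the books above.

```r
# AirPassengers: monthly airline passengers 1949-1960, a built-in ts object
ap <- AirPassengers
frequency(ap)     # 12 observations per year
start(ap)         # 1949 1

# Creating your own monthly series with ts() (values are made up)
x <- ts(c(5, 7, 6, 8, 9, 11), start = c(2014, 1), frequency = 12)

# Classical decomposition of the log series into trend, seasonal, remainder
parts <- decompose(log(ap))

# Seasonal ARIMA(0,1,1)(0,1,1)[12] fitted to the log series,
# followed by a 12-month forecast
fit <- arima(log(ap), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
fc <- predict(fit, n.ahead = 12)
exp(fc$pred)      # forecasts back on the passenger scale
```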

R Reference Card

R_referencecard_2.0

R_referencecard_regression

R_referencecard_timeseries

R_referencecard_data_mining

Data Mining with Rattle

# to install package rattle and load the GUI

install.packages("rattle", dependencies = c("Depends", "Suggests"))
library(rattle)
rattle()

• Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R!) by Graham Williams

• http://www.r-project.org/doc/bib/R-books.html
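Rattle is a graphical front end built on standard R functions, so the same kinds of analysis can also be scripted directly. As a minimal sketch (our choice of method and data, not taken from the slides), here is k-means clustering with kmeans() from the stats package on the built-in iris measurements:

```r
set.seed(42)                      # k-means starts from random centres
features <- scale(iris[, 1:4])    # standardise the four measurements
km <- kmeans(features, centers = 3, nstart = 10)

km$size                           # number of flowers in each cluster

# Compare the clusters with the known species (not used for clustering)
table(cluster = km$cluster, species = iris$Species)
```

nstart = 10 runs the algorithm from ten random starts and keeps the best solution, which makes the result less sensitive to the initial centres.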

Drawbacks of R

• Little support for dynamic or interactive graphics

• Objects must generally be stored in physical memory

• Functionality depends on user demand and on user-contributed packages

• Not ideal for all situations
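The in-memory limitation is easy to observe with base R's object.size() and gc(). A small illustrative sketch, not from the slides:

```r
# Every R object lives in RAM; object.size() estimates its footprint
x <- numeric(1e6)          # one million doubles (8 bytes each)
print(object.size(x), units = "MB")

# gc() triggers a garbage collection and reports R's memory use
gc()
rm(x)                      # drop the reference so the memory can be reclaimed
invisible(gc())
```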

Thank you!