Introduction to R nitesh chhabria
Jul 19, 2015
The R environment
R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
It is also open source - "cheers"
How to get and use R?
http://r-project.org/ is the place where you can get all the core packages.
How to start using it?
— Terminal :
> R   # Type this command at the shell prompt
— GUI :
Find the installed R application and double-click it
Some basic commands
source("file.r")      Executes the commands stored in file.r
sink("record.lis")    Diverts all subsequent output to the file record.lis
ls()                  Displays the names of the objects currently stored in R
rm(ob)                Removes the object ob from memory
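The four commands above can be tried together in one short session. This is only a sketch: the temporary file names and the object ob are made up for illustration, and local = TRUE is passed to source() so that ob is created in the environment the code runs in.

```r
# Write a tiny script to a temporary file, then run it with source().
script <- tempfile(fileext = ".r")
writeLines("ob <- 1:3", script)
source(script, local = TRUE)   # creates the object ob

# Divert printed output to a file, then restore the console.
record <- tempfile(fileext = ".lis")
sink(record)
print(ob)
sink()                         # sink() with no arguments stops the diversion

captured <- readLines(record)  # the diverted output, one string per line
had_ob <- "ob" %in% ls()       # ob is listed among the stored objects
rm(ob)                         # remove it from memory again
```

After rm(ob), a call to ls() no longer lists ob.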
Vectors
R works on data structures. The vector is the simplest of them.
> vec <- c(1, 2, 3, 4, 5)   # <- is the assignment operator; c() is the function used for creating vectors
> vec
[1] 1 2 3 4 5
> vec + 1   # Can you guess what the output will be?
[1] 2 3 4 5 6
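Adding 1 to every element works because arithmetic in R is applied element by element. A small sketch of the same idea with other operators (the variable names are only for illustration):

```r
vec <- c(1, 2, 3, 4, 5)

doubled  <- vec * 2          # 2 4 6 8 10
squares  <- vec^2            # 1 4 9 16 25
pair_sum <- vec + rev(vec)   # 6 6 6 6 6 (two vectors are added position by position)
total    <- sum(vec)         # 15
```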
Generating sequences
> vec1 <- 1:10                    # Generates a vector with the elements 1 to 10
> vec2 <- seq(-5, 5, by = 0.2)    # Generates a vector from -5 to 5 in steps of 0.2
> vec3 <- rep(vec1, times = 10)   # Generates 10 copies of vec1, one after another
> temp <- vec > 3                 # Checks the condition for every element of vec and returns a logical vector
> vec4 <- c("hello", "there")     # Creates a vector of strings
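To see what these commands produce, it helps to look at the lengths of the results and to use the logical vector as an index. A minimal sketch, reusing vec from the Vectors slide:

```r
vec  <- c(1, 2, 3, 4, 5)
vec2 <- seq(-5, 5, by = 0.2)
vec3 <- rep(1:10, times = 10)

n2 <- length(vec2)      # 51 values: -5.0, -4.8, ..., 5.0
n3 <- length(vec3)      # 100 values: ten copies of 1:10

temp     <- vec > 3     # FALSE FALSE FALSE TRUE TRUE
big      <- vec[temp]   # keeps only the elements where temp is TRUE: 4 5
how_many <- sum(temp)   # TRUE counts as 1, so this is 2
```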
Matrices
> X <- matrix(NA, nrow = 7, ncol = 3)   # Creates a 7 x 3 matrix filled with NA (not available)
> X
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
[3,]   NA   NA   NA
[4,]   NA   NA   NA
[5,]   NA   NA   NA
[6,]   NA   NA   NA
[7,]   NA   NA   NA
Indexing starts from 1. The X[row, col] syntax is used for accessing the value of a cell.
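A short sketch of filling and reading cells with the X[row, col] syntax; the values written into the matrix are arbitrary:

```r
X <- matrix(NA, nrow = 7, ncol = 3)

X[1, 2] <- 5            # set the cell in row 1, column 2
X[7, ]  <- c(1, 2, 3)   # leaving the column empty selects the whole row

cell     <- X[1, 2]     # 5
last_row <- X[7, ]      # 1 2 3
dims     <- dim(X)      # 7 3
still_na <- sum(is.na(X))   # 21 cells minus the 4 that were filled: 17
```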
Lists
A list is used to bundle unrelated items together.
> result <- list(mu = 0.3, sigma = 0.45, x = 1:3)
> result$mu
[1] 0.3
> result$x
[1] 1 2 3
> result.new <- "hello there"   # Stores the string in a new variable named result.new
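A list can also be extended after it has been created. The sketch below adds a component with $ and reads components back; the component name note is hypothetical and not part of the original example.

```r
result <- list(mu = 0.3, sigma = 0.45, x = 1:3)

result$note <- "hello there"        # adds a new component to the list itself
component_names <- names(result)    # "mu" "sigma" "x" "note"

second_x <- result$x[2]             # second element of the x component: 2
same_mu  <- result[["mu"]]          # [[ ]] is another way to extract a component: 0.3
```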
Regression
Linear regression finds the best-fit curve through the given values so that the residual error (the sum of squared residuals) is minimal.
Steps needed to find the best-fit curve:
1. Collect data
2. Define the model
3. Apply regression
4. Use the generated values to predict
Linear model in R
Modelling is a technique for representing the data mathematically.
General form: response ~ op1 term1 op2 term2 op3 term3 ...
Models and syntax: response (dependent) variable - Y; independent variables - A, B; coefficients - β
Model                      Syntax
Y = β0 + β1A               Y ~ A
Y = β1A                    Y ~ -1 + A
Y = β0 + β1A + β2A²        Y ~ A + I(A^2)
Y = β0 + β1A + β2B         Y ~ A + B
Y = β0 + β1AB              Y ~ A:B
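One way to check how a formula maps to a model is to fit it and look at the names of the estimated coefficients. The sketch below uses made-up values for Y, A and B; only the coefficient names matter here.

```r
# Hypothetical data, invented for illustration only.
A <- 1:20
B <- rev(A)^2
Y <- 2 + 3 * A + sin(A)

terms_line      <- names(coef(lm(Y ~ A)))            # "(Intercept)" "A"
terms_no_icpt   <- names(coef(lm(Y ~ -1 + A)))       # "A"
terms_quadratic <- names(coef(lm(Y ~ A + I(A^2))))   # "(Intercept)" "A" "I(A^2)"
terms_two_vars  <- names(coef(lm(Y ~ A + B)))        # "(Intercept)" "A" "B"
terms_interact  <- names(coef(lm(Y ~ A:B)))          # "(Intercept)" "A:B"
```

The intercept is included unless it is removed explicitly with -1.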
Example
Data:
> conc
[1]  0 10 20 30 40 50
> signal
[1]  4 22 44 60 82 95
Expected model: signal = β0 + β1 × conc
> lm(signal ~ conc)

Call:
lm(formula = signal ~ conc)

Coefficients:
(Intercept)         conc
      4.667        1.860

> lm.r <- lm(signal ~ conc)   # Stores the fitted model for later use
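The regression steps (collect data, define the model, apply regression, predict) can be sketched end to end with the data from this example. The closed-form formulas are the standard least-squares estimates for a straight line, and the new concentration of 25 is a hypothetical value chosen for illustration.

```r
# 1. Collect data
conc   <- c(0, 10, 20, 30, 40, 50)
signal <- c(4, 22, 44, 60, 82, 95)

# 2-3. Define the model and apply regression
lm.r <- lm(signal ~ conc)

# The same estimates from the closed-form least-squares formulas
slope     <- sum((conc - mean(conc)) * (signal - mean(signal))) /
             sum((conc - mean(conc))^2)
intercept <- mean(signal) - slope * mean(conc)
matches   <- isTRUE(all.equal(unname(coef(lm.r)), c(intercept, slope)))

# 4. Use the fitted model to predict the signal at a new concentration
new_data  <- data.frame(conc = 25)
predicted <- predict(lm.r, newdata = new_data)
by_hand   <- intercept + slope * 25
```

predict() needs a data frame whose column name matches the variable used in the formula.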
Carrying out regression
> layout(matrix(1:4, 2, 2))   # Arranges the next four plots in a 2 x 2 grid
> plot(lm.r)                  # Draws the diagnostic plots for the fitted model
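The diagnostic plots are built from the residuals and fitted values of the model, which can also be inspected directly. A minimal sketch using the same conc and signal data:

```r
conc   <- c(0, 10, 20, 30, 40, 50)
signal <- c(4, 22, 44, 60, 82, 95)
lm.r   <- lm(signal ~ conc)

res <- residuals(lm.r)   # observed signal minus fitted signal
fit <- fitted(lm.r)      # values on the fitted line
rss <- sum(res^2)        # residual sum of squares, the quantity least squares minimises
r_squared <- summary(lm.r)$r.squared   # share of the variance explained by the model
```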
Uniform vs Normal Distribution
[Figure: two panels, labelled Normal and Uniform]
Uniform Distribution
> runif(5000)                  # Generates 5000 uniformly distributed random numbers between 0 and 1
> plot(runif(5000))            # Plots the generated values against their index
> plot(density(runif(5000)))   # Plots the estimated density of the values
Some statistics:
> summary(runif(5000))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.0004056 0.2701000 0.5072000 0.5124000 0.7514000 0.9995000
[Figure: density.default(x = runif(5000)); N = 5000, Bandwidth = 0.04717; y-axis: Density]
[Figure: runif(5000) plotted against Index]
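Each call to runif() draws a fresh sample, so the plot, the density and the summary above each describe different numbers. A sketch that stores one sample and reuses it; the seed value and the 10 to 20 range are arbitrary choices for illustration:

```r
set.seed(42)          # arbitrary seed, makes the sample reproducible
u <- runif(5000)      # one sample, reused below

in_range <- all(u >= 0 & u <= 1)   # runif() defaults to the interval from 0 to 1
stats    <- summary(u)             # summary of this particular sample
d        <- density(u)             # kernel density estimate; plot(d) would draw it

u_scaled <- runif(5, min = 10, max = 20)   # min and max change the interval
```

For a uniform sample on 0 to 1 the mean should come out close to 0.5.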
Normal Distribution
> rnorm(5000)                  # Generates 5000 normally distributed random numbers (mean 0, standard deviation 1)
> plot(rnorm(5000))            # Plots the generated values against their index
> plot(density(rnorm(5000)))   # Plots the estimated density of the values
Some statistics:
> summary(rnorm(5000))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-4.549000 -0.674800  0.005506 -0.001849  0.666600  3.629000
[Figure: density.default(x = rnorm(5000)); N = 5000, Bandwidth = 0.1643; y-axis: Density]
[Figure: rnorm(5000) plotted against Index]
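The same checks can be made on a normal sample. This is a sketch with an arbitrary seed; the mean of 100 and standard deviation of 15 in the last line are hypothetical values that only show the optional arguments.

```r
set.seed(42)          # arbitrary seed, makes the sample reproducible
z <- rnorm(5000)      # defaults: mean = 0, sd = 1

sample_mean <- mean(z)                  # should be close to 0
sample_sd   <- sd(z)                    # should be close to 1
within_one_sd <- mean(abs(z) <= 1)      # share of values within one standard deviation

shifted <- rnorm(5, mean = 100, sd = 15)   # mean and sd can be set explicitly
```

For a normal distribution roughly 68% of the values lie within one standard deviation of the mean.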
Autocorrelation function
> acf(rnorm(100))
[Figure: Series rnorm(100); x-axis: Lag, y-axis: ACF]
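The values behind the ACF plot can be read without drawing it by passing plot = FALSE. The sketch below compares independent noise with a random walk built from the same numbers; the seed and the random-walk comparison are additions for illustration.

```r
set.seed(7)                  # arbitrary seed
noise <- rnorm(100)          # independent values
walk  <- cumsum(noise)       # random walk: each value depends on the previous one

a    <- acf(noise, plot = FALSE)
lag0 <- a$acf[1]             # correlation of the series with itself, always 1

lag1_noise <- acf(noise, plot = FALSE)$acf[2]   # lag-1 autocorrelation of the noise
lag1_walk  <- acf(walk,  plot = FALSE)$acf[2]   # lag-1 autocorrelation of the walk
```

Independent noise should show a lag-1 value near zero, while the random walk should stay strongly correlated with its own recent past.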
References:
http://www.montefiore.ulg.ac.be/~kvansteen/GBIO0009-1/ac20092010/Class8/Using%20R%20for%20linear%20regression.pdf
An Introduction to R - W. N. Venables, D. M. Smith and the R Core Team
Thank You