Intro to R Stephanie Lee Dept of Sociology, CSSCR University of Washington September 2009

Dec 17, 2015

Intro to R

Stephanie Lee

Dept of Sociology, CSSCR

University of Washington

September 2009

Class Outline

I. What is R?

II. The R Environment

III. Reading in Data

IV. Viewing and Manipulating Data

V. Data Analysis

What is R?

R is frequently thought of as another statistics package, like SPSS, Stata or SAS.

While many people use R for statistical analysis, R is actually a full programming environment.

What is R?

R is completely command-driven.

There are very few menu items, so you must use the R language to do anything.

Another important distinction between traditional stats packages and R is that R is object-oriented.

Why Use R?

Free!

Extremely flexible

Many additional packages available

Excellent graphics

Disadvantages:

Steep learning curve

Difficult data entry

Download R

Download R:

http://cran.r-project.org

Available for Linux, MacOS, and Windows

The R Environment

A traditional stats program like SPSS or Stata only contains one rectangular dataset at a time. All analysis is done on the current dataset.

In contrast, the R environment is like a sandbox.

It can contain a large number of different objects.

The R Environment

R is also function-driven.

The functions act on objects and return objects.

Functions themselves are objects, too!

Input arguments (objects) → function works its black-box magic! → Output (objects)
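A minimal sketch of the idea above, using the built-in mtcars data. The names x, m, average, and col_means are made up for illustration.

```r
# A vector object holding five numbers
x <- c(2, 4, 6, 8, 10)

# mean() takes an object as input and returns a new object
m <- mean(x)

# Functions are objects too: they can be assigned to a new name ...
average <- mean

# ... and passed as arguments to other functions
col_means <- sapply(mtcars[, c("mpg", "wt")], average)

print(m)
print(class(mean))
print(col_means)
```

Here sapply() receives the function average as an ordinary argument and applies it to each selected column.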

Rectangular Dataset (Excel, SPSS, Stata, SAS)

Columns: Variable 1, Variable 2, Variable 3

Rows: Case 1, Case 2, Case 3, Case 4, Case 5

R Environment (Object-Oriented)

Objects in the environment: Function 1, Function 2, Results, Vector 1, Vector 2, Matrix, Data Frame, String, Numeric Value
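As a rough illustration of the sandbox idea, the sketch below creates several kinds of objects side by side and lists them with ls(). All object names here are hypothetical.

```r
# Several different kinds of objects can live side by side
v1 <- c(1.5, 2.5, 3.5)                    # numeric vector
v2 <- c("a", "b", "c")                    # character vector
m1 <- matrix(1:6, nrow = 2)               # 2 x 3 matrix
df1 <- data.frame(id = 1:3, score = v1)   # data frame
s1 <- "hello"                             # string
n1 <- 42                                  # single numeric value
square <- function(x) x^2                 # a function is an object as well
results <- square(v1)                     # function output, stored as an object

# ls() lists the names of the objects in the current environment
print(ls())
```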

Help Function

help(function name)

help.search("search term")

Note: R is case-sensitive!

Try: help(help), ls()

Help Function

Sometimes one help file will contain information for several functions.

Usage: shows the syntax for the command, its required arguments (input), and any default values for arguments.

Value: the output object of the function
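The Usage information can also be inspected from code. This sketch uses sd() as an arbitrary example function; sd_defaults is a made-up name.

```r
# args() shows a function's argument names and defaults, like the Usage section
print(args(sd))

# formals() returns the arguments as a list, so defaults can be inspected
sd_defaults <- formals(sd)
print(names(sd_defaults))   # argument names of sd()
print(sd_defaults$na.rm)    # default value of the na.rm argument
```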

Setting Up Our Data

> library(datasets)
> mtcars
> ?mtcars
> write.csv(mtcars, "C:/temp/cars.csv")

Creating Objects

Assignment operator: = or <-

Objects need to be assigned a name; otherwise they are printed to the console and not saved to the environment.

c() is a useful function for creating vectors
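A short sketch of both assignment operators with c(). The vector names and values are invented for illustration.

```r
# Without assignment the result is printed (at the console) and then discarded
c(1, 2, 3)

# With assignment the vector is stored under a name
ages <- c(23, 35, 41)           # using <-
first_names = c("Ann", "Bob")   # using =, equivalent here

print(length(ages))
print(first_names)
```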

Reading in Data

read.table(filename, ...)

> cars = read.csv("C:/temp/cars.csv")

I prefer the CSV (comma-separated values) format. Almost every stats program will export to this format.
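A self-contained round trip, sketched under the assumption that a temporary file stands in for C:/temp/cars.csv. The name csv_path is hypothetical.

```r
# Write the built-in mtcars data to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")   # stand-in for C:/temp/cars.csv
write.csv(mtcars, csv_path)

# Read it back; the file path must be a quoted string
cars <- read.csv(csv_path)

print(dim(cars))           # rows and columns
print(colnames(cars)[1])   # the row names come back as a column named "X"

unlink(csv_path)           # remove the temporary file
```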

Viewing Data

What does the dataset look like?

> str(cars)
> colnames(cars)
> dim(cars)
> nrow(cars)
> ncol(cars)

You can also assign row and column names with rownames() and colnames().
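A sketch of inspecting a data frame and assigning names. As an assumption, cars is rebuilt here from the built-in mtcars data instead of being read from the CSV file.

```r
# Stand-in for the data frame read from the CSV file
cars <- cbind(X = rownames(mtcars), mtcars)
rownames(cars) <- NULL

str(cars)           # compact overview of every column
print(dim(cars))    # number of rows, number of columns

# The same functions can assign names
colnames(cars)[1] <- "name"
rownames(cars) <- as.character(cars$name)

print(colnames(cars))
print(head(rownames(cars), 3))
```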

Common Mode Types

Mode | Possible Values

Logical | TRUE or FALSE or NA

Integer | Whole numbers

Numeric | Real numbers

Character | Single character or String (in double quotes)
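One way to check which of these types a value has is class(). This is a sketch with invented variable names. Note that mode() reports both integers and real numbers as "numeric".

```r
flag <- TRUE       # logical
count <- 5L        # integer (the L suffix marks a whole number)
price <- 3.75      # numeric (real number)
word <- "hello"    # character string

print(class(flag))
print(class(count))
print(class(price))
print(class(word))

# mode() groups integers with other numbers
print(mode(count))
```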

Common Object Types

Object | Modes | More than one mode?

vector | Logical, Char, or Numeric | No

factor | Logical, Char, or Numeric | No

matrix | Logical, Char, or Numeric | No

data frame | Logical, Char, and Numeric | Yes

Creating Objects

Object | Create Function

vector | c(), vector()

factor | factor()

matrix | matrix()

data frame | data.frame()
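A brief sketch of each create function, with made-up names and values. The last lines illustrate that a vector holds one mode only, while a data frame can mix modes across columns.

```r
v <- c(10, 20, 30)                         # vector from listed values
empty <- vector("numeric", length = 3)     # numeric vector filled with zeros
f <- factor(c("low", "high", "low"))       # factor (categorical data)
m <- matrix(1:6, nrow = 2, ncol = 3)       # 2 x 3 matrix
df <- data.frame(id = 1:3,
                 label = c("a", "b", "c"),
                 ok = c(TRUE, FALSE, TRUE)) # columns of different modes

# Mixing modes in a vector forces everything into a single mode
mixed <- c(1, "a", TRUE)

print(levels(f))
print(dim(m))
print(class(mixed))
str(df)
```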

Viewing Data: Indexing

datasetname[rownum, columnnum]

> cars[1,4] displays value at row 1, column 4

> cars[2:5, 6] displays rows 2-5, column 6

Viewing Data: Indexing

> cars[, 2] displays all rows, column 2

> cars[4,] displays row 4, all columns
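The indexing forms above, collected into one runnable sketch. As an assumption, cars is rebuilt from the built-in mtcars data with the car names in a first column called X, mirroring what read.csv() returns for the file written earlier.

```r
# Stand-in for the data frame read from the CSV file
cars <- cbind(X = rownames(mtcars), mtcars)
rownames(cars) <- NULL

one_value <- cars[1, 4]      # row 1, column 4 (disp)
some_rows <- cars[2:5, 6]    # rows 2-5, column 6 (drat)
one_column <- cars[, 2]      # all rows, column 2 (mpg)
one_row <- cars[4, ]         # row 4, all columns

print(one_value)
print(some_rows)
print(length(one_column))
print(one_row)
```

Selecting a single column returns a vector, while selecting a whole row returns a one-row data frame.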

Viewing Data

You can also access columns (variables) using the ‘$’ symbol if the data frame has column names:

> cars$mpg
> cars$wt

Manipulating Data

Now we can give that first column (variable) a better name than “X”.

> colnames(cars) = c("name", colnames(cars)[2:ncol(cars)])

Manipulating Data

> str(cars)

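R has the unfortunate habit of trying to turn vectors of character strings into factors (categorical data). Versions of R before 4.0.0 do this by default when reading data; from 4.0.0 on, strings stay as character vectors unless you ask otherwise.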

> cars$name = as.character(cars$name)
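A sketch of the rename and conversion steps together. As an assumption, the first column is built explicitly as a factor from the mtcars row names, so the example behaves the same in any version of R.

```r
# Stand-in for the CSV data, with the first column stored as a factor
cars <- data.frame(X = factor(rownames(mtcars)), mtcars)
rownames(cars) <- NULL

# Rename only the first column
colnames(cars)[1] <- "name"
print(class(cars$name))    # factor before conversion

# Convert the factor back to plain character strings
cars$name <- as.character(cars$name)
print(class(cars$name))
print(head(cars$name, 3))
```

Indexing colnames(cars)[1] is a shorter alternative to rebuilding the whole vector of column names.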

Manipulating Data: Operators

Arithmetic: + - * / ^

Comparison

< less than

> greater than

<= less than or equal to

>= greater than or equal to

== is equal to

!= is not equal to

Logical

! not

& and

| or

xor() exclusive or
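The operators work element by element on vectors. This is a small sketch with a made-up vector x.

```r
x <- c(1, 5, 10)

squares <- x^2                    # arithmetic
above_four <- x > 4               # comparison gives a logical vector
is_five <- x == 5
not_above_four <- !(x > 4)        # logical not
in_between <- (x > 1) & (x < 10)  # logical and
at_edges <- (x < 2) | (x > 9)     # logical or
one_only <- xor(TRUE, FALSE)      # exclusive or

print(squares)
print(above_four)
print(in_between)
print(at_edges)
print(one_only)
```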

Manipulating Data

Viewing subsets of data using column names and operators:

> cars[cars$vs == 1,]
> cars[cars$cyl >= 6,]
> cars$name[cars$hp > 100]
> cars$name[cars$wt > 3]
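A runnable sketch of the same subsetting pattern, assuming cars is rebuilt from the built-in mtcars data with a name column. The last subset, which combines two conditions with &, is an added illustration.

```r
# Stand-in for the data frame used in the slides
cars <- cbind(name = rownames(mtcars), mtcars)
rownames(cars) <- NULL

# A logical vector inside [ ] keeps only the rows where it is TRUE
straight <- cars[cars$vs == 1, ]
big_engine <- cars[cars$cyl >= 6, ]
powerful <- cars$name[cars$hp > 100]

# Conditions can be combined with logical operators
efficient_4cyl <- cars[cars$cyl == 4 & cars$mpg > 30, ]

print(nrow(straight))
print(nrow(big_engine))
print(length(powerful))
print(efficient_4cyl$name)
```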

Analyzing Data

What do the variables look like?

> table(cars$gear)
> hist(cars$qsec)
> mean(cars$mpg)
> sd(cars$mpg)
> cor(cars$mpg, cars$wt)
> mean(cars$mpg[cars$cyl == 4])
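The same summaries stored as objects, sketched with mtcars standing in for cars. The tapply() call is an added illustration that computes the group mean for every cylinder count at once; the histogram is left out because it only draws a plot.

```r
cars <- mtcars   # stand-in for the data frame used in the slides

gear_counts <- table(cars$gear)                    # frequency table
mean_mpg <- mean(cars$mpg)
sd_mpg <- sd(cars$mpg)
cor_mpg_wt <- cor(cars$mpg, cars$wt)               # correlation
mean_mpg_4cyl <- mean(cars$mpg[cars$cyl == 4])     # mean within a subset

# Group means for every value of cyl in one call
mpg_by_cyl <- tapply(cars$mpg, cars$cyl, mean)

print(gear_counts)
print(c(mean = mean_mpg, sd = sd_mpg, cor = cor_mpg_wt))
print(mpg_by_cyl)
```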

Manipulating Data

Transforming variables:

> wt.lb = cars$wt * 1000

This creates a new vector called wt.lb of length 32 (our number of cases).

Manipulating Data

We can use wt.lb without “adding” it to our data frame.

But if you like the rectangular dataset concept, you can column bind it to the existing data frame:

> cars = cbind(cars, wt.lb)
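A sketch of the transformation and the column bind, with mtcars standing in for cars. In mtcars the wt column is recorded in thousands of pounds, which is why multiplying by 1000 gives pounds.

```r
cars <- mtcars   # stand-in for the data frame used in the slides

# A free-standing vector, one value per case
wt.lb <- cars$wt * 1000
print(length(wt.lb))

# Column bind the vector to the data frame; the new column is named wt.lb
cars <- cbind(cars, wt.lb)
print(colnames(cars))
print(head(cars$wt.lb, 3))
```

Assigning directly with cars$wt.lb = cars$wt * 1000 gives the same column without the intermediate vector.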

Data Analysis

Hypothesis Testing

t.test(), prop.test()

Regression

lm(), glm()
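Two of these functions in use, as a sketch on the built-in mtcars data. The choice of variables (mpg by transmission type am, and am predicted from wt) is only an illustration.

```r
# Two-sample t-test: does mean mpg differ between automatic and manual cars?
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)

# Logistic regression with glm(): transmission type as a function of weight
fit <- glm(am ~ wt, family = binomial, data = mtcars)
print(coef(fit))
```

Both calls return objects, so individual pieces such as tt$p.value can be pulled out and reused.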

Data Analysis: OLS Regression

> regr = lm(cars$mpg ~ wt.lb + cars$hp + cars$cyl)

The output of the regression is also an object. We’ve named it regr.

> summary(regr)
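An equivalent way to write the model, sketched with mtcars standing in for cars. The data argument tells lm() where to find the variables, so the cars$ prefixes are not needed.

```r
cars <- mtcars                    # stand-in for the data frame used in the slides
cars$wt.lb <- cars$wt * 1000      # weight in pounds

# The formula refers to column names; data = says which data frame to use
regr <- lm(mpg ~ wt.lb + hp + cyl, data = cars)

# The fitted model is an object whose parts can be extracted
coefs <- coef(regr)
r2 <- summary(regr)$r.squared

print(coefs)
print(r2)
```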

Saving Data

You can use write.csv() or write.table() to save your dataset.

When you quit R, it will ask if you want to save the workspace. This includes all the objects you have created, but it does not include the code you’ve written. You can also use save.image() to save the workspace.

You should always save your code in a *.r file.
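A sketch of saving and restoring objects. It uses save() and load() on one chosen object rather than save.image(), which writes the whole workspace, and a temporary file whose name is hypothetical.

```r
rdata_path <- tempfile(fileext = ".RData")   # hypothetical file location

# Save one object to disk
wt.lb <- mtcars$wt * 1000
save(wt.lb, file = rdata_path)

# Load it into a fresh environment to show that it was stored
restored <- new.env()
load(rdata_path, envir = restored)
print(ls(restored))

unlink(rdata_path)   # remove the temporary file
```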

Other Useful Functions

> ifelse()
> is.na()
> match()
> merge()
> apply()
> order()
> sort()
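A quick tour of these functions on small made-up objects, as a sketch only.

```r
x <- c(3, NA, 1, 2)

missing_flags <- is.na(x)                  # TRUE where a value is missing
filled <- ifelse(is.na(x), 0, x)           # replace missing values with 0
sorted <- sort(x)                          # sorted values, NA dropped
positions <- order(x)                      # indices that would sort x, NA last
found <- match(c("b", "z"), c("a", "b", "c"))   # position of each value, NA if absent

# merge() joins two data frames on a shared column
left <- data.frame(id = c(1, 2, 3), a = c("x", "y", "z"))
right <- data.frame(id = c(2, 3, 4), b = c(TRUE, FALSE, TRUE))
joined <- merge(left, right, by = "id")

# apply() runs a function over the rows (1) or columns (2) of a matrix
row_sums <- apply(matrix(1:6, nrow = 2), 1, sum)

print(filled)
print(positions)
print(joined)
print(row_sums)
```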

Other Resources

Main R website: http://www.r-project.org

UW CSSS Intro to R

UW CSDE Intro to R

UCLA Statistical Computing

http://www.ats.ucla.edu/stat

Advanced Topics

More on factors

Lists (data type)

Loops

String manipulation

Writing your own functions

Graphics