
Introduction to R - from Rstudio to ggplot

Jan 23, 2018


Olga Scrivner
Page 1: Introduction to R - from Rstudio to ggplot

R Studio

R Basics

Operators

Packages

Importing

Visualization

DataCamp

R: Introduction

Olga Scrivner

1 / 67


Acknowledgments

Center of Excellence for Women in Technology (CEWiT)

Social Science Research Commons (SSRC)

Cyberinfrastructure for Network Science Center (CNS)

2 / 67


Outline

1 Intro to RStudio

2 Using R scripts

3 Installing packages

4 R objects

Data types

Vectors

Lists

5 Getting help

6 Data visualization

3 / 67


Materials Needed

1 https://languagevariationsuite.wordpress.com/2017/08/07/r-introduction-sph-workshop/

2 intro.r

3 plotting.r

4 Movie metadata csv

4 / 67


R software

R is free software for statistical analysis, text mining, and graphics.

To install R on Windows:

1 Download the binary file for R: https://cran.r-project.org/bin/windows/base/R-3.3.1-win.exe

2 Open the downloaded .exe file and install R

To install R on Mac:

1 Download the appropriate version of the .pkg file: https://cran.r-project.org/bin/macosx/

2 Open the downloaded .pkg file and install R

5 / 67


R Studio

RStudio is a free user interface for R.

1 Download the appropriate RStudio version: https://www.rstudio.com/products/rstudio/download/

2 Run the installer to install RStudio

6 / 67


R Studio Structure

For more details, see the handout RStudio101 (by Oscar Torres-Reyna):

http://dss.princeton.edu/training/RStudio101.pdf

7 / 67


Organizing Your Files

Option 1

Create a new script / open an existing script

Set up your working directory

Keep your data files in this directory (easy access)

Or use the command file.choose()

Or remember the path to your data files

Option 2

Create a new project / open an existing project

No need to set up a working directory

Keep your data files in the project directory

No need to remember the path to your data files

8 / 67
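A minimal sketch of Option 1 in code. It uses tempdir() as a stand-in for your own data folder, and the file example.csv is a hypothetical file written only so the sketch runs on its own:

```r
# Check where R currently looks for files, then point it at your data folder.
# tempdir() stands in for your own folder (an assumption for this sketch).
old_wd <- getwd()
setwd(tempdir())
getwd()

# Hypothetical example file, written here so the sketch is self-contained
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)), "example.csv", row.names = FALSE)

# Because the file sits in the working directory, its name alone is enough
example <- read.csv("example.csv")
print(example)

# Alternative: pick the file in a dialog (only in an interactive session)
if (interactive()) {
  example <- read.csv(file.choose())
}

# Restore the original working directory
setwd(old_wd)
```

With an RStudio project (Option 2), the working directory is already the project folder, so the setwd() step is not needed.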


Creating Projects

9 / 67



Creating R Script

10 / 67


Saving R Script

11 / 67


Closing and Opening Scripts

Close R File: File → Close

Open R File: File → Open

12 / 67


Editing Script: Font and Size

13 / 67


RStudio - Full View

14 / 67


Learning R Syntax

A variable stores values.

Assignment operator: <-

x <- 5

y <- 6

A valid variable name must start with a letter (or with a dot that is not followed by a number).

A name can contain letters, numbers, underscores, and dots.

Valid names: mydata, mydata2, my.data, .mydata

Invalid names: my data (contains a space), mydata! (contains "!"), 2mydata (starts with a number)

15 / 67
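A short sketch of the naming rules in practice. The variable values are arbitrary; make.names() is a base R function that converts a string into the closest syntactically valid name, which makes it a handy way to see why the invalid examples fail:

```r
# Assignment with <-
x <- 5
y <- 6

# Valid names: letters, numbers, dots, underscores; start with a letter or a dot
mydata  <- 1
mydata2 <- 2
my.data <- 3
.mydata <- 4   # valid, but names starting with a dot are hidden from ls()

"my.data" %in% ls()                   # TRUE
".mydata" %in% ls()                   # FALSE: hidden by default
".mydata" %in% ls(all.names = TRUE)   # TRUE

# Typing `my data <- 1` or `2mydata <- 1` is a syntax error;
# make.names() shows the closest valid version of each invalid name
make.names(c("my data", "mydata!", "2mydata"))
# "my.data" "mydata." "X2mydata"
```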


Script Flow

1 Create two variables:

x <- 5

y <- 6

2 Run executes commands:

- Place the cursor anywhere on the first line and click Run

- Place the cursor on the second line and click Run

3 The console displays the execution

4 Top-right pane:

- Environment stores objects

- History stores commands

16 / 67


Values

1 Change the value of y to 6.5

2 Examine the objects in the Environment pane
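One way to do this exercise in the script: assigning to an existing name simply replaces its old value, and ls() lists the objects you would see in the Environment pane.

```r
x <- 5
y <- 6
y <- 6.5     # reassigning replaces the old value of y
print(y)     # [1] 6.5
ls()         # names of the objects now in the environment (includes "x" and "y")
```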

17 / 67


Comments

1 Comments are not executed

2 Comments are preceded by # (hash sign)

3 Type a comment above your first line of code
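A small illustration of how R treats comments: everything from # to the end of the line is ignored, whether the comment has its own line or follows code.

```r
# Create two numeric variables (this whole line is a comment and is not executed)
x <- 5
y <- 6     # a comment can also follow code on the same line
# x <- 100   this assignment is commented out, so x stays 5
x + y      # [1] 11
```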

18 / 67


print()

The function print() prints a value to the console

Inside the parentheses, type the name of your variable

Examine the output in the console
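A quick sketch of print(). At the console, typing a variable's name alone also displays it; inside a function, or in a script run with source(), values are not shown automatically, so print() is needed there.

```r
x <- 5
print(x)    # [1] 5
x           # at the console, the name alone also prints [1] 5

z <- "a"
print(z)    # [1] "a"
```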

19 / 67


Characters versus Numeric Values

Numbers are without quotation marks:

x <- 5

Characters are enclosed in quotation marks:

z <- "a"

Arithmetic operations with numeric values:

In the console, type x*y and press Enter

In the console, type z*w and press Enter
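What the two console commands should show. The slide never defines w, so this sketch assumes w holds a character value like z; tryCatch() captures the error message so the script keeps running.

```r
x <- 5
y <- 6
z <- "a"
w <- "b"     # assumption: w is a character value, like z

x * y        # [1] 30

# Multiplying character values fails; capture the error message instead of stopping
result <- tryCatch(z * w, error = function(e) conditionMessage(e))
result       # "non-numeric argument to binary operator"
```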

20 / 67



Logical Values

1 TRUE, FALSE - upper case, no quotes

2 Add comment # logical values

21 / 67


Data Types

1 Data types:

Logical, Numeric, Character

2 The function class() identifies an object's class (type)

3 Type in the script

4 Examine the console
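A few lines you might type for this exercise, one per data type. Comparisons such as x > 3 are a common source of logical values.

```r
class(TRUE)    # "logical"
class(5)       # "numeric"
class("a")     # "character"

x <- 5
z <- "a"
class(x)       # "numeric"
class(z)       # "character"
class(x > 3)   # "logical": comparisons return TRUE or FALSE
```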

22 / 67


Vector - Basic Types

Vector: A sequence of data elements of the same basic type

Numeric

c(2, 3, 5)

Logical

c(TRUE, FALSE, TRUE)

Character string

c("aa", "bb", "cc")
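Because a vector holds only one basic type, combining different types with c() silently converts (coerces) everything to a single common type. A brief sketch:

```r
nums  <- c(2, 3, 5)
flags <- c(TRUE, FALSE, TRUE)
chars <- c("aa", "bb", "cc")

class(nums)    # "numeric"
class(flags)   # "logical"
class(chars)   # "character"

# Mixing types: everything becomes character
mixed <- c(2, TRUE, "cc")
mixed          # "2" "TRUE" "cc"
class(mixed)   # "character"

c(2, TRUE)     # [1] 2 1   (TRUE becomes the number 1)
```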

23 / 67


Vector

In the script create two vectors:

Examine the environment

24 / 67


Length

The function length() returns the number of elements in a vector

length(v1)

Create a vector with words:

mywords <- c("These", "are", "my", "words")

1 How many words are in mywords?
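A possible answer. The vectors created on the earlier "Vector" slide are not reproduced in this transcript, so v1 here reuses c(1, 3, 6) from the arithmetic slide. Note that length() counts elements; nchar() is the base R function that counts characters in each element.

```r
v1 <- c(1, 3, 6)
length(v1)        # [1] 3

mywords <- c("These", "are", "my", "words")
length(mywords)   # [1] 4

nchar(mywords)    # [1] 5 3 2 5   characters per word, not number of words
```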

25 / 67


Index Slicing

1. [1:3] - consecutive elements: one, two, three

2. [c(1,3)] - only the elements one and three

3. [-2] - all except the element number two

Extract the first and the second elements

Extract all except the first element

Extract the first and the fourth elements
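One possible solution to the three exercises, using mywords from the previous slide:

```r
mywords <- c("These", "are", "my", "words")

mywords[1:2]       # "These" "are"          first and second elements
mywords[-1]        # "are" "my" "words"     all except the first
mywords[c(1, 4)]   # "These" "words"        first and fourth elements
```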

26 / 67


Indexing

How do you extract specific elements from a vector?

What is the first word in mywords?

- mywords[1]

What are the first and second words in mywords?

- mywords[1:2]

What are the first and third words in mywords?

- mywords[c(1,3)]

27 / 67


Combining Vectors - Strings

vector1 <- c("my", "first", "vector")

vector2 <- c("my", "second", "vector")

vector3 <- c(vector1, vector2)

print(vector3)

28 / 67


Vectors - Arithmetic Operations

Click RUN to execute each line

v1 <- c(1, 3, 6)

v2 <- c(2, 4, 6)

v1*v2

v1+v2

v1/v2

vector1*vector2 - what will happen?

vector3 <- c(vector1, vector2)
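The results you should see. Arithmetic on numeric vectors works element by element; multiplying the character vectors from the previous slide raises an error, caught here with tryCatch() so the script keeps running. c() still works on character vectors because it only combines them.

```r
v1 <- c(1, 3, 6)
v2 <- c(2, 4, 6)

v1 * v2   # [1]  2 12 36
v1 + v2   # [1]  3  7 12
v1 / v2   # [1] 0.50 0.75 1.00

vector1 <- c("my", "first", "vector")
vector2 <- c("my", "second", "vector")

msg <- tryCatch(vector1 * vector2, error = function(e) conditionMessage(e))
msg       # "non-numeric argument to binary operator"

vector3 <- c(vector1, vector2)
length(vector3)   # [1] 6
```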

29 / 67



Vectors - paste

paste(vector1, "+", vector2, sep = " ")

paste(vector1, "+", vector2, sep = "")

paste(vector1, "+", vector2, collapse = " ")
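What each call returns. sep sets the separator between the pieces of each element; collapse, when given, joins all the resulting elements into a single string. paste0() is base R shorthand for paste(..., sep = "").

```r
vector1 <- c("my", "first", "vector")
vector2 <- c("my", "second", "vector")

paste(vector1, "+", vector2, sep = " ")
# three strings: "my + my", "first + second", "vector + vector"

paste(vector1, "+", vector2, sep = "")
# three strings: "my+my", "first+second", "vector+vector"

paste(vector1, "+", vector2, collapse = " ")
# one string: "my + my first + second vector + vector"
```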

30 / 67


Usefulness of paste - Create a Plot Title

Scenario: You are going to create a plot with x (Age Groups) and y (Frequency) with the following title: My plot: Frequency of Age Groups

y <- "Frequency"

x <- "Age Groups"

title <- "My plot:"

c(title,y,"of",x)

paste(title,y,"of",x,collapse=" ")
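Comparing the two approaches: c() produces four separate strings, while paste() joins them into one title. Because every argument here is a single string, the default sep = " " does the joining, so collapse = " " has no visible effect in this particular call.

```r
y <- "Frequency"
x <- "Age Groups"
title <- "My plot:"

c(title, y, "of", x)              # four separate strings, not one title

plot_title <- paste(title, y, "of", x)
plot_title                        # "My plot: Frequency of Age Groups"
```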

31 / 67



Lists

List: a vector that can contain different types

mylist <- list(vector1, v1)

print(mylist)

[[ ]] - index for lists

[ ] - index for vectors

32 / 67
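A short sketch of the difference between [[ ]] and [ ] on a list, using vector1 and v1 from the earlier slides: double brackets return the element itself, while single brackets return a smaller list.

```r
vector1 <- c("my", "first", "vector")
v1 <- c(1, 3, 6)
mylist <- list(vector1, v1)

mylist[[1]]          # "my" "first" "vector"   the element itself
class(mylist[[1]])   # "character"
class(mylist[1])     # "list": single brackets keep the list wrapper
mylist[[2]][3]       # [1] 6   third number inside the second element
```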


List versus Vector

Vectors contain objects of the same type:

- v1 <- c("a", "b", "c")

- v2 <- c(1, 2, 3, 4)

Lists can contain objects of different types

Vectors are created with the c() function

Lists are created with the list() function

Create mylist:

Mini-quiz: What are the data types in mylist?

33 / 67



Indexing List

1 Print list: print(mylist)

2 Remember vector indices [ ]?

3 List will use [[ ]]

4 Type mylist[[1]]

5 Type mylist[[7]]

6 How do you access the first number inside that list element?

7 mylist[[7]][1]
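The slide's own mylist is not reproduced in this transcript, so this sketch uses a hypothetical seven-element list with mixed types. sapply(mylist, class) reports the type of every element, which is one way to check the mini-quiz answer for any list.

```r
# Hypothetical list: not the slide's mylist
mylist <- list("a", "b", TRUE, 1.5, "c", FALSE, c(10, 20, 30))

length(mylist)          # [1] 7
sapply(mylist, class)   # type of each element
mylist[[1]]             # [1] "a"
mylist[[7]]             # [1] 10 20 30
mylist[[7]][1]          # [1] 10   first number inside the seventh element
```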

34 / 67



Operators: Arithmetic
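The operator table from this slide is not in the transcript; as a stand-in, these are R's standard arithmetic operators with small examples:

```r
7 + 3     # [1] 10    addition
7 - 3     # [1] 4     subtraction
7 * 3     # [1] 21    multiplication
7 / 2     # [1] 3.5   division
2 ^ 3     # [1] 8     exponentiation (** also works)
7 %% 3    # [1] 1     remainder (modulus)
7 %/% 3   # [1] 2     integer division
```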

35 / 67


Operators: Logical

36 / 67


Operators: Logical

a <- 1

b <- 2

a > b

a <= 2

a != b

a == b

37 / 67
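The results of the comparisons above, plus the operators for combining them: & (and), | (or), and ! (not). Note that == tests equality, while a single = assigns.

```r
a <- 1
b <- 2

a > b     # [1] FALSE
a <= 2    # [1] TRUE
a != b    # [1] TRUE
a == b    # [1] FALSE

a < b & b == 2    # [1] TRUE   both conditions hold
a > b | b == 2    # [1] TRUE   at least one condition holds
!(a == b)         # [1] TRUE   negation
```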



Installing Packages

In the bottom-right pane, go to the Packages tab

38 / 67


Selecting Packages

39 / 67


Package = Library

In your Packages pane, scroll down until you see languageR and click inside the box:

40 / 67


Package Content

To access a package's description and content, click on the package name.

The Help pane will open:

41 / 67


Accessing Info from Packages

Scroll down and select languageR-package

You will see the list of available functions from this package
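The same steps can be done from the console. This sketch installs languageR only when it is missing and the session is interactive (install.packages() downloads from CRAN and needs an internet connection); library() is the command equivalent of ticking the box in the Packages pane.

```r
pkg <- "languageR"

# Install once per machine, only if the package is not installed yet
if (!requireNamespace(pkg, quietly = TRUE) && interactive()) {
  install.packages(pkg)
}

# Load the package and open its help index (list of functions and data sets)
if (requireNamespace(pkg, quietly = TRUE)) {
  library(languageR)
  help(package = "languageR")
}
```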

42 / 67


Quick Help

Type in the console (bottom left):

?length

Instead of clicking Run, press the Enter key

43 / 67


File Formats

1 CSV, Excel: Movie metadata.csv

2 TXT: NY Times.txt

3 PDF: Article.pdf

44 / 67


CSV, Excel, SAS, SPSS Data

45 / 67


CSV

46 / 67


CSV Data

Close the data view and type:

colnames(movie_metadata)

nrow(movie_metadata)
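A command-line version of the import. So that the sketch runs without the workshop file, it first writes a small stand-in CSV with hypothetical columns and placeholder values; with the real file in your working directory, skip that step and call read.csv("movie_metadata.csv") directly.

```r
# Stand-in file with hypothetical columns and placeholder values
csv_path <- file.path(tempdir(), "movie_metadata.csv")
write.csv(data.frame(movie_title = c("Film A", "Film B"),
                     imdb_score  = c(7.1, 6.4)),
          csv_path, row.names = FALSE)

movie_metadata <- read.csv(csv_path, stringsAsFactors = FALSE)

colnames(movie_metadata)   # column names
nrow(movie_metadata)       # number of rows (one per movie)
ncol(movie_metadata)       # number of columns
head(movie_metadata)       # first six rows
```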

47 / 67


Visualization

“The science of analytical reasoning facilitated by visual interactive interfaces” (Thomas and Cook, 2005)

“Visual analytics integrates new computational and theory-based tools with innovative interactive techniques and visual representations to enable human-information discourse” (Thomas and Cook, 2005)

48 / 67


Graphical Elements

Points, Lines, Surfaces, Volumes

https://www.interaction-design.org/literature/article/visual-mapping-the-elements-of-information-visualization

49 / 67


Graphical Properties

Graphical properties make graphical elements “more (or indeed less) noticeable to the eye and/or valuable to the user of the representation”

50 / 67



Data Mapping (Mackinlay, 1987)

Nominal

Quantitative

Ordinal

51 / 67


Mapping: Quantitative Data

Based on slides by John Hart https://www.coursera.org/learn/datavisualization

52 / 67


Mapping Perceptual Accuracy

Color hue: position on the color wheel

Saturation: intensity

Mackinlay, 1987 - https://research.tableau.com/sites/default/files/p110-mackinlay.pdf

53 / 67


Bar Chart

The height of a bar can represent either of two things:

The value of a column in the data set. This is done with stat = "identity", which leaves the y values unchanged.

The count of cases for each group, where each x value represents one group.

http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/

54 / 67


Creating a Bar Chart - Sample

http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/

55 / 67


Creating a Bar Chart - Sample

http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/

56 / 67


Creating a Bar Chart - Values

http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/
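A minimal ggplot2 sketch of a bar chart of values, assuming ggplot2 is installed. The two-row table uses illustrative values in the style of the cookbook-r recipe; stat = "identity" makes each bar's height equal to the y value. In current ggplot2, geom_col() is a shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Illustrative two-row table
dat <- data.frame(
  time = factor(c("Lunch", "Dinner"), levels = c("Lunch", "Dinner")),
  total_bill = c(14.89, 17.23)
)

# Bar heights are the total_bill values themselves
p_values <- ggplot(dat, aes(x = time, y = total_bill)) +
  geom_bar(stat = "identity")
print(p_values)
```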

57 / 67


Creating a Bar Chart - Counts

To get a bar graph of counts, we do not map a variable to y, and we use stat = "count".

http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/
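A sketch of a count bar chart, assuming ggplot2 is installed and using the built-in mtcars data set (one row per car). Only x is mapped; geom_bar() counts the cars in each cylinder group. stat = "count" is the default for geom_bar() and is written out here only to match the slide.

```r
library(ggplot2)

p_counts <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(stat = "count")
print(p_counts)

table(mtcars$cyl)   # the same counts as numbers: 11 (4 cyl), 7 (6 cyl), 14 (8 cyl)
```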

58 / 67


Creating a Bar Chart - Counts

http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/

59 / 67


Title
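A sketch of adding a title to a ggplot2 chart, assuming ggplot2 is installed. It reuses paste() from the plot-title slide to build the title string; ggtitle() sets the plot title and labs() sets the axis labels. The mtcars count chart is only a stand-in for the slide's own plot.

```r
library(ggplot2)

title_text <- paste("My plot:", "Frequency", "of", "Cylinders")

p <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar() +
  ggtitle(title_text) +                     # plot title
  labs(x = "Cylinders", y = "Frequency")    # axis labels
print(p)
```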

60 / 67


Scatter Plot

Scatter charts show the relationship between two variables. To construct a scatter chart, we need observations that consist of pairs of variables.

Based on slides by John Hart https://www.coursera.org/learn/datavisualization

61 / 67


Creating Scatter Plot

http://www.r-graph-gallery.com/272-basic-scatterplot-with-ggplot2/
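A basic ggplot2 scatter plot, assuming ggplot2 is installed. Each car in the built-in mtcars data supplies one pair of values, weight (wt) and fuel economy (mpg), which become one point.

```r
library(ggplot2)

p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
print(p_scatter)
```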

62 / 67


Bubble Chart

A bubble chart is a type of scatter chart in which the size of the data marker corresponds to the value of a third variable; consequently, it is a way to plot three variables in two dimensions.

https://www.tableau.com/sites/default/files/media/which_chart_v6_final_0.pdf

63 / 67


Creating Bubble Plot

https://plot.ly/r/bubble-charts/
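The linked example uses plotly; as an alternative, a bubble chart can also be sketched in ggplot2 (assumed installed) by mapping a third variable to the size aesthetic. Here horsepower (hp) from the built-in mtcars data sets the bubble size.

```r
library(ggplot2)

p_bubble <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point(alpha = 0.5) +                          # transparency helps when bubbles overlap
  scale_size(range = c(1, 10), name = "Horsepower")  # smallest and largest bubble sizes
print(p_bubble)
```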

64 / 67


Creating Bubble Plot

https://plot.ly/r/bubble-charts/

65 / 67


Practice - Flashcards

IVMOOC flashcards app

IU IVMOOC course

66 / 67


Practice - DataCamp

1 Sign up for a free DataCamp.com account

2 Search for the Introduction to R course

3 Complete and receive a Certificate!

67 / 67