1 The R Project for statistical computing Eric Fouh, Christopher Poirel CS 5604 Fall 2010

Jan 16, 2016

Willa Stewart

The R Project for statistical computing

Eric Fouh, Christopher Poirel

CS 5604

Fall 2010


What is R?


Usages of R

• statistics system

• data handling and storage facility

• calculations on arrays, in particular matrices

• integrated collection of tools for data analysis

• graphical tool for data analysis

• programming language (a dialect of the ‘S’ language)
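A few lines of base R illustrate the array and statistics facilities listed above; the matrix, right-hand side and data values are arbitrary examples chosen only for this sketch.

```r
# A 2 x 2 matrix; matrix() fills column by column, so A = [2 1; 1 3].
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)

print(A %*% A)        # matrix product
print(t(A))           # transpose
sol <- solve(A, b)    # solves the linear system A x = b
print(sol)

# Statistical summaries apply directly to numeric vectors.
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
print(mean(x))
print(sd(x))
print(summary(x))
```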


Structure of R

• R functions and datasets are stored in packages

• R is distributed with 25 “standard” packages, including:

Package name | Description
base | Base R functions
datasets | Base R datasets
graphics | R functions for base graphics
stats | R statistical functions
utils | R utility functions
Matrix | Classes and methods for dense and sparse matrices
class | Functions for classification
cluster | Functions for cluster analysis

• Hundreds of contributed packages, written by many different authors, are available (e.g. from CRAN)
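To check which of the standard packages above a particular installation provides, the priority recorded for each installed package can be queried. This is a sketch assuming a standard R installation; note that Matrix, class and cluster are classed as "recommended" rather than "base" packages, and minimal installations may leave the recommended ones out.

```r
# Packages that are part of R itself have priority "base".
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Packages such as Matrix, class and cluster have priority "recommended";
# this vector may be empty on minimal installations.
rec_pkgs <- rownames(installed.packages(priority = "recommended"))
print(rec_pkgs)

# Attach a package and list a few of the objects it exports.
library(stats)
print(head(ls("package:stats")))
```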


R and Information Retrieval

IR concept | R package
Text preprocessing; term weighting and scoring | tm package: constructs a term-document matrix using one of the following weighting functions: TF (weightTf) or TF-IDF (weightTfIdf), e.g. tdm <- TermDocumentMatrix(crude, control = list(weighting = weightTfIdf, stopwords = TRUE)), where crude is a sample corpus shipped with tm (load it with data("crude"))
Vector space model for scoring | clv package: the dot.product function returns the cosine similarity of two vectors
Vector space classification | class package: knn() performs k-nearest-neighbour classification on a dataset
Hierarchical clustering | cluster package: agnes() computes agglomerative hierarchical clustering of a dataset
Latent Semantic Indexing | base package: svd() performs singular value decomposition of a matrix
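The vector-space entries in this table can be illustrated on a toy term-document matrix. This is a sketch under stated assumptions, not the packages' own code: the terms, documents, counts and class labels are invented; cosine similarity is written out by hand in base R instead of calling clv's dot.product; and clustering uses stats::hclust, which ships with R, in place of cluster::agnes. The k-nearest-neighbour step calls class::knn (Euclidean distance) only if the class package is installed.

```r
# Toy term-document matrix (rows = terms, columns = documents); made-up counts.
tdm <- matrix(c(2, 1, 0, 0,
                1, 2, 0, 0,
                0, 0, 3, 1,
                0, 0, 1, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("oil", "price", "goal", "match"),
                              c("d1", "d2", "d3", "d4")))

# Cosine similarity: dot product divided by the product of the vector lengths.
cosine <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

# Vector space scoring: rank documents against a query vector.
query  <- c(oil = 1, price = 0, goal = 0, match = 0)
scores <- apply(tdm, 2, cosine, y = query)
print(sort(scores, decreasing = TRUE))

# Agglomerative clustering on cosine distance (1 - cosine similarity).
norms  <- sqrt(colSums(tdm^2))
sim    <- crossprod(tdm) / outer(norms, norms)   # document-document cosines
hc     <- hclust(as.dist(1 - sim), method = "average")
groups <- cutree(hc, k = 2)
print(groups)

# k-nearest-neighbour classification with class::knn, if available.
if (requireNamespace("class", quietly = TRUE)) {
  labels  <- factor(c("energy", "energy", "sport", "sport"))  # invented labels
  new_doc <- matrix(c(1, 3, 0, 0), nrow = 1)  # counts for oil, price, goal, match
  pred <- class::knn(train = t(tdm), test = new_doc, cl = labels, k = 1)
  print(pred)
}
```

Because knn() uses Euclidean distance on raw counts, length-normalizing the document vectors first would bring it closer to cosine-based vector space classification.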


Getting started with R

• To start R (from the operating-system shell)
  R

• To quit R
  > q()

• To see installed packages
  > library()

• To load a package
  > library(class)

• To start help
  > help.start()

• To create a vector
  > x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

• To create a matrix
  > x <- array(1:20, dim = c(4, 5))  # a 4-by-5 array (a matrix) filled with the numbers 1 to 20

• To display an object
  > x

• To delete an object
  > rm(x)

• To load data from a file
  > HousePrice <- read.table("houses.data")
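The commands above can also be run together as a script. This sketch needs only base R. The contents of houses.data are not given here, so it writes a small made-up file in the layout read.table() handles automatically (a header line with one fewer field than the data lines, so the first column becomes the row names) to a temporary location before reading it.

```r
# Create a vector and a 4 x 5 matrix, then inspect the matrix.
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
m <- array(1:20, dim = c(4, 5))
print(dim(m))        # 4 5
print(is.matrix(m))  # a two-dimensional array is a matrix

# Delete an object; it no longer exists in the workspace afterwards.
rm(x)
print(exists("x", inherits = FALSE))

# Write a small, made-up data file and read it back with read.table().
path <- tempfile(fileext = ".data")
writeLines(c("Price Floor Area",
             "01 52.00 111.0 830",
             "02 54.75 128.0 710"), path)
HousePrice <- read.table(path)
print(HousePrice)
unlink(path)         # remove the temporary file
```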


Examples (1)

• Term-Document Matrix
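As a rough sketch of what a term-document matrix contains, the base-R code below builds one from three invented sentences and then applies TF-IDF weighting. It is not how the tm package implements TermDocumentMatrix or weightTfIdf: the tokenizer, the tiny stopword list and the natural-log idf, idf(t) = log(N / df(t)), are simplifying assumptions, and tm's weighting may differ in details such as normalization.

```r
docs <- c(d1 = "Oil prices rise as oil supply falls",
          d2 = "OPEC cuts oil output",
          d3 = "Team wins the match with a late goal")

# Lower-case, split on non-letters, drop empty strings and a few stopwords.
stopwords <- c("a", "as", "the", "with")        # illustrative list only
tokens <- lapply(strsplit(tolower(docs), "[^a-z]+"),
                 function(tok) tok[tok != "" & !(tok %in% stopwords)])
names(tokens) <- names(docs)

# Raw term frequencies: one row per vocabulary term, one column per document.
vocab <- sort(unique(unlist(tokens)))
tf <- sapply(tokens, function(tok) table(factor(tok, levels = vocab)))

# TF-IDF: scale each term's row by log(N / document frequency).
N        <- ncol(tf)
doc_freq <- rowSums(tf > 0)
idf      <- log(N / doc_freq)
tfidf    <- tf * idf   # idf is recycled down each column, i.e. applied row-wise
print(round(tfidf, 3))
```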



Examples (2)

• Eigenvalues and eigenvectors
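In R, eigenvalues and eigenvectors are computed with eigen() from the base package. A minimal check on a small symmetric matrix (chosen arbitrarily for this sketch) confirms the defining property C v = λ v for each returned pair.

```r
C <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)

e <- eigen(C, symmetric = TRUE)
print(e$values)    # eigenvalues, in decreasing order
print(e$vectors)   # unit-length eigenvectors, one per column

# Verify C v = lambda v for each eigenpair.
for (i in seq_along(e$values)) {
  v <- e$vectors[, i]
  stopifnot(isTRUE(all.equal(as.vector(C %*% v), e$values[i] * v)))
}
```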


Examples (3)


• Low-rank approximation
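A rank-k approximation, the basis of Latent Semantic Indexing, keeps only the k largest singular values returned by svd(). The sketch below uses a small invented 0/1 term-document matrix and an arbitrary k = 2. By the Eckart-Young theorem, the Frobenius-norm error of the truncated SVD equals the square root of the sum of the squared discarded singular values, which the test confirms. The query is hypothetical; it is folded into the reduced space with the usual LSI mapping q_k = Σ_k⁻¹ U_kᵀ q and scored against the documents' k-dimensional representations.

```r
# Invented 0/1 term-document matrix (rows = terms, columns = documents).
A <- matrix(c(1, 0, 1, 0, 0, 0,
              0, 1, 0, 0, 0, 0,
              1, 1, 0, 0, 0, 0,
              1, 0, 0, 1, 1, 0,
              0, 0, 0, 1, 0, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(c("ship", "boat", "ocean", "wood", "tree"),
                            paste0("d", 1:6)))

s <- svd(A)   # A = U diag(d) V^T
k <- 2

# Rank-k approximation from the k largest singular values.
# nrow = k stops diag() from treating a single value as a size when k = 1.
Uk <- s$u[, 1:k, drop = FALSE]
Vk <- s$v[, 1:k, drop = FALSE]
Sk <- diag(s$d[1:k], nrow = k)
Ak <- Uk %*% Sk %*% t(Vk)
dimnames(Ak) <- dimnames(A)
print(round(Ak, 2))

# Frobenius-norm error of the approximation.
err <- norm(A - Ak, type = "F")
print(err)

# LSI: fold a hypothetical query into k dimensions and compare it with the
# documents' k-dimensional representations (the rows of Vk).
cosine <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
q  <- c(ship = 1, boat = 0, ocean = 1, wood = 0, tree = 0)
qk <- as.vector(solve(Sk) %*% t(Uk) %*% q)
scores <- apply(Vk, 1, cosine, y = qk)
names(scores) <- colnames(A)
print(round(scores, 3))
```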




Resources

• Introduction to Information Retrieval (IIR) book, by Manning, Raghavan and Schütze

• http://www.r-project.org/

Questions?