Statistics for CS 312

Statistics for CS 312. Descriptive vs. inferential statistics Descriptive – used to describe an existing population Inferential – used to draw conclusions.

Dec 22, 2015

Statistics for CS 312

Descriptive vs. inferential statistics

• Descriptive – used to describe an existing population

• Inferential – used to draw conclusions about a larger population from a sample

Graphical descriptions

• Histograms

• Frequency polygons/curves

• Pie charts

Measures of central tendency

• Mean – average – used most often

• Median – midpoint value – used when data is skewed

• Mode – most frequently occurring value – used when interested in what most people think
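As a rough illustration of how the three measures can differ on skewed data, here is a minimal sketch using Python's standard `statistics` module. The `hours` data set is made up for this example and is not from the course.

```python
import statistics

# Hypothetical sample: weekly hours of computer use for nine students.
# The single value 30 skews the data to the right.
hours = [2, 3, 3, 4, 5, 5, 5, 6, 30]

mean_hours = statistics.mean(hours)      # pulled upward by the outlier
median_hours = statistics.median(hours)  # middle value of the sorted data
mode_hours = statistics.mode(hours)      # most frequently occurring value

print(mean_hours, median_hours, mode_hours)
```

Here the mean (7) is larger than all but one of the values, while the median (5) stays near the bulk of the data, which is why the median is preferred for skewed data.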

Measures of variability

• Range – highest value minus lowest value

• Standard deviation – typical distance of the individual values from the mean, computed as the square root of the average squared deviation
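Both measures can be computed in a few lines. The sketch below uses a made-up list named `scores`; `pstdev` treats the data as a whole population and `stdev` treats it as a sample (dividing by n - 1 instead of n).

```python
import statistics

# Hypothetical scores, chosen so the arithmetic is easy to check by hand.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(scores) - min(scores)  # highest value minus lowest value

# Population standard deviation: sqrt(sum((x - mean)^2) / n)
pop_sd = statistics.pstdev(scores)

# Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))
sample_sd = statistics.stdev(scores)

print(value_range, pop_sd, sample_sd)
```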

Normal curve

• Bell shaped curve – 68% of values lie within one standard deviation of the mean

• Non-normal – skewed either negatively (tail to left) or positively (tail to right)

• Percentiles – the value below which a given percentage of the observations fall

• Standard scores – distance from mean in terms of the standard deviation – z = (X-m) / s.

• Z scores – transformed standard scores – Z = 10z + 50
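The two score formulas above can be sketched directly. The function names and the example numbers (a raw score of 82 with mean 70 and standard deviation 8) are illustrative assumptions only.

```python
def standard_score(x, mean, sd):
    """z = (X - m) / s: distance from the mean in standard deviations."""
    return (x - mean) / sd


def transformed_score(z):
    """Z = 10z + 50: rescales z to a mean of 50 and standard deviation of 10."""
    return 10 * z + 50


# Hypothetical raw score of 82 in a group with mean 70 and standard deviation 8.
z = standard_score(82, 70, 8)
big_z = transformed_score(z)
print(z, big_z)
```

A raw score 1.5 standard deviations above the mean gives z = 1.5 and a transformed score of 65.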

Variables

• Quantitative – things that can be measured (age, income, number of credits)

• Qualitative – things without an inherent order (college major, address)

Populations and samples

• Population – entire universe from which a sample is drawn

• Sample – subset of population

• Symbols (sample, population) – mean m, µ; standard deviation s, σ; variance s², σ²

How representative is the sample

• Random sample – use random numbers to choose members of the sample

• Stratified sample – sample that represents subgroups proportionally
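One possible way to draw the two kinds of sample is sketched below. The helper names, the `students` list, and the seed are all hypothetical; the stratified version simply samples the same fraction from each subgroup.

```python
import random


def simple_random_sample(population, k, rng):
    """Choose k members using random numbers, without replacement."""
    return rng.sample(population, k)


def stratified_sample(population, key, fraction, rng):
    """Sample the same fraction from every subgroup (stratum)."""
    strata = {}
    for member in population:
        strata.setdefault(key(member), []).append(member)
    sample = []
    for group in strata.values():
        k = round(len(group) * fraction)
        sample.extend(rng.sample(group, k))
    return sample


rng = random.Random(312)  # fixed seed so the run is repeatable

# Hypothetical population: 60 CS majors and 40 Math majors.
students = [("CS", i) for i in range(60)] + [("Math", i) for i in range(40)]

random_sample = simple_random_sample(students, 10, rng)
strat_sample = stratified_sample(students, lambda s: s[0], 0.1, rng)
print(random_sample)
print(strat_sample)
```

The simple random sample may over- or under-represent either major by chance, while the stratified sample always contains 6 CS and 4 Math students.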

Hypothesis testing

• Hypothesis as to relationship of variables – similar or different

• Inference from a sample to the entire population

Statistical significance

• Accept true hypotheses and reject false ones

• Based on probability (10 heads in a row occurs about once in every 1024 sequences of 10 coin tosses)

• Significant result means a significant departure from what might be expected from chance alone

• Example – a result more than two standard deviations above the mean occurs about 2.3% of the time in a normally distributed population
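The two probabilities quoted above can be checked with a short sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names are illustrative.

```python
from statistics import NormalDist

# Probability of 10 heads in 10 tosses of a fair coin: (1/2)^10 = 1/1024.
p_ten_heads = 0.5 ** 10

# Probability that a normally distributed value lies more than two
# standard deviations above the mean (one tail).
upper_tail = 1 - NormalDist().cdf(2)

# Probability of lying more than two standard deviations from the mean
# in either direction (both tails).
both_tails = 2 * upper_tail

print(p_ten_heads, upper_tail, both_tails)
```

The 2.3% figure corresponds to one tail only; counting both tails gives roughly 4.6%.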

Null hypothesis

• Assumption that there is no difference between two groups, or no relationship between two variables

• Example – Male and female college students do similar amounts of music downloading using BitTorrent.

• Example – School use of computers is unrelated to income of the students’ families

Levels of significance

• 5 percent level – Event could occur by chance only 5 times in 100

• 1 percent level – Event could occur by chance only 1 time in 100

• Significance level should be chosen before doing experiment

Types of errors

• Type I error – Rejection of a true null hypothesis

• Type II error – Failure to reject a false null hypothesis

• For a fixed sample size, decreasing the risk of one type increases the risk of the other
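A small simulation can make the Type I error rate concrete. The sketch below is an assumption-laden illustration, not part of the course material: it repeatedly tosses a fair coin (so the null hypothesis is true), tests each run at roughly the 5 percent level using a normal approximation, and counts how often the true null hypothesis is rejected.

```python
import random


def rejects_fair_coin(heads, tosses, critical_z=1.96):
    """Two-tailed test of 'the coin is fair' using a normal approximation."""
    expected = tosses / 2
    sd = (tosses * 0.25) ** 0.5  # standard deviation of the head count
    z = (heads - expected) / sd
    return abs(z) > critical_z


rng = random.Random(312)  # fixed seed so the run is repeatable
trials = 2000
tosses = 100

false_rejections = 0
for _ in range(trials):
    # The coin really is fair, so every rejection is a Type I error.
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    if rejects_fair_coin(heads, tosses):
        false_rejections += 1

type_i_rate = false_rejections / trials
print(type_i_rate)
```

The observed rate should land near the chosen significance level. Raising `critical_z` lowers it, at the cost of missing more coins that really are biased (Type II errors).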

One and two tailed tests

• One-tailed test – The null hypothesis can be rejected only by values in one direction

• Two-tailed test – Values on either the positive or negative tail of the curve can reject the null hypothesis
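The practical difference between the two tests shows up in the p-value. The following sketch assumes a test statistic that is already a standard score and uses hypothetical function names.

```python
from statistics import NormalDist


def one_tailed_p(z):
    """Probability of a value at least this far above the mean."""
    return 1 - NormalDist().cdf(z)


def two_tailed_p(z):
    """Probability of a value at least this far from the mean, either side."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


z = 1.8  # hypothetical result, 1.8 standard deviations above the mean
print(one_tailed_p(z), two_tailed_p(z))
```

With z = 1.8 the result is significant at the 5 percent level in a one-tailed test but not in a two-tailed test, which is one reason the choice of test should be made before looking at the data.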

Estimation

• Concerns the magnitude of relationships between variables

• Hypothesis testing asks “is there a relationship”

• Estimation asks “how large is the relationship”

• Confidence interval – a range of values that, at a stated confidence level, is expected to contain the population mean
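A confidence interval for a mean can be sketched as follows. This version assumes the normal approximation (reasonable for large samples; small samples would normally use the t distribution instead), and the `speeds` data are invented for illustration.

```python
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate interval for the population mean (normal approximation)."""
    m = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * s / len(sample) ** 0.5
    return m - margin, m + margin


# Hypothetical home connection speeds in Mbps.
speeds = [12, 15, 9, 20, 18, 14, 11, 16, 13, 17]

low95, high95 = mean_confidence_interval(speeds, 0.95)
low99, high99 = mean_confidence_interval(speeds, 0.99)
print(low95, high95)
print(low99, high99)
```

Asking for more confidence (99% instead of 95%) widens the interval.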

Sequence of activities

• Description

• Tests of hypotheses

• Estimation

• Evaluation

Correlation

• Quantifiable relationship between two variables

• Example – relationship between age and type of computer games played

• Example – relationship between family income and speed of home computer connection.

Correlation chart

• Two (or more) dimensional table

• Variables on the axes, could be intervals

• Scattergram – positively correlated values scatter along a positive slope, negatively correlated values along a negative slope

Product-moment coefficient

• Formula based on deviations from means

• If deviations are the same or similar, values are positively correlated

• If deviations are the opposite, values are negatively correlated

• Most correlations are somewhere in between +1 and -1
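The slides do not give the formula itself, so the sketch below uses the standard Pearson definition: the sum of the products of paired deviations, divided by the square root of the product of the summed squared deviations. The function name and data are illustrative.

```python
def pearson_r(xs, ys):
    """Product-moment correlation computed from deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    # Paired deviations with the same sign push r toward +1,
    # opposite signs push it toward -1.
    numerator = sum(a * b for a, b in zip(dev_x, dev_y))
    denominator = (sum(a * a for a in dev_x) * sum(b * b for b in dev_y)) ** 0.5
    return numerator / denominator


same_order = pearson_r([1, 2, 3, 4], [1, 2, 3, 4])
reverse_order = pearson_r([1, 2, 3, 4], [4, 3, 2, 1])
mixed = pearson_r([1, 2, 3, 4], [2, 1, 4, 3])
print(same_order, reverse_order, mixed)
```

Identical orderings give r = +1, reversed orderings give r = -1, and the mixed case falls in between.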

Perfect positive correlation: r = +1

[Figure: items A, B, C and D appear in the same order on the X and Y scales.]

Perfect negative correlation: r = -1

[Figure: items A, B, C and D appear in reverse order on the Y scale relative to the X scale.]