Slides Prepared by JOHN S. LOUCKS, St. Edward’s University
Exploratory Data Analysis
The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly.
One such technique is the stem-and-leaf display.
A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
The first digits of each data item are arranged to the left of a vertical line.
To the right of the vertical line we record the last digit for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.
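The construction of a stem-and-leaf display can be sketched in a few lines of Python. This is a minimal illustration, assuming non-negative whole-number data (for example, costs rounded to the nearest dollar) and treating the last digit as the leaf and all preceding digits as the stem. The function name `stem_and_leaf` and the sample values are hypothetical; they are not the Hudson Auto invoice data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Build a stem-and-leaf display for non-negative integers.

    Stem = every digit except the last (value // 10).
    Leaf = the last digit (value % 10), listed in rank order.
    """
    leaves = defaultdict(list)
    for v in sorted(values):  # sorting first keeps each stem's leaves in rank order
        leaves[v // 10].append(v % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        # Stems with no leaves are still printed so gaps in the data stay visible.
        leaf_str = " ".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>3} | {leaf_str}".rstrip())
    return lines


# Hypothetical parts costs in dollars (NOT the Hudson Auto invoice data).
costs = [52, 57, 62, 68, 71, 75, 75, 79, 83, 104]
for line in stem_and_leaf(costs):
    print(line)
```

Reading down the leaves gives the rank order of the data, and the length of each line traces the shape of the distribution, just as the bars of a histogram would.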
Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.
If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit (or set of leading digits).
Whenever a stem value is stated twice, the first value corresponds to leaf values of 0 to 4, and the second value corresponds to leaf values of 5 to 9.
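A stretched display can be sketched the same way. This is a minimal illustration, assuming non-negative integers with the last digit as the leaf; the function name `stretched_stem_and_leaf` and the sample values are hypothetical. Each stem appears twice: the first line collects leaves 0 to 4 and the second collects leaves 5 to 9.

```python
from collections import defaultdict


def stretched_stem_and_leaf(values):
    """Stem-and-leaf display with two lines per stem for non-negative integers.

    The first line for a stem holds leaves 0-4; the second holds leaves 5-9.
    """
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[(stem, leaf // 5)].append(leaf)  # half 0 -> leaves 0-4, half 1 -> leaves 5-9
    start, end = min(groups), max(groups)
    lines = []
    for stem in range(start[0], end[0] + 1):
        for half in (0, 1):
            # Skip half-lines before the first or after the last occupied one.
            if start <= (stem, half) <= end:
                leaf_str = " ".join(str(d) for d in groups.get((stem, half), []))
                lines.append(f"{stem:>3} | {leaf_str}".rstrip())
    return lines


# Hypothetical parts costs in dollars (NOT the Hudson Auto invoice data).
costs = [52, 57, 62, 68, 71, 75, 75, 79, 83, 104]
for line in stretched_stem_and_leaf(costs):
    print(line)
```

With the same ten values, the basic display has six lines while the stretched one has eleven, which spreads out the crowded 70s (leaf 1 on one line, leaves 5 5 9 on the next).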
Crosstabulations and Scatter Diagrams
Thus far we have focused on methods that are used to summarize the data for one variable at a time.
Often a manager is interested in tabular and graphical methods that will help understand the relationship between two variables.
Crosstabulation and a scatter diagram are two methods for summarizing the data for two (or more) variables simultaneously.
A crosstabulation is a tabular summary of data for two variables.
Crosstabulation can be used when:
• one variable is qualitative and the other is quantitative,
• both variables are qualitative, or
• both variables are quantitative.
The left and top margin labels define the classes for the two variables.
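Building a crosstabulation amounts to counting how many observations fall into each combination of classes. The sketch below assumes each observation is already a (row class, column class) pair, so a quantitative variable such as price must first be grouped into classes. The function name `crosstab` and the sample of homes are hypothetical and are not the textbook's home-sales data.

```python
from collections import Counter


def crosstab(pairs):
    """Count (row class, column class) pairs and compute the margin totals."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})  # left-margin labels
    cols = sorted({c for _, c in pairs})  # top-margin labels
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    return table, row_totals, col_totals


# Hypothetical sample of (home style, price range) observations.
homes = (
    [("A-Frame", "<= $99,000")] * 2
    + [("A-Frame", "> $99,000")] * 1
    + [("Colonial", "<= $99,000")] * 3
    + [("Colonial", "> $99,000")] * 4
    + [("Split-Level", "<= $99,000")] * 5
    + [("Split-Level", "> $99,000")] * 1
)
table, row_totals, col_totals = crosstab(homes)
for style, cells in table.items():
    print(style, cells, "total:", row_totals[style])
print("column totals:", col_totals)
```

The row totals and the column totals each add up to the sample size, which is a quick check that every observation landed in exactly one cell.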
Insights Gained from the Preceding Crosstabulation
• The greatest number of homes in the sample (19) are split-level style and priced at less than or equal to $99,000.
• Only three homes in the sample are A-Frame style and priced at more than $99,000.
Crosstabulation: Row or Column Percentages
Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.
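The conversion is a division by the appropriate margin total. This is a minimal sketch, assuming the crosstabulation is stored as a nested dictionary `{row: {column: count}}` in which every row has the same columns; the function names and the counts are hypothetical.

```python
def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {c: (100.0 * n / total if total else 0.0) for c, n in cells.items()}
    return result


def column_percentages(table):
    """Express each cell as a percentage of its column total."""
    cols = list(next(iter(table.values())))  # assumes every row has the same columns
    totals = {c: sum(cells[c] for cells in table.values()) for c in cols}
    return {
        row: {c: (100.0 * cells[c] / totals[c] if totals[c] else 0.0) for c in cols}
        for row, cells in table.items()
    }


# Hypothetical counts: home style (rows) by price range (columns).
counts = {
    "A-Frame": {"<= $99,000": 2, "> $99,000": 1},
    "Colonial": {"<= $99,000": 3, "> $99,000": 4},
    "Split-Level": {"<= $99,000": 5, "> $99,000": 1},
}
for style, pcts in row_percentages(counts).items():
    print(style, {c: round(p, 1) for c, p in pcts.items()})
```

Row percentages answer "within this home style, what share falls in each price range?", while column percentages answer "within this price range, what share is each style?". Each row (or column) of percentages sums to 100.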
Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.
We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.
Simpson’s Paradox: In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data.
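Simpson's paradox can be reproduced with small made-up numbers. The sketch below uses hypothetical counts of decisions upheld for two judges in two types of court (the judges, courts, and counts are all invented for illustration) and compares the per-court rates with the rate from the aggregated table.

```python
# Hypothetical counts of (decisions upheld, decisions made) for two judges,
# broken down by the type of court that heard the case.
cases = {
    "Judge A": {"Court 1": (8, 10), "Court 2": (20, 90)},
    "Judge B": {"Court 1": (70, 90), "Court 2": (2, 10)},
}


def upheld_rate(upheld, total):
    """Fraction of decisions upheld."""
    return upheld / total


def aggregated_rate(by_court):
    """Pool the courts first (the aggregated crosstabulation), then compute the rate."""
    upheld = sum(u for u, _ in by_court.values())
    total = sum(t for _, t in by_court.values())
    return upheld_rate(upheld, total)


for judge, by_court in cases.items():
    per_court = {court: round(upheld_rate(*cell), 3) for court, cell in by_court.items()}
    print(judge, per_court, "aggregated:", round(aggregated_rate(by_court), 3))
```

Judge A has the higher upheld rate in each court (0.8 vs. about 0.778, and about 0.222 vs. 0.2), yet the lower rate once the courts are pooled (0.28 vs. 0.72). The reversal happens because most of Judge A's cases come from Court 2, where both judges have low upheld rates; aggregation hides that difference in caseload.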
A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
The general pattern of the plotted points suggests the overall relationship between the variables.
Scatter Diagram and Trendline
A trendline is a line that approximates the relationship between the two variables.
Example: Panthers Football Team
Scatter Diagram
The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.
x = Number of Interceptions, y = Number of Points Scored
x    y
1    14
3    24
2    18
1    17
3    30
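The slides do not say how the trendline is computed. A common choice, and what spreadsheet linear trendlines typically fit, is the least-squares line, so the sketch below assumes that method and applies it to the five Panthers games listed above. The function name `least_squares_line` is hypothetical; on Python 3.10 or later, `statistics.linear_regression` computes the same fit.

```python
def least_squares_line(xs, ys):
    """Fit y = b0 + b1 * x by ordinary least squares; return (b0, b1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b1 = sxy / sxx             # slope
    b0 = mean_y - b1 * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return b0, b1


interceptions = [1, 3, 2, 1, 3]  # x, from the Panthers table
points = [14, 24, 18, 17, 30]    # y, from the Panthers table
b0, b1 = least_squares_line(interceptions, points)
# Vertical distance of each plotted point from the trendline.
residuals = [y - (b0 + b1 * x) for x, y in zip(interceptions, points)]
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
print("residuals:", [round(r, 2) for r in residuals])
```

For these data the fitted line is y-hat = 9.10 + 5.75x. The positive slope reflects the upward pattern of the points, and the nonzero residuals show that the points do not all lie on the line.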
Insights Gained from the Preceding Scatter Diagram
• The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.
• Higher points scored are associated with a higher number of interceptions.
• The relationship is not perfect; not all plotted points in the scatter diagram lie on a straight line.