
Using JMP® Student Edition

For Windows and Macintosh

The User’s Guide to Statistics with JMP® Student Edition

The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2009. Using JMP® Student Edition. Cary, NC: SAS Institute Inc.

Using JMP® Student Edition

Copyright © 2009, SAS Institute Inc., Cary, NC, USA

ISBN 978-1-60764-190-2

All rights reserved. Produced in the United States of America.

For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.

1st printing, April 2009

SAS® Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the SAS Publishing Web site at support.sas.com/publishing or call 1-800-727-3228.

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are registered trademarks or trademarks of their respective companies.

For more information about this or any other JMP product, contact:

SAS INSTITUTE INC.
SAS Campus Drive
Cary, NC 27513 USA

For permission to use this work, contact us:

www.jmp.com/se
fax: 919.677.4444
phone: 919.677.8000

Technical Support is provided by the publisher of the work that JMP-SE is bundled with, and to registered instructors.

Table of Contents

1 Getting Started with JMP Student Edition
Prerequisites For This Book
Computer and Operating System
Statistics
Learning JMP Student Edition with its Documentation
Conventions Used in This Book
Starting JMP Student Edition
JMP Student Edition Toolbars
Windows
Macintosh
First Session
Finding Means, Medians, and Standard Deviations
Where to Go from Here

2 The Distribution Platform
Introduction
About the Data
Launching the Platform
Using Histograms
Testing a Mean
Normality
Testing Probabilities
Annotating Results
The Modeling Type of Variables
Continuous Variable Graphs and Reports
Display Options
Histogram Options
Normal Quantile Plots
Outlier Box Plot
Quantile Box Plots
Stem and Leaf Plots
CDF Plot
Fit Distribution
Categorical Variable Graphs and Reports
Statistical Tests
Testing a Mean
Testing a Standard Deviation
Testing Categorical Probabilities
Confidence Intervals
Saving Information
Whole-Platform Options
Capability Analysis

3 The Fit Y by X Platform
Introduction
Launching the Platform
Computing a t-test
Pooled t-test
Selecting and Marking Points
Analysis of Variance (ANOVA)
Comparison Circles
Fitting Lines
Correlation Coefficient
Disclosure Icon
Residuals
Two-Way Contingency Tables
Logistic Regression
The Formula Editor
Scatterplots—The Continuous by Continuous Case
The Summary of Fit Table
The Lack of Fit Table
Analysis of Variance Table
Parameter Estimates Table
Other Fitting Commands
One Way ANOVA—The Continuous by Categorical Case
Script Submenu
Contingency Analysis—The Categorical by Categorical Case
Logistic Regression—The Categorical by Continuous Case
The Whole Model Test Table
Platform Options

4 The Matched Pairs Platform
Introduction
Preparing the Data
The Matched Pairs Launch Dialog
The Matched Pairs Scatterplot
Interpreting the Matched Pairs Plot

5 The Fit Model Platform
Introduction
Setting Titles
Examining Results
Least Squares Means
Re-running an Analysis
Linear Contrasts
The Fit Model Dialog Box
Roles
Model Effects
Macros
Fitting Personalities
Emphasis Choices
Run Model
Fit Model Report Items
Regression Reports
Leverage Plots
Effect Details
Exploring the Estimates
Row Diagnostics
Save Commands

6 Stepwise Regression
Introduction
The Stepwise Window
Stepwise Regression Control Panel
Current Estimates Table
Step History Table
Make Model
All Possible Regressions

7 Control Charts
Introduction
The Control Chart Launch Dialog
Process Information
Chart Type Information
Parameters
Using Specified Statistics
Tailoring the Horizontal Axis
Display Options
Single Chart Options
Window Options
Tests for Special Causes
Western Electric Rules
Westgard Rules
Excluded, Hidden, and Deleted Samples
Shewhart Control Charts
Shewhart Control Charts for Variables
XBar-, R-, and S-Charts
Run Charts
Individual Measurement Charts
Shewhart Control Charts for Attributes
p- and np-Charts
u-Charts
c-Charts
Phases
Example
Cumulative Sum (Cusum) Charts
Launch Options for Cusum Charts
Cusum Chart Options

8 Time Series
Introduction
The Time Series Platform
The Time Series Graph
Time Series Commands
Graph
Autocorrelation
Partial Autocorrelation
Number of Forecast Periods
Modeling Reports
Model Comparison Table
Model Summary Table
Parameter Estimates Table
Forecast Plot
Residuals
Iteration History
Model Report Options
ARIMA Model
Smoothing Models
Smoothing Model Dialog
Simple Exponential Smoothing
Double (Brown) Exponential Smoothing
Linear (Holt) Exponential Smoothing
Damped-Trend Linear Exponential Smoothing
Seasonal Exponential Smoothing
Winters Method (Additive)

9 Correlations and Multivariate Techniques
Introduction
Launch the Platform and Select Options
Correlations Multivariate
Inverse Correlations and Partial Correlations
Scatterplot Matrix
Covariance Matrix
Pairwise Correlations
Simple Statistics
Nonparametric Correlations
Computations and Statistical Details
Pearson Product-Moment Correlation
Nonparametric Measures of Association
Inverse Correlation Matrix

10 Importing, Exporting, and Charting Data
Introduction
Using the Chart Platform
Using the Overlay Plot Platform
Importing Data
Windows
Macintosh
Importing Text Files
Importing Microsoft Excel Files
Results from Platforms
The Chart Platform
Single-Chart Options
Frame Options
Level Options
The Overlay Plot Platform
Single-Plot Options

11 Full Factorial Designs
Introduction
Creating a Factorial Design
Entering Responses and Factors
Selecting Output Options
Making the Table

12 Screening Designs
Introduction
Creating a Screening Design
Entering Responses
Entering Factors
Choosing a Design
Displaying and Modifying the Design
Specifying Output Options
Viewing the Table
Continuing the Analysis

13 Response Surface Designs
Introduction
Creating a Response Surface Design
Entering Responses and Factors
Choosing a Design
Specifying Axial Value (Central Composite Designs Only)
Specifying Output Options
Viewing the Design Table
Continuing the Analysis, If Needed

14 Prospective Power and Sample Size
Prospective Power Analysis
One-Sample and Two-Sample Means
Single-Sample Mean
Power and Sample Size Animation for a Single Sample
Two-Sample Means
k-Sample Means
One-Sample Variance
One-Sample and Two-Sample Proportions
Counts per Unit
Sigma Quality Level

Index

Notices
Technology License Notices

1 Getting Started with JMP Student Edition

Welcome to JMP Student Edition, the version of SAS Institute's award-winning JMP Statistical Discovery software tailor-made for the introductory statistics student.

JMP Student Edition is easy to learn and easy to use. All of the statistics are available in a familiar point-and-click format, and each statistical concept is supported by both graphs and the appropriate numerical results. In addition, all data tables, graphs, and charts are dynamically linked, so you can interactively explore patterns and outliers as they appear. We hope this visual approach makes learning statistics easier and more enjoyable than ever before.

Prerequisites For This Book

To use JMP Student Edition, you need only minimal knowledge of computers and statistics. The specific prerequisites are as follows:

Computer and Operating System

This manual assumes familiarity with standard computer operations and operating system terminology, especially use of the mouse, standard menus, and commands. You should also know how to open, close, and save files before reading this guide. See the reference books for your operating system and computer for more information on these topics.

Statistics

Because JMP Student Edition is made specifically for the beginning statistics student, it requires no formal statistics knowledge. This book shows how to accomplish simple statistical tasks like those covered in introductory statistics texts.

Learning JMP Student Edition with its Documentation

JMP Student Edition includes an extensive online help system. It can be read like a book, since it contains a complete table of contents, or it can be searched for a specific topic.

In addition, JMP Student Edition is equipped with context-sensitive help. To use it, select the help tool (see Figure 1.1) and click anywhere inside a data table or report. JMP Student Edition opens help specific to the item you clicked.


Figure 1.1 The JMP Student Edition Help Tool

Conventions Used in This Book

Throughout this manual, special typefaces are used to designate commands, menu items, or other unique features.

• Menu items, buttons, and report titles are usually set by JMP Student Edition and are not alterable by the user.

• Variables under study are arranged in columns in the data spreadsheet, so the words variable and column are often used interchangeably.

• File names are opened and saved to disk or network folders.

• New or important words are emphasized.

Certain paragraphs are meant to be carried out while reading the text. They are designated by a mouse on the left.

• The notation File > Open means to select the Open command from the File menu.

• Sections titled “Introduction” provide a hands-on approach to learning the basics of JMP Student Edition. Each “Introduction” section explores the Denim.jmp sample data set using a specified platform or function, and is separate from the rest of the material in the chapter. In fact, you could read the “Introduction” section of every chapter before reading the rest of the book, which is intended primarily as a reference.

Starting JMP Student Edition

JMP Student Edition can be started in two ways:

• Double-click the JMP Student Edition icon.

• Double-click a JMP Student Edition data set or script.

By default, JMP Student Edition begins by opening a special navigation window, called the JMP Starter (see Figure 1.2). If the JMP Starter is not automatically opened,

Select View > JMP Starter.


Figure 1.2 The JMP Starter Window

This window provides quick and easy access to all the menu commands of JMP Student Edition. Although these commands are accessible through menus and toolbars, the JMP Starter presents them in a logical, organized way. Nine tab groups partition the commands by function:

• The File group contains commands related to opening and closing several types of files.

• The Basic group contains commands that perform analyses for one-variable and two-variable situations.

• The Model group contains commands for matched pairs (a special two-variable situation) and multivariate models.

• The Survival group contains reliability and survival commands.

• The Graph group contains commands for charts and 3D graphics.

• The Measure group contains tools for capability analysis.

• The Control group contains commands for control charts.

• The DOE group contains commands for designing an experiment.

• The Tables group contains commands used to manipulate data tables.

JMP Student Edition Toolbars

An alternative way of accessing JMP Student Edition commands is by using toolbars.

Windows

Toolbars that duplicate the JMP Starter’s commands include the File/Edit toolbar (Figure 1.3), the Tools toolbar (Figure 1.4), the Analyze toolbar (Figure 1.5), the Graph toolbar (Figure 1.6), and the Tables toolbar (Figure 1.7). There is also a Data Files toolbar, used to switch between open data tables, as well as user-customizable toolbars. Each of these commands is explained fully in later chapters.

Figure 1.3 The File/Edit Toolbar

Figure 1.4 The Tools Toolbar

Figure 1.5 The Analyze Toolbar

Figure 1.6 The Graph Toolbar

Figure 1.7 The Tables Toolbar

Some of these toolbars are not displayed by default. To activate toolbars that are not showing,

• Select View > Toolbars > Show Toolbars to open the Show Toolbars window (Figure 1.8).


Toolbars that are checked become visible. Those that are unchecked are hidden.

Figure 1.8 Show Toolbars Window

Macintosh

On the Macintosh, toolbars are not arranged in groups; instead, all buttons are available to be added to a single toolbar. To see the definitions of each button on the toolbar, or to add or remove buttons,

Control-click on the toolbar area of a window.

From the window that appears, drag buttons onto the toolbar to add them.

First Session

This section is a guide through a few simple steps that demonstrate opening a data table, requesting an analysis, and closing a data table.

To open a data table, select File > Open, select Open Data Table from the JMP Starter, or click the Open button on the File/Edit toolbar.

Select the file Denim.jmp and click Open.

The data should appear like the listing in Figure 1.9.


Figure 1.9 Partial Listing of the Denim Data File

This data set contains data on the starch content of processed denim. In this example, we examine the data for the Starch Content (%) variable and answer the following questions:

• What is the mean of the data?

• What is its median?

• What is its standard deviation?

• Also, produce a histogram of the data.

Finding Means, Medians, and Standard Deviations

To answer these questions, use the Distribution platform.

Select Analyze > Distribution.

This brings up the launch dialog as seen in Figure 1.10.



Figure 1.10 The Distribution Dialog

Select the variable Starch Content (%), then click the Y, Columns button.

This step tells JMP Student Edition which variable to analyze. Since Starch Content (%) is the only variable of interest, we are finished with this dialog.

Click OK.

The report is presented in its default vertical format. However, some people prefer a horizontal layout for the Distribution report. To change the layout to horizontal,

Click on the red triangle next to the word Starch Content (%) in the report (see Figure 1.11).

Figure 1.11 Red Triangles Reveal Popup Menus

All of these red triangles reveal popup menus when they are clicked. Watch closely for them—they reveal further options and explorations available during the data exploration process. The menu next to Starch Content (%) shows the options for this single variable, although there are cases (seen later) where the Distribution platform operates on several variables. Options available to all the variables in the report are in the menu next to the word Distributions.

Select Display Options > Horizontal Layout to see the report in Figure 1.12.


Figure 1.12 The Starch Content Distribution Report

The answers to the four questions are all in this report. Read off the mean (25.516634), the median (24.349), and the standard deviation (9.6568876). The histogram is shown on the left.
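As a cross-check outside JMP Student Edition, the same three summary statistics can be computed with Python's standard `statistics` module. This is a minimal sketch that assumes a short, made-up list of starch percentages rather than the actual Denim.jmp values, so its numbers will not match the report above.

```python
import statistics

# Hypothetical starch-content values (%), not the actual Denim.jmp data.
starch = [18.2, 22.5, 24.3, 31.0, 40.1]

mean = statistics.mean(starch)      # arithmetic mean
median = statistics.median(starch)  # middle value of the sorted data
std_dev = statistics.stdev(starch)  # sample standard deviation (n - 1 divisor)

print(f"Mean {mean:.4f}, Median {median}, Std Dev {std_dev:.4f}")
```

`statistics.stdev` divides by n - 1 (the sample standard deviation); `statistics.pstdev` would divide by n instead.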

If a printed copy of this report is needed,

Select File > Print.

Alternatively, this output may be included in a lab report written using a word processor. To move the report into another program, use the cut and paste features of JMP Student Edition:

Select the Selection tool, which looks like a fat plus.

Hold down the Shift key and click on each part of the report that needs to be copied.

In Figure 1.13, all the text columns and the histogram have been selected. None of the headings have been, nor has the box plot. Note that the histogram’s axis is selected separately from the histogram itself.

Figure 1.13 Selection of Report Parts

Select Edit > Copy.


In the word processor, select Edit > Paste.

Now that the analysis is completed, close JMP Student Edition.

Select File > Exit.

Where to Go from Here

This simple example has shown all the steps needed to complete a JMP Student Edition analysis. From here, feel free to explore any of the sample data files that came with JMP Student Edition, explore the online help, or continue reading this book.

2 The Distribution Platform

Single-variable statistics are the domain of JMP Student Edition’s Distribution platform. It calculates summary statistics, displays graphs, and computes hypothesis tests for these variables.

Introduction

Open the data file Denim.jmp.

For information on opening a file, see “First Session,” p. 15.

About the Data

This file contains information from an experiment with blue jeans, and is referred to in each introductory section of this book. When blue jeans are manufactured, they usually contain a fair amount of starch, creating stiffness and stability in the fabric. However, most people find this stiffness undesirable; in fact, some customers say that jeans have a “breaking in” period before they become truly comfortable. This breaking-in period is, in actuality, the time it takes for some of the starch present in the jeans to wear away and wash out.

In an effort to minimize the amount of time needed to break in a new pair of jeans, denim manufacturers subject the fabric to a variety of treatments to remove some of the starch. This experiment used three such treatments in differently sized wash loads. The three different treatments, recorded in the Method column, are as follows:

• Alpha Amalyze is an enzyme added to the wash water that eats the starch.

• Caustic Soda is a chemical dissolved in the wash water that chemically destroys the starch.

• Pumice Stone is a physical abrasive added to the wash water that literally pounds the starch out. These abrasive pebbles are the source of so-called stone-washed jeans.

In addition, after the initial washing process, some jeans are sand blasted. Whether or not the fabric was sand blasted is recorded in the Sand Blasted? column. The samples came from several different rolls of fabric, with each roll identified in the Lot Number column.

After treating the jeans, two measurements were taken: one to quantify the starch content of the fabric (measuring stiffness, recorded as a percentage of weight) and one as a count of destroyed threads (measuring wear-and-tear, recorded in the Thread Wear Measured column). The measured thread wear has been converted into an ordinal variable in the Thread Wear column by using the Formula Editor.


Launching the Platform

In this example, the variables are examined one at a time.

Select Analyze > Distribution from the menu bar.

This brings up the Distribution platform launch dialog.

Figure 2.1 Distribution Launch Dialog

Select the variables Method and Starch Content (%) from the list on the left by clicking on the first variable name, holding down the Control (Windows) or Option (Macintosh) key, and clicking on the second variable name.

Click the Y, Columns button.

Click OK.

Histograms and textual information on all the variables now appear. For details on these reports, see “Continuous Variable Graphs and Reports,” p. 30 and “Categorical Variable Graphs and Reports,” p. 34.

Two of the histograms from this report are used later in this chapter, and are seen in Figure 2.9 on page 29. Many descriptive statistics can be read directly off the text reports accompanying these histograms.

Using Histograms

Histograms appear with bar widths and positions calculated internally by JMP Student Edition. Sometimes, it is desirable to change these settings. For example, suppose the bar widths and positions of the Starch Content (%) histogram need modifying. To change them,

Select the hand tool (Figure 2.2) from the Tools toolbar.

Figure 2.2 Hand Tool

Position the hand tool over the Starch Content (%) histogram and press the mouse button.

Move the mouse horizontally (assuming the histogram is in its default vertical layout) to change the bar widths of the histogram.


Move the mouse vertically to change the position of the bars.

These histograms are also useful in looking at some relationships among the variables. For example,

Click on the bar corresponding to Alpha Amalyze in the Method histogram.

The bar for Alpha Amalyze is highlighted, as are the bars in the other histograms for all the data points that have Alpha Amalyze as their method. Notice that the corresponding rows in the data table are also highlighted.

To bring the data table to the front, select Window > Denim.

To bring the Distribution report to the front, select Window > Denim-Distribution.

Data rows are highlighted in the data table so that they can be assigned row states—specific markers, colors, or labels—that persist in all of JMP Student Edition’s active plots. Whenever a row is selected in any plot, its selection status ripples through all of JMP Student Edition’s open windows.

Highlight and explore the other wash methods, paying attention to the starch content that gets highlighted with each one. Try to determine if one of the methods results in lower starch content than the others.

Click in the histogram bars for Caustic Soda and Pumice Stone. Look at the corresponding points that are highlighted in the Starch Content (%) histogram.

It is often useful to have confidence intervals on the means or levels in these histograms. To get, for example, a 95% confidence interval on the levels of Method and Starch Content (%),


Select Confidence Interval > .95 from the drop-down menu next to the variable names in the histograms’ title bars.

Testing a Mean

Continuing the analysis, suppose that prior research claims that the mean starch content of Alpha Amalyze-washed denim is 20%. Testing whether the Alpha Amalyze denim has a mean of 20% requires two steps:

• Make separate histograms for each of the three levels of the Method variable.

• Test the mean using the Alpha Amalyze histogram.

To accomplish these two steps,

Bring up the Distribution launch dialog by again selecting Analyze > Distribution from the menu bar.

Select Starch Content (%) in the list of variables and click the Y, Columns button.

Select Method in the list of variables and click the By button.

Click OK.

Three histograms should appear, with the corresponding level indicated in the title bar of each histogram.

In the Method=Alpha Amalyze section, select Test Mean from the drop-down list next to Starch Content (%).

Figure 2.3 Test Mean



Since the hypothesized mean is 20%,

Type 20 in the entry field for Specify Hypothesized Mean.

Figure 2.4 Test Mean Dialog

The true standard deviation is not known, so leave the other entry field blank. This tells JMP Student Edition to compute a t-test of the mean. If the standard deviation had been known and entered, a z-test would be performed. Also, leave the box for the Wilcoxon Signed-Rank test unchecked. This is a non-parametric test that is not usually covered in an introductory course. The online help contains further information on these topics.

Click OK.

The results of the test are appended to the Distribution report. In this case, the t-test is two-tailed, since the percentage could be higher or lower than 20%. Therefore, examine the p-value listed beside Prob > |t|, which in this case is a non-significant 0.5740.
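The one-sample t-test reported here can be sketched in plain Python to show where the t ratio and the two-sided p-value come from. This is a minimal illustration, not JMP Student Edition's implementation: the function names and sample values are hypothetical, and the t distribution's cumulative probability is approximated by numerically integrating its density (Simpson's rule) so that only the standard library is needed.

```python
import math
import statistics


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x), using Simpson's rule on the density between 0 and |x|."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def one_sample_t_test(data, hypothesized_mean):
    """Return (t ratio, degrees of freedom, two-sided p-value)."""
    n = len(data)
    std_err = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    t = (statistics.mean(data) - hypothesized_mean) / std_err
    df = n - 1
    p_two_sided = 2 * (1 - t_cdf(abs(t), df))  # analogous to Prob > |t|
    return t, df, p_two_sided


# Hypothetical data, testing H0: mean = 20 against a two-sided alternative.
t, df, p = one_sample_t_test([18.2, 22.5, 24.3, 31.0, 40.1], 20)
print(f"t Ratio {t:.4f}, DF {df}, Prob > |t| {p:.4f}")
```

The sample standard deviation is used because the true standard deviation is unknown, which is exactly why a t-test rather than a z-test applies.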

Normality

Many statistical tests assume that the data are approximately normally distributed. Although there are usually more important things to worry about than exact normality, JMP Student Edition provides a quick way of assessing normality through the Normal Quantile Plot. Complete details of the Normal Quantile Plot are in the section “Normal Quantile Plots,” p. 31. To produce a Normal Quantile Plot,

Select Normal Quantile Plot from the drop-down list next to one of the variable names.

Figure 2.5 Normal Quantile Plot


Scroll down the report to see that this command only added a Normal Quantile Plot for one variable in the report. Many times, a command needs to be sent to all the variables in the report, yet it is tedious to select the same command many times. JMP Student Edition therefore provides a way to “broadcast” a command throughout a report, using the Control (Windows) or (Macintosh) key.

Hold down the Control (Windows) or (Macintosh) key and again select Normal Quantile Plot from the drop-down list next to the variable’s name.

This time, a Normal Quantile plot is appended to every histogram. This shortcut works for most commands in drop-down menus.

You can also test for Normality by fitting a Normal distribution, then performing a goodness-of-fit test.

Select Fit Distribution > Normal from the platform drop-down list.

When the report appears, select Goodness of Fit from the fitted distribution report.

This produces a report showing the parameters of the distribution, along with a goodness-of-fit statistic testing the null hypothesis that the distribution is, in fact, Normal. Small p-values are evidence that the distribution is not Normal.

Testing Probabilities

Another question that could be asked about this data is whether the three levels of Thread Wear occur with equal frequency. To test this assumption, a distribution of the Thread Wear variable is necessary.

Make sure the original Denim data table is the front window. If not, select Window > Denim.jmp.



Request a Distribution of Thread Wear (not Thread Wear Measured).

After the histogram appears, select Test Probabilities from the drop-down list next to Thread Wear in the title bar of the histogram.

An addition to the report appears. A screen shot of the addition appears later in this chapter, in Figure 2.16 on page 36.

Make sure Fix omitted at estimated values, rescale hypothesis is selected. JMP Student Edition automatically scales the numbers entered into the entry fields so that they sum to one. This allows an easy way to test for equal probabilities—simply enter 1 in each entry field.

Enter 1 into each Hypoth Prob entry field in the Test Probabilities section of the report.

Click Done.

Figure 2.6 Test Probabilities Results


The results of the test are listed in the column labeled Prob>Chisq. This test shows highly significant results (p < 0.001), so there is strong evidence that the three levels occur with different probabilities.

Annotating Results

If these results are to be cut and pasted into a word processing program (for a lab report, or to turn in as a homework assignment), there may be some annotation necessary. For example, you may wish to annotate the tested probabilities from above to state that the results of the test were significant. To add comments to any JMP Student Edition output, use the Annotate tool from the Tools toolbar.

Figure 2.7 The Annotate Tool

Select the Annotate tool, as shown in Figure 2.7.

Click next to the top “0.3333” in the blank space to the right of the Test Probabilities table.

The initial click point is used in the Tag Line option below.

Type “These results are highly significant” in the box that appears.

Click somewhere outside the annotation box.

The annotation turns yellow.

Right-click in the yellow annotation to bring up a formatting menu, shown to the right.

Select Tag Line.

Move the mouse over the lower right corner until the cursor turns into a double arrow. Click and drag the corner to resize the annotation.

Click and drag inside the annotation box to reposition it with the tag line anchored.

Control-click (Windows) or -click (Macintosh) and drag to move the annotation and tag line together.

The final annotation is shown in Figure 2.8.

Figure 2.8 Final Annotation


Annotations are not preserved in standard (RTF) cut and paste, but are instead combined within their respective reports as a single graphic. To paste a report with annotations into another document, choose Edit > Paste Special and choose one of the graphic formats.

The Modeling Type of Variables

JMP Student Edition bases its reports on the modeling type of the variables it analyzes. Variables can have one of three modeling types:

• Continuous variables are numeric and measured on a continuous scale. For example, temperature measurements are often on a continuous scale, limited only by the exactness of the measuring instrument. In the example data set Denim.jmp, the variable Size of Load is a continuous variable.

• Ordinal variables are measured on a discrete scale. There is an implicit order in the measuring scale, although the data are not necessarily numerical. For example, the age of people is often recorded on an ordinal scale — seldom do people report their age as 24.461 years, yet it is obvious that some ages are older than others. In the Denim.jmp data set, Thread Wear, with values “low”, “moderate”, and “severe”, is an ordinal variable.

• Nominal variables simply name data. There is no order in the scale. People’s names, for example, are represented as a nominal variable in JMP Student Edition. Method is a nominal variable in the Denim.jmp data set.

Ordinal and Nominal variables are often referred to collectively as categorical variables.

The modeling type of a variable determines what analyses JMP Student Edition performs. Identical platforms often result in different reports and graphs because the variables analyzed were of differing types. The Distribution platform shows exactly this behavior. It produces histograms in any case, but appends other graphs based on variable types.

Figure 2.9 Distribution Graphs

• Categorical variables show a histogram.


• Continuous variables have an outlier box plot, constructed to show possible outliers in continuous variables. Outlier plots are discussed in the section “Outlier Box Plot,” p. 31.

Continuous Variable Graphs and Reports

Initially, JMP Student Edition produces graphs and text reports to give information from the analysis. The text reports for continuous variables summarize typical univariate statistics, such as the mean, standard deviation, confidence interval on the mean, number of data points, and quantiles.

The popup menu for continuous variables (Figure 2.10) shows the options available for continuous variables.

Figure 2.10 Continuous Variable Popup Menu

Display Options

The Display Options menu contains the following items:

• Quantiles shows or hides the Quantiles table.

• Moments shows or hides the Moments table. This table displays the mean, standard deviation, standard error of the mean, upper and lower 95% confidence limits for the mean, and data set size.

• More Moments adds to the Moments table the variable’s sum, variance, skewness, kurtosis, and the coefficient of variation.

• Horizontal Layout arranges text reports to the right of their corresponding graphs and shows the histogram as a horizontal bar chart. Selecting this option again returns the report to a vertical layout.
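The extra statistics that More Moments adds can be illustrated with the common bias-adjusted sample formulas for skewness and (excess) kurtosis. This is a minimal Python sketch; that JMP Student Edition uses exactly these formulas is an assumption, and the function name is hypothetical. Kurtosis needs at least four observations.

```python
import math
import statistics


def more_moments(data):
    """Sum, variance, std err of mean, skewness, kurtosis, and CV (needs n >= 4)."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation, n - 1 divisor
    z = [(x - mean) / sd for x in data]
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Sum": sum(data),
        "Variance": sd ** 2,
        "Std Err Mean": sd / math.sqrt(n),
        "Skewness": skew,  # 0 for perfectly symmetric data
        "Kurtosis": kurt,  # excess kurtosis: near 0 for normal data
        "CV": 100 * sd / mean,  # coefficient of variation, in percent
    }


for name, value in more_moments([1, 2, 3, 4, 5]).items():
    print(f"{name:>12}: {value:.4f}")
```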

Histogram Options

The Histogram Options menu contains the following items:

• Histogram shows or hides the histogram.


• Std Err Bars draws the standard error bar on each level of the histogram. The standard error bar automatically adjusts to reflect the histogram’s bar widths and positions when you change them using the hand tool.

• Count Axis adds an axis that shows the frequency of each value represented by the histogram bars.

• Prob Axis adds an axis that shows the proportion of each value represented by histogram bars.

• The Density Axis is the length of the bars in the histogram.

Any combination of these axes can be added to categorical or continuous histograms. As the length of the bars is changed with the hand tool, the Count and Prob axes change, but the Density axis remains constant.
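The three axes differ only in scaling. The sketch below uses the standard definitions: count is the number of values in a bar, probability is count divided by the total, and density is probability divided by bar width, so bar areas sum to one. The function name and data are hypothetical, and that JMP Student Edition scales its Density axis exactly this way is an assumption.

```python
import math


def histogram_axes(data, start, width):
    """(bar low edge, count, probability, density) for each non-empty bar.

    Bars are [start + k*width, start + (k+1)*width).
    """
    n = len(data)
    counts = {}
    for x in data:
        k = math.floor((x - start) / width)  # index of the bar holding x
        counts[k] = counts.get(k, 0) + 1
    rows = []
    for k in sorted(counts):
        prob = counts[k] / n
        rows.append((start + k * width, counts[k], prob, prob / width))
    return rows


for low, count, prob, density in histogram_axes([1, 1, 2, 3, 3, 3], start=0, width=2):
    print(f"bar from {low}: count {count}, prob {prob:.3f}, density {density:.3f}")
```

Doubling the width roughly doubles each bar's count and probability, while dividing by the width keeps the density on a comparable scale.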

Normal Quantile Plots

The Normal Quantile Plot option adds a graph to the report that is used to visualize the extent to which the variable is normally distributed. If a variable is normal, the normal quantile plot is approximately a diagonal straight line. This kind of plot is sometimes also called a quantile-quantile plot, or q-q plot.

The Normal Quantile plot also shows confidence bounds. If the data fall within these confidence bounds, the data are consistent with a normal distribution.
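The points on a normal quantile plot pair each sorted data value with a quantile of the standard normal distribution. This minimal sketch uses Python's `statistics.NormalDist` and the plotting position i/(n + 1) for the ith smallest value; that position is one common choice and is an assumption here, since programs differ slightly. The function name is hypothetical.

```python
from statistics import NormalDist


def normal_quantile_pairs(data):
    """(normal quantile, sorted value) pairs; roughly linear if data are near normal."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf(i / (n + 1)), x)
            for i, x in enumerate(ordered, start=1)]


for q, x in normal_quantile_pairs([3, 1, 2]):
    print(f"{q:+.4f}  {x}")
```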

Outlier Box Plot

The Outlier Box Plot is a schematic that shows the dispersion of a variable. This makes the identification of points with extreme values, sometimes called outliers, relatively easy.

The ends of the box are the 25th and 75th quantiles, also called the quartiles. The difference between the quartiles is the interquartile range. Outliers are often identified as points that fall above the upper quartile + 1.5×(interquartile range) or below the lower quartile – 1.5×(interquartile range).

The line across the middle of the box identifies the median sample value, and the means diamond indicates the sample mean and 95% confidence interval.

The dashed lines in the outlier box plot are sometimes called whiskers, extending from both ends of the box. The whiskers extend to the outermost data point that falls within the distances computed for judging outliers.
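The quartile, fence, and whisker rules above can be sketched in a few lines of Python. This sketch uses `statistics.quantiles` with the "exclusive" method, which places the pth quantile at rank p(n + 1); that this matches JMP Student Edition's quantile definition in every case is an assumption. The function name and data are hypothetical.

```python
import statistics


def outlier_box_summary(data):
    """Quartiles, 1.5 x IQR fences, whisker ends, and flagged outliers."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1  # interquartile range
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "quartiles": (q1, median, q3),
        "fences": (low_fence, high_fence),
        # Whiskers reach the outermost points that are still inside the fences.
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


print(outlier_box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50]))
```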


The red bracket along the edge of the box identifies the shortest half, the smallest length that contains 50% of the data. This is useful when determining the shape of underlying distributions.
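Finding the shortest half amounts to sliding a window over the sorted data and keeping the narrowest one. This minimal sketch assumes that "half" means ceil(n/2) consecutive sorted observations; the exact count a particular program uses may differ, and the function name is hypothetical.

```python
import math


def shortest_half(data):
    """Narrowest interval (low, high) covering ceil(n/2) consecutive sorted values."""
    ordered = sorted(data)
    h = math.ceil(len(ordered) / 2)  # number of points in "half" (assumption)
    best = min(range(len(ordered) - h + 1),
               key=lambda i: ordered[i + h - 1] - ordered[i])
    return ordered[best], ordered[best + h - 1]


print(shortest_half([20, 1, 3, 2.5, 10, 2]))
```

A shortest half sitting near one end of the box suggests that the data are concentrated there, which is one hint about the shape of the distribution.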

Quantile Box Plots

The Quantile Box Plot command shows additional quantiles (sometimes called percentiles) on the axis of the histogram. If a distribution is normal, the quantiles are approximately equidistant from each other. Like the Normal Quantile Plot, the Quantile Box Plot is useful for seeing normality in a graphical way. For example, if the quantile marks are grouped closely at one end, but have greater spacing at the other end (as in this picture), the distribution is skewed toward the end with more spacing.

Note that the quantile box plot is not the same as the outlier box plot from page 31. Quantiles are values that divide a distribution into two groups — where the pth quantile is larger than p% of the values. For example, half the data are below the 50th percentile (median).

Stem and Leaf Plots

The Stem and Leaf command constructs a plot that is essentially a variation on the histogram. It was developed for tallying data in the days when computer printouts were neither graphical nor easy to produce. Stem and leaf plots remain useful because they show the actual data at the same time as the shape of the data. Each line of the plot has a stem value that is the leading digit of a range of column values. The leaf values are made from the remaining digits of the values. The data values can be reconstructed by joining the stem and leaf (and multiplying by the scale factor, if one exists).

In the example pictured in Figure 2.11, the third line of the table reveals that there are data points with values 40 and 41. Values are reconstructed by using the legend at the bottom of the plot.

Figure 2.11 Stem and Leaf Plot

Stem and leaf plots have similar interactive capabilities to JMP Student Edition’s graphics plots, in that they highlight corresponding data points in the data table when they are selected in the plot.
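The tallying idea behind a stem and leaf plot fits in a short Python sketch. It assumes non-negative integers, uses the tens digit as the stem and the units digit as the leaf, and applies no scale factor. The function name is hypothetical, and stems are listed in increasing order, which may differ from the layout JMP Student Edition uses.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Build a stem and leaf plot as text: stem = tens digit, leaf = units digit."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems visible
        leaves = "".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)


print(stem_and_leaf([41, 23, 40, 27, 31, 52]))
```

Joining stem 4 with leaves 0 and 1 recovers the values 40 and 41, just as in the legend-based reading described above.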


CDF Plot

The CDF Plot command plots a cumulative distribution function step plot using the observed data (with weights or frequencies if specified). Consult a statistics text for a definition of a density function. A CDF plot (Figure 2.12) estimates the area under the density curve up to each data point.

Figure 2.12 CDF Plot for Size of Load
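The step function drawn by CDF Plot is the empirical distribution function: at each value x, the fraction of observations less than or equal to x. This minimal Python sketch is unweighted (the weights and frequencies that JMP Student Edition can apply are not handled), and the function name is hypothetical.

```python
from bisect import bisect_right


def ecdf(data):
    """Return F(x), the fraction of observations less than or equal to x."""
    ordered = sorted(data)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


F = ecdf([3, 1, 2, 2])
print([F(x) for x in (0, 1, 1.5, 2, 3)])
```

The function jumps by 1/n at each observation (by 2/n at a value that occurs twice) and is flat in between, which is why the plot is a step plot.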

Fit Distribution

The Fit Distribution menu allows you to fit certain distributions (Normal, Lognormal, Weibull) to the data. After fitting, you can select among several options, including a Goodness of Fit test.


Categorical Variable Graphs and Reports

The only text report that appears by default in categorical distribution reports is a frequencies table (Figure 2.13).

Figure 2.13 Frequencies Table

This table lists the levels of a categorical variable, the count (sometimes called the frequency) of each level, and the probability associated with each level. This probability is simply the ratio of each level’s count to the total count.
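The arithmetic of the frequencies table, including the cumulative probability column, can be sketched with Python's `collections.Counter`. The function name and data are hypothetical, and levels are sorted alphabetically here; JMP Student Edition may order levels differently (for example, by an ordinal variable's natural order).

```python
from collections import Counter


def frequencies(values):
    """(level, count, probability, cumulative probability) for each level."""
    counts = Counter(values)
    total = sum(counts.values())
    rows, cum = [], 0.0
    for level in sorted(counts):
        prob = counts[level] / total  # count divided by total count
        cum += prob
        rows.append((level, counts[level], prob, cum))
    return rows


for row in frequencies(["moderate", "low", "moderate", "severe"]):
    print(row)
```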

The standard error of these probabilities (StdErr Prob) and the cumulative probabilities (Cum Prob) for the data are also computed, but are not initially shown in the results table. To see them,

Right-click on the table to bring up a popup menu (Figure 2.14)

Select Columns to reveal a popup menu of all possible columns for the table.

Columns that are currently shown have a check mark beside them.

Select the column to be shown or hidden.

Figure 2.14 Table Popup Menu

The options listed in the drop-down menu for categorical variables (Figure 2.15) work the same as those for continuous variables (see “Continuous Variable Graphs and Reports,” p. 30.)

Figure 2.15 Drop-down Menu for Categorical Variables

Statistical Tests

JMP Student Edition contains numerous statistical tests for single variables, including:


• a test of the mean of a continuous variable

• a test of the standard deviation of a continuous variable

• a test of the probabilities of a categorical variable

These tests are all accessed through the popup menu next to the variable’s name at the top of the report.

Testing a Mean

The Test Mean command prompts for a test value to compare to the sample mean. If a value is entered for the standard deviation, a z-test is computed. Otherwise, the sample standard deviation is used to compute a t-statistic. Optionally, the nonparametric Wilcoxon signed-rank test can be requested. After clicking OK, the Test Mean table is appended to the bottom of the reports for that variable.

Use the Test Mean command repeatedly to test different values. Each time the mean is tested, a new Test Mean table is appended to the text report.

The Test Mean command calculates and displays the following statistics:

• t Test (or z test) lists the value of the test statistic and the p-values for the two-sided and one-sided alternatives. The test assumes the distribution is normal.

• Signed-Rank lists the value of the Wilcoxon signed-rank statistic followed by the p-values for the two-sided and one-sided alternatives. The test assumes nothing about the normality of the distribution, only that it is symmetric.

The probability values given in the Test Mean table are defined as follows:

• Prob > |t| is the probability of obtaining a greater absolute t value by chance alone when the population mean equals the hypothesized value. This is the p-value for observed significance of the two-tailed t-test.

• Prob > t is the probability of obtaining a t value greater than the computed sample t ratio by chance alone when the population mean equals the hypothesized value. This is the p-value for observed significance of the upper one-tailed t-test. When the t ratio is positive, this probability is half of Prob > |t|.

• Prob < t is the probability of obtaining a t value less than the computed sample t ratio by chance alone when the population mean equals the hypothesized value. This is the p-value for observed significance of the lower one-tailed t-test. The value of this probability is 1 – Prob > t.
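Python's standard library has no t distribution, so the sketch below illustrates the three definitions for the z-test case (a known standard deviation entered in the dialog) using `statistics.NormalDist`. The t-test versions have the same structure, with a t distribution on n – 1 degrees of freedom in place of the standard normal. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_mean(data, mu0, sigma):
    """One-sample z test of H0: population mean == mu0, with sigma known.

    Returns (z, Prob > |z|, Prob > z, Prob < z).
    """
    z = (mean(data) - mu0) / (sigma / sqrt(len(data)))
    std_normal = NormalDist()                     # mean 0, standard deviation 1
    prob_gt = 1 - std_normal.cdf(z)               # upper one-tailed p-value
    prob_lt = std_normal.cdf(z)                   # lower one-tailed p-value = 1 - prob_gt
    prob_abs = 2 * (1 - std_normal.cdf(abs(z)))   # two-tailed p-value
    return z, prob_abs, prob_gt, prob_lt


# Hypothetical data: test mean = 10 with a known sigma of 0.3
z, p_two, p_upper, p_lower = z_test_mean([10.2, 9.8, 10.5, 10.1], 10.0, 0.3)
print(z, p_two, p_upper, p_lower)
```

Because z is positive in this example, `p_upper` is half of `p_two`, and `p_lower` is 1 – `p_upper`, matching the relationships listed above.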

Testing a Standard Deviation

The Test Std Dev command requests a test value for statistical comparison to the sample standard deviation. After clicking OK, the Test Standard Deviation table is appended to the bottom of the reports for that variable.

The Test Std Dev command can be used repeatedly to test different values. Each time a standard deviation is tested, a new table is appended to the text report.

The Test Standard Deviation table shows the computed Chi Square statistic that tests whether the population standard deviation equals the hypothesized standard deviation, and the probabilities associated with that Chi Square value:

• Prob>|ChiSq| is the probability of obtaining a Chi Square value at least as extreme as the computed one, in either direction, by chance alone when the population standard deviation equals the hypothesized value. This is the p-value for observed significance of the two-tailed test.

• Prob>ChiSq is the probability of obtaining a Chi Square value greater than the computed sample Chi Square by chance alone when the population standard deviation equals the hypothesized value. This is the p-value for observed significance of a one-tailed test.

• Prob<ChiSq is the probability of obtaining a Chi Square value less than the computed sample Chi Square by chance alone when the population standard deviation equals the hypothesized value. This is the p-value for observed significance of a one-tailed test.
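The text does not print the statistic's formula. The sketch below assumes the usual textbook statistic, χ² = (n – 1)s²/σ₀² on n – 1 degrees of freedom, and evaluates the chi-square CDF with a power series for the regularized lower incomplete gamma function so that only the Python standard library is needed. Doubling the smaller tail for the two-sided value is a common convention and an assumption about how Prob>|ChiSq| is obtained. Function names and data are hypothetical.

```python
from math import exp, lgamma, log
from statistics import stdev


def chi2_cdf(x, df):
    """Chi-square CDF: regularized lower incomplete gamma P(df/2, x/2).

    Uses P(a, t) = t^a e^-t / Gamma(a) * sum_n t^n / (a (a+1) ... (a+n)),
    a series intended here only for moderate values of x.
    """
    if x <= 0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= t / (a + n)
        total += term
    return total * exp(a * log(t) - t - lgamma(a))


def std_dev_test(data, sigma0):
    """Chi-square test of H0: population standard deviation == sigma0.

    Returns (chi_square, two_sided_p, Prob > ChiSq, Prob < ChiSq).
    """
    df = len(data) - 1
    chi_square = df * stdev(data) ** 2 / sigma0 ** 2
    prob_lt = chi2_cdf(chi_square, df)
    prob_gt = 1.0 - prob_lt
    two_sided = 2 * min(prob_lt, prob_gt)   # assumed convention: double the smaller tail
    return chi_square, two_sided, prob_gt, prob_lt


# Hypothetical data tested against sigma0 = 0.5
print(std_dev_test([4.1, 5.3, 4.8, 5.9, 5.0], 0.5))
```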

Testing Categorical Probabilities

The Test Probabilities option displays the dialog shown in Figure 2.16, where hypothesized probabilities are entered. The Likelihood Ratio and Pearson Chi Square tests are calculated for those probabilities.

Figure 2.16 Test Probabilities

Test Probabilities can scale the hypothesized values so that the probabilities sum to one. Therefore, the easiest way to test that all the probabilities are equal is to enter a one in each field. To test a subset of the probabilities, leave the levels that are not involved blank. JMP Student Edition substitutes estimated probabilities for those left blank.

The radio buttons on the dialog allow a choice between rescaling hypothesized values to sum to one or using the entered value without rescaling.
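A minimal sketch of the two statistics, written in Python with hypothetical names and data, is shown below. It assumes every level receives a hypothesized value (it does not model the blank-field substitution described above) and reports only the statistics; each would be compared with a chi-square distribution on k – 1 degrees of freedom, where k is the number of levels. The `rescale` flag mirrors the dialog's choice between rescaling and using the entered values as they are.

```python
from math import log


def probability_test(counts, hypothesized, rescale=True):
    """Pearson and likelihood-ratio chi-square statistics for H0: p_i == hypothesized_i.

    counts: observed count for each level; hypothesized: one value per level.
    With rescale=True the hypothesized values are divided by their sum, so
    entering 1 for every level tests that all levels are equally likely.
    Returns (pearson_chi_square, likelihood_ratio_chi_square, df).
    """
    if rescale:
        total_h = sum(hypothesized)
        hypothesized = [h / total_h for h in hypothesized]
    n = sum(counts)
    expected = [n * p for p in hypothesized]
    pearson = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # G^2 = 2 * sum O * ln(O / E); levels with a zero count contribute nothing
    likelihood_ratio = 2 * sum(o * log(o / e) for o, e in zip(counts, expected) if o > 0)
    df = len(counts) - 1
    return pearson, likelihood_ratio, df


# Hypothetical counts for three levels, testing equal probabilities
print(probability_test([18, 22, 20], [1, 1, 1]))
```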

Confidence Intervals

For continuous variables, confidence intervals for a mean are automatically displayed in the Moments table (Figure 2.17).

Figure 2.17 The Moments Table for Continuous Variables

To calculate confidence intervals for a specific value of α,

Select Confidence Interval from the popup menu next to the variable name in the title bar.

A dialog box appears that lets you specify which confidence interval to display.

Select the desired level, for example, 0.95.

Figure 2.18 Confidence Intervals for Continuous Variables

To obtain the equivalent table for categorical variables, select Confidence Interval > 0.95. To obtain a confidence level that is not listed on the Confidence Interval menu, select Confidence Interval > Other and enter the desired level. An example is shown in Figure 2.19.

Figure 2.19 Confidence Intervals for Categorical Variables
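For a categorical level, one widely used interval for a proportion is the Wilson score interval. The Python sketch below computes it with `statistics.NormalDist`. Whether JMP Student Edition uses exactly this method for the intervals in Figure 2.19 is an assumption, and the function name and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(count, n, confidence=0.95):
    """Wilson score interval for the proportion count / n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    p = count / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical level with 3 occurrences out of 10 observations
print(wilson_interval(3, 10))
```

Unlike the simple p ± z·√(p(1 – p)/n) interval, the score interval stays inside [0, 1] even when a level has a count of zero.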

Saving Information

To save information computed from continuous response variables, use the Save menu commands. Each command generates a new column in the current data table, named by appending the response column name (denoted colname in the following definitions) to the saved statistic’s name.

The Save commands can be used repeatedly. This enables the same statistic to be saved multiple times under different circumstances, such as before and after combining histogram bars. If the Save command is used multiple times, a number is appended to the column name (colname1, colname2, and so forth) so that each column name is unique.

The Save menu contains the following commands:

• Level Numbers creates a new column, called Level colname. The level number of each observation corresponds to the histogram bar that contains the observation. The histogram bars are numbered from low to high, beginning with 1.

• Level Midpoints creates a new column, called Midpoint colname. The midpoint value for each observation is computed by adding half its level width to its lower level bound.

• Ranks creates a new column called Ranked colname that contains a ranking for each of the corresponding column’s values, starting at 1. If there are duplicates in the column, they are assigned consecutive ranks in order of their occurrence in the data table.

• Ranks averaged creates a new column, called RankAvgd colname. If a value is unique, its averaged rank is the same as its rank. If a value occurs k times, its averaged rank is the sum of the k ranks assigned to that value divided by k.

• Prob Scores creates a new column, called Prob colname. For N non-missing scores, the probability score of a value is computed as the averaged rank of that value divided by N+1. This column is similar to the empirical cumulative distribution function.

• Normal Quantiles creates a new column, called N-Quantile colname. These normal scores are van der Waerden approximations to the expected order statistics for the normal distribution.

• Standardized creates a new column, called Std colname. It contains the original column’s standardized values: each value minus the column mean, divided by the column standard deviation.
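The Save definitions above can be sketched in Python as follows. This is an illustration, not JMP's code: it assumes no missing values, uses a stable sort so that tied values receive consecutive ranks in order of occurrence, and, as an assumption, applies the standard normal quantile to the Prob Scores (averaged rank / (N + 1)), which is the usual van der Waerden score. The function name is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def save_columns(values):
    """Sketch of the Save commands for one column without missing values.

    Returns (ranks, averaged_ranks, prob_scores, normal_quantiles, standardized).
    """
    n = len(values)
    # Ranks: a stable sort keeps tied values in order of occurrence,
    # so duplicates receive consecutive ranks.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    # Ranks averaged: every copy of a value gets the mean of that value's ranks.
    rank_sum, rank_count = {}, {}
    for v, rank in zip(values, ranks):
        rank_sum[v] = rank_sum.get(v, 0) + rank
        rank_count[v] = rank_count.get(v, 0) + 1
    averaged_ranks = [rank_sum[v] / rank_count[v] for v in values]
    # Prob Scores: averaged rank divided by N + 1.
    prob_scores = [r / (n + 1) for r in averaged_ranks]
    # Normal Quantiles (assumed): standard normal quantile of each prob score.
    std_normal = NormalDist()
    normal_quantiles = [std_normal.inv_cdf(p) for p in prob_scores]
    # Standardized: subtract the column mean, divide by the column standard deviation.
    m, s = mean(values), stdev(values)
    standardized = [(v - m) / s for v in values]
    return ranks, averaged_ranks, prob_scores, normal_quantiles, standardized


print(save_columns([3.0, 1.0, 3.0, 2.0]))   # hypothetical column values
```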

Whole-Platform Options

Each statistical platform has a popup menu in the outermost outline level next to the platform name. Options and commands in this menu affect all text reports and graphs on the platform.

The whole-platform options for the Distribution platform include the following:

• Uniform Scaling scales all axes with the same minimum, maximum, and intervals so that the distributions are easily compared. This option applies to reports for all response variables when selected.

• Stack lets you orient all the output in the report window as either portrait or landscape.

• Script lets you rerun or save the JSL script that produced the platform results. If the script is saved to a file, you can edit it; if it is saved with the current data table, it is available to run the next time you open the table. The JSL generated by Save Script for All Objects is the same as Save Script to Script Window if there are no By-Groups. When there are By-Groups the script includes JSL Where clauses that identify the By-Group levels.


Figure 2.20 The Script Menu

• Data Table Window gives a view of the underlying data table, which is especially useful when there are By-Groups.

Capability Analysis

The Capability Analysis option gives a capability analysis for quality control applications. The capability study measures the conformance of a process to given specification limits. A dialog prompts you for Lower Spec Limit, Upper Spec Limit, and Target. You need to enter only one of the three values; only the fields you enter appear in the resulting Capability Analysis table. Optionally, you can enter a known value for sigma, the process standard deviation.

Capability Analyses can calculate capability indices using several different short-term estimates for σ. After requesting a Distribution, select Capability Analysis from the popup menu on the outline bar for the variable of interest. The dialog box shown in Figure 2.21 appears, allowing specification of long-term or one or more short-term sigmas, grouped by a column or a fixed sample size.

Figure 2.21 Capability Analysis Dialog Box

All capability analyses use the same formulas. The difference between the options lies in how sigma is computed. These options for sigma can be explained as:

• Long-term uses the overall sigma. This is the option used for Ppk statistics, and sigma is computed as

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{n - 1}}$$

• Specified Sigma allows the user to enter a specific, known sigma for computing capability analyses. Because sigma is user-specified, it is not computed. This is the option used for control chart-generated capability analyses, where the sigma used in the chart is entered in the dialog as the specified sigma.

• Short Term, Grouped by fixed subgroup size computes σ from subgroups defined by the order of the data. If r is the number of subgroups, sigma is computed as

$$\sigma = \sqrt{\frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}_{i.}\right)^2}{n - r - 1}}$$

where x_ij is the jth observation in the ith subgroup, x̄_i. is the mean of the ith subgroup, n_i is the size of the ith subgroup, and n is the total number of observations.

• Short Term, Grouped by Column brings up a column list dialog from which you choose the grouping column. In this case, with r equal to the number of subgroups, sigma is computed as

$$\sigma = \sqrt{\frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}_{i.}\right)^2}{n - r - 1}}$$

(Note that this is the same formula as for Short Term, Grouped by fixed subgroup size, and is commonly referred to as the Root Mean Square Error or RMSE.)

Note: There is a preference for Distribution called Ppk Capability Labeling that labels the long-term capability output with Ppk labels. This option is found using File > Preferences.

When you click OK, the platform appends a Capability Analysis table, like the one in Figure 2.22, at the bottom of the text reports. You can remove and redo a Capability Analysis as many times as you want.

The specification limits can be stored and automatically retrieved as a column property. To do this, choose Spec Limits from the Save command menu. When you save the specification limits, they appear on the histogram when the analysis is opened at a later time.
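The computed sigma options can be sketched in Python as follows. The function names and data are hypothetical. The short-term denominator follows the n – r – 1 given in this section (a conventional pooled within-group estimate would divide by n – r instead), and forming fixed-size subgroups as consecutive runs of the data, with any leftover observations in a shorter final subgroup, is an assumption.

```python
from math import sqrt


def long_term_sigma(x):
    """Overall sample standard deviation (n - 1 denominator)."""
    n = len(x)
    xbar = sum(x) / n
    return sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))


def short_term_sigma(groups):
    """Within-subgroup sigma with the n - r - 1 denominator used in this section."""
    n = sum(len(g) for g in groups)
    r = len(groups)
    ss_within = 0.0
    for g in groups:
        gbar = sum(g) / len(g)
        ss_within += sum((xij - gbar) ** 2 for xij in g)
    return sqrt(ss_within / (n - r - 1))


def fixed_size_subgroups(x, size):
    """Consecutive subgroups of a fixed size, taken in data order."""
    return [x[i:i + size] for i in range(0, len(x), size)]


def subgroups_by_column(x, labels):
    """Subgroups defined by the values of a grouping column."""
    groups = {}
    for xi, label in zip(x, labels):
        groups.setdefault(label, []).append(xi)
    return list(groups.values())


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical measurements
print(long_term_sigma(data))
print(short_term_sigma(fixed_size_subgroups(data, 3)))
print(short_term_sigma(subgroups_by_column(data, ["a", "a", "a", "b", "b", "b"])))
```

When the grouping column happens to split the data the same way as the fixed-size rule, both short-term options give the same sigma, which reflects the note that they share one formula.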



Figure 2.22 The Capability Analysis Table

The Capability Analysis table is organized into two parts. The upper part of the table shows these quantities:

• The Specification column lists the names of items for which values are shown. They are Lower Spec Limit, Upper Spec Limit, and Spec Target.

• The Value column lists the values you specified for each limit and the target.

• %Actual is the observed percent of data falling outside the specification limits.

The lower portion of the Capability Analysis table lists five basic process capability indexes, their values, and their upper and lower confidence intervals. It also lists the percent and PPM for areas outside the spec limits. The PPM column (parts per million) is the Percent column multiplied by 10,000.

Sigma Quality is computed as

$$\text{Sigma Quality} = \text{Normal Quantile}\left(1 - \frac{\text{Expected \# defects}}{n}\right) + 1.5$$

This Sigma Quality measurement is frequently used in Six Sigma methods, and is also referred to as the process sigma.

For example, if there are 3 defects in n = 1,000,000 observations, the formula yields 6.03, or a 6.03 Sigma process. The above and below columns do not sum to the total column because Sigma Quality uses values from the Normal distribution, and is therefore not additive.

Table 2.1 “Capability Index Names and Computations,” p. 41, describes these indices and gives computational formulas.

Table 2.1 Capability Index Names and Computations

Index (Index Name): Computation

CP (process capability ratio, Cp): (USL – LSL)/6s, where USL is the upper spec limit and LSL is the lower spec limit

CIs for CP:

Lower CI on CP: $$CP\sqrt{\frac{\chi^2_{\frac{1-\alpha}{2},\,n-1}}{n-1}}$$

Upper CI on CP: $$CP\sqrt{\frac{\chi^2_{1-\frac{1-\alpha}{2},\,n-1}}{n-1}}$$

CPK, or PPK for AIAG (process capability index, Cpk): min(CPL, CPU)

CIs for CPK:

Expected Value: Let

$$c = \frac{\sqrt{n}\left(\mu - \frac{USL + LSL}{2}\right)}{\sigma}$$

denote the noncentrality parameter and

$$d = \frac{USL - LSL}{\sigma}$$

represent the specification limit in σ units. Then the expected value is

$$E\left(\hat{C}_{pk}\right) = \frac{1}{6}\sqrt{\frac{n-1}{2n}}\,\frac{\Gamma\left(\frac{n-2}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}\left(d\sqrt{n} - 2\sqrt{\frac{2}{\pi}}\exp\left(-\frac{c^2}{2}\right) - 2c\left(1 - 2\Phi(-c)\right)\right)$$

Variance: Using c and d from above, the variance is

$$\begin{aligned} \operatorname{Var}\left(\hat{C}_{pk}\right) ={}& \frac{d^2}{36}\,\frac{n-1}{n-3} - \frac{d}{9\sqrt{n}}\,\frac{n-1}{n-3}\left(\sqrt{\frac{2}{\pi}}\exp\left(-\frac{c^2}{2}\right) + c\left(1 - 2\Phi(-c)\right)\right) + \frac{1}{9}\,\frac{n-1}{n(n-3)}\left(1 + c^2\right) \\ &- \frac{n-1}{72n}\left(\frac{\Gamma\left(\frac{n-2}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}\right)^2\left(d\sqrt{n} - 2\sqrt{\frac{2}{\pi}}\exp\left(-\frac{c^2}{2}\right) - 2c\left(1 - 2\Phi(-c)\right)\right)^2 \end{aligned}$$

Lower CI: $$E\left(\hat{C}_{pk}\right) - k\sqrt{\operatorname{Var}\left(\hat{C}_{pk}\right)}$$

Upper CI: $$E\left(\hat{C}_{pk}\right) + k\sqrt{\operatorname{Var}\left(\hat{C}_{pk}\right)}$$

CPM (process capability index, Cpm): $$\frac{\min\left(\text{target} - LSL,\ USL - \text{target}\right)}{3\sqrt{s^2 + \left(\text{mean} - \text{target}\right)^2}}$$

CIs for CPM:

Lower CI on CPM: $$CPM\sqrt{\frac{\chi^2_{\frac{1-\alpha}{2},\,\gamma}}{\gamma}}$$

Upper CI on CPM: $$CPM\sqrt{\frac{\chi^2_{1-\frac{1-\alpha}{2},\,\gamma}}{\gamma}}$$

CPL (process capability ratio case of one-sided lower specification): (mean – LSL)/3s, where s is the estimated standard deviation

CPU (process capability ratio of one-sided upper specification): (USL – mean)/3s

In Japan, a capability index of 1.33 is considered to be the minimum acceptable. For a normal distribution, this gives an expected number of nonconforming units of about 6 per 100,000.

Exact 100(1 – α)% lower and upper confidence limits for CPL are computed using a generalization of the method of Chou et al. (1990), who point out that the 100(1 – α)% lower confidence limit for CPL (denoted by CPL_LCL) satisfies the equation

$$\Pr\left\{T_{n-1}\left(\delta = 3\sqrt{n}\,CPL_{LCL}\right) \le 3\,CPL\sqrt{n}\right\} = 1 - \alpha$$

where T_{n–1}(δ) has a non-central t-distribution with n – 1 degrees of freedom and noncentrality parameter δ.

Exact 100(1 – α)% lower and upper confidence limits for CPU are also computed using a generalization of the method of Chou et al. (1990), who point out that the 100(1 – α)% lower confidence limit for CPU (denoted CPU_LCL) satisfies the equation

$$\Pr\left\{T_{n-1}\left(\delta = 3\sqrt{n}\,CPU_{LCL}\right) \ge 3\,CPU\sqrt{n}\right\} = 1 - \alpha$$

where T_{n–1}(δ) has a non-central t-distribution with n – 1 degrees of freedom and noncentrality parameter δ.

At the bottom of the report, Z statistics are reported. Z represents (according to the AIAG Statistical Process Control manual) the number of standard deviation units from the process average to a value of interest such as an engineering specification. When used in capability assessment, Z USL is the distance to the upper specification limit and Z LSL is the distance to the lower specification limit.

Z USL = (USL-Xbar)/sigma = 3 * CPU

Z LSL = (Xbar-LSL)/sigma = 3 * CPL

Z Bench = Inverse Cumulative Prob(1 - P(LSL) - P(USL))

where

P(LSL) = Prob(X < LSL) = 1 - Cum Prob(Z LSL)

P(USL) = Prob(X > USL) = 1 - Cum Prob(Z USL)
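The point estimates in Table 2.1, the Z statistics, and Sigma Quality can be collected into one small Python sketch. It takes the process mean and estimated standard deviation as inputs and applies the formulas given in this section; the confidence intervals are omitted. Function names and the example process are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def capability_indices(mean, s, lsl, usl, target):
    """Point estimates from Table 2.1 plus the Z statistics.

    mean and s are the process mean and estimated standard deviation.
    """
    std_normal = NormalDist()
    cpl = (mean - lsl) / (3 * s)
    cpu = (usl - mean) / (3 * s)
    z_usl = (usl - mean) / s                  # = 3 * CPU
    z_lsl = (mean - lsl) / s                  # = 3 * CPL
    p_lsl = 1 - std_normal.cdf(z_lsl)         # Prob(X < LSL)
    p_usl = 1 - std_normal.cdf(z_usl)         # Prob(X > USL)
    return {
        "CP": (usl - lsl) / (6 * s),
        "CPL": cpl,
        "CPU": cpu,
        "CPK": min(cpl, cpu),
        "CPM": min(target - lsl, usl - target) / (3 * sqrt(s ** 2 + (mean - target) ** 2)),
        "Z USL": z_usl,
        "Z LSL": z_lsl,
        "Z Bench": std_normal.inv_cdf(1 - p_lsl - p_usl),
    }


def sigma_quality(expected_defects, n):
    """Normal quantile of (1 - defect rate) plus the 1.5 shift."""
    return NormalDist().inv_cdf(1 - expected_defects / n) + 1.5


# Hypothetical process: mean 10, s = 1, spec limits 7 and 13, target 10
print(capability_indices(10.0, 1.0, 7.0, 13.0, 10.0))
print(round(sigma_quality(3, 1_000_000), 2))   # compare with the 6.03 quoted in the text
```

For this centered process every index equals 1, and Z Bench is smaller than Z USL and Z LSL because it combines the nonconforming probability from both tails.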

Note: You can also do a non-normal capability analysis through Fit Distribution options, described in the next section. After you fit a distribution, you have the option to generate quantiles and a target value for the fitted distribution. If you give a target value, a capability analysis is automatically generated by using the quantile values and target you specified.

3 The Fit Y by X Platform

Any time two variables need to be compared, the Fit Y by X platform is the choice to make. This single platform produces one-way ANOVA, scatterplots, and contingency table analysis, which together cover most of the two-variable analyses seen in an introductory statistics course.

Introduction

After starting JMP Student Edition,

Open the data file Denim.jmp.

Details about this data are found in Chapter 2, “The Distribution Platform” in the section “About the Data,” p. 21.


In this introduction section, the variables are examined in pairs.

Select Analyze > Fit Y By X from the menu bar.

This brings up the Fit Y By X platform launch dialog, as seen in Figure 3.1.

Figure 3.1 Fit Y By X Launch Dialog

Notice the word “Contextual” in the title bar. It is there because this dialog launches other platforms depending on the modeling types (continuous or categorical) of the variables in the analysis. More information on modeling types is found in “The Modeling Type of Variables,” p. 29. Initially, this example consists of three analyses, with Starch Content (%) as the Y variable in all of them. Method, Size of Load (lbs), and Sand Blasted? are the X variables. All three analyses are requested at the same time to illustrate some of JMP Student Edition’s interactive capabilities. These analyses would be equally valid if performed separately.

Select Starch Content (%) from the list of columns and click the Y, Response button.

To select all three X variables, click on Method, hold down the Shift key, then click Sand Blasted?.

Note that these dialog boxes respond to dragging as well as button clicks, as in the next step.

Drag these highlighted variables to the box to the right of the X, Factor button.

Click OK.

Three plots appear as in Figure 3.2.

Figure 3.2 Fit Y by X Results

On the far left and far right, dot plots of each level of a nominal variable are plotted side by side, a situation leading to one-way ANOVAs. In the middle plot, JMP Student Edition produces a scatterplot of two continuous variables, a situation leading to fitting lines and curves.

Computing a t-test

As a simple example, examine the plot on the far right, relating starch content to whether the fabric was sand blasted or not. Is the starch content different for the two levels of Sand Blasted? This is a typical situation examined with a two-sample t-test. To conduct the t-test,

Select t test from the drop-down menu in the plot’s title bar.

The t-test report appears in the outline beneath the plot labeled t test.

Figure 3.3 t-test Results

Some things should be noticed about this report.


• There is a statement on the second line of the t-test report that says “Assuming unequal variances”. This test is also known as the unpooled t-test. If you want the pooled version (where the variances are assumed to be equal), select the Means/Anova/Pooled t command.

Select the Means/Anova/Pooled t command from the report’s drop-down menu.

• The plot is embellished with means diamonds, and additional text reports appear. All of these are discussed later in this tutorial in “Analysis of Variance (ANOVA),” p. 50. Here, note that the p-value for the pooled t-test is listed beside Prob>|t| in the new pooled t test report. This is the same value as listed in the new Analysis of Variance report in the column labeled Prob>F. In essence, JMP Student Edition has tested the same hypothesis twice, with two different methods, and both methods agree (as they always should!). In fact, the square of the pooled t statistic is equal to the value of the F statistic (listed in the ANOVA table as F Ratio).

From the drop-down menu in the Oneway Analysis title bar, select Means/Anova/t test to remove the pooled t report.

• This is a two-sample t-test, not a matched pairs t-test. If the data from the two groups have a natural pairing (for example, the before-and-after measurements of a patient taking an experimental medication), use the Matched Pairs platform. Details on matched pairs are found in “The Matched Pairs Platform” chapter, p. 71.
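The agreement between the pooled t-test and the ANOVA can be checked numerically. The Python sketch below computes the pooled t statistic, the unpooled (Welch) t statistic, and the one-way ANOVA F ratio from the textbook formulas; with two groups, the square of the pooled t equals F, while the unpooled t generally differs slightly. Function names and data are hypothetical, and `oneway_f` accepts any number of groups.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def unpooled_t(a, b):
    """Two-sample t statistic without assuming equal variances (Welch)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def oneway_f(groups):
    """One-way ANOVA F ratio: between-group mean square over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements for two groups
group_a = [5.1, 4.8, 5.5, 5.0]
group_b = [6.0, 5.7, 6.3, 5.9, 6.1]
t = pooled_t(group_a, group_b)
print(t ** 2, oneway_f([group_a, group_b]), unpooled_t(group_a, group_b))
```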

Pooled t test

Now, examine the plot on the left side in the report in Figure 3.2, of Starch Content (%) vs. Method. Denim washed with Alpha Amalyze appears to have a lower starch content than denim washed with Caustic Soda or Pumice Stone. For more specificity, it is helpful to look at text reports of these results, examining the mean, median, standard deviation, and quantiles for the three levels of the Method variable, which are produced as follows.

From the drop-down menu in the Oneway Analysis title bar, select Quantiles.

Text reports appear below the plot, and box plots are superimposed on the plot. For details on box plots, see the section titled “Quantile Box Plots,” p. 32, or the online help.

From the same drop-down menu, select Means and Std Dev.

In addition to the new text reports, mean error bars and standard deviation lines appear on the plot. Together with the box plots, they give clues to the underlying distribution of each level. Details on these additions are found in “One Way ANOVA—The Continuous by Categorical Case,” p. 61.

These additions can also be removed.

In the same drop-down menu, select Quantiles and Means and Std Dev again.

Selecting and Marking Points

This plot is useful not only for computing results, but also for selecting points, which are then highlighted in the other plots.

Select the Lasso tool from the Tools toolbar, as shown in Figure 3.4.

Figure 3.4 Lasso Tool

The Lasso tool is used to draw curves around points. The “captured” points become selected.


While holding down the mouse button, drag the Lasso tool completely around the points for Alpha Amalyze, as shown in Figure 3.5.

Release the mouse button.

Figure 3.5 Selecting Points with the Lasso Tool

JMP Student Edition briefly flashes how many points are contained in the selection region (32 in this case) and selects the points. Notice that these points are highlighted in all the plots, and in the data table. The Lasso tool works with all plots that show individual points, like scatterplots and leverage plots.

To make these points distinctive, assign them a unique color and marker.

With the points selected, right-click inside the plot, select Row Colors from the popup menu, and choose a red color from the color palette.

The Alpha Amalyze points turn red in all the plots.

Again, right-click inside the plot and select Row Markers. Select the small triangle from the markers palette.

The Alpha Amalyze points change to the triangle in all the plots.

In fact, there is an easier way to change the colors and markers of points in a plot if there is a certain column that divides up the data. For example, suppose that plots are needed that clearly distinguish the three levels of the Method variable. To mark all the data at once,

From the Rows menu, select Color or Mark by Column.

In the resulting dialog box, select Method from the list of variables. Make sure that the Set Color by Value, Set Marker by Value, and Make Window with Legend checkboxes are checked.

Unique colors and markers are assigned to each level of the Method variable in all the plots.


Analysis of Variance (ANOVA)

Is knowledge of the wash method useful in predicting starch content of the denim? The statistical test to answer this question is called a one-way ANOVA, and is produced in the same way as the t-test above.

Select Means/Anova from the drop-down menu next to Oneway Analysis.

Note that this command reads Means/Anova/t test when the categorical variable has only two levels. In all other cases (like this one), the t test is not appropriate, so it is not available on the menu.

An ANOVA table appears beneath the plots, and means diamonds appear on the plot. A means diamond illustrates a sample mean and its 95% confidence interval, as shown by the schematic in Figure 3.6. The horizontal line across each diamond represents the group mean. The vertical span of each diamond represents the 95% confidence interval for each group. Overlap marks are drawn above and below the group mean. For groups with equal sample sizes, overlapping overlap marks indicate that the two group means are not significantly different at the 95% confidence level.

Figure 3.6 Means Diamonds Illustrated

Examining the ANOVA table shows that Method is a highly significant predictor of starch content. In other words, at least one level of the Method variable has a significantly higher or lower starch content than the others. The obvious question is which levels are different from each other. JMP Student Edition uses comparison circles to explore this.

Comparison Circles

To show comparison circles,

Select Compare Means > Each Pair, Student’s t.

Complete details of comparison circles are on page 63. Put simply, they show differences among levels of a variable, and they are clickable. When a circle is clicked, it turns red, levels that are not significantly different from it turn red, and levels that are significantly different from it turn gray. To see this,

Click on the bottom comparison circle, corresponding to Alpha Amalyze.


Figure 3.7 Starch Content Comparison Circles

The display changes to the one shown in Figure 3.7. This shows that Alpha Amalyze is significantly different from the other two wash methods.

Click on the other two circles to discover their relationships.

These comparison circles are based on the confidence interval around the mean, which is itself based on the α level. By default, the α-level is 5%. However, it can be changed.

From the popup menu in the Oneway title bar, select Set α Level > .10.

Notice that the comparison circles change diameter when the α-level changes.

Fitting Lines

The middle plot of the report in Figure 3.2 is of two continuous variables, a situation that allows fitting of lines and curves through least-squares regression.

For example, suppose you want to predict starch content based on the size of the wash load. A good guess may simply be the mean starch content from all the data points. To see this mean,

Select Fit Mean from the drop-down list in the title bar of the plot.

A line representing the mean appears on the plot, and Fit Mean appears in a legend below the plot. Notice that Fit Mean below the plot has its own drop-down menu, as shown in Figure 3.8.

Figure 3.8 Fit Mean Results



A more interesting statistical question is whether a line or a curve is a better predictor of starch content than this simple mean. To fit a regression line to this data,

Select Fit Line from the platform menu in the plot’s title bar.

A line is superimposed on the graph. This line should be compared with the simple mean to see if it is helpful in prediction. To do this comparison, JMP Student Edition can draw confidence intervals for the fit around the fitted line. If these confidence intervals do not contain the horizontal mean, then the fitted line is helpful.

Select Confid Curves Fit from the Linear Fit menu in the legend below the plot.

As seen in Figure 3.9, the dotted confidence interval around the linear fit does not contain the mean. Therefore, the linear fit is statistically significant, and it is statistically sound to use the fitted line in predictions.

Figure 3.9 Fit Line Results

There is also an option to produce shaded confidence curves, using the Confid Shaded Fit command.

The equation of the line, as well as several computed statistics, is found in the Linear Fit report. Values of the slope and intercept are also printed in the Parameter Estimates section of the report.

Aside from the graphical confidence-curve method detailed above, there are numerical measures of the significance. One is the p-value associated with the slope of the line, also found in the Parameter Estimates report. In this case, the p-value is 0.0045, significant by almost any standard, reinforcing the graphical results above.
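The slope, intercept, and slope t ratio in the Parameter Estimates report follow the usual least-squares formulas. The Python sketch below applies those textbook formulas; it is not JMP's code, and the function name and data are hypothetical. The slope's p-value would come from a t distribution with n – 2 degrees of freedom, which the standard library does not provide, so only the t ratio is computed.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x.

    Returns slope, intercept, the t ratio for the slope, and the residuals.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    rmse = sqrt(sum(e ** 2 for e in residuals) / (n - 2))   # root mean square error
    t_slope = slope / (rmse / sqrt(sxx))                     # estimate / standard error
    return slope, intercept, t_slope, residuals


# Hypothetical (x, y) pairs, not taken from Denim.jmp
slope, intercept, t_slope, residuals = fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(slope, intercept, t_slope)
```

Because the fitted line includes an intercept, the residuals sum to zero apart from rounding error.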



Correlation Coefficient

Another measure of fit is the correlation coefficient, frequently denoted by r. Its value does not appear on any of the reports so far, although the square of its value (r²) is listed beside RSquare in the Summary of Fit text report. To compute the value of r itself, request a density ellipse.

Select Density Ellipse > .95 from the platform menu on the plot’s title bar.

A new report named Correlation appears at the bottom of the text reports. It is initially closed, but can be opened by clicking on the blue disclosure icon (Figure 3.10).


Figure 3.10 The Disclosure Icon

The correlation coefficient is listed under the word Correlation. It is interesting to note that its significance (p = 0.0045) is the same as that listed for the slope coefficient in the Parameter Estimates table, and the same as the Prob>F value in the Analysis of Variance table.
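The equal p-values follow from the algebra of simple linear regression: the t ratio for the slope equals r√(n – 2)/√(1 – r²), and its square is the F Ratio in the Analysis of Variance table. The Python sketch below, with a hypothetical function name and data, computes r, r², and both t ratios so they can be compared.

```python
from math import sqrt


def correlation_and_slope_t(x, y):
    """Return r, r squared, the t ratio computed from r, and the slope's t ratio."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    t_from_r = r * sqrt(n - 2) / sqrt(1 - r ** 2)
    # Slope t ratio: slope divided by its standard error
    slope = sxy / sxx
    sse = syy - slope * sxy                  # residual sum of squares
    t_slope = slope / sqrt(sse / (n - 2) / sxx)
    return r, r ** 2, t_from_r, t_slope


# Hypothetical (x, y) pairs
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r, r_squared, t_from_r, t_slope = correlation_and_slope_t(x, y)
print(r, r_squared, t_from_r, t_slope)
```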

Residuals

One of the best diagnostic tools for a linear fit is its residuals. All (good) introductory statistics textbooks discuss interpretation of residuals, which is not duplicated here. However, the first step in interpretation of residuals is to see them plotted. Because each fit has its own residuals, the residuals commands are found in the fit menus in the legends below the plots. To see residuals for the linear fit,

Select Plot Residuals from the fit popup menu in the legend below the plot.

A plot of the residuals appears at the bottom of the report. The red horizontal line on the plot shows the mean of the residuals (which should, ideally, be near zero).

Figure 3.11 Residual Plot

The plot is getting cluttered, so before continuing, remove the fits that are there now.

For each of the fits below the plot, select Remove Fit from the fit popup menu.

Another interesting question is whether a single line (like this model) is enough to describe starch content for all wash methods, or if a different line is needed for each level of the Method variable. In other words, is starch content related to load size in the same way when washed in Alpha Amalyze as when washed in Caustic Soda or Pumice Stone? Although this question is more in the realm of the Fit Model platform (detailed in “The Fit Model Platform” chapter), some initial investigation is easy with this platform.

First, instruct JMP Student Edition to group its calculations for each level of the Method variable.

From the platform menu on the plot’s title bar, select Group By.

In the resulting dialog box, select Method as the grouping variable.

Now, request JMP Student Edition to fit a line as before.

Select Fit Line from the Platform popup menu on the title bar above the plot.

Separate lines for each wash method appear on the plot, as shown in Figure 3.12.

Figure 3.12 Group By and Linear Fits

The statistical question is whether these lines are different enough to warrant the extra trouble in reporting all three, instead of the more compact (but possibly less accurate) reporting of the single line found previously.

Two-Way Contingency Tables

In the next example, both X and Y are categorical variables. The question is whether the method of washing denim has an effect on thread wear. The analysis uses contingency tables, which are orderly ways of arranging count data. To generate a contingency table for this problem,

Select Analyze > Fit Y by X from the menu bar.

Assign Thread Wear (not Thread Wear Measured) to the Y, Response role.

Assign Method to the X, Factor role.

Click OK.

Since both variables are categorical, a mosaic plot appears, followed by a contingency table. Details of these displays are in the section “Contingency Analysis—The Categorical by Categorical Case,” p. 66.

Note that the mosaic plot is clickable, like all plots in JMP Student Edition. For example, to select all rows washed in Alpha Amalyze with a low thread wear,

Click in the lower, red section of the mosaic plot, in the bar above Alpha Amalyze (see Figure 3.13).


Fit Y by X: Comparing Two Variables

Figure 3.13 Mosaic Plot

Just below the contingency table are tests for the independence of the two variables. The p-value for this test appears in the column labeled Prob>ChiSq, which in this case is a nonsignificant 0.76. There is not enough evidence to say that these two variables are not independent; in other words, there is not enough evidence to say that the thread wear of denim is affected by wash method.
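The Pearson chi-square statistic, one of the tests in this table, compares each observed count with the count expected if the two variables were independent. A minimal sketch with a hypothetical 3×3 table of counts (not the Denim.jmp counts); turning the statistic into a p-value needs the chi-square distribution, which the Python standard library does not provide, so only the statistic and its degrees of freedom are computed:

```python
# Hypothetical counts: rows are wash methods, columns are thread-wear levels.
counts = [
    [8, 6, 4],
    [7, 7, 5],
    [9, 5, 4],
]

row_totals = [sum(r) for r in counts]
col_totals = [sum(c) for c in zip(*counts)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(counts):
    for j, observed in enumerate(row):
        # Expected count under independence: row total * column total / grand total.
        expected = row_totals[i] * col_totals[j] / n
        # Each cell contributes (observed - expected)^2 / expected.
        chi_square += (observed - expected) ** 2 / expected

df = (len(counts) - 1) * (len(counts[0]) - 1)
print(round(chi_square, 4), df)
```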

Logistic Regression

The Logistic platform fits the probabilities for response categories to a continuous x predictor. The fitted model estimates the probability of each response category at each x value. The logistic platform is the nominal/ordinal by continuous personality of the Fit Y by X command. There is a distinction between nominal and ordinal responses on this platform:

• Nominal logistic regression estimates a set of curves to partition the attributed probability among the responses.

• Ordinal logistic regression models the probability of being less than or equal to a given response. This has the effect of estimating a single logistic curve, which is shifted horizontally to produce probabilities for the ordered categories. This model is less general but more parsimonious, and is recommended for ordered responses.
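The ordinal model can be sketched as one logistic curve evaluated with a different intercept for each cut point between categories; the category probabilities are the gaps between successive cumulative curves. This is a minimal sketch with hypothetical parameter values, and the sign convention (intercept minus slope times x) is an assumption, since software packages parameterize the model differently:

```python
import math

def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted parameters: one increasing intercept per cut point
# (low|moderate, moderate|severe) and a single slope shared by all cut points.
intercepts = [-2.0, 1.0]
slope = 0.01

def category_probs(x: float) -> list[float]:
    # Cumulative probabilities P(Y <= j): the same curve, shifted by each intercept.
    cumulative = [logistic(a - slope * x) for a in intercepts] + [1.0]
    # Category probabilities are the differences between successive cumulative values.
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]

probs = category_probs(250)
print([round(p, 3) for p in probs])  # low, moderate, severe
```

Because every cut point shares the slope, only one curve shape is estimated, which is what makes the ordinal model more parsimonious than the nominal one.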

As an example,

Select Analyze > Fit Y by X.

Assign Thread Wear as Y, Response and Size of Load (%) as X, Factor.

Click OK.

The report that appears shows the probability that the thread wear is low, moderate, or severe for each load size.



The p-value of 0.0657 hints at a weak association between these two variables.

The Formula Editor

A powerful (and often underused) feature of JMP Student Edition is its formula editor. Formulas serve a wide variety of purposes, from assigning simple values to performing complex calculations with parameters and conditional clauses. They are especially useful for transforming data.

A column whose values are computed by a formula is both linked and locked. It is linked to (and dependent on) all other columns that are part of its formula. Its values are automatically recomputed whenever the values in these columns are edited. It is locked so that its data values cannot be edited individually. In the Denim.jmp sample data table, the ordinal variable Thread Wear is computed with a formula that partitions the values of Thread Wear Measured into low, moderate, and severe categories. To see the formula,

Right-click in the heading of the column Thread Wear.

Select Formula from the menu that appears.

Click Cancel to return to the data table.

The Formula Editor window operates like a pocket calculator with buttons, displays, and an extensive list of easy-to-use features.

Example

The essential features of the formula editor are best seen through an example. Suppose that it is necessary to calculate the logarithm of the values in the Starch Content (%) column (a common calculation in real-world statistics). A new column is needed to store the new values. There are two ways to create a new column.



From the main menu bar, select Cols > New Column.

Alternatively, double-click in the area to the right of the last column in the data table.

When a new column is created, the default title is highlighted and ready to be changed.

Change the title of the new column to Log Starch.

Now, add a formula to the column.

Right-click in the heading of the column Log Starch.

Select Formula from the menu that appears.

When the formula editor appears,

Click on Transcendental from the Functions list, then select Log from the resulting menu.

Note that Log is the natural logarithm (base e). Common (base 10) logs are computed using the Log10 function.

Select Starch Content (%) from the columns list.

Click OK to close the Formula Editor and apply the formula.
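The Log Starch formula is evaluated row by row. The same calculation in Python, as a sketch with hypothetical starch values, shows the natural-log versus base-10 distinction noted above: Python's math.log is base e and math.log10 is base 10.

```python
import math

starch = [10.0, 12.5, 20.0]  # hypothetical Starch Content (%) values

# Natural log (base e), like the formula editor's Log function.
log_starch = [math.log(v) for v in starch]
# Common log (base 10), like the Log10 function.
log10_starch = [math.log10(v) for v in starch]

print([round(v, 4) for v in log_starch])
print([round(v, 4) for v in log10_starch])
```

The two differ only by a constant factor: ln(v) = log10(v) × ln(10).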

Further examples, as well as complete documentation of all the formula editor functions, are found in JMP Student Edition’s online help.

58 3 The Fit Y by X Platform Scatterplots—The Continuous by Continuous Case

Scatterplots—The Continuous by Continuous Case

If both the X and Y variables are continuous, JMP Student Edition produces a bivariate analysis that initially shows a scatterplot. There are a number of options once the scatterplot appears, all accessed through the popup menu beside the variable name in the title bar (Figure 3.14).

Figure 3.14 Bivariate Popup Menu

Show Points alternately hides or displays the points in the plot.

It is often useful to first fit the mean as a reference line for other fits. The Fit Mean command adds a horizontal line to the plot at the mean of the response variable (Y). As with all the fitting commands in this platform, a legend appears below the plot with its own drop-down menu, where additional commands for each fit are accessed. The Fit Mean Fit table shows the value of the mean, its standard deviation, its standard error, and the sum of squared errors around the mean.
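The four quantities in the Fit Mean Fit table can be sketched directly, assuming the usual sample standard deviation with an n − 1 divisor; the response values here are hypothetical:

```python
import math
from statistics import fmean, stdev

y = [12.0, 14.5, 15.0, 18.5, 20.0]  # hypothetical response values

mean = fmean(y)
sd = stdev(y)                          # sample standard deviation (n - 1 divisor)
std_error = sd / math.sqrt(len(y))     # standard error of the mean
sse = sum((v - mean) ** 2 for v in y)  # sum of squared errors around the mean

print(round(mean, 4), round(sd, 4), round(std_error, 4), round(sse, 4))
```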

Figure 3.15 Fit Commands

The Fit Line command adds a straight-line fit to the plot using least squares regression. Its drop-down menu (accessed as in Figure 3.15) has commands to save predicted values and residuals for the linear fit as new columns in the current data table.


If the confidence area around this line (produced through the Confid Curves Fit command) includes the horizontal line at the response mean, then the slope of the line of fit is not significantly different from zero at the 0.05 significance level.

The Fit Polynomial command fits a polynomial curve of the degree selected from the Fit Polynomial submenu. After selecting the polynomial degree, the curve is fit to the data points using least squares regression. 95% confidence limits are plotted with the Confid Curves Fit and Confid Curves Indiv display options.

The Fit Polynomial option can be selected multiple times with different polynomial degrees for comparison. As with the linear fit, options can save predicted values and residuals as new columns in the current data table for each polynomial fit.

Each time a linear or polynomial fit is chosen, three additional tables are appended to the report (see Figure 3.16). A Lack of Fit table also appears if there are replicates. For details of the lack of fit test, see “The Lack of Fit Table,” p. 90 or the online help.

Figure 3.16 Linear Fit Text Reports

The Summary of Fit Table

The summary of fit table shows the following information:

• R² (RSquare), a measure of how well the line fits

• The adjusted R² (RSquare Adj), used to compare models with different numbers of variables

• The Root Mean Square Error, an estimate of the standard deviation of the random error

• The mean of the response variable

• The number of observations (or, if weighted variables are involved, the sum of the weights).


The Lack of Fit Table

The Lack of Fit table shows a special diagnostic test that appears only when the data and the model provide the opportunity. It is a test to see if a different form of the model would fit the data better. A significant F statistic in this table indicates that a different model should be examined. For details of the lack of fit test, see “The Lack of Fit Table,” p. 90 or the online help.

Analysis of Variance Table

This table looks similar to the ANOVA tables in most textbooks. However, the terminology for the table’s parts may differ. JMP Student Edition uses the following:

• Source lists the three sources of variation, called Model, Error, and C Total. The “C” in C Total stands for corrected, as in corrected for the mean.

• DF records the associated degrees of freedom (DF) for each source of variation.

• Sum of Squares records an associated sum of squares (SS for short) for each source of variation.

• Mean Square is a sum of squares divided by its associated degrees of freedom.

• F Ratio is the model mean square divided by the error mean square. The underlying hypothesis of the fit is that all the regression parameters (except the intercept) are zero. If a parameter is a significant model effect, the F Ratio is usually higher than expected by chance alone.

• Prob > F is the observed significance probability (p-value) of obtaining a greater F value by chance alone if the specified model fits no better than the overall response mean.

Parameter Estimates Table

The terms in the Parameter Estimates table for a linear fit (seen previously in Figure 3.16) are the intercept and the single X variable.

The Parameter Estimates table displays the following:

• Term lists the name of each parameter in the requested model. The intercept is a constant term in all models.

• Estimate lists the parameter estimates of the linear model. These estimates are the coefficients in the linear model.

• Std Error lists the estimates of the standard errors of the parameter estimates. They are used in constructing tests and confidence intervals.

• t Ratio lists the test statistics for the hypothesis that each parameter is zero. It is the ratio of the parameter estimate to its standard error. Looking for a t ratio greater than 2 in absolute value is a common rule of thumb for judging significance, because it approximates the 0.05 significance level.

• Prob>|t| lists the observed significance probability calculated from each t ratio. It is the probability of getting, by chance alone, a t ratio greater (in absolute value) than the computed value, given a true hypothesis. Often, a value below 0.05 is interpreted as evidence that the parameter is significantly different from zero.


Other Fitting Commands

The Fit Special command displays a dialog with choices for transformations of both the Y and X variables. Transformations include log, square root, square, reciprocal, and exponential.

The fitted line is plotted on the original scale, so it appears as a curve on the plot. The regression report is shown with the transformed variables, but an extra report shows measures of fit transformed in the original Y scale (if there was a Y transformation).

The Fit Each Value command fits a value to each unique X value. Fitting each value is like doing a one-way analysis of variance, but in the continuous by continuous bivariate platform. Compare it to other fitted lines to see the concept of lack of fit.

The Density Ellipse command draws an ellipse that contains the specified mass of points, determined by the probability chosen from the Density Ellipse submenu. The Other selection allows the specification of any probability greater than zero and less than or equal to one.

The density ellipse is a good graphical indicator of the correlation between two variables. The ellipse collapses diagonally as the correlation between the two variables approaches either 1 or –1. The ellipse is more circular (less diagonally oriented) if the two variables are uncorrelated.

The Density Ellipse table that accompanies each Density Ellipse fit shows the correlation coefficient (r) for the X and Y variables and the significance probability for the test that the correlation is zero.

The Group By command in the fitting menu displays a dialog, allowing selection of a classification (grouping) variable. When a grouping variable is selected, the Fit Y by X platform computes a separate analysis for each level of the grouping variable, and overlays the regression curves or ellipses on the scatterplot. The fit for each level of the grouping variable is identified beneath the scatterplot, with individual popup menus to save or remove fitting information.

The Group By command is checked in the fitting menu when a grouping variable is in effect. To change a grouping variable that is already in effect,

Select the Group By command to remove (uncheck) the existing variable.

Then, select the Group By command again and respond to its dialog as before.

One Way ANOVA—The Continuous by Categorical Case

If the X variable is categorical and the Y variable is continuous, JMP Student Edition produces a one-way ANOVA, initially displaying a plot that shows a vertical distribution of Y points for each X value. There are a number of options once this scatterplot appears, all accessed through the popup menu beside the variable name in the title bar (Figure 3.17).


Figure 3.17 One Way ANOVA Popup Menu

The Quantiles command displays the Quantiles table, which lists the 0% (minimum), 10%, 25%, 50% (median), 75%, 90%, and 100% (maximum) quantiles for each group. It also activates Box Plots from the Display Options menu.

The Means/Anova/t test command fits means for each group and performs a one-way analysis of variance to test if there are differences among the means. Three tables are produced: a summary table, a one-way analysis of variance table, and a table that lists group frequencies, means, and standard errors computed with the pooled estimate of the error variance. If there are only two groups, a t-test also shows. This option automatically activates the Means Diamonds display option. See “Analysis of Variance (anova),” p. 50 for a detailed description of means diamonds.
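The one-way analysis of variance compares the spread of the group means (Model) with the pooled spread inside the groups (Error). A minimal sketch with three hypothetical groups, using the standard textbook formulas:

```python
from statistics import fmean

# Hypothetical responses for three groups.
groups = {
    "A": [10.0, 12.0, 11.0],
    "B": [14.0, 15.0, 16.0],
    "C": [11.0, 13.0, 12.0],
}

all_y = [v for vals in groups.values() for v in vals]
grand_mean = fmean(all_y)
k, n = len(groups), len(all_y)

# Between-group (Model) and within-group (Error) sums of squares.
ss_model = sum(len(v) * (fmean(v) - grand_mean) ** 2 for v in groups.values())
ss_error = sum(sum((y - fmean(v)) ** 2 for y in v) for v in groups.values())

ms_model = ss_model / (k - 1)
ms_error = ss_error / (n - k)  # pooled estimate of the error variance
f_ratio = ms_model / ms_error

print(round(ss_model, 4), round(ss_error, 4), round(f_ratio, 3))
```

The pooled ms_error is also what the standard errors of the group means are based on in the means table described above.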

The Means and Std Dev command fits means for each group, but uses standard deviations computed within each group rather than the pooled estimate of the standard deviation used to calculate the standard errors of the means. This command also displays Means Dots, Error Bars, and Std Dev Lines display options.

Compare Means has a submenu that provides the following four multiple comparison methods for comparing sets of group means. All activate the Comparison Circles display option.

• Each Pair, Student’s t displays a table with Student’s t statistics for all combinations of group means.

• All Pairs, Tukey HSD displays a table that shows the Tukey-Kramer HSD (honestly significant difference) comparisons of group means.

• With Best, Hsu’s MCB displays a table that shows Hsu’s MCB (Multiple Comparison with the Best) comparisons of group means to the best (maximum or minimum) group mean.

• With Control, Dunnett’s displays a table showing Dunnett’s comparisons of group means with a control group.



Each multiple comparison test begins with a comparison circles plot, a visual representation of group mean comparisons. The plot follows with a table of means comparisons. The illustration in Figure 3.18 shows the alignment of comparison circles with the confidence intervals of their respective group means.

Figure 3.18 Alignment of Comparison Circles

Compare each pair of group means visually by examining how the comparison circles intersect. The outside angle of intersection tells whether group means are significantly different (see Figure 3.19). Circles for means that are significantly different either do not intersect or barely intersect, so that the outside angle of intersection is less than 90°. If the circles intersect by an angle of more than 90°, or if they are nested, the means are not significantly different.

If the intersection angle is close to 90°, it is easy to verify whether the means are significantly different by clicking on the comparison circle, thus highlighting it. The highlighted circle appears with a thick solid line. Circles representing means that are not significantly different from the highlighted circle show with thin lines (see Figure 3.20). Circles representing means that are significantly different show with a thick gray pattern. To deselect circles, click in the graph outside the circles.
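The 90° rule has a simple geometric form: two circles cross at exactly 90° when the squared distance between their centers equals the sum of their squared radii. This sketch treats each circle's radius as given (in JMP it is derived from the group mean's confidence interval for the chosen comparison method); the function name and values are hypothetical:

```python
import math

def compare_circles(center1: float, r1: float, center2: float, r2: float) -> str:
    """Classify two comparison circles by center distance versus sqrt(r1^2 + r2^2)."""
    d = abs(center1 - center2)
    cutoff = math.hypot(r1, r2)  # distance at which the circles meet at 90 degrees
    if math.isclose(d, cutoff):
        return "borderline"
    # Farther apart than the cutoff: shallower overlap, outside angle below 90 degrees.
    return "significantly different" if d > cutoff else "not significantly different"

print(compare_circles(10, 3, 15, 4))
print(compare_circles(10, 3, 16, 4))
print(compare_circles(10, 3, 12, 4))
```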

Figure 3.19 Angles in Comparison Circles
(Angle greater than 90 degrees: not significantly different. Angle equal to 90 degrees: borderline significantly different. Angle less than 90 degrees: significantly different.)


Figure 3.20 Comparison Circles after Clicking

The Nonparametric submenu allows computation of three nonparametric tests: the Wilcoxon, Median, and van der Waerden tests. Nonparametric tests are useful for testing whether groups are located the same (in their means or medians), without the usual analysis of variance assumption of normality. Nonparametric tests use functions of the response variable ranks, called rank scores.

• Wilcoxon rank scores are the simple ranks of the data. The Wilcoxon test is the most powerful rank test for errors with logistic distributions.

• Median rank scores are either 1 or 0 depending on whether a rank is above or below the median rank. The Median test is the most powerful rank test for errors with doubly exponential distributions.

• Van der Waerden rank scores are computed by dividing the ranks of the data by one plus the number of observations and then transforming the result to a normal score by applying the inverse of the normal distribution function. The Van der Waerden test is the most powerful rank test for errors with normal distributions.
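The three kinds of rank scores can be sketched with the standard library. The data are hypothetical, tied values share their average rank, and scoring a rank exactly equal to the median rank as 0 is an assumption of this sketch:

```python
from statistics import NormalDist

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

y = [3.1, 4.7, 2.2, 5.9, 4.0]
n = len(y)

wilcoxon = ranks(y)  # Wilcoxon scores: the simple ranks
median_rank = (n + 1) / 2
median_scores = [1 if r > median_rank else 0 for r in wilcoxon]  # 1 above the median rank
vdw = [NormalDist().inv_cdf(r / (n + 1)) for r in wilcoxon]      # van der Waerden scores

print(wilcoxon, median_scores, [round(v, 3) for v in vdw])
```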

The UnEqual Variances command tests for equality of group variances. It uses (and reports) four different tests: O’Brien’s test, the Brown-Forsythe test, Levene’s test, and Bartlett’s test. When the variances across groups are not equal, the usual analysis of variance assumptions are not satisfied, so the standard ANOVA F test is not valid. A valid variant of the standard ANOVA, called the Welch ANOVA, is also displayed.

Set Alpha Level has a submenu that allows a choice from the most common alpha levels, or the specification of any level with the Other selection. Changing the alpha level recalculates any confidence limits, adjusts the means diamonds on the plot if they are showing, and modifies the upper and lower confidence level values in reports.

Normal Quantile Plot shows overlaid normal quantile plots for each level of the X variable. Along with the standard normality-assessing capabilities of the single-variable Normal Quantile Plot, this plot shows both the differences in the means (vertical position) and the variances (slopes) for each level of the categorical X factor (Figure 3.21).



Figure 3.21 Normal Quantile Plot

Normal Quantile Plot has these additional options:

• Plot Actual by Quantile generates a quantile plot with the response variable on the y-axis and quantiles on the x-axis. The plot shows quantiles computed within each level of the categorical X factor.

• Plot Quantile by Actual reverses the x- and y-axes, as shown in Figure 3.21.

• Line of Fit draws the straight diagonal reference lines for each level of the X variable.

• Probability Labels shows probabilities on the right axis of the Quantile by Actual plot and on the top axis of the Actual by Quantile plot.

CDF plots the cumulative distribution function for all the groups in the Oneway report.

Save has a submenu of commands to save the following quantities as new columns in the current data table:

• Save Centered saves values computed as the response variable minus the mean of the response variable within each level of the factor variable.

• Save Standardized saves standardized values of the response variable computed within each level of the factor variable. This is the centered response divided by the standard deviation within each level.

• Save Normal Quantiles saves normal quantile values computed within each level of the categorical factor variable.
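The Save Centered and Save Standardized calculations can be sketched as per-level arithmetic. The rows are hypothetical, and the within-level standard deviation is assumed to use the usual n − 1 divisor:

```python
from statistics import fmean, stdev

# Hypothetical (factor level, response) rows.
rows = [("A", 10.0), ("A", 12.0), ("A", 11.0), ("B", 14.0), ("B", 16.0), ("B", 15.0)]

# Gather the responses for each level of the factor.
levels = {}
for level, y in rows:
    levels.setdefault(level, []).append(y)
level_stats = {lvl: (fmean(v), stdev(v)) for lvl, v in levels.items()}

# Centered: response minus its level mean. Standardized: centered / level standard deviation.
centered = [y - level_stats[lvl][0] for lvl, y in rows]
standardized = [(y - level_stats[lvl][0]) / level_stats[lvl][1] for lvl, y in rows]

print(centered)
print(standardized)
```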

Display Options allows addition or removal of plot elements.

• All Graphs shows or hides all graphs.

• Points shows data points on the scatterplot.

• Box Plots shows outlier box plots for each group.

• Means Diamonds draws means diamonds. Complete details of means diamonds are found in “Analysis of Variance (anova),” p. 50.

• Mean Lines draws a line at the mean of each group.

• Mean CI Lines draws lines at the upper and lower 95% confidence levels for each group.

• Mean Error Bars identifies the mean of each group with a large marker and shows error bars one standard error above and below the mean.

• Grand Mean draws the overall mean of the Y variable on the scatterplot.

• Std Dev Lines shows dotted lines one standard deviation above and below the mean of each group.


• Comparison Circles show comparison circles computed for the multiple comparison method selected in the platform menu.

• Connect Means connects the group means with a straight line.

• Mean of Means draws a line at the mean of the group means.

• X-Axis Proportional makes spacing on the x-axis proportional to the sample size of each level.

• Points Spread spreads points over the width of the interval.

• Points Jittered adds random horizontal jitter so that points that overlay on the same Y value can be seen.

Script Submenu

The Script submenu contains commands related to saving a script to redo an analysis.

• Redo Analysis repeats the analysis represented in the report.

• Save Script to Data Table generates a script that can redraw the report, and attaches it to the data table.

• Save Script to Report appends a script to the top of the report.

• Save Script to Script Window produces a script that can re-create the report in a text window. This script can then be edited or saved to an external file.

• Save Script for All Objects is useful when several analyses, like those from a By group or from several variables in a single Distribution report, are in the same window. The resulting script generates all reports in the window.

• Data Table Window brings the data table to the front of the display.

Contingency Analysis—The Categorical by Categorical Case

If both the X and Y variables are categorical, JMP Student Edition produces a contingency analysis that initially shows a mosaic plot, contingency table (sometimes referred to as a crosstabs table), and a table of chi-square tests.

Figure 3.22 Contingency Analysis


The popup menu for contingency analyses contains items to turn parts of the report on and off. Mosaic Plot, Contingency Table, and Tests all operate as toggles. Display Options > Horizontal Mosaic rotates the mosaic plot 90 degrees. The final item, Script, is explained in the section “Script Submenu,” p. 66.

Figure 3.23 Contingency Popup Menu

The contingency table itself has a popup menu to turn its cell contents on and off.

Figure 3.24 Contingency Table Popup Menu

• Count is the cell frequency, margin total frequencies, and grand total (total sample size).

• Total % is the percentage of cell counts and margin totals to the grand total.

• Row % is the percentage of each cell count to its row total.

• Col % is the percentage of each cell count to its column total.

• Expected is the expected frequency of each cell under the assumption of independence. It is computed as the product of the corresponding row total and column total, divided by the grand total.

• Deviation is the observed (actual) cell frequency minus the expected cell frequency.

• Cell ChiSq is the chi-square value computed for each cell as

$$\text{Cell ChiSq} = \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

Logistic Regression—The Categorical by Continuous Case

If the Y variable is categorical and the X variable is continuous, JMP Student Edition produces a logistic analysis that initially shows a logistic plot and text reports.

The cumulative logistic probability plot gives a complete picture of what the logistic model is fitting. At each x value, the probability scale in the y direction is divided up (partitioned) into probabilities for each response category. The probabilities are measured as the vertical distance between the curves, and the probabilities across all Y categories sum to 1.

Figure 3.25 Interpreting the Logistic Plot
(Callouts: P(thread wear is low), P(thread wear is moderate), and P(thread wear is severe) at load size 250; these three probabilities sum to one.)

Markers for the data are drawn at their x-coordinate, with the y position jittered randomly within the range corresponding to the response category for that row. The points tend to push the curves apart, making vertical space where they occur in numbers, and allow the curves to get close together where there are no data. The data push the curves because the criterion that is maximized is the product of the probabilities fitted by the model. The fit therefore avoids assigning small probabilities to observed points, which is what would happen to points crowded by the curves of fit.

There are only a couple of options in the Logistic drop-down menu (see Figure 3.26): turning the plot on and off through the Logistic Plot command, and Script options. (See “Script Submenu,” p. 66 for details on scripting options.)

The Whole Model Test Table

The Whole Model Test table shows whether the model fits better than constant response probabilities. This table is analogous to the Analysis of Variance table for a continuous response model. It is a specific likelihood-ratio chi-square test that evaluates how well the categorical model fits the data. The negative sum of logs of the observed probabilities is called the negative log-likelihood (–LogLikelihood). The negative log-likelihood for categorical data plays the same role as sums of squares in continuous data. Twice the difference in negative log-likelihood between the fitted model and the model with constant probabilities (the same at every x) is a chi-square statistic. This test statistic examines the hypothesis that the x variable has no effect on the response.
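The likelihood-ratio statistic can be sketched from the probability each model assigns to the category actually observed in each row. Everything here is hypothetical: the "fitted" probabilities are made-up numbers standing in for a real model's output, and the reduced model uses the overall category proportions as its constant probabilities:

```python
import math

# Observed response category for each row (hypothetical).
observed = ["low", "low", "moderate", "severe", "moderate", "low"]

# Hypothetical probabilities a fitted model assigns to each row's observed category.
fitted_probs = [0.7, 0.6, 0.5, 0.4, 0.5, 0.6]

# Reduced model: constant probabilities equal to the overall category proportions.
n = len(observed)
props = {c: observed.count(c) / n for c in set(observed)}
reduced_probs = [props[c] for c in observed]

# Negative log-likelihood: minus the sum of logs of the probabilities of what was observed.
neg_ll_full = -sum(math.log(p) for p in fitted_probs)
neg_ll_reduced = -sum(math.log(p) for p in reduced_probs)

# Likelihood-ratio chi-square: twice the difference in negative log-likelihood.
chi_square = 2 * (neg_ll_reduced - neg_ll_full)
print(round(neg_ll_full, 4), round(neg_ll_reduced, 4), round(chi_square, 4))
```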

In Figure 3.25, the p-value of the Chi-square statistic is 0.3657, which is not statistically significant.

Platform Options

The menu of platform options is shown in Figure 3.26. They turn various plot elements on and off, and allow adjustment of the line color. In addition, the standard Script menu appears.

Figure 3.26 Logistic Plot Popup Menu

4 The Matched Pairs Platform

Some two-variable data have a natural pairing to them. A classic example is a before-and-after study of the effect of a medication. Data in this form are handled by the Matched Pairs platform.



Details about this data are found in “About the Data,” p. 21 in the “The Distribution Platform” chapter.

Preparing the Data

This example examines the starch content of blue jeans, with one group having been sand blasted, and the other not. The examination is of jeans that come from the same lot, so they form a paired situation, and call for the Matched Pairs platform.

To use the Matched Pairs platform, the paired data must be in two columns. However, in the Denim file, all the starch data is in the single column Starch Content (%). Therefore, the column needs to be split into two starch columns, based on whether the denim was sand blasted or not. To split the data in this way,

Select Tables > Split.

In the dialog that results, select Sand Blasted? from the list of columns and click Split By.

Select Starch Content (%) from the list of columns and click Split Columns.

Select Lot Number and Method and click Group.

At this point, the split command is set to make a new data table, having split Starch Content (%) into two columns based on the value in the Sand Blasted? column. Not all of the original variables are used in the forthcoming example analysis, so they do not need to be included in this new table. In fact, no variables other than the ones already in the dialog need to be retained. To drop the unnecessary variables,

Select the Drop All radio button at the bottom of the Split dialog.

In the Output table name box, type “Paired Denim” to name the new data table.

The dialog should appear like the one in Figure 4.1.


Figure 4.1 Split Columns Dialog

Click OK to create the data table.

The data table appears as in Figure 4.2, with new columns no and yes containing starch information.

Figure 4.2 Paired Denim Data
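The Split operation reshapes a stacked table into a wide one: each combination of the Group columns becomes one output row, each value of the Split By column becomes a new column, and the Split Columns supply the values. A minimal sketch with hypothetical rows (lot, method, sand-blasted flag, starch), not the actual Denim.jmp contents:

```python
# Hypothetical stacked rows: (lot, method, sand_blasted, starch).
stacked = [
    (1, "A", "no", 12.0), (1, "A", "yes", 10.5),
    (2, "B", "no", 13.0), (2, "B", "yes", 11.0),
]

# One output row per (lot, method) group, with one starch column per sand_blasted value.
wide = {}
for lot, method, blasted, starch in stacked:
    wide.setdefault((lot, method), {})[blasted] = starch

for (lot, method), cols in sorted(wide.items()):
    print(lot, method, cols.get("no"), cols.get("yes"))
```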


Select Analyze > Matched Pairs from the menu bar.

This brings up the Matched Pairs platform launch dialog as shown in Figure 4.4.


Paired Variables

In the dialog, select the two paired variables to be analyzed.

Select the no and yes variables from the columns list and click the Y, Paired Response button.

The resulting report is easily interpretable, as shown in Figure 4.3.

Figure 4.3 Paired Denim Matched Pairs Report
(Callouts: the horizontal gray line represents zero; the red line is the difference given by the data; the dotted lines are a 95% confidence interval on the difference, and if it does not contain zero, the difference is significant.)

The text reports below this plot show the same result: there is a difference in starch content of denim based on sand blasting, with a p-value of 0.002.

The Matched Pairs Launch Dialog

The Matched Pairs platform launch dialog (Figure 4.4) requires at least two variables to be entered. These two variables are the values that are paired.

Figure 4.4 Matched Pairs Launch Dialog

Optionally, a grouping variable can be entered in the X, Grouping role to have JMP Student Edition estimate means for the groups, and test both between and among the pairs. See the online help for an example using a grouping variable.

The Matched Pairs Scatterplot

After it is launched, the Matched Pairs platform displays a scatterplot and numerical results. The primary graph in the platform is a plot of the difference of the two responses on the y-axis, and the mean of the two responses on the x-axis. This graph is the same as a scatterplot of the two original variables, but turned 45° clockwise (see Figure 4.5). A 45° rotation turns the original coordinates into a difference and a sum. By rescaling, this plot shows the difference between the two variables, and the mean of the two variables. See the online help for more details of this transformation.
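The rotation can be checked numerically. This sketch rotates one hypothetical pair (y1, y2) clockwise by 45° and rescales the result; taking the difference as y2 − y1 is an assumption here, so check the report for the order JMP uses:

```python
import math

y1, y2 = 12.0, 10.5  # hypothetical paired values (for example, "no" and "yes")

diff = y2 - y1
mean = (y1 + y2) / 2

# Clockwise 45-degree rotation of the point (y1, y2).
c = math.cos(math.radians(45))
u = c * y1 + c * y2   # equals (y1 + y2) / sqrt(2): proportional to the sum
v = -c * y1 + c * y2  # equals (y2 - y1) / sqrt(2): proportional to the difference

# Rescaling recovers the mean (x-axis) and the difference (y-axis).
print(u / math.sqrt(2), mean)
print(v * math.sqrt(2), diff)
```

Points on the line y1 = y2 land on the horizontal axis (v = 0), which is why that line becomes the zero reference in the Matched Pairs plot.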

Figure 4.5 Comparison of Scatterplot and Matched Pairs Plot

Notice the following in Figure 4.6:

• The 45° tilted square shows the frame of the scatterplot of the original columns.

• The mean difference is shown as the horizontal line, with the 95% confidence interval above and below. If the confidence region includes the horizontal line at zero, then the means are not significantly different at the 0.05 level. In the example shown in Figure 4.6, the difference is significant.

• The mean of the pair means is shown by the vertical line.

Figure 4.6 The Matched Pairs Scatterplot
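The significance rule in these bullets (the difference is significant at the 0.05 level when the 95% confidence interval excludes zero) can be sketched in Python as a paired t interval on the per-pair differences. This is an illustration, not JMP's code. The function names are hypothetical and the difference is assumed to be second minus first. To keep the sketch free of third-party libraries, the caller supplies the Student's t critical value for n − 1 degrees of freedom.

```python
import math
import statistics


def paired_difference_ci(first, second, t_crit):
    """Mean of (second - first) with a two-sided confidence interval.

    t_crit is the Student's t critical value for n - 1 degrees of freedom,
    for example about 3.182 for a 95% interval with 4 pairs (3 df).
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    half_width = t_crit * std_err
    return mean_diff, mean_diff - half_width, mean_diff + half_width


def difference_is_significant(lower, upper):
    """Significant when the interval does not contain zero."""
    return not (lower <= 0.0 <= upper)


if __name__ == "__main__":
    before = [1.0, 2.0, 3.0, 4.0]   # hypothetical paired measurements
    after = [2.0, 3.0, 5.0, 5.0]
    mean_diff, lower, upper = paired_difference_ci(before, after, t_crit=3.182)
    print(mean_diff, lower, upper, difference_is_significant(lower, upper))
```

Here the mean difference is 1.25 and the interval lies entirely above zero, so this hypothetical difference would be reported as significant.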

The Matched Pairs menu, shown in Figure 4.7, allows two plot options — plotting the difference by the mean, as in Figure 4.6, or plotting the difference by the row number. The square reference frame can also be toggled on and off, and standard scripting items are available. See “Script Submenu,” p. 66 for details on the Script submenu.

Figure 4.7 The Matched Pairs Menu

(Figure 4.6 callouts: Mean Difference; Line where the two variables are equal; Mean of Means; 95% Confidence Interval.)


Interpreting the Matched Pairs Plot

There are many possibilities for making statements regarding the patterns to be discovered in the new, rotated coordinates. The examples below show six different situations and their interpretations.

Figure 4.8 No Change

The vertical spread is small and centered at zero, so the change from Y1 to Y2 is not significant. This high-positive-correlation pattern is the typical situation.

Figure 4.9 Highly Significant Shift Down

The Y2 score is consistently lower than Y1 across all subjects.


Figure 4.10 No Average Shift, But Amplified Relationship

This situation shows a low variance of the difference, and high variance of the mean of the two values within a subject. Overall, the mean is the same from Y1 to Y2, but individually, the high scores got higher and the low scores got lower.

Figure 4.11 No Average Shift, But Reverse Relationship

This example shows a high variance of the difference, and low variance of the mean of the two values within a subject. Overall, the mean is the same from Y1 to Y2, but the high Y1s are associated with low Y2s, and vice-versa. This is a high-negative-correlation pattern, and is unusual.
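Standard variance identities explain the link between correlation and the shape of the rotated plot. Let σ1² and σ2² be the variances of Y1 and Y2 and let ρ be their correlation. The last line also assumes that the two variances are equal:

```latex
\begin{aligned}
\operatorname{Var}(Y_2 - Y_1) &= \sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2,\\
\operatorname{Var}\!\left(\tfrac{Y_1 + Y_2}{2}\right) &= \tfrac{1}{4}\left(\sigma_1^2 + \sigma_2^2 + 2\rho\,\sigma_1\sigma_2\right),\\
\text{if } \sigma_1 = \sigma_2 = \sigma:\quad
\operatorname{Var}(Y_2 - Y_1) &= 2\sigma^2(1 - \rho),\qquad
\operatorname{Var}\!\left(\tfrac{Y_1 + Y_2}{2}\right) = \tfrac{\sigma^2}{2}(1 + \rho).
\end{aligned}
```

A high positive ρ therefore gives a small vertical spread and a wide horizontal spread, as in Figure 4.8. A high negative ρ gives the reverse, as in Figure 4.11.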


Figure 4.12 No Average Shift, Positive Correlation, but Damped Instead of Accelerated

Overall, the mean is the same from Y1 to Y2, but the high scores drop a little, and low scores increase a little.

5 The Fit Model Platform

General linear models, meaning those with more complicated forms than simple linear regression can fit, are fit with the Fit Model platform. Standard least squares fitting, including stepwise procedures, is performed in this single platform.




In this introduction, models are developed to determine which variables (if any) are predictors of the starch content of denim, and which are predictors of thread wear in denim.


Select Analyze > Fit Model from the menu bar.

This brings up the Fit Model dialog, which is illustrated in Figure 5.1.

To begin with, fit a simple model with only main effects—no interactions, no powers.

In the list of columns, select Starch Content (%) and click the Y button.

To select all the model effects together, click on Method, hold down the Shift key, click on Sand Blasted?, then click the Add button.

Click Run Model.

Unlike the other platform launchers in JMP Student Edition, the Fit Model dialog does not go away once the model is run. To see that it is still available,

Click Window in the top menu bar to see a list of open windows.


Figure 5.1 The Fit Model Dialog

Setting Titles

The report generated here is used later in this presentation, so it needs to be identifiable among the other windows. To make identification easy, change its title to something meaningful as follows.

Make sure the report window is the front window.

Select Window > Set Title. (Window > Set Report Title on Macintosh)

In the resulting dialog box, type “Main Effects Only”.

Title bars within the reports are also editable.

Double click on the title bar that says Response Starch Content (%).

Again type “Main Effects Only” and press Enter.


Examining Results

Now, examine these results. There is a lot of information in this report, and although only a portion is used in this example, all of it is documented in sections that follow. In this initial look at the data, first check if the model as a whole is significant. Then, look at the p values associated with each effect, printed in several places, including just below each leverage plot. Leverage plots are detailed in “Leverage Plots,” p. 91.

The first leverage plot is for the entire model, and its p value indicates that the model is significant. To examine the p values for the individual effects, scroll the window until each effect is visible. Scrolling is accomplished using scroll bars along the edges of the window, or by using the Scroller tool.

Figure 5.2 The Scroller Tool

To scroll using the Scroller tool,

Shift-click the Scroller tool in the Tools toolbar.

Shift-clicking a tool keeps the tool active for multiple clicks. Without Shift-clicking, JMP Student Edition reverts to the arrow tool after a tool has been used once.

Move the Scroller tool over the results report.



Hold down the mouse button and drag with the Scroller tool to see the window move.

Now try several short click-and-drags in a row, releasing the mouse button while the mouse is still moving.

This illustrates the “inertia” that the scroller tool imparts on reports.

Least Squares Means

Least squares means, called LSMeans by JMP Student Edition, show the values of the response (starch content in this case) for levels of a nominal effect. The response values are adjusted for the other terms in the model, so that the effect of each variable can be examined.

By default, the LSMeans are displayed in this example model. To see a plot of the LSMeans,

Select LSMeans Plot from the drop-down menu on the title bar of an effect.

Figure 5.3 LS Means Table and Plot

In this case, the plot suggests that when controlled for the other effects in the model, the starch content for Caustic Soda is higher than Pumice Stone, which is in turn higher than Alpha Amalyze.

Re-running an Analysis

After moving around the report a bit and observing the p-values for each effect, it should be clear that they are all significant at the 0.05 level. However, this analysis is fairly primitive, since it does not consider any interactions among the variables. Remove the existing effects from the model and re-run the analysis with interactions by doing the following.

Click on the Window menu and select the Fit Model dialog.

There are two ways to remove effects from the model. We want to remove all the effects in this case.

Select one of the variables and click the Remove button located above the effects list.

Double-click on each of the other variables.

Now, add in an interaction effect.

Select Method and Size of Load (lbs) in the list of columns and click the Cross button.


Since it is rather tedious to specify all main effects and all crossed effects one at a time, JMP Student Edition provides some predefined macros that add popular effect combinations to models. These macros are discussed fully in “Macros,” p. 86. For now, request a full factorial model: all main effects with all possible interactions.

Select Method, Size of Load (lbs), and Sand Blasted? in the Select Columns list. Remember that the Control (Windows) or Command (Macintosh) key allows multiple selections.

Click the Macros button and select Full Factorial from the popup menu.

The appropriate effects appear in the effects list.

Make sure Starch Content (%) is still in the Y role at the top of the launch dialog.


Click Run Model.

Another report appears, this time much larger. A prudent model maker would, at this point, examine the p values of each effect and remove nonsignificant effects from the model one at a time. After each removal, re-run the model and repeat the process until all remaining effects are significant.

For example, noting that neither of the levels of the Method*Size of Load*Sand Blasted effect is significant at the 0.05 level,

Again bring the Fit Model dialog to the front using the Window menu.

Remove the Method*Size of Load*Sand Blasted effect.

Click Run Model.

This is exactly the reason that the Fit Model dialog persists, even after clicking the Run Model button. Many models have to be tweaked after initial results are examined. If you examine the new model results, you see that the Method*Size of Load effect levels are not significant, so it can be removed from the model as well. This iterative procedure may be repeated several times.

Linear Contrasts

Another common task is to test that levels within an effect are different from each other. This is accomplished by using linear contrasts. For example, to test that the Alpha Amalyze wash method is significantly different from the Pumice Stone wash method,

Select LS Means Contrast from the drop-down menu in the title bar of the Method variable.

JMP Student Edition attaches a Contrast Dialog (Figure 5.4) to the report, where details of the linear contrast are specified.

Figure 5.4 Contrast Dialog

Click the + button once next to Alpha Amalyze.


Click the - button once next to Pumice Stone.

Click Done.

This test shows a highly significant p-value, confirming that Alpha Amalyze is significantly different from Pumice Stone in its effect on the resulting starch content of denim.

The Fit Model Dialog Box

Regardless of the model to be fit, the Fit Model dialog box is the first step. Figure 5.1 on page 80 shows the dialog box in full. This is where you specify the role of each variable and select the type of fit. In JMP Student Edition, the type of fit (standard least squares, nominal logistic, or ordinal logistic) is referred to as the fitting personality.

The Fit Model dialog is different from JMP Student Edition’s other launch dialogs in that it does not disappear after the model is launched. This facilitates experimentation with the model. If one of the variables is not significant, it can be removed and the model re-run quickly. To remove a variable from its role, highlight it and click Remove, or, alternatively, double-click on the variable’s name.

Roles

To assign a variable to a role, select the variable name and click the appropriate button. The roles a variable can take are:

• Y, which identifies one or more response variables (the dependent variables)

• Weight, an optional role that identifies a column whose values signify the importance of each row in the model


• Freq, an optional role that identifies a column whose values designate the frequency of rows in the analysis

Model Effects

Effects are added to the model by using the buttons in the Construct Model Effects section of the dialog.

To add a simple regressor to the model, select the variable name and click the Add button.

To add a crossed effect to a model, select the two variables to be crossed in the Select Columns list and click the Cross button. For multiple selections, use Control-click on Windows or Command-click on Macintosh.

When levels of an effect (call it B) only occur within a single level of an effect (call it A), then B is said to be nested within A, and A is called the outside effect. To add a nested effect,

• Select the outside effects in the column selection list and click Add or Cross.

• When the outside effect appears in the Model Effects list, select it again.

• Select the nested variable in the column selection list and click Nest.

Macros

Common models can be generated using the macros drop-down list.

Figure 5.5 Macros drop-down list

The following models are available:

Full Factorial

To look at many crossed factors, such as in a factorial design, use Full Factorial. It creates the set of effects corresponding to all crossings of all variables selected in the columns list. For example, with selected variables A, B, and C, the Full Factorial selection places A, B, C, A*B, A*C, B*C, and A*B*C in the Model Effects list.

Factorial to Degree

To create a limited factorial, select Factorial to Degree and enter the degree of interactions in the Degree box. A second degree factorial is a very common analysis.


Factorial Sorted

The Factorial Sorted selection creates the same set of effects as Full Factorial, but lists them in order of degree. All main effects are listed first, followed by all two-way interactions, then all three-way interactions, and so forth.
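A few lines of Python can sketch the effect lists these factorial macros produce. This illustrates the combinatorics only and is not JMP's implementation. The function name is hypothetical, and the sketch always lists effects by degree (the Factorial Sorted order), which may differ from the order JMP uses for Full Factorial.

```python
from itertools import combinations


def factorial_effects(columns, max_degree=None):
    """List every crossing of the given columns, up to max_degree factors.

    With max_degree=None this mimics Full Factorial; with an integer it
    mimics Factorial to Degree. Effects are grouped by degree, as in
    Factorial Sorted.
    """
    if max_degree is None:
        max_degree = len(columns)
    max_degree = min(max_degree, len(columns))
    effects = []
    for degree in range(1, max_degree + 1):
        # combinations() keeps the columns in the order they were given.
        for combo in combinations(columns, degree):
            effects.append("*".join(combo))
    return effects


if __name__ == "__main__":
    print(factorial_effects(["A", "B", "C"]))
    # ['A', 'B', 'C', 'A*B', 'A*C', 'B*C', 'A*B*C']
    print(factorial_effects(["A", "B", "C"], max_degree=2))
    # ['A', 'B', 'C', 'A*B', 'A*C', 'B*C']
```

With k selected columns, a full factorial has 2^k − 1 effects, so the model grows quickly as columns are added. This growth is one reason a second degree factorial is a common compromise.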

Response Surface

Response surface models find the values of the terms that produce a maximum or a minimum expected response. This is accomplished by fitting a collection of terms in a quadratic model. The critical values for the surface are calculated from the parameter estimates and presented with a report on the shape of the surface.

To specify a Response Surface effect, select the variable name, then select Response Surface Effect from the Attributes menu. Response surface effects appear with an ampersand (&) appended to their name.

Mixture Response Surface

Mixture response surface variables are selected in the same way as Response Surface Effect variables. Select Mixture Response Surface from the Attributes menu after selecting a variable name.

Polynomial to Degree

Polynomial effects are a series of terms that are powers of a single variable. To specify a polynomial effect,

• click one or more variables in the column selection list

• enter the degree of the polynomial in the Degree box

• select the Polynomial to Degree command in the Macros popup menu.
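For a single variable, the terms created by Polynomial to Degree are successive powers, written here as crossings of the variable with itself. The following is a minimal Python sketch. The function name is hypothetical, and the term naming assumes the same A*B crossing notation used for Full Factorial. JMP's display of these terms may differ.

```python
def polynomial_terms(column, degree):
    """Return the terms X, X*X, ..., up to the given power, for one column name."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return ["*".join([column] * power) for power in range(1, degree + 1)]


if __name__ == "__main__":
    print(polynomial_terms("Size of Load (lbs)", 2))
```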

Scheffe Cubic

Cubic models are an advanced topic not usually covered in an introductory course. See the JMP help if you are interested in Scheffe cubics.

Fitting Personalities

The available fitting personalities are listed in the Personality drop-down list, shown in Figure 5.6.

Figure 5.6 Fitting Personalities

• Standard Least Squares models one or more continuous responses in the usual way, by fitting a linear model by least squares.

• Stepwise regression is an approach to selecting a subset of effects for a regression model. The Stepwise feature computes estimates that are the same as those of the Standard Least Squares personality, but it facilitates searching and selecting among many models. The Stepwise personality allows only one continuous Y.

Multiple categorical responses call for MANOVA or other advanced methods, available only in the professional version of JMP.

Emphasis Choices

The Emphasis popup menu controls which plots and tables are initially shown in the analysis report:

• Effect Leverage initially displays leverage and residual plots for the whole model. Select effect details and other statistical reports from the report itself.

• Effect Screening shows whole-model information, followed by a scaled parameter report and the Prediction Profiler.

• Minimal Report suppresses all plots. Request plots and reports from the report itself.

Run Model

The Run Model button submits the model to the fitting platform, but does not close the dialog window. Use the dialog to make changes to the model for additional fits, or make changes to the data and then refit the same model.

Fit Model Report Items

When a model is fit with the Standard Least Squares or Stepwise personality, several reports appear based on the Emphasis selected in the Fit Model dialog. Any report that does not appear by default can be requested from the platform menu.


Regression Reports

Regression reports give textual information about the fit.

The Summary of Fit Table

The Summary of Fit table appears first and shows the following numeric summaries of the response for the multiple regression model:

Rsquare (R²) estimates the proportion of the variation in the response around the mean that can be attributed to terms in the model, rather than to random error.

It is also the square of the correlation between the actual and predicted response. An R² of 1 occurs when there is a perfect fit (the errors are all zero). An R² of 0 means that the fit predicts the response no better than the overall response mean.

Rsquare Adj adjusts R² to make it more comparable over models with different numbers of parameters. Since adding terms to an existing model never decreases R², this adjustment compensates for adding terms to a model that already has terms in it. It is computed from a ratio of mean squares instead of sums of squares.

Root Mean Square Error estimates the standard deviation of the random error. It is the square root of the mean square for error in the corresponding analysis of variance table, and it is commonly denoted as s.

The Mean of Response is the overall mean of the response values. It is important as a base model for prediction because all other models are compared to it. The variance measured around this mean is the Corrected Total (C Total) mean square in the Analysis of Variance table.

Observations (or Sum of Weights) records the number of observations used in the fit. If there are no missing values and no excluded rows, this is the same as the number of rows in the data table. If there is a column assigned the role of weight, this is the sum of that column’s values. Weights are used in weighted least squares — an advanced topic.
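The Summary of Fit quantities above can be reproduced from the actual and predicted values. The Python sketch below is illustrative only. It uses a hypothetical function name, assumes an unweighted fit, and assumes the parameter count includes the intercept. The usage example uses four hypothetical points fitted by hand with the least squares line y = 0.5 + 1.6x.

```python
import math


def summary_of_fit(actual, predicted, n_params):
    """Summary of Fit quantities for an unweighted least squares fit.

    n_params counts every estimated parameter, including the intercept.
    """
    n = len(actual)
    mean_response = sum(actual) / n
    ss_total = sum((y - mean_response) ** 2 for y in actual)         # C Total SS
    ss_error = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # Error SS
    ms_error = ss_error / (n - n_params)   # mean square for error
    ms_total = ss_total / (n - 1)          # C Total mean square
    return {
        "Rsquare": 1 - ss_error / ss_total,
        "Rsquare Adj": 1 - ms_error / ms_total,   # mean squares, not sums of squares
        "Root Mean Square Error": math.sqrt(ms_error),
        "Mean of Response": mean_response,
        "Observations": n,
    }


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    y = [2, 4, 5, 7]
    fitted = [0.5 + 1.6 * xi for xi in x]   # hand-computed least squares line
    print(summary_of_fit(y, fitted, n_params=2))
```

For these points Rsquare is about 0.985 and Rsquare Adj is slightly lower, about 0.977, because the adjustment charges for the slope parameter.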

The Analysis of Variance Table

The Analysis of Variance table shows the basic calculations for a linear model. The table compares the model to a model containing only the mean:

Source lists the three sources of variation, called Model, Error, and C Total.

DF records an associated degrees of freedom for each source of variation.

The C Total degrees of freedom is for the simple mean model. There is only one degree of freedom used (the estimate of the mean parameter) in the calculation of variation, so the C Total DF is always one less than the number of observations.

The total degrees of freedom are partitioned into the Model and Error terms:

• The Model degrees of freedom is the number of parameters (except for the intercept) used to fit the model.

• The Error DF is the difference between the C Total DF and the Model DF.

Sum of Squares records an associated sum of squares (SS) for each source of variation:

• The Total (C Total) SS is the sum of squared distances of each response from the sample mean.


• The Error SS is the sum of squared differences between the fitted values and the actual values. This sum of squares corresponds to the unexplained error (residual) after fitting the regression model.

A Mean Square is a sum of squares divided by its associated degrees of freedom. This computation converts the sum of squares to an average.

The F Ratio is the model mean square divided by the error mean square. It tests the hypothesis that all the regression parameters (except the intercept) are zero. If there is a significant effect in the model, the F Ratio is higher than expected by chance alone.

Prob>F is the probability of obtaining a greater F value by chance alone if the specified model fits no better than the overall response mean. Significance probabilities of 0.05 or less are often considered evidence that there is at least one significant regression factor in the model.

Note that large values of Model SS and small values of Error SS lead to large F ratios and low p values—desirable if the goal is to declare that terms in the model are significantly different from zero. Most practitioners check this F test first and make sure that it is significant before delving further into the details of the fit. This significance is also shown graphically by the whole-model leverage plot, described in “Leverage Plots,” p. 91.
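The partition described above can be sketched in Python for an unweighted least squares fit with an intercept, where the Model SS equals the C Total SS minus the Error SS. This is an illustration, not JMP's code, and the function name is hypothetical. Prob>F is omitted to avoid third-party dependencies. If SciPy is available, scipy.stats.f.sf(F, df_model, df_error) gives the upper-tail probability.

```python
def anova_table(actual, predicted, n_params):
    """Analysis of Variance quantities; n_params includes the intercept."""
    n = len(actual)
    mean_response = sum(actual) / n
    ss_total = sum((y - mean_response) ** 2 for y in actual)
    ss_error = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_model = ss_total - ss_error   # valid for least squares with an intercept

    df_total = n - 1                 # one DF is used by the mean
    df_model = n_params - 1          # parameters except the intercept
    df_error = df_total - df_model

    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return {
        "Model": (df_model, ss_model, ms_model),
        "Error": (df_error, ss_error, ms_error),
        "C Total": (df_total, ss_total),
        "F Ratio": ms_model / ms_error,
    }


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    y = [2, 4, 5, 7]
    fitted = [0.5 + 1.6 * xi for xi in x]   # hand-computed least squares line
    print(anova_table(y, fitted, n_params=2))
```

For these hypothetical points the F Ratio is 12.8 / 0.1 = 128. Such a large Model SS relative to the Error SS is the pattern that leads to a small Prob>F.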

The Lack of Fit Table

The Lack of Fit table shows a special diagnostic test and appears only when the data and the model provide the opportunity. Sometimes, it is possible to estimate the error variance independently of whether the right form of the model is the one under consideration. This occurs when observations are exact replicates of each other in terms of the X variables. The error for these exact replicates is called pure error. This is the portion of the sample error that cannot be explained or predicted no matter which form the model uses for the X variables.

The difference between the residual error from the model and the pure error is called lack of fit error. A lack of fit error can be significantly greater than pure error if a regressor is in the model with the wrong functional form, or if too few interaction effects exist in an analysis of variance model. In these cases, consider adding interaction terms, if appropriate, to try to better capture the functional form of a regressor.

There are two common situations where there is no lack of fit test:

• There are no exactly replicated points with respect to the X data, and therefore there are no degrees of freedom for pure error.

• The model is saturated, meaning that the model itself has a degree of freedom for each different X value; therefore, there are no degrees of freedom for lack of fit.

The Lack of Fit table shows information about the error terms:

Source lists the three sources of variation called Lack of Fit, Pure Error, and Total Error. Note that the pure error DF is pooled from each group where there are multiple rows with the same values for each effect.

The remaining portions of the Lack of Fit table are similar to those of the Analysis of Variance Table. The only additional information is the Max RSq, the maximum R² that can be achieved by using only the variables in the model.
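Pure error and lack of fit can be computed by grouping rows with identical X values. The Python sketch below is illustrative, with hypothetical function names and data. It pools the within-group sums of squares and degrees of freedom, splits the total error into lack of fit and pure error, and computes Max RSq. Max RSq is 1 − SS(pure error)/SS(C Total), the R² of a model that fits every distinct X setting exactly.

```python
from collections import defaultdict


def pure_error(x_values, y_values):
    """Pool SS and DF of pure error from rows that share identical X values.

    Each x value must be hashable, e.g. a number or a tuple of several X columns.
    """
    groups = defaultdict(list)
    for x, y in zip(x_values, y_values):
        groups[x].append(y)
    ss_pure, df_pure = 0.0, 0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        ss_pure += sum((y - group_mean) ** 2 for y in ys)
        df_pure += len(ys) - 1   # an unreplicated row contributes 0 DF
    return ss_pure, df_pure


def lack_of_fit(ss_error, df_error, ss_pure, df_pure):
    """Lack of fit is what remains of the total error after pure error."""
    return ss_error - ss_pure, df_error - df_pure


def max_rsquare(y_values, ss_pure):
    """Largest R-square attainable given the pure error in the data."""
    mean_y = sum(y_values) / len(y_values)
    ss_total = sum((y - mean_y) ** 2 for y in y_values)
    return 1 - ss_pure / ss_total


if __name__ == "__main__":
    x = [1, 1, 2, 2, 3]          # hypothetical data with two replicated X values
    y = [2, 4, 5, 5, 9]
    print(pure_error(x, y))      # (2.0, 2)
```

If no X value is replicated, the pooled DF is zero and there is no pure error estimate. This matches the first situation listed above in which no lack of fit test is possible.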



The Parameter Estimates Table

The Parameter Estimates table shows the estimates of the parameters in the linear model and a t-test for the hypothesis that each parameter is zero. Simple continuous regressors have only one parameter. Models with complex classification effects have a parameter for each anticipated degree of freedom.

The Parameter Estimates table shows these quantities:

Term names the estimated parameter. The first parameter is always the intercept. Simple regressors show as the name of the data table column. Regressors that are dummy indicator variables constructed from nominal or ordinal effects are labeled with the names of the levels in brackets. For a nominal effect, each dummy variable is coded 1 for its own level and 0 for the other levels. The exception is the last level, which is coded –1 in every dummy variable for that effect.

Estimate lists the parameter estimates for each term.

Std Error is the standard error, an estimate of the standard deviation of the distribution of the parameter estimate. This is the value used to construct t-tests and confidence intervals for the parameter.

t Ratio is a statistic that tests whether the true parameter is zero. It is the ratio of the estimate to its standard error.

Prob>|t| is the probability of getting a greater t statistic (in absolute value), given the hypothesis that the parameter is zero. This is the two-tailed test against the alternatives in each direction. Probabilities less than 0.05 are often considered as significant evidence that the parameter is not zero.
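The nominal dummy coding used for Term labels can be sketched in Python. For each level except the last there is one dummy variable, coded 1 for its own level and 0 otherwise. The last level is coded -1 in every dummy variable. The function names are hypothetical, and levels are taken in the order given, since JMP's own level ordering is not shown here. Each dummy column sums to zero over the levels, so the implied estimate for the last level is the negative of the sum of the others. This is consistent with how the Expanded Estimates command reports a coefficient for every level.

```python
def effect_code(levels, value):
    """Dummy-variable row for one observation of a nominal effect.

    There is one dummy column per level except the last. A column is 1 for
    its own level and 0 otherwise; the last level is -1 in every column.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    if value == levels[-1]:
        return [-1] * (len(levels) - 1)
    return [1 if value == level else 0 for level in levels[:-1]]


def expanded_estimates(estimates):
    """Append the implied last-level estimate: minus the sum of the others."""
    return list(estimates) + [-sum(estimates)]


if __name__ == "__main__":
    methods = ["Alpha Amalyze", "Caustic Soda", "Pumice Stone"]
    for method in methods:
        print(method, effect_code(methods, method))
```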

The Effect Test Table

The Effect Test table shows the following information for each effect:

Source lists the names of the effects in the model.

Nparm is the number of parameters associated with the effect. Continuous effects have 1 parameter. Nominal effects have one less parameter than the number of levels. Crossed effects multiply the number of parameters for each term. Nested effects depend on how levels occur.

DF is the degrees of freedom for the effect test. Note that if DF is zero, no part of the effect is testable. Whenever DF is less than Nparm, the note Lost DFs appears to the right of the line in the report.

Sum of Squares is the sum of squares for the hypothesis that the listed effect is zero.

F Ratio is the F statistic for testing that the effect is zero. It is the ratio of the mean square for the effect divided by the mean square for error.

Prob>F is the significance probability for the F ratio. It is the probability, if the null hypothesis is true, of obtaining a larger F statistic by random error alone. Values less than 0.0005 appear as 0.0000.

Leverage Plots

Leverage plots reveal the significance of an effect in the model. These plots show point-by-point what the residual would be both with and without that effect in the model (see Figure 5.8). The fitting platform produces a leverage plot for each effect in the model. An example leverage plot is shown in Figure 5.7.


Figure 5.7 Example Leverage Plot from Denim Data

In addition, there is a special leverage plot titled Whole Model that shows the actual values of the response plotted against the predicted values. This Whole Model leverage plot dramatizes the test that all the parameters (except intercepts) in the model are zero. This illustrates the same test reported in the Analysis of Variance report.

In general, the horizontal line on the plot represents what the values of the model would be if the effect was removed from the model. The sloped line represents the values of the model with the effect included. Significance of the effect is seen by comparing the slope of the sloped line with that of the horizontal one, as in Figure 5.9.

Figure 5.8 General Leverage Plot

(Figure 5.8 callouts: residual; residual constrained by hypothesis; points farther out pull on the line of fit with greater leverage than the points near the middle.)

Figure 5.9 Significance of Effects

(Figure 5.9 panels: Significant, where the confidence curve crosses the horizontal line; Borderline, where the confidence curve is asymptotic to the horizontal line; Not Significant, where the confidence curve does not cross the horizontal line.)



Effect Details

In a Standard Least Squares analysis, the following effect details are available, dealing with least squares means, designated LS Means by JMP Student Edition. More detailed descriptions of each command are available in the online help.

Figure 5.10 Effect Details

• LS Means Table shows predicted values from the specified model across the levels of a categorical effect. The other model factors are controlled, that is, set to neutral values. Least squares means show which levels produce higher or lower responses, holding the other variables in the model constant. Least squares means are also called adjusted means or population marginal means.

• LS Means Plot plots the LSMeans for nominal and ordinal main effects and two-way interactions.

• LS Means Contrast displays a dialog for specifying contrasts with respect to an effect. (See “Linear Contrasts,” p. 84 for an example of using contrasts.) This command is enabled only for categorical effects. To construct a contrast, click the + and - buttons beside the levels to be compared. When possible, the dialog normalizes the column after each click so that its entries sum to zero and their absolute values sum to two. It distributes the plus or minus score proportionately.

• The LS Means Student’s t command requests multiple comparison tests.
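The normalization rule in the LS Means Contrast bullet can be sketched as follows. This is an illustration under stated assumptions, not JMP's code. The function name is hypothetical, and the input is assumed to be the net number of + clicks (positive) or - clicks (negative) for each level. Positive scores are rescaled to sum to 1 and negative scores to sum to -1. The column then sums to zero and its absolute values sum to two.

```python
def normalize_contrast(clicks):
    """Normalize net +/- click counts into contrast coefficients.

    If either sign is missing, the contrast cannot be balanced yet and the
    counts are returned unchanged (as floats).
    """
    plus_total = sum(c for c in clicks if c > 0)
    minus_total = -sum(c for c in clicks if c < 0)
    if plus_total == 0 or minus_total == 0:
        return [float(c) for c in clicks]
    coefficients = []
    for c in clicks:
        if c > 0:
            coefficients.append(c / plus_total)    # share of the +1 total
        elif c < 0:
            coefficients.append(c / minus_total)   # share of the -1 total
        else:
            coefficients.append(0.0)
    return coefficients


if __name__ == "__main__":
    # Levels: Alpha Amalyze, Caustic Soda, Pumice Stone (the Linear Contrasts example).
    print(normalize_contrast([1, 0, -1]))   # [1.0, 0.0, -1.0]
    # Two levels on the plus side share the +1 total equally.
    print(normalize_contrast([1, 1, -1]))   # [0.5, 0.5, -1.0]
```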

Exploring the Estimates

The following commands allow you to further explore the estimated coefficients of the model.

Expanded Estimates

The standard Fit Model output includes a Parameter Estimates Table, as seen in Figure 5.11. For continuous effects, the estimates are the (estimated) coefficients of each term in the linear model. For nominal effects, the estimates are the coefficients of dummy variables. Each dummy variable is 1 for its own level, 0 for the other levels, and -1 for the last level of the variable. Ordinal effects show coefficients for dummy variables that measure the difference at levels of the variable from the mean of all levels of the effect.

Figure 5.11 Parameter Estimates Table

The Expanded Estimates command shows the same information, but with a coefficient for each continuous variable and each level of other variables.


Figure 5.12 Expanded Estimates Table

Compare the expanded estimates with the prediction formula for this model, shown here.

Custom Test

In introductory statistics courses, null hypotheses are often about one variable at a time, frequently hypothesizing that a parameter is zero. However, it is possible to test far more complicated null hypotheses than this. For example, it is reasonable to test that several parameters are zero, one, or another value, or that some parameters are equal to others. Tests of this kind are known in statistics as general linear hypotheses, and they are tested using JMP’s Custom Test command.
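In standard textbook notation (not taken from the JMP documentation), a general linear hypothesis and its F test can be written as shown below. Here b is the vector of p parameter estimates, X is the model matrix, n is the number of observations, and s² is the error mean square. Each row of the q × p matrix L holds one linear function of the parameters, and c holds the hypothesized values. In the Custom Test dialog, each column of coefficients appears to play the role of one row of L, and the value beside "=" the corresponding entry of c. This correspondence is an interpretation, not a statement from the source.

```latex
H_0\colon L\beta = c, \qquad
F = \frac{(Lb - c)^{\top}\left[L\,(X^{\top}X)^{-1}L^{\top}\right]^{-1}(Lb - c)}{q\,s^{2}}
\;\sim\; F_{q,\;n-p} \quad \text{under } H_0 .
```

With a single column (q = 1), this F statistic is the square of the usual t ratio for the linear combination being tested.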

To test a custom hypothesis, select Custom Test from the Estimates popup menu, which displays the dialog shown in Figure 5.13.

Figure 5.13 Custom Test Launch

The space beneath the Custom Test title bar is an editable area for entering a descriptive test name, which is useful if you are doing several tests.



Parameter lists the names of the model parameters. To the right of the list of parameters are columns of zeros corresponding to these parameters. Click in these cells to enter a new hypothesized parameter value corresponding to the desired test.

One of the parameters is labeled “=”. In the edit box to its right, enter the value that you are testing the contrast against. For example, you may be testing that a certain combination of the factors sums to 1. You would enter a 1 beside the “=” in this dialog.

Add Column adds another column of zeros so that several linear functions of the parameters can be jointly tested. Use the Add Column button to add as many columns to the test as needed.

When the test is specified, click Done to see the test performed. The results are appended to the bottom of the dialog.

When the custom test is done, the report lists the test name, the function value of the parameters tested, the standard error, and other statistics for each test column in the dialog. A joint F test for all columns is at the bottom. Sample output for a custom test (that the Size of Load coefficient is equal to 1) is shown in Figure 5.14.

Figure 5.14 Custom Test Output

Note: For tests within a categorical effect, instead of using a Custom test, consider using the contrast dialog, which tests hypotheses about the least squares means.

Correlation of Estimates

The Correlation of Estimates option in the Estimates platform menu produces a correlation matrix for all the parameter estimates in the model.

Row Diagnostics

Leverage Plots (the Plot Actual by Predicted and Plot Effect Leverage commands) are covered earlier in this chapter under “Leverage Plots,” p. 91.


• Plot Actual by Predicted displays the observed values by the predicted values of Y. This is the leverage plot for the whole model.

• Plot Effect Leverage produces a leverage plot for each effect in the model showing the point-by-point composition of the test for that effect.

• Plot Residual By Predicted displays the residual values by the predicted values of Y. You typically want to see the residual values scattered randomly about zero.

• Plot Residual By Row displays the residual value by the row number of its observation.

• Durbin-Watson Test displays the Durbin-Watson statistic to test whether or not the errors have first-order autocorrelation. The autocorrelation of the residuals is also shown. The Durbin-Watson table has a popup command that computes and displays the exact probability associated with the statistic. This Durbin-Watson table is only appropriate for time series data when you suspect that the errors are correlated across time.
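The Durbin-Watson statistic has a simple closed form. The Python sketch below computes it from residuals listed in time order. It also computes one common definition of the lag-1 residual autocorrelation, since the source does not give JMP's exact formula. The function names are hypothetical. Values near 2 suggest no first-order autocorrelation, because the statistic is approximately 2(1 − r) for lag-1 autocorrelation r.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the residual sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation of the residuals (one common definition)."""
    numerator = sum(residuals[t] * residuals[t - 1]
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


if __name__ == "__main__":
    alternating = [1.0, -1.0, 1.0, -1.0]   # hypothetical, negatively autocorrelated
    print(durbin_watson(alternating), lag1_autocorrelation(alternating))   # 3.0 -0.75
```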

Save Commands

The Save submenu offers the following choices. Each selection generates one or more new columns in the current data table titled as shown, where colname is the name of the response variable:

Prediction Formula creates a new column, called Pred Formula colname, containing the predicted values computed by the specified model. It differs from the Save Predicted Values column in that the prediction formula is saved with the new column. This is useful for predicting values in new rows or for obtaining a picture of the fitted model.

Use the Column Info command and click the Edit Formula button to see the prediction formula. The prediction formula can require considerable space if the model is large. If you do not need the formula with the column of predicted values, use the Save Predicted Values option.

Predicted Values creates a new column called Predicted colname that contains the predicted values computed by the specified model.

Residuals creates a new column called Residual colname containing the residuals, which are the observed response values minus predicted values.

Mean Confidence Interval creates two new columns called Lower 95% Mean colname and Upper 95% Mean colname. The new columns contain the lower and upper 95% confidence limits for the line of fit.

Note: If you hold down the Shift key and select Save Mean Confidence Interval, you are prompted to enter an α-level for the computations.

Individual Confidence Interval creates two new columns called Lower95% Indiv colname and Upper95% Indiv colname. The new columns contain lower and upper 95% confidence limits for individual response values.

Note: If you hold down the Shift key and select Save Individual Confidence Interval, you are prompted to enter an α-level for the computations.

Studentized Residuals creates a new column called Studentized Resid colname. The new column values are the residuals divided by their standard error.

Std Error of Predicted creates a new column, called StdErr Pred colname, containing the standard errors of the predicted values.



Std Error of Residual creates a new column, called StdErrResid colname, containing the standard errors of the residual values.

Std Error of Individual creates a new column, called StdErr Indiv colname, containing the standard errors of the individual predicted values.

Effect Leverage Pairs creates a set of new columns that contain the values for each leverage plot. The new columns consist of an X and Y column for each effect in the model. The columns are named as follows. If the response column name is R and the effects are X1 and X2, then the new column names are

X Leverage of X1 for R Y Leverage of X1 for R

X Leverage of X2 for R Y Leverage of X2 for R.
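When the model has a single continuous effect, the quantities behind several Save commands have closed forms. The sketch below assumes a simple linear regression y = b0 + b1*x. It illustrates the idea and is not JMP's general implementation. The function name and dictionary keys are hypothetical stand-ins for the saved column names. The t critical value is passed in because the Python standard library has no t quantile function.

```python
import math
from typing import Dict, List, Sequence


def save_columns(x: Sequence[float], y: Sequence[float],
                 t_crit: float) -> Dict[str, List[float]]:
    """Per-row values analogous to the Save submenu for the fit y = b0 + b1*x.

    t_crit is the two-sided t critical value with n - 2 degrees of freedom
    (about 2.306 for 95% limits with 8 degrees of freedom).
    Dictionary keys are illustrative names, not JMP's exact column titles.
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three (x, y) pairs")
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    pred = [b0 + b1 * xi for xi in x]
    resid = [yi - pi for yi, pi in zip(y, pred)]                 # observed minus predicted
    mse = sum(r * r for r in resid) / (n - 2)
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]           # leverage (hat value) of each row
    se_pred = [math.sqrt(mse * h) for h in lev]                   # std error of predicted (mean)
    se_indiv = [math.sqrt(mse * (1 + h)) for h in lev]            # std error of an individual value
    se_resid = [math.sqrt(mse * max(1 - h, 0.0)) for h in lev]    # std error of residual
    return {
        "Predicted": pred,
        "Residual": resid,
        # Residual divided by its standard error (NaN if a row has leverage 1).
        "Studentized Resid": [r / s if s > 0 else float("nan") for r, s in zip(resid, se_resid)],
        "StdErr Pred": se_pred,
        "StdErr Indiv": se_indiv,
        "StdErr Resid": se_resid,
        "Lower Mean": [p - t_crit * s for p, s in zip(pred, se_pred)],
        "Upper Mean": [p + t_crit * s for p, s in zip(pred, se_pred)],
        "Lower Indiv": [p - t_crit * s for p, s in zip(pred, se_indiv)],
        "Upper Indiv": [p + t_crit * s for p, s in zip(pred, se_indiv)],
    }
```

Two identities give a quick sanity check. The squared individual standard error exceeds the squared standard error of the mean prediction by exactly the MSE. The squared standard errors of the prediction and of the residual sum to the MSE.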

6 Stepwise Regression

Stepwise regression is an approach to selecting a subset of effects for a regression model. It is used when there is little theory to guide the selection of terms for a model and the modeler, in desperation, wants to use whatever seems to provide a good fit.

The approach is somewhat controversial. The significance levels on the statistics for selected models violate the standard statistical assumptions because the model has been selected rather than tested within a fixed model. On the positive side, the approach has been of practical use for 30 years in helping to trim out models to predict many kinds of responses. The book Subset Selection in Regression, by A. J. Miller (1990), brings statistical sense to model selection statistics.

This chapter uses the term “significance probability” in a mechanical way to represent that the calculation would be valid in a fixed model, recognizing that the true significance probability could be nowhere near the reported one.

Introduction

In JMP, stepwise regression is a personality of the Model Fitting platform: it is one of the selections in the Fitting Personality popup menu on the Model Specification dialog (see Figure 5.6 on page 87). The Stepwise feature computes estimates that are the same as those of other least squares platforms, but it facilitates searching and selecting among many models.

As an example,

Open the Fitness.jmp data table in the Sample Data folder.

This data shows results from an aerobic fitness study. Figure 6.1 shows a partial listing of the Fitness.jmp data table.

Aerobic fitness can be evaluated using a special test that measures the oxygen uptake of a person running on a treadmill for a prescribed distance. However, it would be more economical to find a formula that uses simpler measurements that evaluate fitness and predict oxygen uptake. To identify such an equation, measurements of age, weight, runtime, and pulse were taken for 31 participants who ran 1.5 miles.

To find a good oxygen uptake prediction equation, you need to compare many different regression models. The Stepwise platform lets you search through models with combinations of effects and choose the model you want.


Figure 6.1 The Fitness Data Table

Note: For purposes of illustration, certain values of MaxPulse and RunPulse have been changed from data reported by Rawlings (1988, p. 105).

Figure 6.2 Model Specification Dialog for a Stepwise Model

To do stepwise regression,

Select Fit Model in the Analyze menu.

In the Model Specification dialog,

Choose Oxy as the Y response

Choose Weight, Runtime, RunPulse, RstPulse, and MaxPulse as Effects.

Select Stepwise from the Fitting Personality popup menu.

Click Run Model.


When the report appears, you are presented with a control panel, used to specify how effects should enter or exit the model. In this example, we want to add any effects that are significant at the 0.25 level or better.

Leave 0.25 as the Prob to Enter.

We now have a choice of three stepwise methods: Forward (where effects are added as they become significant), Backward (where effects are removed as they become nonsignificant), or Mixed, a combination of the two described below. This example uses forward selection, so

Leave the default Forward selection method.

We now want to add significant effects. To add the first detected effect,

Click the Step button.

After one step, the most significant term Runtime is entered into the model (top Current Estimates table in Figure 6.3).

To add all the detected effects automatically (rather than manually with the Step button),

Click Go to see the stepwise process run to completion.

The bottom table in Figure 6.3 shows that all the terms have been added except RstPulse and Weight, which are not significant at the Prob to Enter value of 0.25 specified in the Stepwise Regression Control Panel.

Figure 6.3 Current Estimates Table

Now that we have selected the effects that contribute to explaining Oxy, we can make a model and examine its analysis.

Click Make Model.

A Fit Model dialog appears.

Click Run Model.

This produces a report identical to those seen in “The Fit Model Platform” chapter.



The Stepwise Window

When launched, the stepwise platform displays a window that shows three areas:

• The Stepwise Regression Control panel, which is an interactive control panel for operating the platform

• The Current Estimates table, which shows the current status of the specified model, and provides additional control features

• The Step History table, which lists the steps in the stepwise selection.

The following sections describe the components of these areas and tell how to use them.

Stepwise Regression Control Panel

The Stepwise Regression Control Panel (Control Panel for short), shown next, has editable areas, buttons, and a popup menu. You use these dialog features to limit regressor effect probabilities, determine the method of selecting effects, begin or stop the selection process, and create a model.

You use the Control Panel as follows:

• Prob to Enter is the significance probability that must be attributed to a regressor term for it to be considered as a forward step and entered into the model. Click the field to enter a value.

• Prob to Leave is the significance probability that must be attributed to a regressor term in order for it to be considered as a backward step and removed from the model. Click the field to enter a value.

• Direction accesses the popup menu shown here, which lets you choose how you want variables to enter the regression equation.

• Forward brings in the regressor that most improves the fit, given that term is significant at the level specified by Prob to Enter.


• Backward removes the regressor that affects the fit the least, given that term is not significant at the level specified in Prob to Leave.

• Mixed alternates the forward and backward steps. It includes the most significant term that satisfies Prob to Enter and removes the least significant term satisfying Prob to Leave. It continues removing terms until the remaining terms are significant and then it changes to the forward direction.

Buttons on the control panel let you control the stepwise processing:

• Go starts the selection process. The process continues to run in the background until the model is finished.

• Stop stops the background selection process.

• Step performs a single step of the stepwise process and then stops.

• Enter All enters all unlocked terms into the model.

• Remove All removes all terms from the model.

• Make Model forms a model for the Model Specification Dialog from the model currently showing in the Current Estimates table. In cases where there are nominal or ordinal terms, Make Model can create new data table columns to contain terms that are needed for the model.

Current Estimates Table

The Current Estimates table lets you enter, remove, and lock in model effects. The platform begins with no terms in the model except for the intercept, as is shown here. The intercept is permanently locked into the model.

You use check boxes to define the stepwise regression process:

• Lock locks a term in or out of the model. A term whose Lock box is checked cannot be entered into or removed from the model. Click an effect’s check box to change its lock status.

• Entered shows whether a term is currently in the model. You can click a term’s check box to manually bring an effect into or out of the model.

The following quantities update continually during the fitting process and are used in determining the final model.

• Parameter lists the names of the regressor terms (effects).

• Estimate is the current parameter estimate. It is missing (.) if the effect is not currently in the model.


• nDF is the number of degrees of freedom for a term. A term has more than one degree of freedom if its entry into a model also forces other terms into the model.

• SS is the reduction in the error (residual) SS if the term is entered into the model or the increase in the error SS if the term is removed from the model. If a term is restricted in some fashion, it could have a reported SS of zero.

• “F Ratio” is the traditional test statistic to test that the term effect is zero. It is the square of a t-ratio. It is in quotation marks because, when the model is selected as it is fit, it does not have an F-distribution for testing the term.

• “Prob>F” is the significance level associated with the F-statistic. Like the “F Ratio,” it is in quotation marks because it is not to be trusted as a real significance probability.

Statistics for the current model appear above the list of effects:

• SSE, DFE, MSE are the sum of squares, degrees of freedom, and mean square error (residual) of the current model.

• RSquare is the proportion of the variation in the response that can be attributed to terms in the model rather than to random error.

• RSquareAdj adjusts R² to make it more comparable over models with different numbers of parameters by using the degrees of freedom in its computation. The adjusted R² is useful in the stepwise procedure because you are looking at many different models and want to adjust for the number of terms in the model.

• Cp is Mallows’ Cp criterion.

• AIC is Akaike’s Information Criterion.
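Each of these statistics can be computed from a model's error sum of squares. The sketch below uses the textbook definitions. Here p counts parameters including the intercept, and Mallows' Cp uses the MSE of the model that contains every candidate term. AIC is written in the common least-squares form n·ln(SSE/n) + 2p. That form is an assumption: JMP may use a different constant or a small-sample correction. The function name is hypothetical.

```python
import math
from typing import Dict


def model_stats(sse: float, sst: float, n: int, p: int, mse_full: float) -> Dict[str, float]:
    """Fit statistics for a least-squares model with p parameters (intercept included).

    sst is the corrected total sum of squares; mse_full is the mean square error
    of the model containing every candidate term (used by Mallows' Cp).
    """
    r2 = 1 - sse / sst
    # Penalize R^2 for the parameters used, via degrees of freedom.
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p)
    # Mallows' Cp: approximately p for a model with little bias.
    cp = sse / mse_full - (n - 2 * p)
    # One common least-squares form of Akaike's Information Criterion.
    aic = n * math.log(sse / n) + 2 * p
    return {"RSquare": r2, "RSquareAdj": r2_adj, "Cp": cp, "AIC": aic}
```

By construction, the full model has Cp equal to its own p. This is why the guideline is to look for smaller models whose Cp comes close to p.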

Step History Table

As each step is taken, the Step History table records the effect of adding a term to the model. The Step History table for the Fitness data example shows the order in which the terms entered the model and shows the effect as reflected by R² and Cp.

If you use Mallows’ Cp as a model selection criterion, select the model where Cp approaches p, the number of parameters in the model. In this example, three or four variables appear to be a good choice for a regression model.

Make Model

When you click Make Model, the model seen in the Current Estimates table appears in the Model Specification dialog. For example, if you click Make Model after the forward selection shown in Figure 6.3, the Model Specification dialog appears as shown in Figure 6.4, without a fitting personality selection.


Figure 6.4 New Model Specification dialog from Forward Stepwise Procedure

All Possible Regressions

Stepwise includes an All Possible Models command. It is accessible from the red triangle drop-down menu on the stepwise control panel (see Figure 6.5).

Figure 6.5 All Possible Models

When selected, all possible models of the regression parameters are run, resulting in the report seen in Figure 6.6. Note that this report is for a three-variable model consisting of Runtime, RunPulse, and MaxPulse.


Figure 6.6 All Models Report

The models are listed in decreasing order of the number of parameters they contain. The model with the highest R² for each number of parameters is highlighted.

We suggest that no more than about 15 variables be used with this platform. More may be possible, but can strain computer memory (and human patience).

Note: Mallows’ Cp statistic is computed, but initially hidden in the table. To make it visible, right-click (Control-click on the Macintosh) and select Columns > Cp from the menu that appears.
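The suggested limit on variables is combinatorial. With k candidate variables there are 2^k - 1 non-empty models, which is 32,767 models for 15 variables. The sketch below enumerates the subsets and keeps the highest-R² model for each size, which is the row the report highlights. The rsquare callable is a hypothetical stand-in for fitting each model.

```python
from itertools import combinations
from typing import Callable, Dict, Iterator, Sequence, Tuple

Subset = Tuple[str, ...]


def all_subsets(terms: Sequence[str]) -> Iterator[Subset]:
    """Yield every non-empty subset of the candidate terms, smallest first."""
    for size in range(1, len(terms) + 1):
        yield from combinations(terms, size)


def best_per_size(terms: Sequence[str],
                  rsquare: Callable[[Subset], float]) -> Dict[int, Tuple[Subset, float]]:
    """For each number of terms, keep the subset with the highest R^2.

    `rsquare` is a hypothetical callback that fits the model for a subset
    and returns its R^2.
    """
    best: Dict[int, Tuple[Subset, float]] = {}
    for subset in all_subsets(terms):
        r2 = rsquare(subset)
        if len(subset) not in best or r2 > best[len(subset)][1]:
            best[len(subset)] = (subset, r2)
    return best
```

The work grows in proportion to 2^k, so each added variable roughly doubles the number of models to fit.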

7 Control Charts

Control charts are a graphical and analytic tool for deciding whether a process is in a state of statistical control and for monitoring an in-control process. This monitoring process is often called quality control or QC.

Introduction

Control charts have the following characteristics:

• Each point represents a summary statistic computed from a subgroup sample of measurements of a quality characteristic.

• The vertical axis of a control chart is scaled in the same units as the summary statistic.

• The horizontal axis of a control chart identifies the subgroup samples.

• The center line on a Shewhart control chart indicates the average (expected) value of the summary statistic when the process is in statistical control.

• The upper and lower control limits, labeled UCL and LCL, give the range of variation to be expected in the summary statistic when the process is in statistical control.

• A point outside the control limits signals the presence of a special cause of variation.

• Graph > Control Chart subcommands create control charts that can be updated dynamically as samples are received and recorded or added to the data table.

The following example uses the Coating.jmp data in the Quality Control sample data folder (taken from the ASTM Manual on Presentation of Data and Control Chart Analysis). The quality characteristic of interest is the Weight column. A subgroup sample of four is chosen. An X̄-chart and an R-chart for the process are shown in Figure 7.1.

To replicate this example,

Choose the Graph > Control Chart > XBar command.


Note the selected chart types of XBar and R.

Specify Weight as the Process variable.

Because each subgroup in this example contains four measurements,

Change the Sample Size Constant from 5 to 4.

Click OK.

Sample six indicates that the process is not in statistical control. To check the sample values, click the sample six summary point on either control chart. The corresponding rows highlight in the data table.
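The limits on these charts follow the Shewhart range method. The sketch below assumes equal subgroup sizes from 2 to 5. It estimates the process sigma as the average range divided by the standard tabulated constant d2, and it places limits k standard errors of the plotted statistic from the center line. The function names are hypothetical, and JMP's own sigma estimate may differ in detail.

```python
import math
from typing import Dict, List, Sequence

# Standard tabulated bias-correction constants for subgroup ranges.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}
D3 = {2: 0.853, 3: 0.888, 4: 0.880, 5: 0.864}


def xbar_r_limits(subgroups: Sequence[Sequence[float]], k: float = 3.0) -> Dict[str, float]:
    """K-sigma limits for XBar and R charts, estimating sigma as Rbar / d2."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups) or n not in D2:
        raise ValueError("sketch handles equal subgroup sizes from 2 to 5")
    means = [sum(g) / n for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)     # XBar center line
    rbar = sum(ranges) / len(ranges)         # R center line
    sigma = rbar / D2[n]                     # estimated process standard deviation
    return {
        "XBar LCL": grand_mean - k * sigma / math.sqrt(n),
        "XBar Avg": grand_mean,
        "XBar UCL": grand_mean + k * sigma / math.sqrt(n),
        "R LCL": max(0.0, rbar - k * D3[n] * sigma),  # a range cannot be negative
        "R Avg": rbar,
        "R UCL": rbar + k * D3[n] * sigma,
    }


def out_of_control(subgroups: Sequence[Sequence[float]], limits: Dict[str, float]) -> List[int]:
    """1-based numbers of subgroups whose mean falls outside the XBar limits."""
    flagged = []
    for i, g in enumerate(subgroups, start=1):
        m = sum(g) / len(g)
        if m > limits["XBar UCL"] or m < limits["XBar LCL"]:
            flagged.append(i)
    return flagged
```

Take, for example, five made-up subgroups of [9, 10, 11, 10] followed by one of [14, 15, 14, 15]. These give an XBar UCL of about 12.09, so out_of_control flags subgroup 6.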


Figure 7.1 Variables Charts for Coating Data

You can use Fit Y by X for an alternative visualization of the data. First, change the modeling type of Sample to Nominal. Specify the interval variable Weight as Y and the nominal variable Sample as X. The box plots in Figure 7.2 show that the sixth sample has a small range of high values.

Figure 7.2 Quantiles Option in Fit Y By X Platform


The Control Chart Launch Dialog

When you select a Control Chart from the Graph > Control Chart menu (Figure 7.3), you see a Control Chart Launch dialog similar to the one in Figure 7.4. (The exact controls vary depending on the type of chart you choose.) Initially, the dialog shows three kinds of information:

• process information, for measurement variable selection

• chart type information

• limits specification.

Figure 7.3 Control Chart Menu

Specific information shown for each section varies according to the type of chart you request.

Figure 7.4 Control Chart Launch Dialog

Through interaction with the Launch dialog, you specify exactly how you want your charts created. The following sections describe the panel elements.


Process Information

The Launch dialog displays a list of columns in the current data table. Here, you specify the variables to be analyzed and the subgroup sample size.

Process

selects variables for charting.

• For variables charts, specify measurements as the process.

• For attribute charts, specify the defect count or defective proportion as the process.

Sample Label

enables you to specify a variable whose values label the horizontal axis and can also identify unequal subgroup sizes. If no sample label variable is specified, the samples are identified by their subgroup sample number.

• If the sample subgroups are the same size, check the Sample Size Constant radio button and enter the size into the text box. If you entered a Sample Label variable, its values are used to label the horizontal axis.

• If the sample subgroups have an unequal number of rows or have missing values and you have a column identifying each sample, check the Sample Grouped by Sample Label radio button and enter the sample identifying column as the sample label.

For attribute charts (p-, np-, c-, and u-charts), this variable is the subgroup sample size. In Variables charts, it identifies the sample. When the chart type is IR, a Range Span text box appears. The range span specifies the number of consecutive measurements from which the moving ranges are computed.
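The range span determines how many consecutive measurements each moving range covers. The sketch below assumes the conventional estimate: sigma equals the average moving range divided by d2 for that span, with d2 taken from standard tables for spans 2 through 5. The function names are hypothetical.

```python
from typing import Dict, List, Sequence

# Standard tabulated d2 constants for the range of w consecutive observations.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def moving_ranges(x: Sequence[float], span: int = 2) -> List[float]:
    """Range (max - min) of each run of `span` consecutive measurements."""
    return [max(x[i:i + span]) - min(x[i:i + span]) for i in range(len(x) - span + 1)]


def individuals_limits(x: Sequence[float], span: int = 2, k: float = 3.0) -> Dict[str, float]:
    """Center line and k-sigma limits for an Individual Measurement chart,
    estimating sigma as the average moving range divided by d2(span)."""
    if span not in D2 or len(x) < span + 1:
        raise ValueError("sketch supports spans 2-5 and needs more points than the span")
    mr = moving_ranges(x, span)
    mr_bar = sum(mr) / len(mr)
    sigma = mr_bar / D2[span]
    center = sum(x) / len(x)
    return {"LCL": center - k * sigma, "Avg": center, "UCL": center + k * sigma, "MR Avg": mr_bar}
```

A span of 2 is the usual choice. A longer span smooths the moving ranges but makes the sigma estimate react more slowly to short-lived changes.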

The illustration in Figure 7.5 shows an X̄-chart for a process with unequal subgroup sample sizes, using the Coating.jmp sample data from the Quality Control sample data folder.



Figure 7.5 Variables Charts with Unequal Subgroup Sample Sizes

Phase

The Phase role enables you to specify a column identifying different phases, or sections. A phase is a group of consecutive observations in the data table. For example, phases might correspond to time periods during which a new process is brought into production and then put through successive changes. Phases generate, for each level of the specified Phase variable, a new sigma, set of limits, zones, and resulting tests.


Chart Type Information

Shewhart control charts are broadly classified as variables charts and attribute charts. Moving average charts and cusum charts can be thought of as special kinds of variables charts.

• The XBar menu selection gives XBar, R, and S check boxes.

• The IR menu selection has checkbox options for the Individual Measurement, Moving Range, and Median moving range charts.

• The Cusum chart is a special chart for means or individual measurements.



• P, NP, C, and U charts, and Run Charts, have no additional specifications.

Parameters

You specify computations for control limits by entering a value for k (K Sigma) or by entering a probability for α (Alpha). There must be a specification of either K Sigma or Alpha. The dialog default for K Sigma is 3.

K Sigma

allows specification of control limits in terms of a multiple of the sample standard error. K Sigma specifies control limits at k sample standard errors above and below the expected value, which shows as the center line. To specify k, the number of sigmas, click K Sigma and enter a positive k value into the text box. The usual choice for k is three, which is three standard deviations. The examples shown in Figure 7.6 compare the X̄-chart for the Coating.jmp data with control lines drawn with K Sigma = 3 and K Sigma = 4.

Figure 7.6 K Sigma =3 (left) and K Sigma=4 (right) Control Limits

Alpha

specifies control limits (also called probability limits) in terms of the probability α that a single subgroup statistic exceeds its control limits, assuming that the process is in control. To specify alpha, click the Alpha radio button and enter the probability you want. Reasonable choices for α are 0.01 or 0.001.
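When the plotted statistic is normally distributed, K Sigma and Alpha describe the same limits in two ways. That normality is an assumption of this sketch: it is reasonable for subgroup means and less so for attribute charts. The conversion uses statistics.NormalDist from the Python standard library. The function names are hypothetical.

```python
from statistics import NormalDist


def k_from_alpha(alpha: float) -> float:
    """Sigma multiple whose two-sided normal tail probability equals alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    return NormalDist().inv_cdf(1 - alpha / 2)


def alpha_from_k(k: float) -> float:
    """Probability that an in-control normal statistic falls outside k-sigma limits."""
    return 2 * (1 - NormalDist().cdf(k))
```

Under these definitions, k = 3 corresponds to α ≈ 0.0027, and α = 0.01 gives k ≈ 2.576.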

Using Specified Statistics

If you click the Specify Stats button (when available) on the Control Chart Launch dialog, a tab with editable fields is appended to the bottom of the launch dialog. This lets you enter historical statistics (statistics obtained from historical data) for the process variable. The Control Chart platform uses those entries to construct control charts. The example here shows 1 as the standard deviation of the process variable and 20 as the mean measurement.


Note: When the mean is user-specified, it is labeled in the plot as μ₀.

If you check the Capability option on the Control Chart launch dialog (see Figure 7.4), a dialog appears as the platform is launched asking for specification limits. The standard deviation for the control chart selected is sent to the dialog and appears as a Specified Sigma value, which is the default option. After entering the specification limits and clicking OK, capability output appears in the same window next to the control chart.
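A capability analysis compares the process spread with the specification limits. The sketch below computes the two most common indices. It assumes a normal process whose sigma is the Specified Sigma passed from the control chart. The function name is hypothetical. Note that this capability Cp is unrelated to the Mallows' Cp used in stepwise regression.

```python
from typing import Dict


def capability_indices(mean: float, sigma: float, lsl: float, usl: float) -> Dict[str, float]:
    """Cp and Cpk for two-sided specification limits (normal-theory definitions)."""
    if sigma <= 0 or usl <= lsl:
        raise ValueError("need sigma > 0 and usl > lsl")
    # Cp: specification width relative to a six-sigma process spread.
    cp = (usl - lsl) / (6 * sigma)
    # Cpk: distance from the mean to the nearer limit, in units of three sigma.
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)
    return {"Cp": cp, "Cpk": cpk}
```

For example, take a process centered at 11 with sigma 1 and specification limits of 4 and 16. It has Cp = 2.0 but Cpk ≈ 1.67, because the mean sits closer to the upper limit.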

Tailoring the Horizontal Axis

When you double-click the x-axis, the X Axis Specification dialog appears for you to specify the format, axis values, number of ticks, gridlines, and reference lines to display on the x-axis.

For example, the Pickles.JMP data lists eight measures a day for three days. In this example, by default, the x-axis is labeled at every other tick. Sometimes this gives redundant labels, as shown to the left in Figure 7.7. If you specify a label at an increment of eight, with seven ticks between them, the x-axis is labeled once for each day, as shown in the chart on the right.

Figure 7.7 Example of Labeled x-Axis Tick Marks

Display Options

Control Charts have popup menus that affect various parts of the platform:

• The menu on the top-most title bar affects the whole platform window. Its items vary with the type of chart you select.

• There is a menu of items on the chart type title bar with options that affect each chart individually.


Single Chart Options

The popup menu of chart options appears when you click the icon next to the chart name, or right-click the chart space.

Box Plots

superimposes box plots on the subgroup means plotted in a Mean chart. The box plot shows the subgroup maximum, minimum, 75th percentile, 25th percentile, and median. Markers for subgroup means show unless you deselect the Show Points option. The control limits displayed apply only to the subgroup mean. The Box Plots option is available only for X̄-charts. It is most appropriate for larger subgroup sample sizes (more than 10 measurements in a subgroup).

Needle

connects plotted points to the center line with a vertical line segment.

Connect Points

toggles between connecting and not connecting the points.

Show Points

toggles between showing and not showing the points representing summary statistics. Initially, the points show. You can use this option to suppress the markers denoting subgroup means when the Box Plots option is in effect.


Figure 7.8 Box Plot Option and Needle Option for Airport.jmp Data

Connect Color

displays the JMP-SE color palette for you to choose the color of the line segments used to connect points.

Center Line Color

displays the JMP-SE color palette for you to choose the color of the line segments used to draw the center line.

Limits Color

displays the JMP-SE color palette for you to choose the color of the line segments used in the upper and lower limits lines.

Line Width

allows you to pick the width of the control lines. Options are Thin, Medium, or Thick.

Show Center Line

initially displays the center line in green. Deselecting Show Center Line removes the center line and its legend from the chart.

Show Control Limits

toggles between showing and not showing the chart control limits and their legends.

Tests

shows a submenu that enables you to choose which tests to mark on the chart when the test is positive. Tests apply only to charts whose limits are 3σ limits. Tests 1 to 4 apply to Mean, Individual, and attribute charts. Tests 5 to 8 apply to Mean charts and Individual Measurement charts only. If tests do not apply to a chart, the Tests option is dimmed. For charts whose control limits vary because of unequal subgroup sample sizes, the tests apply but do not appear until the sample sizes become equal. These special tests are also referred to as the Western Electric rules. For more information on special causes tests, see “Tests for Special Causes” on page 119 later in this chapter.

Show Zones

toggles between showing and not showing the zone lines with the tests for special causes. The zones are labeled A, B, and C as shown here in the Mean plot for weight in the Coating.jmp sample data. Control Chart tests use the zone lines as boundaries. The seven zone lines are set one sigma apart, centered on the center line.

Westgard Rules

are detailed in a later section. See the text and chart in “Westgard Rules,” p. 122.

Test Beyond Limits

flags as a “*” any point that is beyond the limits. This test works on all charts with limits, regardless of the sample size being constant, and regardless of the size of k or the width of the limits. For example, if you had unequal sample sizes, and wanted to flag any points beyond the limits of an r-chart, you could use this command.

OC Curve

gives Operating Characteristic (OC) curves for specific control charts. OC curves are defined in JMP-SE only for X̄-, p-, np-, c-, and u-charts. The curve shows how the probability of accepting a lot changes with the quality of the sample. When you choose the OC Curve option from the control chart option list, JMP-SE opens a new window containing the curve, using all the calculated values directly from the active control chart. Alternatively, you can run an OC curve directly from the QC tab on the JMP-SE Starter window. Select the chart on which you want the curve based, then a dialog prompts you for Target, LCL, UCL, K, Sigma, and sample size.
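For an X̄-chart, one point on the OC curve is the probability that a subgroup mean still falls inside the limits (the lot is "accepted") after the process mean shifts. The sketch below assumes normal data, a known sigma, and limits centered on the target. The shift is measured in process standard deviations, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def xbar_oc(shift: float, n: int, k: float = 3.0) -> float:
    """Probability that a subgroup mean stays inside k-sigma limits after the
    process mean shifts by `shift` process standard deviations."""
    z = NormalDist()  # standard normal
    # The subgroup mean has standard deviation sigma / sqrt(n), so a shift of
    # `shift` sigma moves it shift * sqrt(n) standard errors off the center line.
    d = shift * math.sqrt(n)
    return z.cdf(k - d) - z.cdf(-k - d)
```

Take n = 4 with 3-sigma limits. A shift of 1.5 sigma moves the mean of the plotted statistic exactly onto the UCL, so the chance of missing the shift on any one subgroup is about 0.5.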

Window Options

The popup menu on the window title bar lists options that affect the report window. The example menu shown here appears if you request XBar and R at the same time. You can check each chart to show or hide it.


The specific options that are available depend on the type of control chart you request. Unavailable options show as grayed menu items.

The following options show for all control charts except Run charts:

Show Limits Legend shows or hides the Avg, UCL, and LCL values to the right of the chart.

Connect thru Missing connects points when some samples have missing values. The left-hand chart in Figure 7.9 is a control chart with no missing points. The middle chart has samples 8, 9, and 10 missing with the points not connected. The right-hand chart appears if you use the Connect thru Missing option, which is the default.

Capability launches a capability analysis. Details are found in Figure 2.20 on page 39.

Figure 7.9 Example of Connect thru Missing Option

Script has a submenu of commands available to all platforms that let you redo the analysis or save the JSL commands for the analysis to a window or a file.

Tests for Special Causes

The Tests option in the chart type popup menu displays a submenu for test selection. You can select one or more tests for special causes with the options popup menu. Nelson (1984) developed the numbering notation used to identify special tests on control charts.

If a selected test is positive, the last point in the test sequence is labeled with the test number, where the sequence is the moving set of points evaluated for that particular test. When you select several tests for display and more than one test signals at a particular point, the label of the numerically lowest test specified appears beside the point.


Western Electric Rules

Western Electric rules are implemented in the Tests submenu. Table 7.1 on page 120 lists and interprets the eight tests, and Figure 7.10 illustrates the tests. The following rules apply to each test:

• The area between the upper and lower limits is divided into six zones, each with a width of one standard deviation.

• The zones are labeled A, B, C, C, B, A with zones C nearest the center line.

• A point lies in Zone B or beyond if it lies beyond the line separating zones C and B. That is, if it is more than one standard deviation from the centerline.

• Any point lying on a line separating two zones is considered to belong to the outer zone.

Note: All tests and zones require equal sample sizes in the subgroups of nonmissing data.

Tests 1 through 8 apply to Mean (X̄) and individual measurement charts. Tests 1 through 4 can also apply to p-, np-, c-, and u-charts.

Tests 1, 2, 5, and 6 apply to the upper and lower halves of the chart separately. Tests 3, 4, 7, and 8 apply to the whole chart.

See Nelson (1984, 1985) for further recommendations on how to use these tests.

Nelson (1984, 1985)

Table 7.1 Description and Interpretation of Special Causes TestsTest 1 One point beyond Zone A detects a shift in the mean, an increase in

the standard deviation, or a single aberra-tion in the process. For interpreting Test 1, the R-chart can be used to rule out increases in variation.

Test 2 Nine points in a row in a sin-gle (upper or lower) side of Zone C or beyond

detects a shift in the process mean.

Test 3 Six points in a row steadily increasing or decreasing

detects a trend or drift in the process mean. Small trends will be signaled by this test before Test 1.

Test 4 Fourteen points in a row alternating up and down

detects systematic effects such as two alter-nately used machines, vendors, or opera-tors.


Test 5: Two out of three points in a row in Zone A or beyond. Detects a shift in the process average or an increase in the standard deviation. Any two out of three points provide a positive test.

Test 6: Four out of five points in Zone B or beyond. Detects a shift in the process mean. Any four out of five points provide a positive test.

Test 7: Fifteen points in a row in Zone C, above and below the center line. Detects stratification of subgroups when the observations in a single subgroup come from various sources with different means.

Test 8: Eight points in a row on both sides of the center line with none in Zone C. Detects stratification of subgroups when the observations in one subgroup come from a single source, but subgroups come from different sources with different means.
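To make the window logic of Table 7.1 concrete, here is a minimal Python sketch that evaluates all eight tests on a sequence of plotted values. It assumes the center line and the sigma of the plotted statistic are already known and that subgroups are equal in size. The function names are hypothetical, points exactly on the center line are treated as belonging to neither side (an assumption), and JMP's own implementation may differ in such details. Boundary handling follows the rule above that a point on a zone line belongs to the outer zone.

```python
def _window(z, i, n):
    """Return the n scores ending at index i, or None if fewer than n exist."""
    return z[i - n + 1 : i + 1] if i >= n - 1 else None


def _diffs(w):
    """Successive differences of a window."""
    return [b - a for a, b in zip(w, w[1:])]


def special_cause_tests(values, center, sigma):
    """Return {index: [test numbers]} for the last point of each signaling run.

    center and sigma are the center line and the standard deviation of the
    plotted statistic (assumed known); zone lines sit at 1, 2, and 3 sigma.
    """
    z = [(v - center) / sigma for v in values]  # distance from center in sigma units
    hits = {}
    for i in range(len(z)):
        tests = []
        # Test 1: one point beyond Zone A (a point on the 3-sigma line counts as beyond).
        if abs(z[i]) >= 3:
            tests.append(1)
        # Test 2: nine points in a row on one side of the center line.
        w = _window(z, i, 9)
        if w and (all(v > 0 for v in w) or all(v < 0 for v in w)):
            tests.append(2)
        # Test 3: six points in a row steadily increasing or decreasing.
        w = _window(z, i, 6)
        if w:
            d = _diffs(w)
            if all(v > 0 for v in d) or all(v < 0 for v in d):
                tests.append(3)
        # Test 4: fourteen points in a row alternating up and down.
        w = _window(z, i, 14)
        if w:
            d = _diffs(w)
            if all(a * b < 0 for a, b in zip(d, d[1:])):
                tests.append(4)
        # Test 5: two out of three points in Zone A or beyond, on the same side.
        w = _window(z, i, 3)
        if w and (sum(v >= 2 for v in w) >= 2 or sum(v <= -2 for v in w) >= 2):
            tests.append(5)
        # Test 6: four out of five points in Zone B or beyond, on the same side.
        w = _window(z, i, 5)
        if w and (sum(v >= 1 for v in w) >= 4 or sum(v <= -1 for v in w) >= 4):
            tests.append(6)
        # Test 7: fifteen points in a row inside Zone C (either side).
        w = _window(z, i, 15)
        if w and all(abs(v) < 1 for v in w):
            tests.append(7)
        # Test 8: eight points in a row, none in Zone C, on both sides of the center.
        w = _window(z, i, 8)
        if w and all(abs(v) >= 1 for v in w) and min(w) < 0 < max(w):
            tests.append(8)
        if tests:
            hits[i] = tests
    return hits
```

To mimic the labeling rule described at the start of this section, a chart built on this sketch would show min(hits[i]) beside point i when several tests signal there.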


Figure 7.10 Illustration of Special Causes Tests

Nelson (1984, 1985)

Westgard Rules

Westgard rules are implemented under the Westgard Rules submenu of the Control Chart platform. Each test is abbreviated with its decision rule. For example, 1-2s refers to a test in which one point falls more than two standard deviations from the mean.


Because Westgard rules are based on sigma and not the zones, they can be computed without regard to constant sample size.

Table 7.2 Westgard Rules

Rule 1-2s is commonly used with Levey-Jennings plots, where control limits are set 2 standard deviations away from the mean. The rule is triggered when any one point goes beyond these limits.

Rule 1-3s is also common with Levey-Jennings plots, where the control limits are set 3 standard deviations away from the mean. The rule is triggered when any one point goes beyond these limits.

Rule 2-2s is triggered when two consecutive control measurements are farther than two standard deviations from the mean.

Rule R-4s is triggered when one measurement in a group is two standard deviations above the mean and the next is two standard deviations below.

Rule 4-1s is triggered when four consecutive measurements are more than one standard deviation from the mean.

Rule 10-X is triggered when ten consecutive points are on one side of the mean.
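The rules in Table 7.2 are simple checks on standardized values, so they can be sketched in a few lines of Python. This sketch assumes the mean and standard deviation are known. It reads 2-2s and 4-1s as requiring the points to fall on the same side of the mean, and R-4s as allowing either order (high then low, or low then high); these follow Westgard's usual statements of the rules and are assumptions, since the table does not spell them out. The function name is hypothetical.

```python
def westgard_violations(values, mean, sd):
    """Return {rule: [indices of the last point of each violating run]}."""
    z = [(v - mean) / sd for v in values]  # distance from the mean in sd units
    found = {rule: [] for rule in ("1-2s", "1-3s", "2-2s", "R-4s", "4-1s", "10-X")}
    for i, zi in enumerate(z):
        # 1-2s and 1-3s: a single point beyond the 2s or 3s limits.
        if abs(zi) > 2:
            found["1-2s"].append(i)
        if abs(zi) > 3:
            found["1-3s"].append(i)
        if i >= 1:
            prev = z[i - 1]
            # 2-2s: two consecutive points beyond 2s on the same side.
            if (prev > 2 and zi > 2) or (prev < -2 and zi < -2):
                found["2-2s"].append(i)
            # R-4s: one point beyond +2s and the next beyond -2s (either order).
            if (prev > 2 and zi < -2) or (prev < -2 and zi > 2):
                found["R-4s"].append(i)
        # 4-1s: four consecutive points beyond 1s on the same side.
        if i >= 3:
            w = z[i - 3 : i + 1]
            if all(v > 1 for v in w) or all(v < -1 for v in w):
                found["4-1s"].append(i)
        # 10-X: ten consecutive points on one side of the mean.
        if i >= 9:
            w = z[i - 9 : i + 1]
            if all(v > 0 for v in w) or all(v < 0 for v in w):
                found["10-X"].append(i)
    return found
```

Because every check uses only the standardized distance from the mean, nothing here depends on zone widths, which is consistent with the statement that Westgard rules do not require constant sample sizes.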


Excluded, Hidden, and Deleted Samples

The following table summarizes the effects of various conditions on samples and subgroups:

Table 7.3 Excluded, Hidden, and Deleted Samples

Sample is excluded before creating the chart: Sample is not included in the calculation of the limits, but it appears on the graph.

Sample is excluded after creating the chart: Sample is included in the calculation of the limits, and it appears on the graph. Excluding a sample while the graph is open does not change the output.

Sample is hidden before creating the chart: Sample is included in the calculation of the limits, but does not appear on the graph.

Sample is hidden after creating the chart: Sample is included in the calculation of the limits, but does not appear on the graph. The sample marker disappears from the graph, but the sample label still appears on the axis and the limits remain the same.

Sample is both excluded and hidden before creating the chart: Sample is not included in the calculation of the limits, and it does not appear on the graph.

Sample is both excluded and hidden after creating the chart: Sample is included in the calculation of the limits, but does not appear on the graph. The sample marker disappears from the graph, but the sample label still appears on the axis and the limits remain the same.

Data set is subsetted with the sample deleted before creating the chart: Sample is not included in the calculation of the limits, the axis does not include a value for the sample, and the sample marker does not appear on the graph.

Data set is subsetted with the sample deleted after creating the chart: Sample is not included in the calculation of the limits, and it does not appear on the graph. The sample marker disappears from the graph, the sample label is removed from the axis, the graph shifts, and the limits change.

Some additional notes:

1 Exclude and Hide operate only on the row state of the first observation in the sample. For example, if the second observation in the sample is hidden while the first observation is not hidden, the sample still appears on the chart.

2 An exception to the exclude/hide rule: Tests for Special Causes can flag a sample that is excluded, but do not flag a sample that is hidden.

Shewhart Control Charts

Shewhart control charts are broadly classified into control charts for variables and control charts for attributes. Moving average charts are special kinds of control charts for variables.

The Control Chart platform in JMP-SE implements a variety of control charts:


• X̄-, R-, and S-charts

• Individual and Moving Range charts

• p-, np-, c-, and u-charts

• Phase Control Charts for X̄-, R-, IR-, p-, np-, c-, and u-charts

Unlike other platforms in JMP-SE, control charts update dynamically as data are added to or changed in the table.

Shewhart Control Charts for Variables

Control charts for variables are classified according to the subgroup summary statistic plotted on the chart:

• X̄-charts display subgroup means (averages)

• R-charts display subgroup ranges (maximum – minimum)

• S-charts display subgroup standard deviations

• Run charts display data as a connected series of points

The IR selection gives two additional chart types:

• Individual Measurement charts display individual measurements

• Moving Range charts display moving ranges of two or more successive measurements.

XBar-, R-, and S-Charts

For quality characteristics measured on a continuous scale, a typical analysis shows both the process mean and its variability, with a mean chart aligned above its corresponding R- or S-chart. If you are charting individual measurements, the individual measurement chart appears above its corresponding moving range chart.

Example. X̄- and S-Charts with Varying Subgroup Sizes

This example uses the same data as example 1, Coating.jmp, in the Quality Control sample data folder. This time the quality characteristic of interest is the Weight 2 column. An X̄-chart and an S-chart for the process are shown in Figure 7.11.

To replicate this example,

• Choose the Graph > Control Chart > XBar command.

• Select the chart types of XBar and S.

• Specify Weight 2 as the Process variable.

• Specify Sample as the Sample Label variable.

• The Sample Size option should automatically change to Sample Grouped by Sample Label.

• Click OK.


Figure 7.11 X̄- and S-Charts for Varying Subgroup Sizes

Weight 2 has several missing values, so the chart has uneven limits. Although each sample has the same number of observations, samples 1, 3, 5, and 7 each have a missing value, so those subgroups contain fewer nonmissing values.

Note: Zones and tests can be turned on and will appear checked, but they are not drawn on the chart until all samples are equally sized, because neither is valid on a chart with unequally sized samples. If the samples change while the chart is open so that they become equally sized, any selected zones and tests are applied immediately and appear on the chart.
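The uneven limits follow from the standard error of a subgroup mean: with sigma held fixed, a subgroup with fewer nonmissing observations has a larger standard error, so its limits are wider. The Python sketch below shows only that effect. The subgroup sizes, center, and sigma are hypothetical (not the Coating.jmp values), sigma is assumed known, and the function name is illustrative; JMP estimates sigma from the data, which this sketch does not reproduce.

```python
import math


def xbar_limits(center, sigma, sizes, k=3.0):
    """Return (LCL, UCL) for the mean of each subgroup.

    center: the chart's center line
    sigma:  standard deviation of individual measurements (assumed known)
    sizes:  number of nonmissing observations in each subgroup
    """
    limits = []
    for n in sizes:
        half_width = k * sigma / math.sqrt(n)  # k standard errors of a mean of n values
        limits.append((center - half_width, center + half_width))
    return limits


# Hypothetical subgroups: two full subgroups of 4 and two that lost one value.
sizes = [4, 3, 4, 3]
for n, (lcl, ucl) in zip(sizes, xbar_limits(center=10.0, sigma=1.0, sizes=sizes)):
    print(f"n={n}: LCL={lcl:.3f}, UCL={ucl:.3f}")
```

The subgroups with n=3 print wider limits than those with n=4, which is the stepped pattern of limits you see on a chart with missing values.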

Run Charts

Run charts display a column of data as a connected series of points. The following example is a Run chart for the Weight variable from Coating.jmp.


Figure 7.12 Run Chart

When you select the Show Center Line option in the Run Chart drop-down, a line is drawn through the center value of the column. The center line is determined by the Use Median setting of the platform drop-down. When Use Median is selected, the median is used as the center line. Otherwise, the mean is used. When saving limits to a file, both the overall mean and median are saved.

Run charts can also plot the group means when a sample label is given, either on the dialog or through a script.

Individual Measurement Charts

Individual Measurement Chart Type displays individual measurements. Individual Measurement charts are appropriate when only one measurement is available for each subgroup sample.

Moving Range Chart Type displays moving ranges of two or more successive measurements. Moving ranges are computed for the number of consecutive measurements you enter in the Range Span box. The default range span is 2. Because moving ranges are correlated, these charts should be interpreted with care.
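A moving range with span w is the range (maximum minus minimum) of each run of w consecutive measurements. The Python sketch below computes moving ranges and then individuals limits using a common textbook construction: sigma is estimated as the average moving range divided by the bias-correction constant d2 for w observations. The function names are hypothetical, and JMP's exact estimator (including its Median Moving Range option) may differ from this sketch.

```python
import statistics

# Standard d2 bias-correction constants for the range of w observations.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def moving_ranges(x, span=2):
    """Range (max - min) of each run of `span` consecutive measurements."""
    return [
        max(x[i - span + 1 : i + 1]) - min(x[i - span + 1 : i + 1])
        for i in range(span - 1, len(x))
    ]


def individuals_limits(x, span=2, k=3.0):
    """Return (LCL, center, UCL) for an Individual Measurement chart."""
    mr_bar = statistics.mean(moving_ranges(x, span))  # average moving range
    sigma = mr_bar / D2[span]  # estimate of the process standard deviation
    center = statistics.mean(x)
    return center - k * sigma, center, center + k * sigma


print(moving_ranges([1, 3, 2, 6]))  # [2, 1, 4]
```

Adjacent moving ranges share a measurement (with span 2, the values 3 and 2 both feed two ranges above), which is why the text cautions that moving ranges are correlated.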

Example. Individual Measurement and Moving Range Charts

The Pickles.jmp data in the Quality Control sample data folder contains the acid content for vats of pickles. Because the pickles are sensitive to acidity and produced in large vats, high acidity ruins an entire pickle vat. The acidity in four vats is measured each day at 1, 2, and 3 PM. The data table records day, time, and acidity measurements. The dialog in Figure 7.13 creates Individual Measurement and Moving Range charts with date labels on the horizontal axis.


Figure 7.13 Launch Dialog for Individual Measurement and Moving Range Chart

To complete this example,

• Choose the Graph > Control Chart > IR command.

• Select both Individual Measurement and Moving Range chart types.

• Specify Acid as the Process variable.

• Specify Date as the Sample Label variable.

• Click OK.

The individual measurement and moving range charts shown in Figure 7.14 monitor the acidity in each vat produced.


Figure 7.14 Individual Measurement and Moving Range Charts for Pickles Data

Note: If you choose a Median Moving Range chart, the limits on the Individuals chart use the Median Moving Range as the sigma, rather than the Average Moving Range.

Shewhart Control Charts for Attributes

In the previous types of charts, measurement data was the process variable. These data are often continuous, and the charts are based on continuous theory. Another type of data is count data, where the variable of interest is a discrete count of the number of defects or blemishes per subgroup. Attribute charts suit discrete count data because they are based on binomial and Poisson models. Because the counts are recorded per subgroup, when you compare charts, check whether the subgroups in each chart contain similar numbers of items. Attribute charts, like variables charts, are classified according to the subgroup sample statistic plotted on the chart:

• p-charts display the proportion of nonconforming (defective) items in subgroup samples, which can vary in size. Because each subgroup for a p-chart consists of Ni items, and each item is judged as either conforming or nonconforming, the maximum number of nonconforming items in subgroup i is Ni.

• np-charts display the number of nonconforming (defective) items in constant-sized subgroup samples. Because each subgroup for an np-chart consists of N items, and each item is judged as either conforming or nonconforming, the maximum number of nonconforming items in a subgroup is N.

• c-charts display the number of nonconformities (defects) in a subgroup sample that usually consists of one inspection unit.

• u-charts display the number of nonconformities (defects) per unit in subgroup samples that can have a varying number of inspection units.

Table 7.4 Determining Which Attribute Chart to Use

Each item is judged as either conforming or nonconforming, and the subgroups are a constant size: np-chart

Each item is judged as either conforming or nonconforming, and the subgroups vary in size: p-chart

For each item, the number of defects is counted, and the subgroups are a constant size: c-chart

For each item, the number of defects is counted, and the subgroups vary in size: u-chart

p- and np-Charts

Example. np-Charts

The Washers.jmp data in the Quality Control sample data folder contains defect counts of 15 lots of 400 galvanized washers. The washers were inspected for finish defects such as rough galvanization and exposed steel. If a washer contained a finish defect, it was deemed nonconforming or defective. Thus, the defect count represents how many washers were defective for each lot of size 400. To replicate this example, follow these steps:

• Choose the Graph > Control Chart > NP command.

• Choose # defects as the Process variable.

• Change the Constant Size to 400.

• Click OK.

The example here illustrates an np-chart for the number of defects.

Figure 7.15 np-Chart

Example. p-Charts

Again using the Washers.jmp data, you can specify a sample size variable, which allows for varying sample sizes.


Note: All samples in this data set are the same size. To create the chart, follow these steps:

• Choose the Graph > Control Chart > P command.

• Choose # defects as the Process variable.

• Choose Lot as the Sample Label variable.

• Choose Lot Size as the Sample Size variable.

• Click OK.

The chart shown here illustrates a p-chart for the proportion of defects.

Figure 7.16 p-Chart

Note that although the points on the chart look the same as those on the np-chart, the y-axis, Avg, and limits are all different because they are now based on proportions.
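The relationship between the two charts can be checked with the standard 3-sigma binomial limits found in SPC textbooks: the p-chart plots d/n around p̄ with limits p̄ ± 3√(p̄(1 − p̄)/n), and the np-chart plots d around np̄ with limits np̄ ± 3√(np̄(1 − p̄)). With a constant n, every np-chart quantity is n times the matching p-chart quantity, so the points have the same shape but a different scale. This Python sketch uses hypothetical counts (not the Washers.jmp values) and hypothetical function names; JMP computes its own limits and may differ in details such as how negative lower limits are handled.

```python
import math


def p_chart(defects, sizes, k=3.0):
    """Center line, plotted proportions, and (LCL, UCL) per subgroup for a p-chart."""
    p_bar = sum(defects) / sum(sizes)  # overall proportion nonconforming
    points, limits = [], []
    for d, n in zip(defects, sizes):
        half = k * math.sqrt(p_bar * (1 - p_bar) / n)  # wider for smaller subgroups
        points.append(d / n)
        limits.append((max(0.0, p_bar - half), min(1.0, p_bar + half)))
    return p_bar, points, limits


def np_chart(defects, n, k=3.0):
    """Center line and (LCL, UCL) for an np-chart with constant subgroup size n."""
    p_bar = sum(defects) / (n * len(defects))
    center = n * p_bar
    half = k * math.sqrt(n * p_bar * (1 - p_bar))
    return center, (max(0.0, center - half), center + half)


# Hypothetical defect counts for four lots of 400 washers.
counts = [4, 6, 2, 8]
p_bar, points, p_limits = p_chart(counts, [400] * len(counts))
center, np_limits = np_chart(counts, 400)
print(p_bar, center)             # center lines on the two scales
print(p_limits[0], np_limits)    # np-chart limits are 400 times the p-chart limits
```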

u-Charts

The Braces.jmp data in the Quality Control sample data folder records the defect count in boxes of automobile support braces. A box of braces is one inspection unit. The number of boxes inspected (per day) is the subgroup sample size, which can vary. The u-chart shown here monitors the number of brace defects per unit (box) in each subgroup. The upper and lower bounds vary according to the number of units inspected.

Note: When you generate a u-chart and select Capability, JMP-SE launches the Poisson Fit in Distribution and gives a Poisson-specific capability analysis.


Figure 7.17 u-Chart

Example. u-Charts

To replicate this example, follow these steps:

• Open the Braces.jmp data in the Quality Control sample data folder.

• Choose the Graph > Control Chart > U command.


• Choose Unit size as the Unit Size variable.

• Choose Date as the Sample Label.

• Click OK.

c-Charts

c-charts are similar to u-charts in that they monitor the number of nonconformities in an entire subgroup, made up of one or more units. However, they require constant subgroup sizes. c-charts can also be used to monitor the average number of defects per inspection unit.

Note: When you generate a c-chart and select Capability, JMP-SE launches the Poisson Fit in Distribution and gives a Poisson-specific capability analysis.
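Because both charts rest on a Poisson model, where the variance equals the mean, their textbook 3-sigma limits are short formulas: c̄ ± 3√c̄ for a c-chart, and ū ± 3√(ū/nᵢ) for a u-chart with nᵢ inspection units in subgroup i. The Python sketch below applies those formulas. The function names and data are hypothetical, negative lower limits are truncated at zero as an assumption, and JMP's own limit calculations may differ in detail.

```python
import math


def c_chart(counts, k=3.0):
    """Center line and (LCL, UCL) for a c-chart (constant subgroup size)."""
    c_bar = sum(counts) / len(counts)  # average defects per subgroup
    half = k * math.sqrt(c_bar)  # Poisson: variance equals the mean
    return c_bar, (max(0.0, c_bar - half), c_bar + half)


def u_chart(counts, units, k=3.0):
    """Center line, defects per unit, and (LCL, UCL) per subgroup for a u-chart."""
    u_bar = sum(counts) / sum(units)  # overall defects per inspection unit
    points = [c / n for c, n in zip(counts, units)]
    limits = []
    for n in units:
        half = k * math.sqrt(u_bar / n)  # limits narrow as more units are inspected
        limits.append((max(0.0, u_bar - half), u_bar + half))
    return u_bar, points, limits
```

In u_chart, a day with more boxes inspected gets tighter limits, which is why the u-chart bounds in Figure 7.17 vary with the number of units inspected.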

Example 10. c-Charts for Nonconformities per Unit

In this example, a clothing manufacturer ships shirts in boxes of ten. Prior to shipment, each shirt is inspected for flaws. Since the manufacturer is interested in the average number of flaws per shirt, the number of flaws found in each box is divided by ten and then recorded. To replicate this example, follow these steps:

• Open the Shirts.jmp data in the Quality Control sample data folder.

• Choose the Graph > Control Chart > C command.

• Choose # Defects as the Process variable.

• Choose Box Size as the Sample Size.

• Choose Box as the Sample Label.


• Click OK.

Figure 7.18 c-Chart

Phases

A phase is a group of consecutive observations in the data table. For example, phases might correspond to time periods during which a new process is brought into production and then put through successive changes. Phases generate, for each level of the specified Phase variable, a new sigma, set of limits, zones, and resulting tests.

On the dialog for X̄-, R-, S-, IR-, p-, np-, c-, u-, Presummarized, and Levey-Jennings charts, a Phase variable button appears. If a phase variable is specified, it is examined row by row to identify the phase to which each row belongs.

Saving to a limits file reveals the sigma and specific limits calculated for each phase.
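The idea of separate limits per phase can be sketched in Python by splitting the rows into runs of consecutive rows with the same phase level and computing a center and sigma for each run. This is a minimal sketch, not JMP's computation: the function name is hypothetical, the sample standard deviation is a simple stand-in for whatever sigma estimate the chart type actually uses, each phase needs at least two values, and itertools.groupby starts a new run if a phase level reappears later in the table (an assumption about how non-consecutive levels should be treated).

```python
import statistics
from itertools import groupby


def limits_by_phase(rows, k=3.0):
    """rows: (phase, value) pairs in table order.

    Returns one (phase, LCL, center, UCL) tuple per run of consecutive rows
    that share a phase level.
    """
    result = []
    for phase, group in groupby(rows, key=lambda row: row[0]):
        values = [value for _, value in group]
        center = statistics.mean(values)
        sigma = statistics.stdev(values)  # stand-in estimator, not JMP's
        result.append((phase, center - k * sigma, center, center + k * sigma))
    return result
```

For example, limits_by_phase([(1, 10.0), (1, 12.0), (2, 20.0), (2, 24.0)]) returns one tuple per phase, each with its own center line and limits.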

Example

Open Diameter.JMP, found in the Quality Control sample data folder. This data set contains the diameters taken for each day, both with the first prototype (phase 1) and the second prototype (phase 2).

• Select Graph > Control Chart > XBar.

• Choose DIAMETER as the Process, DAY as the Sample Label, and Phase as the Phase.

• Click OK.


Figure 7.19 Launch Dialog for Phases

The resulting chart has different limits for each phase.

Figure 7.20 Phase Control Chart


Cumulative Sum (Cusum) Charts

Cumulative Sum (Cusum) charts display cumulative sums of the deviations of subgroup means or individual measurements from a target value. Cusum charts are graphical and analytical tools for deciding whether a process is in a state of statistical control and for detecting a shift in the process mean.

JMP cusum charts can be one-sided, which detect a shift in one direction from a specified target mean, or two-sided to detect a shift in either direction. Both charts can be specified in terms of geometric parameters (h and k described in Figure 7.21); two-sided charts allow specification in terms of error probabilities α and β.

To interpret a two-sided Cusum chart, you compare the points with limits that compose a V-mask. A V-mask is formed by plotting V-shaped limits. The origin of a V-mask is the most recently plotted point, and the arms extend backward along the x-axis, as in Figure 7.21. As data are collected, the cumulative sum sequence is updated and the origin is relocated at the newest point.

Figure 7.21 Illustration of a V-Mask for a Two-Sided Cusum Chart

Shifts in the process mean are visually easy to detect on a cusum chart because they produce a change in the slope of the plotted points. The point where the slope changes is the point where the shift occurs. A condition is out-of-control if one or more of the points previously plotted crosses the upper or lower arm of the V-mask. Points crossing the lower arm signal an increasing process mean, and points crossing the upper arm signal a downward shift.
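The V-mask check described above can be sketched in Python. The cumulative sum is kept in standard-error units, and an earlier point j is compared with arms that sit h units above and below the origin and open by k units for each sample back in time (the geometry of Figure 7.21). The example values follow the oil-can example later in this chapter (target 8.1, sigma 0.05, subgroups of 4, Delta 1, H 2), while k = 0.5 (Delta/2) is an assumed, commonly used choice that the text does not state. Function names are hypothetical; JMP's internal computation may differ.

```python
import math


def shift_from_delta(delta, sigma, n):
    """Shift in data units implied by Delta, from delta = shift / (sigma / sqrt(n))."""
    return delta * sigma / math.sqrt(n)


def standardized_cusum(means, target, sigma, n):
    """Cumulative sum of subgroup-mean deviations from target, in standard-error units."""
    se = sigma / math.sqrt(n)
    total, sums = 0.0, []
    for m in means:
        total += (m - target) / se
        sums.append(total)
    return sums


def v_mask_signals(sums, h, k, origin=None):
    """Return (upward, downward) lists of earlier points that cross the V-mask arms.

    The mask's origin sits on point `origin` (default: the most recent point);
    each arm is h units from the origin there and opens by k units per sample
    going back in time.
    """
    t = len(sums) - 1 if origin is None else origin
    upward, downward = [], []
    for j in range(t):
        reach = h + k * (t - j)  # distance from the origin's cusum to each arm at point j
        if sums[j] < sums[t] - reach:
            upward.append(j)  # below the lower arm: the mean has increased
        elif sums[j] > sums[t] + reach:
            downward.append(j)  # above the upper arm: the mean has decreased
    return upward, downward


print(shift_from_delta(1, 0.05, 4))  # about 0.025 ounces
sums = standardized_cusum([8.1] * 4 + [8.2] * 4, target=8.1, sigma=0.05, n=4)
print(v_mask_signals(sums, h=2, k=0.5))
```

In the hypothetical series above, the fill weights jump from 8.1 to 8.2, the slope of the cusum changes at the shift, and earlier points fall below the lower arm, matching the rule that crossing the lower arm signals an increasing mean. Passing a different origin imitates moving the mask with the hand tool.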

There are major differences between cusum charts and other control (Shewhart) charts:

• A Shewhart control chart plots points based on information from a single subgroup sample. In cusum charts, each point is based on information from all samples taken up to and including the current subgroup.

• On a Shewhart control chart, horizontal control limits define whether a point signals an out-of-control condition. On a cusum chart, the limits can be either in the form of a V-mask or a horizontal decision interval.

(Figure 7.21 labels: upper arm; lower arm; vertex; h, the rise in the arm corresponding to the distance d from the origin to the vertex; k, the rise in the arm corresponding to one sampling unit.)


• The control limits on a Shewhart control chart are commonly specified as 3σ limits. On a cusum chart, the limits are determined from average run length, from error probabilities, or from an economic design.

A cusum chart is more efficient for detecting small shifts in the process mean. Lucas (1976) comments that a V-mask detects a 1σ shift about four times as fast as a Shewhart control chart.

Launch Options for Cusum Charts

When you choose Graph > Control Chart > CUSUM, the Control Charts Launch dialog appears, including appropriate options and specifications as shown here.

Note: The following items pertain only to cusum charts:

Two Sided

requests a two-sided cusum scheme when checked. If it is not checked, a one-sided scheme is used and no V-mask appears. If an H value is specified, a decision interval is displayed.

Data Units

specifies that the cumulative sums be computed without standardizing the subgroup means or individual values, so that the vertical axis of the cusum chart is scaled in the same units as the data.

Note: Data Units requires that the subgroup sample size be designated as constant.

Beta

specifies the probability of failing to discover that the specified shift occurred. Beta is the probability of a Type II error and is available only when you specify Alpha.

H

is the vertical distance h between the origin for the V-mask and the upper or lower arm of the V-mask for a two-sided scheme. When you click H, the Beta entry box is labeled K. You also enter a value for the increase in the lower V-mask per unit change on the subgroup axis (see Figure 7.26). For a one-sided scheme, H is the decision interval. Choose H as a multiple of the standard error.

Specify Stats

appends the panel shown here to the Control Charts Launch dialog, which lets you enter the process variable specifications.



Target

is the target mean (goal) for the process or population. The target mean must be scaled in the same units as the data.

Delta

specifies the absolute value of the smallest shift to be detected as a multiple of the process standard deviation or of the standard error, depending on whether the shift is viewed as a shift in the population mean or as a shift in the sampling distribution of the subgroup mean, respectively. Delta is an alternative to the Shift option (described next). The relationship between Shift and Delta is given by

$$\delta = \frac{\Delta}{\sigma/\sqrt{n}}$$

where δ represents Delta, ∆ represents the shift, σ represents the process standard deviation, and n is the (common) subgroup sample size.

Shift

is the minimum value you want to detect on either side of the target mean. You enter the shift value in the same units as the data, and you interpret it as a shift in the mean of the sampling distribution of the subgroup mean. You can choose either Shift or Delta.

Sigma

specifies a known standard deviation, σ₀, for the process standard deviation, σ. By default, the Control Chart platform estimates sigma from the data. You can use Sigma instead of the Alpha option on the Control Charts Launch dialog.

Head Start

specifies an initial value for the cumulative sum, S₀, for a one-sided cusum scheme (S₀ is usually zero). Enter Head Start as a multiple of standard error.

Cusum Chart Options

Cusum charts have these options (in addition to standard chart options).

Show Points

shows or hides the sample data points.

Connect Points

connects the sample points with a straight line.


Mask Color

displays the JMP color palette for you to select a line color for the V-mask.

Connect Color

displays the JMP color palette for you to select a color for the connect line when the Connect Points option is in effect.

Center Line Color

displays the JMP color palette for you to select a color for the center line.

Show Shift

shows or hides the shift you entered, or center line.

Show V Mask

shows or hides the V-mask based on the parameters (statistics) specified on the Control Charts Launch dialog when Cusum is selected as the Chart Type.

Show Parameters

displays a Parameters table (see Figure 7.26) that summarizes the Cusum charting parameters.

Show ARL

displays the average run length (ARL) information.

Example 1. Two-Sided Cusum Chart with V-mask

To see an example of a two-sided cusum chart, open the Oil1 Cusum.jmp file from the Quality Control sample data folder. A machine fills 8-ounce cans of two-cycle engine oil additive. The filling process is believed to be in statistical control. The process is set so that the average weight of a filled can, μ₀, is 8.10 ounces. Previous analysis shows that the standard deviation of fill weights, σ₀, is 0.05 ounces.

Subgroup samples of four cans are selected and weighed every hour for twelve hours. Each observation in the Oil1 Cusum.jmp data table contains one value of weight along with its associated value of hour. The observations are sorted so that the values of hour are in increasing order. The Control Chart platform assumes that the data are sorted in increasing order.

A two-sided cusum chart is used to detect shifts of at least one standard deviation in either direction from the target mean of 8.10 ounces.

To create a Cusum chart for this example,

• Choose the Graph > Control Chart > CUSUM command.

• Click the Two Sided check box if it is not already checked.

• Specify weight as the Process variable.

• Specify hour as the Sample Label.

• Click the H radio button and enter 2 into the text box.



• Click Specify Stats to open the Known Statistics for CUSUM chart tab.

• Set Target to the average weight of 8.1.

• Enter a Delta value of 1.

• Set Sigma to the standard deviation of 0.05.

The finished dialog should look like the one in Figure 7.22.

Figure 7.22 Dialog for Cusum Chart Example

When you click OK, the chart in Figure 7.23 appears.

Figure 7.23 Cusum Chart for Oil1 Cusum.jmp Data

You can interpret the chart by comparing the points with the V-mask whose right edge is centered at the most recent point (hour=12). Because none of the points cross the arms of the V-mask, there is no evidence that a shift in the process has occurred.


A shift or out-of-control condition is signaled at a time t if one or more of the points plotted up to the time t cross an arm of the V-mask. An upward shift is signaled by points crossing the lower arm, and a downward shift is signaled by points crossing the upper arm. The time at which the shift occurred corresponds to the time at which a distinct change is observed in the slope of the plotted points.

The cusum chart automatically updates when you add new samples. The Cusum chart in Figure 7.24 is the previous chart with additional points. You can move the origin of the V-mask by clicking a point with the hand tool. The center line and V-mask adjust to reflect the process condition at that point.

Figure 7.24 Updated Cusum Chart for the OIL Data

Example 2. One-Sided Cusum Chart with no V-mask

Consider the data used in Example 1, where the machine fills 8-ounce cans of engine oil. Consider also that the manufacturer is now concerned about significant over-filling in order to cut costs, and not so concerned about under-filling. A one-sided Cusum Chart can be used to identify data approaching or exceeding the side of interest. Anything 0.25 ounces beyond the mean of 8.1 is considered a problem. To do this example,

• Choose the Graph > Control Chart > CUSUM command.

• Deselect the Two Sided check box.

• Specify weight as the Process variable.

• Specify hour as the Sample Label.

• Click the H radio button and enter 0.25 into the text box.

• Click Specify Stats to open the Known Statistics for CUSUM chart tab.

• Set Target to the average weight of 8.1.

• Enter a Delta value of 1.

• Set Sigma to the standard deviation of 0.05.

The resulting output should look like the picture in Figure 7.25.



Figure 7.25 One-Sided Cusum Chart for the OIL Data

Notice that the decision interval or horizontal line is set at the H-value entered (0.25). Also note that no V-mask appears with One-Sided Cusum charts.
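A one-sided scheme with a decision interval is often computed as a tabular cusum: each standardized deviation, minus a reference value k, is added to a running sum that is reset to zero instead of going negative, and a signal occurs when the sum exceeds h. The Python sketch below tracks upward shifts (the over-filling case in this example) and uses the Head Start value as the starting sum S₀. It is a textbook formulation under stated assumptions, not JMP's code: h and k are taken in standard-error units, the function name and parameter values are hypothetical, and this sketch does not claim to reproduce how JMP scales the H value entered on the dialog.

```python
import math


def one_sided_cusum(means, target, sigma, n, h, k, head_start=0.0):
    """Upper one-sided cusum with decision interval h (standard-error units).

    Returns the cusum path and the indices where it exceeds h.
    """
    se = sigma / math.sqrt(n)
    s = head_start  # S0, the head start
    path, signals = [], []
    for i, m in enumerate(means):
        # Add the standardized excess over target, less the reference value k;
        # the sum is floored at zero so past under-filling cannot mask over-filling.
        s = max(0.0, s + (m - target) / se - k)
        path.append(s)
        if s > h:
            signals.append(i)
    return path, signals
```

For instance, one_sided_cusum([8.1, 8.1, 8.2], 8.1, 0.05, 4, h=2, k=0.5) stays at zero for the on-target subgroups and signals at the third subgroup, where the mean is two standard deviations of the process (four standard errors) above target.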

The Show Parameters option in the Cusum chart popup menu shows the Parameters report in Figure 7.26. The parameters report summarizes the charting parameters from the Known Statistics for CUSUM chart tab on the Control Chart Launch dialog. An additional chart option, Show ARL, adds the average run length (ARL) information to the report. The average run length is the expected number of samples taken before an out-of-control condition is signaled:

• ARL(Delta), sometimes denoted ARL1, is the average run length for detecting a shift the size of the specified Delta.

• ARL(0), sometimes denoted ARL0, is the in-control average run length for the specified parameters (Montgomery (1985)).

Figure 7.26 Show Parameters and Show ARL Options
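To get a feel for how h and k trade off ARL(0) against ARL(Delta), the Python sketch below uses Siegmund's approximation, a widely used textbook formula, for a one-sided cusum: with Δ = shift − k and b = h + 1.166, ARL ≈ (e^(−2Δb) + 2Δb − 1)/(2Δ²), and a two-sided scheme combines the two sides as 1/ARL = 1/ARL⁺ + 1/ARL⁻. This is an approximation chosen for illustration; JMP's Show ARL report may use a different method, so its numbers need not match. All quantities are in standard-error units, and the function names are hypothetical.

```python
import math


def siegmund_arl(shift, h, k):
    """Approximate ARL of a one-sided upper cusum (Siegmund's approximation).

    shift, h, and k are in standard-error units of the plotted statistic.
    """
    delta = shift - k
    b = h + 1.166
    if abs(delta) < 1e-12:
        return b * b  # limiting value of the formula as delta approaches 0
    return (math.exp(-2 * delta * b) + 2 * delta * b - 1) / (2 * delta * delta)


def two_sided_arl(shift, h, k):
    """Combine the upper and lower one-sided ARLs: 1/ARL = 1/ARL+ + 1/ARL-."""
    upper = siegmund_arl(shift, h, k)
    lower = siegmund_arl(-shift, h, k)  # lower cusum sees the shift with opposite sign
    return 1 / (1 / upper + 1 / lower)


arl0 = two_sided_arl(0.0, h=4, k=0.5)  # in-control ARL, roughly 169
arl1 = two_sided_arl(1.0, h=4, k=0.5)  # ARL for a one-standard-error shift, roughly 8.3
```

Raising h lengthens ARL(0) (fewer false alarms) but also lengthens ARL(Delta) (slower detection); the parameters you choose on the launch dialog set that balance.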

8 Time Series

The Time Series platform lets you explore, analyze, and forecast univariate time series. A time series is a set y1, y2, ..., yN of observations taken over a series of equally spaced time periods. The analysis begins with a plot of the points in the time series. In addition, the platform displays graphs of the autocorrelations and partial autocorrelations of the series. These indicate how and to what degree each point in the series is correlated with earlier values in the series and can be used to identify the type of model appropriate for describing and predicting (forecasting) the evolution of the time series. The model types include

• ARIMA, autoregressive integrated moving-average, often called Box-Jenkins models

• Smoothing Models, several forms of exponential smoothing and Winters' method.

Note: The Time Series Launch dialog requires that one or more continuous variables be assigned as the time series. Optionally, you can specify a time ID variable, which is used to label the time axis. If a time ID variable is specified, it must be continuous, sorted ascending, evenly spaced, and without missing values.

Introduction

The data for the next examples are in the Seriesg.jmp table found in the Time Series sample data folder (Box and Jenkins 1976). The time series variable is Passengers and the time ID is Time.

Select Analyze > Time Series to display the Time Series Launch dialog (Figure 8.1).

This dialog allows you to specify the number of lags to use in computing the autocorrelations and partial autocorrelations. It also lets you specify the number of future periods to forecast using each model fitted to the data.

For this example, assign Passengers as Y, Time Series and Time as X, Time ID.

Figure 8.1 Launch Dialog


The first thing you see is a graph showing the time series, its autocorrelation graph, and its partial autocorrelation graph.

Figure 8.2 Initial Time Series Report

The graph shows that the spread of the series increases over time, which should be accounted for before modeling the series. When variability grows with the level of a series, a logarithmic transformation is the usual remedy. A column containing the logarithm of Passengers, Log Passengers, is already included in the table.

Again select Analyze > Time Series.

Assign Log Passengers as Y, Time Series and Time as X, Time ID.

The series now has an acceptable appearance for modeling.


Figure 8.3 Log Passenger Series

Since the autocorrelation graph decreases slowly and steadily, but the partial autocorrelation graph drops off drastically after lag 1, a reasonable guess for a model is an MA(1). To try this model,

Select ARIMA from the platform menu.

Enter a 1 beside q, Moving Average Order.

Click Estimate.

JMP Student Edition estimates the model and displays a model summary, parameter estimates, and a forecast graph. The most important output, however, is the Residuals report, which is initially closed.


Figure 8.4 Model Results

Open the Residuals node to reveal a graph and autocorrelation plots for the model residuals.

Figure 8.5 MA(1) Model Results


The expected reduction in spikes did not occur, so an MA(1) is not an appropriate model. A second model, an MA(2), can be run in the same way.

Select ARIMA from the platform menu.

Enter a 2 beside q, Moving Average Order.

Click Estimate.

Similar unsatisfactory results appear. However, note that JMP is accumulating a list of models, along with appropriate fit statistics, in the Model Comparison table.

Figure 8.6 Model Comparison Table

Examine the R² for the two models in this table. In fact, the MA(2) is a worse fit than the MA(1). Some reflection is necessary.

The Time Series Platform

First, assign columns for analysis with the dialog in Figure 8.1. The selector list at the left of the dialog shows all columns in the current table. To cast a column into a role, select one or more columns in the column selector list and click a role button. Or, drag variables from the column selector list to one of the following role boxes:

X, Time ID for the x-axis, one variable used for labeling the time axis

Y, Time Series for the y-axis, one or more time series variables.

To remove an unwanted variable from an assigned role, select it in the role box and click Remove. After assigning roles, click OK to see the analysis for each time series variable versus the time ID.

You set the number of lags for the autocorrelation and partial autocorrelation plots in the Autocorrelation Lags box. This is the maximum number of periods between points used in the computation of the correlations. It must be more than one but less than the number of rows. A commonly used rule of thumb for the maximum number of lags is n/4, where n is the number of observations. The Forecast Periods box allows you to set the number of periods into the future for which the fitted models produce forecasts. By default, JMP uses 25 lags and 25 forecast periods.

The Time Series Graph

The Time Series platform begins with a plot of each time series by the time ID, or row number if no time ID is specified (Figure 8.7). The plot, like others in JMP, has features to resize the graph, highlight points with the cursor or brush tool, and label points.


Figure 8.7 Time Series Plot of Seriesg (Airline Passenger) Data

By default, graphs of the autocorrelation and partial autocorrelation (Figure 8.5) of the time series are also shown, but can be hidden with commands from the platform popup menu on the Time Series title bar.

The platform popup menu, discussed next, also has fitting commands and options for displaying additional graphs and statistical tables.

Time Series Commands

The popup menu next to the time series name has the commands shown here.

The first three items in this menu control the descriptive and diagnostic graphs and tables. These are typically used to determine the nature of the model to be fitted to the series.

The ARIMA and Smoothing Model commands are for fitting various models to the data and producing forecasts. You can select the model fitting commands repeatedly. The result of each new fit is appended to the report. After the first model has been fit, a summary of all the models is inserted just above the first model report (an example is shown in “Model Comparison Table,” p. 151).

The following sections describe options and model fits, discuss statistical results, and cover additional platform features.


Graph

The Time Series platform begins by showing a time series plot, like the one shown previously in Figure 8.7. The Graph command on the platform popup menu has a submenu of controls for the time series plot with the following commands.

• Time Series Graph hides or displays the time series graph.

• Show Points hides or displays the points in the time series graph.

• Connecting Lines hides or displays the lines connecting the points in the time series graph.

• Mean Line hides or displays a horizontal line in the time series graph that depicts the mean of the time series.

Autocorrelation

The Autocorrelation command alternately hides or displays the autocorrelation graph of the sample, often called the sample autocorrelation function. This graph describes the correlation between all the pairs of points in the time series with a given separation in time or lag. By definition, the first autocorrelation (lag 0) always equals 1.

In addition, confidence curves show twice the large-lag standard error (± 2 standard errors). The autocorrelation plot for the Seriesg data is shown on the left in Figure 8.8. You can examine the autocorrelation and partial autocorrelation plots to determine whether the time series is stationary (meaning it has a fixed mean and standard deviation over time) and what model might be appropriate to fit the time series.
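As a rough illustration of the quantities behind this plot, the following Python sketch computes sample autocorrelations and the large-lag standard errors used for the ± 2 standard error curves. It assumes the common textbook estimator (autocovariances divided by n) and Bartlett's approximation for the large-lag standard error; JMP's exact computations may differ in detail, and the function names are hypothetical.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (r_0 == 1 by definition)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n              # lag-0 autocovariance
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf

def bartlett_se(acf, n):
    """Large-lag (Bartlett) standard error for each lag k >= 1:
    sqrt((1 + 2 * (r_1^2 + ... + r_{k-1}^2)) / n). Index 0 is unused."""
    se = [None]
    for k in range(1, len(acf)):
        se.append(math.sqrt((1 + 2 * sum(r * r for r in acf[1:k])) / n))
    return se

x = [1.0, 2.0, 3.0, 4.0, 5.0]                    # tiny illustrative series
acf = sample_acf(x, 2)
se = bartlett_se(acf, len(x))
# Lags whose bar falls outside the +/- 2 standard error curves:
flagged = [k for k in range(1, len(acf)) if abs(acf[k]) > 2 * se[k]]
```

Because each standard error includes the squared autocorrelations at all shorter lags, the curves widen as the lag increases, which is why they appear as curves rather than horizontal lines.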

Partial Autocorrelation

The Partial Autocorrelation command alternately hides or displays the graph of the sample partial autocorrelations. The plot on the right in Figure 8.8 shows the partial autocorrelation function for the Seriesg data. The solid black lines represent ± 2 standard errors for approximate 95% confidence limits.
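A partial autocorrelation at lag k is the last coefficient of the best linear predictor of order k. One common way to get all of them from the autocorrelations is the Durbin-Levinson recursion, sketched below in Python. The recursion is a standard technique, not a description of JMP's internals, and the 2/√n limit assumes the usual large-sample standard error 1/√n.

```python
import math

def pacf_from_acf(acf):
    """Partial autocorrelations for lags 1..K via the Durbin-Levinson
    recursion, given autocorrelations r_0..r_K with r_0 == 1."""
    pacf = []
    phi_prev = []        # coefficients phi_{k-1,1..k-1} of the previous order
    v = 1.0              # prediction-error variance, in units of r_0
    for k in range(1, len(acf)):
        num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
        phi_kk = num / v
        phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi.append(phi_kk)
        v *= 1 - phi_kk ** 2
        phi_prev = phi
        pacf.append(phi_kk)
    return pacf

def pacf_limit(n):
    """Approximate 95% limit 2 / sqrt(n) for the partial autocorrelations."""
    return 2 / math.sqrt(n)

# Theoretical ACF of an AR(1) process with coefficient 0.6: r_k = 0.6^k.
theo = [0.6 ** k for k in range(6)]
p = pacf_from_acf(theo)
```

For an AR(1) process the partial autocorrelation equals the coefficient at lag 1 and is zero afterward, which is the sharp cutoff pattern the plot is used to detect.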


Figure 8.8 Autocorrelation and Partial Correlation Plots

Number of Forecast Periods

The Number of Forecast Periods command displays a dialog for you to reset the number of periods into the future that the fitted models will forecast. The initial value is set in the Time Series Launch dialog. All existing and future forecast results will show the new number of periods with this command.

Modeling Reports

The time series modeling commands are used to fit theoretical models to the series and use the fitted model to predict (forecast) future values of the series. These commands also produce statistics and residuals that allow you to ascertain the adequacy of the model you have elected to use. You can select the modeling commands repeatedly. Each time you select a model, a report of the results of the fit and a forecast is added to the platform results.

The fit of each model begins with a dialog that lets you specify the details of the model being fit as well as how it will be fit. Each general class of models has its own dialog, as discussed in the respective sections below. The models are fit by maximizing the likelihood function, using a Kalman filter to compute the likelihood function. The ARIMA, seasonal ARIMA, and smoothing models begin with the following report tables.
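To show what "using a Kalman filter to compute the likelihood" means, here is a minimal Python sketch for one case only: an MA(1) model, written with this guide's sign convention y_t − µ = a_t − θa_{t−1}, with σ² treated as known. It is an illustration of the general technique, not JMP's implementation. The filter predicts each observation from the past, and the log-likelihood is built from those one-step-ahead prediction errors and their variances.

```python
import math

def ma1_loglik_kalman(y, theta, sigma2, mu=0.0):
    """Exact Gaussian log-likelihood of y_t - mu = a_t - theta * a_{t-1},
    computed with a Kalman filter on the state s_t = (y_t - mu, -theta * a_t).
    Transition T = [[0, 1], [0, 0]], shock loading R = (1, -theta); the
    observation is the first state component, with no measurement noise."""
    s = [0.0, 0.0]                                   # stationary state mean
    P = [[sigma2 * (1 + theta ** 2), -theta * sigma2],
         [-theta * sigma2, theta ** 2 * sigma2]]     # stationary covariance
    loglik = 0.0
    for obs in y:
        v = (obs - mu) - s[0]                        # one-step-ahead error
        F = P[0][0]                                  # its variance
        loglik += -0.5 * (math.log(2 * math.pi) + math.log(F) + v * v / F)
        # Update: condition the state on the new observation.
        gain = [P[0][0] / F, P[1][0] / F]
        s = [s[0] + gain[0] * v, s[1] + gain[1] * v]
        P = [[P[i][j] - gain[i] * P[0][j] for j in range(2)] for i in range(2)]
        # Predict: s_{t+1} = T s_t + R a_{t+1}.
        s = [s[1], 0.0]
        P = [[P[1][1] + sigma2, -theta * sigma2],
             [-theta * sigma2, theta ** 2 * sigma2]]
    return loglik

print(ma1_loglik_kalman([0.3, -1.2, 0.5], theta=0.4, sigma2=1.5))
```

A fitting routine would maximize this function numerically over θ (and σ² and µ); the iteration history described later in this section records the progress of that kind of search.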


Model Comparison Table

The Model Comparison table summarizes the fit statistics for each model. You can use it to compare several models fitted to the same time series. Each row corresponds to a different model. The numerical values in the table are drawn from the Model Summary table for each fitted model. The Model Comparison table shown above summarizes the ARIMA models (1, 0, 0), (0, 0, 1), and (1, 0, 1) respectively.

Model Summary Table

Each model fit generates a Model Summary table, which summarizes the statistics of the fit. In the formulae below, n is the number of nonmissing observations and k is the number of fitted parameters in the model.

• DF is the number of degrees of freedom in the fit, n – k.

• Sum of Squared Errors is the sum of the squares of the prediction errors, SSE.

• Variance Estimate is the unconditional sum of squares (SSE) divided by the number of degrees of freedom, SSE / (n – k). This is the sample estimate of the variance of the random shocks $a_t$, described in the section “ARIMA Model,” p. 154.

• Standard Deviation is the square root of the variance estimate. This is a sample estimate of the standard deviation of $a_t$, the random shocks.

• Akaike’s Information Criterion [AIC], Schwarz’s Bayesian Criterion [SBC or BIC] are goodness of fit statistics, detailed in the online help. Smaller values of these criteria indicate better fit.

• RSquare and RSquare Adj are also goodness of fit statistics, where values closer to 1 indicate a better fit.

• –2LogLikelihood is minus two times the natural log of the likelihood function evaluated at the best-fit parameter estimates. Smaller values are better fits.

• Stable indicates whether the autoregressive operator is stable, that is, whether all the roots of $\varphi(z) = 0$ lie outside the unit circle.

• Invertible indicates whether the moving average operator is invertible, that is, whether all the roots of $\theta(z) = 0$ lie outside the unit circle.

Note: The φ and θ operators are defined in the section “ARIMA Model,” p. 154.
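The Model Summary quantities above can be sketched in a few lines of Python. Two assumptions are labeled in the code: AIC and SBC use the common textbook forms −2 log L + 2k and −2 log L + k ln n (JMP's exact definitions are in its online help), and the Stable/Invertible checks use the step-down (reverse Levinson-Durbin) recursion, which is equivalent to the root condition without solving for the roots. The function names are hypothetical.

```python
import math

def model_summary(sse, n, k, neg2loglik):
    """Model Summary-style statistics for k fitted parameters and n
    nonmissing observations. AIC/SBC formulas are textbook assumptions."""
    df = n - k
    variance = sse / df
    return {
        "DF": df,
        "Variance Estimate": variance,
        "Standard Deviation": math.sqrt(variance),
        "AIC": neg2loglik + 2 * k,
        "SBC": neg2loglik + k * math.log(n),
    }

def roots_outside_unit_circle(coefs):
    """True if every root of 1 - c1*z - ... - cp*z^p lies outside the unit
    circle. The step-down recursion peels off one reflection coefficient
    per order; the condition holds exactly when all have |kappa| < 1."""
    a = [float(c) for c in coefs]
    while a:
        kappa = a[-1]                       # reflection coefficient of this order
        if abs(kappa) >= 1:
            return False
        m = len(a)
        a = [(a[i] + kappa * a[m - 2 - i]) / (1 - kappa ** 2) for i in range(m - 1)]
    return True

def is_stable(phi):        # phi_1..phi_p of the autoregressive operator
    return roots_outside_unit_circle(phi)

def is_invertible(theta):  # theta_1..theta_q of the moving average operator
    return roots_outside_unit_circle(theta)

print(is_stable([0.5, 0.3]), is_stable([0.5, 0.6]))
```

For φ = (0.5, 0.3) the roots of 1 − 0.5z − 0.3z² are about 1.17 and −2.84, both outside the unit circle; for φ = (0.5, 0.6) one root is about 0.94, so that model is not stable.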


Parameter Estimates Table

There is a Parameter Estimates table for each selected fit, which gives the estimates for the time series model parameters. Each type of model has its own set of parameters. They are described in the sections on specific time series models. The Parameter Estimates table has these terms:

• Term lists the name of the parameter. These are described below for each model type. Some models contain an intercept or mean term. In those models, the related constant estimate is also shown. The definition of the constant estimate is given under the description of ARIMA models.

• Factor (Seasonal ARIMA only) lists the factor of the model that contains the parameter. This is only shown for multiplicative models. In the multiplicative seasonal models, Factor 1 is nonseasonal and Factor 2 is seasonal.

• Lag lists the degree of the lag or backshift operator that is applied to the term to which the parameter is multiplied.

• Estimate lists the parameter estimates of the time series model.

• Std Error lists the estimates of the standard errors of the parameter estimates. They are used in constructing tests and confidence intervals.

• t Ratio lists the test statistics for the hypotheses that each parameter is zero. It is the ratio of the parameter estimate to its standard error. If the hypothesis is true, then this statistic has an approximate Student’s t-distribution. Looking for a t-ratio greater than 2 in absolute value is a common rule of thumb for judging significance because it approximates the 0.05 significance level.

• Prob>|t| lists the observed significance probability calculated from each t-ratio. It is the probability of getting, by chance alone, a t-ratio greater (in absolute value) than the computed value, given a true hypothesis. Often, a value below 0.05 (or sometimes 0.01) is interpreted as evidence that the parameter is significantly different from zero.

The Parameter Estimates table also gives the Constant Estimate, for models that contain an intercept or mean term. The definition of the constant estimate is given under “ARIMA Model,” p. 154.
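A minimal Python sketch of the t Ratio and Prob>|t| columns follows. To stay within the standard library it approximates Student's t with the standard normal distribution, which is close only when the degrees of freedom are large; that substitution is an assumption of this sketch and not how the table itself is computed.

```python
import math

def t_ratio_and_p(estimate, std_error):
    """t Ratio and an approximate two-sided Prob>|t|.
    The p-value uses the standard normal as a large-sample stand-in
    for Student's t distribution."""
    t = estimate / std_error
    p = math.erfc(abs(t) / math.sqrt(2))    # P(|Z| > |t|) for Z ~ N(0, 1)
    return t, p

t, p = t_ratio_and_p(0.8, 0.4)
```

A t-ratio of exactly 2 gives a two-sided normal probability of about 0.046, which is why "|t| greater than 2" works as a rule of thumb for the 0.05 level.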


Forecast Plot

Each model has its own Forecast plot. The Forecast plot shows the values that the model predicts for the time series. It is divided by a vertical line into two regions. To the left of the separating line the one-step-ahead forecasts are shown overlaid with the input data points. To the right of the line are the future values forecast by the model and the confidence intervals for the forecasts.

You can control the number of forecast values by changing the setting of the Forecast Periods box in the platform launch dialog or by selecting Number of Forecast Periods from the Time Series drop-down menu. The data and confidence intervals can be toggled on and off using the Show Points and Show Confidence Interval commands on the model’s popup menu.
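Why the forecast band widens: under normally distributed shocks, a standard result is that the h-step-ahead forecast error has variance σ²(1 + ψ₁² + … + ψ²_{h−1}), where the ψ_j are the moving average weights of the fitted model (for simple exponential smoothing, ψ_j = α). The Python sketch below assumes that formula and a known σ; it illustrates the idea rather than reproducing JMP's computations, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def forecast_half_widths(psi, sigma, horizon, level=0.95):
    """Half-widths of forecast intervals for lead times 1..horizon.
    psi holds the MA-form weights psi_1, psi_2, ... (at least horizon - 1)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)    # e.g. about 1.96 for 95%
    widths = []
    total = 1.0                                  # running 1 + psi_1^2 + ...
    for h in range(1, horizon + 1):
        if h > 1:
            total += psi[h - 2] ** 2
        widths.append(z * sigma * math.sqrt(total))
    return widths

# Simple exponential smoothing with alpha = 0.5: psi_j = 0.5 for every j.
w = forecast_half_widths([0.5] * 10, sigma=1.0, horizon=3)
```

The one-step interval is just ± z·σ; each additional step adds another squared ψ weight, so the band fans out to the right of the separating line.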

Residuals

The graphs under the residuals section of the output show the values of the residuals based on the fitted model. These are the actual values minus the one-step-ahead predicted values. In addition, the autocor-relation and partial autocorrelation of these residuals are shown. These can be used to determine whether the fitted model is adequate to describe the data. If it is, the points in the residual plot should be normally distributed about the zero line and the autocorrelation and partial autocorrelation of the residuals should not have any significant components for lags greater than zero.

Iteration History

The model parameter estimation is an iterative procedure by which the log-likelihood is maximized by adjusting the estimates of the parameters. The iteration history for each model you request shows the value of the likelihood function for each iteration. This can be useful for diagnosing problems with the fitting procedure. Attempting to fit a model which is poorly suited to the data can result in a large number of iterations that fail to converge on an optimum value for the likelihood.


Model Report Options

The title bar for each model you request has the popup menu shown to the right, with the following options for that model:

Show Points hides or shows the data points in the forecast graph.

Show Confidence Interval hides or shows the confidence intervals in the forecast graph.

Save Columns creates a new data table with columns representing the results of the model.

Residual Statistics controls which displays of residual statistics are shown for the model. These displays are described in the section “Time Series Commands,” p. 148; however, they are applied to the residual series (the input series minus the one-step-ahead model predictions).

ARIMA Model

An AutoRegressive Integrated Moving Average (ARIMA) model predicts future values of a time series by a linear combination of its past values and a series of errors (also known as random shocks or innovations). The ARIMA command performs a maximum likelihood fit of the specified ARIMA model to the time series.

For a response series $\{y_t\}$, the general form for the ARIMA model is:

$$\varphi(B)(w_t - \mu) = \theta(B)a_t$$

where

t is the time index

B is the backshift operator defined as $By_t = y_{t-1}$

$w_t = (1 - B)^d y_t$ is the response series after differencing

µ is the intercept or mean term

$\varphi(B)$ and $\theta(B)$ are, respectively, the autoregressive operator and the moving average operator, written

$$\varphi(B) = 1 - \varphi_1 B - \varphi_2 B^2 - \dots - \varphi_p B^p \quad\text{and}\quad \theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q$$

$a_t$ are the sequence of random shocks.

The $a_t$ are assumed to be independent and normally distributed with mean zero and constant variance. The model can be rewritten as

$$\varphi(B)w_t = \delta + \theta(B)a_t$$

where the constant estimate δ is given by the relation

$$\delta = \varphi(B)\mu = \mu - \varphi_1\mu - \varphi_2\mu - \dots - \varphi_p\mu.$$

The ARIMA command displays the Specify ARIMA Model dialog, which allows you to specify the ARIMA model you want to fit. The results appear when you click Estimate.

Use the Specify ARIMA Model dialog to set the following three orders of an ARIMA model:

1 The Autoregressive Order is the order (p) of the autoregressive polynomial operator $\varphi(B)$.

2 The Differencing Order is the order (d) of the differencing operator.

3 The Moving Average Order is the order (q) of the moving average operator $\theta(B)$.

An ARIMA model is commonly denoted ARIMA(p, d, q). If any of p, d, or q are zero, the corresponding letters are often dropped. For example, if p and d are zero, the model is denoted MA(q).

The Confidence Intervals box allows you to set the confidence level between 0 and 1 for the forecast confidence bands. The Intercept check box determines whether the intercept term µ will be part of the model. If the Constrain fit check box is checked, the fitting procedure will constrain the autoregressive parameters to always remain within the stable region and the moving average parameters within the invertible region. You might want to uncheck this box if the fitter is having difficulty finding the true optimum or if you want to speed up the fit. You can check the Model Summary table to see if the resulting fitted model is stable and invertible.

Smoothing Models

JMP offers a variety of smoothing techniques.

Smoothing models represent the evolution of a time series by the model:

$$y_t = \mu_t + \beta_t t + s(t) + a_t$$

where

$\mu_t$ is the time-varying mean term,

$\beta_t$ is the time-varying slope term,

$s(t)$ is one of the s time-varying seasonal terms,

$a_t$ are the random shocks.

Models without a trend have $\beta_t = 0$ and nonseasonal models have $s(t) = 0$. The estimators for these time-varying terms are

$L_t$, a smoothed level that estimates $\mu_t$

$T_t$, a smoothed trend that estimates $\beta_t$

$S_{t-j}$ for $j = 0, 1, \dots, s-1$, the estimates of the $s(t)$.

Each smoothing model defines a set of recursive smoothing equations that describes the evolution of these estimators. The smoothing equations are written in terms of model parameters called smoothing weights. They are

α, the level smoothing weight

γ, the trend smoothing weight

ϕ, the trend damping weight

δ, the seasonal smoothing weight.

While these parameters enter each model in a different way (or not at all), they have the common property that larger weights give more influence to recent data while smaller weights give less influence to recent data.

Each smoothing model has an ARIMA model equivalent. These ARIMA equivalents are used to estimate the smoothing weights and provide forecasts. You may not be able to specify the equivalent ARIMA model using the ARIMA command because some smoothing models intrinsically constrain the ARIMA model parameters in ways the ARIMA command will not allow.

Smoothing Model Dialog

The Smoothing Model dialog appears in the report window when you select one of the smoothing model commands.

The Confidence Intervals box allows you to set the confidence level for the forecast confidence bands. The dialogs for seasonal smoothing models include a Periods Per Season box for setting the number of periods in a season. The dialog also lets you specify what type of constraint you want to enforce on the smoothing weights during the fit. The constraints are:

Zero To One keeps the values of the smoothing weights in the range zero to one.

Unconstrained allows the parameters to range freely.

Stable Invertible constrains the parameters such that the equivalent ARIMA model is stable and invertible.

Custom expands the dialog to allow you to set constraints on individual smoothing weights.

Each smoothing weight can be Bounded, Fixed, or Unconstrained as determined by the setting of the popup menu next to the weight’s name.

The example shown here has the Level weight (α) fixed at a value of 0.3 and the Trend weight (γ) bounded by 0 and 1. In this case, the value of the Trend weight is allowed to move within the range 0 to 1 while the Level weight is held at 0.3. Note that you can specify all the smoothing weights in advance by using these custom constraints. In that case, none of the weights would be estimated from the data although forecasts and residuals would still be computed. When you click Estimate, the results of the fit appear in place of the dialog.
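When every weight is fixed, nothing is estimated, but forecasts and residuals still follow directly from the smoothing recursion. The Python sketch below does this for the simple exponential smoothing equation L_t = αy_t + (1 − α)L_{t−1} given in the Simple Exponential Smoothing section. Starting the level at the first observation is an assumption of this sketch; this guide does not describe how JMP initializes the recursion.

```python
def simple_exponential_smoothing(y, alpha, initial_level=None):
    """One-step-ahead forecasts and residuals for simple exponential
    smoothing with a fixed weight alpha. The starting level defaults to
    the first observation (an assumption, not JMP's documented rule)."""
    level = y[0] if initial_level is None else initial_level
    forecasts, residuals = [], []
    for obs in y:
        forecasts.append(level)              # forecast made before seeing obs
        residuals.append(obs - level)        # actual minus one-step prediction
        level = alpha * obs + (1 - alpha) * level
    return forecasts, residuals, level       # final level = next-period forecast

# Level weight fixed at 0.3, as in the custom-constraint example above.
f, r, next_forecast = simple_exponential_smoothing([112, 118, 132, 129, 121], 0.3)
```

With α = 0.3 each new level moves 30% of the way toward the latest observation, so a larger α reacts faster to recent data, matching the description of the smoothing weights.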

Simple Exponential Smoothing

The model for simple exponential smoothing is $y_t = \mu_t + a_t$.

The smoothing equation, $L_t = \alpha y_t + (1 - \alpha)L_{t-1}$, is defined in terms of a single smoothing weight α. This model is equivalent to an ARIMA(0, 1, 1) model where

$$(1 - B)y_t = (1 - \theta B)a_t \quad\text{with}\quad \theta = 1 - \alpha.$$

The moving average form of the model is

$$y_t = a_t + \sum_{j=1}^{\infty} \alpha\, a_{t-j}$$

Double (Brown) Exponential Smoothing

The model for double exponential smoothing is $y_t = \mu_t + \beta_1 t + a_t$.

The smoothing equations, defined in terms of a single smoothing weight α, are

$$L_t = \alpha y_t + (1 - \alpha)L_{t-1} \quad\text{and}\quad T_t = \alpha(L_t - L_{t-1}) + (1 - \alpha)T_{t-1}.$$

This model is equivalent to an ARIMA(0, 1, 1)(0, 1, 1)$_1$ model

$$(1 - B)^2 y_t = (1 - \theta B)^2 a_t$$

where $\theta_{1,1} = \theta_{2,1} = \theta$ with $\theta = 1 - \alpha$.

The moving average form of the model is

$$y_t = a_t + \sum_{j=1}^{\infty} \left(2\alpha + (j - 1)\alpha^2\right) a_{t-j}$$

Linear (Holt) Exponential Smoothing

The model for linear exponential smoothing is $y_t = \mu_t + \beta_t t + a_t$.

The smoothing equations defined in terms of smoothing weights α and γ are

$$L_t = \alpha y_t + (1 - \alpha)(L_{t-1} + T_{t-1}) \quad\text{and}\quad T_t = \gamma(L_t - L_{t-1}) + (1 - \gamma)T_{t-1}$$

This model is equivalent to an ARIMA(0, 2, 2) model where

$$(1 - B)^2 y_t = (1 - \theta_1 B - \theta_2 B^2)a_t$$

with $\theta_1 = 2 - \alpha - \alpha\gamma$ and $\theta_2 = \alpha - 1$.

The moving average form of the model is

$$y_t = a_t + \sum_{j=1}^{\infty} (\alpha + j\alpha\gamma)\, a_{t-j}$$

Damped-Trend Linear Exponential Smoothing

The model for damped-trend linear exponential smoothing is $y_t = \mu_t + \beta_t t + a_t$.

The smoothing equations in terms of smoothing weights α, γ, and ϕ are

$$L_t = \alpha y_t + (1 - \alpha)(L_{t-1} + \phi T_{t-1}) \quad\text{and}\quad T_t = \gamma(L_t - L_{t-1}) + (1 - \gamma)\phi T_{t-1}$$

This model is equivalent to an ARIMA(1, 1, 2) model where

$$(1 - \phi B)(1 - B)y_t = (1 - \theta_1 B - \theta_2 B^2)a_t$$

with $\theta_1 = 1 + \phi - \alpha - \alpha\gamma\phi$ and $\theta_2 = (\alpha - 1)\phi$.

The moving average form of the model is

$$y_t = a_t + \sum_{j=1}^{\infty} \left(\alpha + \frac{\alpha\gamma\phi(\phi^j - 1)}{\phi - 1}\right) a_{t-j}$$

Seasonal Exponential Smoothing

The model for seasonal exponential smoothing is $y_t = \mu_t + s(t) + a_t$.

The smoothing equations in terms of smoothing weights α and δ are

$$L_t = \alpha(y_t - S_{t-s}) + (1 - \alpha)L_{t-1} \quad\text{and}\quad S_t = \delta(y_t - L_t) + (1 - \delta)S_{t-s}$$

This model is equivalent to a seasonal ARIMA(0, 1, s+1)(0, 1, 0)$_s$ model

$$(1 - B)(1 - B^s)y_t = (1 - \theta_1 B - \theta_2 B^s - \theta_3 B^{s+1})a_t$$

with $\theta_1 = 1 - \alpha$, $\theta_2 = 1 - \delta(1 - \alpha)$, and $\theta_3 = (1 - \alpha)(\delta - 1)$.

The moving average form of the model is

$$y_t = a_t + \sum_{j=1}^{\infty} \psi_j a_{t-j}$$

where

$$\psi_j = \begin{cases} \alpha & \text{for } j \bmod s \ne 0 \\ \alpha + \delta(1 - \alpha) & \text{for } j \bmod s = 0 \end{cases}$$

Winters Method (Additive)

The model for the additive version of Winters' method is $y_t = \mu_t + \beta_t t + s(t) + a_t$.

The smoothing equations in terms of weights α, γ, and δ are

$$L_t = \alpha(y_t - S_{t-s}) + (1 - \alpha)(L_{t-1} + T_{t-1}), \qquad T_t = \gamma(L_t - L_{t-1}) + (1 - \gamma)T_{t-1},$$

and

$$S_t = \delta(y_t - L_t) + (1 - \delta)S_{t-s}.$$

This model is equivalent to a seasonal ARIMA(0, 1, s+1)(0, 1, 0)$_s$ model

$$(1 - B)(1 - B^s)y_t = \left(1 - \sum_{i=1}^{s+1} \theta_i B^i\right)a_t$$

The moving average form of the model is

$$y_t = a_t + \sum_{j=1}^{\infty} \psi_j a_{t-j}$$

where

$$\psi_j = \begin{cases} \alpha + j\alpha\gamma & \text{for } j \bmod s \ne 0 \\ \alpha + j\alpha\gamma + \delta(1 - \alpha) & \text{for } j \bmod s = 0 \end{cases}$$
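Each ARIMA equivalence in this section can be checked numerically: expanding the ARIMA form into its ψ weights should reproduce the stated moving average form. The Python sketch below uses the standard recursion ψ_j = −θ_j + Σ φ*_i ψ_{j−i}, where φ* is the autoregressive operator with any differencing factors multiplied in. The weight values are arbitrary illustrations. For seasonal exponential smoothing it takes the MA coefficients at lags 1, s, and s + 1 to be 1 − α, 1 − δ(1 − α), and (1 − α)(δ − 1), the values that reproduce the stated ψ weights; Winters' method is omitted because its individual θ_i are not listed in this guide.

```python
def psi_weights(ar, ma, n):
    """psi_1..psi_n for (1 - ar_1 B - ... - ar_p B^p) y_t = (1 - ma_1 B - ... - ma_q B^q) a_t.
    Differencing factors must already be multiplied into `ar`."""
    psi = [1.0]                                     # psi_0 = 1
    for j in range(1, n + 1):
        value = -ma[j - 1] if j <= len(ma) else 0.0
        for i in range(1, min(j, len(ar)) + 1):
            value += ar[i - 1] * psi[j - i]
        psi.append(value)
    return psi[1:]

def close(xs, ys, tol=1e-9):
    return all(abs(x - y) < tol for x, y in zip(xs, ys))

# Illustrative weights (arbitrary values, not from any fitted model).
alpha, gamma, phi, delta, s, n = 0.3, 0.2, 0.9, 0.4, 4, 12
js = range(1, n + 1)

# Simple: (1 - B) y = (1 - (1 - alpha) B) a
ses = psi_weights([1.0], [1 - alpha], n)
assert close(ses, [alpha for j in js])

# Double (Brown): (1 - B)^2 y = (1 - theta B)^2 a with theta = 1 - alpha
th = 1 - alpha
brown = psi_weights([2.0, -1.0], [2 * th, -th ** 2], n)
assert close(brown, [2 * alpha + (j - 1) * alpha ** 2 for j in js])

# Linear (Holt): (1 - B)^2 y = (1 - th1 B - th2 B^2) a
holt = psi_weights([2.0, -1.0], [2 - alpha - alpha * gamma, alpha - 1], n)
assert close(holt, [alpha + j * alpha * gamma for j in js])

# Damped trend: (1 - phi B)(1 - B) = 1 - (1 + phi) B + phi B^2
damped = psi_weights([1 + phi, -phi],
                     [1 + phi - alpha - alpha * gamma * phi, (alpha - 1) * phi], n)
assert close(damped, [alpha + alpha * gamma * phi * (phi ** j - 1) / (phi - 1) for j in js])

# Seasonal: (1 - B)(1 - B^s) = 1 - B - B^s + B^(s+1); MA terms at lags 1, s, s+1
ar_seasonal = [1.0] + [0.0] * (s - 2) + [1.0, -1.0]
ma_seasonal = [1 - alpha] + [0.0] * (s - 2) + [1 - delta * (1 - alpha),
                                               (1 - alpha) * (delta - 1)]
seasonal = psi_weights(ar_seasonal, ma_seasonal, n)
assert close(seasonal, [alpha + (delta * (1 - alpha) if j % s == 0 else 0.0) for j in js])
```

If any assertion failed, the ARIMA coefficients and the moving average form for that model would disagree; all five pass, which also confirms the seasonal ψ weights jump by δ(1 − α) at every multiple of s.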

9 Correlations and Multivariate Techniques

The Multivariate platform specializes in exploring how many variables relate to each other. The platform begins by showing a standard correlation matrix. The Multivariate platform popup menu gives additional correlation options and other techniques for looking at multiple variables, such as

• a scatterplot matrix with normal density ellipses

• inverse, partial, and pairwise correlations

• a covariance matrix

• nonparametric measures of association

• simple statistics (such as mean and standard deviation)

All plots and the current data table are linked. You can highlight points on any scatterplot in the scatterplot matrix, or the outlier distance plot. The points are highlighted on all other plots and are selected in the data table.

Introduction

For a short tour of the Multivariate platform,

Open Solubility.jmp from the sample data folder.

Select Analyze > Multivariate to bring up the launch dialog.

When the report appears, you see correlations and a scatterplot matrix.


From here, you can calculate several different kinds of correlations, including nonparametric correlations (see below for instances of each kind).

Note that the first two variables (1-Octanol and Ether) are correlated with each other. In addition, the last four variables are similarly correlated. This suggests that the variability in these six variables could be explained in fewer dimensions. A principal components analysis can confirm this.

Select Principal Components > on Correlations from the platform’s drop-down list.

The principal components report, shown here, indicates that there are two strong directions of variation, corresponding to the eigenvalues of 4.785 and 0.945. We can express 95% of the variation of these six dimensions in only two dimensions.
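The "95%" follows from the fact that the eigenvalues of a correlation matrix sum to the number of variables: (4.785 + 0.945) / 6 is about 0.955. The Python sketch below computes eigenvalues of a correlation matrix with cyclic Jacobi rotations (a standard method for symmetric matrices, chosen here only to avoid external libraries; it is not necessarily what JMP uses) and reports each component's share of the total.

```python
import math

def jacobi_eigenvalues(matrix, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations,
    returned largest first."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-22:                       # off-diagonal mass is negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta ** 2 + 1))
                c = 1 / math.sqrt(t ** 2 + 1)
                s = t * c
                for k in range(n):            # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):            # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def variance_shares(corr):
    """Eigenvalues of a correlation matrix and each one's share of the
    total variance (the trace, which equals the number of variables)."""
    eig = jacobi_eigenvalues(corr)
    total = sum(eig)
    return eig, [e / total for e in eig]

eig, share = variance_shares([[1.0, 0.6], [0.6, 1.0]])
```

For two variables with correlation r the eigenvalues are 1 + r and 1 − r, so a strong correlation concentrates most of the variance in the first component, the same effect seen in the six-variable Solubility example.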


Launch the Platform and Select Options

When you choose Analyze > Multivariate, a standard correlation matrix and scatterplot matrix appear first. The platform popup menu shown here lists additional correlation options and other techniques for looking at multiple variables. The following sections describe the tables and plots offered by the Multivariate platform.

In most of the following analysis options, a missing value in an observation causes the entire observation to be deleted. The exceptions are in Pairwise Correlations, which exclude rows that are missing on either of the variables under consideration, and Simple Statistics > Univariate, which calculates its statistics column-by-column, without regard to missing values in other columns.

Many of the following examples use the Solubility.jmp sample data table.

Correlations Multivariate

The Correlations Multivariate option gives the Correlations table, which is a matrix of correlation coefficients that summarizes the strength of the linear relationships between each pair of response (Y) variables. This correlation matrix only uses the observations that have nonmissing values for all variables in the analysis.

Inverse Correlations and Partial Correlations

The inverse correlation matrix (Inverse Corr table), shown at the top in the next figure, provides useful multivariate information. The diagonal elements of the matrix are a function of how closely the variable is a linear function of the other variables. In the inverse correlation, the diagonal is 1/(1 – R²) for the fit of that variable by all the other variables. If the multiple correlation is zero, the diagonal inverse element is 1. If the multiple correlation is 1, then the inverse element becomes infinite and is reported missing.

The partial correlation table (Partial Corr table) shows the partial correlations of each pair of variables after adjusting for all the other variables. This is the negative of the inverse correlation matrix scaled to unit diagonal.
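Both tables can be computed from the correlation matrix alone, as in the Python sketch below: invert the matrix (here by Gauss-Jordan elimination, an illustrative choice), read R² for each variable from 1/(1 − R²) on the diagonal, and rescale the negated inverse to unit diagonal for the partial correlations. The function names and the 3 × 3 example matrix are hypothetical.

```python
import math

def invert(matrix):
    """Inverse of a nonsingular matrix by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def inverse_and_partial(corr):
    inv = invert(corr)
    n = len(corr)
    # Diagonal of the inverse is 1 / (1 - R^2) for each variable on all the others.
    r_squared = [1 - 1 / inv[i][i] for i in range(n)]
    # Partial correlations: the negative inverse scaled to unit diagonal.
    partial = [[1.0 if i == j else -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                for j in range(n)] for i in range(n)]
    return inv, r_squared, partial

corr = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]
inv, r2, partial = inverse_and_partial(corr)
```

For three variables the result matches the familiar formula for the correlation of the first two variables after adjusting for the third, (r₁₂ − r₁₃r₂₃) / √((1 − r₁₃²)(1 − r₂₃²)).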

Scatterplot Matrix

To help you visualize the correlations, a scatterplot for each pair of response variables displays in a matrix arrangement, as shown in Figure 9.1. The scatterplot matrix is shown by default. If the scatterplots are not showing, select Scatterplot Matrix from the platform popup menu. The cells of the scatterplot matrix are size-linked so that stretching a plot from any cell resizes all the scatterplot cells.

By default, a 95% bivariate normal density ellipse is imposed on each scatterplot. If the variables are bivariate normally distributed, this ellipse encloses approximately 95% of the points. The correlation of the variables is seen by the collapsing of the ellipse along the diagonal axis. If the ellipse is fairly round and is not diagonally oriented, the variables are uncorrelated.



Figure 9.1 Example of a Scatterplot Matrix

The popup menu on the Scatterplot Matrix title bar lets you tailor the matrix with color and density ellipses and by setting the α-level.

Density Ellipses toggles the display of the density ellipses on the scatterplots constructed by the α level that you choose. By default they are 95% ellipses.

Show Correlations shows the correlation for each pair of variables in the upper left corner of each scatterplot.

Show Histogram draws histograms in the diagonal of the scatterplot matrix. These histograms can be specified as Horizontal or Vertical. In addition, you can toggle the counts that label each bar with the Show Counts command.



Ellipse α lets you select from a submenu of standard α-levels or select the Other command and specifically set the α level for the density ellipses.

Ellipse Color lets you select from a palette of colors to change the color of the ellipses.

You can reorder the scatterplot matrix columns by dragging a diagonal (label) cell to another position on the diagonal. For example, if you drag the cell of the column labeled 1-octanol diagonally down one cell, the columns reorder as shown in Figure 9.2.

When you look for patterns in the whole scatterplot matrix with reordered columns, you clearly see the variables cluster into groups based on their correlations, as illustrated previously by the two groups showing in Figure 9.1.

Figure 9.2 Reorder Scatterplot Matrix

Covariance Matrix

The Covariance Matrix command displays the covariance matrix for the analysis.


Pairwise Correlations

The Pairwise Correlations table lists the Pearson product-moment correlations for each pair of Y variables, using all available values. The count values differ if any pair has a missing value for either variable. These are the same values produced by the Density Ellipse option on the Fit Y by X platform.

The Pairwise Correlations report also shows significance probabilities and compares the correlations with a bar chart, as shown in Figure 9.3.
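The difference between pairwise and listwise (whole-row) deletion is easy to see on a small example. The Python sketch below uses a hypothetical three-column table with None marking missing values: the pairwise correlation of the first two columns uses all five rows, while the listwise version, like the Correlations Multivariate table, keeps only the three rows that are complete in every column. The data and function names are illustrative only.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_corr(columns, i, j):
    """Correlation of columns i and j using every row where both are present.
    Returns (r, count)."""
    pairs = [(a, b) for a, b in zip(columns[i], columns[j])
             if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)

def listwise_corr(columns, i, j):
    """Correlation of columns i and j using only rows complete on all columns."""
    complete = [row for row in zip(*columns) if None not in row]
    return (pearson([row[i] for row in complete], [row[j] for row in complete]),
            len(complete))

columns = [[1, 2, 3, 4, 5],
           [2, 1, 4, 3, 5],
           [1, None, 2, 3, None]]        # third column has two missing values
r_pair, n_pair = pairwise_corr(columns, 0, 1)
r_list, n_list = listwise_corr(columns, 0, 1)
```

The two correlations of the same pair of columns differ (0.8 using five rows versus about 0.65 using three), which is why the Count column in the Pairwise Correlations report can vary from pair to pair.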

Figure 9.3 Pairwise Correlations Report

Simple Statistics

The Simple Statistics submenu allows you to display simple statistics (mean, standard deviation, and so on) for each column. These statistics can be calculated in two ways that differ when there are missing values in the data table.

Univariate Simple Statistics are calculated on each column, regardless of values in other columns. These values match the ones that would be produced using the Distribution platform.


Multivariate Simple Statistics are calculated by dropping any row that has a missing value for any column in the analysis. These are the statistics that are used by the Multivariate platform to calculate correlations.

Nonparametric Correlations

When you select Nonparametric Correlations from the platform popup menu, the Nonparametric Measures of Association table is shown. The Nonparametric submenu offers these three nonparametric measures:

Spearman’s Rho is a correlation coefficient computed on the ranks of the data values instead of on the values themselves.

Kendall’s Tau is based on the number of concordant and discordant pairs of observations. A pair is concordant if the observation with the larger value of X also has the larger value of Y. A pair is discordant if the observation with the larger value of X has the smaller value of Y. There is a correction for tied pairs (pairs of observations that have equal values of X or equal values of Y).

Hoeffding’s D is a statistic that ranges from –0.5 to 1, with large positive values indicating dependence. The statistic approximates a weighted sum over observations of chi-square statistics for two-by-two classification tables, and detects more general departures from independence.

The Nonparametric Measures of Association report also shows significance probabilities for all measures and compares them with a bar chart similar to the one in Figure 9.3.

See “Computations and Statistical Details,” p. 169, for computational information.


Computations and Statistical Details

Pearson Product-Moment Correlation

The Pearson product-moment correlation coefficient measures the strength of the linear relationship between two variables. For response variables X and Y, it is denoted as r and computed as

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}.$$

If there is an exact linear relationship between two variables, the correlation is 1 or –1, depending on whether the variables are positively or negatively related. If there is no linear relationship, the correlation tends toward zero.

Nonparametric Measures of Association

For the Spearman, Kendall, or Hoeffding correlations, the data are first ranked. Computations are then performed on the ranks of the data values. Average ranks are used in case of ties.

Spearman’s ρ (rho) Coefficients

Spearman’s ρ correlation coefficient is computed on the ranks of the data using the formula for the Pearson correlation described above.

Kendall’s τb Coefficients

Kendall’s τb coefficients are based on the number of concordant and discordant pairs. A pair of rows is concordant if both variables agree on which row has the larger value. Otherwise the pair is discordant, or tied.

The formula

$$\tau_b = \frac{\sum_{i<j} \operatorname{sgn}(x_i - x_j)\,\operatorname{sgn}(y_i - y_j)}{\sqrt{(T_0 - T_1)(T_0 - T_2)}}$$

computes Kendall’s τb where

$$T_0 = n(n - 1)/2, \qquad T_1 = \sum_i t_i(t_i - 1)/2, \qquad\text{and}\qquad T_2 = \sum_i u_i(u_i - 1)/2.$$

Note that $\operatorname{sgn}(z)$ is equal to 1 if $z > 0$, 0 if $z = 0$, and –1 if $z < 0$.

The ti (the ui) are the number of tied x (respectively y) values in the ith group of tied x (respectively y) values, n is the number of observations, and Kendall’s τb ranges from –1 to 1. If a weight variable is specified, it is ignored.


Computations proceed in the following way:

• Observations are ranked in order according to the value of the first variable.

• The observations are then re-ranked according to the values of the second variable.

• The number of interchanges of the first variable is used to compute Kendall’s τb.
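The formulas above can be checked with a small, dependency-free Python sketch. It computes Pearson's r directly and Spearman's ρ as Pearson's r on average ranks. It computes Kendall's τb from the sgn double sum with the T0, T1, and T2 tie corrections. This is a direct O(n²) evaluation of the formula, not the interchange-counting procedure listed in the steps above. The function names are illustrative only, and weights are ignored, as in JMP.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson's r from centered sums of products and squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sgn(z):
    return (z > 0) - (z < 0)


def kendall_tau_b(x, y):
    """Kendall's tau-b from the sgn double sum and the T0, T1, T2 tie terms."""
    n = len(x)
    s = sum(sgn(x[i] - x[j]) * sgn(y[i] - y[j])
            for i in range(n) for j in range(i + 1, n))
    t0 = n * (n - 1) / 2
    t1 = sum(t * (t - 1) / 2 for t in Counter(x).values())  # ties in x
    t2 = sum(u * (u - 1) / 2 for u in Counter(y).values())  # ties in y
    return s / sqrt((t0 - t1) * (t0 - t2))


x = [1, 2, 2, 3]
y = [1, 3, 2, 4]
print(kendall_tau_b(x, y))  # 5 / sqrt(30), about 0.9129
print(spearman(x, y))       # sqrt(0.9), about 0.9487
```

In the usage example, one pair is tied on x. That pair contributes 0 to the numerator and 1 to $T_1$, which is why the result is 5/√30 rather than 5/6.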

Hoeffding’s D Statistic

The formula for Hoeffding’s D (1948) is

$$D = 30\,\frac{(n-2)(n-3)D_1 + D_2 - 2(n-2)D_3}{n(n-1)(n-2)(n-3)(n-4)},$$

where

$$D_1 = \sum_i (Q_i - 1)(Q_i - 2), \qquad D_2 = \sum_i (R_i - 1)(R_i - 2)(S_i - 1)(S_i - 2), \qquad D_3 = \sum_i (R_i - 2)(S_i - 2)(Q_i - 1).$$

The $R_i$ and $S_i$ are ranks of the x and y values. The $Q_i$ (sometimes called bivariate ranks) are one plus the number of points that have both x and y values less than those of the ith point. A point that is tied on its x value or y value, but not on both, contributes 1/2 to $Q_i$ if the other value is less than the corresponding value for the ith point. A point tied on both x and y contributes 1/4 to $Q_i$.

When there are no ties among observations, the D statistic has values between –0.5 and 1, with 1 indicating complete dependence. If a weight variable is specified, it is ignored.

Inverse Correlation Matrix

The inverse correlation matrix provides useful multivariate information. The diagonal elements of the inverse correlation matrix, sometimes called the variance inflation factors (VIF), are a function of how closely each variable is a linear function of the other variables. Specifically, if the correlation matrix is denoted R and the inverse correlation matrix is denoted $R^{-1}$, the ith diagonal element is denoted $r^{ii}$ and is computed as

$$r^{ii} = \mathrm{VIF}_i = \frac{1}{1 - R_i^2},$$

where $R_i^2$ is the coefficient of determination from the model regressing the ith explanatory variable on the other explanatory variables. Thus, a large $r^{ii}$ indicates that the ith variable is highly correlated with some linear combination of the other variables.

Note that the definition of $R^2$ changes for no-intercept models. For no-intercept and hidden-intercept models, JMP uses the $R^2$ from the uncorrected sum of squares (that is, from the zero model) rather than from the corrected sum of squares (from the mean model).
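Both statistics in this section can be checked with a short, dependency-free Python sketch. `hoeffding_d` follows the D1, D2, D3 formula and the tie rules for $Q_i$ given above, using average ranks for $R_i$ and $S_i$. `vifs` reads the VIFs off the diagonal of the inverse correlation matrix. The sketch uses a plain Gauss-Jordan inverse as a stand-in for a numerical library. All names are illustrative, and weights are ignored.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def hoeffding_d(x, y):
    """Hoeffding's D using the D1, D2, D3 formula and the Q_i tie rules."""
    n = len(x)
    if n < 5:
        raise ValueError("Hoeffding's D needs at least 5 observations")
    r, s = average_ranks(x), average_ranks(y)

    def below(a, b):
        # 1 if a < b, 1/2 if tied, else 0; products give the 1, 1/2, 1/4 contributions.
        return 1.0 if a < b else 0.5 if a == b else 0.0

    q = [1 + sum(below(x[j], x[i]) * below(y[j], y[i]) for j in range(n) if j != i)
         for i in range(n)]
    d1 = sum((qi - 1) * (qi - 2) for qi in q)
    d2 = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2) for ri, si in zip(r, s))
    d3 = sum((ri - 2) * (si - 2) * (qi - 1) for ri, si, qi in zip(r, s, q))
    numerator = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    return 30 * numerator / (n * (n - 1) * (n - 2) * (n - 3) * (n - 4))


def inverse(matrix):
    """Gauss-Jordan inverse with partial pivoting (no singularity check)."""
    n = len(matrix)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for row in range(n):
            if row != col:
                f = a[row][col]
                a[row] = [vr - f * vc for vr, vc in zip(a[row], a[col])]
    return [row[n:] for row in a]


def vifs(corr):
    """Diagonal of the inverse correlation matrix, i.e. 1 / (1 - R_i^2) per variable."""
    inv = inverse(corr)
    return [inv[i][i] for i in range(len(corr))]


print(hoeffding_d([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0 (complete dependence)
print(vifs([[1.0, 0.6], [0.6, 1.0]]))  # both 1 / (1 - 0.36) = 1.5625, up to rounding
```

For two variables with correlation 0.6, each $R_i^2$ is 0.36. Both VIFs therefore equal 1/0.64 = 1.5625, which matches the diagonal of the inverse.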

10 Importing, Exporting, and Charting Data

This chapter shows you how to use JMP to interact with the rest of the world. In particular, it illustrates ways of importing data from various formats into JMP for analysis.

No statistics package is useful unless its results can be communicated to others. Graphs and charts are usually used to summarize results, so this chapter also describes copying and pasting results from JMP Student Edition into word processors, presentation managers, or web authoring tools. In addition to techniques for accomplishing these tasks, the chapter documents the Chart and Overlay Plot platforms.




In this introduction, some of the variables’ statistics are charted.

Using the Chart Platform

To begin with, produce a bar chart showing the mean starch content and the maximum thread wear for each of the wash methods.

Select Graph > Chart from the menu bar.

This brings up a dialog like the one in Figure 10.7 on page 181, shown later in this chapter.

Select Starch Content (%) from the list of columns.

Click on the Statistics button and select Mean from the drop-down list.

Select Thread Wear Measured from the list of columns.

Click on the Statistics button and select Max.

Select Method from the list of columns.

Click the Categories, X Levels button.


Click OK.

A bar chart like the one in Figure 10.1 appears.

Figure 10.1 Bar Chart
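Conceptually, the chart built in these steps summarizes each Y column by one statistic within each level of the X column. Here, that is the mean for starch content and the maximum for thread wear. The Python sketch below shows that grouping. It uses made-up numbers rather than the sample data, and `chart_summary` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def chart_summary(x_levels, values, statistic):
    """Apply `statistic` to the values in each X level, in sorted level order."""
    groups = defaultdict(list)
    for level, value in zip(x_levels, values):
        groups[level].append(value)
    return {level: statistic(groups[level]) for level in sorted(groups)}


# Made-up rows: wash method, starch content (%), thread wear
method = ["A", "A", "B", "B", "C", "C"]
starch = [2.0, 4.0, 3.0, 5.0, 1.0, 1.5]
wear = [7, 9, 6, 11, 8, 4]

print(chart_summary(method, starch, mean))  # {'A': 3.0, 'B': 4.0, 'C': 1.25}
print(chart_summary(method, wear, max))     # {'A': 9, 'B': 11, 'C': 8}
```

Each bar in the chart corresponds to one entry in these dictionaries.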

The Chart platform makes it easy to format this chart, or even change to another chart type.

Right-click Mean(Starch Content (%)) in the legend to the right of the plot, select Overlay Color, and choose a color from the resulting palette.

The colors of the bars for starch content change to the selected color.

Although JMP Student Edition lets you change a great many chart options, you do not need to change them all. Simple charts are almost always the most effective.

With Mean(Starch Content (%)) still selected:

From the drop-down list in the title bar next to Chart, select Y Options > Needle Plot.

The bars for Mean(Starch Content (%)) change to a needle chart, as shown in Figure 10.2.


Figure 10.2 Half Needle Chart

Click somewhere in the blank area above the legend to deselect all columns.

From the drop-down list in the title bar next to Chart, select Y Options > Line Chart.

The entire chart changes to a line plot. As these steps show, charting options can be applied to a single chart variable (by selecting its legend first) as well as to the entire chart.

Using the Overlay Plot Platform

In this example, two charts are produced:

• A plot of starch content and thread wear against wash method.

• A plot of starch content versus load size for each level of wash method.

Select Graph > Overlay Plot from the menu bar.

Assign Thread Wear Measured and Starch Content (%) to the Y role, and Method to the Categories, X Level role.


Click OK.

This produces a plot with both variables plotted on the y-axis. To connect the points and produce the plot shown in Figure 10.3,

From the drop-down list in the title bar next to Overlay Plot, select Y Options > Connect Points.

Figure 10.3 Final Overlay Plot


To produce the plot of starch content versus load size for each level of wash method,

Again select Graph > Overlay Plot from the menu bar.

Select Starch Content (%) from the list of columns and click the Y button.

Select Size of Load from the list of columns and click the X button.

Select Method from the list of columns and click the By button.

This produces three separate plots, one for each level of the Method variable. The graphs shown here were reduced in size by holding down the Control key (Windows) or Command key (Macintosh) and dragging the corner of one graph. All of the graphs resize together.

Importing Data

The File > Open command displays a specialized open file dialog used to locate a file to open and to tell JMP Student Edition the file format of the incoming file. The Open command then reads the file into a JMP Student Edition data table.

JMP Student Edition directly reads JMP data tables, JMP journal files, JMP Script files, SAS transport files, text files with any column delimiter, Excel files, and flat-file database files.

Windows

The Files of Type selection filters the list of files displayed in the dialog. If applicable, JMP Student Edition gives additional information about the file in the File Open dialog. The example in Figure 10.4 shows an Open Data File dialog when the Files of Type drop-down list is changed from the default to JMP data table. The dialog shows the table notes, if they exist.

If *.* is chosen from the Files of Type menu, JMP looks at the type of file given by the 3-character extension appended to its file name and opens it accordingly. This works as long as the file has the structure indicated by its name.


Figure 10.4 The Open Data File Dialog to Read a JMP table

Macintosh

The Open window allows you to open any type of file into JMP. If you select a text document to open, the Open As menu appears in the window (see the next section for details on text importing).



Importing Text Files

How you import text depends on the operating system you are working with.

Windows

Under Microsoft Windows,

Choose Text from the Files of Type drop-down list to import the data based on your current preference settings.

Select the Data With Preview checkbox to see a preview of an incoming text file.

You then see the dialog shown in Figure 10.5 for specification of delimiters and other import details.

Macintosh

On the Macintosh, select a text file to open. This displays the Open As menu in the lower part of the dialog. Then choose one of the following commands:

• Text opens the file in a simple text editing window.

• Data (Best Guess) opens the data in a format that JMP thinks is appropriate based on the contents of the file.

• Data (Using Preview) presents a dialog similar to the one shown in Figure 10.5. The dialog lets you designate delimiters and other information, and it shows a preview of the resulting data table.

• Data (Using Preferences) opens the file and uses default rules (set in the preferences panel) to interpret end-of-field and end-of-line delimiters to create a JMP data table.

All Platforms

JMP Student Edition attempts to discern the arrangement of text data. This is adequate for a rectangular text file with no missing fields, a consistent field delimiter, and an end-of-line delimiter.


Note: If double-quotes are encountered when importing text data, JMP changes the delimiter rules to look for an end double-quote. Other text delimiters, including spaces embedded within the quotes, are ignored and treated as part of a text string.

The initial settings in the delimited import dialog are taken from the current Preferences file. The dialog also shows the column names, data types, and the first two rows of data. In Figure 10.5, the preferences indicate that the incoming table contains column headers to be used as the column names; the column names are name, age, sex, and height. If no column names are indicated, the Name fields are called Column 1, Column 2, and so on.

Additional options in the dialog include one or more end-of-field delimiters, end-of-line delimiters, the option to Strip enclosing quotes, and the ability to set how many rows and columns are read.

Figure 10.5 Import Text File
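The double-quote rule in the note above and the default Column 1, Column 2 names can be illustrated with Python's standard `csv` module. This is a sketch of the same behavior, not JMP's importer. It assumes a single-character delimiter and double quotes as the quote character, and `read_delimited` is a hypothetical helper.

```python
import csv
import io


def read_delimited(text, delimiter=",", has_header=True):
    """Parse delimited text; a quoted field may contain the delimiter."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"'))
    if has_header:
        names, data = rows[0], rows[1:]
    else:
        # No header row: fall back to generic names, as the dialog does.
        width = max(len(row) for row in rows)
        names = [f"Column {i}" for i in range(1, width + 1)]
        data = rows
    return names, data


names, data = read_delimited('name,age\n"Smith, Jo",14\nLee,13\n')
print(names)  # ['name', 'age']
print(data)   # [['Smith, Jo', '14'], ['Lee', '13']]
```

The comma inside `"Smith, Jo"` stays part of the text value because it is enclosed in quotes, and the enclosing quotes are stripped.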



Fixed-Width Text

If your data is fixed width (that is, each variable uses a set number of columns in the text file), click the Try Fixed Width button to specify the separations of each column to be imported.
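Fixed-width import amounts to cutting each line at set character positions. The Python sketch below shows this. The field widths and sample lines are made-up values, and `read_fixed_width` is a hypothetical helper.

```python
def read_fixed_width(lines, widths):
    """Split each line into fields of the given character widths, trimming spaces."""
    rows = []
    for line in lines:
        fields, start = [], 0
        for width in widths:
            fields.append(line[start:start + width].strip())
            start += width
        rows.append(fields)
    return rows


# Made-up records: an 8-character name, a 2-character age, a 1-character sex code.
lines = ["ALICE".ljust(8) + "12F", "BOB".ljust(8) + "13M"]
print(read_fixed_width(lines, [8, 2, 1]))  # [['ALICE', '12', 'F'], ['BOB', '13', 'M']]
```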

Importing Microsoft Excel Files

JMP Student Edition can directly import Microsoft Excel worksheets and workbooks under Macintosh and Microsoft Windows. Excel worksheets and workbooks are imported simply by choosing Excel Files from the Files of Type (Windows) or Enable (Macintosh) list, as shown in Figure 10.6.

Figure 10.6 Excel Open Choice

JMP Student Edition can also import Excel workbooks that contain several data tables inside them. After selecting Excel Files (*.xls) from the Open dialog and double-clicking the desired workbook, JMP Student Edition opens all the worksheets in the workbook.


Results from Platforms

The results from JMP Student Edition’s platforms can be copied and pasted into other programs through the system clipboard, using the standard copy and paste commands. To copy results into another program,

Select the Selection tool.

Hold down the Shift key and click on each part of the report that needs to be copied. Note that axes frequently need to be selected in addition to the graphs they accompany.

Select Edit > Copy.

In the word processor, select Edit > Paste.

The Chart Platform

The Chart platform computes and plots data and statistics about the data. Unlike the statistical platforms in JMP, the Chart platform is not intended as an exploratory device. It is used to report the results from other explorations. For that reason, the plots in the Chart platform do not “bristle with interactivity” to the degree that other platforms do. In essence, what is going to be reported should be known before the Chart platform is used.

To plot descriptions of data, complete the following steps after bringing up the Chart launch dialog (Figure 10.7):

Figure 10.7 Chart Launch Dialog

Select the data column in the column list.

Click the Statistics button (revealing the menu shown in Figure 10.8) and select the statistic to be charted.

From the section labeled Options, use the drop-down list to select the orientation of the chart (vertical or horizontal) and the type of chart (bar, line, pie, needle, or point chart) to be generated.

Don’t worry too much about getting the orientation and chart type correct initially — they can be changed after the chart has been generated.


Optionally, include an X, Level column to be plotted on the horizontal axis, or a Grouping variable to generate separate graphs for each level of the column, either in separate windows or overlaid in the same window. Weight, Freq, and By options work as in other platforms.

Click OK.

An example bar chart, plotting the mean of Starch Content (%) using Method as Categories, X Levels, is shown in Figure 10.9.

Figure 10.8 Plottable Statistics

Figure 10.9 Chart Example

Many of the options on the launch dialog can be changed using the platform popup menu. Platform options affect all charts in the report window. However, some options can be applied to individual charts.


Single-Chart Options

To apply the following options to individual charts, right-click on a chart legend to see the popup menu shown in Figure 10.10. When this menu is accessed through a chart legend, the commands apply only to the individual chart. When accessed through the platform popup menu, without any legends highlighted, the commands apply to all charts.

Figure 10.10 Chart Options

• Bar Chart displays a bar for each level of the chart variables. The default chart is a bar chart.

• Line Chart replaces a bar chart with a line chart and connects each point with a straight line. Choose the Show Points option to show or hide the points.

• Needle Chart replaces each bar with a line drawn from the axis to the plotted value.

• Point Chart shows only the plot points, without connecting them.

• Show Points toggles the point markers on a line or needle chart off and on.

• Connect Points toggles the line connecting points on and off, leaving a point chart.

• Std Error Bars overlays needles at plus or minus one standard error from the mean.

• Overlay Color assigns color to a variable to identify it when overlaid with other charts.

• Overlay Marker assigns plot points a marker to identify them in overlaid charts.

• Overlay Pattern assigns bars a fill pattern to identify them in overlaid charts.

• Pen Style allows the choice of a line style from the palette shown in Figure 10.10.
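Std Error Bars are drawn at the mean plus or minus one standard error. The Python sketch below assumes the usual standard error of the mean, s/√n, where s is the sample standard deviation. The source does not state JMP's exact formula, and `std_error_bar` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, stdev


def std_error_bar(values):
    """Return (lower, mean, upper) at the mean plus or minus one standard error."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))  # sample SD divided by sqrt(n)
    return m - se, m, m + se


lower, center, upper = std_error_bar([2.0, 4.0, 6.0, 8.0])
print(lower, center, upper)  # the needle spans lower..upper around center = 5.0
```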


Frame Options

Frame options allow control of the plot frame’s elements as a whole.

Figure 10.11 Frame Options

• Background Color colors the background of the plot with the color chosen from the JMP Student Edition color palette.

• Marker Size allows selection of the marker size from a palette of six point sizes that range from dot to very large. The Preferred Size is set in the Preferences. The marker size applies to the plot point and its associated rows.

• Marker Drawing Mode lets you adjust the way markers are drawn. Fast mode is useful for graphs with a large number of points, so that they redraw quickly after adjustments. Outlined mode is often used in presentations.

• Border lets you turn border lines on or off.

• Size/Scale allows changes in axes and plot frames. X Scale is only active on platforms where the X axis is numeric (the X axis for the Chart platform is categorical). Y Scale shows the standard axis scale dialog, which can also be shown by double-clicking the Y axis. The axis scale dialog allows the specification of the maximum, minimum, increment, and tick marks for the axis, and draws tailored reference lines or a grid on the plot. The Frame Size command displays a dialog to enter the exact pixel size for the plot frame.

• Add Graphics Script displays a text entry box to enter JSL commands, usually to tailor the graphics output in ways not provided with commands and options. See the JMP Scripting Language Guide for documentation of JSL commands.

• DisplayBox lists the commands to conveniently select and deselect the plot area, and redraw the plot.


Level Options

Click a value in the legend to highlight it. Right-clicking a highlighted legend shows the commands for the Colors and Markers palettes, which are identical to their Overlay equivalents in Figure 10.10. The commands affect only the highlighted level and its associated rows in the data table.

Platform Options

After the charts appear, the platform drop-down menu has the following options.

• Overlay displays a single overlaid chart when the chart has more than one Y (statistics) variable. Each chart can have its own chart type, which can be overlaid. For example, the chart shown in Figure 10.12 has two overlaid variables, with one as a bar chart and one as a needle chart. When Overlay is not checked, the platform shows duplicate axis notation for each chart.

Figure 10.12 Different Chart Types

• Vertical changes a horizontal bar chart or a pie chart to a vertical bar chart.

• Horizontal changes a vertical bar chart or a pie chart to a horizontal bar chart.

• Pie changes a horizontal or vertical chart into a pie chart.

• Y Options accesses the options described in “Single-Chart Options,” p. 182. These options affect the variable whose legend is highlighted.

• Level Options accesses the color, marker, and pattern palettes options described in “Level Options,” p. 184. These options affect highlighted bars.

• Separate Axes duplicates the axis notation for each chart when there are multiple charts. By default, the axis notation only shows for the last chart displayed if the charts are not overlaid. Separate Axes is only enabled on the menu when there are multiple Y variables that are not overlaid.

• Script has a submenu of commands available to all platforms that redo the analysis, or save the JSL commands for the analysis to a window or a file. (See “Script Submenu,” p. 66.)


The Overlay Plot Platform

The Overlay Plot platform overlays numeric Y variables with a single numeric or character X variable. Optionally, the values of the X variable appear in ascending order, with points plotted and connected in that order.

The Overlay Plot platform has platform plotting options accessed by the popup menu icon on the Overlay Plot title bar. There is also a single-plot options menu for each Y variable, which appears when the Y variable legend beneath the plot is right-clicked. The individual plot options are the same as those in the Y Options submenu at the platform level. When one of these options is selected at the platform level, it affects all plots in the report if no legends are highlighted. If one or more plot legends are highlighted, the option affects only those plots.

Platform Options

Platform options affect every plot in the report window.

• Overlay overlays plots for all columns assigned the Y role. Plots initially appear overlaid with the Connect Points option in effect. When the Overlay option is turned off, the plots show separately.

• Separate Axes lets the X axis scale values be printed only once, on the last plot in the window. When Separate Axes is selected, the X axes for other Y variables show tick marks but show no scale values.

• Uniform Y Scale makes the Y scales the same on grouped plots.

• Connect Through Missing connects adjacent points in the plot, regardless of whether there are missing values between them.

• Range Plot connects the lowest and highest points at each X value with a line with bars at each end.

Note: The Needle option, described below, and Range option cannot be selected at the same time.

• Y Options has a submenu of options that apply to all variables and plots in the report window when selected from the main platform window. The section “Single-Plot Options,” p. 186, describes Y options for each individual variable.

• Ungroup Plots creates a separate chart for each level of a grouping variable.

• Arrange Plots allows you to specify the number of plots in each row.

• Script has a submenu of commands available to all platforms that redo the analysis or save the JSL commands for the analysis to a window or a file. (See “Script Submenu,” p. 66.)
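The effect of Connect Through Missing can be sketched as the list of line segments drawn between consecutive points. The Python sketch below assumes that missing Y values are represented as `None` and that the points are already in X order. `line_segments` is an illustrative name, not a JMP function.

```python
def line_segments(xs, ys, connect_through_missing=False):
    """Return the (x, y) pairs each drawn line segment joins; None marks a missing Y."""
    segments = []
    previous = None
    for x, y in zip(xs, ys):
        if y is None:
            if not connect_through_missing:
                previous = None  # a missing value breaks the line here
            continue
        if previous is not None:
            segments.append((previous, (x, y)))
        previous = (x, y)
    return segments


xs = [1, 2, 3, 4]
ys = [5, None, 7, 8]
print(line_segments(xs, ys))        # [((3, 7), (4, 8))]
print(line_segments(xs, ys, True))  # [((1, 5), (3, 7)), ((3, 7), (4, 8))]
```

With the option off, the missing value at x = 2 leaves a gap. With it on, the line jumps straight from x = 1 to x = 3.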

Single-Plot Options

Each Y variable is labeled beneath the plot, showing its name and symbol. Each Y variable’s plot can be modified by right-clicking on the variable name to bring up a menu.

• Show Points alternately shows or hides points.

• Connect Points is a toggle that alternately connects the points with lines. Connect Points can be activated when Show Points is not, allowing for greater flexibility in plot displays.

• Needle draws a vertical line from each point to the X axis.


• Step joins the position of the points with a discrete step by drawing a straight horizontal line from each point to the X value of the following point, and then a straight vertical line to that point. Step, without showing points, is illustrated in Figure 10.13.

Note: Only one of Connect Points, Needle, and Step can be chosen at a time.

Figure 10.13 Overlay Step Plot

• Function Plot plots a formula (stored in the Y column) as a smooth curve. To use this function, store a formula in the Y column that is a function of a single X column. For example, the following column contains a formula involving the sine function. When used in an overlay plot, the function is plotted as a curve rather than individual points.

Note: Overlay Plot normally assumes you want a function plot when the Y column contains a formula. However, formulas that contain random number functions are more frequently used with simulations, where function plotting is not often wanted. Therefore, the Function Plot option is off (by default) when a random number function is present, but on for all other functions.

• Connect Color displays the standard JMP color palette (see Figure 10.10) for assigning colors to lines that connect points.

• Overlay Marker assigns markers to plotted points using the standard JMP marker palette (see Figure 10.10).

• Overlay Marker Color lets you select the color of the overlay marker.

• Line Style and Line Width let you adjust the appearance of the lines drawn on the plot.
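The Step option described above can be sketched as a list of line vertices. The Python sketch below assumes the points are drawn in ascending X order, and `step_vertices` is an illustrative name.

```python
def step_vertices(points):
    """Vertices of a step line: horizontal to the next X, then vertical to the next Y."""
    if not points:
        return []
    points = sorted(points)  # draw in ascending X order
    vertices = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        vertices.append((x1, y0))  # horizontal segment at the current Y
        vertices.append((x1, y1))  # vertical segment to the next point
    return vertices


print(step_vertices([(1, 2), (2, 5), (4, 3)]))
# [(1, 2), (2, 2), (2, 5), (4, 5), (4, 3)]
```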


11 Full Factorial Designs

Designing Experiments

A full factorial design contains all possible combinations of a set of factors. This is the most conservative design approach, but it is also the most costly in experimental resources. The full factorial designer supports both continuous factors and categorical factors with up to nine levels.

In full factorial designs, you perform an experimental run at every combination of the factor levels. The sample size is the product of the numbers of levels of the factors. For example, a factorial experiment with a two-level factor, a three-level factor, and a four-level factor has 2 × 3 × 4 = 24 runs.

Factorial designs with only two-level factors have a sample size that is a power of two (specifically 2^f, where f is the number of factors). When there are three factors, the factorial design points are at the vertices of a cube as shown in the diagram below. For more factors, the design points are the vertices of a hypercube.
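Enumerating a full factorial design is a Cartesian product of the factor levels, so the run count is the product of the level counts. The Python sketch below uses made-up factor names and coded levels (–1 and +1 for two-level factors), which are assumptions for illustration. `full_factorial` is a hypothetical helper, not the JMP designer.

```python
from itertools import product


def full_factorial(levels):
    """All combinations of factor levels; `levels` maps a factor name to its level list."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


runs = full_factorial({"A": [-1, 1], "B": ["L", "M", "H"], "C": [1, 2, 3, 4]})
print(len(runs))  # 24, that is, 2 * 3 * 4

factors = ["Feed Rate", "Catalyst", "Stir Rate", "Temperature", "Concentration"]
reactor = full_factorial({name: [-1, 1] for name in factors})
print(len(reactor))  # 32, that is, 2 ** 5
```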

Full factorial designs are the most conservative of all design types. There is little scope for ambiguity when you are willing to try all combinations of the factor settings.

Unfortunately, the sample size grows exponentially in the number of factors, so full factorial designs are too expensive to run for most practical purposes.

Introduction

The following example, adapted from Meyer et al. (1996) and Box, Hunter, and Hunter (1978), shows a five-factor reactor example.

To follow along with this example, open the folder Sample Data that was installed when you installed JMP. Within this folder, open Design Experiment > Reactor 32 Runs.jmp.

Suppose you have used the screening designer to investigate the effects of five factors on the percent reaction of a chemical process. The factors (Feed Rate, Catalyst, Stir Rate, Temperature, and Concentration) are all two-level continuous factors.

1 Select DOE > Full Factorial Design.

2 Click the red triangle icon on the Full Factorial Design title bar and select Load Responses.

3 Open Reactor Response.jmp to load the responses by opening the Sample Data folder that was installed with JMP. In the Sample Data folder, open Design Experiment > Reactor Response.jmp.

4 Click the red triangle icon on the Full Factorial Design title bar and select Load Factors.

5 Open Reactor Factors.jmp to load the factors by opening the Sample Data folder that was installed with JMP. In the Sample Data folder, open Design Experiment > Reactor Factors.jmp.


The completed dialog is shown in Figure 11.1.

Figure 11.1 Full-Factorial Example Response and Factors Panels

6 Click Continue to see the Output Options panel. A full factorial design includes runs for all combinations of high and low factor levels for the five variables, giving 32 runs.

7 Click Make Table.

The design data table (Figure 11.2) contains a run for every combination of high and low values for the five variables. Since there are five variables, there are 2^5 = 32 runs. This covers all combinations of five factors with two levels each. Initially, the table has an empty Y column named Percent Reacted for entering response values when the experiment is complete.

The values in your table may be different from those shown below.

Figure 11.2 2^5 Factorial Reactor Data

To see the completed experiment and continue following this example, open the folder Sample Data that was installed when you installed JMP. Within this folder, open Design Experiment > Reactor 32 Runs.jmp.


Figure 11.3 Reactor 32 Runs.jmp

Begin the analysis with a quick look at the data before fitting the factorial model.

1 Select Analyze > Distribution.

2 Highlight Percent Reacted and click Y, Columns. Then click OK.

3 Click the red triangle icon on the Percent Reacted title bar and select Normal Quantile Plot. The results are shown in Figure 11.4.

Figure 11.4 Distribution of Response Variable for Reactor Data

Start the formal analysis with a stepwise regression. The data table has a script stored with it that automatically defines an analysis of the model that includes main effects and all two-factor interactions, and brings up the Stepwise control panel.

1 Click the red triangle icon next to the Fit Model script and select Run Script.

2 The probability to enter a factor (Prob to Enter) in the model should be 0.05.

3 The probability to remove a factor (Prob to Leave) should be 0.1.


Figure 11.5 Run JSL Script for Stepwise Regression

4 A useful way to use the Stepwise platform is to check all the main effects in the Current Estimates table. To do this, make sure the menu beside Direction specifies Mixed.

5 Check the boxes for the main effects of the factors as shown in Figure 11.6.

6 Click Go.

Figure 11.6 Starting Model For Stepwise Process

The mixed stepwise procedure removes insignificant main effects and adds important interactions. The end result is shown in Figure 11.7. Note that the Feed Rate and Stir Rate factors are no longer in the model.

Figure 11.7 Model After Mixed Stepwise Regression

7 Click the Make Model button. The Model Specification window that appears is automatically set up with the appropriate effects (Figure 11.8).

Figure 11.8 Fitting a Prediction Model

8 Click Run Model to see the analysis for a candidate prediction model (Figure 11.9).

The figure on the left in Figure 11.9 shows the actual by predicted plot for the model. The predicted model covers a range of predictions from 40% to 95% reacted. The size of the random noise as measured by the RMSE is only 3.3311%, which is more than an order of magnitude smaller than the range of predictions. This is strong evidence that the model has good predictive capability.
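The order-of-magnitude comparison above can be checked directly from the quoted numbers (a prediction range of 40% to 95% and an RMSE of 3.3311%):

$$\frac{95\% - 40\%}{3.3311\%} = \frac{55}{3.3311} \approx 16.5 > 10$$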

The figure on the right in Figure 11.9 shows a table of model coefficients and their standard errors (labeled Parameter Estimates). All effects selected by the stepwise process are highly significant.

Figure 11.9 Actual by Predicted Plot and Prediction Model Estimates

The Prediction Profiler also gives you a way to compare the factors and find optimal settings.

1 Open the Prediction Profiler by clicking the red triangle on the Response Percent Reacted title bar and selecting Factor Profiling > Profiler, as shown in Figure 11.10.


Figure 11.10 Selecting the Profiler

Figure 11.11 shows the profiler’s initial display.

Figure 11.11 Viewing the Profiler

2 Click the red triangle on the Prediction Profiler title bar and select Maximize Desirability to see the profiler in Figure 11.12.

Figure 11.12 Viewing the Maximum Desirability

The plot of Desirability versus Percent Reacted shows that the goal is to maximize Percent Reacted. The reaction is economically infeasible unless Percent Reacted is above 90%. Therefore, Desirability decreases for values below 90% and eventually becomes zero. Desirability increases linearly as Percent Reacted increases.


The maximum Desirability is 0.945 when Catalyst and Temperature are at their highest settings and Concentration is at its lowest setting. Percent Reacted increases from 65.5 at the center of the factor ranges to 95.875 at the most desirable setting.

Creating a Factorial Design

To start a full factorial design, select DOE > Full Factorial Design, or click the Full Factorial Design button on the JMP Starter DOE page. Then, follow the steps below.

Entering Responses and Factors

To enter responses, follow the steps in Figure 11.13. Then, enter factors as shown in Figure 11.14.

Figure 11.13 Entering Responses

Tip: To quickly enter multiple responses, click the N Responses button and enter the number of responses you want.

1 To enter one response at a time, click Add Response, then select a goal type: Maximize, Match Target, Minimize, or None.

2 Double-click to edit the response name, if desired.

3 Click to change the response goal, if desired.

4 Click to enter lower and upper limits and importance weights.


Figure 11.14 Entering Factors in a Full Factorial Design

• To enter factors, click the Continuous button, or click the Categorical button and select the number of levels (2 to 9).

• Double-click to edit the factor name.

• Click to enter values or change the level names. To remove a level, click it, press the Delete key, and then press the Return or Enter key.

When you finish adding factors, click Continue.

Selecting Output Options

Use the Output Options panel to specify how you want the output data table to appear:

• Run Order—Lets you designate the order in which you want the runs to appear in the data table when it is created. Choices are:

Keep the Same—the rows (runs) in the output table appear as they do in the Design panel.

Sort Left to Right—the rows (runs) in the output table appear sorted from left to right.

Randomize—the rows (runs) in the output table appear in a random order.

Sort Right to Left—the rows (runs) in the output table appear sorted from right to left.

• Number of Center Points—Specifies additional runs placed at the center of each continuous factor's range.

• Number of Replicates—Specifies the number of times to replicate the entire design, including center points. Type the number of times you want to replicate the design in the associated text box. One replicate doubles the number of runs.


Making the Table

When you click Make Table, the table shown in Figure 11.15 appears.

Figure 11.15 Factorial Design Table

• For continuous factors, a minus sign represents the low level.

• For continuous factors, a plus sign represents the high level.

• Level numbers represent the values of categorical factors.

• Values in the Pattern column describe the run that each row represents.

• The name of the table is the design type that generated it.

• The script attached to the table allows you to easily fit a model using the values in the design table.

12 Screening Designs

Designing Experiments

Screening designs are arguably the most popular designs for industrial experimentation. They examine many factors to see which have the greatest effect on the results of a process.

Screening designs require fewer experimental runs than other design methods, which makes them an inexpensive and efficient way to begin improving a process.

Often screening designs are a prelude to further experiments. It is wise to spend only about a quarter of your resource budget on an initial screening experiment. You can then use the results to guide further study.

The efficiency of screening designs depends on the critical assumption of effect sparsity: real-world processes usually have only a few driving factors, and the other factors are relatively unimportant. To understand the importance of effect sparsity, you can contrast screening designs with full factorial designs:

• Full factorial designs consist of all combinations of the levels of the factors. The number of runs is the product of the factor levels. For example, a factorial experiment with a two-level factor, a three-level factor, and a four-level factor has 2 x 3 x 4 = 24 runs.

• By contrast, screening designs reduce the number of runs by restricting the factors to two (or three) levels and by performing only a fraction of the full factorial design.
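The run count for a full factorial is simply the product of the level counts. The minimal Python sketch below enumerates the 2 x 3 x 4 example with `itertools.product`. It assumes each factor's levels are coded as the integers 1 to n, a hypothetical coding chosen only for illustration.

```python
from itertools import product
from math import prod


def full_factorial(levels: list[int]) -> list[tuple[int, ...]]:
    """Every combination of factor levels, each factor coded 1..n (illustrative coding)."""
    return list(product(*(range(1, n + 1) for n in levels)))


levels = [2, 3, 4]              # a two-level, a three-level, and a four-level factor
runs = full_factorial(levels)
print(len(runs), prod(levels))  # the run count equals the product of the level counts
print(runs[:4])
```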

Each factor in a screening design is usually set at two levels to economize on the number of runs needed, and response measurements are taken for only a fraction of the possible combinations of levels. In the case described above, restricting the factors to two levels yields 2 x 2 x 2 = 8 runs. Further, by doing half of these eight combinations, you can still assess the separate effects of the three factors. So the screening approach reduces the 24-run experiment to four runs.
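The four-run half fraction can be sketched in Python. This sketch assumes coded levels of -1 and +1 and builds the third factor as the product of the first two (C = AB), which is one standard way to choose a 2^(3-1) fraction. It then checks that the three columns stay mutually orthogonal, which is why the separate main effects remain estimable. The price is that C cannot be distinguished from the A x B interaction.

```python
from itertools import product


def half_fraction_3() -> list[tuple[int, int, int]]:
    """2^(3-1) half fraction in coded units: full factorial in A and B, with C = A*B."""
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]


design = half_fraction_3()
columns = list(zip(*design))  # the A, B, and C columns
for i in range(3):
    for j in range(i + 1, 3):
        dot = sum(x * y for x, y in zip(columns[i], columns[j]))
        print(f"columns {i} and {j}: dot product {dot}")  # 0 means orthogonal
```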

Of course, there is a price for this reduction. This chapter discusses the screening approach in detail, showing both pros and cons. It also describes how to use JMP’s screening designer, which supplies a list of popular screening designs for two or more factors. These factors can be continuous or categorical, with two or three levels. The list of screening designs you can use includes designs that group the experimental runs into blocks of equal sizes where the size is a power of two.

Introduction

Suppose an engineer wants to investigate a process that uses an electron beam welding machine to join two parts. The engineer fits the two parts into a welding fixture that holds them snugly together. A voltage applied to a beam generator creates a stream of electrons that heats the two parts, causing them to fuse. The ideal depth of the fused region is 0.17 inches. The engineer wants to study the welding process to determine the best settings for the beam generator to produce the desired depth in the fused region.

For this study, the engineer wants to explore the following three inputs, which are the factors for the study:

• Operator, who is the technician operating the welding machine

• Rotation Speed, which is the speed at which the part rotates under the beam

• Beam Current, which is a current that affects the intensity of the beam

After each processing run, the engineer cuts the part in half. This reveals an area where the two parts have fused. The length of this fused area is the depth of penetration of the weld. This depth of penetration is the response for the study.

The goals of the study are to:

• find which factors affect the depth of the weld

• quantify those effects

• find specific factor settings that predict a weld depth of 0.17 inches

To begin this example, select DOE > Screening Design from the main menu. Note that in the Responses panel, there is a single default response called Y. Change the default response as follows:

1 Double-click the response name and change it to Depth (In.).

2 The default goal for the single default response is Maximize, but the goal of this process is to get a target value of 0.17 inches with a lower bound of 0.12 and an upper bound of 0.22. Click the Goal text edit area and choose Match Target, as shown in Figure 12.1.

Figure 12.1 Screening Design Response With Match Target Goal

3 Click the Lower Limit text edit area and enter 0.12 as the lower limit (minimum acceptable value). Then click the Upper Limit text edit area and enter 0.22 as the upper limit (maximum acceptable value).

This example has one categorical factor (Operator) and two continuous factors (Speed and Current).

4 Add the categorical factor by clicking the Add button beside 2-Level Categorical.

5 Add two continuous factors by typing 2 in the Continuous box and clicking the associated Add button.

6 Double-click the factor names and rename them Operator, Speed, and Current.

7 Set the low and high values for Speed to 3 and 5 rpm. Set the low and high values for Current to 150 and 165 amps, and assign Mary and John as values for the categorical factor called Operator, as shown in Figure 12.2.


Figure 12.2 Screening Design with Two Continuous and One Categorical Factor

8 Click Continue.

9 Select Full Factorial in the list of designs, as shown in Figure 12.3, and then click Continue.

Figure 12.3 List of Screening Designs for Two Continuous Factors and One Categorical Factor

When the design details are complete, click Make Table to create a JMP table that contains the specified design. The table in Figure 12.4 appears. The table uses the names for responses, factors, and levels you specified. The Pattern variable shows the coded design runs.

10 View the table produced in this example by selecting Help (View on the Macintosh) > Sample Data Directory > Design Experiment > DOE Example 1.jmp.

Figure 12.4 The Design Data Table

Creating a Screening Design

To start a screening design, select DOE > Screening Design, or click the Screening Design button on the JMP Starter DOE page. Then, follow the steps below.


Entering Responses

To enter responses, follow the steps in Figure 12.5.

Figure 12.5 Entering Responses

Specifying Goal Types and Lower and Upper Limits

When entering responses, you can tell JMP that your goal is to obtain the maximum or minimum value possible, to match a specific value, or that there is no goal.

The following description explains the relationship between the goal type (step 3 in Figure 12.5) and the lower and upper limits (step 4 in Figure 12.5):

• For responses such as strength or yield, the best value is usually the largest possible. A goal of Maximize supports this objective.

• The Minimize goal supports an objective in which the best value is the smallest possible, such as when the response is impurity or defects.

• The Match Target goal supports the objective when the best value for a response is a specific target value, such as with part dimensions. The default target value is assumed to be midway between the lower and upper limits.

Tip: To quickly enter multiple responses, click the N Responses button and enter the number of responses you want.

Note: If your target range is not symmetric around the target value, you can alter the default target after you make a table from the design. In the data table, open the response’s Column Info dialog by double-clicking the column name, and enter an asymmetric target value.

1 To enter one response at a time, click Add Response, then select a goal type: Maximize, Match Target, Minimize, or None.

2 Double-click to edit the response name, if desired.

3 Click to change the response goal, if desired.

4 Click to enter lower and upper limits and importance weights.



Understanding Importance Weights

When computing overall desirability, JMP uses the value you enter as the importance weight (step 4 in Figure 12.5) as the weight of each response. If there is only one response, then specifying importance is unnecessary. With two responses you can give greater weight to one response by assigning it a higher importance value.
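A common way to combine individual desirabilities is a geometric mean, the approach of Derringer and Suich (1980). The sketch below assumes a version in which each response's importance acts as an exponent weight. It is an illustration rather than JMP's documented formula, and the function name and example values are hypothetical. With a single response, the weight cancels out, which is consistent with the remark that importance is then unnecessary.

```python
import math


def overall_desirability(d: list[float], weights: list[float]) -> float:
    """Importance-weighted geometric mean of individual desirabilities (sketch).

    Each d[i] is in [0, 1]; a zero anywhere makes the overall value zero.
    """
    if not d or len(d) != len(weights):
        raise ValueError("need one positive weight per desirability")
    if any(x == 0.0 for x in d):
        return 0.0
    total = sum(weights)
    return math.exp(sum(w * math.log(x) for x, w in zip(d, weights)) / total)


print(overall_desirability([0.9, 0.4], [1.0, 1.0]))  # equal importance
print(overall_desirability([0.9, 0.4], [3.0, 1.0]))  # first response three times as important
```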

Entering Factors

After entering responses, enter factors. The Factors panel’s appearance depends on the design you select. Entering factors is the same in Screening Design, Space Filling Design, Mixture Design, and Response Surface Design. This process is described below, in Figure 12.6.

Figure 12.6 Entering Factors

• To enter factors, type the number of factors and click Add.

• Double-click to edit the factor name.

• Click to enter factor values. To remove a level, click it, press the Delete key on your keyboard, and then press the Return or Enter key.

• To remove a factor from the list, highlight it and click the Remove Selected button.

Types of Factors

In general, when designing experiments, you can enter different types of factors in the model. Below is a description of factor types from which you can choose when creating screening designs:

• Continuous: Continuous factors have numeric data types only. In theory, you can set a continuous factor to any value between the lower and upper limits you supply.

• Categorical: Categorical factors (either numeric or character data types) have no implied order. The settings of a categorical factor are discrete and have no intrinsic order, although JMP still arranges the levels: by numeric magnitude if the values are numbers, and by sorting sequence if they are character. Examples of categorical factors are machine, operator, and gender.

After your responses and factors are entered, click Continue.

Choosing a Design

The list of screening designs you can use includes designs that group the experimental runs into blocks of equal sizes where the size is a power of two. Highlight the type of screening design you would like to use and click Continue.


Figure 12.7 Choosing a Type of Screening Design

The screening designer provides the following types of designs:

Two-Level Full Factorial

A full factorial design contains all combinations of the levels of the factors. The sample size is the product of the levels of the factors. For two-level designs, this is 2^k, where k is the number of factors. This can be expensive if the number of factors is greater than 3 or 4.

These designs are orthogonal. This means that the estimates of the effects are uncorrelated. If you remove an effect in the analysis, the values of the other estimates remain the same. Their p-values change slightly, because the estimate of the error variance and the degrees of freedom are different.
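The claim that removing an effect leaves the other estimates unchanged can be checked numerically. The sketch below fits a 2^3 full factorial by ordinary least squares twice: once with an AB interaction column and once without it. It uses exact fractions so that rounding does not blur the comparison. The response values are made up for illustration, and the small solver is a generic Gauss-Jordan routine, not anything JMP exposes.

```python
from fractions import Fraction
from itertools import product


def solve(matrix, rhs):
    """Solve matrix @ x = rhs exactly with Gauss-Jordan elimination.

    Assumes the matrix is square and nonsingular (true for the
    orthogonal designs used here).
    """
    n = len(matrix)
    m = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * z for x, z in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(columns, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    xtx = [[sum(Fraction(u) * v for u, v in zip(ci, cj)) for cj in columns]
           for ci in columns]
    xty = [sum(Fraction(u) * v for u, v in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


runs = list(product((-1, 1), repeat=3))         # 2^3 full factorial, coded units
col_a = [r[0] for r in runs]
col_b = [r[1] for r in runs]
col_c = [r[2] for r in runs]
col_ab = [x * z for x, z in zip(col_a, col_b)]  # AB interaction column
ones = [1] * len(runs)                          # intercept column
y = [12, 15, 11, 19, 14, 17, 13, 22]            # hypothetical responses

full = least_squares([ones, col_a, col_b, col_c, col_ab], y)
reduced = least_squares([ones, col_a, col_b, col_c], y)
# The intercept and main-effect estimates agree; only the error estimate
# (not computed here) would differ between the two fits.
print("with AB:   ", [str(v) for v in full])
print("without AB:", [str(v) for v in reduced])
```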

Full factorial designs allow the estimation of interactions of all orders up to the number of factors. Most empirical modeling involves first- or second-order approximations to the true functional relationship between the factors and the responses. The figure to the left in Figure 12.8 is a geometric representation of a two-level factorial.

Two-Level Fractional Factorial

A fractional factorial design also has a sample size that is a power of two. If k is the number of factors, the number of runs is 2^(k-p), where p < k. The fraction of the full factorial is 2^(-p). Like the full factorial, fractional factorial designs are orthogonal.

The trade-off in screening designs is between the number of runs and the resolution of the design. If price is no object, you can run several replicates of all possible combinations of the levels of m factors. This provides a good estimate of everything, including interaction effects up to the mth degree. But because running experiments costs time and money, you typically run only a fraction of all possible combinations. This causes some of the higher-order effects in a model to become nonestimable. An effect is nonestimable when it is confounded with another effect. In fact, fractional factorials are designed by deciding in advance which interaction effects are confounded with the other interaction effects.

Resolution Number: The Degree of Confounding

In practice, few experimenters worry about interactions higher than two-way interactions. These higher-order interactions are assumed to be zero. Experiments can therefore be classified by resolution number into three groups:

• Resolution = 3 means that main effects are confounded with one or more two-way interactions, which must be assumed to be zero for the main effects to be meaningful.

• Resolution = 4 means that main effects are not confounded with other main effects or two-factor interactions. However, two-factor interactions are confounded with other two-factor interactions.



• Resolution ≥ 5 means there is no confounding between main effects, between two-factor interactions, or between main effects and two-factor interactions.

All the fractional factorial designs are minimum aberration designs. For DOE experts, the minimum aberration design of a given resolution minimizes the number of words in the defining relation that are of minimum length.
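For a regular two-level fractional factorial, the resolution equals the length of the shortest word in the defining relation. Minimum aberration compares the counts of words of each length (the word length pattern). The minimal Python sketch below assumes that each added factor is given by a generator string such as {"D": "ABC"}, meaning D = ABC. The letters and helper names are hypothetical and are not part of JMP.

```python
from collections import Counter
from itertools import combinations


def defining_words(generators: dict[str, str]) -> list[frozenset[str]]:
    """All words in the defining relation of a regular two-level fraction.

    `generators` maps each added factor to the base-factor product that
    defines it, e.g. {"D": "ABC"} means D = ABC, so I = ABCD.
    """
    base_words = [frozenset(new + expr) for new, expr in generators.items()]
    words = set()
    for r in range(1, len(base_words) + 1):
        for combo in combinations(base_words, r):
            word = frozenset()
            for w in combo:
                word ^= w  # multiplying words: any letter appearing twice cancels
            words.add(word)
    return sorted(words, key=lambda w: (len(w), sorted(w)))


def resolution(generators: dict[str, str]) -> int:
    """Resolution = length of the shortest word in the defining relation."""
    return min(len(w) for w in defining_words(generators))


def word_length_pattern(generators: dict[str, str]) -> dict[int, int]:
    """Number of defining words of each length (what minimum aberration compares)."""
    return dict(Counter(len(w) for w in defining_words(generators)))


five_in_eight = {"D": "ABC", "E": "BC"}  # hypothetical letters: D = ABC, E = BC
print(["".join(sorted(w)) for w in defining_words(five_in_eight)])
print(resolution(five_in_eight), word_length_pattern(five_in_eight))
```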

The figure on the right in Figure 12.8 is a geometric representation of a two-level fractional factorial design.

Figure 12.8 Representation of Full Factorial (Left) and Two-Level Fractional Factorial (Right) Designs

Plackett-Burman Designs

Plackett-Burman designs are an alternative to fractional factorials for screening. One useful characteristic is that the sample size is a multiple of four rather than a power of two. There are no two-level fractional factorial designs with sample sizes between 16 and 32 runs. However, there are 20-run, 24-run, and 28-run Plackett-Burman designs.

The main effects are orthogonal, and two-factor interactions are only partially confounded with main effects. This is different from a resolution-three fractional factorial, in which two-factor interactions are indistinguishable from main effects.
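The classic 12-run Plackett-Burman design shows both properties. It is built from cyclic shifts of a generator row plus a final row with every factor at its low setting. The sketch below uses the published 12-run generator row. It is meant for illustration and is not necessarily the run order or column assignment JMP produces. The sketch confirms that all 11 columns are orthogonal and that the interaction of the first two columns is correlated with the third column at magnitude 1/3, not 1 as it would be in a resolution-three fraction.

```python
def plackett_burman_12() -> list[list[int]]:
    """12-run Plackett-Burman design for up to 11 two-level factors.

    Rows 1-11 are cyclic shifts of the standard generator row;
    row 12 sets every factor low (-1).
    """
    g = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]
    rows = [[g[(i + j) % 11] for j in range(11)] for i in range(11)]
    rows.append([-1] * 11)
    return rows


def column(design, j):
    return [row[j] for row in design]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


design = plackett_burman_12()
# Main effects are mutually orthogonal: every pair of columns has dot product 0.
print(all(dot(column(design, i), column(design, j)) == 0
          for i in range(11) for j in range(i + 1, 11)))
# Partial confounding: the interaction of columns 0 and 1 is correlated with
# column 2, but not perfectly. Both columns are balanced +/-1, so dot / n is r.
inter = [x * y for x, y in zip(column(design, 0), column(design, 1))]
print(dot(inter, column(design, 2)) / len(design))
```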

In cases of effect sparsity, a stepwise regression approach can allow for removing some insignificant main effects while adding highly significant and only somewhat correlated two-factor interactions.

Mixed-Level Designs

If you have qualitative factors with three levels, then none of the classical designs discussed previously are appropriate. For pure three-level factorials, JMP offers fractional factorials. For mixed two-level and three-level designs, JMP offers complete factorials and the specialized orthogonal-array designs listed in Table 12.1.

Table 12.1

Design             Two-Level Factors   Three-Level Factors
L18 John           1                   7
L18 Chakravarty    3                   6
L18 Hunter         8                   4
L36                11                  12



If you have no more two-level and three-level factors than a design in the table provides, you can use that design by selecting an appropriate subset of columns from the original design. Some of these designs are not balanced, even though they are all orthogonal.

Cotter Designs

Cotter designs are used when you have very few resources and many factors, and you believe there may be interactions. Suppose you believe in effect sparsity: that very few effects are truly nonzero. You believe in this so strongly that you are willing to bet that if you add up a number of effects, the sum will show an effect if it contains an active effect. The danger is that several active effects with mixed signs will cancel and still sum to near zero and give a false negative.

Cotter designs are easy to set up. For k factors, there are 2k + 2 runs. The design is similar to the “vary one factor at a time” approach many books call inefficient and naive.

A Cotter design begins with a run having all factors at their high level. Then follow k runs each with one factor in turn at its low level, and the others high. The next run sets all factors at their low level and sequences through k more runs with one factor high and the rest low. This completes the Cotter design, subject to randomizing the runs.
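The run structure just described can be sketched in a few lines of Python. The helper name `cotter_design` is hypothetical, and the runs use coded +1 (high) and -1 (low) levels. The final shuffle stands in for the randomization step; the seed is arbitrary.

```python
import random


def cotter_design(k: int) -> list[list[int]]:
    """Cotter design for k two-level factors, coded +1 (high) and -1 (low).

    Produces the 2k + 2 runs in the order described in the text; the
    run order should be randomized before the experiment is carried out.
    """
    runs = [[1] * k]                                   # all factors high
    runs += [[-1 if j == i else 1 for j in range(k)]   # factor i low, rest high
             for i in range(k)]
    runs.append([-1] * k)                              # all factors low
    runs += [[1 if j == i else -1 for j in range(k)]   # factor i high, rest low
             for i in range(k)]
    return runs


design = cotter_design(3)
print(len(design))                 # 2 * 3 + 2 runs
random.Random(1).shuffle(design)   # randomize the run order (seed is arbitrary)
for run in design:
    print(run)
```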

When you use JMP to generate a Cotter design, JMP also includes a set of extra columns to use as regressors. These are of the form factorOdd and factorEven, where factor is a factor name. They are constructed by adding up all the odd and even interaction terms for each factor. For example, if you have three factors, A, B, and C:

Figure 12.9

AOdd = A + ABC    AEven = AB + AC

BOdd = B + ABC    BEven = AB + BC

COdd = C + ABC    CEven = BC + AC

Because these columns in a Cotter design make an orthogonal transformation, testing the parameters on these combinations is equivalent to testing the combinations on the original effects. In the example of factors listed above, AOdd estimates the sum of odd terms involving A, AEven estimates the sum of the even terms involving A, and so forth.

Because Cotter designs have a false-negative risk, many statisticians discourage their use.

How to Run a Cotter Design

By default, JMP does not include a Cotter design in the list of available screening designs (Figure 12.7). However, if you would like to make a Cotter design:

1 Immediately after entering responses and factors (and before clicking Continue), click the red triangle icon in the Screening Design title bar.

2 Select Suppress Cotter Designs.

Changing the setting via the red triangle menu applies only to the current design. To alter the setting for all screening designs:

1 Select File > Preferences.

2 Click the Platform icon.

3 Click DOE to highlight it.

4 Uncheck the box beside Suppress Cotter Designs.



Displaying and Modifying the Design

After you select a design type, click the disclosure buttons to open the Display and Modify Design panel, which displays the design and provides options for tailoring it (Figure 12.10).

Figure 12.10 Display and Modification Options

• Change Generating Rules—Controls the choice of different fractional factorial designs for a given number of factors.

• Aliasing of Effects—Shows the confounding pattern for fractional factorial designs.

• Coded Design—Shows the pattern of high and low values for the factors in each run.

Aliasing of Effects

To see which effects are confounded with which other effects, click the disclosure button to reveal the Aliasing of Effects panel. It shows effects and confounding up to two-factor interactions (Figure 12.11).

Figure 12.11 Generating Rules and Aliasing of Effects Panel

For example, a full factorial with five factors requires 2^5 = 32 runs. Eight runs can only accommodate a full factorial with three two-level factors. It is necessary to construct the two additional factors in terms of the first three factors.

The price of reducing the number of runs from 32 to eight is effect aliasing (confounding). Confounding is the direct result of the assignment of new factor values to products of the coded design columns.


For example, the values for Temperature are the product of the values for Feed Rate and Concentration. This means that you cannot distinguish the effect of Temperature from the synergistic (interactive) effect of Feed Rate and Concentration.

In the example shown in Figure 12.11, all the main effects are confounded with two-factor interactions. This is characteristic of resolution-three designs.
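The two-factor aliasing can be reproduced by multiplying coded columns. The sketch below assumes the generators given later in "Understanding the Coded Design": Temperature is the product of the first three columns, and Concentration is the product of Catalyst and Stir Rate. It is not how JMP computes the Aliasing of Effects panel. The sketch simply flags every two-factor interaction whose column matches a main-effect column up to sign.

```python
from itertools import combinations, product

FACTORS = ["Feed Rate", "Catalyst", "Stir Rate", "Temperature", "Concentration"]


def reactor_design() -> list[tuple[int, ...]]:
    """Eight coded runs: full factorial in the first three factors, with
    Temperature = Feed Rate*Catalyst*Stir Rate and Concentration = Catalyst*Stir Rate."""
    return [(a, b, c, a * b * c, b * c) for a, b, c in product((-1, 1), repeat=3)]


def aliases_of_main_effects(runs):
    """For each main effect, list the two-factor interactions (not involving it)
    whose coded column equals the main-effect column or its negative."""
    cols = list(zip(*runs))
    result = {}
    for i, name in enumerate(FACTORS):
        negated = tuple(-x for x in cols[i])
        matches = []
        for j, k in combinations(range(len(FACTORS)), 2):
            if i in (j, k):
                continue
            inter = tuple(x * y for x, y in zip(cols[j], cols[k]))
            if inter in (cols[i], negated):
                matches.append(f"{FACTORS[j]}*{FACTORS[k]}")
        result[name] = matches
    return result


for effect, aliased in aliases_of_main_effects(reactor_design()).items():
    print(f"{effect}: {', '.join(aliased)}")
```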

Viewing the Confounding Pattern

JMP can create a data table that shows the aliasing pattern for a specified level. To create this table:

1 Click the red triangle at the bottom of the Aliasing of Effects area.

2 Select Show Confounding Pattern (Figure 12.12).

Figure 12.12 Show Confounding Patterns

3 Enter the order of confounding you want to see (Figure 12.13).

Figure 12.13 Enter Order

4 Click OK.

Figure 12.14 shows the third level alias for the five-factor reactor example. The effect names begin with C (Constant) and are shown by their order number in the design. Thus, Temperature appears as “4”, with second order aliasing as “1 5” (Feed Rate and Concentration), and third order confounding as “1 2 3” (Feed Rate, Catalyst, and Stir Rate).



Figure 12.14 The Third Level Alias for the Five-Factor Reactor Example

Understanding the Coded Design

In the coded design panel, each row represents a run. Plus signs designate high levels and minus signs represent low levels. As shown in Figure 12.15, rows for the first three columns of the coded design, which represent Feed Rate, Catalyst, and Stir Rate, are all combinations of high and low values (a full factorial design). The fourth column (Temperature) of the coded design is the element-by-element product of the first three columns. Similarly, the last column (Concentration) is the product of the second and third columns.

Figure 12.15 Default Coded Designs

Changing the Coded Design

In the Change Generating Rules panel, changing the checkmarks and clicking Apply changes the coded design; it changes the choice of different fractional factorial designs for a given number of factors. The Change Generating Rules table in Figure 12.16 shows how the last two columns are constructed in terms of the first three columns. The check marks for Temperature show it is a function of Feed Rate, Catalyst, and Stir Rate. The checkmarks for Concentration show it is a function of Catalyst and Stir Rate.

If you check the options as shown in Figure 12.16 and click Apply, the Coded Design panel changes. The first three columns of the coded design remain a full factorial for the first three factors (Feed Rate, Catalyst, and Stir Rate). Temperature is now the product of Feed Rate and Catalyst, so the fourth column of the coded design is the element-by-element product of the first two columns. Concentration is a function of Feed Rate and Stir Rate.
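One way to think about the Change Generating Rules table is as a mapping from each added factor to the base columns checked for it. The sketch below treats it that way, builds the coded design for the default and modified rules, and prints Pattern-style labels. The dictionary format and helper names are hypothetical. The runs are listed in standard order, which may differ from the order JMP displays.

```python
from itertools import product
from math import prod

BASE = ["Feed Rate", "Catalyst", "Stir Rate"]


def coded_design(rules: dict[str, list[str]]) -> list[dict[str, int]]:
    """Full factorial in BASE; each added factor is the element-by-element
    product of the base columns checked for it in `rules`."""
    design = []
    for levels in product((-1, 1), repeat=len(BASE)):
        run = dict(zip(BASE, levels))
        for factor, parents in rules.items():
            run[factor] = prod(run[p] for p in parents)
        design.append(run)
    return design


def pattern(run: dict[str, int]) -> str:
    """Pattern-style label: '+' for high and '-' for low, in column order."""
    return "".join("+" if v > 0 else "-" for v in run.values())


default_rules = {"Temperature": ["Feed Rate", "Catalyst", "Stir Rate"],
                 "Concentration": ["Catalyst", "Stir Rate"]}
modified_rules = {"Temperature": ["Feed Rate", "Catalyst"],
                  "Concentration": ["Feed Rate", "Stir Rate"]}
for rules in (default_rules, modified_rules):
    print([pattern(run) for run in coded_design(rules)])
```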


Figure 12.16 Modified Coded Designs and Generating Rules

Specifying Output Options

Use the Output Options panel to specify how you want the output data table to appear. When the options are correctly set up, click Make Table.

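Figure 12.17 Select the Output Options

• Run Order lets you designate the order in which you want the runs to appear in the data table when it is created. Choices are: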
Keep the Same—the rows (runs) in the output table appear as they do in the Design panel.

Sort Left to Right—the rows (runs) in the output table appear sorted from left to right.

Randomize—the rows (runs) in the output table appear in a random order.

Sort Right to Left—the rows (runs) in the output table appear sorted from right to left.

Randomize within Blocks—the rows (runs) in the output table will appear in random order within the blocks you set up.

• Number of Center Points—Specifies additional runs placed at the center points.




Viewing the Table

After clicking Make Table, you have a data table that outlines your experiment. In the table, the high and low values you specified are displayed for each run.


Continuing the Analysis

After creating and viewing the data table, you can now run analyses on the data. The data table contains a script labeled Model. Right-click it and select Run Script to run a fit model analysis (Figure 12.19).

Figure 12.19 Running the Model Script


The column called Pattern shows the pattern of low values denoted “–” and high values denoted “+”. Pattern is especially useful as a label variable in plots.



The next sections describe some of the parts of the analysis report that appears when you click Run Model.

Viewing an Actual-by-Predicted Plot

When the model contains no interactions, an actual-by-predicted plot, shown on the left in Figure 12.20, appears at the top of the Fit Model report.

Figure 12.20 An Actual-by-Predicted Plot

To show labels in the graph (on the right in Figure 12.20), select all points, right-click the graph, and select Row Label. The pattern variable displayed in the data table serves as the label for each point.

In Figure 12.20, the mean line falls inside the bounds of the 95% confidence curves, which tells you that the model is not significant. The model p-value, R², and RMSE appear below the plot.

The RMSE is an estimate of the standard deviation of the process noise assuming that the unestimated effects are negligible. In this case, the RMSE is 14.199, which is much larger than expected. This suggests that effects other than the main effects of each factor are important. Because of the confounding between two-factor interactions and main effects in this design, it is impossible to determine which two-factor interactions are important without performing more experimental runs.
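The RMSE is the square root of the error sum of squares divided by the error degrees of freedom. The sketch below assumes the eight-run, five-factor design from this chapter and a model with an intercept plus five main effects. Under those assumptions, only 8 - 6 = 2 degrees of freedom remain for error, so any active interactions that the model omits feed directly into this estimate. The residual values in the usage line are hypothetical.

```python
def rmse(actual: list[float], predicted: list[float], n_params: int) -> float:
    """Root mean square error: sqrt(SSE / (n - p)).

    `n_params` counts every estimated coefficient, including the intercept,
    so n - p is the error degrees of freedom.
    """
    n = len(actual)
    if n != len(predicted) or n <= n_params:
        raise ValueError("need matching lengths and more runs than parameters")
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return (sse / (n - n_params)) ** 0.5


# Eight runs, intercept + five main effects: 8 - 6 error degrees of freedom.
print(8 - (1 + 5))
# Hypothetical check with four runs and two parameters.
print(rmse([10.0, 12.0, 14.0, 16.0], [11.0, 11.0, 15.0, 15.0], n_params=2))
```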

Viewing a Scaled Estimates Report

When you fit the model, JMP displays a Scaled Estimates report (Figure 12.21) as a part of the Fit Model report. The Scaled Estimates report displays a bar chart of the individual effects embedded in a table of parameter estimates. The last column of the table has the p-values for each effect. None of the factor effects are significant, but the Catalyst effect is large enough to be interesting if it is real. At this stage the results are not clear, but this does not mean that the experiment has failed. It means that some follow-up runs are necessary.

Figure 12.21 Example of a Scaled Estimates Report

If this scaled estimates report were not merely an example, you would then want to augment the design. For comparison, you might also want to have complete 32-run factorial experimental data and analysis.

13 Response Surface Designs

Response surface designs are useful for modeling a curved (quadratic) surface over continuous factors. If a minimum or maximum response exists inside the factor region, a response surface model can pinpoint it. Three distinct values for each factor are necessary to fit a quadratic function, so the standard two-level designs cannot fit curved surfaces.

The most popular response surface design is the central composite design, illustrated in the figure to the left below. It combines a two-level fractional factorial and two other kinds of points:

• Center points, for which all the factor values are at the zero (or midrange) value.

• Axial (or star) points, for which all but one factor are set at zero (midrange) and that one factor is set at outer (axial) values.
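The three kinds of points can be assembled in a short Python sketch. It is a hypothetical helper rather than JMP's designer. It uses a full 2^k factorial core, although larger designs often use a fractional core. It also uses one common choice of axial distance, alpha = (number of core runs)^(1/4), which makes the design rotatable. When alpha differs from 1, each factor takes five distinct levels.

```python
from itertools import product


def central_composite(k: int, alpha: float, n_center: int) -> list[tuple[float, ...]]:
    """Central composite design in coded units (hypothetical helper).

    Uses a full 2^k factorial core, adds 2k axial points at +/-alpha on
    each axis, and appends n_center center points.
    """
    core = [tuple(float(x) for x in pt) for pt in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            pt = [0.0] * k
            pt[i] = sign * alpha
            axial.append(tuple(pt))
    center = [tuple([0.0] * k) for _ in range(n_center)]
    return core + axial + center


k = 3
alpha = (2 ** k) ** 0.25  # one common (rotatable) choice: (core runs)^(1/4)
design = central_composite(k, alpha, n_center=6)
print(len(design))                       # 8 core + 6 axial + 6 center runs
print(sorted({pt[0] for pt in design}))  # distinct levels used by factor 1
```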

The Box-Behnken design, illustrated in the figure on the right below, is an alternative to central composite designs.

One distinguishing feature of the Box-Behnken design is that there are only three levels per factor.

Another important difference between the two design types is that the Box-Behnken design has no points at the vertices of the cube defined by the ranges of the factors. This is sometimes useful when it is desirable to avoid these points due to engineering considerations. The price of this characteristic is the higher uncertainty of prediction near the vertices compared to the central composite design.

Figure: Central Composite Design (left), showing fractional factorial points, axial points, and center points; Box-Behnken Design (right)


Introduction

The Bounce Data.jmp sample data file has response surface data inspired by the tire tread data described in Derringer and Suich (1980). To see this example data table, open the Sample Data folder that was installed when you installed JMP, and select Design Experiment > Bounce Data.jmp.

The objective of this experiment is to match a standardized target value (450) of tennis ball bounciness. The bounciness varies with amounts of Silica, Silane, and Sulfur used to manufacture the tennis balls. The experimenter wants to collect data over a wide range of values for these variables to see if a response surface can find a combination of factors that matches a specified bounce target. To follow this example:

1 Select DOE > Response Surface Design.

2 Load factors by clicking the red triangle icon on the Response Surface Design title bar and selecting Load Factors. Navigate to the Sample Data folder that was installed when you installed JMP, and select Design Experiment > Bounce Factors.jmp.

3 Load the responses by clicking the red triangle icon on the Response Surface Design title bar and selecting Load Responses. Navigate to the Sample Data folder that was installed when you installed JMP, and select Design Experiment > Bounce Response.jmp. Figure 13.1 shows the completed Response panel and Factors panel.

Figure 13.1 Response and Factors For Bounce Data

After the response data and factors data are loaded, the Response Surface Design Choice dialog lists the designs in Figure 13.2.

Figure 13.2 Response Surface Design Selection

The Box-Behnken design selected for three factors generates the design table of 15 runs shown in Figure 13.3.
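For three factors, 15 runs are consistent with 12 edge-midpoint runs plus 3 center points. The sketch below builds that structure in coded units. For each pair of factors, it runs a 2 x 2 factorial with the remaining factor held at its midpoint. The helper is hypothetical, and published Box-Behnken designs for some larger numbers of factors use other block structures. The check at the end confirms the property noted earlier: no run sits at a vertex of the cube.

```python
from itertools import combinations, product


def box_behnken(k: int, n_center: int) -> list[tuple[int, ...]]:
    """Box-Behnken-style design in coded units (-1, 0, +1), as a sketch.

    For every pair of factors, runs a 2x2 factorial with all other factors
    held at 0, then appends n_center center points.
    """
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            pt = [0] * k
            pt[i], pt[j] = a, b
            runs.append(tuple(pt))
    runs.extend(tuple([0] * k) for _ in range(n_center))
    return runs


design = box_behnken(3, n_center=3)
print(len(design))  # 12 edge midpoints + 3 center points
# Is any run at a cube vertex (all factors at +/-1 at once)?
print(any(all(abs(x) == 1 for x in pt) for pt in design))
```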

In real life, you would conduct the experiment and then enter the responses into the data table. Let’s pretend this happened and use a finalized data table called Bounce Data.jmp.


1 Open the Sample Data folder that was installed when you installed JMP, and select Design Experiment > Bounce Data.jmp (Figure 13.3).

Figure 13.3 JMP Table for a Three-Factor Box-Behnken Design

After obtaining the Bounce Data.jmp data table, run a fit model analysis on the data. The data table contains a script labeled Model.

2 Right-click Model and select Run Script to start a fit model analysis.

3 Click Run Model.

The standard Fit Model analysis results appear in tables shown in Figure 13.4, with parameter estimates for all response surface and crossed effects in the model.

The prediction model is highly significant, with no evidence of lack of fit. All main effect terms are significant, as are the two interaction effects involving Sulfur.

Figure 13.4 JMP Statistical Reports for a Response Surface Analysis of Bounce Data

The Response Surface report also has the tables shown in Figure 13.5.


Figure 13.5 Statistical Reports for a Response Surface Analysis

Creating a Response Surface Design

Response Surface Methodology (RSM) is an experimental technique invented to find the optimal response within specified ranges of the factors. These designs are capable of fitting a second-order prediction equation for the response. The quadratic terms in these equations model the curvature in the true response function. If a maximum or minimum exists inside the factor region, RSM can find it. In industrial applications, RSM designs typically involve a small number of factors, because the required number of runs increases dramatically with the number of factors. The response surface designer offers well-known RSM designs for two to eight continuous factors. Some of these designs also allow blocking.
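For reference, a standard way to write the second-order prediction equation for k continuous factors in coded units $x_1, \dots, x_k$ is shown below (textbook notation, not the labels JMP uses in its reports):

$$
\hat{y} = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j
$$

Counting the terms gives $p = 1 + 2k + \binom{k}{2} = \frac{(k+1)(k+2)}{2}$ coefficients: 10 for the three-factor bounce example and 45 for eight factors. A design needs at least p distinct runs to estimate them, which is one concrete reason the run count grows quickly with the number of factors.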

Response surface designs are useful for modeling and analyzing curved surfaces.

To start a response surface design, select DOE > Response Surface Design, or click the Response Surface Design button on the JMP Starter DOE page. Then, follow the steps below:

• “Entering Responses and Factors,” p. 219

• “Choosing a Design,” p. 219

• “Specifying Axial Value (Central Composite Designs Only),” p. 220

• “Specifying Output Options,” p. 221

• “Viewing the Design Table,” p. 221

• “Continuing the Analysis, If Needed,” p. 222

The tables in Figure 13.5 contain the following information:

• A summary of the parameter estimates.

• The critical values of the surface factors and the kind of solution (maximum, minimum, or saddle point). The solution for this example is a saddle point. The table also warns that the critical values given by the solution are outside the range of the data values.

• The eigenvalues and eigenvectors of the effects. The eigenvector values show that the dominant negative curvature (yielding a maximum) is mostly in the Sulfur direction. The dominant positive curvature (yielding a minimum) is mostly in the Silica direction.
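The critical values and eigenvalues described above come from a canonical analysis of the fitted quadratic. The following is a minimal NumPy sketch, assuming the fitted equation is written in coded units as ŷ = b0 + x'b + x'Bx, where b holds the linear coefficients and the symmetric matrix B holds the pure quadratic coefficients on its diagonal and half of each interaction coefficient off the diagonal. The function name and the coefficients in the usage line are hypothetical (they are not the Bounce Data estimates), and this is not JMP's own implementation.

```python
import numpy as np


def canonical_analysis(b, B):
    """Stationary point and curvature of yhat = b0 + x'b + x'Bx (coded units)."""
    b = np.asarray(b, dtype=float)
    B = np.asarray(B, dtype=float)
    # Setting the gradient b + 2Bx to zero gives the critical (stationary) point.
    x_s = -0.5 * np.linalg.solve(B, b)
    # Eigenvalues of B measure the curvature along each eigenvector direction.
    eigvals, eigvecs = np.linalg.eigh(B)
    if np.all(eigvals < 0):
        kind = "maximum"
    elif np.all(eigvals > 0):
        kind = "minimum"
    else:
        kind = "saddle point"
    # Flag a stationary point that lies outside the coded design region [-1, 1].
    outside = bool(np.any(np.abs(x_s) > 1))
    return x_s, eigvals, eigvecs, kind, outside


# Hypothetical two-factor coefficients: curvature is negative along x1, positive along x2.
x_s, eigvals, eigvecs, kind, outside = canonical_analysis(
    b=[1.0, 2.0], B=[[-1.0, 0.0], [0.0, 2.0]]
)
print(x_s, eigvals, kind, outside)
```

Negative eigenvalues mark directions of downward curvature (toward a maximum) and positive ones mark upward curvature (toward a minimum); a mix of signs, as in this usage line, gives a saddle point.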


Entering Responses and Factors

The steps for entering factors are specific to the response surface designer. To add factors, follow the steps shown in Figure 13.6.

Figure 13.6 Entering Factors into a Response Surface Design

Click Continue to proceed to the next step.

Choosing a Design

Highlight the type of response surface design you would like to use and click Continue.

Figure 13.7 Choose a Design Type

The Response Surface designer provides the following types of designs:

Box-Behnken Designs

The Box-Behnken design has only three levels per factor and has no points at the vertices of the cube defined by the ranges of the factors. This is sometimes useful when it is desirable to avoid extreme points due to engineering considerations. The price of this characteristic is higher uncertainty of prediction near the vertices compared to the central composite design.
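To see where the 15 runs of the three-factor design come from, the sketch below builds the layout in coded units (–1, 0, +1): each pair of factors gets a 2×2 factorial at ±1 while the remaining factor stays at its midpoint, and three center points are appended. The function name and the choice of three center points are assumptions made to match the 15-run table; designs for more factors may use other block structures, and JMP also randomizes the run order.

```python
from itertools import combinations, product


def box_behnken(k=3, n_center=3):
    """Coded runs of a pairwise Box-Behnken layout (illustrative, unrandomized)."""
    runs = []
    # For each pair of factors, take the 2x2 factorial at +/-1 with the other factors at 0.
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * k
            run[i], run[j] = a, b
            runs.append(tuple(run))
    # Append the center points.
    runs.extend([(0,) * k] * n_center)
    return runs


design = box_behnken()
print(len(design))  # 15: 12 edge midpoints + 3 center points
for run in design:
    print(run)
```

Every run contains at least one 0, which is the "no points at the vertices" property described above.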

Central Composite Designs

The response surface design list contains two types of central composite designs: uniform precision and orthogonal. These properties of central composite designs relate to the number of center points in the design and to the axial values:

• Uniform precision means that the number of center points is chosen so that the prediction variance at the center is approximately the same as at the design vertices.

• For orthogonal designs, the number of center points is chosen so that the second order parameter estimates are minimally correlated with the other parameter estimates.

Specifying Axial Value (Central Composite Designs Only)

When you select a central composite (CCD-Uniform Precision) design and then click Continue, you see the panel in Figure 13.8, which supplies default axial scaling information. The axial value controls how far out the axial points are: entering 1.0 in the text box places the axial points on the faces of the cube defined by the factors. You can enter any value you want to use.

Figure 13.8 Display and Modify the Central Composite Design

• Rotatable makes the variance of prediction depend only on the scaled distance from the center of the design. This causes the axial points to be more extreme than the range of the factor. If this factor range cannot be practically achieved, it is recommended that you choose On Face or specify your own value.

• Orthogonal makes the effects orthogonal in the analysis. This causes the axial points to be more extreme than the –1 or 1 representing the range of the factor. If this factor range cannot be practically achieved, it is recommended that you choose On Face or specify your own value.

• On Face leaves the axial points at the end of the -1 and 1 ranges.

• User Specified uses the value entered by the user, which can be any value greater than zero. Enter that value into the Axial Value text box.

To inscribe the design, check the box beside Inscribe. When it is checked, JMP rescales the whole design so that the axial points are at the low and high ends of the range (the axial values are –1 and 1, and the factorial points are shrunk by the same scaling).
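The axial choices above can be compared numerically in coded units. The sketch below uses the standard textbook formulas for a central composite design whose cube portion is a full 2^k factorial; JMP's orthogonal value also depends on the number of center points it uses, so treat this as an illustration rather than a reproduction of the Figure 13.8 defaults. The function names are hypothetical.

```python
import math


def rotatable_alpha(k):
    """Axial distance that makes a CCD with a full 2^k factorial rotatable."""
    return (2 ** k) ** 0.25


def orthogonal_alpha(k, n_center):
    """Textbook axial distance for an orthogonal CCD with a full 2^k factorial."""
    f = 2 ** k                 # factorial (cube) points
    t = 2 * k + n_center       # axial points plus center points
    q = (math.sqrt(f + t) - math.sqrt(f)) ** 2
    return (q * f / 4) ** 0.25


def inscribe(factorial_level, alpha):
    """Inscribed design: axial points move to +/-1, so cube points shrink to +/-1/alpha."""
    return factorial_level / alpha


a = rotatable_alpha(3)
print(round(a, 4), round(orthogonal_alpha(3, n_center=1), 4), round(inscribe(1.0, a), 4))
```

For three factors this prints 1.6818 1.2154 0.5946: both the rotatable and orthogonal axial points fall outside the –1 to 1 range, while inscribing pulls the factorial points in to about ±0.59.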



Specifying Output Options

Use the Output Options panel to specify how you want the output data table to appear. When the options are correctly set up, click Make Table.

Figure 13.9 Select the Output Options


• Keep the Same—the rows (runs) in the output table will appear as they do in the Design panel.

• Sort Left to Right—the rows (runs) in the output table will appear sorted from left to right.

• Randomize—the rows (runs) in the output table will appear in a random order.

• Sort Right to Left—the rows (runs) in the output table will appear sorted from right to left.

• Randomize within Blocks—the rows (runs) in the output table will appear in random order within the blocks you set up.

• Number of Center Points—Specifies additional runs placed at the center points.
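The two randomizing options differ only in the scope of the shuffle. The sketch below illustrates the idea with Python's random module; the run tuples, the block_of key function, and the seed are hypothetical, and JMP's own random-number stream will produce a different order.

```python
import random
from itertools import groupby


def randomize(runs, seed=None):
    """Randomize: shuffle all runs together."""
    order = list(runs)
    random.Random(seed).shuffle(order)
    return order


def randomize_within_blocks(runs, block_of, seed=None):
    """Randomize within Blocks: shuffle runs only inside their own block."""
    rng = random.Random(seed)
    result = []
    # groupby needs the runs sorted by block so each block forms one contiguous group.
    for _, group in groupby(sorted(runs, key=block_of), key=block_of):
        block = list(group)
        rng.shuffle(block)
        result.extend(block)
    return result


# Hypothetical runs: (block, x1, x2) in coded units.
runs = [(1, -1, -1), (1, 1, -1), (1, 0, 0), (2, -1, 1), (2, 1, 1), (2, 0, 0)]
print(randomize_within_blocks(runs, block_of=lambda r: r[0], seed=2009))
```

Shuffling within blocks keeps every block's runs together, so differences between blocks (for example, days or batches) are not mixed into the run order.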


Viewing the Design Table

Now you have a data table that outlines your experiment, as described in Figure 13.10.



Continuing the Analysis, If Needed

After creating and viewing the design table, running the experiment, and recording your results in the design table’s Y column, run a fit model analysis on the data. The data table contains a script labeled Model. Right-click it and select Run Script (Figure 13.11) to fit the model.

Figure 13.11 Running the Script

Click Run Model in the Fit Model dialog (Figure 13.12) and review the analysis.


The design table in Figure 13.10 has these features:

• There are two center points per replicate.

• The runs are in random order.

• The column called Pattern identifies the coding of the factors. It shows each coding with “+” for a high level, “–” for a low level, “a” and “A” for low and high axial values, and “0” for midrange. Pattern is suitable as a label variable in plots: when you hover over a point in a plot of the factors, the pattern value shows the factor coding of that point.

• The Y column is for recording experimental results.
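The Pattern codes follow directly from where each run sits in the design. The sketch below generates unrandomized central composite runs together with Pattern-style codes; it uses an ASCII hyphen for the low level, and the function name, the axial value of 1.682, and the two center points are assumptions chosen for illustration.

```python
from itertools import product


def ccd_with_patterns(k, alpha, n_center):
    """Coded CCD runs paired with Pattern-style codes (illustrative sketch)."""
    rows = []
    # Factorial (cube) points: "+" for high, "-" for low.
    for signs in product((-1, 1), repeat=k):
        code = "".join("+" if s > 0 else "-" for s in signs)
        rows.append((code, [float(s) for s in signs]))
    # Axial points: "a" for the low axial value, "A" for the high one.
    for i in range(k):
        for sign, symbol in ((-1, "a"), (1, "A")):
            levels = [0.0] * k
            levels[i] = sign * alpha
            rows.append(("0" * i + symbol + "0" * (k - i - 1), levels))
    # Center points: "0" (midrange) for every factor.
    rows.extend(("0" * k, [0.0] * k) for _ in range(n_center))
    return rows


for code, levels in ccd_with_patterns(k=3, alpha=1.682, n_center=2):
    print(code, levels)
```

Building the code at the same time as the run avoids ambiguity when the axial value is 1.0 (On Face), where an axial level and a factorial level have the same numeric value.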




Figure 13.12 Fitting the Model

14 Prospective Power and Sample Size

Use the DOE > Sample Size and Power command to answer the question “How many runs do I need?” The important quantities are sample size, power, and the magnitude of the effect. These depend on the significance level—alpha—of the hypothesis test for the effect and the standard deviation of the noise in the response. You can supply either one or two of the three values. If you supply only one of these values, the result is a plot of the other two. If you supply two values, JMP computes the third. This capability is available for the single sample, two-sample, and k-sample situations.

Using the Sample Size and Power command when doing a prospective analysis helps answer the question, “Will I detect the group differences I am looking for, given my proposed sample size, estimate of within-group variance, and alpha level?” In this type of analysis, you must give JMP an estimate of the group means and sample sizes in a data table as well as an estimate of the within-group standard deviation (σ).

The Sample Size and Power command determines how large a sample is needed for an experiment or sample to be reasonably likely to yield a significant result, given that the true effect size is at least a certain size. It requires that you enter any two of three quantities (difference to detect, sample size, and power) and computes the third for the following cases:

• difference between one sample's mean and a hypothesized value

• difference between two sample means

• differences in the means among k samples

• difference between a variance and a hypothesized value

• difference between one sample proportion and a hypothesized value

• difference between two sample proportions

• difference between counts per unit in a Poisson-distributed sample and a hypothesized value.

The calculations assume that there are equal numbers of units in each group. You can apply this platform to more general experimental designs if they are balanced and a number-of-parameters adjustment is specified.

Prospective Power Analysis

The following five values have an important relationship in a statistical test on means:

• Alpha Alpha is the significance level, which limits the probability of declaring a zero effect significant to at most alpha.


• Error Standard Deviation Error Standard Deviation is the unexplained random variation around the means.

• Sample Size Sample size is how many experimental units (runs, or samples) are involved in the experiment.

• Power Power is the probability of declaring a significant result.

• Effect Size Effect size is how different the means are from each other or from the hypothesized value.

The Sample Size and Power command in JMP helps estimate in advance the sample size needed, the power expected, or the effect size expected when you compare a single mean to a hypothesized value, compare two sample means, or compare k sample means.

When you select DOE > Sample Size and Power, the panel shown in Figure 14.1 appears with button selections for experimental situations. The following sections describe each of these selections and explain how to enter estimated parameter values and the desired computation.

Figure 14.1 Sample Size and Power Choices

One-Sample and Two-Sample Means

After you click either One Sample Mean or Two Sample Means in the initial Sample Size selection list (Figure 14.1), the Power and Sample Size dialog in Figure 14.2 appears and asks for the anticipated experimental values. The values you enter depend on your initial choice. As an example, consider the two-sample situation.


Figure 14.2 Initial Power and Sample Size Dialogs for Single Mean (left) and Two Means (right)

The Two Sample Means choice in the initial Power and Sample Size dialog always requires values for Alpha and the error standard deviation (Error Std Dev), as shown here, and one or two of the other three values: Difference to detect, Sample Size, and Power. The power and sample size platform then calculates the missing item. If there are two unspecified fields, the power and sample size platform constructs a plot that shows the relationship between those two values:

• power as a function of sample size, given a specific effect size

• power as a function of effect size, given a sample size

• effect size as a function of sample size, for a given power.

The Power and Sample Size dialog asks for the following values, depending on the first choice of design:

• Alpha Alpha is the significance level, usually 0.05. Choosing 0.05 means accepting that, if the true difference between groups is zero, a significant difference will be incorrectly declared 5% (alpha) of the time.

• Error Std Deviation Error Std (Standard) Deviation is the true residual error. Even though the true error is not known, the power calculations are an exercise in probability: they calculate what might happen if the true values were as specified.

• Extra Params Extra Params (Parameters) is only for multi-factor designs. Leave this field zero in simple cases. In a multi-factor balanced design, in addition to fitting the means described in the situation, there are other factors with extra parameters that can be specified here. For example, in a three-factor two-level design with all three two-factor interactions, the number of extra parameters is five: two parameters for the extra main effects and three parameters for the interactions. In practice, the value you enter here matters little unless the experiment is in a range where there are very few degrees of freedom for error.

• Difference to Detect Difference to detect is the smallest detectable difference (how small a difference you want to be able to declare statistically significant). For single-sample problems this is the difference between the hypothesized value and the true value.

• Sample Size Sample size is the total number of observations (runs, experimental units, or samples). Sample size is not the number per group, but the total over all groups. Computed sample size numbers can have fractional values, which you need to adjust to real units. This is usually done by increasing the estimated sample size to the smallest number evenly divisible by the number of groups.

• Power Power is the probability of getting a statistic that will be declared statistically significant. Higher power is better, but the cost is a larger sample size. Power is equal to alpha when the specified effect size is zero. Aim for power of at least 0.90 or 0.95 if you can afford it. If an experiment requires considerable effort, plan so that the experimental design has the power to detect a sizable effect when there is one.

• Continue Evaluates at the entered values.

• Back Goes back to the previous dialog.

• Animation Script The Animation Script button runs a JSL script that displays an interactive plot showing power or sample size. See the section “Power and Sample Size Animation for a Single Sample,” p. 229, for an illustration of this animation script.
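The rounding rule in the Sample Size item amounts to a one-line calculation: take the smallest multiple of the number of groups that is at least the computed value. A minimal sketch; the function name and the fractional inputs in the usage lines are illustrative.

```python
import math


def round_up_to_groups(n, groups):
    """Smallest multiple of `groups` that is at least the computed sample size n."""
    return math.ceil(n / groups) * groups


print(round_up_to_groups(34.05, 2))   # two groups: prints 36
print(round_up_to_groups(8.128, 1))   # a single sample: prints 9
```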

Single-Sample Mean

Suppose there is a single sample and the goal is to detect a difference of 2 where the error standard deviation is 0.9, as shown in the left-hand dialog in Figure 14.3. To calculate the power when the sample size is 10, leave Power missing in the dialog and click Continue. The dialog on the right in Figure 14.3 shows that the power is calculated to be 0.99998, which rounds to 1.
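As a rough cross-check of that result, the sketch below uses a large-sample normal (z) approximation in place of the t distribution. This is an assumption-laden stand-in, not JMP's calculation: because it ignores the uncertainty in estimating the standard deviation, it slightly overstates power for small samples, although here it also rounds to 1. The function name is hypothetical.

```python
from statistics import NormalDist


def z_power_one_sample(diff, sd, n, alpha=0.05):
    """Normal-approximation power of a two-sided one-sample test of a mean."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(diff) / sd * n ** 0.5  # standardized distance from the hypothesized mean
    # Probability that the test statistic lands beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


print(z_power_one_sample(diff=2, sd=0.9, n=10))  # the example above
print(z_power_one_sample(diff=0, sd=0.9, n=10))  # zero effect: power equals alpha
```

The second line illustrates the rule stated earlier that power equals alpha when the specified effect size is zero.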

Figure 14.3 A One-Sample Example

To see a plot of the relationship of power and sample size, leave both Sample Size and Power empty and click Continue.

Double-click the horizontal axis to set any desired scale. The left-hand graph in Figure 14.4 shows a range of sample sizes for which the power varies from about 0.2 to 0.95. Change the range of the curve by changing the range of the horizontal axis. For example, the plot on the right in Figure 14.4 has the horizontal axis scaled from 1 to 8, which gives a more typical-looking power curve.


Figure 14.4 A One-Sample Example Plot

When only Sample Size is specified (Figure 14.5) and Difference to Detect and Power are empty, a plot of power by difference appears.

Figure 14.5 Plot of Power by Difference to Detect for a Given Sample Size

Power and Sample Size Animation for a Single Sample

The Animation Script button on the Power and Sample Size dialog for the single mean displays an interactive plot that illustrates the effect that changing the sample size has on power. In the example shown in Figure 14.6, Sample Size is 10, Alpha is 0.05, and the Difference to Detect is set to 0.4. The animation begins by showing a normal curve positioned with mean at zero (representing the estimated mean and the true mean), and another with mean at 0.4 (the difference to be detected). The probability of committing a Type II error (not detecting a difference when there is a difference), often represented as β in the literature, is shaded in blue on this plot.

You can drag the handles over the curves to see how their positions affect power. You can also click the values for sample size and alpha beneath the plot to change them.

Figure 14.6 Example of Animation Script to Illustrate Power

Two-Sample Means

The dialogs work similarly for two samples; the Difference to Detect is the difference between two means. Suppose the error standard deviation is 0.9 (as before), the desired detectable difference is 1, and the sample size is 16.

Leave Power blank and click Continue to see the power calculation, 0.5433, as shown in the dialog on the left in Figure 14.7. This is considerably lower than in the single sample because each mean has only half the sample size. The comparison is between two random samples instead of one.

Increasing the power requires a larger sample. To find out how large, leave both Sample Size and Power blank and examine the resulting plot, shown on the right in Figure 14.7. The crosshair tool estimates that a sample size of about 35 is needed to obtain a power of 0.9.
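The 0.5433 figure comes from the t distribution, so one way to check it without special libraries is simulation. The sketch below estimates the power of a two-sided, pooled-variance two-sample t test with the total sample size split evenly between the groups, taking the critical value from simulated null data. The function names, seed, and number of replications are assumptions, and the result is a Monte Carlo estimate: it should land near 0.54, not exactly on the value JMP reports.

```python
import math
import random


def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance estimate."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))


def simulated_power(diff, sd, total_n, alpha=0.05, reps=20000, seed=1):
    """Monte Carlo power of a two-sided two-sample t test with equal group sizes."""
    rng = random.Random(seed)
    n = total_n // 2  # Sample Size in the dialog is the total over both groups

    def abs_t(shift):
        a = [rng.gauss(shift, sd) for _ in range(n)]
        b = [rng.gauss(0.0, sd) for _ in range(n)]
        return abs(pooled_t(a, b))

    # Critical value: the simulated (1 - alpha) quantile of |t| when there is no difference.
    null = sorted(abs_t(0.0) for _ in range(reps))
    crit = null[int((1 - alpha) * reps)]
    # Power: how often |t| exceeds that critical value when the true difference is diff.
    rejections = sum(abs_t(diff) > crit for _ in range(reps))
    return rejections / reps


print(round(simulated_power(diff=1.0, sd=0.9, total_n=16), 3))
```

Raising total_n toward 35 in the same call shows power climbing to roughly 0.9, in line with the crosshair reading.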


Figure 14.7 Plot of Power by Sample Size to Detect for a Given Difference

k-Sample Means

The k-Sample Means situation can compare up to 10 group means. The next example considers a situation where 4 group means are expected to range from about 10 to 13, and the Error Std Dev is 0.9. When a sample size of 16 is entered, the power calculation is 0.95, as shown in the dialog on the left in Figure 14.8.

If both Sample Size and Power are left blank, the power and sample size calculations produce the power curve shown on the right in Figure 14.8. This confirms that a sample size of 16 looks acceptable.

Notice that the difference in means is 2.236, calculated as the square root of the sum of squared deviations of the means from the grand mean. In this case it is $\sqrt{(-1.5)^2 + (-0.5)^2 + (0.5)^2 + (1.5)^2} = \sqrt{5} \approx 2.236$.
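The same arithmetic can be written as a short function. The means 10, 11, 12, and 13 are an assumption chosen to match the deviations of –1.5, –0.5, 0.5, and 1.5 above, and the function name is hypothetical.

```python
import math


def k_sample_difference(means):
    """Square root of the summed squared deviations of the group means from their grand mean."""
    grand = sum(means) / len(means)
    return math.sqrt(sum((m - grand) ** 2 for m in means))


print(round(k_sample_difference([10, 11, 12, 13]), 3))  # prints 2.236
```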


Figure 14.8 Prospective Power for k-Means and Plot of Power by Sample Size

One-Sample Variance

The One-Sample Variance choice on the Power and Sample Size dialog (Figure 14.1) determines sample size for detecting a change in variance. The usual purpose of this option is to compute a sample size large enough to guarantee that the risk of accepting a false hypothesis (β) is small. In the dialog, specify a baseline variance, alpha level, and the direction of change you want to detect. To indicate the direction of change, select either Larger or Smaller from the Guarding a change menu. The computations then apply to a one-sided test of whether the true variance is larger (or smaller) than its hypothesized value, entered as the Baseline Variance. An example is when the variance for resistivity measurements on a lot of silicon wafers is claimed to be 100 ohm-cm and a buyer is unwilling to accept a shipment if the variance for a particular lot is more than 55 ohm-cm above that value.

The examples throughout the rest of this chapter use engineering examples from the online manual of the National Institute of Standards and Technology (NIST). You can access the NIST manual examples at http://www.itl.nist.gov/div898/handbook.

As with previous dialogs, you enter two of the items and the Power and Sample Size calculations determine the third.

Suppose you want to detect an increase of 55 for a baseline variance of 100, with an alpha of 0.05 and power of 0.99. Enter these items as shown in Figure 14.9. When you click Continue, the computed result shows that you need a sample size of 170.


If you want to detect a change to a smaller variance, enter a negative amount in the Difference to Detect box.

Note: Remember to enter the variance in the Baseline Variance box, not the standard deviation.

Figure 14.9 Sample Size and Power Dialog To Compare Single-Direction One-Sample Variance

One-Sample and Two-Sample Proportions

The dialogs and computations to test power and sample sizes for proportions are similar to those for testing sample means. The dialogs are the same except that you enter a Baseline Proportion and also specify either a one-sided or two-sided test. For a one-sample situation, the Baseline Proportion is the average of a known baseline proportion and the single sample proportion. When there are two samples, the Baseline Proportion you enter is the average of the two sample proportions.

The sampling distribution for proportions is actually binomial, but the computations to determine sample size and test proportions use a normal approximation, as indicated on the dialogs (Figure 14.10).


Figure 14.10 Default Power and Sample Dialog for One-Sample and Two-Sample Proportions

Testing proportions is useful in production lines, where the proportion of defects is part of process control monitoring. For example, suppose a line manager wants to detect a change in defective units that is 6% above a known baseline of approximately 10% defective. The manager does not want to stop the process unless it has degenerated to more than 16% defective (6% above the 10% known baseline). The Baseline Proportion in this example is 0.08, which is the average of the baseline (10%) and the proportion above the baseline (6%). The example process is monitored with a one-sided test at 5% alpha and a 10% risk (90% power) of failing to detect a change of that magnitude.

Figure 14.11 shows the entries in the Sample Size and Power dialog to detect a given difference between an observed proportion and a baseline proportion, and the computed sample size of approximately 77. To see the plot on the right in Figure 14.11, leave both Difference to Detect and Sample Size blank. Use the grabber tool (hand) to move the x-axis and show a specific range of differences and sample sizes.

Figure 14.11 Dialog To Compare One Proportion to a Baseline and Sample Size Plot

Enter average of baseline and one-sample proportions, or average of two-sample proportions.

Enter 1 or 2 to indicate the type of test (one- or two-sided)


Counts per Unit

The Counts per Unit selection calculates sample size for the Poisson-distributed counts typical when you can measure more than one defect per unit. A unit can be an area, and the counts can be fractions or large numbers.

Although the number of defects observed in an area of a given size is often assumed to have a Poisson distribution, the area and count are assumed to be large enough to support a normal approximation.

Questions of interest are:

• Is the defect density within prescribed limits?

• Is the defect density greater than or less than a prescribed limit?

Enter alpha and the baseline count per unit. Then enter two of the remaining fields to see the calculation of the third. The test is for one-sided (one-tailed) change. Enter the Difference to Detect in terms of the baseline count per unit (defects per unit). The computed sample size is expressed in those units.

As an example, consider a wafer manufacturing process with a target of 4 defects per wafer, where you want to verify that a new process meets that target.

1 Select DOE > Sample Size and Power.

2 Click the Counts per Unit button.

3 Enter an alpha of 0.1 to be the chance of failing the test if the new process is as good as the target.

4 Enter a power of 0.9, which is the chance of detecting a change larger than 2 (6 defects per wafer). In this kind of situation, alpha is sometimes called the producer’s risk and beta is called the consumer’s risk.

5 Click Continue to see the computed sample size of 8.128 (Figure 14.12).

The process meets the target if there are fewer than 48 defects (6 defects per wafer in a sample of 8 wafers).
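A standard normal-approximation formula for a one-sided test of a Poisson rate gives about 8.13 for this example, in line with the 8.128 shown in Figure 14.12. The sketch below is an assumption about the form of the calculation, not JMP's documented internals; the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def counts_per_unit_n(baseline, diff, alpha, power):
    """Normal-approximation sample size for a one-sided test of a Poisson rate."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = z.inv_cdf(power)
    # A Poisson count per unit has standard deviation sqrt(rate) under each hypothesis.
    return ((z_alpha * math.sqrt(baseline) + z_beta * math.sqrt(baseline + diff)) / diff) ** 2


print(round(counts_per_unit_n(baseline=4, diff=2, alpha=0.1, power=0.9), 3))
```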

Figure 14.12 Dialog For Counts Per Unit Example


Sigma Quality Level

To use the Sigma Quality Level feature, select DOE > Sample Size and Power, click Sigma Quality Level, and enter any two of the following three quantities:

• number of defects

• number of opportunities

• Sigma quality level

When you click Continue, the sigma quality calculator computes the missing quantity using the formula Sigma Quality Level = NormalQuantile(1 – defects/opportunities) + 1.5.

As an example, use the Sample Size and Power feature to compute the Sigma quality level for 50 defects in 1,000,000 opportunities:

1 Select DOE > Sample Size and Power.

2 Click the Sigma Quality Level button.

3 Enter 50 for the number of defects and 1,000,000 as the number of opportunities, as shown in the window to the left in Figure 14.13.

4 Click Continue. The results, as shown in the window on the right in Figure 14.13, are a Sigma quality level of 5.3.

Figure 14.13 Sigma Quality Level Example 1

If you want to know how many defects are allowed at a “six-sigma” quality level for 1,000,000 opportunities, enter 6 as the Sigma Quality Level and leave the Number of Defects blank (window to the left in Figure 14.14). The computation (window to the right in Figure 14.14) shows that the Number of Defects cannot be more than approximately 3.4.
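The Sigma Quality Level formula is easy to evaluate directly, and solving it for the number of defects gives the second example. A minimal sketch using Python's statistics.NormalDist; the function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()


def sigma_quality_level(defects, opportunities):
    """Sigma Quality Level = NormalQuantile(1 - defects/opportunities) + 1.5."""
    return Z.inv_cdf(1 - defects / opportunities) + 1.5


def max_defects(sigma_level, opportunities):
    """Invert the formula: defects allowed at a given Sigma Quality Level."""
    return opportunities * (1 - Z.cdf(sigma_level - 1.5))


print(round(sigma_quality_level(50, 1_000_000), 2))  # first example: about 5.39
print(round(max_defects(6, 1_000_000), 2))           # "six-sigma" example: about 3.4
```

The first value, about 5.39, corresponds to the 5.3 reported in the first example.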


Figure 14.14 Sigma Quality Level Example 2

Index

JMP-SE

Symbols
“F Ratio” 104
“Prob>F” 104

Numerics
–2LogLikelihood 151
95% bivariate normal density ellipse 164

A
aberration designs 205
acceptable values See lower limits and upper limits
activating toolbars 14
Actual-by-Predicted plots 212
Add button 86
Add Column button 95
Add Graphics Script command 184
additional runs 196, 210, 221
AIC 151
AIC 104
Akaike’s Information Criterion 104, 151
aliasing effects 207
All Graphs command 65
All Pairs, Tukey Kramer command 62
Alpha 225, 227
Alpha Amalyze 21
analysis of variance
  report 47, 92
  table 53, 60, 89
Analyze menu 22, 45, 72, 79
Analyze Toolbar 13–14
animation scripts 228
Annotate tool 28
annotating 28
  resizing and repositioning 28
ANOVA 60
  Display Options command 62
  JMP INTRO terms 60
  one way 46
  popup menu 62
  report 47
  table 53, 89
ARIMA 143, 154–156
Arrange Plots 186
assigning importances (of responses) 195, 202
Autocorrelation 149
autocorrelation 148–149
Autocorrelation Lags 147
Autoregressive Order 155
axial
  points 215
  scaling, central composite designs 220

B
Background Color command 184
Backward 103
bar chart 172, 181
  producing 171
Bar Chart command 183
bar chart of correlations 167
Bartlett’s test 64
BIC 151
bivariate normal density ellipse 164
blue diamond disclosure icon 53
Border 184
Bounce Data.jmp 216
Bounce Factors.jmp 216–217
Bounded 157
Box Plots command 62, 65
Box-Behnken designs 215–216, 219
  See also Response Surface designs
Box-Jenkins model see ARIMA
Braces.jmp 131
Brown smoothing 157
Brown-Forsythe test 64
By role 38


C
C Total 89
Capability Analysis
  command 39
  with Control Charts 115
Categorical factors 203
categorical probabilities
  testing 36
categorical variables 29
  graphs and reports 34
Caustic Soda 21
CCD See central composite designs
c-Chart 132
CDF plot command 33
center points
  central composite designs 215
  response surface designs 215
central composite designs 215, 219–220
  See also response surface designs
Chakravarty 205
changing individual levels 172
Chart launch dialog 181–182
Chart platform 171, 181
  Frame Options 183
  Level Options 184
  Platform options 185
  Single-Chart Options 182
chart platform 172
  changing all levels 173
chart types 185
Chi Square statistic 35
clipboard 180
Close command 19
Coating.jmp 107, 125–126
Color or Mark by Column command 49
colors and markers 49
Colors command 184
Column Info 96
Columns command in reports 34
Compare Means command 50, 62
comparison circles 50, 63
  interpretation 63
Comparison Circles command 66
Confid Curves Fit command 59
Confid Curves Fit option 52
Confidence Interval command 24, 37
Confidence Intervals 155–156
confidence intervals
  in ANOVA 50
  in linear regression 52
  mean 23, 30, 36
  score 37
  selecting level 37
confidence limits
  in linear regression 59
confounding 207, 212
  resolution numbers 204
confounding pattern 208
Connect Color command 187
Connect Means command 66
Connect Points option 174, 183, 186
Connect Through Missing 186
Connecting Lines 149
constant estimate 152
Constrain fit 155
contingency table 54, 66
  analysis 45, 66
  reports 66
Contingency Table command 67
continuous factors 203
continuous variables 29, 46
  graphs and reports 30
  popup menu 30
Contrast dialog 84
contrasts 84
Control Charts
  c 132
  Individual Measurement 127
  Moving Range 127
  np 130
  p 130
  R 125
  S 125
  Shewhart 124–133
  u 131
  XBar 125
Copy command 18, 180
corrected total 89
correlation 161–170
correlation coefficient 53
correlation matrix 163
Correlation of Estimates command 95
Correlations Multivariate 163
Cotter designs 206
count 34
Count Axis command 31
counts per unit (power and sample size) 235
covariance 161–170
Covariance Matrix 166
Cp 41, 104
Cp 104


Cpk 41
Cpm 41
Cross button 82, 86
crossed effect 86
crosstabs table 66
cumulative distribution function 33
cumulative logistic probability plot 67
cumulative probabilities 34
Current Estimates table 102
Custom 156
Custom Test command 94
cut and paste 18, 28, 171, 180

D
damped-trend linear exponential smoothing 158
data table
  opening 15
Data Table Window 39, 66
defects 235
Denim.jmp 21, 29, 45, 71, 79, 171
  details 21
Density Axis command 31
Density Ellipse 164–165, 167
Density Ellipse command 53, 61
density functions 33
descriptive statistics 22
design
  resolutions 204
designs
  aberration 205
  Box-Behnken 215–216, 219
  central composite 215
  fractional factorials 204
  full factorial 189, 195, 199
  full factorials 204
  minimum aberration 205
  mixed-level 205
  orthogonal
    screening designs 204
    surface designs 220
  orthogonal arrays 205
  Plackett-Burman 205
  response surface 215
  screening 199
  uniform precision 220
desirability
  values 202
DF 151
DFE 104
dialog boxes
  dragging in 46
Difference to Detect option 227, 230
Differencing Order 155
Direction 102
disclosure icon 53
DispayBox command 184
Display Options command 62, 65
Distribution platform 16–17, 21
  graphs 29
  launch dialog 17, 22, 24
  launching 22
  report 25
DOE
  simple examples 199
double exponential smoothing 157
drag 147, 166
dummy variables 93
Dunnett’s comparisons 62
Durbin-Watson Test 96

E
Each Pair, Student’s t command 62
Edit Formula 96
effect
  aliasing 207
  eigenvalue 218
  eigenvector 218
  size 226
  sparsity 199, 205–206
effect details 93
Effect Leverage Pairs 97
Effect Leverage personality 88
Effect Screening personality 88
Effect Test table 91
effects
  nonestimable 204
  orthogonal 220
eigenvalue of effect 218
eigenvector of effect 218
Ellipse Alpha 166
Ellipse Color 166
Enter All 103
Entered 103
equal variances in t test 47
Error Bars command 62
error SS 90
error standard deviation 226–227
Estimate 103, 152, 155, 157
evolution 156
Expanded Estimates command 93


exponential smoothing see Smoothing Models
extra parameters 227

F
“F Ratio” 104
F Ratio 47
F test 90
Factor 152
Factor Profiling option 193
factorial designs
  fractionals 204
  full 189, 195, 199, 204
  three level 205
Factorial Sorted macro 87
Factorial to Degree macro 86
factors
  categorical 203
  continuous 203
  key factors 199
false negatives 206
Fat Plus (selection) tool 18, 180
File tab 13
File/Edit toolbar 13–14
Fit Distribution 26, 44
Fit Line command 52, 54, 58
Fit Mean command 51, 58
Fit Model
  dialog 79–80, 85
  platform 79
Fit Model platform 100
  examining results 81
  launching 79
  Save 96–97
Fit Polynomial command 59
Fit Special command 61
Fit Y By X platform 45
  launching 45
Fitness.jmp 99
fitting lines 51
fitting personality 85, 87
Fixed 157
Forecast Periods 147, 153
Forecast plot 153
Formula command 56
Formula Editor 56
Forward 102
fractional factorial designs 204
Freq button 86
frequencies table 34
frequency 34
full factorial designs 189, 195, 199, 204
  examples 189
Full Factorial macro 83, 86

Ggeneral linear model 79Go 101, 103goal types 195, 202goals

matching targets 202minimizing and maximizing 202

Goodness of Fit 33Grand Mean command 65Graph 149Graph menu 171, 173, 175Graph toolbar 13–14Graphs tab 13Group By command 54, 61group variances

homogeneity 64grouping variable 54, 61, 181

Hhand tool

in Distribution platform 22with Distribution platform 31

help system 11Help tool 12histogram 16, 18, 22, 24

bar position 23bar widths 22red bracket (box plot) 32using 22

Histogram command 30Hoeffding’s D 168, 170Holt smoothing 158homogeneity of variances 64honestly significant difference 62Horizontal command 185Horizontal Layout command 17, 30hypothesized means

specifying 25

Iidentifying key factors 199importance of responses 195, 202independent variables 55Individual Confidence Interval 96


Individual Measurement Chart 127
inertia of Scroller tool 82
Inscribe option 220
interaction effect, adding 82
interactions 206
  high-order 204
Intercept 155
intercept 152
interquartile range 31
Introduction sections, about 12
Inverse Corr table 163
inverse correlation 163, 170
Invertible 151

J
JMP Starter 12–13
JSL (JMP Scripting Language)
  animation scripts 228

K
Keep the Same command 196, 210, 221
Kendall’s Tau 168
Kendall’s tau-b 169
k-Sample Means (power and sample size) 231

L
L18 Chakravarty 205
L18 Hunter 205
L18 John 205
L36 205
Label column 211, 222
lack of fit 59, 61, 90
  error 90
  table 60
Lag 152
Lasso tool 48
leaf values 32
least squares means 82, 93
least squares regression 51, 58, 79
legends
  with colors and markers 49
Level Midpoints command 38
Level Numbers command 38
Level Options command 185
level smoothing weight 156
Levene’s test 64
leverage plot 91–92
  whole model 92
likelihood ratio tests 36
Line Chart command 183
Line of Fit command 65
Line Style 187
linear contrasts 84
linear exponential smoothing 158
linear regression 51, 58
  confidence limits 59
Lock 103
Log function 57
Log10 function 57
Logistic platform 55
Logistic Plot command 68
logistic regression 67
Lognormal 33
Long-term sigma 39
Lot Number column 21
Lower Spec Limit 39
LSMeans 82, 93
LSMeans Contrast command 84, 93
LSMeans Plot 82
LSMeans Plot command 93
LSMeans Student’s t command 93
LSMeans Table command 93

M
macros 83
Macros drop-down list 86
Make Model 103–104
Mallows’ Cp criterion 104
Marker Drawing Mode 184
Marker Size command 184
Markers command 184
marking points 48
matched pairs 47
  plot interpretation 75
  scatterplot 73
Matched Pairs platform 47, 71
  interpreting the scatterplot 73
  launching 72
  preparing the data 71
matching target goals 195, 202
maximize responses 195, 202
maximizing
  goals 202
mean 16, 18, 30
  confidence interval 30, 36
  specifying hypothesized 25
  test 35
  testing 24


Mean CI Lines command 65
Mean Error Bars command 65
Mean Error Bars option 48
Mean Line 149
Mean Lines command 65
means
  one and two sample 226
Means and Std Dev command 48
means diamonds 31, 50
Means Diamonds command 62, 65
Means Dots command 62
Means/Anova/t test command 50, 62
Means/Std Dev/Std Err command 62
median 16, 18, 31–32
Median rank scores 64
Method column 21
Minimal Report personality 88
minimize responses 195, 202
minimizing goals 202
minimum aberration designs 205
missing value 163
missing values 167
Mixed 103
mixed-level designs 205
Mixture Response Surface macro 87
Model Comparison table 151
model effects 86
Model script
Model Specification dialog 211, 217, 222
model sum of squares 90
Model Summary table 151, 155
Modeling tab 13
modeling type 29, 45
Moments command 30
Moments table 36
More Moments command 30
mosaic plot 54, 66
Mosaic Plot command 67
Moving Average Order 155
Moving Range Chart 127
MSE 104
multiple comparison tests 62–63, 93
multiple regression example 99–105
Multivariate 161, 163
Multivariate platform 161–170

N
N Responses button 195, 202
nDF 104
Needle Chart command 183
Needle option 186
Needle Plot command 172
nested effect 86
New Column
  command 57
nominal logistic regression see Logistic platform
nominal variables 29
nominal/ordinal by continuous fit see Logistic platform
nonconforming unit 43
nonestimable effects 204
Nonparametric Correlations 168
Nonparametric Measures of Association table 168
nonparametric tests 25, 35
Normal 33
Normal 26
normal density ellipse 164
normal quantile plot 25, 32
Normal Quantile Plot command 25, 31, 64
Normal Quantiles command 38
normality 25
np-Chart 130
Number of Forecast Periods 150
number of runs
  screening designs 204

O
O’Brien’s test 64
OC Curves 118
Oil1 Cusum.jmp 138
On Face option 220
one way analyses 47
one way ANOVA 46
one-sample and two-sample means 226
one-sample proportion (power and sample size) 233
one-sample variance (power and sample size) 232
opening data tables 15
order of runs 196, 210, 221
ordinal variables 29
orthogonal array designs 205
orthogonal designs
  screening designs 204
  surface designs 220
Orthogonal option 220
Other 166
outlier box plot 30–31


outliers 31
outside effect 86
overlap marks 50
Overlay Color command 183
Overlay Marker Color 187
Overlay Marker command 183, 187
Overlay option 185–186
Overlay Pattern command 183
Overlay Plot platform 173, 185
  Connect Points option 174
  platform options 186
  single-plot options 186
  Y Options 174
  Y Options command 186

P
p value 35, 47, 81, 85
p, d, q parameters 155
Pairwise Correlations 163
Pairwise Correlations table 167
Parameter 103
parameter estimates 60
  with fitted lines 52
Parameter Estimates Table 60, 91
Parameter Estimates table 152
parameters, extra 227
Partial Autocorrelation 149
partial autocorrelation 148
Partial Corr table 164
partial correlation 164
Paste command 19, 180
Pattern column 197, 201, 211, 222
patterns
  confounding 208
p-Chart 130
Pearson Chi Square test 36
Pearson correlation 167, 169
Pen Style command 183
Periods Per Season 156
personality 99
Pickles.jmp 127
Pie command 185
Plackett-Burman designs 205
Plot Actual by Predicted 96
Plot Actual by Quantile command 65
Plot Effect Leverage 96
Plot Quantile by Actual command 65
Plot Residual By Predicted 96
Plot Residual By Row 96
Plot Residuals command 53
plots
  Actual-by-Predicted 212
Point Chart command 183
points
  axial 215
  colors and markers 49
Points command 65
Points Jittered command 66
Points Spread command 66
Poisson-distributed counts 235
polynomial effect 87
Polynomial to Degree macro 87
power
  analyses 226
  in statistical tests on means 226
  one-sample and two-sample means 227–228
Power Analysis command 93
power and sample size calculations 225–237
  animation 229
  counts per unit 235
  k-sample means 231
  one-sample and two-sample proportions 233
  one-sample mean 228
  one-sample variance 232
  sigma quality level 236
  two-sample means 230
Ppk Capability Labeling 40
Predicted Values 96
predicted values
  saving 58–59
prediction
  variances 220
Prediction Formula 96
prerequisites to using JMP INTRO 11
Press 96
Print command 18
printing reports 18
Prob Axis command 31
Prob Scores command 38
Prob to Enter 102
Prob to Leave 102
Prob>|t| 152
“Prob>F” 104
probabilities
  testing 26, 35
Probability Labels command 65
process capability ratio 41
product-moment correlation 167, 169
proportions (power and sample size) 233
Pumice Stone 21
pure error 90


p-value 36

Q
q-q plot 31
Quantile Box Plot command 32
quantile-quantile plot 31
quantiles 32
Quantiles command 30, 48, 62
quartiles 31

R
r 53
R² 59, 89
R²
  adjusted 59, 89
Random Effect 99
Randomize within Blocks 210, 221
randomizing
  runs 196, 210, 221
Range option 186
Range Plot command 186
Ranks Averaged command 38
Ranks command 38
R-Chart 125
Reactor 32 Runs.jmp 189
Reactor Factors.jmp 189–190
Reactor Response.jmp 189
red bracket (box plot) 32
red triangle popup menus 17
Redo Analysis command 66
regressor columns 206
Remove 147
Remove 85
Remove All 103
Remove button 82
Remove Fit command 53
reports
  setting titles 80
requesting additional runs 196, 210, 221
re-running an analysis 82
rescaling designs 220
Residual Statistics 154
Residuals 96
residuals 53, 153
  plotting 53
  saving 58
resolution numbers 204
resolutions of designs 204
response surface designs
  examples 216–223
  introduction 218
  purpose 215
  reports 217
Response Surface Effect macro 87
Response Surface Methodology (RSM) 218
response surface models 87
responses
  custom designs 202, 219
  desirability values 202
  goals 195, 202
  lower limits 202
  upper limits 202
results
  annotating 28
revealing columns in reports 34
RMSE 193, 212
root mean square error 59, 89
Rotatable option 220
Row Colors command 49
Row Markers command 49
row states 23
RSM (Response Surface Methodology) 218
RSquare 104, 151
Rsquare 89
  Adj 89
RSquare Adj 104
Run Charts 114
Run Model 100
Run Model button 84, 88
runs
  additional 196, 210, 221
  order they appear in table 196, 210, 221
  requesting additional 196, 210, 221
  screening designs 204

S
sample autocorrelation function 149
sample data
  Denim.jmp 21, 29, 45, 71, 79, 171
    details 21
sample means 226
Sample Size and Power command 225
sample sizes
  example comparing one proportion to baseline and sample size plot 234
  example comparing single-direction one-sample variances 232
  example with counts per unit 235
  one and two sample means 227


  prospective power analysis 226
  screening designs 189
Sand Blasted? column 21
Save 40
Save Centered command 65
Save Columns 154
Save commands 37
Save Normal Quantiles command 65
Save Predicted Values 96
Save Script 38
Save Script for All Objects command 66
Save Script to Data Table command 66
Save Script to Report command 66
Save Script to Script Window command 66
Save Standardized command 65
SBC 151
scaling
  axial 220
  designs 220
scatterplot 45–46, 58
Scatterplot Matrix 164
scatterplot matrix 163
S-Chart 125
Schwarz’s Bayesian Criterion 151
score confidence intervals 37
screening designs 199
  design types 204
Script 38
Script submenu 66, 185–186
scripts
  animation 228
  generating the analysis model
  Model script See Model table property
  scripting See JSL
Scroller tool 81
seasonal exponential smoothing 158
seasonal smoothing weight 156
Select Columns list 86
select rows in data table 161
selecting and marking points 48
selecting report items 18, 180
selection tool 18, 180
Separate Axes command 185–186
Seriesg.jmp 143
Set Alpha Level command 51, 64
setting titles
  windows and reports 80
Shewhart Control Charts 124–133
Shirts.jmp 132
Short Term, Grouped by Column 40
Short Term, Grouped by fixed subgroup size 40
shortest half 32
Show Center Line 127
Show Confidence Interval 153–154
Show Correlations 165
Show Histogram 165
Show Points 149, 153–154
Show Points command 58, 183, 186
sigma 39–40
Sigma Quality 41
sigma quality level (power and sample size) 236
signed-rank test 35
significance probability 167
  stepwise regression 99
simple exponential smoothing 157
single-sample means (power and sample sizes) 228
Size/Scale commands 184
Smoothing Model dialog 156
smoothing models 143, 155–159
smoothing weight 156
Sort Left to Right 196, 210, 221
Sort Right to Left 196, 210, 221
sparsity, effect 199, 205–206
Spearman’s Rho 169
Spearman’s Rho 168
Spec Limits 40
Specified Sigma 40
specifying hypothesized means 25
Split command 71
Split command
  selecting rows 71
SS 104
SSE 104
Stable 151
Stable Invertible 156
Stack 38
Standard Deviation 151
standard deviation 16, 18, 25, 30
  testing 35
standard deviation, error 226
Standardize command 38
standardized values 38
star points 215
starting JMP INTRO 12
statistical tests 34
Statistics 171
Statistics button 181
Std Dev Lines command 62, 65
Std Dev Lines option 48
Std Err Bars command 31
Std Error 152


Std Error of Individual 97
Std Error of Predicted 96
Std Error of Residual 97
StdErr Prob 34
Stem and Leaf command 32
Step 101, 103
Step command 186
Step History table 102
stepwise regression 100
  Control panel 102–103
Stop 103
Studentized Residuals 96
Sum of Squared Errors 151
Summary of Fit table 59, 89
sums of squares 89
surface designs See response surface designs

T
t Ratio 152
t statistic 35
t test 25, 35–36, 46–47
  report 46
  two sample 46–47
Tables menu
  Split command 71
Tables tab 13
Tables toolbar 14
Tag Line option 28
Target 39
target values 202
Term 152
Test Mean
  command 24, 35
  dialog 25
Test Probabilities
  command 27, 36
  table 36
Test Std Dev command 35
testing a mean 24
testing for independence 55
testing probabilities 26
  scaling estimated values 27
Tests command 67
Thread Wear column 21
Thread Wear Measured column 21
Time ID role 147
Time Series 143
Time Series Graph 149
Time Series platform 143–159
  ARIMA 154–155
  commands 148–150
  example 143–148
  smoothing models 155–159
Time Series Plot 148
Time Series role 147
titles
  setting in windows and reports 80
toolbars 13
  showing and hiding 14
Tools toolbar 13–14
trade-off in screening designs 204
trend 156
Try Fixed Width 179
Tukey-Kramer HSD 62
tutorial examples
  DOE 199–201
  full factorial designs 189
  multiple regression 99–105
  response surface designs 216–223
  time series 143–148
two-level categorical 200
two-level fractional factorials 204
two-level full factorials 204
two-sample and one-sample means 226, 230
two-sample proportion (power and sample size) 233
two-way contingency table 54

U
u-Chart 131
Unconstrained 156–157
UnEqual Variances command 64
Ungroup Plots 186
uniform precision designs 220
Uniform Scaling 38
Uniform Y Scale 186
Univariate 163
Upper Spec Limit 39
Use Median 127
User Defined option 220
using histograms 22

V
values
  target 202
Van der Waerden 64
variables
  categorical 29
  continuous, ordinal, and nominal 29


  modeling type 29
  standardized values 65
Variance Estimate 151
variance of prediction 220
variances
  equality in t test 47
Vertical command 185

W-Z
Washers.jmp 130
Weibull 33
Weight button in Fit Model 85
weight, importance 195, 202
Welch ANOVA 64
Westgard Rules 122
Where 38
whiskers 31
whole model 92
Whole Model Test table 68
Wilcoxon rank scores 64
Wilcoxon signed-rank test 25, 35
Window menu 23, 26, 79–80, 82, 84
windows
  setting titles 80
Winters’ method 159
With Best, Hsu’s MCB command 62
With Control, Dunnett’s command 62
word processing program
  with cut and paste 28, 180
X role 147
X-Axis Proportional command 66
XBar Chart 125
Y button in Fit Model 85
Y Options command 185
Y role 147
Y, Columns button 22
Z statistics 43
z test 25, 35
Zero To One 156

Notices

Technology License Notices

The ImageMan DLL is used with permission of Data Techniques, Inc.

Scintilla is Copyright 1998-2003 by Neil Hodgson <[email protected]>. NEIL HODGSON DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL NEIL HODGSON BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

XRender is Copyright © 2002 Keith Packard. KEITH PACKARD DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL KEITH PACKARD BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

SAS INSTITUTE INC.’S LICENSORS MAKE NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, REGARDING THE SOFTWARE. SAS INSTITUTE INC.’S LICENSORS DO NOT WARRANT, GUARANTEE OR MAKE ANY REPRESENTATIONS REGARDING THE USE OR THE RESULTS OF THE USE OF THE SOFTWARE IN TERMS OF ITS CORRECTNESS, ACCURACY, RELIABILITY, CURRENTNESS OR OTHERWISE. THE ENTIRE RISK AS TO THE RESULTS AND PERFORMANCE OF THE SOFTWARE IS ASSUMED BY YOU. THE EXCLUSION OF IMPLIED WARRANTIES IS NOT PERMITTED BY SOME STATES. THE ABOVE EXCLUSION MAY NOT APPLY TO YOU.

IN NO EVENT WILL SAS INSTITUTE INC.’S LICENSORS AND THEIR DIRECTORS, OFFICERS, EMPLOYEES OR AGENTS (COLLECTIVELY SAS INSTITUTE INC.’S LICENSOR) BE LIABLE TO YOU FOR ANY CONSEQUENTIAL, INCIDENTAL OR INDIRECT DAMAGES (INCLUDING DAMAGES FOR LOSS OF BUSINESS PROFITS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, AND THE LIKE) ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE EVEN IF SAS INSTITUTE INC.’S LICENSOR’S HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. BECAUSE SOME STATES DO NOT ALLOW THE EXCLUSION OR LIMITATION OF LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES, THE ABOVE LIMITATIONS MAY NOT APPLY TO YOU. SAS INSTITUTE INC.’S LICENSOR’S LIABILITY TO YOU FOR ACTUAL DAMAGES FOR ANY CAUSE WHATSOEVER, AND REGARDLESS OF THE FORM OF THE ACTION (WHETHER IN CONTRACT, TORT (INCLUDING NEGLIGENCE), PRODUCT LIABILITY OR OTHERWISE WILL BE LIMITED TO $50.00.
