Page 1: BSc Statistical Project

UNIVERSITY OF KABIANGA

SCHOOL OF SCIENCE AND TECHNOLOGY

DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE

THE STUDY ON THE FACTORS AFFECTING THE DIFFICULTY OF A SUDOKU PUZZLE

A PROJECT REPORT SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE AWARD OF DEGREE OF BACHELOR OF SCIENCE IN

APPLIED STATISTICS WITH COMPUTING OF UNIVERSITY OF KABIANGA

BY

OKOYO COLLINS OMONDI

AST/0037/09

SUPERVISORS: MR. RUEBEN C. LANGA’T

MR. TONUI B.

APRIL 2013


DECLARATION

I, Okoyo Collins Omondi, do hereby declare that this project report is my original work and has not been presented for the award of a degree in any other university.

Sign:……………………………. Date:………………………………

This project has been submitted for examination with our approval as University supervisors.

Supervisors:

1. Mr. Rueben C. Langa’t
Department of Mathematics and Computer Science
University of Kabianga

Signature:…………………. Date:……………………..

2. Mr. Tonui B.
Department of Mathematics and Computer Science
University of Kabianga

Signature:…………………. Date:……………………..


DEDICATION

This project is dedicated to all Sudoku players and hobbyists.

I also dedicate this work to my beloved parents, Charles Okoyo and Lucy Okoyo, for their unconditional material and financial support, and to my siblings Everlyne, Evans, Basil, Sheilah and Oliver for their overwhelming social support.


ACRONYMS

SAS: Statistical Analysis System.

SPSS: Statistical Package for the Social Sciences.

DOE: Design of Experiments.


DEFINITION OF TERMS

Box: A 3 by 3 grid inside the Sudoku puzzle. Like a row or a column, it must contain each of the digits 1 to 9 exactly once.

Region: This refers to a row, column or box.

Candidate: Each empty square in a Sudoku puzzle has a set of numbers that do not conflict with the row, column and box containing it. These numbers are called candidates or candidate numbers.

Given: A given is a number present in the original Sudoku puzzle. A Sudoku puzzle has a certain number of such clues, which are then used to fill in the remaining squares. A number filled in by the solver is not regarded as a given.

Complete block: A block in which every treatment appears.

Response: The process output.

Factor: Uncontrolled or controlled variable whose influence is being studied.

Level: Setting of a factor (+, -, 1, -1, high, low, alpha, numeric).

Run: A single treatment combination, that is, one setting of all factors used to obtain a response.

Replicate: Number of times a treatment combination is run (usually randomized).

Repeat: Non-randomized replicate.

Inference space: Operating range of factors under study.

Design Expert: Software used to design experiments.


ABSTRACT

This project demonstrates how to apply design of experiments, in particular the full factorial design, to gain insight into real, everyday statistical problems and situations. The design of this project ultimately leads to an intuitive understanding of the statistical procedures and strategies most often used by practicing statisticians and scientists.

The study of the factors affecting the difficulty of a Sudoku puzzle was chosen because it provides a real statistical problem on which to design different experiments.


TABLE OF CONTENTS

DECLARATION
DEDICATION
ACRONYMS
DEFINITION OF TERMS
ABSTRACT
CHAPTER ONE
1.0 BACKGROUND
1.1 PROBLEM STATEMENT
1.2 STUDY PURPOSE
1.3 STUDY OBJECTIVES
CHAPTER TWO
2.0 LITERATURE REVIEW
2.1 INTRODUCTION
2.2 HOW TO PLAY
2.3 OPERATIONALIZATION OF VARIABLES
2.3.1 Response Variable
2.3.2 Control Variables
CHAPTER THREE
3.0 EXPERIMENTAL DESIGN
3.1 PERFORMING THE EXPERIMENT
3.2 STATISTICAL ANALYSIS
CHAPTER FOUR
4.0 RESULTS AND ANALYSIS OF DATA
4.1 INTRODUCTION
4.2 SUMMARY OF THE EXPERIMENTS
4.3 ANALYSIS OF SUDOKU PUZZLES WITH EASY DIFFICULTY RATING
4.3.1 Factors Effects
4.3.2 The Analysis of Variance (ANOVA)
4.3.3 Model Prediction
4.4 ANALYSIS OF SUDOKU PUZZLES WITH MEDIUM DIFFICULTY RATING
4.4.1 Factors Effects
4.4.2 The Analysis of Variance (ANOVA)
4.4.3 Model Prediction
CHAPTER FIVE
5.0 CONCLUSION AND RECOMMENDATIONS
5.1 CONCLUSION
5.2 RECOMMENDATIONS
APPENDICES
1.0 Data Collection Tool for Easy Rating Sudoku Puzzles
2.0 Data Collection Tool for Medium Rating Sudoku Puzzles
3.0 Experiment data for Easy Sudoku puzzles
4.0 Experiment data for Medium Sudoku puzzles
REFERENCES

LIST OF FIGURES

Figure 1: Sudoku Grid with Row, Column and Box Names
Figure 2: General view of Sudoku game environment
Figure 3: Main Effects plot
Figure 4: Interaction Plots
Figure 9: Half-Normal Plot
Figure 10: Pareto Plot
Figure 11: Residual Plots
Figure 11: Main Effect Plots
Figure 12: Interaction Plots
Figure 13: Half-Normal Plot
Figure 14: Pareto Plot
Figure 15: Residual Plots


LIST OF TABLES

Table 1: The amount ranges of givens in each difficulty level
Table 2: Variation of the Number of Givens.
Table 3: Variation of the Distribution of Givens.
Table 4: Variation of the Redundant Numbers.
Table 5: The design matrix for each experiment in coded values.
Table 6: The design matrix in statistics values for the Easy experiment.
Table 7: The design matrix in statistics values for the Medium experiment.
Table 8: Effects List
Table 9: ANOVA FOR EASY EXPERIMENT
Table 12: Fit Statistics for Y1
Table 13: Effects List
Table 14: ANOVA FOR MEDIUM EXPERIMENT
Table 15: Fit Statistics for Y1


CHAPTER ONE

1.0 BACKGROUND

In recent years Sudoku puzzles have become an increasingly popular pastime. Sudoku’s simple set of rules and multiple levels of puzzle difficulty attract hobbyists with varying skills and experience. Sudoku puzzles are typically categorized by level of difficulty, e.g. Easy, Medium, Hard, Killer. Yet it is not uncommon for the perceived difficulty of puzzles to vary greatly, even within a single difficulty rating.

The goal of this project is to determine whether factors other than the published difficulty rating affect the difficulty of a Sudoku puzzle. To keep the experimental results practical for the casual Sudoku hobbyist, factors that can easily be estimated by visual inspection of the puzzle were chosen. The factors are:

Number of givens – The number of initial givens provided in the puzzle.

Distribution of givens – The relative placement of initial givens in the puzzle.

Redundant numbers – The repetition of specific numbers in a puzzle’s set of initial givens.

The interest is in the effects of these factors on puzzles of a single published difficulty level. For this project, a full 2³ factorial experiment was performed on a set of eight “Easy” puzzles. The experiment was then repeated on eight “Medium” puzzles.

The results of the two experiments were used to show which of the above-mentioned factors have the largest effect on puzzle difficulty. The significance of several other effects was also determined. Further, a general model equation was determined that approximates, as accurately as possible, the time a Sudoku hobbyist is expected to take to solve a puzzle, taking the above factors into account and holding other factors constant.

The rest of this report is organized as follows. Chapter 2 provides a brief description of the objective of the two experiments and gives a quick background to the problem. Chapter 3 describes the experimental design used for the two experiments and how the experiments were run. Chapter 4 provides the results and an analysis of the data. Conclusions and recommendations are given in Chapter 5.


1.1 PROBLEM STATEMENT

Apart from the published difficulty rating (extremely easy, easy, medium, difficult, evil, etc.), many factors affect the difficulty of a Sudoku puzzle. This project is limited to studying only three such factors:

1. Number of the initial givens.

2. The distribution of the givens.

3. The redundancy of the individual givens.

Knowledge of the effects of these factors is expected to help Sudoku hobbyists reduce the time they take to play the game, making it more enjoyable and interesting.

1.2 STUDY PURPOSE

Sudoku is today a popular game throughout the world, and it appears in many media, including websites, newspapers and books. It is therefore of interest to find the factors, besides the published difficulty rating (extremely easy, easy, medium, difficult, evil, etc.), that affect the difficulty of a Sudoku puzzle.

Another goal of this study is to contribute to the knowledge and understanding of Latin square designs, factorial designs and the design of experiments in general, since the analysis of the factors affecting the difficulty of Sudoku puzzles employs these designs.

1.3 STUDY OBJECTIVES

The broad objective of this experiment is to quantify the effects of a Sudoku puzzle’s initial structure and set of givens on the expected time required to complete the puzzle.

In particular, the experiment seeks to answer the following questions:

1. What are the effects of the number of givens, the distribution of the givens and the redundancy of specific givens on the difficulty of a Sudoku puzzle?

2. Are the effects of each factor consistent across the levels of the other factors?


CHAPTER TWO

2.0 LITERATURE REVIEW

2.1 INTRODUCTION

Sudoku, a widely popular intellectual game in recent years, is often traced back to 18th-century Switzerland, and it was later developed and popularized in Japan in recent decades. The name Sudoku comes from Japanese, and the puzzle is also known as “Number Place” [1]. Thanks to its simple, beginner-friendly rules and the appeal of its intellectual challenge, Sudoku has recently become popular with players of all ages. A Sudoku puzzle can even be solved without any mathematical knowledge.

A Sudoku puzzle consists of a table with nine rows and nine columns. The squares (i.e. the cells of the table) are grouped into nine 3 × 3 sets, which we call boxes. For clarity we call the rows r1, r2, …, r9, the columns c1, c2, …, c9, and the boxes b1, b2, …, b9. Figure 1 shows a sample Sudoku grid with row, column and box names. The square in row i and column j is named s_ij.

[Figure 1 image: a 9 × 9 grid with columns labelled c1 to c9, rows labelled r1 to r9 and boxes labelled b1 to b9; a few givens appear in rows r2 and r4 to r6.]

Figure 1: Sudoku Grid with Row, Column and Box Names

Source: www.onlinegames.com/sudokugame.


2.2 HOW TO PLAY

How is the Sudoku game played? “You only need to know where you play the game and what your goal is. The simple aspects that help you join the game are specified as follows” [2]:

Game Environment: Figure 2 gives a general overview of the game board.

Figure 2: General view of Sudoku game environment

Several basic components of the board are defined as Figure 2 illustrates. The whole board is a 9-by-9 grid made up of nine smaller 3-by-3 grids called blocks (the boxes of Section 2.1). The smallest unit square is called a cell, and each cell is in one of two states: empty, or filled with a digit from 1 to 9. Rows and columns are numbered from the top-left corner of the grid.

Goal of the Game: A Sudoku game starts with some cells of the grid already filled with digits, known as givens. The player’s task is to place a digit from 1 to 9 in each empty cell so that each digit is used exactly once in each row, each column and each block. Consequently, every one of the nine rows, nine columns and nine blocks contains all the digits from 1 to 9.


These three restrictions on placing digits are called the row constraint, the column constraint and the block constraint, respectively.

Following these rules, Sudoku players try to fill all the empty cells as quickly as possible using various solving techniques.
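The row, column and block constraints, and the candidates defined under Definition of Terms, can be checked mechanically. The following is a minimal Python sketch, not part of the original study; it assumes the grid is stored as a 9 × 9 list of lists with 0 marking an empty cell, and all function names are hypothetical.

```python
from typing import List, Set

Grid = List[List[int]]  # 9 x 9; 0 marks an empty cell (assumed convention)


def box_index(row: int, col: int) -> int:
    """0-based box number of a cell (b1 -> 0, ..., b9 -> 8)."""
    return (row // 3) * 3 + col // 3


def is_consistent(grid: Grid) -> bool:
    """True if no digit repeats in any row, column or box (empty cells ignored)."""
    rows = [set() for _ in range(9)]
    cols = [set() for _ in range(9)]
    boxes = [set() for _ in range(9)]
    for r in range(9):
        for c in range(9):
            d = grid[r][c]
            if d == 0:
                continue
            b = box_index(r, c)
            if d in rows[r] or d in cols[c] or d in boxes[b]:
                return False  # row, column or block constraint broken
            rows[r].add(d)
            cols[c].add(d)
            boxes[b].add(d)
    return True


def is_solved(grid: Grid) -> bool:
    """True if every cell is filled and all three constraints hold."""
    return all(0 not in row for row in grid) and is_consistent(grid)


def candidates(grid: Grid, row: int, col: int) -> Set[int]:
    """Digits that can be placed in an empty cell without breaking a constraint."""
    if grid[row][col] != 0:
        return set()  # a filled cell has no candidates
    used = set(grid[row])                      # digits in the same row
    used |= {grid[r][col] for r in range(9)}   # digits in the same column
    br, bc = 3 * (row // 3), 3 * (col // 3)
    used |= {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
    return set(range(1, 10)) - used
```

For example, if only one cell of a solved grid is emptied, `candidates` returns the single digit that completes it, while `is_solved` becomes false until that cell is filled again.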

2.3 OPERATIONALIZATION OF VARIABLES

2.3.1 Response Variable

The time required to complete a puzzle was the single response variable for this experiment. Typical values of this variable range from a few minutes to over half an hour. The response will be measured in minutes, and there is no practical limit on the range over which it can be measured. The project also aimed to approximate the total time that a typical Sudoku hobbyist is expected to take to fill in all the empty cells correctly.

The following model was used to describe the response:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ijk}$$

where:

$Y_{ijk}$: observation with factor A at level i, B at level j and C at level k.

$\mu$: overall mean response.

$\alpha_i$: effect of factor A at level i.

$\beta_j$: effect of factor B at level j.

$\gamma_k$: effect of factor C at level k.

$\varepsilon_{ijk}$: random error term.
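For a balanced 2³ layout with one response per treatment combination, the least-squares estimates of the terms in this model (under the usual sum-to-zero constraints) are simple marginal means: the estimate of μ is the grand mean and, for example, the estimate of α at the high level of A is the mean of all runs with A = +1 minus the grand mean. The Python sketch below illustrates this; the response times are hypothetical, not the study's data, and the function name is illustrative.

```python
from itertools import product
from statistics import mean
from typing import Dict, Tuple

Cell = Tuple[int, int, int]  # coded levels (A, B, C), each -1 or +1


def additive_estimates(y: Dict[Cell, float]):
    """Estimate mu, alpha_i, beta_j and gamma_k for a balanced 2^3 layout.

    With one response per treatment combination and sum-to-zero constraints,
    each effect is a marginal mean minus the grand mean."""
    mu = mean(y.values())
    effects = []
    for pos in range(3):  # 0 -> A (alpha), 1 -> B (beta), 2 -> C (gamma)
        effects.append({
            level: mean(v for cell, v in y.items() if cell[pos] == level) - mu
            for level in (-1, 1)
        })
    alpha, beta, gamma = effects
    return mu, alpha, beta, gamma


# Hypothetical mean times (seconds) in standard order (1), a, b, ab, c, ac, bc, abc.
runs = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]
times = dict(zip(runs, [620, 560, 610, 575, 530, 470, 525, 480]))
mu, alpha, beta, gamma = additive_estimates(times)
# mu = 546.25, alpha = {-1: 25.0, 1: -25.0}, gamma = {-1: 45.0, 1: -45.0}
```

Each pair of estimates sums to zero, which is the sum-to-zero constraint that makes the model identifiable.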


2.3.2 Control Variables

(i) Number of givens

As the first factor affecting the estimated level, the total number of givens in the initial Sudoku puzzle can substantially reduce the candidate digits for each cell through the three constraints of the game rules. In general, it is reasonable to argue that the more empty cells there are at the start of a Sudoku game, the higher the difficulty level of the puzzle. The ranges of the number of givens for each difficulty level are shown below:

Table 1: The amount ranges of givens in each difficulty level

Level                 Number of givens    Score
1 (Extremely easy)    more than 50        1
2 (Easy)              36-49               2
3 (Medium)            32-35               3
4 (Difficult)         28-31               4
5 (Evil)              22-27               5
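Table 1 can be used as a simple lookup. The sketch below is a hypothetical helper, not a tool used in the study; it assumes the 9 × 9, 0-for-empty grid convention and, because Table 1 leaves exactly 50 givens between "more than 50" and "36-49", it assigns 50 to level 1. That boundary choice is an assumption.

```python
from typing import List, Optional, Tuple

# (fewest givens, most givens, level, label), transcribed from Table 1.
# Assumption: exactly 50 givens is not covered by Table 1; it is put in level 1 here.
LEVELS = [
    (50, 81, 1, "Extremely easy"),
    (36, 49, 2, "Easy"),
    (32, 35, 3, "Medium"),
    (28, 31, 4, "Difficult"),
    (22, 27, 5, "Evil"),
]


def count_givens(grid: List[List[int]]) -> int:
    """Number of filled cells; 0 marks an empty cell (assumed convention)."""
    return sum(1 for row in grid for d in row if d != 0)


def level_from_givens(n_givens: int) -> Optional[Tuple[int, str]]:
    """Level (equal to the score in Table 1) and label, or None if out of range."""
    for low, high, level, label in LEVELS:
        if low <= n_givens <= high:
            return level, label
    return None  # fewer than 22 givens is outside the ranges in Table 1
```

For instance, `level_from_givens(36)` gives `(2, "Easy")` and `level_from_givens(32)` gives `(3, "Medium")`, matching the lower limits used in Table 2.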

Possibly the most obvious measure of a puzzle’s difficulty is the number of givens in the initial grid. Typical values range from 50 givens for easier puzzles to 20 givens for highly difficult puzzles. For the two experiments, this control variable was varied as follows:

Table 2: Variation of the Number of Givens.

Easy Experiment    Medium Experiment
Min = 36           Min = 32
Max = 49           Max = 35

It was expected that the difficulty of a puzzle would increase as the number of initial givens decreases.

(ii) Distribution of givens

Another possible variable is the distribution of the initial givens in the grid. A puzzle with all, or most, of its givens crowded into one section of the grid may differ in difficulty from a puzzle with the givens spread evenly around the grid. Formally, this geometric property can be viewed as the variance of the row, column and box densities.

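The density of a row, column or box is defined as the number of givens in that row, column or box. For example, row r4 in Figure 1 has a density of two, column c3 has a density of one and box b5 has a density of three. Writing $r_i$, $c_i$ and $b_i$ for the densities of row ri, column ci and box bi, the mean density is then

$$\mu_{\text{density}} = \frac{1}{27}\left(\sum_{i=1}^{9} r_i + \sum_{i=1}^{9} c_i + \sum_{i=1}^{9} b_i\right)$$

and the variance is

$$\sigma^2_{\text{density}} = \frac{1}{27}\left(\sum_{i=1}^{9} \left(r_i - \mu_{\text{density}}\right)^2 + \sum_{i=1}^{9} \left(c_i - \mu_{\text{density}}\right)^2 + \sum_{i=1}^{9} \left(b_i - \mu_{\text{density}}\right)^2\right).$$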

The normal range for $\sigma^2_{\text{density}}$ in the sample set examined was found to be between 0 and 3.333. Hence, for these two experiments, $\sigma^2_{\text{density}}$ was varied as follows:

Table 3: Variation of the Distribution of Givens.

Easy Experiment        Medium Experiment
1.19 <= Min <= 1.70    0.59 <= Min <= 0.96
2.15 <= Max <= 2.67    1.26 <= Max <= 2.81

It was expected that the difficulty of a Sudoku puzzle would decrease as $\sigma^2_{\text{density}}$ increases.
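The two density formulas translate directly into code. The Python sketch below (hypothetical function name, grid stored as a 9 × 9 list of lists with 0 for an empty cell) computes the 27 densities and uses the divisor 27 as in the formulas. Since each given is counted once in its row, once in its column and once in its box, the mean density always equals n/9 for a puzzle with n givens, so it is the variance, not the mean, that distinguishes crowded puzzles from evenly spread ones.

```python
from typing import List, Tuple

Grid = List[List[int]]  # 9 x 9; 0 marks an empty cell (assumed convention)


def density_stats(grid: Grid) -> Tuple[float, float]:
    """Mean and variance of the 27 row, column and box densities (divisor 27)."""
    rows = [sum(1 for d in grid[r] if d) for r in range(9)]
    cols = [sum(1 for r in range(9) if grid[r][c]) for c in range(9)]
    boxes = [
        sum(1 for r in range(br, br + 3) for c in range(bc, bc + 3) if grid[r][c])
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    densities = rows + cols + boxes
    mean = sum(densities) / 27
    variance = sum((x - mean) ** 2 for x in densities) / 27
    return mean, variance
```

As a worked case, nine givens on the main diagonal give every row and column a density of 1, boxes b1, b5 and b9 a density of 3 and the other six boxes 0, so the mean is 27/27 = 1 and the variance is (3 × 4 + 6 × 1)/27 = 2/3.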


(iii) Redundant Numbers in the Initial Grid

This variable measures the variance, $\sigma^2_{\text{degree}}$, of the number of times each number is repeated in the initial Sudoku grid. Here the degree of a number, $\deg(i)$, is defined as the number of times the number $i$ appears in the initial grid. The mean degree is then

$$\mu_{\text{degree}} = \frac{1}{9}\sum_{i=1}^{9} \deg(i)$$

and the variance of the degree is

$$\sigma^2_{\text{degree}} = \frac{1}{9}\sum_{i=1}^{9} \left(\deg(i) - \mu_{\text{degree}}\right)^2.$$

The normal range for $\sigma^2_{\text{degree}}$ in the sample set was found to be between 0.5432 and 3.7778. For these two experiments, $\sigma^2_{\text{degree}}$ was varied as follows:

Table 4: Variation of the Redundant Numbers.

Easy Experiment        Medium Experiment
0.89 <= Min <= 1.33    0.89 <= Min <= 1.56
2.89 <= Max <= 3.56    2.00 <= Max <= 3.11

It was considered less likely that $\sigma^2_{\text{degree}}$ would be a significant factor in determining the difficulty of the Sudoku puzzle.
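Likewise, a minimal sketch of the degree statistics, assuming the same 9 × 9, 0-for-empty grid convention (the function name is hypothetical). Because the degrees sum to the number of givens n, the mean degree is always n/9, and only the variance carries information about redundancy.

```python
from collections import Counter
from typing import List, Tuple

Grid = List[List[int]]  # 9 x 9; 0 marks an empty cell (assumed convention)


def degree_stats(grid: Grid) -> Tuple[float, float]:
    """Mean and variance (divisor 9) of deg(i), the count of digit i among the givens."""
    counts = Counter(d for row in grid for d in row if d)
    degrees = [counts.get(i, 0) for i in range(1, 10)]  # deg(1), ..., deg(9)
    mean = sum(degrees) / 9
    variance = sum((x - mean) ** 2 for x in degrees) / 9
    return mean, variance
```

For example, givens consisting of three 1s, two 2s and one 3 give degrees (3, 2, 1, 0, ..., 0), a mean of 6/9 and a variance of 10/9, whereas givens using each digit exactly once give a variance of 0.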


CHAPTER THREE

3.0 EXPERIMENTAL DESIGN

These experiments were run using two 2³ full factorial designs. The original intention was to perform a single 2⁴ full factorial design, with the published rating of the puzzle as the additional factor. Unfortunately, a strong correlation was found between the published rating and the typical values of the other factors. The correlation was so strong that Easy and Medium puzzles with matching high and low values for the other factors could not be found. After measuring the typical values of the factors on over 120 puzzles, it was decided to run two separate experiments. The first was run on a sample of puzzles with an Easy difficulty rating, and the second on a sample of puzzles with a Medium difficulty rating.

Table 5: The design matrix for each experiment in coded values.

Run    Number of givens   Distribution of givens   Redundant numbers
       (Factor C)         (Factor B)               (Factor A)
(1)    -1                 -1                       -1
a      -1                 -1                        1
b      -1                  1                       -1
ab     -1                  1                        1
c       1                 -1                       -1
ac      1                 -1                        1
bc      1                  1                       -1
abc     1                  1                        1
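The coded design matrix in Table 5 is the standard (Yates) order of a 2³ design, with A changing fastest, then B, then C. The short Python sketch below regenerates it; the function names are illustrative and not taken from the software used in the study.

```python
from itertools import product


def yates_label(a: int, b: int, c: int) -> str:
    """Treatment label: lower-case letters of the factors at their high level."""
    label = "".join(name for name, lvl in zip("abc", (a, b, c)) if lvl == 1)
    return label or "(1)"  # all factors at the low level


def design_matrix():
    """Rows (label, C, B, A) of a 2^3 design in standard order, A varying fastest."""
    rows = []
    for c, b, a in product((-1, 1), repeat=3):  # last element changes fastest
        rows.append((yates_label(a, b, c), c, b, a))
    return rows
```

Every column sums to zero and any two columns are orthogonal, which is what allows each effect to be estimated independently of the others.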


Table 6: The design matrix in statistics values for the Easy experiment.

Run    Number of givens   Distribution of givens   Redundant numbers
       (Factor C)         (Factor B)               (Factor A)
(1)    36                 1.56                     0.89
a      36                 1.70                     3.11
b      36                 2.67                     0.89
ab     36                 2.15                     3.56
c      49                 1.19                     1.33
ac     49                 1.26                     2.89
bc     49                 2.15                     1.11
abc    49                 2.15                     3.33

Table 7: The design matrix in statistics values for the Medium experiment.

Run    Number of givens   Distribution of givens   Redundant numbers
       (Factor C)         (Factor B)               (Factor A)
(1)    32                 0.59                     1.11
a      32                 0.59                     2.00
b      32                 1.56                     0.89
ab     32                 1.26                     2.22
c      35                 0.89                     1.33
ac     35                 0.96                     3.11
bc     35                 2.81                     1.56
abc    35                 2.81                     2.67


3.1 PERFORMING THE EXPERIMENT

Each of the two experiments was replicated 16 times. The replicates were blocked, and each block was run by a single individual. The random run order within each block was determined using SAS software, and the resulting run orders are attached in the appendix of this report.
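The study used SAS to randomize the run order within each block. As a stand-in illustration only (Python rather than SAS, with an arbitrary seed of 2013 chosen for reproducibility and hypothetical names), each block receives its own independent random permutation of the eight treatment combinations:

```python
import random

TREATMENTS = ["(1)", "a", "b", "ab", "c", "ac", "bc", "abc"]


def blocked_run_order(n_blocks: int = 16, seed: int = 2013):
    """Map block number -> independently shuffled order of the eight runs."""
    rng = random.Random(seed)  # seeded so the same plan can be regenerated
    order = {}
    for block in range(1, n_blocks + 1):
        runs = TREATMENTS[:]   # copy so the master list is not modified
        rng.shuffle(runs)
        order[block] = runs
    return order
```

Randomizing separately within each block, rather than across all 128 runs, keeps every participant's block a complete replicate, which is what allows participant-to-participant differences to be removed as block effects.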

The experiments began with a Perl script written to calculate all the control variables outlined in Section 2.3.2. Eight different volumes of Sudoku puzzle books by the same author were purchased over the internet (in this case, Sudoku puzzles appearing in the daily Standard and Nation newspapers were used), and the puzzles were transferred from the books into the Perl script. The high and low values for each of the variables being analyzed were set from the Perl script output. Once the high and low values were set, eight Easy-rated puzzles and eight Medium-rated puzzles that fit these values well were identified.

Once the puzzles were chosen, the experimental designs were set up in Design Expert software. Copies of the puzzles were then made and stapled together by block, in the run order provided by the software.

Each block was then given to one of the 16 volunteer participants, who were asked to solve the puzzles in the order in which they were stapled. The participants were required to record the start time, finish time and elapsed (delta) time for each puzzle.

Once the completed puzzles were returned, the delta times (the response variable) were added to the run-order table in SAS, and the data were analyzed.

3.2 STATISTICAL ANALYSIS

The data from the two experiments were analyzed separately, in two subsections. SAS software and Design Expert were both used, and their outputs compared, to aid the analysis of the data.


CHAPTER FOUR

4.0 RESULTS AND ANALYSIS OF DATA

4.1 INTRODUCTION

This chapter presents the experimental results as produced by the SAS and Design Expert analysis tools, together with their interpretation.

The data used for the Easy and Medium experiments are given in Appendices 3.0 and 4.0, respectively.

The analysis is divided into two parts: the first covers the Easy experiment and the second the Medium experiment.

Of the 16 individuals asked to solve the Sudoku puzzles with an Easy difficulty rating, only 13 returned them in time to be analyzed in this report, a return rate of 81%. Because the design was blocked by replicate (one block per participant), the missing blocks had little or no effect on the analysis of the results.

For the Sudoku puzzles with a Medium difficulty rating, only 11 of the 16 individuals returned their puzzles in time for analysis, a return rate of 69%.

4.2 SUMMARY OF THE EXPERIMENTS

DESIGN DETAILS

Design type: Two-level
Design description: Repeated
Number of factors: 3
Number of runs: 128
Resolution: Full
Number of blocks: 16

FACTORS

Factors and Levels:

Factor   Label                     Low   Center   High
C        Number of Givens           -1      0       1
B        Distribution of Givens     -1      0       1
A        Redundancy of Givens       -1      0       1


RESPONSE

Response   Label   Units
Y1         Time    seconds

BLOCK INFORMATION

Block name: BLOCK
Block label: INDIVIDUAL
Number of blocks: 16

4.3 ANALYSIS OF SUDOKU PUZZLES WITH EASY DIFFICULTY RATING

4.3.1 Factors Effects

The analysis began by examining the effects of the various factors as reported by the SAS software.

To maintain a hierarchical model, the model was screened by omitting only the three-factor interaction ABC. ABC is therefore omitted throughout the analysis.

Main Factors and Interaction Plots

Figure 3: Main Effects plot

The main effects plot above shows that, for factor C, the mean of all responses with C = -1 is about 600, while the mean of those with C = +1 is about 500. Factor A shows a similar pattern, while for factor B the mean responses at B = -1 and B = +1 are about the same.
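A main effect is the difference between the two means plotted for a factor; reading Figure 3, this is roughly 500 - 600 = -100 for C. The same contrast gives an interaction effect when the sign of each run is the product of the coded levels of the factors involved. The Python sketch below uses hypothetical responses (one value per treatment combination, for example an average over blocks), not the study's data; the function name is illustrative.

```python
from itertools import product
from statistics import mean

# Coded runs (A, B, C) in standard order (1), a, b, ab, c, ac, bc, abc, as in Table 5.
RUNS = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]


def effect(responses, term: str) -> float:
    """Mean response where the term's contrast is +1 minus mean where it is -1.

    `term` is any combination of "A", "B", "C", e.g. "C" or "AB"."""
    if len(responses) != len(RUNS):
        raise ValueError("expected one response per treatment combination")
    idx = {"A": 0, "B": 1, "C": 2}
    plus, minus = [], []
    for run, y in zip(RUNS, responses):
        sign = 1
        for f in term:
            sign *= run[idx[f]]  # product of coded levels gives the contrast sign
        (plus if sign == 1 else minus).append(y)
    return mean(plus) - mean(minus)


y = [620, 560, 610, 575, 530, 470, 525, 480]  # hypothetical seconds
# effect(y, "C") -> -90.0, effect(y, "A") -> -50.0, effect(y, "AB") -> 10.0
```

A negative main effect, as for C here, means the response time falls when the factor moves from its low to its high level.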


Figure 4: Interaction Plots

The interaction plots above suggest an interaction between factors B and C and between factors B and A, since their lines cross. Crossing lines alone, however, do not show whether these interactions, or the related main effects, are statistically significant.

To determine which factors and interactions have a significant effect, other tools were examined: the effects list, the half-normal plot and the Pareto plot.

Table 8: Effects List

Table 8 shows that C, A and the BA interaction were significant, while B, CB and CA were not, since their p-values were greater than 0.05.

C and A have negative effects, while the remaining terms have positive effects.


Figure 9: Half-Normal Plot

A half-normal plot was then used to examine the same information visually.

The plot identified C as the most significant factor. The other effects fall on or near the line of insignificant effects, as shown in Figure 9.
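The half-normal plot places the absolute effects, sorted from smallest to largest, against half-normal quantiles. In a balanced two-level design each effect's sum of squares is SS = N·effect²/4, so the absolute effects can be recovered from the SS column of Table 9 with N = 104 (13 returned blocks × 8 runs). The sketch below uses the common plotting position Φ⁻¹(0.5 + 0.5(i - 0.5)/m); Design Expert's exact plotting positions may differ, so these coordinates are illustrative rather than a reproduction of Figure 9, and the helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

N_RUNS = 104  # 13 returned blocks x 8 runs, Easy experiment
SS = {"C": 457788.5, "B": 2803.846, "A": 381634.6,
      "CB": 70096.15, "CA": 146250.0, "BA": 174496.2}  # effect rows of Table 9


def abs_effect(ss: float, n: int = N_RUNS) -> float:
    """|effect| of a two-level term, from SS = n * effect**2 / 4."""
    return sqrt(4 * ss / n)


def half_normal_points(ss_table):
    """(term, |effect|, half-normal quantile) triples, smallest effect first."""
    effects = sorted((abs_effect(v), k) for k, v in ss_table.items())
    m = len(effects)
    z = NormalDist()  # standard normal
    return [
        (name, e, z.inv_cdf(0.5 + 0.5 * (i - 0.5) / m))
        for i, (e, name) in enumerate(effects, start=1)
    ]
```

On these values C has the largest absolute effect (about 132.7 seconds) and B the smallest (about 10.4 seconds); effects that lie well off the straight line through the small ones are the candidates for significance.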

Figure 10: Pareto Plot

To show the percentage contribution of each effect, a Pareto plot was used (Figure 10). Once again, factor C makes the largest contribution and factor B the smallest.
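One common way to express percentage contribution is each effect's sum of squares as a share of the total sum of squares of the factorial effects. The sketch below applies that definition to the Table 9 values. Whether the SAS Pareto plot uses exactly this definition or ranks absolute effects is an assumption here, but since |effect| grows with the square root of SS, the ranking is the same either way.

```python
# Sums of squares of the six factorial effects from Table 9 (Easy experiment).
SS = {"C": 457788.5, "B": 2803.846, "A": 381634.6,
      "CB": 70096.15, "CA": 146250.0, "BA": 174496.2}

# Assumed definition: each effect's SS as a percentage of the effects' total SS.
total = sum(SS.values())
contribution = {t: 100 * ss / total for t, ss in SS.items()}

for term, pct in sorted(contribution.items(), key=lambda item: item[1], reverse=True):
    print(f"{term:>2}  {pct:6.2f}%")
```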

4.3.2 The Analysis of Variance (ANOVA)

Table 9: ANOVA for the Easy Experiment (ANOVA for Y1)

Master Model

Source    DF   SS           MS          F          Pr > F
C          1   457788.5     457788.5    11.18521   0.001228
B          1   2803.846     2803.846    0.068507   0.794157
A          1   381634.6     381634.6    9.324534   0.003019
C*B        1   70096.15     70096.15    1.71267    0.194167
C*A        1   146250       146250      3.573348   0.062122
B*A        1   174496.2     174496.2    4.26349    0.041987
Model     18   8966008      498111.5    12.17043   0.0001
Error     85   3478881      40928.01
Total    103   12444888

When the ANOVA was run on the full model and each term's p-value was compared with α = 0.05, the main effects C and A and the BA interaction were found to be significant. Table 9 shows the details.
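Each factorial term in Table 9 has one degree of freedom, so its mean square equals its sum of squares and its F ratio is SS divided by the error mean square. The sketch below recomputes the F ratios and the fit statistics from the table entries; it shows the arithmetic behind the SAS output and is not a refit of the data. The model's 18 degrees of freedom are 12 for the 13 blocks with data plus 6 for the factorial terms.

```python
# Entries of Table 9 (Easy experiment, master model).
TERM_SS = {"C": 457788.5, "B": 2803.846, "A": 381634.6,
           "CB": 70096.15, "CA": 146250.0, "BA": 174496.2}
MODEL_SS, MODEL_DF = 8966008.0, 18   # 12 block df + 6 factorial-term df
ERROR_SS, ERROR_DF = 3478881.0, 85
TOTAL_SS, TOTAL_DF = 12444888.0, 103

mse = ERROR_SS / ERROR_DF
# Each term has 1 df, so its mean square equals its sum of squares.
f_ratios = {t: ss / mse for t, ss in TERM_SS.items()}
r_square = MODEL_SS / TOTAL_SS
adj_r_square = 1 - (ERROR_SS / ERROR_DF) / (TOTAL_SS / TOTAL_DF)

for term, f in f_ratios.items():
    print(f"{term:>2}  F = {f:9.5f}")
print(f"R-square = {r_square:.2%}, Adj. R-square = {adj_r_square:.2%}")
```

The recomputed R² and adjusted R² agree with the 72.05% and 66.13% reported for the master model in Table 12.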

Figure 11: Residual Plots (Normal Plot of Residuals; Residuals vs. Predicted)

The next step was to check model adequacy using the residual plots in Figure 11. The plots gave no indication that the normality or equal-variance assumptions were violated.

The normal probability plot passes the “fat pencil” test, and the plot of residuals against predicted values shows no pattern.

Since the model appeared adequate, there was no need to transform the data in search of a more accurate model.
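A normal probability plot compares the ordered residuals with standard normal quantiles; a nearly straight line (the “fat pencil” test) supports normality, and the correlation between the two is one numerical summary of straightness. The sketch below is illustrative only: it computes residuals for the eight block 8 Easy runs from the uncoded predictive equation in Section 4.3.3 (intercept 300 plus the BLOCK='8' coefficient 270), uses the plotting positions (i − 0.5)/n, which is an assumed convention, and reports the correlation.

```python
from statistics import NormalDist

# Block 8 of the Easy experiment from the appendix: (C, B, A, Y1 in seconds).
BLOCK_8_EASY = [
    (-1, -1, 1, 360), (-1, -1, -1, 600), (-1, 1, 1, 600), (1, -1, -1, 480),
    (-1, 1, -1, 540), (1, -1, 1, 600), (1, 1, -1, 660), (1, 1, 1, 720),
]


def fitted(c, a):
    """Uncoded predictive equation evaluated for block 8 (factor B is not in it)."""
    return 300 + 270 - 66.34615 * c - 60.57692 * a


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / (sxx * syy) ** 0.5


residuals = sorted(y - fitted(c, a) for c, _b, a, y in BLOCK_8_EASY)
n = len(residuals)
# Normal scores for the ordered residuals, using plotting positions (i - 0.5) / n.
scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
r = correlation(scores, residuals)

for z, e in zip(scores, residuals):
    print(f"{z:6.3f}  {e:8.2f}")
print(f"correlation = {r:.3f}")
```

A full check would use all 104 residuals from the fitted model; eight points from one block only illustrate the calculation.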

In summary, the following terms are significant for Easy level Sudoku puzzles:

C – Number of Givens (negative)
A – Repetition of Numbers (negative)
BA – Interaction effect between B and A (positive)

Based on the percentage contributions in the Pareto plot, the Number of Givens makes the largest contribution, followed by the Repetition of Numbers (factor A).

4.3.3 Model Prediction

Predictive Model for Y1

Uncoded Levels: Y1 = 300 + 22.5*(BLOCK='10') + 465*(BLOCK='11') + 750*(BLOCK='12') + 637.5*(BLOCK='13') + 472.5*(BLOCK='16') - 45*(BLOCK='2') + 90*(BLOCK='3') - 97.5*(BLOCK='4') + 90*(BLOCK='5') + 22.5*(BLOCK='6') + 465*(BLOCK='7') + 270*(BLOCK='8') - 66.34615*C - 60.57692*A

Table 12: Fit Statistics for Y1

Statistic       Master Model   Predictive Model
Mean            541.7308       541.7308
R-square        72.05%         68.88%
Adj. R-square   66.13%         63.99%
RMSE            202.3067       208.5942
CV              37.34451       38.50514

Response(s):

Response   Est. Value   95% Confidence Interval
Y1         300          [147.6909, 452.3091]

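According to the SAS output, the estimated completion time for an Easy rating puzzle is 300 seconds, with a 95% confidence interval of [147.6909, 452.3091] seconds (α = 0.05). This estimate is the intercept of the uncoded predictive equation, that is, the predicted time for the block that has data but no indicator term in the equation (block 9) with C and A at their coded centre value of 0. It should therefore be read as a prediction for that reference block rather than as the time any person would take on any Easy puzzle; the overall mean completion time in the Easy experiment was 541.73 seconds.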
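Evaluating the uncoded predictive equation directly shows what the 300-second estimate corresponds to. In the sketch below the coefficients are copied from the equation in Section 4.3.3; block 9, which has Easy data but no indicator term, therefore acts as the reference block. Averaging the predictions over the 13 blocks with data, with C and A at 0, recovers the overall mean of 541.7308 seconds reported in Table 12. The function and variable names are hypothetical.

```python
# Coefficients of the uncoded predictive equation for Y1 (Easy experiment, Section 4.3.3).
INTERCEPT = 300.0
BLOCK_COEF = {"2": -45.0, "3": 90.0, "4": -97.5, "5": 90.0, "6": 22.5, "7": 465.0,
              "8": 270.0, "10": 22.5, "11": 465.0, "12": 750.0, "13": 637.5,
              "16": 472.5}
C_COEF, A_COEF = -66.34615, -60.57692
# Blocks with Easy data (blocks 1, 14 and 15 are missing in the appendix data).
BLOCKS_WITH_DATA = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "16"]


def predict(block, c=0.0, a=0.0):
    """Predicted Y1 in seconds; a block without an indicator term adds nothing."""
    return INTERCEPT + BLOCK_COEF.get(block, 0.0) + C_COEF * c + A_COEF * a


per_block = {b: predict(b) for b in BLOCKS_WITH_DATA}
average = sum(per_block.values()) / len(per_block)
print(per_block["9"], per_block["12"], round(average, 4))  # 300.0 1050.0 541.7308
```

Each block's prediction at C = A = 0 equals that participant's observed mean time (for example, 1050 seconds for block 12), which is why a single predicted value says little about puzzle-solvers in general.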

4.4 ANALYSIS OF SUDOKU PUZZLES WITH MEDIUM DIFFICULTY RATING

4.4.1 Factor Effects

As for the Easy puzzles, the analysis began by examining the effects of the various factors as reported by the SAS software.

To keep the model hierarchical, only the three-factor interaction ABC was omitted, so ABC was excluded throughout this analysis as well.

Main Factors and Interaction Plots

Figure 11: Main Effect Plots

The main effects plot shows that, for factor C, the mean responses at C = -1 and C = +1 are almost identical. The same holds for factor A. For factor B, the mean response is roughly 625 seconds at B = -1 and roughly 665 seconds at B = +1.

Figure 12: Interaction Plots

The interaction plots suggest possible interactions between every pair of factors: A and C, B and C, and A and B.

Non-parallel (crossing) lines indicate that interactions may be present. However, when factors interact, their main effects cannot be interpreted on their own, so the plots alone cannot show which effects, if any, are significant.

To determine which factors or interactions, if any, have a significant effect, the effects list, the half-normal plot and the Pareto plot were examined.

Table 13: Effects List

The effects list shows that none of the main effects or interactions is significant: all have p-values greater than 0.05, the chosen significance level α.

The half-normal plot was used to confirm the results of the effects list.

Figure 13: Half-Normal Plot

The half-normal plot confirms that no factor is significant: some effects fall on the Lenth’s pseudo standard error (PSE) line, and the rest lie between the Lenth’s PSE line and the RMSE line.

Figure 14: Pareto Plot

To show the percentage contribution of each effect, a Pareto plot was used (Figure 14). The B*A interaction makes the largest contribution and factor C the smallest, although none of these effects is significant.

4.4.2 The Analysis of Variance (ANOVA)

Table 14: ANOVA for the Medium Experiment (ANOVA for Y1)

Master Model

Source    DF   SS           MS          F          Pr > F
C          1   63.92045     63.92045    0.001401   0.970243
B          1   38097.28     38097.28    0.835226   0.363859
A          1   3108.284     3108.284    0.068144   0.794814
C*B        1   2967.284     2967.284    0.065053   0.799418
C*A        1   13975.92     13975.92    0.306401   0.581636
B*A        1   122478.3     122478.3    2.685152   0.105712
Model     16   3959457      247466.1    5.425321   0.0001
Error     71   3238535      45613.17
Total     87   7197992

When the ANOVA was run on the full model and each term's p-value was compared with α = 0.05, none of the terms had a p-value below 0.05, confirming that no factor or interaction was significant. Table 14 shows the details.
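The same arithmetic can be applied to Table 14. The sketch below recomputes the F ratios from the table entries, compares them with an approximate 5% critical value of F(1, 71) ≈ 3.976 taken from standard F tables (an assumption, not SAS output), and recomputes R² and adjusted R², which agree with the 55.01% and 44.87% in Table 15.

```python
# Entries of Table 14 (Medium experiment, master model).
TERM_SS = {"C": 63.92045, "B": 38097.28, "A": 3108.284,
           "CB": 2967.284, "CA": 13975.92, "BA": 122478.3}
MSE = 45613.17
MODEL_SS, ERROR_SS, TOTAL_SS = 3959457.0, 3238535.0, 7197992.0
ERROR_DF, TOTAL_DF = 71, 87
F_CRIT = 3.976  # approx. upper 5% point of F(1, 71), from F tables (assumption)

# Each term has 1 df, so F = SS / MSE.
f_ratios = {t: ss / MSE for t, ss in TERM_SS.items()}
largest = max(f_ratios, key=f_ratios.get)
any_significant = any(f > F_CRIT for f in f_ratios.values())
r_square = MODEL_SS / TOTAL_SS
adj_r_square = 1 - (ERROR_SS / ERROR_DF) / (TOTAL_SS / TOTAL_DF)

print(f"largest F: {largest} = {f_ratios[largest]:.3f}; any above {F_CRIT}? {any_significant}")
print(f"R-square = {r_square:.2%}, Adj. R-square = {adj_r_square:.2%}")
```

Even the largest ratio, for B*A, is well below the critical value, consistent with its p-value of 0.1057.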

Figure 15: Residual Plots

The next step was to check model adequacy using the residual plots in Figure 15. The plots gave no indication that the normality or equal-variance assumptions were violated.

The normal probability plot passes the “fat pencil” test, and the plot of residuals against predicted values shows no pattern.

Since the model appeared adequate, there was no need to transform the data in search of a more accurate model.

In summary, none of the hypothesized factors (the number of givens, the distribution of the givens and the redundancy of the givens) nor their interactions had a significant influence on the difficulty of Medium rating Sudoku puzzles.

4.4.3 Model Prediction

Table 15: Fit Statistics for Y1

Statistic       Master Model   Predictive Model
Mean            644.0795       644.0795
R-square        55.01%         55.01%
Adj. R-square   44.87%         44.87%
RMSE            213.5724       213.5724
CV              33.15932       33.15932

Response(s):

Response   Est. Value   95% Confidence Interval
Y1         464.75       [314.1888, 615.3112]

According to the SAS output, the estimated completion time for a Medium rating puzzle is 464.75 seconds, with a 95% confidence interval of [314.1888, 615.3112] seconds (α = 0.05). This value equals the mean completion time of block 1 in the Medium experiment data (Appendix 4.0), so it appears to be the prediction for a single reference block rather than for every person and every Medium puzzle; the overall mean for the Medium experiment was 644.08 seconds.
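The reported estimate can be compared with the raw data. The sketch below uses the eight block 1 runs of the Medium experiment from the appendix data and shows that their mean is exactly 464.75 seconds and that the block is a balanced 2³ design, so all factor terms cancel at the coded centre point. That SAS treats block 1 as the reference block is an inference from this match, not something stated in the output.

```python
# Block 1 of the Medium experiment from the appendix data: (C, B, A, Y1 in seconds).
BLOCK_1_MEDIUM = [
    (-1, 1, -1, 277), (-1, -1, 1, 373), (1, 1, -1, 449), (-1, 1, 1, 544),
    (-1, -1, -1, 540), (1, -1, 1, 500), (1, 1, 1, 608), (1, -1, -1, 427),
]

block_mean = sum(row[3] for row in BLOCK_1_MEDIUM) / len(BLOCK_1_MEDIUM)
# Each factor column sums to zero, so C, B and A terms vanish when averaged over the block.
balanced = all(sum(row[i] for row in BLOCK_1_MEDIUM) == 0 for i in range(3))
print(block_mean, balanced)  # 464.75 True
```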

CHAPTER FIVE

5.0 CONCLUSION AND RECOMMENDATIONS

5.1 CONCLUSION

Based on the two experiments in this study, the number of givens (factor C) had the greatest effect on the difficulty of an Easy rating Sudoku puzzle. In the Medium rating experiment, however, its effect was negligible and far from significant (p = 0.970 in the ANOVA in Table 14).

In addition, the redundancy (repetition) of the givens (factor A) had a significant effect on the difficulty of an Easy rating Sudoku puzzle, but this factor likewise failed to reach significance in the Medium rating experiment.

Furthermore, the BA interaction also had a significant influence on completion time in the Easy rating experiment.

In the Medium rating experiment, by contrast, none of the hypothesized factors (the number of givens, the distribution of the givens and the redundancy of the givens) nor their interactions had a significant influence on puzzle difficulty.

The adjusted R² values for the Easy and Medium experiments were relatively low: 0.6613 (66.13%) and 0.4487 (44.87%) respectively. Normally, this would be a cause for concern. In these experiments, however, the ANOVA shows that the block (participant) effects were highly significant, and when block effects are this large it becomes difficult to choose a single model that accurately describes the general behavior of the response.

Because the adjusted R² values are low, the regression models are not expected to predict the completion time of an arbitrary puzzle accurately. The estimated completion times from these experiments, 300 seconds for the Easy experiment and 464.75 seconds for the Medium experiment, should therefore be treated only as rough figures; each is the model's prediction for a single reference block (participant) rather than a general completion time.

5.2 RECOMMENDATIONS

Following the completion of the experiments, the investigator makes the following recommendations:

1. Because the models predict completion time inaccurately, further experimentation is required to model the completion time for a specific person and a specific Sudoku puzzle.

2. In similar studies, participants (blocks) should first be properly trained to reduce the size of the block effects.

3. Since no significant effect of the hypothesized factors was found in the Medium experiment, future studies should consider different factors and/or a more careful analysis of these factors.

APPENDICES

1.0 Data Collection Tool for Easy Rating Sudoku Puzzles

Kindly complete each Sudoku below and record the start time, the end time and the delta time (i.e. the difference between the start and end times).

Provide the delta time in seconds.

1.

2.

9 8 6 7 1 5

6 8 2 3 4

7 5 1 4 9 2

8 2 4 9

2 4 9 5 3 8 1

5 9 3 8 6

9 1 7 3 4

7 8 2 1 5

4 3 8 1 6 7

9 7 5 3

1 2 4 8

3 1 8 6

5 6 2

2 3 4 9 1 5

3 6 7

9 2 7 1

6 4 8 9

5 1 4 2

Start time:……………………..

End time:……………………….

Delta time:…………………….

Start time:……………………..

End time:……………………….

Delta time:…………………….

3.

8 2 6 5 3

5 9 2 4

1 7 4 9 5

9 5 2 1 3 4

8 7 9

3 8 9 6 1

6 3 9 1 7 5

1 8 4 6

7 5 8 2

4.

6 3 4 9 1

5 6 2 8

8 1 7 6

4 1 2 6 7

9 7 5 4

3 8 7

4 3 1 9

1 6 5 4

7 2 8 3 5

Start time:……………………..

End time:……………………….

Delta time:…………………….

Start time:……………………..

End time:……………………….

Delta time:…………………….

5.

8 3

9 1 2 7 4

2 8 6 7 9

5 9 8 2

8 6 7 4 3

2 3 9 8 4

4 1 3 5 7

7 8 9 5 1

9 7 8 6

6.

9 2 5 7 8 6

5 7 4 1 3 2

8 3 2 9 6

2 1 5 9 4 6

4 1 6 3 2

6 8 2 1 3

3 7 9 5 4

2 4 1 6 7

4 7 5 6 1 9

Start time:……………………..

End time:……………………….

Delta time:…………………….

Start time:……………………..

End time:……………………….

Delta time:…………………….

7.

4 9 7 6

1 7 4 8 5

8 9 5 4 7

4 9 7 8 2 5

2 5 3 4 8 6

8 6 1 4 7

5 6 4 2 3

4 6 5 7 9

2 1 3 7 5 4

8.

4 7 5 3 1

7 6 1 9

1 2 8 3

1 5 7 4 3

9 8 1

5 2 1 6 7

8 4 3

3 7 2 1 5

7 9 6 1 2

Start time:……………………..

End time:……………………….

Delta time:…………………….

Start time:……………………..

End time:……………………….

Delta time:…………………….

2.0 Data Collection Tool for Medium Rating Sudoku Puzzles

Kindly complete each Sudoku below and record the start time, the end time and the delta time (i.e. the difference between the start and end times).

Provide the delta time in seconds.

1.

2.

5 2 9 4

1 7

7 4 6 1 9

7 8 4 6 3 5

4 1 9

6 1 3 7 4

2 7 4 5

5 3

1 3 5 8

5 3 9

6 5 7 2

4 7 2 6 5

1 7

9 2 7 1

1 9

7 3 6 1 5

3 4 1 8

8 5 3

Start time:……………………..

End time:……………………….

Delta time:…………………….

Start time:……………………..

End time:……………………….

Delta time:…………………….

3.

1 6 2 5

9 8 1 6

2 7 6 1

6 1 2 9

3 5 4 1

4 1 3 5 8

1 7 9

1 4 3

9 8 6 1

4.

2 9 5

7 8 1 6

6 8 4 1

1 9 3 7

5 7 8

8 2 3 9

4 7 5 1 2

6 4 5 3

3 5 7 8

Start time:……………………..

End time:……………………….

Delta time:…………………….

Start time:……………………..

End time:……………………….

Delta time:…………………….

5.

5 6

3 1 6 5 4

7 8 3 9 2

5 8 3

5 3

6 7 2

9 4 5 6 1 2

6 3 2 7 9

7 2 9 3

6.

3 5 8

2 4 9

2 7 8

3 8 2

6 1 2 7 9 5

4 5 6

9 1 7 6 2

8 1 3 9

4 5

Start time:……………………..

End time:……………………….

Delta time:…………………….

Start time:……………………..

End time:……………………….

Delta time:…………………….

7.

3 1 7 8 6 4

4 1

5 4

8 9 5 3

9 4

7 6 3 5

8 6 5

9 5 8

6 8 1 5 2 4 9

8.

9 8 4 1

1 7 9 3 6 8

6 2 1 8 4

9 8 2

8 7 6 5

7 1 4 5 8

8

5 8 7 2

Start time:……………………..

End time:……………………….

Delta time:…………………….

Start time:……………………..

End time:……………………….

Delta time:…………………….

3.0 Experiment data for Easy Sudoku puzzles

DESIGN POINTS (Uncoded)

RUN  BLOCK   C   B   A     Y1
 61      8  -1  -1   1    360
 57      8  -1  -1  -1    600
 63      8  -1   1   1    600
 58      8   1  -1  -1    480
 59      8  -1   1  -1    540
 62      8   1  -1   1    600
 60      8   1   1  -1    660
 64      8   1   1   1    720
 42      6   1  -1  -1    300
 46      6   1  -1   1    180
 48      6   1   1   1    480
 44      6   1   1  -1    120
 47      6  -1   1   1    240
 43      6  -1   1  -1    300
 41      6  -1  -1  -1    540
 45      6  -1  -1   1    420
 75     10  -1   1  -1    360
 77     10  -1  -1   1    360
 73     10  -1  -1  -1    540
 76     10   1   1  -1    240
 79     10  -1   1   1    300
 74     10   1  -1  -1    300
 80     10   1   1   1    180
 78     10   1  -1   1    300
  3      1  -1   1  -1      .
  5      1  -1  -1   1      .
  4      1   1   1  -1      .
  7      1  -1   1   1      .
  1      1  -1  -1  -1      .
  6      1   1  -1   1      .
  8      1   1   1   1      .
  2      1   1  -1  -1      .
 37      5  -1  -1   1    420
 40      5   1   1   1    300
 36      5   1   1  -1    300
 35      5  -1   1  -1    300
 33      5  -1  -1  -1    780
 34      5   1  -1  -1    360
 38      5   1  -1   1    240
 39      5  -1   1   1    420
 10      2   1  -1  -1    300
 11      2  -1   1  -1    120
 14      2   1  -1   1    180
 15      2  -1   1   1    480
  9      2  -1  -1  -1    360
 16      2   1   1   1    300
 12      2   1   1  -1    120
 13      2  -1  -1   1    180
104     13   1   1   1    960
100     13   1   1  -1    840
 99     13  -1   1  -1   1200
101     13  -1  -1   1    780
 98     13   1  -1  -1    600
103     13  -1   1   1   1140
 97     13  -1  -1  -1   1200
102     13   1  -1   1    780
 84     11   1   1  -1    960
 83     11  -1   1  -1    840
 86     11   1  -1   1    420
 81     11  -1  -1  -1    660
 88     11   1   1   1    960
 85     11  -1  -1   1    600
 82     11   1  -1  -1    780
 87     11  -1   1   1    900
 92     12   1   1  -1    780
 90     12   1  -1  -1   1140
 91     12  -1   1  -1    900
 94     12   1  -1   1    660
 93     12  -1  -1   1   1320
 95     12  -1   1   1    540
 89     12  -1  -1  -1   2100
 96     12   1   1   1    960
125     16  -1  -1   1    660
121     16  -1  -1  -1   1020
124     16   1   1  -1    720
128     16   1   1   1    660
127     16  -1   1   1    780
126     16   1  -1   1    600
122     16   1  -1  -1    960
123     16  -1   1  -1    780
 22      3   1  -1   1    240
 19      3  -1   1  -1    900
 17      3  -1  -1  -1    600
 20      3   1   1  -1    300
 23      3  -1   1   1    480
 21      3  -1  -1   1    300
 24      3   1   1   1    120
 18      3   1  -1  -1    180
 67      9  -1   1  -1    240
 71      9  -1   1   1    240
 68      9   1   1  -1    240
 70      9   1  -1   1    240
 69      9  -1  -1   1    240
 66      9   1  -1  -1    240
 72      9   1   1   1    240
 65      9  -1  -1  -1    720
 54      7   1  -1   1    420
 49      7  -1  -1  -1    720
 52      7   1   1  -1    720
 55      7  -1   1   1    420
 51      7  -1   1  -1   1560
 50      7   1  -1  -1    960
 56      7   1   1   1    600
 53      7  -1  -1   1    720
 29      4  -1  -1   1    180
 28      4   1   1  -1    180
 32      4   1   1   1    240
 30      4   1  -1   1    180
 26      4   1  -1  -1    180
 27      4  -1   1  -1    240
 25      4  -1  -1  -1    240
 31      4  -1   1   1    180
111     14  -1   1   1      .
107     14  -1   1  -1      .
110     14   1  -1   1      .
105     14  -1  -1  -1      .
112     14   1   1   1      .
109     14  -1  -1   1      .
108     14   1   1  -1      .
106     14   1  -1  -1      .
113     15  -1  -1  -1      .
118     15   1  -1   1      .
120     15   1   1   1      .
114     15   1  -1  -1      .
115     15  -1   1  -1      .
116     15   1   1  -1      .
119     15  -1   1   1      .
117     15  -1  -1   1      .

4.0 Experiment data for Medium Sudoku puzzles

DESIGN POINTS (Uncoded)

RUN  BLOCK   C   B   A     Y1
 61      8  -1  -1   1    962
 57      8  -1  -1  -1    802
 63      8  -1   1   1    637
 58      8   1  -1  -1    651
 59      8  -1   1  -1    991
 62      8   1  -1   1    582
 60      8   1   1  -1    583
 64      8   1   1   1   1040
 42      6   1  -1  -1    480
 46      6   1  -1   1    660
 48      6   1   1   1    582
 44      6   1   1  -1   1045
 47      6  -1   1   1    370
 43      6  -1   1  -1    750
 41      6  -1  -1  -1    490
 45      6  -1  -1   1    495
 75     10  -1   1  -1   1191
 77     10  -1  -1   1   1161
 73     10  -1  -1  -1    469
 76     10   1   1  -1   1333
 79     10  -1   1   1    928
 74     10   1  -1  -1    984
 80     10   1   1   1    715
 78     10   1  -1   1   1269
  3      1  -1   1  -1    277
  5      1  -1  -1   1    373
  4      1   1   1  -1    449
  7      1  -1   1   1    544
  1      1  -1  -1  -1    540
  6      1   1  -1   1    500
  8      1   1   1   1    608
  2      1   1  -1  -1    427
 37      5  -1  -1   1    600
 40      5   1   1   1    540
 36      5   1   1  -1    540
 35      5  -1   1  -1    540
 33      5  -1  -1  -1    840
 34      5   1  -1  -1    600
 38      5   1  -1   1    720
 39      5  -1   1   1    900
 10      2   1  -1  -1      .
 11      2  -1   1  -1      .
 14      2   1  -1   1      .
 15      2  -1   1   1      .
  9      2  -1  -1  -1      .
 16      2   1   1   1      .
 12      2   1   1  -1      .
 13      2  -1  -1   1      .
104     13   1   1   1    358
100     13   1   1  -1    267
 99     13  -1   1  -1    269
101     13  -1  -1   1    297
 98     13   1  -1  -1    420
103     13  -1   1   1    337
 97     13  -1  -1  -1    361
102     13   1  -1   1    540
 84     11   1   1  -1      .
 83     11  -1   1  -1      .
 86     11   1  -1   1      .
 81     11  -1  -1  -1      .
 88     11   1   1   1      .
 85     11  -1  -1   1      .
 82     11   1  -1  -1      .
 87     11  -1   1   1      .
 92     12   1   1  -1    420
 90     12   1  -1  -1    540
 91     12  -1   1  -1   1500
 94     12   1  -1   1    600
 93     12  -1  -1   1    780
 95     12  -1   1   1    780
 89     12  -1  -1  -1    840
 96     12   1   1   1    780
125     16  -1  -1   1      .
121     16  -1  -1  -1      .
124     16   1   1  -1      .
128     16   1   1   1      .
127     16  -1   1   1      .
126     16   1  -1   1      .
122     16   1  -1  -1      .
123     16  -1   1  -1      .
 22      3   1  -1   1    285
 19      3  -1   1  -1    180
 17      3  -1  -1  -1    435
 20      3   1   1  -1    270
 23      3  -1   1   1    225
 21      3  -1  -1   1    270
 24      3   1   1   1    570
 18      3   1  -1  -1    315
 67      9  -1   1  -1    590
 71      9  -1   1   1    382
 68      9   1   1  -1    874
 70      9   1  -1   1    366
 69      9  -1  -1   1    682
 66      9   1  -1  -1    304
 72      9   1   1   1    785
 65      9  -1  -1  -1    404
 54      7   1  -1   1      .
 49      7  -1  -1  -1      .
 52      7   1   1  -1      .
 55      7  -1   1   1      .
 51      7  -1   1  -1      .
 50      7   1  -1  -1      .
 56      7   1   1   1      .
 53      7  -1  -1   1      .
 29      4  -1  -1   1    800
 28      4   1   1  -1    720
 32      4   1   1   1    360
 30      4   1  -1   1    480
 26      4   1  -1  -1    960
 27      4  -1   1  -1    420
 25      4  -1  -1  -1    780
 31      4  -1   1   1    660
111     14  -1   1   1      .
107     14  -1   1  -1      .
110     14   1  -1   1      .
105     14  -1  -1  -1      .
112     14   1   1   1      .
109     14  -1  -1   1      .
108     14   1   1  -1      .
106     14   1  -1  -1      .
113     15  -1  -1  -1    735
118     15   1  -1   1   1275
120     15   1   1   1    720
114     15   1  -1  -1    645
115     15  -1   1  -1   1155
116     15   1   1  -1   1215
119     15  -1   1   1    855
117     15  -1  -1   1    705
