UNIVERSITY OF KABIANGA
SCHOOL OF SCIENCE AND TECHNOLOGY
DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE
THE STUDY ON THE FACTORS AFFECTING THE DIFFICULTY OF A SUDOKU PUZZLE
A PROJECT REPORT SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE AWARD OF DEGREE OF BACHELOR OF SCIENCE IN
APPLIED STATISTICS WITH COMPUTING OF UNIVERSITY OF KABIANGA
BY
OKOYO COLLINS OMONDI
AST/0037/09
SUPERVISORS: MR. RUEBEN C. LANGA’T
MR. TONUI B.
APRIL 2013
DECLARATION
I, Okoyo Collins Omondi, do hereby declare that this project report is my original work and has not been presented for the award of a degree in any other university.
Sign:……………………………. Date:………………………………
This project has been submitted for examination with our approval as University supervisors.
Supervisors:
1. Mr. Rueben C. Langa’t Department of Mathematics and Computer Science University of Kabianga
Signature:…………………. Date:……………………..
2. Mr. Tonui B. Department of Mathematics and Computer Science University of Kabianga
Signature:…………………. Date:……………………..
DEDICATION
This project is dedicated to all Sudoku players and hobbyists.
I also dedicate this work to my beloved parents, Charles Okoyo and Lucy Okoyo, for their unconditional material and financial support, and to my siblings Everlyne, Evans, Basil, Sheilah and Oliver for their overwhelming social support.
ACRONYMS
SAS: Statistical Analysis System.
SPSS: Statistical Package for the Social Sciences.
DOE: Design of Experiments.
DEFINITION OF TERMS
Box: A 3 by 3 grid inside the Sudoku puzzle. It works the same way as a row or column, meaning it must contain each of the digits 1 – 9.
Region: A row, column or box.
Candidate: An empty square in a Sudoku puzzle has a certain set of numbers that do not conflict with the row, column and box it is in. Those numbers are called candidates or candidate numbers.
Given: A number in the original Sudoku puzzle. A puzzle starts with a certain number of givens (clues), which are then used to fill in new squares. A number filled in by the solver is not regarded as a given.
Complete block: This is where each treatment appears in each block.
Response: The process output.
Factor: Uncontrolled or controlled variable whose influence is being studied.
Level: Setting of a factor (+, -, 1, -1, high, low, alpha, numeric).
Run: A treatment combination; all factors are set to obtain a response.
Replicate: Number of times a treatment combination is run (usually randomized).
Repeat: Non-randomized replicate.
Inference space: Operating range of factors under study.
Design Expert: Software used to design experiments.
ABSTRACT
This project demonstrates how to apply design of experiments, in particular the full factorial design, to gain insight into real, everyday statistical problems and situations. The design of this project ultimately builds an intuitive understanding of the statistical procedures and strategies most often used by practicing statisticians and scientists.
The study of the factors affecting the difficulty of a Sudoku puzzle was therefore chosen because it provides a real statistical problem on which to design different experiments.
LIST OF FIGURES
Figure 1: Sudoku Grid with Row, Column and Box Names
Figure 2: General view of Sudoku game environment
Figure 3: Main Effects plot
Figure 4: Interaction Plots
Figure 9: Half – Normal Plot
Figure 10: Pareto Plot
Figure 11: Residual Plots
Figure 11: Main Effect Plots
Figure 12: Interaction Plots
Figure 13: Half – Normal Plot
Figure 14: Pareto Plot
Figure 15: Residual Plots
LIST OF TABLES
Table 1: The amount ranges of givens in each difficult level
Table 2: Variation of the Number of Givens.
Table 3: Variation of the Distribution of Givens.
Table 4: Variation of the Redundant Numbers.
Table 5: The design matrix for each experiment in coded values.
Table 6: The design matrix in statistics values for the Easy experiment.
Table 7: The design matrix in statistics values for the Medium experiment.
Table 8: Effects List
Table 9: ANOVA FOR EASY EXPERIMENT
Table 12: Fit Statistics for Y1
Table 13: Effects List
Table 14: ANOVA FOR MEDIUM EXPERIMENT
Table 15: Fit Statistics for Y1
CHAPTER ONE
1.0 BACKGROUND
In recent years Sudoku puzzles have become an increasingly popular pastime. Sudoku’s simple set of rules and multiple levels of puzzle difficulty attract hobbyists with varying skills and experience. Typically, Sudoku puzzles are categorized by level of difficulty, i.e. Easy, Medium, Hard, Killer, etc. Yet it is not uncommon for the perceived difficulty of puzzles to vary greatly, even within a single difficulty rating.
The goal of this project is to determine whether additional factors, other than the published difficulty rating, have an effect on the difficulty of a Sudoku puzzle. To keep the experimental results practical for the casual Sudoku hobbyist, factors that can be easily estimated by visual inspection of the puzzle were chosen. The factors are:
Number of givens – The number of initial givens provided in the puzzle.
Distribution of givens – The relative placement of initial givens in the puzzle.
Redundant numbers – The repetition of specific numbers in a puzzle’s set of initial
givens.
The interest is in the effects of these factors on puzzles of a single published difficulty level. For this project a full $2^3$ factorial experiment was performed on a set of eight “Easy” puzzles. The experiment was then repeated on eight “Medium” puzzles.
The results of the two experiments were used to show which of the above-mentioned factors has the largest effect on puzzle difficulty. The remaining effects and their significance were also determined. In addition, a general model equation was fitted to approximate, as accurately as possible, the time a Sudoku hobbyist is expected to take to solve a puzzle, taking the above factors into account and holding other factors constant.
The rest of the project report is organized as follows: Chapter 2 provides a brief description of the objective of the two experiments and gives a quick background to the problem. Chapter 3 describes the experimental design used for the two experiments and how the experiments were run. Chapter 4 provides the results and an analysis of the data. Conclusions and recommendations are given in Chapter 5.
1.1 PROBLEM STATEMENT
Multiple factors affect the difficulty of a Sudoku puzzle apart from the published difficulty rating (extremely easy, easy, medium, difficult, evil, etc.). This project is limited to studying only three such factors:
1. Number of the initial givens.
2. The distribution of the givens.
3. The redundancy of the individual givens.
It is expected that knowledge of the effects of these factors will reduce the time Sudoku hobbyists take to play the game, subsequently making the game more enjoyable and interesting.
1.2 STUDY PURPOSE
Sudoku is today a popular game throughout the world and it appears in multiple media, including websites, newspapers and books. As a result, it is of interest to find the factors affecting the difficulty of a Sudoku puzzle besides the published difficulty rating of extremely easy, easy, medium, difficult, evil, etc.
Another goal of this study is to contribute to the knowledge and comprehension of Latin square designs, factorial designs and the design of experiments in general, as the analysis of the factors affecting the difficulty of Sudoku puzzles employs these designs.
1.3 STUDY OBJECTIVES
The broad objective of this experiment is to quantify the effects of a Sudoku puzzle’s initial structure and set of givens on the expected time required to complete the puzzle.
In particular, the experiment seeks to answer the following questions:
1. What are the effects of the number of givens, the distribution of the givens and the redundancy of the specific givens on the difficulty of a Sudoku puzzle?
2. Are the effects of each factor consistent across the levels of the other factors?
CHAPTER TWO
2.0 LITERATURE REVIEW
2.1 INTRODUCTION
The Sudoku puzzle, a widely popular intellectual game in recent years, traces its origins to the Latin squares studied by the Swiss mathematician Leonhard Euler in the 18th century. It later developed and became popular in Japan over the past decades. The name Sudoku derives from Japanese and means “number place” [1]. Owing to its simple, beginner-friendly rules and the appeal of its intellectual challenge, Sudoku has recently become popular with players of various ages. A Sudoku puzzle can even be solved without any mathematical knowledge.
A Sudoku puzzle consists of a table with nine rows and nine columns. The squares (i.e. cells in
the table) are grouped in sets of nine which we will call boxes. For clarity we will call the rows
r1, r2, … r9, the columns c1, c2, … c9, and the boxes b1, b2, … b9. Figure 1 provides a diagram
showing a sample Sudoku grid with row, column, and box names. The squares are named $s_{ij}$, where i is the row number and j is the column number.
[Sudoku grid: columns c1 to c9, rows r1 to r9 and boxes b1 to b9, with sample givens 1 in row r2; 5 and 2 in row r4; 2 and 9 in row r5; 6 in row r6.]
Figure 1: Sudoku Grid with Row, Column and Box Names
Source: www.onlinegames.com/sudokugame.
2.2 HOW TO PLAY
How is the Sudoku game played? “You only need to know where you play the game and what your goal is. The simple aspects that help you join the game are specified as follows” [2]:
Game Environment: a general overview of the game board is shown below.
Figure 2: General view of Sudoku game environment
Several basic components of the board are defined as Figure 2 illustrates. The whole board is a 9-by-9 grid made of nine smaller 3-by-3 grids called blocks. The smallest unit square is called a cell, which has two possible states: empty, or confirmed by a digit from 1 through 9. Rows and columns of the whole grid are numbered from the top-left corner.
Goal of the Game: a Sudoku game generally starts with some of the cells in the grid already confirmed by digits, known as givens. The task for the player is to place a digit from 1 to 9 into each empty cell of the grid so that each digit is used exactly once in each row, each column and each block. Consequently, each of the nine rows, nine columns and nine blocks contains all the digits from 1 through 9. These restrictions on placing digits are called the row constraint, column constraint and block constraint respectively.
Based on the rules above, Sudoku players aim to complete the placement of digits into all empty cells as quickly as possible, using various techniques.
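The row, column and block constraints described above can be expressed as a short check. The following is a minimal sketch in Python, assuming the grid is stored as a 9-by-9 list of lists in which 0 marks an empty cell; the encoding and the function names are illustrative assumptions and are not taken from the report.

```python
from typing import List

Grid = List[List[int]]  # 9x9 grid; 0 marks an empty cell (assumed encoding)


def regions(grid: Grid) -> List[List[int]]:
    """Return the 27 regions (9 rows, 9 columns, 9 blocks) as lists of cell values."""
    rows = [list(row) for row in grid]
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    blocks = [
        [grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return rows + cols + blocks


def satisfies_constraints(grid: Grid) -> bool:
    """True if no digit repeats within any row, column or block (empty cells ignored)."""
    for region in regions(grid):
        digits = [d for d in region if d != 0]
        if len(digits) != len(set(digits)):
            return False
    return True


def is_solved(grid: Grid) -> bool:
    """True if every region contains each digit 1 through 9 exactly once."""
    return all(sorted(region) == list(range(1, 10)) for region in regions(grid))
```

A partially filled puzzle passes `satisfies_constraints` as long as its givens do not conflict, while `is_solved` holds only for a completed grid.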
2.3 OPERATIONALIZATION OF VARIABLES
2.3.1 Response Variable
The time required to complete a puzzle was the single response variable for this experiment. Typical values for this variable range from a few minutes to over half an hour. The response was recorded in seconds. There is no practical limit on the range over which this response can be measured. The project also sought to develop a model that approximates the average time a Sudoku hobbyist is expected to take to fill the empty cells correctly.
The following formula was used to approximate the response variable:
$$\hat{Y}_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ijk}$$
where
$\hat{Y}_{ijk}$: Observation with factor A at level i, B at level j and C at level k.
$\mu$: Mean response.
$\alpha_i$: Effect of factor A at level i.
$\beta_j$: Effect of factor B at level j.
$\gamma_k$: Effect of factor C at level k.
$\varepsilon_{ijk}$: Error term.
2.3.2 Control Variables
(i) Number of givens
As the first factor affecting the difficulty level, the total number of given cells in an initial Sudoku puzzle can significantly reduce the potential choices of digits in each cell through the three constraints in the game rules. In general, it is reasonable to argue that the more empty cells there are at the start of a Sudoku game, the higher the level at which the puzzle is graded. The ranges of the number of givens for each difficulty level were scaled as shown below.
Table 1: The amount ranges of givens in each difficult level
Level                 Givens Amount    Scores
1 (Extremely easy)    more than 50     1
2 (Easy)              36-49            2
3 (Medium)            32-35            3
4 (Difficult)         28-31            4
5 (Evil)              22-27            5
Possibly the most obvious measurement of a puzzle’s difficulty is the number of givens provided
in the initial grid. Typical values range from 50 givens for easier puzzles to 20 givens for highly
difficult puzzles. For the two experiments, the control variable was varied as follows:
Table 2: Variation of the Number of Givens.
Easy Experiment    Medium Experiment
Min = 36           Min = 32
Max = 49           Max = 35
It was expected that the difficulty of a puzzle would increase as the number of initial givens decreases.
(ii) Distribution of givens
Another possible variable is the distribution of the initial givens in the grid. One can imagine that a puzzle with all, or most, of the givens crowded in one section of the grid may differ in difficulty from a puzzle with the givens spread evenly around the grid. Formally, this geometric property can be viewed as the variance of the row, column and box densities.
The density of a row, column or box is defined to be the number of givens provided in that row, column or box. For example, row r4 in Figure 1 has a density of two, column c3 has a density of one, and box b5 has a density of three. The mean density is then
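$$\mu_{density} = \frac{\sum_{i=1}^{9} r_i + \sum_{i=1}^{9} c_i + \sum_{i=1}^{9} b_i}{27}$$
and the variance is
$$\sigma^2_{density} = \frac{\sum_{i=1}^{9} (r_i - \mu_{density})^2 + \sum_{i=1}^{9} (c_i - \mu_{density})^2 + \sum_{i=1}^{9} (b_i - \mu_{density})^2}{27},$$
where $r_i$, $c_i$ and $b_i$ here denote the densities of row i, column i and box i.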
The normal range for $\sigma^2_{density}$ in the sample set was found to be between 0 and 3.333. Hence, for these two experiments, $\sigma^2_{density}$ was varied as follows:
Table 3: Variation of the Distribution of Givens.
Easy Experiment          Medium Experiment
1.19 <= Min <= 1.70      0.59 <= Min <= 0.96
2.15 <= Max <= 2.67      1.26 <= Max <= 2.81
It was expected that the difficulty of a Sudoku puzzle would decrease as $\sigma^2_{density}$ increases.
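The density mean and variance defined in this subsection can be computed directly from a puzzle's initial grid. The sketch below is in Python rather than the Perl used for the project, and assumes a 9-by-9 list of lists in which 0 marks an empty cell; the function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]  # 9x9 grid; 0 marks an empty cell (assumed encoding)


def region_densities(grid: Grid) -> List[int]:
    """Number of givens in each of the 27 regions: 9 rows, 9 columns, 9 boxes."""
    rows = [sum(1 for v in grid[r] if v) for r in range(9)]
    cols = [sum(1 for r in range(9) if grid[r][c]) for c in range(9)]
    boxes = [
        sum(1 for r in range(br, br + 3) for c in range(bc, bc + 3) if grid[r][c])
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return rows + cols + boxes


def density_stats(grid: Grid) -> Tuple[float, float]:
    """Return (mean density, variance of density) over the 27 regions."""
    densities = region_densities(grid)
    mean = sum(densities) / 27
    variance = sum((d - mean) ** 2 for d in densities) / 27
    return mean, variance
```

Because every given lies in exactly one row, one column and one box, the mean density always equals the number of givens divided by nine, so only the variance distinguishes puzzles with the same number of givens.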
(iii) Redundant Numbers in the Initial Grid
This variable measures the variance, $\sigma^2_{degree}$, in the number of times a specific number is repeated in the initial Sudoku grid. Here, the degree of a number, $\deg(i)$, is defined to be the number of times the number i appears in the initial grid. So the mean degree is
$$\mu_{degree} = \frac{\sum_{i=1}^{9} \deg(i)}{9}$$
and the variance of the degree is
$$\sigma^2_{degree} = \frac{\sum_{i=1}^{9} (\deg(i) - \mu_{degree})^2}{9}.$$
The normal range for $\sigma^2_{degree}$ in the sample set was found to be between 0.5432 and 3.7778. For these two experiments, $\sigma^2_{degree}$ was varied as follows.
Table 4: Variation of the Redundant Numbers.
Easy Experiment          Medium Experiment
0.89 <= Min <= 1.33      0.89 <= Min <= 1.56
2.89 <= Max <= 3.56      2.00 <= Max <= 3.11
It was considered less likely that $\sigma^2_{degree}$ would be a significant factor in determining the difficulty of the Sudoku puzzle.
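The degree statistic can be computed in the same way as the density statistic. The following Python sketch assumes the same grid encoding (a 9-by-9 list of lists with 0 for an empty cell); `degree_stats` is a hypothetical name used only for illustration.

```python
from typing import List, Tuple

Grid = List[List[int]]  # 9x9 grid; 0 marks an empty cell (assumed encoding)


def degree_stats(grid: Grid) -> Tuple[float, float]:
    """Return (mean degree, variance of degree) over the digits 1 through 9."""
    # deg(i): how many times digit i appears among the givens
    degrees = [sum(row.count(digit) for row in grid) for digit in range(1, 10)]
    mean = sum(degrees) / 9
    variance = sum((d - mean) ** 2 for d in degrees) / 9
    return mean, variance
```

A grid whose givens use every digit equally often has a variance of zero; the variance grows as some digits are repeated more often than others.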
CHAPTER THREE
3.0 EXPERIMENTAL DESIGN
These experiments were run using two $2^3$ full factorial designs. The original intention was to perform a single $2^4$ full factorial design with the additional factor being the published rating for the puzzle. Unfortunately, a strong correlation was found between the published rating and the typical values of the other factors. The correlation was so strong that Easy and Medium puzzles with matching high and low values for the other factors could not be found. After measuring the typical values of the factors on over 120 puzzles, it was decided to run two separate experiments. The first was run on a sample of puzzles with an Easy difficulty rating. The second was run on a sample of puzzles with a Medium difficulty rating.
Table 5: The design matrix for each experiment in coded values.
Run    Number of givens (Factor C)    Distribution of givens (Factor B)    Redundant Numbers (Factor A)
(1)    -1                             -1                                   -1
a      -1                             -1                                   1
b      -1                             1                                    -1
ab     -1                             1                                    1
c      1                              -1                                   -1
ac     1                              -1                                   1
bc     1                              1                                    -1
abc    1                              1                                    1
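The coded design matrix of a $2^3$ full factorial can be generated mechanically. The sketch below, in Python, reproduces the row order and labels of Table 5; the function name is a hypothetical choice, and the labelling rule (a lowercase letter for each factor at its high level, "(1)" when all are low) is the standard convention rather than something stated in the report.

```python
from itertools import product
from typing import List, Tuple


def design_matrix_2_cubed() -> List[Tuple[str, int, int, int]]:
    """Return the eight runs as (label, C, B, A) in coded units, A varying fastest."""
    runs = []
    for c, b, a in product((-1, 1), repeat=3):
        # Label each run by the factors set to their high (+1) level
        label = "".join(
            name for name, level in (("a", a), ("b", b), ("c", c)) if level == 1
        )
        runs.append((label or "(1)", c, b, a))
    return runs


if __name__ == "__main__":
    for run in design_matrix_2_cubed():
        print(*run)
```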
Table 6: The design matrix in statistics values for the Easy experiment.
Run    Number of givens (Factor C)    Distribution of givens (Factor B)    Redundant Numbers (Factor A)
(1)    36                             1.56                                 0.89
a      36                             1.70                                 3.11
b      36                             2.67                                 0.89
ab     36                             2.15                                 3.56
c      49                             1.19                                 1.33
ac     49                             1.26                                 2.89
bc     49                             2.15                                 1.11
abc    49                             2.15                                 3.33
Table 7: The design matrix in statistics values for the Medium experiment.
Run    Number of givens (Factor C)    Distribution of givens (Factor B)    Redundant Numbers (Factor A)
(1)    32                             0.59                                 1.11
a      32                             0.59                                 2.00
b      32                             1.56                                 0.89
ab     32                             1.26                                 2.22
c      35                             0.89                                 1.33
ac     35                             0.96                                 3.11
bc     35                             2.81                                 1.56
abc    35                             2.81                                 2.67
3.1 PERFORMING THE EXPERIMENT
The two experiments were replicated 16 times each. The replicates were then blocked and each block was run by a single individual. The random run order for each block was determined by SAS Software and the result is attached in the appendix of the report.
The experiments began with the creation of a Perl script to calculate all the control variables outlined in section 2.3.2 above. Eight different volumes of Sudoku puzzle books from the same author were purchased over the internet (in this case, Sudoku puzzles appearing in the daily Standard and Nation newspapers were used), and the puzzles were transferred from the books to the Perl script. From the Perl script output, the high and low values for each of the variables being analyzed were determined. Once the high and low values were set, eight of the Easy rating puzzles and eight of the Medium rating puzzles that fit well with the high and low values set for each variable were identified.
Once the puzzles were chosen, the experimental designs were set up in Design Expert software. Copies of the puzzles were then made and stapled into their individual blocks based on the run order provided by the software.
Each block was then distributed to one of the 16 willing participants, who were asked to solve the puzzles in the order in which they were stapled. The participants were required to record the start time, finish time and elapsed (delta) time for each of the puzzles.
Once the completed puzzles were returned, the delta times (the response variable) were added to the run order table in the SAS Software and the data were analyzed.
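The run orders in this project were generated by the statistical software. As an illustration only, the same kind of within-block randomization can be sketched in Python with the standard library: each of the 16 blocks (participants) receives all eight treatment combinations in an independently shuffled order. The function name and the seed value are hypothetical and are not taken from the report.

```python
import random
from typing import List

TREATMENTS = ["(1)", "a", "b", "ab", "c", "ac", "bc", "abc"]


def randomized_blocks(n_blocks: int = 16, seed: int = 2013) -> List[List[str]]:
    """Return one independently shuffled run order of all treatments per block."""
    rng = random.Random(seed)  # fixed seed (assumed) so the plan can be reproduced
    return [rng.sample(TREATMENTS, k=len(TREATMENTS)) for _ in range(n_blocks)]


if __name__ == "__main__":
    for block_number, order in enumerate(randomized_blocks(), start=1):
        print(f"Block {block_number:2d}: {' '.join(order)}")
```

With 16 blocks of eight puzzles the plan contains 128 runs in total, and each participant sees every treatment combination exactly once.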
3.2 STATISTICAL ANALYSIS
The data from the two experiments were then analyzed separately in two subsections. SAS Software and Design Expert were both used, and their outputs compared, to aid in the analysis of the data.
CHAPTER FOUR
4.0 RESULTS AND ANALYSIS OF DATA
4.1 INTRODUCTION
In this chapter, the experimental results are presented as given by the SAS software and Design Expert analysis tools, and then explained.
The data used for the analysis of the two experiments, easy and medium, are found in appendices 3.0 and 4.0 respectively.
The analysis is divided into two parts: the first part covers the easy experiment and the second covers the medium experiment.
Out of the 16 individuals tasked with solving the Sudoku puzzles with easy difficulty rating, only 13 returned the puzzles in time to be analyzed in this report. This represents a return rate of 81%. Fortunately, since the design was blocked on replicates, this had little to no effect on the analysis of the results in this report.
For the Sudoku puzzles with medium difficulty rating, however, only 11 out of 16 individuals returned their puzzles in time for analysis. This translates to a return rate of 69%.
4.2 SUMMARY OF THE EXPERIMENTS
DESIGN DETAILS
Design type: Two-level
Design description: Repeated
Number of factors: 3
Number of runs: 128
Resolution: Full
Number of blocks: 16
FACTORS
Factors and Levels:
Factor    Label                     Low    Center    High
C         Number of Givens          -1     0         1
B         Distribution of Givens    -1     0         1
A         Redundancy of Givens      -1     0         1
RESPONSE
Response    Label    Units
Y1          Time     seconds
BLOCK INFORMATION
Block name: BLOCK
Block label: INDIVIDUAL
Number of blocks: 16
4.3 ANALYSIS OF SUDOKU PUZZLES WITH EASY DIFFICULTY RATING
4.3.1 Factors Effects
The analysis started by looking at the effects of the various factors as presented by the SAS software.
To maintain a hierarchical model, the model was screened by omitting only the three-factor interaction ABC. The ABC interaction was therefore left out of the whole analysis.
Main Factors and Interaction Plots
Figure 3: Main Effects plot
The main effects plot above shows that, for factor C, the mean of all response values for which C=-1 is 600, while that for which C=+1 is 500. The interpretation is similar for factor A, while for factor B the mean response values for B=-1 and B=+1 were the same.
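The main effect of a two-level factor is the difference between the mean response at its high level and the mean response at its low level. As a rough worked example, using the approximate means quoted from the main effects plot for factor C (these are plot readings, not exact estimates):

$$\text{Effect}_C = \bar{y}_{C=+1} - \bar{y}_{C=-1} \approx 500 - 600 = -100$$

The negative sign indicates that the completion time falls as the number of givens rises from its low to its high level.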
Figure 4: Interaction Plots
From the interaction plots above, there is an apparent interaction between factors B and C, and also between B and A, because the lines cross. However, crossed lines alone do not establish whether one or both of these interactions are statistically significant, even though the plots strongly suggest an effect.
To determine which factors or interactions have a significant effect, other tools such as the effects list and the half-normal plot were used.
Table 8: Effects List
From Table 8 above, factors C, A and the interaction BA were significant, while factors B, CB and CA were not significant since they had p-values greater than 0.05.
Factors C and A exhibit negative effects while the others contribute positive effects.
Figure 9: Half – Normal Plot
The half-normal plot was then used to analyze the same information with a more visual tool.
The plot identified variable C as the most significant factor. The other factors fall on or near the line of insignificant effects, as shown in Figure 9 above.
Figure 10: Pareto Plot
To give the percentage contribution of each factor, a more appropriate plot (the Pareto plot) was used, as shown in Figure 10 above. Once again, factor C has the highest contribution while factor B has the least.
4.3.2 The Analysis of Variance (ANOVA)
Table 9: ANOVA FOR EASY EXPERIMENT
ANOVA for Y1, Master Model
Source    DF     SS          MS          F           Pr > F
C         1      457788.5    457788.5    11.18521    0.001228
B         1      2803.846    2803.846    0.068507    0.794157
Model     18     8966008     498111.5    12.17043    0.0001
Error     85     3478881     40928.01
Total     103    12444888
When the ANOVA was run using the full model and the p-values were used to identify variables meeting the α = 0.05 requirement, variables C, A and the interaction BA were found to be significant. The table above shows the details.
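Each F value in an ANOVA table is the mean square of the source divided by the error mean square, and each mean square is a sum of squares divided by its degrees of freedom. The short Python sketch below recomputes the F values from the sums of squares reported for the easy experiment; the helper names are hypothetical, and the p-values are not recomputed because that would require an F-distribution routine outside the standard library.

```python
def mean_square(sum_of_squares: float, df: int) -> float:
    """Mean square = sum of squares divided by its degrees of freedom."""
    return sum_of_squares / df


def f_ratio(ms_source: float, ms_error: float) -> float:
    """F statistic = mean square of the source over the error mean square."""
    return ms_source / ms_error


# Values copied from the ANOVA table for the easy experiment
ms_error = mean_square(3478881, 85)
f_c = f_ratio(mean_square(457788.5, 1), ms_error)
f_b = f_ratio(mean_square(2803.846, 1), ms_error)
f_model = f_ratio(mean_square(8966008, 18), ms_error)

if __name__ == "__main__":
    print(f"MS error = {ms_error:.2f}")
    print(f"F(C) = {f_c:.5f}, F(B) = {f_b:.6f}, F(Model) = {f_model:.5f}")
```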
Figure 11: Residual Plots (Normal Plot of Residuals; Residual vs. Predicted)
The next step was to check model adequacy via the residual plots shown in the figure above. The plots indicated that the normality and equal variance assumptions were adequately met: the normal probability plot passed the “fat pencil” test and the residual vs. predicted plot showed no patterns.
Since the model appeared adequate, there was no need to transform the data to find a more accurate model.
In summary, when considering Easy level Sudoku puzzles, the following factors are significant:
C – Number of Givens (negative)
A – Repetition of Numbers (negative)
BA – interaction effect (positive)
Based on the percentage contributions shown in the Pareto plot, the Number of Givens is by far the most significant factor.
Table 12: Fit Statistics for Y1
                 Master Model    Predictive Model
Mean             541.7308        541.7308
R-square         72.05%          68.88%
Adj. R-square    66.13%          63.99%
RMSE             202.3067        208.5942
CV               37.34451        38.50514
Response(s):
Response    Est. Value
Y1          300 [147.6909, 452.3091]
From the SAS output above, the fitted model estimates that a person solving an easy rating Sudoku puzzle would take approximately 300 seconds to complete it accurately, with a 95% confidence interval of [147.6909, 452.3091] seconds (α = 0.05).
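The Master Model fit statistics can be reproduced from the Model and Error rows of the ANOVA table together with the mean response. The Python sketch below uses the usual textbook definitions (R-square as model SS over total SS, adjusted R-square from the error and total mean squares, RMSE as the square root of the error mean square, CV as RMSE relative to the mean); the function name is hypothetical, and it is an assumption that the software used exactly these definitions, although the numbers agree.

```python
import math
from typing import Dict


def fit_statistics(ss_model: float, ss_error: float, df_model: int,
                   df_error: int, mean_response: float) -> Dict[str, float]:
    """Compute R-square, adjusted R-square (both in %), RMSE and CV (%)."""
    ss_total = ss_model + ss_error
    df_total = df_model + df_error
    ms_error = ss_error / df_error
    r_square = ss_model / ss_total
    adj_r_square = 1 - ms_error / (ss_total / df_total)
    rmse = math.sqrt(ms_error)
    return {
        "r_square_pct": 100 * r_square,
        "adj_r_square_pct": 100 * adj_r_square,
        "rmse": rmse,
        "cv_pct": 100 * rmse / mean_response,
    }


# Easy experiment: Model and Error rows of the ANOVA table, mean from the fit table
easy = fit_statistics(8966008, 3478881, 18, 85, 541.7308)

if __name__ == "__main__":
    for name, value in easy.items():
        print(f"{name}: {value:.4f}")
```

Calling the same function with the medium experiment values (model SS 3959457, error SS 3238535, 16 and 71 degrees of freedom, mean 644.0795) gives figures that agree with the Master Model column reported for that experiment.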
4.4 ANALYSIS OF SUDOKU PUZZLES WITH MEDIUM DIFFICULTY RATING
4.4.1 Factors Effects
The analysis began by looking at the effects of the various factors as presented by the SAS software.
To maintain a hierarchical model, the model was screened by omitting only the three-factor interaction ABC. The ABC interaction was therefore left out of the whole analysis.
Main Factors and Interaction Plots
Figure 11: Main Effect Plots
The main effects plot above shows that, for factor C, the mean response values for C=-1 and C=+1 were the same. The interpretation is similar for factor A, while for factor B the mean of all response values for which B=-1 is 625, while that for which B=+1 is 675.
Figure 12: Interaction Plots
From the plots, there are possible interactions between factors A and C, C and B, and A and B, because the lines cross. However, crossed lines alone do not establish whether any of these interactions is statistically significant, even though the plots strongly suggest an effect.
To determine which factors or interactions have a significant effect, other tools such as the effects list and the half-normal plot were used.
Table 13: Effects List
From the effects list above, all the main factors and their interactions appear to be insignificant, since they all have p-values greater than 0.05 when α = 0.05 is taken as the level of significance.
To confirm the results of the effects list, the half-normal plot was used as indicated below.
Figure 13: Half – Normal Plot
Using a more visual tool (the half-normal plot), it was further confirmed that no factor is significant, since some effects fall on the Lenth’s PSE line while the remaining ones lie between the Lenth’s PSE line and the RMSE line.
Figure 14: Pareto Plot
To give the percentage contribution of each factor, a more appropriate plot (the Pareto plot) was used, as shown in Figure 14 above. The interaction B*A has the highest contribution while factor C has the least, even though none of these factors is significant.
4.4.2 The Analysis of Variance (ANOVA)
Table 14: ANOVA FOR MEDIUM EXPERIMENT
ANOVA for Y1, Master Model
Source    DF    SS          MS          F           Pr > F
C         1     63.92045    63.92045    0.001401    0.970243
B         1     38097.28    38097.28    0.835226    0.363859
A         1     3108.284    3108.284    0.068144    0.794814
C*B       1     2967.284    2967.284    0.065053    0.799418
C*A       1     13975.92    13975.92    0.306401    0.581636
B*A       1     122478.3    122478.3    2.685152    0.105712
Model     16    3959457     247466.1    5.425321    0.0001
Error     71    3238535     45613.17
Total     87    7197992
When the ANOVA was run using the full model and the p-values were used to identify variables meeting the α = 0.05 requirement, none of the variables had a p-value less than 0.05, confirming that no factor was significant. The table above shows the details.
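The degrees of freedom in the ANOVA table can be checked against the number of returned blocks. As a sketch, assuming each of the 11 participants returned all eight puzzles and that the model degrees of freedom beyond the six factorial terms belong to the block (participant) term:

$$df_{Total} = 11 \times 8 - 1 = 87, \qquad df_{Model} = 6 + (11 - 1) = 16, \qquad df_{Error} = 87 - 16 = 71$$

These values agree with the Model, Error and Total rows reported for the medium experiment.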
Figure 15: Residual Plots
The next step was to check model adequacy via the residual plots shown in the figure above. The plots indicated that the normality and equal variance assumptions were adequately met: the normal probability plot passed the “fat pencil” test and the residual vs. predicted plot showed no patterns.
Since the model appeared adequate, there was no need to transform the data to find a more accurate model.
In summary, none of the hypothesized factors, i.e. Number of givens, Distribution of the givens and Redundancy of the givens, or their interactions was found to have a significant influence on the difficulty of medium rating Sudoku puzzles.
4.4.3 Model Prediction
Table 15: Fit Statistics for Y1
                 Master Model    Predictive Model
Mean             644.0795        644.0795
R-square         55.01%          55.01%
Adj. R-square    44.87%          44.87%
RMSE             213.5724        213.5724
CV               33.15932        33.15932
Response(s):
Response    Est. Value
Y1          464.75 [314.1888, 615.3112]
From the SAS output above, the estimation table shows that a person solving a medium rating Sudoku puzzle would take approximately 464.75 seconds to complete it accurately, with a 95% confidence interval of [314.1888, 615.3112] seconds (α = 0.05).
CHAPTER FIVE
5.0 CONCLUSION AND RECOMMENDATIONS
5.1 CONCLUSION
Based on the two experiments discussed in this study, it is clear that the number of givens (i.e.
factor C) had the greatest effect on the difficulty of an Easy rating Sudoku puzzle. This factor
also seemed to have some minimal effect on the Medium rating Sudoku puzzle, but it failed to reach
the significance level, as shown by the p-value in ANOVA Table 14.
In addition, the Redundancy (repetition) of the givens (i.e. factor A) also had a significant
effect on the difficulty of an easy rating Sudoku puzzle, but this factor also failed to reach
significance in the medium rating Sudoku puzzle.
Furthermore, the interaction factor BA also exhibited a significant influence in the easy rating
experiment.
Nonetheless, in the medium rating experiment, none of the hypothesized factors (i.e.
Number of givens, Distribution of the givens and Redundancy of the givens) or their
interactions was found to have a significant influence on the difficulty of medium rating
Sudoku puzzles.
The Adjusted R² values reported for the Easy and Medium experiments were both
unusually small: 0.6613 (66.13%) and 0.4487 (44.87%) for the easy and medium
rating experiments respectively. Normally, this may be a cause for concern. In these particular
experiments, the ANOVA for both experiments shows that the block effects were largely
significant. When the block effects are that significant, it becomes difficult to
pick a single model that accurately describes the general behavior of the response.
Since the Adjusted R² values for these experiments are small, the regression model is not
expected to accurately predict the completion time for any puzzle. Nevertheless, according to these
experiments, the roughly projected completion times are 300 seconds and 464.75 seconds for the
easy and medium experiments respectively.
5.2 RECOMMENDATIONS
After the successful completion of the experimentation, the investigator makes
the following recommendations:
1. Due to the inaccurate prediction of the completion time, further experimentation is
required to accurately model the completion time for a specific person and specific
Sudoku puzzle.
2. In similar studies, participants (blocks) should be properly trained first to reduce the
significance of their effects.
3. Since, in this study, the investigator found no significant effect of the hypothesized
factors in the medium experiment, different factors and/or careful analysis of these
factors should be considered in future studies.
APPENDICES
1.0 Data Collection Tool for Easy Rating Sudoku Puzzles
Kindly complete the Sudoku puzzles below and provide the start time, end time and the delta time (i.e. the difference between the start and end times).
Provide the delta time in seconds.
1.
9 8 6 7 1 5
6 8 2 3 4
7 5 1 4 9 2
8 2 4 9
2 4 9 5 3 8 1
5 9 3 8 6
9 1 7 3 4
7 8 2 1 5
4 3 8 1 6 7
Start time:……………………..
End time:……………………….
Delta time:…………………….
2.
9 7 5 3
1 2 4 8
3 1 8 6
5 6 2
2 3 4 9 1 5
3 6 7
9 2 7 1
6 4 8 9
5 1 4 2
Start time:……………………..
End time:……………………….
Delta time:…………………….
3.
8 2 6 5 3
5 9 2 4
1 7 4 9 5
9 5 2 1 3 4
8 7 9
3 8 9 6 1
6 3 9 1 7 5
1 8 4 6
7 5 8 2
Start time:……………………..
End time:……………………….
Delta time:…………………….
4.
6 3 4 9 1
5 6 2 8
8 1 7 6
4 1 2 6 7
9 7 5 4
3 8 7
4 3 1 9
1 6 5 4
7 2 8 3 5
Start time:……………………..
End time:……………………….
Delta time:…………………….
5.
8 3
9 1 2 7 4
2 8 6 7 9
5 9 8 2
8 6 7 4 3
2 3 9 8 4
4 1 3 5 7
7 8 9 5 1
9 7 8 6
Start time:……………………..
End time:……………………….
Delta time:…………………….
6.
9 2 5 7 8 6
5 7 4 1 3 2
8 3 2 9 6
2 1 5 9 4 6
4 1 6 3 2
6 8 2 1 3
3 7 9 5 4
2 4 1 6 7
4 7 5 6 1 9
Start time:……………………..
End time:……………………….
Delta time:…………………….
7.
4 9 7 6
1 7 4 8 5
8 9 5 4 7
4 9 7 8 2 5
2 5 3 4 8 6
8 6 1 4 7
5 6 4 2 3
4 6 5 7 9
2 1 3 7 5 4
Start time:……………………..
End time:……………………….
Delta time:…………………….
8.
4 7 5 3 1
7 6 1 9
1 2 8 3
1 5 7 4 3
9 8 1
5 2 1 6 7
8 4 3
3 7 2 1 5
7 9 6 1 2
Start time:……………………..
End time:……………………….
Delta time:…………………….
2.0 Data Collection Tool for Medium Rating Sudoku Puzzles
Kindly complete the Sudoku puzzles below and provide the start time, end time and the delta time (i.e. the difference between the start and end times).