Artificial Intelligence CIS 342 The College of Saint Rose David Goldschmidt, Ph.D. March 6, 2009
Dec 24, 2015
Crossword Puzzle Construction
Given:
– Dictionary of valid words and phrases
– Empty crossword grid
Problem:
– Fill the crossword grid such that all words, both across and down, are valid
– Assign clues
Crossword Puzzle Construction
Depth-First Search (DFS)
– Fill in words until a solution is found or a dead-end is encountered
– Backtrack from dead-ends
– Questions:
    Where do we start?
    What word do we fill in next?
    What backtracking strategies do we use?
    How do we avoid repetition (boring puzzles)?
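The fill-and-backtrack loop can be sketched in a few lines of Python. This is only an illustration under assumed data structures (none of them come from the slides): each slot is a list of (row, col) cells, the grid is a dict from cell to letter, and slots are simply filled in the order given.

```python
def fill_grid(slots, words, grid=None, used=None):
    """Depth-first fill with backtracking.

    slots: list of slots, each a list of (row, col) cells (assumed layout).
    words: iterable of dictionary words.
    Returns a dict mapping each cell to a letter, or None at a dead end.
    """
    grid = {} if grid is None else grid
    used = set() if used is None else used
    if not slots:
        return grid  # every slot is filled: solution found
    slot, rest = slots[0], slots[1:]
    for word in words:
        # Skip repeated words and words of the wrong length.
        if word in used or len(word) != len(slot):
            continue
        # A word fits only if it agrees with every letter already placed.
        if any(grid.get(cell, ch) != ch for cell, ch in zip(slot, word)):
            continue
        new_grid = dict(grid)
        new_grid.update(zip(slot, word))
        result = fill_grid(rest, words, new_grid, used | {word})
        if result is not None:
            return result
        # Otherwise backtrack: drop this word and try the next one.
    return None  # dead end: no word fits this slot
```

Called with the four slots of a 2x2 grid and the words AT, TO, AN, NO, the search first tries TO and AN in the second row, hits dead ends, backtracks, and settles on AT over NO.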
Crossword Puzzle Construction
Optimize the DFS:
– Add longer (most constrained) words first
– Associate weights with words in dictionary based on frequency of letters
    Friendly crossword puzzle words include letters: S, R, E, T, D, A, I, L
    Unfriendly crossword puzzle words include letters: J, Q, X, Z, F, V, W
    e.g. quiz, fix, jazz, quaff, xylophone, wax
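One way to turn the friendly and unfriendly letter lists into word weights is sketched below. The scoring rule (+1 per friendly letter, -1 per unfriendly letter) and the function names are assumptions made for illustration; the slides only say that weights are based on letter frequency.

```python
FRIENDLY = set("SRETDAIL")
UNFRIENDLY = set("JQXZFVW")

def word_weight(word):
    """Assumed scoring: +1 per friendly letter, -1 per unfriendly letter."""
    return sum((ch in FRIENDLY) - (ch in UNFRIENDLY) for ch in word.upper())

def order_candidates(words):
    """Longer (most constrained) words first, then friendlier words first."""
    return sorted(words, key=lambda w: (-len(w), -word_weight(w)))
```

Under this scoring, "stare" gets weight 5 while "jazz" gets -2, so the DFS would try "stare" before "jazz" among words of equal length.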
[Figure: genetic algorithm example. Binary chromosomes X1 to X6 of Generation i, each with a fitness value f (between 14 and 56), and the resulting chromosomes X1 to X6 of Generation (i + 1) (f between 44 and 56). Panels: Crossover, Mutation.]
Crossword Puzzle Construction
Genetic Algorithm (GA)
– Evolve a solution by crossovers and mutations through many generations
– Initial population of crossword grids:
    Random letters?
    Random letters based on Scrabble® frequencies?
    Random words from dictionary?
– Fitness of each grid is number of valid words
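The fitness measure and the two genetic operators can be sketched as follows. The representation is an assumption: a grid is a list of equal-length strings with '#' for black squares, crossover swaps whole rows at a single point, and mutation replaces individual letters at random. Other encodings are equally possible.

```python
import random

def runs(lines):
    """Yield every maximal run of two or more letters, splitting on '#'."""
    for line in lines:
        for run in line.split("#"):
            if len(run) >= 2:
                yield run

def fitness(grid, dictionary):
    """Number of across and down entries that are valid words."""
    down = ["".join(column) for column in zip(*grid)]
    return sum(run in dictionary for run in runs(list(grid) + down))

def crossover(parent_a, parent_b, point):
    """Single-point crossover on rows: top of parent_a, bottom of parent_b."""
    return parent_a[:point] + parent_b[point:]

def mutate(grid, rng, rate=0.05, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Replace each letter with a random one with probability `rate`;
    black squares ('#') are never changed."""
    return [
        "".join(
            rng.choice(alphabet) if ch != "#" and rng.random() < rate else ch
            for ch in row
        )
        for row in grid
    ]
```

For example, with the dictionary {AT, NO, AN, TO}, the grid AT over NO has fitness 4, since both across entries and both down entries are valid.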
Solving Crossword Puzzles
Given:
– Crossword grid
– Clues
Problem:
– Fill the grid such that all words correctly answer the given clues
Solving Crossword Puzzles
Obtain candidate answers for each clue
– Assign a confidence value to each candidate
– Are we guaranteed to have the correct answer?
Place candidate answers in grid until a solution is found or a dead-end occurs
– Which backtracking strategies should we use?
Solving Crossword Puzzles
PROVERB (Duke University, 1999)
– Modules provide candidate answers from dictionaries, encyclopedias, movie databases, etc.
– One module draws on a Crossword Puzzle Database of exactly 5142 previously solved puzzles
    Pivotal in PROVERB’s success
– Another module generates all combinations of letters (ouch!)
Solving Crossword Puzzles
GCV solved a 13x13 puzzle with 68 clues
– Many clues are fill-in-the-blank or pop-culture clues
– Candidate answers obtained from Google results page (top 50)
– Solved using 559 Google queries
– Queries yielded 68 correct answers
    44 correct answers had highest confidence
Clue Preprocessing
Categorize clues based on text and type of clues:
– Fill-in-the-blank clues
– Synonyms/Antonyms
– “Type of” (or “Kind of”) clues
– Abbreviations
– Clues with “and” or “or”
– Singular or plural
– Number of words in answer
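Several of these categories can be detected from the clue text alone. The sketch below uses simple pattern rules that are assumptions for illustration, not the rules GCV actually uses; the tag names are made up, and the synonym/antonym and singular/plural categories are left out because they need more than surface patterns.

```python
import re

def categorize(clue):
    """Return a set of category tags for a clue (illustrative rules only)."""
    tags = set()
    if "___" in clue:
        tags.add("fill-in-the-blank")
    if re.search(r"\(abbr\.\)", clue, re.IGNORECASE):
        tags.add("abbreviation")
    if re.match(r"(type|kind) of\b", clue, re.IGNORECASE):
        tags.add("type-of")
    if re.search(r"\b(and|or)\b", clue):
        tags.add("and-or")
    if re.search(r"\(\d+ words\)", clue):
        tags.add("multi-word")
    return tags
```

A clue can carry several tags at once: "Mary ___ little lamb (2 words)" is both a fill-in-the-blank clue and a multi-word clue.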
Clue Preprocessing
Translate clues to Google-friendly forms
– “To ___ is human”
    “To * is human”
    “To * * is human”
– “Mary ___ little lamb” (2 words)
    “Mary * * little lamb”
– “___ to Joy” by Beethoven
    “* to Joy” by Beethoven
    “* * to Joy” by Beethoven
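The blank-to-wildcard translation shown above can be sketched as a small function. The function name and the default of trying one and two wildcards are assumptions chosen to match the examples on the slide.

```python
import re

def blank_queries(clue, max_words=2):
    """Turn a fill-in-the-blank clue into wildcard queries.

    If the clue states the answer length, e.g. "(2 words)", only that many
    wildcards are used; otherwise 1 up to max_words wildcards are tried.
    """
    match = re.search(r"\((\d+) words\)", clue)
    text = re.sub(r"\s*\(\d+ words\)", "", clue)
    counts = [int(match.group(1))] if match else range(1, max_words + 1)
    return [text.replace("___", " ".join("*" * n)) for n in counts]
```

For "To ___ is human" this yields "To * is human" and "To * * is human"; for "Mary ___ little lamb (2 words)" it yields only "Mary * * little lamb".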
Clue Preprocessing
Translate clues to Google-friendly forms
– Diplomacy
    synonyms of Diplomacy
– Not dry
    opposite of dry
    antonyms of dry
– Joy
    synonyms of Joy
Clue Preprocessing
Translate clues to Google-friendly forms
– Type of dancing [or Kind of dancing]
    * dancing
– Second sight (abbr.)
    Second sight
    abbreviations of Second sight
– Superman’s admirer
    admirer of Superman
Clue Preprocessing
Translate clues to Google-friendly forms
– Couldn’t move
    Could not move
    Could opposite of move
    Could antonyms of move
– Knight or Danson
    Knight Danson
Clue Preprocessing
Translate clues to Google-friendly forms
– Bosley and Arnold
    Bosley Arnold
    Append an ‘s’
– Henson, and others [or Henson, and namesakes]
    Henson
    Append an ‘s’
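The "and"/"or" rules above can be sketched as a function that returns the query to send and whether an ‘s’ should be appended to each candidate answer. The function names and the regular expressions are assumptions covering only the clue shapes shown on the slides.

```python
import re

def name_query(clue):
    """Return (query, plural) for 'X or Y', 'X and Y', and
    'X, and others' / 'X, and namesakes' clues."""
    match = re.fullmatch(r"(\w+),? and (?:others|namesakes)", clue)
    if match:
        return match.group(1), True
    match = re.fullmatch(r"(\w+) (and|or) (\w+)", clue)
    if match:
        # 'and' clues name several people, so the answer is plural.
        return f"{match.group(1)} {match.group(3)}", match.group(2) == "and"
    return clue, False

def finish(candidate, plural):
    """Append an 's' to a candidate answer when the clue calls for a plural."""
    return candidate + "s" if plural else candidate
```

So "Knight or Danson" becomes the query "Knight Danson" with candidates left as they are, while "Bosley and Arnold" becomes "Bosley Arnold" with an ‘s’ appended to each candidate.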
Results of Google-Querying
GCV excels at solving fill-in-the-blank and pop-culture clues
– Why?
Though results are encouraging, using keyword-based searching is limited
– Why?
Populating the Crossword Grid
Use a Depth-First Search (DFS) algorithm:
– Fill in the crossword grid based on confidence values of candidate words
– At each iteration:
    Select candidate word with highest confidence value amongst clues not yet placed
    Attempt to fit candidate word into grid
– Halt when a solution is found or a dead-end occurs
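The confidence-driven iteration above can be sketched as a recursive search. The data layout is assumed for illustration: `slots` maps a clue id to its grid cells, and `candidates` maps a clue id to (confidence, word) pairs. On a dead end this sketch uses plain backtracking, the simplest of the strategies discussed on the following slides.

```python
def solve(slots, candidates, grid=None):
    """Place the highest-confidence candidate among unplaced clues first.

    slots: {clue_id: [cells]}; candidates: {clue_id: [(confidence, word)]}.
    Returns {clue_id: word}, or None when a dead end is reached.
    """
    grid = {} if grid is None else grid
    if not slots:
        return {}  # nothing left to place: solution found
    # Every (confidence, clue, word) option for unplaced clues, best first.
    options = sorted(
        ((conf, cid, word)
         for cid in slots
         for conf, word in candidates.get(cid, [])),
        reverse=True,
    )
    for conf, cid, word in options:
        cells = slots[cid]
        if len(word) != len(cells):
            continue
        # Attempt to fit: the word must agree with letters already placed.
        if any(grid.get(cell, ch) != ch for cell, ch in zip(cells, word)):
            continue
        new_grid = {**grid, **dict(zip(cells, word))}
        remaining = {k: v for k, v in slots.items() if k != cid}
        result = solve(remaining, candidates, new_grid)
        if result is not None:
            return {cid: word, **result}
        # Dead end below this placement: backtrack and try the next option.
    return None
```

Retrying every remaining option after a failure keeps the sketch short but explores some placements more than once, so it is meant as an illustration rather than an efficient solver.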
Populating the Crossword Grid
When a dead-end occurs, what do we do?
– Backtrack: Remove last word placed in grid
    Disadvantages?
– Backjump: Identify culprit and remove all words back to culprit word
    Disadvantages?
Populating the Crossword Grid
When a dead-end occurs, what do we do?
– Extricating Backjump: Identify and remove the culprit
    Disadvantages?
– How do we identify the culprit?
Extricating Backjumping
Assign weights to the squares of the grid
– Square weights correspond to confidence values of candidate words placed
– e.g. Place TWAIN with confidence value of 10 at 5-Across
Extricating Backjumping
Define grid weight of a word as the sum of its individual square weights
– e.g. TWAIN = 100, NOW = 72
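The grid-weight definition can be sketched as below. One assumption is made loudly here: each square accumulates the confidence value of every word placed through it, so crossing squares weigh more. The slides do not spell out the accumulation rule, and the TWAIN = 100 and NOW = 72 figures come from a grid that is not reproduced here, so the sketch does not attempt to match them.

```python
from collections import defaultdict

def square_weights(placements):
    """placements: list of (cells, confidence) for words placed in the grid.
    Assumed rule: a square's weight is the sum of the confidence values of
    all words passing through it."""
    weights = defaultdict(int)
    for cells, confidence in placements:
        for cell in cells:
            weights[cell] += confidence
    return weights

def grid_weight(cells, weights):
    """Grid weight of a word: the sum of its individual square weights."""
    return sum(weights[cell] for cell in cells)
```

For example, a five-letter across word placed with confidence 10 and a three-letter down word placed with confidence 6 that shares its last square would have grid weights 56 and 28 under this rule.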
Limitations of Keyword-Based Search
Google and GCV use keyword-based tricks to artificially improve result sets
– Word frequency & proximity to other words
– Additional keywords to help direct queries to good candidate answers
    e.g. synonyms of
– Grammatical and structural rearrangements
Lack of precision in keyword-based search
– Irrelevant results in candidate answer lists
– Confidence values based on word frequency produce many false positives
– Correct answer is often buried in other mediocre (and incorrect!) candidates
Limitations of Keyword-Based Search