Rosangela Briscese
Pasquale J. Festa
Jennifer Noble
Sarah Ticer
INF384C – Organizing and Providing Access to Information
Prof. Efron
3.29.2007

Crossword Puzzle Classification Final Model

Method and Motivation

Repeatability across different categories of users was integral to our crossword puzzle classification model. For that reason, our group set out to create a scheme with as little ambiguity as possible in its goals, its directions, and the knowledge it requires of users. To build the system, we began by collecting as much data as possible about the sample puzzles we were given. We examined a number of puzzle features (all features considered are listed at the end of this essay). We then gave precedence to features that showed statistically significant differences between groups of days (e.g., Monday, Tuesday, Wednesday, and Thursday versus Friday and Saturday) or between individual days (e.g., Monday versus Tuesday), and treated these as candidates for our primary classification features (all features we considered important to the design of our model are highlighted in the included data set). After whittling down the number of features to examine, we designed a model that acts as a filtering system. It classifies a puzzle as a specific day by running it through a series of simple tests that require only basic arithmetic and clear, simple directions. We present the model as a flowchart to appeal to both verbal and visual learners.
Key Puzzle Features

Total Number of Black Squares

The number of black squares in a puzzle grid proved significant for classification, and we used it to divide the week into two groups: "Monday, Tuesday, Wednesday or Thursday" and "Friday or Saturday". We chose this feature as our first criterion because puzzles with 34 or fewer black squares tended to be "Friday or Saturday" puzzles, while most puzzles with more than 34 black squares were "Monday, Tuesday, Wednesday or Thursday" puzzles. Splitting the full set into two smaller groups made it easier to work toward a means of definitively classifying each puzzle as a specific day of the week.

Criterion 1:
If total # of black squares ≤ 34, predict "Friday or Saturday"
If total # of black squares > 34, predict "Monday, Tuesday, Wednesday or Thursday"
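Criterion 1 is a single threshold test, so it can be sketched in a few lines of Python. This is a minimal illustration and not part of the original flowchart model. The function names, the label constants, and the grid encoding used to count black squares (rows of text with '#' marking a black square) are all hypothetical assumptions.

```python
# Hypothetical label strings mirroring the two groups produced by Criterion 1.
FRI_SAT = "Friday or Saturday"
MON_THU = "Monday, Tuesday, Wednesday or Thursday"

# Threshold from Criterion 1: 34 or fewer black squares points to Friday/Saturday.
BLACK_SQUARE_THRESHOLD = 34


def count_black_squares(grid: list[str]) -> int:
    """Count black squares in a grid given as rows of text.

    Assumed (hypothetical) encoding: '#' marks a black square,
    any other character is a white square.
    """
    return sum(row.count("#") for row in grid)


def criterion_1(black_squares: int) -> str:
    """Apply Criterion 1 to a puzzle's total black-square count."""
    if black_squares < 0:
        raise ValueError("black square count cannot be negative")
    if black_squares <= BLACK_SQUARE_THRESHOLD:
        return FRI_SAT
    return MON_THU


if __name__ == "__main__":
    # Boundary cases: exactly 34 falls in the Friday/Saturday group.
    print(criterion_1(34))  # Friday or Saturday
    print(criterion_1(35))  # Monday, Tuesday, Wednesday or Thursday
```

The comparison uses `<=` so that a puzzle with exactly 34 black squares lands in the "Friday or Saturday" group, matching the "≤ 34" wording of the criterion.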
Total Number of Answers Longer than 5 Letters

Our first test split the week into a set of four days ("Monday, Tuesday, Wednesday or Thursday") and a set of two ("Friday or Saturday"). We therefore next focused on a feature that would separate "Friday" puzzles from "Saturday" puzzles, so that two days could be classified definitively and set aside. Our data showed a significant split between "Friday" and "Saturday" puzzles with regard to the total number of puzzle