02/26/14 - the database server is currently DOWN and I am looking into it

Technology

The software on this website is fully automated and follows a series of relatively simple algorithms to try to solve each week's mystery picture. To do so, we use a range of technologies, including:

Mystery Pic Solver

Web Frontend

PHP 5.3, MySQL, CSS/JS/HTML
Yii

How it works

Each week Neopets release a small portion of an image that is blown up by roughly 1700%. In computer science, one of the most natural ways to solve a problem is to first think about what steps a human would take to arrive at the solution, and then automate them. Taking that basic idea, a human would generally:

Navigate to the mystery pic page on Neopets.com and take a look at the mystery picture.
Noting its basic colour scheme, try to think what images may constitute a match.
Using some sort of software (e.g. Paint, Photoshop), or even by hand, compare the two images and see if they match.
If they don't match, try again.

That's essentially what our bot does. However, to speed the process up, we've made some general optimisations.

Image crawler

Rather than crawl Neopets.com every time a new mystery picture is released, which I'm sure neither they nor we would appreciate, we instead do it gradually. By adapting Scrapy, an open source web crawler built in Python, we can crawl Neopets.com and build up our own database of images. Querying our own database is a lot faster than crawling the site for every search, and it also lets us extract information from the images in advance, which speeds up our search further. These images are stored in MySQL, an open source database solution, which allows us to easily query the data.
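To give a rough idea of the image-gathering step, here is a minimal sketch that pulls image URLs out of one page of HTML. It deliberately uses only Python's standard library rather than Scrapy, so it is a stand-in for illustration and not the crawler described above. The names ImageLinkParser and extract_image_urls are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collects the absolute URL of every <img src=...> on a page.

    Hypothetical helper for illustration; the real crawler is built on Scrapy.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the URL of the page itself.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the unique image URLs found in html, in first-seen order."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return list(dict.fromkeys(parser.image_urls))
```

A crawler would call something like this on each page it visits and store any URL it has not seen before, which is what lets the database grow gradually instead of being rebuilt every week.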

For each image in our database, we collect a few additional pieces of information, including its general colour scheme and its priority. The priority is determined by which section of the website we sourced the image from. We use this field as a threshold in our automated search (i.e. we only look for images above a certain priority). If you hear someone saying we ran a deeper, manual search, it means we cross-referenced all images regardless of the priority field.
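The priority threshold can be pictured as a simple filter on the image table. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table layout, the column names and the numeric priorities are all assumptions made for this example, not our real schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with a few made-up rows (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE images (
            id INTEGER PRIMARY KEY,
            url TEXT NOT NULL UNIQUE,
            colour_scheme TEXT NOT NULL,
            priority INTEGER NOT NULL
        )
        """
    )
    conn.executemany(
        "INSERT INTO images (url, colour_scheme, priority) VALUES (?, ?, ?)",
        [
            ("http://example.com/pets/a.gif", "blue", 3),
            ("http://example.com/misc/b.gif", "blue", 1),
            ("http://example.com/pets/c.gif", "red", 3),
        ],
    )
    return conn


def candidates(conn, colour_scheme, min_priority=None):
    """Return URLs to check for one colour scheme.

    min_priority=None ignores the priority field, like a deeper manual search.
    """
    sql = "SELECT url FROM images WHERE colour_scheme = ?"
    params = [colour_scheme]
    if min_priority is not None:
        # Automated search: only images above the threshold.
        sql += " AND priority > ?"
        params.append(min_priority)
    sql += " ORDER BY priority DESC, id"
    return [row[0] for row in conn.execute(sql, params)]
```

With these sample rows, an automated search for "blue" with min_priority=2 would only look at a.gif, while leaving min_priority unset would also include b.gif.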

Image analysis

Once we've downloaded the mystery picture, we need to cross-reference it against our database. To do this, we primarily use Python. As it stands, we have a few different algorithms for discovering the solution.

First solution

First and foremost, we narrow down our image set based on the colour scheme of the mystery picture. There's no point in checking images we know can't be the solution. This simple step narrows the search down to roughly 10% of the original image set. Once we have narrowed the images down, we then look for a 100% match in both the x and y directions (in other words, we assume that the mystery picture was cropped as a perfect square from the original image, with no other changes). This works for most pictures, which is why it is our primary solution.
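The exact-match step can be sketched as a sliding-window comparison. The example below is an illustration under a few assumptions, not the code the bot runs: images are plain lists of rows of (r, g, b) tuples, the enlargement is a whole-number, nearest-neighbour scale-up (the real factor is only "roughly" 1700%), and the function names are invented for this example.

```python
def downscale(pixels, factor):
    """Undo an integer nearest-neighbour enlargement by sampling every factor-th pixel."""
    return [row[::factor] for row in pixels[::factor]]


def find_crop(image, patch):
    """Return (x, y) of the first position where patch matches image exactly, else None.

    Both arguments are lists of rows, each row a list of (r, g, b) tuples.
    """
    patch_h, patch_w = len(patch), len(patch[0])
    image_h, image_w = len(image), len(image[0])
    # Slide the patch over every position where it fits entirely inside the image.
    for y in range(image_h - patch_h + 1):
        for x in range(image_w - patch_w + 1):
            if all(
                image[y + dy][x:x + patch_w] == patch[dy]
                for dy in range(patch_h)
            ):
                return (x, y)
    return None
```

Because every pixel has to match, a single altered pixel (or a rotated crop) makes this search come back empty, which is where a looser fallback becomes useful.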

Secondary solution

We've noticed that on a couple of occasions, Neopets have attempted to be sneaky by manipulating the image (for instance by rotating it). Our secondary solution attempts to solve this problem, and also provides a deeper search should the first fail for whatever reason. It involves checking each pixel in the mystery picture against each image above the priority threshold, and returning those images that constitute a match. It disregards the order of the pixels (which nullifies the effect of rotating) and instead looks for images that are composed of the same colours.
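One way to picture an order-independent comparison is to ask what fraction of the mystery picture's pixels have a colour that appears anywhere in the candidate image. The sketch below does exactly that. It is an assumption about how such a score could be computed (exact colour equality, images as lists of rows of (r, g, b) tuples), and colour_match is a name made up for this example.

```python
def colour_match(mystery, candidate):
    """Fraction of mystery pixels whose exact colour occurs somewhere in candidate.

    Pixel positions are ignored, so rotating or flipping the mystery picture
    does not change the score.
    """
    candidate_colours = {pixel for row in candidate for pixel in row}
    mystery_pixels = [pixel for row in mystery for pixel in row]
    if not mystery_pixels:
        return 0.0
    hits = sum(1 for pixel in mystery_pixels if pixel in candidate_colours)
    return hits / len(mystery_pixels)
```

The trade-off is that a score of 1.0 is weaker evidence than an exact positional match, since any image containing the same colours will pass, so results from this kind of search are worth a second look.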

Images that return a 100% colour match are considered viable solutions, while images that return above 75% are flagged as possible candidates (in which case we get alerted via email and manually check the images). This situation can occur when we have an incomplete image database. For instance, pets often have slightly different poses (and we may not have indexed them all), or we may have a slightly outdated version of an image. By flagging possible matches, we can sometimes still arrive at the solution despite these gaps.
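The two thresholds amount to a small triage step. Here is a minimal sketch, assuming each image has already been given a colour-match score between 0.0 and 1.0. The function names and the "rejected" label are invented for the example, and the email alert is left out.

```python
def classify(score, flag_threshold=0.75):
    """Label a colour-match score: 100% is a solution, above 75% is a candidate."""
    if score >= 1.0:
        return "solution"
    if score > flag_threshold:
        return "candidate"
    return "rejected"


def triage(scores, flag_threshold=0.75):
    """Split a {url: score} mapping into (solutions, candidates) lists of URLs."""
    solutions = []
    candidates = []
    for url, score in scores.items():
        label = classify(score, flag_threshold)
        if label == "solution":
            solutions.append(url)
        elif label == "candidate":
            # These would be the ones that trigger a manual check.
            candidates.append(url)
    return solutions, candidates
```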

Third solution

Our third solution is what we run manually when the first two have failed. It runs the first two solutions again, but disregards each image's priority so that every image in the database is checked. So far, however, it has never found a result that the first two missed. It's more of a 'just in case'.
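Put together, the order of the searches can be sketched as one function that tries each matcher in turn over a pool of images, with the priority filter being optional. Everything here is hypothetical scaffolding for illustration: the (url, priority, pixels) record shape, the matcher signature and the name run_search are assumptions, not our real code.

```python
def run_search(mystery, images, matchers, min_priority=None):
    """Try each matcher in order and return (name, urls) for the first with hits.

    images: list of (url, priority, pixels) tuples.
    matchers: list of (name, function) pairs, where function(mystery, pixels)
        returns True for a match.
    min_priority: only images above this value are searched; None searches
        everything, like the manual 'just in case' run.
    """
    pool = [
        image for image in images
        if min_priority is None or image[1] > min_priority
    ]
    for name, matches in matchers:
        hits = [url for url, _priority, pixels in pool if matches(mystery, pixels)]
        if hits:
            return name, hits
    return None, []
```

An automated run would pass a threshold for min_priority, and the manual run would call the same function again with min_priority left as None.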

Problems and issues

The primary issue arises from our image database being incomplete. While our crawler attempts to go deep within Neopets.com, we can't get every image. It stands to reason that we need some way of inserting images manually, a problem for which we're currently brainstorming solutions. At the time of writing, whenever the image has been in our database, our algorithm has found it.