Original article was published by Matteo Barbieri on Artificial Intelligence on Medium
Imagine you have a very racist old aunt who blames immigrants for climate change (which she denies: logic is beyond her), spreads all kinds of misinformation through her Facebook account and is a hardcore Westboro Baptist Church affiliate. Birds stop singing when she’s around. If you don’t have to imagine that, just focus on the pleasant thought that time will eventually fix that problem for you.
Aunt Karen has only one source of joy in her life: every Tuesday, the weekly issue of “Sudoku eXXtreme 4000” is delivered to her mailbox and she cherishes every moment she spends on those puzzles.
Our goal is to destroy that happiness.
To do so, we will design and implement a system that lets you take a picture of a magazine page containing one Sudoku grid and automatically solves the puzzle in seconds.
This article is part of a series:
- Part 1: Introduction and Project Design
- Part 2: Data preprocessing (coming ~19/10/2020)
- Part 3: Digits recognition and Sudoku solver (coming ~26/10/2020)
- Part 4: Deployment and retro (coming ~02/11/2020)
You should have a plan
When I begin working on a new project, I spend a considerable amount of time trying to make sure that a couple of questions have clear answers (or at least as clear as possible):
- What am I doing? What is the end goal of this project?
- How do I accomplish that?
It’s an iterative process that starts with the high-level description of the task (in our case it’s something like “given a picture of a Sudoku, return another picture of the completed puzzle”) and breaks it down into smaller tasks until they become clear, atomic tasks, each of which can be reduced to some specific problem for which there is a solution in the form of an algorithm.
It does not have to be perfect. No matter how good you are at planning, there will most definitely be something that you overlooked, and you will have to adjust your trajectory mid-flight, so there’s no point fixating on details at this stage. As long as the original plan provides a good enough starting point, that’s ok.
Let’s draw some shapes.
Technical note: for drawing diagrams I use Miro, which quite frankly is such a nice invention that it makes penicillin look like a 4th grader’s soda volcano. Sorry Alexander, this is simply the truth.
Your starting point, as I said before, is the very definition of the problem, nothing more. So from a blank page you should get to this:
It’s not much, and of course there’s a good chance that you actually have a decent idea of where to go next, but let’s take it step by step.
By the way, I’m going to use colors to code the status of a task:
The next natural thing is of course to split the task into a first set of smaller, more manageable ones. When I started thinking about what components I needed for this project I came up with four main tasks, using a bottom-up approach.
- At some point, I expect to have a structured representation of the Sudoku: a 9 x 9 array representing the 81 cells, each of which will either contain a digit from 1 to 9 or be blank. With the information in that format, I expect that an algorithm should be able to solve the puzzle, even though I don’t necessarily know how to do that yet (that’s why that block has a red border).
- To obtain that information, however, it will be necessary to scan the grid and identify the digit present in each cell (or detect that it’s blank). Luckily, if there is one thing that every data scientist is required by law to do, it’s the infamous tutorial on handwritten digit recognition, so that part should be covered (green border). But is it really? 🧙
Spoiler from part 3 (which I will publish in around 2 weeks, if all goes well): it’s actually a bit more complicated than that, but I found an elegant solution to the unexpected setback, if I may say so 🧐.
- We want our system to work with images of Sudoku grids taken in a “natural” setting, without overly strict constraints. Therefore, we’ll have to preprocess the input image in order to neatly crop and rotate the grid out of the full magazine page, plus potentially make any other adjustment required to facilitate the work of the component responsible for digit recognition.
- Finally, it would be a nice touch to present the work in a nice format, say a web app where you upload your image and it is displayed with another image on the side showing the solution to the puzzle. The border is yellow because, while I did something like that in the past, it’s not an aspect of projects that I consider myself an expert in, so I want to remember that this part might require more effort or research than the ones I’m more familiar with.
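To make the first point a bit more concrete, here is a minimal sketch of what such a structured representation could look like in Python. It is only an illustration, not code from the project: the names `Grid`, `parse_grid` and `is_consistent`, the choice of `None` for a blank cell and the sample puzzle are all assumptions made for this example.

```python
from typing import List, Optional

# Assumed representation: 9 rows of 9 cells, None marks a blank cell.
Grid = List[List[Optional[int]]]

# Sample puzzle used only for illustration; '.' stands for a blank cell.
PUZZLE = """
53..7....
6..195...
.98....6.
8...6...3
4..8.3..1
7...2...6
.6....28.
...419..5
....8..79
"""


def parse_grid(text: str) -> Grid:
    """Build a 9 x 9 grid from 81 characters; '.' or '0' means blank."""
    cells = [c for c in text if not c.isspace()]
    if len(cells) != 81:
        raise ValueError(f"expected 81 cells, got {len(cells)}")
    values = [None if c in ".0" else int(c) for c in cells]
    return [values[r * 9:(r + 1) * 9] for r in range(9)]


def is_consistent(grid: Grid) -> bool:
    """Return True if no digit repeats in any row, column or 3 x 3 box."""
    units = list(grid)  # the 9 rows
    units += [[grid[r][c] for r in range(9)] for c in range(9)]  # the 9 columns
    for top in range(0, 9, 3):  # the 9 boxes
        for left in range(0, 9, 3):
            units.append([grid[r][c]
                          for r in range(top, top + 3)
                          for c in range(left, left + 3)])
    for unit in units:
        digits = [d for d in unit if d is not None]
        if len(digits) != len(set(digits)):
            return False
    return True


grid = parse_grid(PUZZLE)
```

A check like `is_consistent` does not solve anything, but it is a cheap way to notice early that the digit recognition step misread a cell, before the grid is handed to whatever solver ends up being used.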
Further expanding the activities described in the third point (preprocessing), we get the final (so to speak) version of the breakdown:
This is good enough for now. Throughout the project you may find out that some of the tasks are more or less complicated than you anticipated, or that they have to be split further into even smaller ones. You may even discover a task that you overlooked while planning. That’s normal; it’s bound to happen when you go from the idea of a product to actually implementing it.
What comes next
That’s it for now. We’re still at exactly 0 lines of code written, but what is arguably one of the most important parts of any project has been completed. Planning is like the bass player in a band: you only notice it when it’s missing¹. In the upcoming articles I will cover how to implement all the tasks listed in the diagram above, starting from data ingestion/preprocessing. See you in a week!
Your joy is soon going to be the memory of a distant past, Aunt Karen.