Source: Deep Learning on Medium
How to Perform Machine Learning Research in a Fast Paced Environment – Post 2/2
In the previous post, I talked about the difference between working as an applied researcher in industry and working as a researcher in academia. I then divided the work we might face into two main categories: a ‘known’ task, with a well-defined set of solutions we can work with (for example, detection or segmentation), and a ‘new’ task, which requires us to harness our knowledge and expertise in order to solve it.
In this post I will present an approach and flow for dealing with a ‘new’ task. For such a task, we do not have a set of possible approaches to choose from as a starting point, so we need to think more outside the box.
1) Break down the task into known building blocks
Trying to solve the problem from scratch is ineffective and time consuming, and will eventually bring us to a point where we are repeating work that was done in the past. What we should do instead is break down our problem into known building blocks. For example: are we dealing with video, or with some temporal understanding of the input? Then look for relevant approaches to temporal analysis that might help us move towards a possible solution. Are we dealing with a unique data modality? Try to find work on a closely related one. Do we need to fuse multiple sources of information? Then look for work in that area.
Breaking the task down into these known sub-tasks will help us set up our approach and, later on, better understand where the difficulties in our task and data lie. It is easier than trying to ‘brute force’ a whole new solution from scratch, and it saves us time.
2) Set up a preliminary approach that fuses the building blocks
Now we should take the sub-tasks and set up a preliminary approach that we can test. Go with a very simple concept for each of these building blocks (once again, not the ‘latest and greatest’). All the guidelines from sections 2, 3 and 4 of the previous post hold.
3) Start by gradually adding model complexity
Just as we broke the task down into sub-tasks, we should also examine our approach step by step, gradually testing whether we are learning what we expect. So we start by solving only part of the task, which is possible by using only some of the building blocks of our approach. This lets us check that we are able to learn what we expect, and eventually saves debugging time.
To wrap up these three sections, here is an example. Let’s say our task is to recognize pedestrians who might barge into the road, and to predict the probability that they will do so. This is important because we don’t want an autonomous car to stop in the middle of the road, or slow down traffic, every time a pedestrian is close to the edge of the sidewalk. Let’s assume we have a labeled dataset for that. Obviously, this is not an easy or straightforward task. Training a model for it end-to-end by brute force probably won’t work; we need a smarter algorithm.
Here is a possible breakdown we might start with. We first detect all pedestrians on the sidewalks, which means we need to segment the sidewalks and perform pedestrian detection. Then we might want to predict the direction the pedestrian is facing (if a pedestrian is about to barge into the road, he will be facing the road and not away from it), and finally how close he is to the road itself.
Later on, on top of that, we might want to add some temporal understanding. For example, maybe the pedestrian has been standing there for a while already, or maybe he got closer to the sidewalk edge in the last few seconds, which would raise the probability that he will jump into the road and that we should slow down. Using all these factors, we can start by training a classifier that outputs a probability as the final output for the original task.
Here we have broken the task down into four sub-tasks, each of which is a simpler task for which we might have some known approaches to start from. There is no guarantee that this will be the final algorithm, but it is a good starting point. We can also start with just detecting pedestrians on the sidewalk, then add the facing direction, and finally train everything together. This is what I suggested in section 3: evaluating the task gradually, one sub-task at a time.
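As a rough sketch of how such building blocks might be wired together and switched on one at a time, consider the Python snippet below. Everything in it is hypothetical: the `PedestrianTrack` fields stand in for the outputs of real segmentation, detection and orientation models, the feature functions are deliberately simplistic, and `WEIGHTS` and `BIAS` are hand-picked placeholders rather than values learned from a labeled dataset.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class PedestrianTrack:
    """Hypothetical per-pedestrian outputs of the upstream building blocks."""
    on_sidewalk: bool          # sidewalk segmentation + pedestrian detection
    facing_road: float         # 0..1 score from an orientation estimator
    distance_to_road_m: float  # metres from the pedestrian to the road
    approach_speed_mps: float  # temporal cue: speed towards the road


# One simple scalar feature per building block (all illustrative).
FEATURES: Dict[str, Callable[[PedestrianTrack], float]] = {
    "facing": lambda t: t.facing_road,
    "proximity": lambda t: 1.0 / (1.0 + t.distance_to_road_m),
    "approach": lambda t: max(0.0, t.approach_speed_mps),
}

# Placeholder parameters; in practice these would be learned from labeled data.
WEIGHTS = {"facing": 2.0, "proximity": 2.0, "approach": 1.5}
BIAS = -3.0


def barge_probability(track: PedestrianTrack, enabled: Sequence[str]) -> float:
    """Logistic combination of the features of the enabled building blocks."""
    score = BIAS + sum(WEIGHTS[name] * FEATURES[name](track) for name in enabled)
    return 1.0 / (1.0 + math.exp(-score))


tracks = [
    PedestrianTrack(True, 0.9, 0.5, 1.2),   # near the edge, facing the road, approaching
    PedestrianTrack(True, 0.1, 3.0, 0.0),   # far from the edge, facing away, standing still
    PedestrianTrack(False, 0.8, 0.2, 0.5),  # not on a sidewalk: outside this sketch's scope
]
candidates = [t for t in tracks if t.on_sidewalk]

# Add building blocks gradually and inspect the output at each stage.
for enabled in (["facing"], ["facing", "proximity"], ["facing", "proximity", "approach"]):
    probs = [barge_probability(t, enabled) for t in candidates]
    print(enabled, [round(p, 3) for p in probs])
```

Enabling one block at a time mirrors the gradual evaluation from section 3: if the output stops making sense right after a block is added, that block is the first place to look.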
4) Overfit, fix, change, scale up, improve
Sections 5 to 8 of the previous post hold here as well. As in the previous post, everything we train or examine starts with a small number of examples. This is the time to see whether our algorithm and approach can work on a small number of examples. We change or fix it according to the results we see on this small subset: change the sub-models if needed, change the algorithm if needed. Eventually this should work well on the small subset. Once we get there, we can scale up to the entire dataset. Once again, when we scale up we start looking at the outputs of the algorithm to see where it performs well and where it fails, and do some research work to make the algorithm better.
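The ‘overfit a small subset first’ check can be sketched like this. It is a minimal illustration only, assuming a made-up two-feature dataset and a plain logistic-regression model written from scratch; `DATASET`, `train_logistic` and `accuracy` are hypothetical stand-ins for whatever data and model we are actually working with.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[float, float], int]  # ((feature_0, feature_1), label)

# Toy, made-up dataset standing in for the real one.
DATASET: List[Example] = [
    ((0.1, 0.2), 0), ((0.8, 0.9), 1), ((0.2, 0.1), 0), ((0.9, 0.7), 1),
    ((0.3, 0.3), 0), ((0.7, 0.8), 1), ((0.2, 0.4), 0), ((0.6, 0.9), 1),
]


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of the positive class under a logistic-regression model."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(data: Sequence[Example], epochs: int = 2000, lr: float = 0.5):
    """Per-example gradient descent on the log loss, starting from zero weights."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            error = predict(w, b, x) - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b


def accuracy(w: Sequence[float], b: float, data: Sequence[Example]) -> float:
    correct = sum((predict(w, b, x) >= 0.5) == bool(y) for x, y in data)
    return correct / len(data)


# Step 1: try to overfit a handful of examples.
subset = DATASET[:4]
w, b = train_logistic(subset)
subset_accuracy = accuracy(w, b, subset)

if subset_accuracy < 1.0:
    print("Cannot fit even the small subset: fix the model or pipeline first.")
else:
    # Step 2: only now scale up to the entire dataset.
    w, b = train_logistic(DATASET)
    print("Subset fitted; accuracy on the full training set:", accuracy(w, b, DATASET))
```

The point of the check is its cheapness: a model that cannot fit a few examples almost certainly has a bug or a design problem, and finding that out takes seconds instead of a full training run.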