Original article was published on Deep Learning on Medium
Orthogonalization in Machine Learning
Orthogonalization is a system design property that ensures that modification of an instruction or an algorithm component does not create or propagate side effects to other system components. Orthogonalization makes it easier to independently verify the algorithms, thus reducing the time required for testing and development.
One of the challenges of developing machine learning systems is that there are so many things you could try to change, for instance the many hyperparameters you could tune. One thing I have noticed is that people who work effectively on machine learning are very clear about what to tune in order to achieve one particular result. This process is what we call orthogonalization.
To do well on a supervised learning system, you typically need to tune the system’s knobs so that four things hold true. First, you usually have to make sure that you are doing well on the training set, at least. Performance on the training set therefore needs to pass some acceptability threshold. For some applications this could mean performing comparably to human level, but that depends on your application.
After doing well on the training set, you hope that this leads to doing well on the dev set, and then on the test set too. Finally, you hope that doing well on the test set on the cost function results in your system performing well in the real world. If your algorithm is not fitting the training set well on the cost function, you want one knob, or perhaps one specific set of knobs, that you can use to adjust your algorithm so that it fits the training set well. The knobs for this could be training a bigger network, or switching to a better optimization algorithm such as Adam.
Conversely, if you find that the algorithm does not fit the dev set well, there is a different set of knobs you want to try. For instance, if the algorithm does well on the training set but not on the dev set, you have a set of regularization knobs you can use to try to satisfy the second criterion. Getting a bigger training set is another knob you could use, since it helps your learning algorithm generalize better to the dev set.
What if you’re doing well on the dev set but not on the test set?
If that happens, the knob you turn is probably to get a bigger dev set. If the algorithm does well on the dev set but not on the test set, it probably means you have overtuned to your dev set, and you need to go back and find a bigger one. Finally, if it does well on the test set but does not give you good results in the real world, you want to go back and change either the dev set or the cost function. If doing well on the test set does not correspond to your algorithm doing what you need it to do in the real world, then either your dev/test set distribution is not set correctly, or your cost function is not measuring the right thing.
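To get a feel for why a small dev set invites overtuning, here is a minimal simulation sketch. It is not from the original article: the function name `simulate_dev_overtuning`, the model count, and the set sizes are all hypothetical choices made for illustration. Every "model" is a coin-flip classifier with a true accuracy of 0.5, so any dev score above 0.5 comes purely from picking the luckiest model on the dev set.

```python
import random

def simulate_dev_overtuning(dev_size, n_models=200, test_size=10_000, seed=0):
    """Pick the best of n_models coin-flip classifiers on a dev set.

    Hypothetical setup: every model has a true accuracy of 0.5, so a high
    dev score is selection luck, not skill.
    Returns (best_dev_accuracy, test_accuracy_of_the_selected_model).
    """
    rng = random.Random(seed)
    # Dev accuracy of each model = fraction of dev examples it gets right by chance.
    best_dev_acc = max(
        sum(rng.random() < 0.5 for _ in range(dev_size)) / dev_size
        for _ in range(n_models)
    )
    # The selected model is still a coin flip on fresh test examples.
    test_acc = sum(rng.random() < 0.5 for _ in range(test_size)) / test_size
    return best_dev_acc, test_acc

small_dev, small_test = simulate_dev_overtuning(dev_size=20)
large_dev, large_test = simulate_dev_overtuning(dev_size=2000)
print(f"dev size   20: dev acc {small_dev:.3f}, test acc {small_test:.3f}")
print(f"dev size 2000: dev acc {large_dev:.3f}, test acc {large_test:.3f}")
```

With the small dev set, the selected model should look much better on dev than on test, while the larger dev set leaves far less room for that kind of luck. Real models are not coin flips, so treat this only as an intuition for the dev/test gap.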
To summarize, the four criteria and the knobs for each are:
1. Fit training set well on cost function
- If it doesn’t fit well, using a bigger neural network or switching to a better optimization algorithm might help.
2. Fit development set well on cost function
- If it doesn’t fit well, regularization or using a bigger training set might help.
3. Fit test set well on cost function
- If it doesn’t fit well, using a bigger development set might help.
4. Perform well in the real world
- If it doesn’t perform well, the development/test set is not set correctly or the cost function is not evaluating the right thing.
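The checklist above can be sketched as a small diagnostic helper. This is only an illustration under stated assumptions: the function name `suggest_knobs`, the `acceptable_error` target, and the `tolerance` gap of 0.02 are hypothetical and would need to be chosen for your own application. The criteria are checked in order, and the first one that fails decides which knobs to try.

```python
def suggest_knobs(train_error, dev_error, test_error, works_in_real_world,
                  acceptable_error, tolerance=0.02):
    """Return (first failing criterion, knobs the article pairs with it).

    acceptable_error and tolerance are hypothetical thresholds, not values
    from the article.
    """
    # 1. Training error must pass the acceptability threshold.
    if train_error > acceptable_error:
        return ("fit training set",
                ["bigger network", "better optimization algorithm"])
    # 2. Dev error should not be much worse than training error.
    if dev_error - train_error > tolerance:
        return ("fit dev set", ["regularization", "bigger training set"])
    # 3. Test error should not be much worse than dev error.
    if test_error - dev_error > tolerance:
        return ("fit test set", ["bigger dev set"])
    # 4. Good test performance must translate to the real world.
    if not works_in_real_world:
        return ("perform well in real world",
                ["change dev/test set", "change cost function"])
    return ("all criteria met", [])

# Example: good training error, but a large train-to-dev gap.
criterion, knobs = suggest_knobs(0.03, 0.12, 0.12, True, acceptable_error=0.05)
print(criterion, knobs)
```

Because each check maps to its own set of knobs, a fix for one criterion should ideally leave the others untouched, which is the point of orthogonalization.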