What it takes to learn the hot field of technology: Rules and best practices of machine learning.
Before discussing how to learn machine learning effectively and apply it to real-world problems, we should look at its best practices. Machine learning is a hot topic these days; it tries to bring human-like intelligence to machines and, on some tasks, to make them more efficient and accurate than humans. AI and machine learning techniques can be applied in almost every field to make outputs more effective and precise.
The following terms will come up repeatedly in this discussion:
Instance: The thing about which you want to make a prediction.
Label: The answer for a prediction task, attached to the training data either by a machine or by a human trainer.
Model: A statistical representation of a prediction task, learned from data.
Pipeline: The infrastructure around a machine learning program, including gathering the data sets, training the models and finally exporting them to production.
So, before you dive into machine learning, keep the following rules in mind.
Rule 1: Don’t be afraid!
Don’t be afraid to launch your product without machine learning. Machine learning is cool, but it requires a sufficient amount of accurate data to train your models well. If you don’t have the luxury of that much data, you shouldn’t apply machine learning to your product yet, because incomplete or wrong data can do real damage.
Rule 2: Design and implement
Before you add machine learning algorithms to your system, review how the existing system works and how effective machine learning models are likely to be in it. Every system has its own scenarios, so you need to see how much machine learning will actually improve and optimize yours.
Rule 3: Choose machine learning!
A complex technique is difficult to maintain, so the simplest technique that works is often the best way to get good results. If you have a simple idea of how to solve the specific problem, go ahead and implement it with machine learning.
Machine learning: Stage Alpha (1)
Carefully review the scenarios in your system around which you are going to build machine learning, and build a first model that you can trust.
Rule 1 of Stage 1: (rule 4:) Simplicity:
Keep your first model as simple as possible. If a simple approach can do the task well, put the complex models aside until you have enough accurate data.
Rule 2 of Stage 1: (rule 5:) Testing independently from machine learning:
Test the infrastructure independently from the machine learning, checking each part to see whether it is doing its job well.
Rule 3 of Stage 1: (rule 6:) Be careful:
Be careful with heuristics. When you copy an existing pipeline for a new project, make sure that data that is useful for your models is not thrown away.
Rule 4 of Stage 1: (rule 7:) Turning heuristics into features:
Do not throw your heuristics away completely. Try to turn them into features for your machine learning model, because they can yield a handful of useful features.
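As a minimal sketch of this rule, the example below takes a hand-written spam heuristic and exposes its score as one feature among others, instead of discarding it. The heuristic, the word list, and the feature names are all hypothetical and only illustrate the idea.

```python
def heuristic_spam_score(message: str) -> float:
    """Hypothetical legacy heuristic: fraction of words found in a spam word list."""
    spam_words = {"free", "winner", "prize"}
    words = message.lower().split()
    if not words:
        return 0.0
    return sum(word in spam_words for word in words) / len(words)


def extract_features(message: str) -> dict:
    """Build a feature dictionary that keeps the old heuristic as one input."""
    return {
        "length": len(message),
        "num_exclamations": message.count("!"),
        # The heuristic is not thrown away: its raw score becomes a feature,
        # and the model learns how much weight to give it.
        "heuristic_spam_score": heuristic_spam_score(message),
    }


features = extract_features("Free prize inside!")
print(features)
```

Feeding the raw score to the model, rather than a hard yes/no decision, lets the model combine it with the other signals.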
Rule 5 of Stage 1: (rule 8:) Freshness:
Make sure your model doesn’t go stale over time: check the data regularly and keep it fresh for the best outputs.
Rule 6 of Stage 1: (rule 9:) Detection of problems:
Before finalizing and exporting your models, check them thoroughly for problems.
Rule 7 of Stage 1: (rule 10:) Final testing:
Run a final test to check that all the features are being gathered and that your training data is sound.
Rule 8 of Stage 1: (rule 11:) Documentation:
Document each feature along with its pros and cons; this makes the model easier to understand.
Machine learning: Stage Beta (2): Feature Engineering
Rule 1 of Stage 2: (rule 12:) Plan to launch and iterate:
Your training data and your model will evolve over time, so plan to keep updating them with new features and data.
Rule 2 of Stage 2: (rule 13:) Start with directly observed features:
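Begin with features that are directly observed and reported, and leave learned features (those produced by another model or system) for future updates.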
Rule 3 of Stage 2: (rule 14:) Explore with features:
Explore features that generalize across contexts. For example, the number of likes a post gets may be useful in future recommendations, quality ranking or other related models.
Rule 4 of Stage 2: (rule 15:) Use very specific features:
Use specific features even when each one doesn’t seem to help much on its own, because they can be helpful in sorting out smaller problems.
Rule 5 of Stage 2: (rule 16:) Combine and modify existing features:
Build new features by combining and modifying existing ones, and keep the results easy to understand.
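One common way to combine existing features is a feature cross, which joins the values of two features into a single new one. The sketch below is only an illustration: the helper `cross_features` and the feature names `country` and `device` are hypothetical.

```python
def cross_features(features: dict, first: str, second: str) -> dict:
    """Return a copy of `features` with one extra crossed feature added."""
    crossed_name = f"{first}_x_{second}"
    crossed_value = f"{features[first]}_{features[second]}"
    # The input dictionary is left untouched; a new one is returned.
    return {**features, crossed_name: crossed_value}


example = {"country": "US", "device": "mobile"}
crossed = cross_features(example, "country", "device")
print(crossed)
```

The crossed feature keeps a readable name, so it stays easy to trace back to the features it came from.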
Rule 6 of Stage 2: (rule 17:) The number of feature weights you can learn in a linear model is roughly proportional to the amount of data you have:
Rule 7 of Stage 2: (rule 18:) Clean up features you are no longer using:
If some features are not improving your model in any way, get rid of them.
Rule 8 of Stage 2: (rule 19:) Get feedback from the end user.
You are not a typical end user, so get feedback from real ones, either through crowdsourcing or by running a live experiment on actual users.
Rule 9 of Stage 2: (rule 20:) Measure the delta between models
Before launching a new model, check how much its outputs vary compared with those of the previous one.
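One simple way to measure this, sketched below, is to run both models on the same instances and count how often their decisions disagree. The function name `prediction_delta`, the 0.5 decision threshold, and the sample scores are assumptions made for illustration.

```python
def prediction_delta(old_scores, new_scores, threshold=0.5):
    """Fraction of instances whose thresholded decision differs between two models."""
    if len(old_scores) != len(new_scores):
        raise ValueError("both models must be scored on the same instances")
    if not old_scores:
        return 0.0
    changed = sum(
        (old >= threshold) != (new >= threshold)
        for old, new in zip(old_scores, new_scores)
    )
    return changed / len(old_scores)


# Hypothetical scores from the current model and a candidate on four instances.
old_model_scores = [0.1, 0.6, 0.8, 0.4]
new_model_scores = [0.2, 0.4, 0.9, 0.7]

delta = prediction_delta(old_model_scores, new_model_scores)
self_delta = prediction_delta(old_model_scores, old_model_scores)
print(delta, self_delta)
```

Comparing a model with itself should give a delta of zero, which is a quick sanity check that the measurement itself is stable.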
Rule 10 of Stage 2: (rule 21:) When choosing models, utilitarian performance trumps predictive power.
Usefulness of the model matters more than its accuracy.
Rule 11 of Stage 2: (rule 22:) Look for patterns in the measured errors, and create new features.
Look for patterns in the errors your model makes and add features that capture them.
Rule 12 of Stage 2: (rule 23:) Try to quantify observed undesirable behavior.
Rule 13 of Stage 2: (rule 24:) Be aware that identical short-term behavior does not imply identical long-term behavior.
Rule 14 of Stage 2: (rule 25:) The best way to make sure that you train like you serve is to save the set of features used at serving time, and then pipe those features to a log to use them at training time.
Log the features exactly as they are used at serving time, then train and test on those logs. This way you also find out whether the model works on the same data it will see in production.
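A minimal sketch of this rule follows: the serving function writes the exact features it used to a log, and the training side reads them back instead of recomputing them. The request fields, the feature names, and the JSON-lines log format are assumptions chosen for the example.

```python
import io
import json


def serve(request: dict, model, log_file) -> float:
    """Score one request and log the exact features that were used."""
    features = {"query_length": len(request["query"]), "hour": request["hour"]}
    score = model(features)
    # One JSON object per line, so the training job can replay the log later.
    log_file.write(json.dumps({"features": features, "score": score}) + "\n")
    return score


def load_training_features(log_file) -> list:
    """Read back the logged serving-time features for training."""
    return [json.loads(line)["features"] for line in log_file if line.strip()]


# An in-memory buffer stands in for a real log file in this sketch.
log = io.StringIO()
serve({"query": "cats", "hour": 9}, lambda f: 0.5, log)
serve({"query": "red shoes", "hour": 21}, lambda f: 0.7, log)

log.seek(0)
training_rows = load_training_features(log)
print(training_rows)
```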
Rule 15 of Stage 2: (rule 26:) Importance-weight sampled data, don’t arbitrarily drop it!
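When you have too much data, it is tempting to drop part of it. If you sample instead, give every kept example a weight equal to one over its sampling probability, so that the sampled data still represents the original distribution. The sketch below downsamples negative examples under that assumption; the function name, the 25% keep rate, and the toy data are hypothetical.

```python
import random


def downsample_negatives(examples, keep_prob, rng):
    """Keep every positive; keep each negative with probability `keep_prob`.

    Returns (features, label, weight) tuples. A kept negative gets weight
    1 / keep_prob so that, in expectation, total negative weight is unchanged.
    """
    if not 0 < keep_prob <= 1:
        raise ValueError("keep_prob must be in (0, 1]")
    sampled = []
    for features, label in examples:
        if label == 1:
            sampled.append((features, label, 1.0))
        elif rng.random() < keep_prob:
            sampled.append((features, label, 1.0 / keep_prob))
    return sampled


rng = random.Random(0)  # fixed seed so the sketch is repeatable
examples = [({"id": i}, 1 if i % 10 == 0 else 0) for i in range(1000)]
sampled = downsample_negatives(examples, keep_prob=0.25, rng=rng)
print(len(sampled))
```

The weights would then be passed to whatever training routine you use, provided it accepts per-example weights.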
Rule 16 of Stage 2: (rule 27:) Data itself may change between training and serving.
Don’t forget that the data itself can change between training time and serving time. Taking snapshots of the data helps, but it does not completely solve the problem.
Rule 17 of Stage 2: (rule 28:) Re-use code between your training pipeline and your serving pipeline
Share as much code as you can between your training pipeline and your serving (production) pipeline.
Rule 18 of Stage 2: (rule 29:) Use earlier data for training, later data for testing.
This helps you estimate how your model will perform in the future.
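A time-based split can be sketched as follows: everything up to a cutoff date goes into training, and everything after it goes into testing. The `date` field, the cutoff, and the sample records are hypothetical.

```python
from datetime import date


def split_by_date(examples, cutoff):
    """Train on examples dated up to `cutoff`, test on examples after it."""
    train = [e for e in examples if e["date"] <= cutoff]
    test = [e for e in examples if e["date"] > cutoff]
    return train, test


examples = [
    {"date": date(2018, 1, 3), "label": 0},
    {"date": date(2018, 1, 5), "label": 1},
    {"date": date(2018, 1, 6), "label": 1},
    {"date": date(2018, 1, 9), "label": 0},
]

train, test = split_by_date(examples, cutoff=date(2018, 1, 5))
print(len(train), len(test))
```

Unlike a random split, this never lets the model see data from after the period it is tested on.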
Rule 19 of Stage 2: (rule 30:) Binary classification
In binary classification tasks used for filtering, hold out a small percentage of the data and show it to the user unfiltered. You give up a little short-term performance, but you get clean, fresh training data in return.
Rule 20 of Stage 2: (rule 31:) Regularize general features
Rule 21 of Stage 2: (rule 32:) Avoid feedback loops
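Avoid feedback loops, in which a model’s own output influences the data it is later trained on. Keep the features that cause them separate during training and leave them out during deployment.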
Rule 22 of Stage 2: (rule 33:) Measure how your model performs
Measure how your model performs at every stage (on training data, on held-out data and on live data) and note how the results differ.
Machine learning: Stage Gamma (3): Growth, Optimization Refinement, and Complex Models
Rule 1 of Stage 3: (rule 34:) Don’t waste time on new features if unaligned objectives have become the issue.
Make sure your ML models are aligned with the core objectives of your product.
Rule 2 of Stage 3: (rule 35:) Launch decisions are a proxy for long-term product goals.
Don’t rush launch decisions, because a launch can affect multiple metrics.
Rule 3 of Stage 3: (rule 36:) Keep ensembles simple.
To keep things simple, each model should either be an ensemble only taking the input of other models, or a base model taking many features, but not both.
Rule 4 of Stage 3: (rule 37:) When performance plateaus, look for qualitatively new sources of information to add
When your model’s performance stops improving, look for new kinds of information to add as features rather than refining the signals you already have.
Rule 5 of Stage 3: (rule 38:) Don’t expect diversity, personalization, or relevance to be as correlated with popularity as you think they are.
Rule 6 of Stage 3: (rule 39:) Your friends tend to be the same across different products. Your interests tend not to be.
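What a model learns about the connections between users tends to carry over from one product to another, but what it learns about their interests often does not, so each product still needs features suited to its own goals.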
These guidelines from Google researchers can be helpful in machine learning work, so consider bookmarking this article and revisiting it whenever you need a better grip on your ML models.
Source: Deep Learning on Medium