The original article was published by Ali Osia on Artificial Intelligence on Medium
Takeaways from My Data Science Mistakes
What I have learned in my PhD program and data science career
When I first started my PhD program, I became interested in the theoretical aspects of ML and searched extensively for a good problem for my thesis, but couldn't find one. Instead, I found a good practical problem, proposed a novel idea, and put it on arXiv, where it earned a good reputation. Even so, for two years I couldn't get it accepted at a good conference because of mistakes I had made, such as poor writing and imprecise experiment design. After finishing my PhD, I started working as a data scientist, and one of my first projects was a time-series prediction task. Although it seemed very easy at first, it turned out to be much more challenging in practice, and it took me a long time to solve because I didn't know the business well. Here is what I have learned from these kinds of mistakes.
As a PhD student:
- Start from a well-defined real-world problem, model it, and try to find the easiest acceptable solution.
- After finding the problem, first think about it from scratch and discover its challenges yourself; only then turn to prior work.
- Reading prior work means correctly identifying the few truly original papers, starting from the root, and building intuition.
- Prioritize papers over blog posts, although blog posts can also help.
- If the problem is not purely theoretical, finding a solution is the first priority; analyzing its theoretical aspects can come later.
- First design the experiment carefully, and then start the implementation.
- First design the paper structure, and then start writing.
As a data scientist:
- First learn how the underlying business works in practice, and only then estimate the size of projects.
- Drawing a use-case diagram can help you get familiar with the business.
- Start by identifying the important questions or concerns from the perspectives of different actors.
- Think about evaluation first, and define a good evaluation metric.
- Deliver notebooks weekly (they also serve as a kind of documentation).
- Dashboards can be very useful, both for you and for product managers, to gain insight into the data.
- Log all events in a diary format; if you are in a hurry, just take screenshots.
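To make "think about evaluation first" concrete for a time-series task like the one mentioned earlier, here is a minimal sketch, not the method used in the original project. It assumes mean absolute error (MAE) as the metric and a seasonal-naive forecast (repeat the last observed season) as the baseline any real model should beat. The series values, the weekly `season_length=7`, and the helper names are hypothetical, chosen only for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def seasonal_naive_forecast(history, horizon, season_length):
    """Forecast `horizon` steps by repeating the last observed season (hypothetical helper)."""
    if len(history) < season_length:
        raise ValueError("history must contain at least one full season")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


# Hypothetical daily series with a weekly pattern (illustrative numbers only).
series = [10, 12, 13, 12, 15, 20, 22,
          11, 12, 14, 13, 16, 21, 23,
          11, 13, 14, 13, 16, 22, 24]

# Hold out the final week; never evaluate on data the model has seen.
train, test = series[:-7], series[-7:]

baseline = seasonal_naive_forecast(train, horizon=len(test), season_length=7)
baseline_mae = mean_absolute_error(test, baseline)
print(f"Seasonal-naive baseline MAE: {baseline_mae:.2f}")
```

Fixing the metric and the baseline score before building anything gives every later model a clear bar to clear; a different business might call for another metric (for example, one that penalizes under-forecasting more heavily), which is exactly what learning the business first should reveal.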