Data Science Mistakes’ Takeaways

Original article was published by Ali Osia on Artificial Intelligence on Medium

What I have learned in my PhD program and data science career

When I first started my PhD program, I became interested in the theoretical aspects of ML and searched for a long time for a good thesis problem, but couldn’t find one. Instead, I found a good practical problem, proposed a novel idea, and put it on arXiv, where it earned a good reputation. Still, for two years I couldn’t publish it at a good conference because of mistakes I had made, such as poor writing and imprecise experiment design. After finishing my PhD, I started working as a data scientist, and one of my first projects was a time-series prediction task. It seemed very easy at first, but it proved much more challenging in practice and took a long time to solve because I didn’t know the business very well. The following is what I have learned from these kinds of mistakes.

As a PhD student:

  • Start from a well-defined real-world problem, model it, and try to find the easiest acceptable solution.
  • After finding the problem, first think about it from scratch and discover its challenges yourself, and only then turn to prior work.
  • Reading prior work means correctly identifying the few truly original papers, starting from the root, and building intuition.
  • Prioritize papers over blog posts, although blog posts can also help.
  • If the problem is not purely theoretical, finding a solution is the first priority; analyzing the theoretical aspects can come later.
  • First design the experiment carefully, and then start the implementation.
  • First design the paper structure, and then start writing.

As a data scientist:

  • First learn the underlying business in action, and then estimate the size of projects.
  • Drawing a use-case diagram can help you get familiar with the business.
  • Start by finding important questions or concerns from different actors’ perspectives.
  • Think about evaluation first, and define a good evaluation metric.
  • Deliver a notebook every week (it also serves as a kind of documentation).
  • Dashboards can be very useful, both for you and for product managers, for gaining insight into the data.
  • Log all events in a diary format; if you are in a hurry, just take screenshots.
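
The advice to define an evaluation metric before modeling can be made concrete for a time-series project like the one mentioned in the introduction. The following is only a minimal sketch, not the metric used in that project: it assumes a scaled error in the spirit of the mean absolute scaled error, where the forecast error is divided by the error of a naive "repeat the previous value" baseline. The function names and the sample numbers are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def scaled_error(history, actual, predicted):
    """MAE of the forecast divided by the MAE of a naive forecast on history.

    The naive forecast predicts each value as the previous one.
    A result below 1.0 means the model beats that baseline.
    """
    naive_mae = mean_absolute_error(history[1:], history[:-1])
    if naive_mae == 0:
        raise ValueError("history is constant; the naive baseline error is zero")
    return mean_absolute_error(actual, predicted) / naive_mae


# Hypothetical figures, for illustration only.
history = [10, 12, 11, 13, 12]   # past observations
actual = [14, 13]                # what really happened next
predicted = [13, 13]             # what the model forecast
score = scaled_error(history, actual, predicted)
print(f"scaled error: {score:.3f}")
```

Fixing a metric like this before building any model gives every later notebook and dashboard the same yardstick, and comparing against a trivial baseline shows early whether a project is as easy as it first appears.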