Why You’re Failing the Telephone Interview

Original article was published on Artificial Intelligence on Medium

The Wrong Answer

There is no single right answer. But there are several wrong responses. Brushing the question off a second time is perhaps the worst. I’ll suppress my ire, make a third and final attempt, but that’s effectively the end of the interview. Many flounder and reveal they have no idea how to evaluate a model in even the most cursory ways.


One common poor answer is something like, “I had a test set and a training set; the accuracy of the model on the test set was 90%”. Why is this poor? Well, for starters, I’d need more information to know if 90% accuracy is a good result or a terrible one. If 95% of the test set is the same class, 90% accuracy is terrible! A model that always predicts the majority class would score 95% without learning anything.
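To make the 90% versus 95% point concrete, here is a minimal sketch in plain Python. The labels and predictions are made up for illustration (they are not from any real model), and the helper names `accuracy` and `majority_baseline_accuracy` are my own.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_true):
    """Accuracy of a 'model' that always predicts the most common class."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Hypothetical test set: 95 examples of class 0 and 5 of class 1.
y_test = [0] * 95 + [1] * 5

# Hypothetical predictions: 88 of the 95 zeros and 2 of the 5 ones are right.
y_pred = [0] * 88 + [1] * 7 + [1] * 2 + [0] * 3

model_accuracy = accuracy(y_test, y_pred)
baseline_accuracy = majority_baseline_accuracy(y_test)

print(f"model:    {model_accuracy:.2f}")
print(f"baseline: {baseline_accuracy:.2f}")
```

On this made-up data the model's 90% sits below the 95% you get by always guessing class 0. Quoting accuracy next to a baseline like this is one simple way to show you know what the number means.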

Also, that answer doesn’t address several other important questions including:

  1. How did you avoid bias when selecting the test set?
  2. What makes you think the test set is sufficient?
  3. How well does the test set reflect the data you’ll see in production?
  4. Why did you choose accuracy as your scoring metric? What other metrics did you consider?
  5. How many different models did you try with this split, and how do you know you haven’t overfitted?

I’ll ask as many of these as time permits. I need to know that you at least care whether your model is correct, and that you’re not just turning the crank until you get a favorable result.


The most common poor answer tries, somewhat, to address the issue of overfitting: “I used cross-validation”. There seems to be a pervasive misconception that cross-validation solves several fundamental and complex problems in data science. It doesn’t.

Cross-validation is a great tool, but it guarantees nothing. You can still easily cherry-pick results or features, bleed information from one fold to another, or use information in a way not possible in production. Nor can it address more epistemological issues in data science such as black swans, dragon kings, reproducibility, and the effect of predictions on the outcomes they are meant to predict.
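Here is a small sketch of one way information can bleed between folds: picking features using all the data before cross-validating. Everything in it is an assumption for illustration. The labels and features are pure random noise, the "model" is a toy that predicts from a single binary feature, and the function names are mine. It uses only the standard library.

```python
import random


def agreement(column, labels, rows):
    """Fraction of the given rows where the feature value equals the label."""
    return sum(column[i] == labels[i] for i in rows) / len(rows)


def best_feature(features, labels, rows):
    """Index of the feature whose agreement with the labels is furthest from chance."""
    return max(
        range(len(features)),
        key=lambda j: abs(agreement(features[j], labels, rows) - 0.5),
    )


def cv_accuracy(features, labels, folds, select_inside_fold):
    """K-fold accuracy of a toy one-feature model, with or without leaky selection."""
    all_rows = [i for fold in folds for i in fold]
    leaked_choice = None
    if not select_inside_fold:
        # Leak: the feature is chosen by looking at every row, test rows included.
        leaked_choice = best_feature(features, labels, all_rows)
    correct = 0
    for test_rows in folds:
        held_out = set(test_rows)
        train_rows = [i for i in all_rows if i not in held_out]
        if select_inside_fold:
            j = best_feature(features, labels, train_rows)
        else:
            j = leaked_choice
        # Toy model: predict the feature value, or its inverse,
        # whichever matched the training rows better.
        invert = agreement(features[j], labels, train_rows) < 0.5
        for i in test_rows:
            prediction = 1 - features[j][i] if invert else features[j][i]
            correct += prediction == labels[i]
    return correct / len(all_rows)


rng = random.Random(0)
n_rows, n_features, n_folds = 60, 2000, 5

# Pure noise: no feature has any real relationship to the labels.
labels = [rng.randint(0, 1) for _ in range(n_rows)]
features = [[rng.randint(0, 1) for _ in range(n_rows)] for _ in range(n_features)]
folds = [list(range(k, n_rows, n_folds)) for k in range(n_folds)]

leaky = cv_accuracy(features, labels, folds, select_inside_fold=False)
honest = cv_accuracy(features, labels, folds, select_inside_fold=True)

print(f"selection outside the folds: {leaky:.2f}")
print(f"selection inside each fold:  {honest:.2f}")
```

Since the data is noise, any honest estimate should hover around chance. The leaky version will typically report something noticeably better than 50%, even though both runs "used cross-validation". The exact numbers depend on the seed and the sizes chosen.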

Don’t worry. I’m not looking for a philosophical treatise. I just need to know you don’t think that machine learning is magic.