Difference between machine learning and statistics.



Machine learning and statistics share the same goal: learning from data. Both fields focus on drawing knowledge or insights from data, but their methods are shaped by the different cultures they grew out of.

They’re related, sure. But their parents are different.

Machine learning is a subfield of computer science and artificial intelligence. It deals with building systems that learn from data rather than following explicitly programmed instructions.

Statistics, on the other hand, is a subfield of mathematics.

Machine learning is a comparatively new field.

Cheap computing power and the availability of large amounts of data allowed data scientists to train computers to learn by analyzing data. Statistical modeling, by contrast, existed long before computers were invented.

Methodological differences between machine learning and statistics

The main difference between the two is that machine learning emphasizes optimization and predictive performance, whereas statistics is primarily concerned with inference.

This is how a statistician and a machine learning practitioner might describe the outcome of the same model:

  • ML professional: “The model is 85% accurate in predicting Y, given a, b and c.”
  • Statistician: “The model is 85% accurate in predicting Y, given a, b and c; and I am 90% certain that you will obtain the same result.”
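One way to picture the statistician's extra clause is as an interval around the accuracy figure. The sketch below is only an illustration under stated assumptions: the test set (200 cases, 170 correct) is made up, the helper names are hypothetical, and the percentile bootstrap is just one of several ways to attach uncertainty to a score. The "90% certain" in the quote is read here, loosely, as a 90% interval.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_accuracy_interval(y_true, y_pred, level=0.90,
                                n_resamples=2000, seed=0):
    """Percentile bootstrap interval for accuracy on a held-out test set."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample test cases with replacement and re-score.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(sum(y_true[i] == y_pred[i] for i in idx) / n)
    scores.sort()
    alpha = (1.0 - level) / 2.0
    lower = scores[int(alpha * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return lower, upper

# Hypothetical test set: 200 cases, 170 predicted correctly (85% accuracy).
y_true = [1] * 200
y_pred = [1] * 170 + [0] * 30

point = accuracy(y_true, y_pred)
low, high = bootstrap_accuracy_interval(y_true, y_pred)
print(f"accuracy = {point:.2f}, 90% interval = [{low:.2f}, {high:.2f}]")
```

The ML-style report stops at `point`; the statistician-style report also quotes `low` and `high`, which say how much the 85% could move if the test data were drawn again.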

Machine learning typically requires few prior assumptions about the underlying relationships between the variables. You feed in the data you have, the algorithm discovers patterns in it, and you use those patterns to make predictions on new data. Machine learning is content to treat an algorithm as a black box, as long as it works. It is generally applied to high-dimensional data sets, and, as a rule of thumb, the more data you have, the more accurate your predictions tend to be.
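To make the black-box workflow concrete, here is a minimal sketch using a k-nearest-neighbors classifier, chosen only as an example of a method that assumes no particular relationship between the variables. The data is synthetic (two made-up clusters), the function and variable names are hypothetical, and the model is judged purely by how well it predicts points it has not seen.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, point, k=5):
    """Predict a label by majority vote among the k closest training points."""
    by_distance = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], point)),
    )
    votes = Counter(train_y[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]

rng = random.Random(1)

def make_points(center, label, n):
    """Synthetic 2-D points scattered around a center (illustrative data only)."""
    return [([rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)], label)
            for _ in range(n)]

data = make_points((0.0, 0.0), 0, 100) + make_points((4.0, 4.0), 1, 100)
rng.shuffle(data)
train, test = data[:150], data[150:]

train_X = [features for features, _ in train]
train_y = [label for _, label in train]

# The only question asked of the model: how often is it right on unseen data?
correct = sum(knn_predict(train_X, train_y, features) == label
              for features, label in test)
holdout_accuracy = correct / len(test)
print(f"hold-out accuracy = {holdout_accuracy:.2f}")
```

Nothing in this code states how the features relate to the label; the hold-out score is the whole evaluation.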

In contrast, statisticians must understand how the data was collected, the statistical properties of their estimators and tests (for example, whether an estimator is unbiased and what a p-value means), the underlying distribution of the population they are studying, and the kinds of results they would expect if the experiment were repeated many times. They need to know precisely what they are doing and choose parameters that provide the predictive power. Statistical modeling techniques are usually applied to low-dimensional data sets.
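The statistician's side can be sketched the same way. The example below fits a one-variable linear model by ordinary least squares and reports not just the slope but its standard error, which describes how much the estimate would vary if the experiment were repeated. It assumes a linear relationship with independent noise of constant variance; the data is simulated and the names are hypothetical.

```python
import math
import random

def simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns the slope, the intercept, and the standard error of the slope.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Unbiased estimate of the noise variance (n - 2 degrees of freedom).
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    return slope, intercept, se_slope

# Simulated data: true slope 0.5, true intercept 2.0, unit-variance noise.
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(50)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

slope, intercept, se_slope = simple_ols(x, y)
t_stat = slope / se_slope  # how many standard errors the slope is from zero
print(f"slope = {slope:.3f}, standard error = {se_slope:.3f}, t = {t_stat:.1f}")
```

The standard error and t-statistic are only meaningful if the stated assumptions hold, which is why the statistician cares how the data was generated.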

Source: Deep Learning on Medium