Distinguishing DS, ML, DL, AI and Big Data
In this article, I will try to distinguish the buzzwords circulating in data analytics lately: Data Science (DS), Machine Learning (ML), Deep Learning (DL), Artificial Intelligence (AI), and Big Data. These fields intersect in places, and effective data analytics in one of them often depends on another.
The definition of data science will help in understanding the similarities and differences between DS, ML, DL, AI, and Big data.
Data Science involves collecting, processing, exploring, and modelling data to interpret insights from them to solve a specific business problem.
Any data science project iterates through the following five stages (the O.S.E.M.N. framework):
Step 1 — Obtain or collect stored data
Step 2 — Scrub or process data
Step 3 — Explore data
Step 4 — Model data
Step 5 — Interpret data
The first stage deals with storing already available data or with collecting data from databases, websites, social media sites, or from experiments using a host of tools and scripting languages.
The second stage deals with processing the data acquired from the first stage. It is usually necessary to convert the data from one format to another or to convert the data to a standardized format. Also, depending on the application, this stage might deal with filtering or replacing data, splitting and merging data, etc.
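As a minimal sketch of this stage, the Python snippet below converts dates in mixed formats to a single standard format, converts prices stored as text to numbers, and filters out rows that cannot be repaired. The field names (`date`, `price`), the two input date formats, and the sample rows are assumptions made up for illustration.

```python
from datetime import datetime

# Hypothetical raw records; field names and values are made up for illustration.
raw_rows = [
    {"date": "2020-01-15", "price": "10.50"},
    {"date": "15/02/2020", "price": " 12 "},
    {"date": "2020-03-01", "price": "n/a"},  # unusable price, gets filtered out
]

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats


def parse_date(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def scrub(rows):
    """Standardize dates and prices; drop rows that cannot be converted."""
    clean = []
    for row in rows:
        date = parse_date(row["date"])
        try:
            price = float(row["price"])
        except ValueError:
            price = None
        if date is not None and price is not None:
            clean.append({"date": date, "price": price})
    return clean


clean_rows = scrub(raw_rows)
print(clean_rows)
```

Whether a bad row should be dropped, as here, or replaced with a default value depends on the application.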
The third stage deals with exploring the processed data with charts and plots and drawing inferences by describing and summarizing it with statistical measures. Suitable plots and statistical tests are chosen based on the type of data and the desired insight.
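The exploration stage might look like the following sketch, which summarizes a numeric column with descriptive statistics and counts the values of a categorical column. The data (`order_values`, `channels`) is hypothetical, and a crude text bar chart stands in for a real plot.

```python
import statistics
from collections import Counter

# Hypothetical processed data: order values and the channel each order came from.
order_values = [12.0, 15.5, 11.0, 48.0, 14.5, 13.0]
channels = ["web", "store", "web", "web", "store", "web"]

summary = {
    "count": len(order_values),
    "mean": statistics.mean(order_values),
    "median": statistics.median(order_values),
    "stdev": statistics.stdev(order_values),  # sample standard deviation
    "min": min(order_values),
    "max": max(order_values),
}
channel_counts = Counter(channels)

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")

# One '#' per order, most frequent channel first.
for channel, count in channel_counts.most_common():
    print(f"{channel:<6} {'#' * count}")
```

In this made-up sample the mean sits well above the median, which hints at an outlier (the 48.0 order) worth inspecting before modelling.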
The fourth stage is the ‘magic stage,’ where the explored data is modelled using statistical models or algorithms to support prediction, forecasting, or classification.
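To make the modelling stage concrete, here is a small sketch of one of the simplest statistical models, a straight line fitted by ordinary least squares, used to forecast a new value. The variables `spend` and `units` and their values are hypothetical, and this is only one of many possible models.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalized; the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend versus units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(spend, units)
forecast = slope * 6.0 + intercept  # predicted units for a spend of 6.0

print(f"slope={slope:.2f} intercept={intercept:.2f} forecast={forecast:.2f}")
```

A linear fit like this is enough when the relationship between features is straightforward; the more complex, nonlinear cases are where machine learning models come in.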
The final stage is the most crucial: the data scientist explains the solution to the business problem by presenting the analysis in a form that audiences from different sectors, and with different levels of programming fluency, can understand. This stage therefore calls for communication and presentation skills in addition to technical skills, because this is where the results must convince those in a position to improve business goals and outcomes.
The data in a data science project might be big data, which, unlike structured data, requires a different set of tools at each stage; this is discussed further in this post.
Also, the fourth stage of the project lifecycle is where data science intersects with Machine Learning, Deep Learning, and Artificial Intelligence, which will also be discussed further.
“Data is the new science. Big Data holds the answers.”
“There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days.”
There is a famous saying by Sherlock Holmes, “The temptation to form premature theories upon insufficient data is the bane of our profession.” But insufficient data is rarely the issue in this generation, as the future of every organization is big data.
Big data is a wide variety of unstructured and uncurated data, such as text, images, video, and speech, generated at very high volume and velocity.
Almost all organizations, except for small-scale businesses, now require data scientists to draw insights from big data. Because these datasets are large, they need distributed processing to keep project development efficient. Acquiring, processing, and exploring big data in data science projects requires knowledge of tools such as Hadoop, Spark, Flink, and MapReduce. Further, because of the large volume, deep learning algorithms might be essential to model the data and draw inferences from it.
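The idea behind MapReduce can be sketched in plain Python with a word count. This is a single-process illustration only: it does not use the Hadoop or Spark APIs, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) and sample documents are assumptions for this example. In a real framework, the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input; each string stands in for a file on a separate node.
documents = ["big data needs big tools", "data science uses data"]

mapped = chain.from_iterable(map_phase(doc) for doc in documents)
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across machines, which is what makes the pattern suit large datasets.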
Artificial Intelligence, Machine Learning, and Deep Learning
As the name suggests, artificial intelligence involves imparting intelligence to algorithms. Today, AI is used to solve many real-world problems in areas such as Knowledge Representation, Reasoning, Problem Solving, Decision Making, Natural Language Processing, Computer Vision, Speech Technology, and Robotics. From the early 1950s to 1980, AI systems mainly dealt with Knowledge Representation, Reasoning, Problem Solving, and Decision Making, and employed graph traversal algorithms such as breadth-first search (BFS) and depth-first search (DFS), propositional and first-order logic, and Expert Systems to solve complex problems. As the limitations of these approaches became more apparent, Machine Learning came into the picture. Since 1980, Machine Learning has been the primary approach to intelligence problems.
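As an illustration of the graph traversal style of early AI problem solving, the sketch below uses breadth-first search to find a path with the fewest steps between two states. The graph and its node names are hypothetical; in a real problem, each node would be a state and each edge a legal move.

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    queue = deque([start])
    parent = {start: None}  # doubles as the visited set
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical state graph as adjacency lists.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}

path = bfs_path(graph, "A", "F")
print(path)
```

Searches like this explore states exhaustively, so they become impractical as the number of states grows, which is one of the limitations that learning-based methods address.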
“Machine Learning algorithms play a key role in solving data prediction and classification problems when the relationship between the data features seems to be very complex and nonlinear.”
Lately, most problems involving learning, sequential decision making, prediction, and classification in dynamic environments employ machine learning algorithms. In some cases, these systems also incorporate reinforcement learning to develop an efficient decision-making system with minimal or no explicit supervision.
“When you have large amounts of high-dimensional data, and you want to learn very complex relationships between the output and input using a specific class of complex Machine Learning models and algorithms, it is collectively referred to as Deep Learning algorithms.”
Deep Learning is inspired by the information-processing capability of the human brain. Since 2010, Deep Learning has overtaken classical machine learning on many intelligence problems, such as Data Classification, Natural Language Processing, Computer Vision, Speech Technology, and Robotics, as every field has become immensely data-driven. But unlike Machine Learning, Deep Learning requires hardware with massive processing power, as it deals with large amounts of data. The features of Machine Learning and Deep Learning algorithms are elucidated in the following table.
Thus, it is ideal to use machine learning algorithms when the dataset is small or medium, the classification features can be chosen manually, and a faster execution time is required. Deep learning algorithms, by contrast, are employed to classify massive datasets where manual selection of data features is tedious. The execution of DL algorithms is usually slower and requires sophisticated software.
DS vs. ML vs. DL
Modelling of data in data science projects employs statistical modelling or algorithmic modelling. Lately, most algorithmic models use Machine Learning and Deep Learning, since modern data features are often complex and nonlinear.
There are various algorithms for algorithmic modelling, and depending on whether we need to predict future values using regression or to classify data, any of the following can be chosen:
Linear Discriminant Analysis
Support Vector Machines
Multilayered Neural Networks
Since deep learning algorithms are relatively new, there are only a few available algorithms, but if machine learning algorithms are employed to model data, there are a lot of algorithms from which the desired one can be chosen.
A data science project involving large high-dimensional datasets is dealing with Big Data and requires the use of specific scripting tools capable of distributed processing in the various stages of its project development lifecycle. Further, artificial intelligence intersects with data science when data modelling is carried out. Algorithmic Models, including machine learning and deep learning models, are employed when the data features are very complex and nonlinear. Finally, machine learning is an extension of simple algorithmic models, and deep learning is an extension of machine learning. Thus, both machine learning and deep learning algorithms are enclosed under the hypernym of Artificial Intelligence.
Hence if a data scientist encounters a set of small or medium datasets whose features have a reasonably straightforward relationship model, then the data science project would be completed without employing artificial intelligence.
But if the small or medium datasets have a complex relationship model, then the data science project would require artificial intelligence algorithms, probably even machine learning algorithms, depending on the application.
Finally, if the data scientist is dealing with a set of massive datasets with numerous features, then modelling the big data would most probably require deep learning.
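The three cases above can be condensed into a rough rule of thumb, sketched below. The function name and the `massive_threshold` value are assumptions chosen purely for illustration; in practice, the cut-off between "medium" and "massive" depends on the data and the available hardware.

```python
def suggest_modelling_approach(n_rows, complex_relationship,
                               massive_threshold=1_000_000):
    """Suggest a modelling family from dataset size and feature complexity.

    massive_threshold is a hypothetical cut-off, not an established figure.
    """
    if n_rows >= massive_threshold:
        # Massive datasets with numerous features.
        return "deep learning"
    if complex_relationship:
        # Small or medium dataset with a complex, nonlinear relationship.
        return "machine learning"
    # Small or medium dataset with a straightforward relationship.
    return "statistical modelling"


print(suggest_modelling_approach(5_000, complex_relationship=False))
print(suggest_modelling_approach(5_000, complex_relationship=True))
print(suggest_modelling_approach(50_000_000, complex_relationship=True))
```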
I hope the similarities and differences among Data Science (DS), Machine Learning (ML), and Deep Learning (DL) are clear from this post. I should point out that the theory behind the implementation of Machine Learning and Deep Learning algorithms is not discussed here, as that discussion would span several blog posts and is beyond the scope of this post, which set out to highlight the intersection of the DS, ML, and DL sectors.
Please feel free to discuss or leave a message if you have any feedback regarding the contents.