Source: Deep Learning on Medium
We are a month into 2019, and the AI and ML hype keeps growing, with tons of online and offline courses available. So many institutes and companies offer these courses, and when everyone claims to have the best curriculum, it is confusing to know where to start.
If you are just starting out or curious to know the nitty-gritty of this field, this blog might help you.
I will take you from the ‘WHAT’ (ML terminology and jargon) to the ‘HOW’ (the technical side: the algorithms and the maths behind them).
To keep this blog from being a boring read, two characters, Newbie and Pro, are going to help us. Newbie is an avid learner who now wants to start with machine learning, but she does not know where to begin. Therefore, she takes the help of her friend Pro, who is an expert in machine learning.
Newbie: Hi, Pro. I have been searching for machine learning on the internet, and there is so much information available that I am overwhelmed by it. Can you help me with ML?
Pro: Sure. Why not? Let us start with a basic, simple definition of ‘Machine Learning’.
Machine Learning is the ability by which a machine learns without being explicitly programmed. Here, a ‘machine’ means computer software or an algorithm.
Newbie: That is an easy definition to understand. However, what does the machine actually learn?
Pro: Before answering that, you need to know that machine learning is possible only because of the abundance of data available. If there’s no data, there’s nothing to learn from.
So, learning here means finding patterns in the data, i.e. fitting the parameters.
A well-posed learning problem is defined as follows:
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Newbie: What are the parameters and what is meant by fitting them?
Pro: Parameters are the variables that our learning algorithm tries to tune to build an accurate model. They are internal to the model as they can be estimated from the training data.
Parameter fitting means to find the optimal values for these variables so that the prediction accuracy is increased.
Newbie: Pro, can you give me an overview of how the learning happens?
Pro: Here is a basic overview of how a machine learning algorithm learns from the data. There are a few terms you need to know to understand this.
Training Set: It is the actual dataset which is provided to answer some questions. In fact, all machine learning is about answering some questions in the underlying data.
Learning Algorithm: These are the different classes of algorithms that are run over the training data to produce a model for the prediction task. Ex: SVM, Decision Trees, Neural Nets.
Hypothesis: A hypothesis is a certain function that we believe (or hope) is similar to the true function, the target function that we want to model. In the context of email spam classification, it would be the rule we came up with that allows us to separate spam from non-spam emails.
Model: In the machine learning field, the terms hypothesis and model are often used interchangeably. In other sciences, they can have different meanings, i.e., the hypothesis would be the “educated guess” by the scientist, and the model would be the manifestation of this guess that can be used to test the hypothesis.
How learning happens: You feed training data to a learning algorithm. The learning algorithm finds patterns in the training data that map the input features to the target. The output of this training process is a machine learning model, which you can then use to make predictions. The process itself is called “training”.
y = mx + c is our hypothesis function/model, where the optimal values of m and c are learned by the algorithm. Given a new value of x, the model can then predict the corresponding value of y.
m and c are the parameters: m is the weight and c is the bias.
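As a tiny illustration of that train-then-predict flow (my own sketch, not part of the original post), here is the y = mx + c hypothesis fitted with the closed-form least-squares solution:

```python
def train(points):
    """Fit m (weight) and c (bias) by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # m = covariance(x, y) / variance(x)
    m = sum((x - mean_x) * (y - mean_y) for x, y in points) / \
        sum((x - mean_x) ** 2 for x, _ in points)
    c = mean_y - m * mean_x
    return m, c  # the learned "model" is just these two parameters

def predict(model, x):
    m, c = model
    return m * x + c

model = train([(1, 3), (2, 5), (3, 7)])  # data generated by y = 2x + 1
print(predict(model, 10))                # -> 21.0
```

The "model" that comes out of training is nothing mysterious: just the tuned parameter values, which the predict step then reuses.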
Newbie: That was a fairly nice explanation of the whole learning process.
Pro: This is just an overview of the process; much more goes on internally during training and prediction.
Newbie: While I was searching for machine learning on the internet, I found other terms suffixed with learning. Like supervised learning, unsupervised learning, reinforcement learning, deep learning. I am confused how they are related to machine learning. Can you help?
Pro: Machine Learning is a very broad topic, and there are different types of it. It is mainly categorized into three:
- Supervised Learning: The given dataset contains both the input and the corresponding output, i.e. it is a labeled dataset. We already know the answer (output) for each question (input), with the idea that there is a relationship between the input and the output.
- Unsupervised Learning: The dataset given is unlabeled i.e. there is no idea what our output should look like. We have to derive structure from the data by clustering the data based on relationships among the variables in the dataset.
- Reinforcement Learning: It is a trial-and-error technique where an algorithm learns by performing some action; based on its action it gets a reward, which acts as a signal of whether the action taken was good or bad.
This process continues until the algorithm masters the skill, and it is mathematically formulated as a Markov Decision Process.
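As a toy sketch of that action-reward loop (a simplified two-armed “bandit” of my own invention, with no states, so not a full Markov Decision Process), an algorithm can learn which action pays better purely from reward signals:

```python
import random

random.seed(0)

# Toy two-armed bandit: arm 1 pays off more often than arm 0.
true_win_prob = [0.3, 0.8]
value = [0.0, 0.0]   # running estimate of each action's reward
count = [0, 0]

for step in range(1000):
    # Explore 10% of the time; otherwise exploit the best-looking arm.
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = 0 if value[0] > value[1] else 1
    # The environment returns a reward signal: good (1) or bad (0).
    reward = 1.0 if random.random() < true_win_prob[action] else 0.0
    count[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    value[action] += (reward - value[action]) / count[action]

print(value)  # value[1] should end up near 0.8, value[0] near 0.3
```

The algorithm was never told which arm is better; it discovered that purely from trial, error, and reward.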
Deep Learning: It is a special kind of machine learning, or rather a sub-field of machine learning, where a specific type of algorithm known as Neural Networks is used. Due to the data explosion and the powerful computing now available, it has become the state-of-the-art technique for many applications we use today.
A neural network is a group of artificial neurons connected together where a neuron is a single computational unit.
Newbie: As far as I understood, the difference between machine learning and deep learning lies in the algorithms they use. Deep learning specifically uses neural networks, whereas machine learning uses various other algorithms. Are there any other major differences between them?
Pro: Yes, there are other differences too. The two major ones I can think of are the amount of data required and feature engineering.
Deep learning models tend to keep improving with large amounts of data, whereas traditional machine learning models stop improving after a saturation point.
Another difference between them is in feature extraction. In traditional machine learning, feature extraction is done by humans, whereas a deep learning model figures out the features by itself.
Coming up with features is difficult, time-consuming, requires expert knowledge. “Applied machine learning” is basically feature engineering.
— Andrew Ng, Co-founder of Coursera, Google Brain
Newbie: Now, things are getting clearer to me. You said neural nets are the specific algorithm for deep learning, so what algorithms are used for machine learning models?
Pro: As you know there are different types of machine learning, so there are specific algorithms for these types.
There are mainly two types of tasks when solving an ML problem:
- Prediction: i.e. supervised learning problems, where the data is labeled. Further classified into Classification and Regression.
- Discovery: includes unsupervised and reinforcement learning problems, where the task is to find a cluster, an outlier, or a strategy.
Newbie: This overview of machine learning tasks and techniques is very helpful to understand the big picture. But what are the common machine learning algorithms that are used for different tasks?
Pro: There are a variety of algorithms and no one algorithm works best for every problem.
Newbie: In that sense, are neural networks not always the best choice for solving a problem?
Pro: Yes, it is true that neural networks are state-of-the-art, but there are many factors at play while solving a problem, such as the size and structure of your dataset.
For example, you can’t say that neural networks are always better than support vector machines(SVM) or vice-versa.
Newbie: So, how to pick the right ML algorithm for a given problem?
Pro: You should try many different algorithms for your problem while using a hold-out “test set” of data to evaluate performance and select the winner.
Newbie: OK. So, which algorithms can I start with and use on my problem data?
Pro: Before understanding different ML algorithms, you should know about:
- Regression (labeled data): Regression means to predict results within a continuous output, i.e. to map input variables to some continuous function.
For example, any time series data. This technique involves fitting a line to the data.
- Classification (labeled data): Here, results are predicted as a discrete output, i.e. data is categorized into predefined classes.
For example, an email can either be ‘spam’ or ‘not spam’.
- Clustering (unlabeled data): It is the process of grouping similar entities together. It gives us insight into the underlying patterns of different groups.
For example, putting news articles from different sources into a specific category like sports, politics, technology etc.
Every machine learning algorithm solves one of the above tasks. ‘Finding a strategy’, i.e. reinforcement learning, is not included here; I’ll be talking about it in a later blog.
Some of the common and widely used ML Algorithms are:
Linear Regression
It is perhaps one of the most well-known and well-understood algorithms in statistics and machine learning. It is a supervised machine learning algorithm used for regression problems.
- It will use the data points to find the best fit line to model the data.
- A line can be represented by the equation, y = m*x + c where y is the dependent variable and x is the independent variable.
- Basic calculus (minimizing the squared error) is applied to find the values of m and c from the given dataset.
Linear regression is of two types, Linear Regression with:
- One variable: only 1 independent variable is used. (y = mx + c)
- Multiple variables: multiple independent variables are used. (y = m1x1 + m2x2 + … + mnxn + c)
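A minimal sketch of how those parameters can actually be fit for the multiple-variable case, here with gradient descent, which is one of several options (the data and learning rate below are made up for illustration):

```python
def fit_multi(X, y, lr=0.01, epochs=5000):
    """Gradient-descent fit of y = m1*x1 + ... + mn*xn + c.
    (One of several ways to fit the parameters; the normal
    equation gives the same answer in closed form.)"""
    n, d = len(X), len(X[0])
    m, c = [0.0] * d, 0.0
    for _ in range(epochs):
        # Prediction error of the current model on every data point.
        errs = [sum(mj * xj for mj, xj in zip(m, row)) + c - yi
                for row, yi in zip(X, y)]
        # Gradient of the mean squared error w.r.t. each parameter.
        for j in range(d):
            m[j] -= lr * 2 * sum(e * row[j] for e, row in zip(errs, X)) / n
        c -= lr * 2 * sum(errs) / n
    return m, c

# Made-up data generated by y = 2*x1 + 3*x2 + 1
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [3, 4, 6, 8]
m, c = fit_multi(X, y)
print(m, c)  # m close to [2, 3], c close to 1
```

Each epoch nudges every weight a little in the direction that reduces the mean squared error, until the parameters settle on the values that generated the data.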
SVM (Support Vector Machine)
It is a supervised machine learning algorithm that can be used for both classification and regression problems.
- In SVM, we plot the data points in an N-dimensional space where N is the number of features and find a hyperplane to differentiate the datapoints.
- A hyperplane is a boundary that splits the input variable space; in two dimensions it is simply a line.
- The distance between the hyperplane and the closest data points is referred to as the margin. The best or optimal hyperplane that can separate the two classes is the line that has the largest margin.
- Only these points are relevant in defining the hyperplane and in the construction of the classifier. These points are called the support vectors. They support or define the hyperplane.
- This is a good algorithm when the number of dimensions is high with respect to the number of data points.
- Due to dealing with high dimensional spaces, this algorithm is computationally expensive.
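To make the margin idea concrete: for a known hyperplane w·x + b = 0, the distance of a point x to it is |w·x + b| / ||w||, and the margin is the smallest such distance. A quick sketch (the hyperplane and points below are made up; a real SVM would learn w and b from the data):

```python
import math

def distance_to_hyperplane(w, b, x):
    """Distance from point x to the hyperplane w . x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))

# Hyperplane x1 + x2 - 3 = 0 in 2-D, i.e. w = (1, 1), b = -3.
points = [(0, 0), (1, 1), (4, 4)]
margin = min(distance_to_hyperplane((1, 1), -3, p) for p in points)
print(margin)  # the closest points -- the support vectors -- define the margin
```

Training an SVM amounts to searching for the w and b that make this minimum distance as large as possible.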
Decision Tree
A decision tree is a classifier in the form of a tree structure. Usually, this algorithm is used to solve classification problems, but it can solve regression problems as well.
- It usually represents a binary tree where each node represents a single input variable (x) and a split point on that variable.
- The leaf nodes of the tree contain an output variable (y) which is used to make a prediction.
- This algorithm classifies instances or examples by starting at the root of the tree and moving through it until it reaches a leaf node, which holds the target value.
- Decision Trees are easy to understand as they are just a number of yes/no questions that one has to ask.
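For instance, a hand-built toy tree (purely illustrative, with made-up features; real trees are learned from data) can be walked from root to leaf like this:

```python
# A tiny hand-built tree as nested dicts: each internal node asks a
# yes/no question about one feature; leaves hold the predicted class.
tree = {
    "question": ("outlook", "sunny"),
    "yes": {
        "question": ("humidity", "high"),
        "yes": "stay home",
        "no": "play",
    },
    "no": "play",
}

def classify(node, example):
    # Start at the root and follow yes/no branches until a leaf.
    while isinstance(node, dict):
        feature, value = node["question"]
        node = node["yes"] if example[feature] == value else node["no"]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "high"}))  # stay home
print(classify(tree, {"outlook": "rain", "humidity": "high"}))   # play
```

Reading the tree top to bottom is exactly the chain of yes/no questions a person would ask, which is what makes decision trees so interpretable.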
Naive Bayes
This algorithm is based on Bayes’ Theorem in probability. It is mainly used in classification tasks and works very well when many classes exist in the problem.
- Naive Bayes is called naive because it assumes that each input variable is independent of each other.
For example: if we have to predict a flower type from its petal length and width, we can use the Naive Bayes algorithm by assuming those features are independent.
- Data in the real world is rarely truly independent. That is why naive Bayes is not always the preferred choice for classification; nevertheless, the technique is very effective on a large range of complex problems.
- There are two types of probabilities calculated during the training of a naive Bayes classifier:
1. The probability of each class.
2. The conditional probability of each input value given each class.
Once calculated, the probability model can be used to make predictions for new data using Bayes Theorem.
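A minimal sketch of those two probabilities on a made-up spam example (categorical features only, no smoothing, so this is a bare-bones illustration rather than a production classifier):

```python
from collections import Counter, defaultdict

def train_nb(data):
    """data: list of (features_dict, label). Returns class counts and
    per-class feature-value counts."""
    priors = Counter(label for _, label in data)          # 1. class probability
    cond = defaultdict(Counter)                           # 2. P(value | class)
    for features, label in data:
        for f, v in features.items():
            cond[(f, label)][v] += 1
    return priors, cond, len(data)

def predict_nb(model, features):
    priors, cond, n = model
    best, best_p = None, -1.0
    for label, label_count in priors.items():
        # Bayes: P(label) * product of P(feature=value | label)
        p = label_count / n
        for f, v in features.items():
            p *= cond[(f, label)][v] / label_count
        if p > best_p:
            best, best_p = label, p
    return best

data = [
    ({"word": "offer"}, "spam"),
    ({"word": "offer"}, "spam"),
    ({"word": "meeting"}, "ham"),
    ({"word": "meeting"}, "ham"),
    ({"word": "offer"}, "ham"),
]
model = train_nb(data)
print(predict_nb(model, {"word": "offer"}))  # -> spam
```

Training is just counting; prediction multiplies the counts back together via Bayes' Theorem and picks the most probable class.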
K-Means Clustering
It is an unsupervised clustering algorithm. It forms clusters of homogeneous data points from the data.
- ‘K’ is a hyperparameter that represents the number of centroids (clusters).
The algorithm to implement K means clustering is quite simple.
- Randomly pick K centroids.
- Assign each datapoint to the centroid closest to it.
- Recompute the centroids based on the average position of each centroid’s points.
- Iterate until points stop changing their centroid assignments.
To predict the cluster for a new data point, you just find the centroid it is closest to.
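The four steps above can be sketched in a few lines (1-D points and a fixed random seed, purely for illustration):

```python
import random

def kmeans(points, k, iters=100):
    random.seed(0)
    centroids = random.sample(points, k)   # 1. randomly pick K centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                   # 2. assign each point to its nearest centroid
            i = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[i].append(p)
        new = [sum(c) / len(c) if c else centroids[i]   # 3. recompute centroids
               for i, c in enumerate(clusters)]
        if new == centroids:               # 4. stop once assignments settle
            break
        centroids = new
    return sorted(centroids)

print(kmeans([1.0, 1.1, 0.9, 9.0, 9.1, 8.9], k=2))  # two centroids near 1 and 9
```

With real data you would use multi-dimensional points and Euclidean distance, but the assign-then-recompute loop is exactly the same.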
Although there are many other Machine Learning algorithms, these are the most popular ones and easy to start with.
Newbie: These algorithms are quite intuitive and a good starting point to learn. But, if I need to use them for any of my problems, do I need to write them from scratch?
Pro: No, there are ready-made libraries and frameworks that you can use and apply to your dataset.
Python is the most preferred and widely used language for machine learning among data scientists and ML engineers. An extensive set of machine learning libraries and frameworks is available in Python.
Newbie: So, which library should a beginner like me start with?
Pro: Scikit-learn is probably the best one for beginners. It supports many supervised and unsupervised learning algorithms, including the ones we discussed above.
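For example, training and using a model in scikit-learn takes just a few lines (a made-up toy dataset; assumes scikit-learn is installed via `pip install scikit-learn`):

```python
# A minimal scikit-learn workflow: fit a classifier, then predict.
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]   # features
y = [0, 0, 1, 1]                        # labels: class follows the first feature
clf = DecisionTreeClassifier()
clf.fit(X, y)                           # training ("fitting the parameters")
print(clf.predict([[1, 0]]))            # -> [1]
```

Every scikit-learn estimator follows this same fit/predict pattern, so swapping in an SVM or naive Bayes model is usually a one-line change.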
Newbie: Scikit-learn is what I’ll start with.
Pro: Yes, Scikit-learn is a good start. There is one more thing which is very important for solving any ML problem.
There is a plan of attack you need to always follow:
- Understand the problem and data
- Data exploration/ data cleaning
- Feature engineering/ feature selection
- Model evaluation and selection
- Model optimization
- Interpretation of results and predictions
This is not a linear process but an iterative one. The steps are repeated and can be completed in any order. You need to get familiar with the data and often go back to previous steps to take a new approach.
Newbie: Thanks for such a great overview of machine learning.
Thank you for reading!