Feature Selection

Source: Deep Learning on Medium

Data is the next-generation fuel, used around the globe to analyse patterns and behaviour and to make predictions. It has become necessary to move towards automation to reduce human intervention, costs, and risks. Many global companies have already installed sensors in their plants to capture data at large scale, so that deliverables can be analysed and risks predicted just from the captured data.

These sensors capture huge volumes of data and pass them downstream for further action. Since this is raw data, a data science professional must pre-process it before any modelling task. Because each individual sensor generates a different kind of data, the resulting dataset contains a large number of features, some of which are useful for building a model and some of which are not. A data scientist must pick the required features from this large set, and this is where hands-on experience comes into the picture.

There are 3 major types of feature selection methods:

1. Filter Methods

2. Wrapper Methods

3. Embedded Methods

Techniques for Feature Selection:

1. It is better to drop features that have more than 50% missing values rather than imputing them; such a large share of missing values suggests those features may not contain the information the model needs.
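A minimal sketch of this rule with pandas, assuming the data is already loaded into a DataFrame; the 0.5 cutoff mirrors the 50% rule of thumb above and can be tuned.

```python
import pandas as pd

def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    # Fraction of missing values per column
    missing_ratio = df.isna().mean()
    # Keep only columns whose missing fraction is at or below the threshold
    return df.loc[:, missing_ratio <= threshold]
```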

2. Drop low-variance features, since a feature that barely varies across samples carries little information for the model.
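A small sketch using scikit-learn's VarianceThreshold; the 0.01 cutoff and the toy array are illustrative choices, not universal rules.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0, 2.0, 0.1],
              [0, 1.0, 0.5],
              [0, 3.0, 0.9],
              [0, 2.5, 0.3]])

selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)   # drops near-constant columns (here the first one)
print(selector.get_support())           # boolean mask of retained features
```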

3. Filter Methods:

a. Subset selection evaluates a set of features together (rather than one feature at a time, as correlation-based checks do) and measures their combined impact on the target variable. Only the subsets with a high impact on the target are selected.
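A hedged sketch of exhaustive subset selection: score every feature subset with cross-validation and keep the best one. The iris dataset and logistic regression model are illustrative assumptions, and exhaustive search only scales to a handful of features.

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
best_score, best_subset = -1.0, None

for k in range(1, X.shape[1] + 1):
    for subset in combinations(range(X.shape[1]), k):
        # Evaluate the subset of features together against the target
        score = cross_val_score(LogisticRegression(max_iter=1000),
                                X[:, list(subset)], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print(best_subset, best_score)
```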

b. Multicollinearity reduces model performance, so it is better to drop features that are highly correlated with each other. This correlation check usually uses the Pearson Product-Moment Correlation (PPMC). It comes under the filter methods, where we compute correlations on individual features to find out their impact on the target variable and which of them is more informative.
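A minimal sketch, assuming a pandas DataFrame of numeric features: compute pairwise Pearson correlations and drop one feature from any pair whose absolute correlation exceeds 0.9 (an illustrative threshold).

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(X: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    corr = X.corr().abs()                                            # Pearson by default
    # Look only at the upper triangle so each pair is checked once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)
```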

c. The Chi-Square test is mostly used on categorical data, where it helps us find the informative features in the given data. Chi-Square measures how far the observed count O deviates from the expected count E, as chi² = Σ (O − E)² / E. It also comes under the filter methods.
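A short sketch with scikit-learn's chi2 scorer; chi2 expects non-negative features (e.g. counts or one-hot encodings), so the small count-style array here is purely illustrative.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

X = np.array([[1, 0, 3],
              [0, 2, 1],
              [2, 1, 0],
              [3, 0, 2]])
y = np.array([0, 1, 1, 0])

selector = SelectKBest(score_func=chi2, k=2)   # keep the 2 highest-scoring features
X_new = selector.fit_transform(X, y)
print(selector.scores_)                        # chi-square score per feature
```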

4. Wrapper Methods: (select features based on the model's accuracy or test error)

a. Backward elimination uses p-values against a chosen significance level. We train the model on all the features, then drop the feature with the highest p-value above the significance level. We repeat this with the remaining features until no feature's p-value exceeds the significance level.
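A hedged sketch of backward elimination using statsmodels OLS p-values; the 0.05 significance level and the regression setting are assumptions for illustration.

```python
import pandas as pd
import statsmodels.api as sm

def backward_elimination(X: pd.DataFrame, y, significance: float = 0.05) -> list:
    features = list(X.columns)
    while features:
        X_const = sm.add_constant(X[features])
        model = sm.OLS(y, X_const).fit()
        pvalues = model.pvalues.drop("const")   # ignore the intercept
        worst = pvalues.idxmax()                # feature with the highest p-value
        if pvalues[worst] > significance:
            features.remove(worst)              # drop the least significant feature
        else:
            break                               # all remaining features are significant
    return features
```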

b. Forward feature selection is the opposite of backward elimination: we add one feature at a time, keeping the features that carry the most information. Once the model retains a feature, it is never dropped.
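A minimal sketch using scikit-learn's SequentialFeatureSelector in forward mode; the breast cancer dataset, the logistic regression estimator, and the choice of 5 features are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
estimator = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

selector = SequentialFeatureSelector(
    estimator,
    n_features_to_select=5,
    direction="forward",    # add one feature at a time, never remove
    cv=5,
)
selector.fit(X, y)
print(selector.get_support(indices=True))   # indices of the selected features
```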

c. Hybrid feature selection follows forward selection to add features to the model, but it also has the ability to drop low-information features from the selection, like backward elimination.
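A hedged sketch of this hybrid ("floating") selection using the third-party mlxtend library (pip install mlxtend); the dataset, estimator, and number of features are illustrative assumptions.

```python
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

sfs = SFS(KNeighborsClassifier(n_neighbors=3),
          k_features=3,
          forward=True,
          floating=True,       # floating step may drop a previously added feature
          scoring="accuracy",
          cv=5)
sfs = sfs.fit(X, y)
print(sfs.k_feature_idx_)      # indices of the selected features
```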

5. Embedded Methods: (built into the model training itself, typically via regularization)

a. Lasso regularization is a technique that regularizes the estimates, i.e. it shrinks coefficients towards zero; when a feature's Beta coefficient is shrunk exactly to zero, that feature is effectively removed from the model.
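A small sketch of Lasso-based selection with scikit-learn; the diabetes dataset and the alpha value are illustrative, and features are standardized first so the penalty treats them comparably.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=1.0).fit(X, y)
print(lasso.coef_)                            # some coefficients are shrunk exactly to zero
selected = np.flatnonzero(lasso.coef_ != 0)   # keep only features with non-zero Beta
print(selected)
```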

b. Decision tree algorithms select a feature at each recursive step of the tree-growing process and split the sample set into smaller subsets. The more the samples in a child node belong to the same class, the more informative the chosen feature is. The process of growing a decision tree is therefore also a process of feature selection.
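A minimal sketch: fit a decision tree and read its impurity-based feature importances; the dataset and the depth limit are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Features the tree never split on get an importance of zero and can be dropped
for idx, importance in enumerate(tree.feature_importances_):
    if importance > 0:
        print(idx, round(importance, 4))
```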