Original article was published by Pankaj Yadav on Artificial Intelligence on Medium
Feature selection with state-of-the-art regression models for housing data.
Both feature selection and hyperparameter tuning are key tasks in machine learning. Hyperparameter tuning is often used to improve model performance, while feature selection is undertaken to obtain better models: better model interpretability and lower cost of data acquisition, data handling, and model inference. This article implements different regression models and tunes their hyperparameters to get better accuracy. Additionally, it explores the impact of feature selection on the best model using different feature selection techniques. The regression models selected for the feature selection are the KNN Regressor, Decision Tree Regressor, and Gradient Boosting Regressor.
Introduction: Regression and classification fall under the same umbrella of supervised machine learning. Both share the same idea of using known datasets (referred to as training datasets) to make predictions. The fundamental difference between them is that the output variable in regression is numerical, while in classification it is categorical.
Each observation of the house data has a unique id, the year of sale, related attributes, and the price for which the house sold. The target feature of the dataset is the price, which is predicted from the relevant attributes. With the help of a regression model, even a buyer can predict the price of a house he wants to buy from its relevant attributes.
The features of “House Sales in King County, USA between May 2014 and May 2015” are listed as:
‘id’, ‘date’, ‘price’, ‘bedrooms’, ‘bathrooms’, ‘sqft_living’, ‘sqft_lot’, ‘floors’, ‘waterfront’, ‘view’, ‘condition’, ‘grade’, ‘sqft_above’, ‘sqft_basement’, ‘yr_built’, ‘yr_renovated’, ‘zipcode’, ‘lat’, ‘long’, ‘sqft_living15’, ‘sqft_lot15’
Correlation-Based Feature selection: Correlation analysis is performed to remove features that are highly correlated with each other and to keep the features that are correlated with the target variable.
From the image above, it can be seen that a few features are highly correlated with other features, with a correlation coefficient of more than 0.70. A better model also requires the features to be correlated with the target variable, so let's look at the second type of correlation plot:
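The filtering step can be sketched on a small synthetic frame shaped like the King County data. The column names, coefficients, and the 0.70 cutoff below are illustrative assumptions, not the article's exact code:

```python
# Minimal sketch of correlation-based filtering on synthetic housing-like data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
sqft_living = rng.normal(2000, 500, 200)
df = pd.DataFrame({
    "sqft_living": sqft_living,
    # sqft_above is nearly a linear copy of sqft_living, so it is redundant
    "sqft_above": sqft_living * 0.8 + rng.normal(0, 50, 200),
    "bedrooms": rng.integers(1, 6, 200).astype(float),
    "price": sqft_living * 150 + rng.normal(0, 20000, 200),
})

corr = df.drop(columns="price").corr().abs()
# Keep only the upper triangle so each feature pair is inspected once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.70).any()]
print(to_drop)  # the redundant sqft_above column is flagged for removal
```

The same pattern scales to the full 21-column frame: compute the absolute correlation matrix once, then drop one feature from every pair above the chosen threshold.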
Univariate feature selection: Univariate feature selection works by choosing the best features based on univariate statistical tests. Each feature is compared with the target variable to see whether there is any statistically significant relationship between them; this is also called analysis of variance (ANOVA). Scikit-learn's sklearn.feature_selection module provides a transformer named SelectPercentile, which keeps a user-specified highest-scoring percentage of features and removes the rest. It takes as input score_func and percentile.
• score_func: callable taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or a single array with scores. The default is f_classif, which only works with classification tasks.
• percentile: int, optional, default=10. Percent of features to keep.
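Since the default f_classif only suits classification, a regression task like this one needs a regression scorer passed in. A minimal sketch on synthetic data, swapping in f_regression (the data shape and percentile value are illustrative assumptions):

```python
# Sketch: keep the top 25% of features by univariate F-score for regression.
import numpy as np
from sklearn.feature_selection import SelectPercentile, f_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
# Only columns 0 and 3 actually drive the target
y = 3.0 * X[:, 0] + 2.0 * X[:, 3] + rng.normal(scale=0.1, size=300)

selector = SelectPercentile(score_func=f_regression, percentile=25)
X_new = selector.fit_transform(X, y)
print(X_new.shape)                          # 2 of 8 features survive
print(selector.get_support(indices=True))   # indices of the kept features
```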
Univariate feature selection using chi2: The chi-square test measures dependence between stochastic variables, so using this function "weeds out" the features that are the most likely to be independent of the class and therefore irrelevant for classification.
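Note that scikit-learn's chi2 expects non-negative features and a categorical target, so applying it to a regression target such as price requires binning the target first. The sketch below uses synthetic count-like data and a median split; both are illustrative assumptions:

```python
# Sketch: chi-square scoring on non-negative features with a binned target.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(300, 5)).astype(float)   # non-negative counts
y_price = X[:, 1] * 50_000 + rng.normal(0, 10_000, 300)
y_binned = (y_price > np.median(y_price)).astype(int)  # cheap vs. expensive

selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y_binned)
print(selector.get_support(indices=True))  # column 1 drives the price bin
```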
Univariate feature selection using mutual_info_regression: Mutual information (MI) between two random variables is a non-negative value which measures the dependency between the variables. It is equal to zero if and only if the two random variables are independent, and higher values mean higher dependency.
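A minimal sketch on synthetic data; the quadratic relationship is an assumption chosen to show that MI, unlike a linear F-test, can also detect nonlinear dependence:

```python
# Sketch: mutual information is ~0 for independent features and larger for a
# dependent (even nonlinear) one.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 3))
# y depends nonlinearly on column 0 only; columns 1 and 2 are noise
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=500)

mi = mutual_info_regression(X, y, random_state=0)
print(mi.round(3))  # highest score for column 0, near zero for the others
```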
Univariate feature selection using f_regression:
A linear model for testing the individual effect of each of many regressors. This is a scoring function to be used in a feature selection procedure, not a free-standing feature selection procedure. It works in 2 steps:
1. The correlation between each regressor and the target is computed, that is, ((X[:, i] - mean(X[:, i])) * (y - mean(y))) / (std(X[:, i]) * std(y)).
2. It is converted to an F score and then to a p-value.
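The two steps above can be sketched on synthetic data (the column count and coefficient are illustrative assumptions):

```python
# Sketch: f_regression returns per-feature F-scores and p-values; the
# informative column gets a large F and a tiny p-value.
import numpy as np
from sklearn.feature_selection import f_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 4.0 * X[:, 2] + rng.normal(scale=0.5, size=200)  # only column 2 matters

F, pvals = f_regression(X, y)
print(F.round(1))
print(pvals.round(6))
```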
Tree-based feature selection:
Tree-based estimators (see the sklearn.tree module and forests of trees in the sklearn.ensemble module) can be used to compute impurity-based feature importances, which in turn can be used to discard irrelevant features (when combined with the sklearn.feature_selection.SelectFromModel meta-transformer).
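A minimal sketch assuming a RandomForestRegressor as the tree ensemble; the forest, the synthetic data, and SelectFromModel's default mean-importance threshold are illustrative choices:

```python
# Sketch: impurity-based importances from a forest, fed to SelectFromModel.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
# Only columns 1 and 4 carry signal
y = 5.0 * X[:, 1] + 3.0 * X[:, 4] + rng.normal(scale=0.3, size=300)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
# Default threshold keeps features whose importance exceeds the mean importance
selector = SelectFromModel(forest, prefit=True)
X_new = selector.transform(X)
print(selector.get_support(indices=True))  # the two informative columns
```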
Feature selection using Recursive feature elimination with cross-validation:
Feature ranking with recursive feature elimination and cross-validated selection of the best number of features.
Some estimators have built-in cross-validation capabilities to automatically choose the best hyperparameters (see the User Guide). Examples of cross-validation estimators are ElasticNetCV and LogisticRegressionCV. Cross-validation estimators are named EstimatorCV and tend to be roughly equivalent to GridSearchCV(Estimator(), ...). The advantage of using a cross-validation estimator over the canonical estimator class combined with grid search is that it can take advantage of warm-starting by reusing precomputed results from the previous steps of the cross-validation process, which generally leads to speed improvements. An exception is the RidgeCV class, which can instead perform efficient leave-one-out CV.
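A minimal RFECV sketch assuming a LinearRegression base estimator and 5-fold cross-validation; both choices and the synthetic data are illustrative, not the article's exact setup:

```python
# Sketch: RFECV recursively eliminates the weakest feature and keeps the
# feature count with the best cross-validated score.
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Only columns 0 and 2 carry signal
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.2, size=200)

rfecv = RFECV(LinearRegression(), step=1, cv=5).fit(X, y)
print(rfecv.n_features_)            # best number of features found by CV
print(np.where(rfecv.support_)[0])  # which columns were kept
```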