Linear Regression: Zero to Hero

Original article was published by Abhayparashar31 on Deep Learning on Medium


→Practical←

Dataset: weight-height.csv

Importing Necessary Libraries

import numpy as np  ## scientific calculation
import pandas as pd ## reading data
import matplotlib.pyplot as plt ## visualizing data
from sklearn.model_selection import train_test_split #splitting data
from sklearn.linear_model import LinearRegression # fitting model

Load The Dataset

df=pd.read_csv("https://gist.githubusercontent.com/nstokoe/7d4717e96c21b8ad04ec91f361b000cb/raw/bf95a2e30fceb9f2ae990eac8379fc7d844a0196/weight-height.csv")
df

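Fetching only the required columns using column indexing and iloc

X=df['Height'].values[:,None] ## feature as a 2-D array of shape (n_samples, 1)
y=df.iloc[:,2].values ## third column (Weight) as the target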
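
scikit-learn estimators expect the feature matrix to be two-dimensional, which is why the height column is indexed with [:, None]. Here is a minimal sketch of what that indexing does. The small array is made up for illustration and is not taken from the dataset.

```python
import numpy as np

# Hypothetical heights in inches, standing in for df['Height'].values
heights = np.array([65.0, 70.0, 72.5])

X_demo = heights[:, None]        # adds a trailing axis: shape (3,) -> (3, 1)
X_alt = heights.reshape(-1, 1)   # equivalent, and arguably more explicit

print(heights.shape, X_demo.shape)
print(np.array_equal(X_demo, X_alt))
```

Either form works. reshape(-1, 1) tends to be easier to read for newcomers.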

Visualization Of Data

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(10, 8)) ## create a 2x2 grid of subplots
fig.tight_layout(pad=3.0) ## padding
ax1.plot(X,y) ## plot x and y
ax1.set_title("Weight and height") ## set title
ax1.set_xlabel("Height") ## set label
males=df[df['Gender']=='Male'] ## all males
females=df[df['Gender']=='Female'] ## all females
males.plot(kind='scatter',x='Height',y='Weight',ax=ax2,color='blue',alpha=0.3,
title='Male and Female Populations') ## scatter plot
females.plot(kind='scatter',x='Height',y='Weight',
ax=ax2,color='red',alpha=0.3,
title='Male and Female Populations');
ax2.legend(['Males','Females'])
males['Height'].plot(kind='hist',ax=ax3,bins=50,range=(50,80),alpha=0.3,color='blue') ## Males height
females['Height'].plot(kind='hist',ax=ax3,bins=50,range=(50,80),alpha=0.3,color='red') ## Females height
ax3.set_title('Height distribution')
ax3.legend(['Males','Females'])
ax3.set_xlabel('Height (in)')
ax3.axvline(males['Height'].mean(),color='blue',linewidth=2)
ax3.axvline(females['Height'].mean(),color='red',linewidth=2);
ax4.hist(y) ## Histogram
ax4.set_title("Distribution of Weight")
plt.show()

Splitting the data into training and test sets using train_test_split()

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=1/3,random_state=0)
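
To see what test_size=1/3 and random_state=0 do in practice, the sketch below runs the same call on a small made-up array. The data and the names X_demo and y_demo are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 30 samples, one feature
X_demo = np.arange(30, dtype=float).reshape(-1, 1)
y_demo = 2.0 * X_demo.ravel() + 1.0

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=1/3, random_state=0
)

# Roughly two thirds of the rows go to training, one third to testing
print(X_tr.shape, X_te.shape)

# Fixing random_state makes the shuffle reproducible across calls
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=1/3, random_state=0
)
print(np.array_equal(X_te, X_te2))
```

Fixing random_state is what makes a reported score repeatable from run to run.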

Training the model using Linear Regression

regressor = LinearRegression()
regressor.fit(X_train, y_train)
regressor.score(X_train,y_train) ## R^2 on the training set
#####OUTPUT######
0.8568633984006397
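
With a single feature, fitting a linear regression amounts to finding the slope and intercept that minimize the squared error. The slope is cov(x, y) / var(x) and the intercept is mean(y) - slope * mean(x). The sketch below checks that formula against LinearRegression on synthetic data. The numbers used to generate the data are arbitrary assumptions, not estimates from the height-weight dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: weight roughly linear in height, plus noise
rng = np.random.default_rng(0)
height = rng.uniform(55, 80, size=200)
weight = 7.5 * height - 340 + rng.normal(0, 10, size=200)

# Closed-form least-squares estimates for one feature
slope = np.cov(height, weight, bias=True)[0, 1] / np.var(height)
intercept = weight.mean() - slope * height.mean()

# The same fit through scikit-learn
model = LinearRegression().fit(height[:, None], weight)

print(slope, model.coef_[0])        # the two slopes should agree
print(intercept, model.intercept_)  # and so should the intercepts
```

After fitting, the learned parameters are available as regressor.coef_ and regressor.intercept_.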

Predicting Test Results

y_pred = regressor.predict(X_test)
print(y_pred)

Evaluating model

from sklearn.metrics import mean_absolute_error,r2_score
print("mean_absolute_error: ",mean_absolute_error(y_test, y_pred))
print("r2_score: ",r2_score(y_test,y_pred))
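
Both metrics are easy to compute by hand, which helps in reading them. MAE is the average absolute difference between true and predicted values. R² is one minus the ratio of the residual sum of squares to the total sum of squares. This minimal sketch uses made-up values, not results from the dataset.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical true and predicted weights
y_true = np.array([150.0, 180.0, 200.0, 130.0])
y_hat = np.array([155.0, 175.0, 190.0, 140.0])

# Mean absolute error: average size of the mistakes, in the target's units
mae_manual = np.mean(np.abs(y_true - y_hat))

# R^2: share of the target's variance that the predictions explain
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mae_manual, mean_absolute_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))
```

MAE is expressed in the units of the target (pounds here), while R² is unitless. Reporting both gives a fuller picture.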

Visualization Of Results

fig, (ax1,ax2) = plt.subplots(nrows=1,ncols=2,figsize = (14,6))
ax1.scatter(X_train, y_train, color = 'red')
ax1.plot(X_train, regressor.predict(X_train), color = 'blue')
ax1.set_title('Training Set')
ax1.set_xlabel('Height')
ax1.set_ylabel('Weight')
ax2.scatter(X_test, y_test, color = 'red')
ax2.plot(X_train, regressor.predict(X_train), color = 'blue')
ax2.set_title('Test Set')
ax2.set_xlabel('Height')
ax2.set_ylabel('Weight')
plt.show()

Prediction For New Data

lst =pd.DataFrame( [int(input("Enter the height (inch)"))])
regressor.predict(lst)[0]
######OUTPUT########
Enter the height (inch) 70
189.4781545384721
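
A prediction for one new height is just slope * height + intercept, and predict() expects a 2-D input even for a single value. The sketch below shows a non-interactive version. The helper predict_weight and the toy training data are hypothetical and chosen so the result is easy to verify by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data lying exactly on weight = 5 * height - 150
X_demo = np.array([[60.0], [65.0], [70.0], [75.0]])
y_demo = 5.0 * X_demo.ravel() - 150.0

model = LinearRegression().fit(X_demo, y_demo)

def predict_weight(model, height_in):
    """Hypothetical helper: predict for a single height given in inches."""
    # Wrap the scalar in a (1, 1) array, the shape predict() expects
    return float(model.predict(np.array([[height_in]]))[0])

new_height = 68.0
by_model = predict_weight(model, new_height)
by_hand = model.coef_[0] * new_height + model.intercept_

print(by_model, by_hand)  # both give the same value
```

Wrapping the call in a small function avoids input() and makes it easier to reuse in scripts or tests.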
