Insurance cost prediction using linear regression

Original article was published on Deep Learning on Medium


In this problem we’re going to use information such as a person’s age, sex, BMI, number of children and smoking habits to predict their yearly medical bills. This kind of model is useful for insurance companies that need to determine a person’s yearly insurance premium.

In statistics, linear regression is a linear approach to modeling the relationship between a dependent variable (the target) and one or more independent variables (the input features). The case of one independent variable is called simple linear regression.
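Written out, with n input features x_1 ... x_n, learned weights w_1 ... w_n and a bias term b (this notation is ours, not the original article's), the model's prediction is a weighted sum:

```latex
\hat{y} = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b
```

Simple linear regression is the n = 1 case. Training means choosing the weights and bias so that the predictions land close to the observed targets.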

PyTorch is a library for Python programs that facilitates building deep learning projects. We like Python because it is easy to read and understand. PyTorch emphasizes flexibility and allows deep learning models to be expressed in idiomatic Python.

In a simple sentence, think about NumPy, but with strong GPU acceleration. Better yet, PyTorch supports dynamic computation graphs that let you change how the network behaves on the fly, unlike the static graphs used by default in frameworks such as TensorFlow 1.x.

Getting Started

Import required PyTorch libraries
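The original import cell is not preserved here. A plausible set of imports for the steps in this walkthrough, assuming only PyTorch, NumPy and pandas are installed, might look like this:

```python
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset, random_split

# Quick sanity check that the installation is usable.
print(torch.__version__)
```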

URL to download the data

After downloading the data, load it into a dataframe using the pandas library.

pandas (all lowercase) is a popular Python-based data analysis toolkit which can be imported using import pandas as pd. It offers a wide range of utilities, from parsing multiple file formats to converting an entire data table into a NumPy array. This makes pandas a trusted ally in data science and machine learning.
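As a rough sketch of the loading step: pd.read_csv accepts a file path, a URL, or a file-like object. The download URL is not reproduced here, so this example reads a few made-up rows from an in-memory string instead. The column names and values are illustrative assumptions, not the real dataset.

```python
import io

import pandas as pd

# Made-up stand-in rows; in practice pass the downloaded file path or URL.
CSV_TEXT = """age,sex,bmi,children,smoker,charges
19,female,27.9,0,yes,16000.0
18,male,33.8,1,no,1700.0
28,male,33.0,3,no,4400.0
33,male,22.7,0,no,22000.0
"""

dataframe = pd.read_csv(io.StringIO(CSV_TEXT))

print(dataframe.shape)   # (rows, columns)
print(dataframe.head())  # first few rows
```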

Prepare the dataset for training

We need to convert the data from the pandas dataframe into PyTorch tensors for training. To do this, the first step is to convert it to NumPy arrays. If you’ve filled out input_cols, categorial_cols and output_cols correctly, the following function will perform the conversion to NumPy arrays.
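One way such a conversion could look is sketched below. The list names input_cols, categorial_cols and output_cols come from the text; the function name dataframe_to_arrays, the column names and the sample rows are assumptions made for illustration. Categorical text columns are replaced by integer codes before the arrays are built.

```python
import pandas as pd
import torch

# Made-up sample rows standing in for the real dataset.
dataframe = pd.DataFrame({
    "age": [19, 18, 28, 33],
    "sex": ["female", "male", "male", "male"],
    "bmi": [27.9, 33.8, 33.0, 22.7],
    "children": [0, 1, 3, 0],
    "smoker": ["yes", "no", "no", "no"],
    "charges": [16000.0, 1700.0, 4400.0, 22000.0],
})

input_cols = ["age", "sex", "bmi", "children", "smoker"]
categorial_cols = ["sex", "smoker"]
output_cols = ["charges"]

def dataframe_to_arrays(dataframe):
    # Work on a copy so the original dataframe is left untouched.
    df = dataframe.copy(deep=True)
    # Replace each categorical column with integer category codes.
    for col in categorial_cols:
        df[col] = df[col].astype("category").cat.codes
    inputs_array = df[input_cols].to_numpy(dtype="float32")
    targets_array = df[output_cols].to_numpy(dtype="float32")
    return inputs_array, targets_array

inputs_array, targets_array = dataframe_to_arrays(dataframe)

# torch.from_numpy shares memory with the array and keeps its dtype.
inputs = torch.from_numpy(inputs_array)
targets = torch.from_numpy(targets_array)
print(inputs.shape, targets.shape)
```

Casting to float32 up front matters because nn.Linear weights are float32 by default, and mixing float64 inputs with them raises a dtype error.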

Pick a number between 0.1 and 0.2 to determine the fraction of data that will be used for creating the validation set. Then use random_split to create training & validation datasets.
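A minimal sketch of the split, assuming the inputs and targets are already tensors. Random stand-in data is used here, and the name val_percent and the value 0.15 are illustrative choices:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Random stand-in data: 100 rows, 5 input features, 1 target.
inputs = torch.randn(100, 5)
targets = torch.randn(100, 1)
dataset = TensorDataset(inputs, targets)

val_percent = 0.15  # any fraction between 0.1 and 0.2
val_size = int(len(dataset) * val_percent)
train_size = len(dataset) - val_size

# random_split shuffles the indices and returns two non-overlapping subsets.
train_ds, val_ds = random_split(dataset, [train_size, val_size])
print(len(train_ds), len(val_ds))
```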

Pick a batch size for the data loaders, typically a power of 2.
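For illustration, data loaders could be created as follows. The stand-in datasets and the batch size of 32 are assumptions; shuffling is typically enabled only for the training loader.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in datasets; in the walkthrough these come from random_split.
train_ds = TensorDataset(torch.randn(100, 5), torch.randn(100, 1))
val_ds = TensorDataset(torch.randn(20, 5), torch.randn(20, 1))

batch_size = 32  # a power of 2

train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=batch_size)

# Peek at one batch to confirm the shapes.
xb, yb = next(iter(train_loader))
print(xb.shape, yb.shape)
```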

Create a Linear Regression Model

Define the model as a class. The InsuranceModel class inherits from torch.nn.Module, which is its base class rather than an argument passed to it. The class contains all the required functions, including the methods used for the training and validation steps.
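A sketch of what such a class might look like is shown below. Only the class name InsuranceModel and the nn.Module base class come from the text; the method names training_step and validation_step, the single nn.Linear layer and the sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InsuranceModel(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        # A single linear layer: prediction = inputs @ weights.T + bias
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, xb):
        return self.linear(xb)

    def training_step(self, batch):
        inputs, targets = batch
        out = self(inputs)               # generate predictions
        return F.l1_loss(out, targets)   # loss used for backpropagation

    def validation_step(self, batch):
        inputs, targets = batch
        out = self(inputs)
        # Detach so the validation loss does not keep the graph alive.
        return {"val_loss": F.l1_loss(out, targets).detach()}

model = InsuranceModel(input_size=5, output_size=1)
batch = (torch.randn(8, 5), torch.randn(8, 1))
loss = model.training_step(batch)
print(loss.item())
```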

To optimize the model and reduce its error, we use L1 loss as the loss function. L1 loss is also known as Least Absolute Deviations. The purpose of a loss function is to compute the quantity that the model should seek to minimize during training. L1 loss measures the absolute differences between the true values and the predicted values; the classic definition sums them, while PyTorch's implementation averages them by default.
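A small worked example may help make the definition concrete. The numbers are made up; F.l1_loss and its reduction argument are part of PyTorch.

```python
import torch
import torch.nn.functional as F

preds = torch.tensor([[100.0], [250.0], [400.0]])
targets = torch.tensor([[110.0], [200.0], [400.0]])

# Absolute differences per row: 10, 50, 0
abs_errors = (preds - targets).abs()

mean_loss = F.l1_loss(preds, targets)                  # default reduction="mean"
sum_loss = F.l1_loss(preds, targets, reduction="sum")  # classic sum form

print(mean_loss.item(), sum_loss.item())  # 20.0 60.0
```

Because the error is not squared, a single very large bill pulls the fit less strongly than it would under mean squared error.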

Train the model to fit the data

Use the evaluate function to calculate the loss on the validation set before training.
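The original evaluate function is not shown, so the version below is only a guess at its shape: it averages the per-batch L1 losses over a validation loader. A bare nn.Linear layer and random data stand in for the real model and dataset.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, val_loader):
    """Average the per-batch L1 losses over the validation loader."""
    model.eval()
    with torch.no_grad():  # no gradients needed for evaluation
        batch_losses = [F.l1_loss(model(xb), yb) for xb, yb in val_loader]
    return torch.stack(batch_losses).mean().item()

torch.manual_seed(0)
val_ds = TensorDataset(torch.randn(40, 5), torch.randn(40, 1))
val_loader = DataLoader(val_ds, batch_size=16)

model = nn.Linear(5, 1)  # stand-in for the untrained model
initial_loss = evaluate(model, val_loader)
print(initial_loss)
```

Recording this number before training gives a baseline against which later epochs can be compared.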

Then train the model with several different learning rates to reduce the error.
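A training loop along these lines might look as follows. The function name fit, its signature, the choice of plain SGD, the synthetic data and the learning rates are all assumptions made for this sketch. The idea is to start with a larger learning rate and continue with a smaller one on the same model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader):
    """Average the per-batch L1 losses over a loader."""
    model.eval()
    with torch.no_grad():
        losses = [F.l1_loss(model(xb), yb) for xb, yb in loader]
    return torch.stack(losses).mean().item()

def fit(epochs, lr, model, train_loader, val_loader):
    """Train with SGD and return the validation loss after each epoch."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    history = []
    for _ in range(epochs):
        model.train()
        for xb, yb in train_loader:
            loss = F.l1_loss(model(xb), yb)
            loss.backward()        # compute gradients
            optimizer.step()       # update weights and bias
            optimizer.zero_grad()  # reset gradients for the next batch
        history.append(evaluate(model, val_loader))
    return history

# Synthetic, noise-free linear data so that the loss can actually shrink.
torch.manual_seed(0)
x = torch.randn(200, 3)
y = x @ torch.tensor([[2.0], [-1.0], [0.5]]) + 3.0
train_loader = DataLoader(TensorDataset(x[:160], y[:160]), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(x[160:], y[160:]), batch_size=32)

model = nn.Linear(3, 1)
loss_before = evaluate(model, val_loader)
history = fit(50, 1e-1, model, train_loader, val_loader)   # coarse steps first
history += fit(50, 1e-2, model, train_loader, val_loader)  # then finer steps
print(loss_before, history[-1])
```

On real insurance data the useful learning rates depend heavily on the scale of the inputs and targets, so the values here should not be taken as recommendations.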

Make predictions using the trained model

After the training process is complete, we need to test the model by using it to predict outputs for individual samples.
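As a sketch of a single prediction: the helper name predict_single is hypothetical, and an untrained nn.Linear layer with a made-up input row stands in for the trained model and a real validation sample.

```python
import torch
import torch.nn as nn

def predict_single(x, model):
    """Predict the target for one input row (a 1-D tensor)."""
    model.eval()
    with torch.no_grad():
        xb = x.unsqueeze(0)     # add a batch dimension: (features,) -> (1, features)
        prediction = model(xb)  # shape (1, outputs)
    return prediction[0]        # drop the batch dimension again

model = nn.Linear(5, 1)  # stand-in for the trained model

# Made-up row: age, sex code, bmi, children, smoker code
x = torch.tensor([19.0, 0.0, 27.9, 0.0, 1.0])
prediction = predict_single(x, model)
print("Input:", x)
print("Prediction:", prediction)
```

Comparing such predictions against the known targets of a few validation rows gives a quick feel for how far off the model typically is.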

For more details and the full code, you can check out this link.