Original article was published on Artificial Intelligence on Medium
Regression and Matrix Plots in Seaborn | Python
Seaborn is a wonderful visualization library for Python, built on top of matplotlib. It offers many kinds of plots, including count plots, scatter plots, pair plots, regression plots and matrix plots. This article deals with the regression plots and matrix plots in Seaborn.
What are Regression Plots?
The regression plots in the Seaborn library are primarily intended to add a visual guide that helps to emphasize patterns in a dataset during exploratory data analysis. As the name suggests, a regression plot draws a regression line between two variables and helps to visualize their linear relationship.
Getting started with Regression Plots –
1. Importing the required libraries
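The article does not list the exact imports, so the following is a minimal sketch of what the later steps would need. The aliases (pd, sns, plt) are the usual conventions, assumed here rather than taken from the original notebook.

```python
import pandas as pd               # reading and exploring the dataset
import seaborn as sns             # regplot(), residplot() and heatmap()
import matplotlib.pyplot as plt   # figure and axes handling
```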
2. Importing the dataset
Use the pandas library to read the dataset.
You can download the same dataset from here.
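Since the dataset itself is not named in the text, here is a small sketch of the reading step with a made-up CSV held in memory. The column names (hours, score) and the values are purely illustrative assumptions. With a real file, you would pass its path to pd.read_csv() instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for the article's dataset (not the real data).
# With a file on disk this would be: pd.read_csv("path/to/your_file.csv")
csv_text = """hours,score
1.0,52
2.0,55
3.0,61
4.0,
5.0,70
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)   # (5, 2)
```

The empty field in the fourth row is read as NaN, which is the kind of value the next step looks for.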
3. Exploratory data analysis
- df.head() function gives the first 5 rows of the dataset as the output
- Checking total number of NaN values in each column
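The two checks above can be sketched as follows. The DataFrame is a hypothetical stand-in with a couple of missing values so that the NaN count is not all zeros.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the article's dataset
df = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "score": [52.0, 55.0, np.nan, 64.0, 70.0, np.nan],
})

first_rows = df.head()           # first 5 rows by default
nan_counts = df.isnull().sum()   # number of NaN values in each column

print(first_rows)
print(nan_counts)
```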
4. Visualizations part
- Plotting with regplot() function and also evaluating regression with residplot() function. A residual plot is useful for evaluating the fit of a model.
- Polynomial regression : Seaborn supports polynomial regression using the “order” parameter of regplot(). residplot() accepts the same parameter, so the residuals of the polynomial fit can be checked as well.
- Customizing regression plots : Binning the data. The “x_bins” parameter can be used to divide the x variable into discrete bins.
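The three visualizations above could look roughly like this. The data is synthetic (a slightly curved relationship with noise), and the column names x and y are assumptions made for the sketch, not the article's dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data: y depends on x with a slight curve plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
df = pd.DataFrame({"x": x, "y": 2.0 * x + 0.3 * x**2 + rng.normal(0, 3, 200)})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Linear fit (top left) and its residuals (top right)
sns.regplot(data=df, x="x", y="y", ax=axes[0, 0])
sns.residplot(data=df, x="x", y="y", ax=axes[0, 1])

# Second-order polynomial fit (bottom left) and its residuals (bottom right)
sns.regplot(data=df, x="x", y="y", order=2, ax=axes[1, 0])
sns.residplot(data=df, x="x", y="y", order=2, ax=axes[1, 1])
fig.tight_layout()

# Binned version: x is grouped into 5 bins and each bin is drawn as a
# point estimate with a confidence interval. The line is still fitted
# to the full, unbinned data.
fig_bins, ax_bins = plt.subplots()
sns.regplot(data=df, x="x", y="y", x_bins=5, ax=ax_bins)
```

In a Jupyter notebook the figures render on their own. In a plain script you would call plt.show() at the end. If the residuals of the linear fit show a clear curve while those of the order=2 fit look like random scatter around zero, that suggests the polynomial describes the data better.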
Getting started with Matrix Plots –
Seaborn’s heatmap() function requires data to be in a grid format.
Pandas corr() function is a common way to produce such a grid, since it returns a square table of pairwise correlations between columns.
1. Correlation function of pandas library
Pandas corr() function calculates correlations between columns in a dataframe.
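A tiny sketch of corr() on hypothetical data, chosen so that the result is easy to check by eye:

```python
import pandas as pd

# Hypothetical numeric data
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],   # rises exactly with a
    "c": [5, 4, 3, 2, 1],    # falls exactly as a rises
})

corr = df.corr()   # Pearson correlation by default
print(corr)
```

Here a and b have a correlation of 1 and a and c have a correlation of -1. If your DataFrame also contains text columns, recent pandas versions expect you to select the numeric columns first or pass numeric_only=True.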
2. Building a heatmap of correlation matrix
The correlation matrix can be drawn as a heatmap by passing it to Seaborn’s built-in heatmap() function.
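As a minimal sketch of this step, the following builds a correlation matrix from random, hypothetical data and passes it straight to heatmap(). The column names are illustrative only.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data: four numeric columns, with d made to depend on a
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["a", "b", "c", "d"])
df["d"] = 0.8 * df["a"] + 0.2 * df["d"]

corr = df.corr()           # square grid of pairwise correlations
fig, ax = plt.subplots()
sns.heatmap(corr, ax=ax)   # one coloured cell per pair of columns
```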
3. Customizing the heatmap
- “annot” is used to write the actual value inside each cell.
- “cmap” is used for the colour mapping you want like coolwarm, plasma, magma etc.
- “linewidths” is used to set the width of the lines separating the cells.
- “linecolor” is used to set the colour of the lines separating the cells.
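The options above can be combined in a single call. This is a sketch on hypothetical data, and the particular colour map and line settings are just one possible choice.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical numeric data
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 1, 4, 3, 5],
    "c": [5, 3, 4, 1, 2],
})
corr = df.corr()

fig, ax = plt.subplots()
sns.heatmap(
    corr,
    annot=True,          # write each correlation value inside its cell
    cmap="coolwarm",     # colour map; "plasma" or "magma" also work
    linewidths=1,        # width of the separating lines (parameter name is plural)
    linecolor="black",   # colour of the separating lines
    ax=ax,
)
```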
This brings us to the end of this article. I hope you have understood all the visualizations clearly. Make sure you practice as much as possible.
If you wish to check out more resources related to Data Science and Machine Learning you can refer to my Github account.
Do look out for other Jupyter notebooks in the series which will explain the various other aspects of Data Visualizations with Seaborn in Python.
You can also check out my Data Science Portfolio on my GitHub account.
Hope you like the post.