Original article was published by Kr. Wan on Artificial Intelligence on Medium
Now comes the fun part: processing the numerical values in the dataset. There are two things we need to do: first, replace the missing values; second, standardize the values.
Replacing Missing Values
Datasets often contain many missing values, stored as NaN, and those are always a pain for anyone who wishes to use the dataset. Luckily, sklearn provides a simple and efficient way to deal with this problem. With a few lines of code, we can replace missing values with the median of the column they are in.
Utterly simple! First we drop the non-numerical attributes, since a median can only be computed on numbers. Then we fit the imputer on our data and transform it, so that each missing value is replaced with the median of its column. Lastly, we convert the imputed data back to a pandas DataFrame for consistency.
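The steps described above can be sketched as follows. This is only a minimal sketch: it assumes the imputer is SimpleImputer from sklearn.impute with strategy="median", and the toy DataFrame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy dataset; the column names are invented for this example.
df = pd.DataFrame({
    "rooms": [3.0, np.nan, 5.0, 4.0],
    "income": [2.5, 4.0, np.nan, 6.0],
    "region": ["north", "south", "east", "west"],  # non-numerical attribute
})

# 1. Keep only the numerical attributes.
df_num = df.select_dtypes(include=[np.number])

# 2. Fit the imputer (it learns each column's median) and fill the gaps.
#    Assumption: SimpleImputer with the median strategy.
imputer = SimpleImputer(strategy="median")
filled = imputer.fit_transform(df_num)  # returns a NumPy array

# 3. Convert the result back to a pandas DataFrame for consistency.
df_filled = pd.DataFrame(filled, columns=df_num.columns, index=df_num.index)
print(df_filled)
```

Because fit_transform() returns a plain NumPy array, passing the original columns and index back in keeps the labels of the numerical DataFrame.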
Standardizing Values
This step is even simpler! Create a scaler and call its fit_transform() on the dataset we have. Done!
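The standardizing step might look like the sketch below. The text does not name the scaler, so StandardScaler from sklearn.preprocessing is an assumption here, and the small DataFrame is a made-up stand-in for the dataset with missing values already replaced.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical numerical dataset with no missing values left.
df_filled = pd.DataFrame({
    "rooms": [3.0, 4.0, 5.0, 4.0],
    "income": [2.5, 4.0, 4.0, 6.0],
})

# Assumption: StandardScaler, which rescales each column to
# zero mean and unit variance.
scaler = StandardScaler()
scaled = scaler.fit_transform(df_filled)  # returns a NumPy array

# Back to a pandas DataFrame, keeping the original labels.
df_scaled = pd.DataFrame(scaled, columns=df_filled.columns, index=df_filled.index)
print(df_scaled)
```

After this step every column has a mean of 0 and a standard deviation of 1, so attributes measured on very different scales become comparable.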