The fastML Guide

Getting Started

fastML is published to PyPI, which makes it easy to install locally with pip, the Python package installer. To install fastML for local development, make sure you have Python and pip installed and added to your PATH. You can check the Python packaging docs if you need help with installation.

To install fastML, open your terminal (Linux/macOS) or command prompt (Windows) and enter the command:

pip install fastML

Using fastML

In this guide, I'll show you how to use the fastML package in your project, using the popular Iris dataset as an example. The first thing to do in any project is to import the libraries and packages it needs.

## importing needed libraries and packages including fastML
from fastML import fastML
from sklearn import datasets

The next step is to load the Iris dataset into our project.

## loading the Iris dataset
df = datasets.load_iris()

Since the Iris dataset comes pre-processed, there is no need to process the data again. For the data you use in your own project, however, make sure it is well processed so as to avoid errors and undesired outputs.
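If you are unsure where to start, here is a minimal, hypothetical preprocessing sketch using pandas and scikit-learn; the file name and column names are placeholders for your own data, not part of the Iris example:

## hypothetical preprocessing sketch -- 'your_data.csv' and 'target' are placeholders
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = pd.read_csv('your_data.csv')

## drop duplicate rows and fill missing numeric values with the column median
data = data.drop_duplicates()
data = data.fillna(data.median(numeric_only=True))

## scale the feature columns so no single feature dominates training
X = StandardScaler().fit_transform(data.drop(columns=['target']))
Y = data['target']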

The next step is to prepare the data for training and testing by assigning the desired columns of the data as the feature and target columns.

## assigning the desired columns to X and Y in preparation for running fastML
X = df.data[:, :4]
Y = df.target

Depending on the kind of data you have, your target data will often need to be encoded. Encoding replaces each class label with a numeric value, which is what machine learning algorithms actually work with. Encoding target data is also very easy to do using the fastML package.

## importing the encoding function from the fastML package and running the
## EncodeCategorical function from fastML to handle categorical encoding
from fastML import EncodeCategorical
Y = EncodeCategorical(Y)
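Under the hood, this is the same job scikit-learn's LabelEncoder does; a rough equivalent (the exact internals of fastML may differ) looks like this:

## rough equivalent of EncodeCategorical using scikit-learn's LabelEncoder;
## fastML's internals may differ
from sklearn.preprocessing import LabelEncoder
Y = LabelEncoder().fit_transform(Y)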

Next, we assign the desired test_size value to the variable ‘size’.

size = 0.3
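A size of 0.3 reserves 30% of the rows for testing. fastML performs the split internally, so the snippet below is only an illustration of what that value means in scikit-learn terms:

## illustration only -- fastML handles this split for you
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=size)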

The last step before running fastML is to decide which algorithms we want to test our data with and import them all into the project. fastML also comes with a ready-made neural net classifier built with Keras for deep learning classification, and you can import it alongside the others. For example:

## importing the desired algorithms into our project
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
## importing the neural net classifier from fastML
from nnclassifier import neuralnet

Finally, we run the main fastML function. It is flexible enough to let you tune the hyper-parameters of any individual algorithm, simply by configuring the estimator you pass in.

## running the fastML function from fastML to run multiple classification
## algorithms on the given data
fastML(X, Y, size, SVC(), RandomForestClassifier(), DecisionTreeClassifier(),
       KNeighborsClassifier(), LogisticRegression(max_iter=7000),
       special_classifier_epochs=200, special_classifier_nature='fixed',
       include_special_classifier=True)
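Because each classifier is passed in as an ordinary scikit-learn estimator, tuning a hyper-parameter just means configuring the instance before handing it over, as the max_iter argument above already shows. For example, assuming fastML accepts any set of estimators, a call with tuned values (illustrative, not recommendations) could look like:

## hypothetical call with tuned hyper-parameters; values are illustrative only
fastML(X, Y, size, SVC(kernel='linear', C=0.5),
       RandomForestClassifier(n_estimators=300, max_depth=5),
       KNeighborsClassifier(n_neighbors=7))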

Here’s a sample of the expected output after running the main fastML function:

Using TensorFlow backend.


  __          _   __  __ _
 / _|        | | |  \/  | |
| |_ __ _ ___| |_| \  / | |
|  _/ _` / __| __| |\/| | |
| | | (_| \__ \ |_| |  | | |____
|_|  \__,_|___/\__|_|  |_|______|



____________________________________________________
____________________________________________________
Accuracy Score for SVC is
0.9811320754716981


Confusion Matrix for SVC is
[[16  0  0]
 [ 0 20  1]
 [ 0  0 16]]


Classification Report for SVC is
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      0.95      0.98        21
           2       0.94      1.00      0.97        16

    accuracy                           0.98        53
   macro avg       0.98      0.98      0.98        53
weighted avg       0.98      0.98      0.98        53



____________________________________________________
____________________________________________________
____________________________________________________
____________________________________________________
Accuracy Score for RandomForestClassifier is
0.9622641509433962


Confusion Matrix for RandomForestClassifier is
[[16  0  0]
 [ 0 20  1]
 [ 0  1 15]]


Classification Report for RandomForestClassifier is
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       0.95      0.95      0.95        21
           2       0.94      0.94      0.94        16

    accuracy                           0.96        53
   macro avg       0.96      0.96      0.96        53
weighted avg       0.96      0.96      0.96        53



____________________________________________________
____________________________________________________
____________________________________________________
____________________________________________________
Accuracy Score for DecisionTreeClassifier is
0.9622641509433962


Confusion Matrix for DecisionTreeClassifier is
[[16  0  0]
 [ 0 20  1]
 [ 0  1 15]]


Classification Report for DecisionTreeClassifier is
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       0.95      0.95      0.95        21
           2       0.94      0.94      0.94        16

    accuracy                           0.96        53
   macro avg       0.96      0.96      0.96        53
weighted avg       0.96      0.96      0.96        53



____________________________________________________
____________________________________________________
____________________________________________________
____________________________________________________
Accuracy Score for KNeighborsClassifier is
0.9811320754716981


Confusion Matrix for KNeighborsClassifier is
[[16  0  0]
 [ 0 20  1]
 [ 0  0 16]]


Classification Report for KNeighborsClassifier is
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      0.95      0.98        21
           2       0.94      1.00      0.97        16

    accuracy                           0.98        53
   macro avg       0.98      0.98      0.98        53
weighted avg       0.98      0.98      0.98        53



____________________________________________________
____________________________________________________
____________________________________________________
____________________________________________________
Accuracy Score for LogisticRegression is
0.9811320754716981


Confusion Matrix for LogisticRegression is
[[16  0  0]
 [ 0 20  1]
 [ 0  0 16]]


Classification Report for LogisticRegression is
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      0.95      0.98        21
           2       0.94      1.00      0.97        16

    accuracy                           0.98        53
   macro avg       0.98      0.98      0.98        53
weighted avg       0.98      0.98      0.98        53



____________________________________________________
____________________________________________________
Included special classifier with fixed nature
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 4) 20
_________________________________________________________________
dense_2 (Dense) (None, 16) 80
_________________________________________________________________
dense_3 (Dense) (None, 3) 51
=================================================================
Total params: 151
Trainable params: 151
Non-trainable params: 0
_________________________________________________________________
Train on 97 samples, validate on 53 samples
Epoch 1/200
97/97 [==============================] - 0s 1ms/step - loss: 1.0995 - accuracy: 0.1443 - val_loss: 1.1011 - val_accuracy: 0.3019
97/97 [==============================] - 0s 63us/step - loss: 0.5166 - accuracy: 0.7010 - val_loss: 0.5706 - val_accuracy: 0.6038
Epoch 100/200
97/97 [==============================] - 0s 88us/step - loss: 0.5128 - accuracy: 0.7010 - val_loss: 0.5675 - val_accuracy: 0.6038
Epoch 200/200
97/97 [==============================] - 0s 79us/step - loss: 0.3375 - accuracy: 0.8969 - val_loss: 0.3619 - val_accuracy: 0.9057
97/97 [==============================] - 0s 36us/step
____________________________________________________
____________________________________________________
Accuracy Score for neuralnet is
0.8969072103500366


Confusion Matrix for neuralnet is
[[16  0  0]
 [ 0 16  5]
 [ 0  0 16]]


Classification Report for neuralnet is
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      0.76      0.86        21
           2       0.76      1.00      0.86        16

    accuracy                           0.91        53
   macro avg       0.92      0.92      0.91        53
weighted avg       0.93      0.91      0.91        53



____________________________________________________
____________________________________________________
                    Model            Accuracy
0                     SVC  0.9811320754716981
1  RandomForestClassifier  0.9622641509433962
2  DecisionTreeClassifier  0.9622641509433962
3    KNeighborsClassifier  0.9811320754716981
4      LogisticRegression  0.9811320754716981
5               neuralnet  0.8969072103500366

With this output, we can determine which algorithm best suits our use case and select it for further development and deployment.
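As a follow-up step outside fastML itself, you can retrain the winning algorithm on the full dataset and save it for deployment; here is a minimal sketch using joblib (the file name is a placeholder):

## retrain the chosen algorithm on all available data and persist it;
## 'iris_svc.joblib' is a placeholder file name
from joblib import dump, load
final_model = SVC()
final_model.fit(X, Y)
dump(final_model, 'iris_svc.joblib')

## later, reload the model for inference
model = load('iris_svc.joblib')
print(model.predict(X[:5]))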

fastML is free and open source, and you can find the source code and test file on GitHub. Our team of contributors is readily available whenever you come across a problem or bug while using fastML. You can also open an issue for any bug or feature request you’d like us to implement, and we’ll get it done. Check out the project, and don’t forget to leave a star if you like it.