AUTO ML in 3 clicks

Source: Deep Learning on Medium

Libraries Definition:

Library definiton

The above lines are to import the necessary module for python to execute the classification.

  • Tkinter is the GUI library for python
  • Pandas is for data frame and manipulation
  • Sklearn is for classification

Getting user inputs:

Getting input from user
User interface asking file to be selected

This snippet of code creates a file browse window through which the user can select the excel sheet from which data needs to get imported. It also prints the file name in the console.

Sample excel file for input.

Sample input file

Converting excel to dataframe:

Converting excel to dataframe:

This part of the code reads the excel file and converts it into a dataset of data frame format. The column list gives the list of the column names.

Selecting column name from dropdown
Drop down selection for user

To predict a particular column, the user needs to select which column needs to predict. For this reason, the above code generates a Drop-Down menu for the user to choose the column which needs to get predicted.

Clicking”OK” to finailse input

Once selecting a column in the drop-down menu, the user needs to click “OK,” and the column name chosen is stored in var.

Model Training:

Splting data

To predict column names, it needs to be available for selection. It is done by “dataset.drop.”The dropped column is selected as the target gets predicted.

Encoding data

In general, classification can accept only numerical values as inputs. Still, there are cases in which the data from the excel can be of string. To solve this, we can use an encoder method where the lines are encoded to numerical values and get decoded at the prediction side. There are different types of encoders based on the datatype. I tried to keep it simple to one encoder.

Spliting training and testing data

The training data and testing data need to get split for better training in this example. We will split it to 75:25. We define the classification method as DecisionTreeClassifier. As I tried to keep it simple, I didn’t expand on the options to the user where the user can select the classification method needed.

Defining Datatypes

In this loop, we iterate the column header asking the user to provide values and store them in Obs_emp. Currently, the code supports the datatypes of Objects, Int, and Float.

Concatenation

The user input is converted into a data frame for easy concatenation so that it gets tested for prediction with classifier.

Also, the user inputs are transformed based on the encoding logic to convert strings to integers.

Checking Accuracy

In this line, we check the accuracy of the prediction model against the test data.

Prediciting result

In this part, we give the user input data frame as validation data and let the model predict the output.

Conclusion and future enhancements:

This project gave me a general idea of Machine learning. The future enhancements on the project can be.

  1. Datatypes need to be expanded to include all possible data types.
  2. The decision tree algorithm is the defacto algorithm used. The option of having other algorithms needs consideration.