Source: Deep Learning on Medium
The above lines are to import the necessary module for python to execute the classification.
- Tkinter is the GUI library for python
- Pandas is for data frame and manipulation
- Sklearn is for classification
Getting user inputs:
This snippet of code creates a file browse window through which the user can select the excel sheet from which data needs to get imported. It also prints the file name in the console.
Sample excel file for input.
Converting excel to dataframe:
This part of the code reads the excel file and converts it into a dataset of data frame format. The column list gives the list of the column names.
To predict a particular column, the user needs to select which column needs to predict. For this reason, the above code generates a Drop-Down menu for the user to choose the column which needs to get predicted.
Once selecting a column in the drop-down menu, the user needs to click “OK,” and the column name chosen is stored in var.
To predict column names, it needs to be available for selection. It is done by “dataset.drop.”The dropped column is selected as the target gets predicted.
In general, classification can accept only numerical values as inputs. Still, there are cases in which the data from the excel can be of string. To solve this, we can use an encoder method where the lines are encoded to numerical values and get decoded at the prediction side. There are different types of encoders based on the datatype. I tried to keep it simple to one encoder.
The training data and testing data need to get split for better training in this example. We will split it to 75:25. We define the classification method as DecisionTreeClassifier. As I tried to keep it simple, I didn’t expand on the options to the user where the user can select the classification method needed.
In this loop, we iterate the column header asking the user to provide values and store them in Obs_emp. Currently, the code supports the datatypes of Objects, Int, and Float.
The user input is converted into a data frame for easy concatenation so that it gets tested for prediction with classifier.
Also, the user inputs are transformed based on the encoding logic to convert strings to integers.
In this line, we check the accuracy of the prediction model against the test data.
In this part, we give the user input data frame as validation data and let the model predict the output.
Conclusion and future enhancements:
This project gave me a general idea of Machine learning. The future enhancements on the project can be.
- Datatypes need to be expanded to include all possible data types.
- The decision tree algorithm is the defacto algorithm used. The option of having other algorithms needs consideration.