Poker-Hand Prediction

Source: Deep Learning on Medium

IMPORTING ALL THE LIBRARIES

import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

DATA PRE-PROCESSING

# Load the training and test sets; the files have no header row
data_train = pd.read_csv("poker-hand-training-true.data", header=None)
data_test = pd.read_csv("poker-hand-testing.data", header=None)
# Name the columns: suit and rank for each of the five cards, then the class label
col = ['Suit of card #1', 'Rank of card #1', 'Suit of card #2', 'Rank of card #2',
       'Suit of card #3', 'Rank of card #3', 'Suit of card #4', 'Rank of card #4',
       'Suit of card #5', 'Rank of card #5', 'Poker Hand']
data_train.columns = col
data_test.columns = col
# One-hot encode the class labels for the Keras model
y_train = pd.get_dummies(data_train['Poker Hand'])
y_test = pd.get_dummies(data_test['Poker Hand'])
# The ten suit and rank columns are the input features
x_train = data_train.drop('Poker Hand', axis=1)
x_test = data_test.drop('Poker Hand', axis=1)
print('Shape of Training Set:', x_train.shape)
print('Shape of Testing Set:', x_test.shape)
>>Shape of Training Set: (25010, 10)
>>Shape of Testing Set: (1000000, 10)
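
To make the label encoding concrete, here is a minimal pure-Python sketch of what one-hot encoding does to integer class labels. The helper name `one_hot` and the toy labels are made up for illustration, and the sketch always creates ten columns, whereas `pd.get_dummies` only creates columns for the classes that actually occur in the data.

```python
def one_hot(labels, n_classes):
    """Turn integer class labels into rows with a single 1 at the label's index."""
    return [[1 if k == label else 0 for k in range(n_classes)] for label in labels]

# Hypothetical toy labels: classes 0, 1 and 9 out of ten poker-hand classes
encoded = one_hot([0, 1, 9], n_classes=10)
for row in encoded:
    print(row)
```

Each row sums to 1, and the position of the 1 recovers the original class label.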

1. NEURAL NETWORK

A neural network is a series of algorithms that tries to recognize the underlying relationships in a set of data through a process loosely modeled on how the human brain works. Neural networks can adapt to changing input, so the network can produce the best possible result without the output criteria having to be redesigned.

To build the network I used Keras, a high-level API that wraps lower-level libraries and can run on top of TensorFlow, CNTK, or Theano.

My neural network architecture comprises three dense layers with 15, 10, and 10 nodes, respectively.
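
As a quick sanity check on the size of this architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes ten input features (the five suits and five ranks) and standard dense layers with one bias per unit; the helper name `dense_param_count` is made up for illustration.

```python
# Layer sizes: 10 input features -> 15 -> 10 -> 10 output classes
layer_sizes = [10, 15, 10, 10]

def dense_param_count(n_in, n_out):
    """A dense layer has n_in * n_out weights plus one bias per output unit."""
    return n_in * n_out + n_out

per_layer = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [165, 160, 110]
print(total_params)  # 435
```

With only 435 parameters, this is a very small network relative to the 25,010 training hands.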

from keras.models import Sequential
from keras.layers import Dense

# Three dense layers: 15 and 10 hidden units, then a 10-way softmax output
model = Sequential()
model.add(Dense(15, activation='relu', input_dim=10))
model.add(Dense(10, activation='relu'))
model.add(Dense(10, activation='softmax'))
# Note: the log below was produced with binary_crossentropy, as written here.
# For a 10-class softmax output, categorical_crossentropy is the usual loss.
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# Train for 10 epochs, using the test set as validation data
history = model.fit(x_train, y_train, epochs=10, batch_size=256, verbose=1,
                    validation_data=(x_test, y_test), shuffle=True)

# Evaluate on the full test set
score = model.evaluate(x_test, y_test, batch_size=256)
##OUTPUT
Train on 25010 samples, validate on 1000000 samples
Epoch 1/10
25010/25010 [==============================] - 4s 156us/step - loss: 0.3935 - acc: 0.8842 - val_loss: 0.3070 - val_acc: 0.8999
Epoch 2/10
25010/25010 [==============================] - 4s 147us/step - loss: 0.2283 - acc: 0.8996 - val_loss: 0.1818 - val_acc: 0.8997
Epoch 3/10
25010/25010 [==============================] - 4s 145us/step - loss: 0.1790 - acc: 0.8998 - val_loss: 0.1767 - val_acc: 0.9000
Epoch 4/10
25010/25010 [==============================] - 4s 143us/step - loss: 0.1758 - acc: 0.9001 - val_loss: 0.1750 - val_acc: 0.9001
Epoch 5/10
25010/25010 [==============================] - 4s 149us/step - loss: 0.1748 - acc: 0.9000 - val_loss: 0.1743 - val_acc: 0.9000
Epoch 6/10
25010/25010 [==============================] - 4s 144us/step - loss: 0.1743 - acc: 0.9001 - val_loss: 0.1740 - val_acc: 0.9000
Epoch 7/10
25010/25010 [==============================] - 4s 146us/step - loss: 0.1740 - acc: 0.9003 - val_loss: 0.1737 - val_acc: 0.9001
Epoch 8/10
25010/25010 [==============================] - 4s 147us/step - loss: 0.1738 - acc: 0.9002 - val_loss: 0.1734 - val_acc: 0.9000
Epoch 9/10
25010/25010 [==============================] - 4s 146us/step - loss: 0.1735 - acc: 0.9005 - val_loss: 0.1732 - val_acc: 0.9003
Epoch 10/10
25010/25010 [==============================] - 4s 151us/step - loss: 0.1734 - acc: 0.9004 - val_loss: 0.1730 - val_acc: 0.9004
1000000/1000000 [==============================] - 3s 3us/step

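The validation accuracy comes out to 90.04%. One caveat: with loss='binary_crossentropy' and metrics=['accuracy'], Keras versions of this era report binary accuracy, that is, the fraction of the ten one-hot outputs that land on the correct side of 0.5. A model that keeps every output below 0.5 already scores 90% on that metric, so this figure is not directly comparable with the multi-class accuracies reported for the other models.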
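
Because the model is compiled with a binary loss on a ten-way one-hot target, it helps to see how an accuracy computed per output differs from an accuracy computed per hand. The following is a minimal pure-Python sketch, not the Keras implementation; the function names, toy labels, and probabilities are all made up for illustration.

```python
def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of individual 0/1 outputs that match after thresholding."""
    hits = total = 0
    for true_row, prob_row in zip(y_true, y_prob):
        for t, p in zip(true_row, prob_row):
            hits += int((p > threshold) == bool(t))
            total += 1
    return hits / total

def categorical_accuracy(y_true, y_prob):
    """Fraction of rows whose highest-probability class is the true class."""
    hits = 0
    for true_row, prob_row in zip(y_true, y_prob):
        hits += int(prob_row.index(max(prob_row)) == true_row.index(1))
    return hits / len(y_true)

n_classes = 10
labels = [0, 1, 0, 2]  # hypothetical toy hands
y_true = [[1 if k == c else 0 for k in range(n_classes)] for c in labels]

# Hypothetical uninformative model: the same probabilities for every hand
probs = [0.4, 0.3] + [0.0375] * 8
y_prob = [list(probs) for _ in labels]

bin_acc = binary_accuracy(y_true, y_prob)
cat_acc = categorical_accuracy(y_true, y_prob)
print(bin_acc)  # 0.9
print(cat_acc)  # 0.5
```

In this toy case the per-output score is 0.9 even though only half of the hands are classified correctly, so it is worth checking which of the two quantities a reported accuracy refers to.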

Now experimenting and comparing with other classification models…

2. LOGISTIC REGRESSION

Like all regression analyses, logistic regression is a predictive analysis. It is used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval, or ratio-level independent variables. The poker-hand target has ten classes rather than two, so the scikit-learn call in this section uses the one-vs-rest scheme (multi_class='ovr'), which fits one binary classifier per class.

I used the scikit-learn library and its LogisticRegression class from sklearn.linear_model.
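
At its core, a binary logistic regression passes a linear score through the logistic (sigmoid) function to get a probability. Here is a minimal sketch of that step; the weights, bias, and feature values are invented for illustration and are not fitted to the poker data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability of the positive class for one example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical weights and a two-feature example
weights = [0.8, -0.5]
bias = 0.1
p = predict_proba(weights, bias, [1.0, 2.0])
print(round(p, 4))
```

A score of zero maps to a probability of exactly 0.5, which is the usual decision boundary.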

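from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# scikit-learn classifiers expect 1-D integer class labels,
# not the one-hot targets built for Keras
clf = LogisticRegression(random_state=0, solver='lbfgs', max_iter=100,
                         multi_class='ovr').fit(x_train, data_train['Poker Hand'])
y_pred = clf.predict(x_test)
# accuracy_score takes the true labels first, then the predictions
accuracy_score(data_test['Poker Hand'], y_pred)
##OUTPUT
0.501209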

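As you can see, the validation accuracy is surprisingly low compared to that of the neural network.

Validation accuracy is 50.12%! This appears to match the share of class 0 (nothing in hand) that the UCI dataset description lists for the test set (501,209 of 1,000,000 hands), which suggests the model is predicting the majority class for nearly every hand.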
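
A useful reference point for any accuracy figure is the majority-class baseline: the score obtained by always predicting the most frequent training label. The sketch below uses made-up toy labels and a hypothetical helper name; on the real data, the same idea could be applied to the 'Poker Hand' column.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority_class, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for y in test_labels if y == majority_class)
    return majority_class, correct / len(test_labels)

# Hypothetical toy labels, not the poker data
train_labels = [0, 0, 1, 0, 1, 2]
test_labels = [0, 1, 0, 0, 1, 3, 0, 1]

majority_class, baseline = majority_baseline_accuracy(train_labels, test_labels)
print(majority_class)  # 0
print(baseline)        # 0.5
```

If a trained model scores about the same as this baseline, it has probably learned little beyond the class frequencies.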

3. Classification And Regression Trees for Machine Learning (CART)

Classification and Regression Trees, or CART for short, is a term introduced by Leo Breiman to refer to decision tree algorithms that can be used for classification or regression predictive modeling problems.

Classically, this algorithm is referred to as “decision trees”, but on some platforms like R they are referred to by the more modern term CART.

The CART algorithm provides a foundation for important algorithms like bagged decision trees, random forest and boosted decision trees.
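
CART classification trees typically choose each split by minimizing an impurity measure such as the Gini index. As a rough illustration, here is how the Gini impurity of a node and of a candidate split can be computed; the function names and toy labels are invented for this sketch.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted average impurity of the two children of a split."""
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)

# Hypothetical node with two classes in equal proportion
parent = [0, 0, 1, 1]
print(gini_impurity(parent))       # 0.5
print(split_gini([0, 0], [1, 1]))  # 0.0 (a perfect split)
print(split_gini([0, 1], [0, 1]))  # 0.5 (no improvement)
```

The tree-building procedure evaluates candidate splits like these and keeps the one with the lowest weighted impurity.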

I used the scikit-learn library and its DecisionTreeClassifier class from sklearn.tree.

from sklearn.tree import DecisionTreeClassifier
# A shallow tree (depth 2), trained on the integer class labels
decision_tree = DecisionTreeClassifier(random_state=0, max_depth=2)
decision_tree = decision_tree.fit(x_train, data_train['Poker Hand'])
y_pred = decision_tree.predict(x_test)
accuracy_score(data_test['Poker Hand'], y_pred)
##OUTPUT
0.501209

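Again, the validation accuracy is surprisingly low, and it is identical to that of logistic regression (0.501209 in both cases), which suggests that both models are predicting the same class for almost every test hand.

Validation accuracy is 50.12%!!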

AND FINALLY…

4. Support Vector Machine (SVM)

SVM is a supervised machine learning algorithm that can be used for both classification and regression problems, although it is mostly used for classification. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. Classification is then performed by finding the hyperplane that best separates the classes.
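
Once each data item is a point in feature space, a linear SVM classifies it by the sign of a linear score, and training penalizes points that fall on the wrong side of the margin using the hinge loss. The sketch below shows those two pieces for a two-class case; the weights and points are invented and nothing here is fitted to the poker data.

```python
def decision_value(weights, bias, point):
    """Signed score of a point relative to the hyperplane w.x + b = 0."""
    return sum(w * x for w, x in zip(weights, point)) + bias

def hinge_loss(weights, bias, point, label):
    """Hinge loss for a label in {-1, +1}; zero once the margin is at least 1."""
    return max(0.0, 1.0 - label * decision_value(weights, bias, point))

# Hypothetical hyperplane in two dimensions
weights = [1.0, -1.0]
bias = 0.0

print(decision_value(weights, bias, [3.0, 1.0]))  # 2.0, predicted class +1
print(hinge_loss(weights, bias, [3.0, 1.0], +1))  # 0.0, outside the margin
print(hinge_loss(weights, bias, [1.0, 0.5], +1))  # 0.5, inside the margin
print(hinge_loss(weights, bias, [0.0, 2.0], +1))  # 3.0, wrong side
```

For more than two classes, a linear SVM is usually trained in a one-vs-rest fashion, with one such hyperplane per class.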

from sklearn import svm
# Linear SVM, trained on the integer class labels
clf = svm.LinearSVC()
clf.fit(x_train, data_train['Poker Hand'])
y_pred = clf.predict(x_test)
accuracy_score(data_test['Poker Hand'], y_pred)
##OUTPUT
0.431699

The validation accuracy achieved here is 43.17%!

In the end, the neural network built with Keras reported the highest accuracy of the four models!

This is one of my first applications in machine learning. Thank you for reading my article, and I hope I was able to help and inspire students like myself!