Artificial Neural Network (ANN) with Keras simplified, Use Case : will the student pass the exam? (Code part only)

Source: Deep Learning on Medium


Image by : Monoar Rahman Rony, Pixabay

Prerequisite : Jupyter Notebook, Google Colab, or any other environment that supports Python; nowadays we have plenty of such tools.

Before the battle

Data set:

Most of the columns are self-explanatory; it's simple student data across three schools in Delhi : Kendriya Vidyalaya, Govt Primary School and Navodaya Vidyalaya.

PS: 'Vidyalaya' means 'school' in Hindi, and this is imaginary data for reference only 😊

Problem Statement:

Based on the previous records, create a Deep Learning based predictor that will help us identify students who are potentially going to fail this year, so the teacher can put more focus on that group of students.


ANN theory:

Keras theory:

Prepare the horses

Make sure every library mentioned in this document is available on your machine/tool, e.g.

! pip install keras
conda install -c conda-forge tensorflow


Feature Pre-Processing

We have input variables (X) and an output variable (y); the model learns something like

y = f(X)

X = df_all_student.iloc[:, 2:12]   # feature columns
y = df_all_student.iloc[:, 12]     # target column: pass/fail

We all know that Machine Learning/Deep Learning (ML/DL) models work on numeric data, but 'School' and 'Gender' are text columns, so we need to encode the text data to numeric values, and we know that sklearn will do this work for us.

You can see both 'School' and 'Gender' are now numeric, but we are trapped in a new problem: the numeric values can easily confuse our model by implying an order or hierarchy that does not actually exist. Here 'OneHotEncoder' helps us by splitting the column into multiple columns; the numbers are replaced by 1s and 0s, depending on which column has which value.

As you can see in the constructor, we specify which column has to be one-hot encoded, [1] in this case.

Now we can fall into the 'dummy variable trap', i.e. the independent variables become multi-collinear (two or more variables are highly correlated). Solution : just drop one variable, like

X = X[:, 1:]
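Since the original encoding snippets are shown as images in the article, here is a minimal sketch of the whole encode-and-drop step on a hypothetical toy frame (column names assumed), using pandas' get_dummies as a compact alternative to the LabelEncoder/OneHotEncoder route:

```python
import pandas as pd

# Hypothetical toy data mirroring the article's text columns (names assumed)
df = pd.DataFrame({
    'Score':  [600, 650, 700],
    'School': ['Kendriya Vidyalaya', 'Govt Primary School', 'Navodaya Vidyalaya'],
    'Gender': ['Female', 'Male', 'Female'],
})

# One-hot encode both text columns; drop_first=True drops one dummy per
# column, which is exactly the fix for the dummy variable trap
df = pd.get_dummies(df, columns=['School', 'Gender'], drop_first=True)
print(list(df.columns))
# ['Score', 'School_Kendriya Vidyalaya', 'School_Navodaya Vidyalaya', 'Gender_Male']
```

Note that 'Govt Primary School' and 'Female' disappear: they become the implicit baseline, so no information is lost.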

Now, all set with data so we can split training and testing data set

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

80% training, 20% test. Specifying random_state means the training and test split will be the same every run; without random_state the split is not deterministic and will differ on the next run.

In a normal ML life cycle, we standardize or normalize the data so that most features fall in the same range:

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

ANN in action

This is the artistic part: you first create the ANN schema/graph and then hyper-tune it. There is no formula and most of it is trial and error; we only have a few recommendations, and the rest is art.

Initialize a basic Keras Sequential model (the output of each layer is the input to the next layer in our implementation):

import keras
from keras.models import Sequential
cf = Sequential()

Adding first input layer and first hidden layer

from keras.layers import Dense
cf.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))

Dense: a fully connected layer in a sequential model; it implements the equation output = activation(dot(input, kernel) + bias)

This means we take the dot product between our input tensor and the weight (kernel) matrix of the dense layer.


Units: denotes the output size of the layer. A common rule of thumb is the average of the number of nodes in the input layer (the number of independent variables, 11 here) and the number of nodes in the output layer (1 here), so we took 6, since (11 + 1) / 2 = 6.

Kernel_initializer : tells Keras how to initialize the values of the layer's weight matrix and bias vector.

Activation: the element-wise activation function used in the dense layer; read more about the Rectified Linear Unit (ReLU).

Input_dim: the number of input independent variables; required for the first hidden layer only.

Bias : can be configured explicitly (e.g. use_bias, bias_initializer) if we are going with an advanced implementation.


To avoid over-fitting, we use dropout, a technique where randomly selected neurons are ignored during training:

from keras.layers import Dropout
cf.add(Dropout(rate = 0.1))

Here we randomly drop 10% of the neurons.
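Conceptually, what Dropout(rate = 0.1) does during training can be sketched in plain numpy. This is only an illustration, not Keras internals, but Keras does use this 'inverted dropout' scaling so that no rescaling is needed at inference time:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
activations = np.ones(10)   # pretend output of a layer with 10 neurons
rate = 0.1                  # drop each neuron with 10% probability

# Zero out the dropped neurons, scale the survivors by 1/(1 - rate)
# so the expected activation stays the same
mask = rng.random(activations.shape) >= rate
dropped = activations * mask / (1.0 - rate)
```

Each surviving neuron's output is slightly boosted (divided by 0.9 here), while dropped neurons contribute exactly zero for that training step.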

Middle Layer and final one

cf.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
cf.add(Dropout(rate = 0.1))
cf.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

The last layer's activation function differs from the previous ones. Normally we use 'sigmoid' for binary (boolean) output and 'softmax' for multi-class output.
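To make the sigmoid/softmax distinction concrete, here is the underlying math in plain numpy (just an illustration, not the model code):

```python
import numpy as np

def sigmoid(z):
    # squashes one logit into a single pass/fail probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # turns a logit vector into one probability per class
    e = np.exp(z - z.max())   # subtract max for numerical stability
    return e / e.sum()

print(sigmoid(0.0))                     # 0.5 -> undecided
p = softmax(np.array([2.0, 1.0, 0.1]))
print(p)                                # probabilities summing to 1
```

With one output neuron, sigmoid answers "pass or fail?"; with one neuron per class, softmax answers "which of these classes?".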


from ann_visualizer.visualize import ann_viz
ann_viz(cf, title="")

A PDF of the network graph will open, like this:


cf.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

Before training, you need to configure the learning process:

  • An optimizer : updates the weight parameters to minimize the loss function.
  • A loss function : acts as a guide to the terrain, telling the optimizer whether it is moving in the right direction to reach the bottom of the valley, the global minimum.
  • A list of metrics : a metric function is similar to a loss, except that the results from evaluating a metric are not used to train the model.

Keras provides multiple built-in options for each parameter, and you can override them too.

, y_train, batch_size = 10, epochs = 100)

This runs the actual training of the classifier.

The batch size: a hyper-parameter controlling how many samples are processed before the model's weights are updated.

epochs: a hyper-parameter controlling how many complete passes are made through the training dataset.
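As a quick sanity check of how these two hyper-parameters interact, using the 8000-row training set visible in the training log:

```python
# With 8000 training samples, batch_size = 10 and epochs = 100:
samples, batch_size, epochs = 8000, 10, 100

updates_per_epoch = samples // batch_size    # weight updates per epoch
total_updates = updates_per_epoch * epochs   # updates over the whole run
print(updates_per_epoch, total_updates)      # 800 80000
```

So a smaller batch size means more (noisier) weight updates per epoch, and more epochs multiply that total.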

more detail:

You will get results like the following (this dataset is not real, hence the modest accuracy 😁😁):

Epoch 99/100
8000/8000 [==============================] - 2s 221us/step - loss: 0.6898 - accuracy: 0.5375
Epoch 100/100
8000/8000 [==============================] - 2s 225us/step - loss: 0.6900 - accuracy: 0.5381

Prediction of test result

y_prediction = cf.predict(X_test)

You can get a prediction for a specific student like this: cf.predict(X_test[0:1, :]), or pass an array of the same shape, normalized the same way, to get a prediction for a new student.
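Note that predict returns probabilities, not classes. A common follow-up step (sketched here with made-up probabilities) is to threshold them at 0.5:

```python
import numpy as np

# Hypothetical probabilities, shaped like the output of cf.predict(X_test)
y_prediction = np.array([[0.91], [0.12], [0.55]])

# Threshold at 0.5 to get a boolean pass/fail class per student
y_pred_class = (y_prediction > 0.5)
print(y_pred_class.ravel().tolist())   # [True, False, True]
```

The 0.5 cut-off is only the default choice; for this use case a teacher might lower it to flag more borderline students.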

Cross Validation

K-fold Cross-Validation

Our model's test and training data may be biased, so cross-validation techniques are used for a better measurement of model performance. In K-fold cross-validation, the dataset is randomly split into 'k' groups; one group is used as the test set and the rest as the training set, rotating until every group has served as the test set.

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score

def kera_classifier():
    cf = Sequential()
    cf.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
    cf.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
    cf.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
    cf.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
    return cf

cf = KerasClassifier(build_fn = kera_classifier, batch_size = 10, epochs = 100)
accuracies = cross_val_score(estimator = cf, X = X_train, y = y_train, cv = 10, n_jobs = -1)
mean = accuracies.mean()
variance = accuracies.std()   # standard deviation of the fold accuracies

Same model as before, but wrapped with sklearn so we can use its k-fold validation capabilities.

Grid Search Cross Validation

Using grid search, you can automate hyper-parameter tuning: you provide multiple optimizers, epoch counts, and batch sizes, and it automatically builds every combination, runs each of them, and finally reports the best parameters, which you can then use for your final production model. It saves a lot of manual effort, a kind of automation within machine learning.

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def kera_classifier(optimizer):
    cf = Sequential()
    cf.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
    cf.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
    cf.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
    cf.compile(optimizer = optimizer, loss = 'binary_crossentropy', metrics = ['accuracy'])
    return cf

cf = KerasClassifier(build_fn = kera_classifier)
parameters = {'batch_size': [10, 15],
              'epochs': [10, 50],
              'optimizer': ['adam', 'rmsprop']}
gv_search = GridSearchCV(estimator = cf,
                         param_grid = parameters,
                         scoring = 'accuracy',
                         cv = 10)
gv_search =, y_train)
best_param = gv_search.best_params_
best_acc = gv_search.best_score_

The complete solution is available at :

I tried to be as accurate as possible; still, if you see any issue, please let me know. Enjoy learning !!!