
Breast Cancer diagnosis using Deep Learning & The use of ln(Natural Logarithm) in selecting the depth and the width of the hidden layers for baseline models

Introduction:

I came across a bone marrow microscopy sample report in PDF format. It contains many categorical and numerical data items from which a diagnosis is made. To get hands-on experience with a deep learning project on similar data, the Breast Cancer Wisconsin (Diagnostic) Data Set [1] from the University of California, Irvine, Machine Learning Repository was chosen.

The following single row shows what the breast cancer data looks like in wdbc.data [2]:

842302,M,17.99,10.38,122.8,1001,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.38,17.33,184.6,2019,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189

The first column (842302) in the above row is the patient ID. The second (M) is the diagnosis, which can be either M for malignant or B for benign. The rest of the data items are features. The file wdbc.names [3] has further information.

Procedure:

The following steps were carried out to arrive at a baseline model.

Step 1:

  • Read the data file, which is in CSV format
  • Treat each line of the CSV as one patient's record, split into columns
  • Label the columns appropriately
  • Encode the diagnosis as a numeric value so that we can work on numeric data

Step 2:

  • Separate features and output(diagnosis)

Step 3:

  • Split the data into training and test sets
  • Standardize the data sets so that the model can properly learn from the features (a condensed sketch of Steps 1 to 3 follows this list)
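
A minimal sketch condensing Steps 1 to 3; it mirrors the full code given in the appendix (the abbreviated construction of the column names is my shorthand):

import pandas
from sklearn import model_selection
from sklearn.preprocessing import scale

# Step 1: read the CSV and label the 32 columns (ID, diagnosis, 30 features)
names = ['ID', 'DIAG'] + ['F' + str(i) for i in range(1, 31)]
wdbc_df = pandas.read_csv('wdbc.data', names = names)
wdbc_df = wdbc_df.replace({'M' : 1, 'B' : 0})   # encode the diagnosis numerically

# Step 2: separate features and output (diagnosis)
array = wdbc_df.values
X, Y = array[:, 2:32], array[:, 1]

# Step 3: split into train and test sets, then standardize the features
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    X, Y, test_size = 0.20, random_state = 7)
X_train_std = scale(X_train)   # zero mean, unit variance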

Step 4:

  • An input layer with NUM_FEATURES = 30 units and an output layer with 1 unit are used.
  • Many configurations with different numbers of units and layers are tried. By analogy with hierarchical routing in computer networks, where optimality is achieved in an N-router network by using ln N levels (ln is the natural logarithm) [4], and without any other theoretical or empirical justification from deep learning, the number of layers and the number of units per layer are calculated for a given total number of units.

So, for a given total number of units, total_units:

num_layers = ln(total_units)

num_units_per_layer = total_units / num_layers

Figure(1): the number of layers and the number of units per layer for each iteration
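
Figure(1) is not reproduced here. The table it shows can be generated from the heuristic above in a few lines; a minimal sketch, where rounding to whole numbers is my assumption:

from math import log

# Sketch (rounding assumed): derive (num_layers, num_units_per_layer)
# from a total unit budget using the ln heuristic described above
def layer_config(total_units):
    num_layers = round(log(total_units))            # ln(total_units)
    num_units_per_layer = round(total_units / num_layers)
    return num_layers, num_units_per_layer

print(layer_config(330))    # -> (6, 55), the configuration of Keras_02
print(layer_config(1288))   # -> (7, 184), the configuration of Keras_34

The totals 330 and 1288 are back-calculated from the configurations used in Step 5; the actual sequence of totals is generated by the code in Figure(3).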
  • A few configurations are shown below:
Figure(2): a few of the generated configurations
  • The average accuracy over 20 runs is calculated for the unit sequences generated by the code:
Figure(3): the generating code; num is the number of iterations

and the top-performing nine from a sequence of 100 are shown below:

Figure(4): the column number indicates the iteration; the number of layers and the number of units per layer corresponding to each iteration are given in Figure(1)
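
Figures (3) and (4) are not reproduced here. As a hedged sketch of the averaging step, the following assumes the helpers defined in the appendix (get_dl_model, scale_input); the configs argument and the choice to evaluate accuracy on the test split are my assumptions:

# Hypothetical sketch: average accuracy over 20 runs per configuration.
# configs maps an iteration number -> (num_layers, num_units_per_layer);
# the actual sequence comes from the code in Figure(3), not shown here.
def average_accuracies(configs, num_runs = 20):
    xtr = scale_input(X_train, "std")
    xte = scale_input(X_test, "std")
    averages = {}
    for it, (num_layers, num_units) in configs.items():
        total = 0.0
        for _ in range(num_runs):
            model, _ = get_dl_model(num_layers, num_units, xtr, Y_train)
            # relies on metrics=['accuracy'] in model.compile (see appendix)
            _, acc = model.evaluate(xte, Y_test, verbose = 0)
            total += acc
        averages[it] = total / num_runs
    return averages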
  • All the layers use ‘relu’ activation function except the output layer, which uses ‘sigmoid’ activation function
  • The model uses ‘binary_crossentropy’ as the loss function and ‘adam’ as optimizer

Step 5:

  • Three models are built, and ROC (and magnified ROC) curves are drawn with three models per graph; the corresponding AUC values are shown for comparison:
Figure(5): the model names are of the form Keras_<iteration number>
  • The model from iteration 34, with 7 layers and 184 units per layer, is chosen as the best model based on its average accuracy.
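
Given such an averages dictionary (from the hypothetical sketch above), the selection itself is one line:

# Hypothetical: pick the iteration with the highest average accuracy
best_iteration = max(averages, key = averages.get)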

Conclusion:

Experience was gained in creating a baseline model from non-image data related to a disease. Since the data came in a format easily consumed with pandas, the experiment took little time. If the laboratory reports for a disease are instead obtained directly from a hospital as PDFs or other file formats, extra effort is needed to identify the features, write code to extract them from those files, and encode any non-numerical values among the extracted features. A reasonable baseline was also obtained with the width and depth calculated in the way described above.

BIBLIOGRAPHICAL NOTES

“Much of machine learning, from the most basic techniques to the state-of-the-art algorithms presented at research conferences, is statistical in flavor.” - Roger Grosse [5]

To get an understanding of Statistical Learning (SL), [6] is a wonderful book that is free to download. All of the course material for Roger Grosse’s course [7], especially the lecture notes and slides, is lucid and very useful for learning Deep Learning (DL). For quick exposure to solving SL and DL problems, [8], [9], and [10] are good. For Python programming, [11] is used as a reference. The sites [12], [13], [14], and [15] host documentation for Python libraries useful for machine learning, which can be consulted while programming or while reading programs.

REFERENCES

  1. UCI Machine Learning Repository, Breast Cancer Wisconsin (Diagnostic) Data Set: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
  2. UCI Machine Learning Repository, Data from the Breast Cancer Wisconsin (Diagnostic) Data Set: https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data
  3. UCI Machine Learning Repository, Information related to the Breast Cancer Wisconsin (Diagnostic) Data Set: https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.names
  4. Hierarchical routing for large networks: Performance evaluation and optimization: CiteSeerX doi=10.1.1.6.4852
  5. Course Notes: Introduction, CSC 421/2516 Winter 2019, Neural Networks and Deep Learning: http://www.cs.toronto.edu/~rgrosse/courses/csc421_2019/readings/L01%20Introduction.pdf
  6. An Introduction to Statistical Learning with application in R: http://faculty.marshall.usc.edu/gareth-james/ISL/ISLR%20Seventh%20Printing.pdf
  7. CSC 421/2516 Winter 2019, Neural Networks and Deep Learning: http://www.cs.toronto.edu/~rgrosse/courses/csc421_2019/
  8. Your First Machine Learning Project in Python Step-By-Step: https://machinelearningmastery.com/machine-learning-in-python-step-by-step/
  9. How to develop a CNN for MNIST handwritten digit classification: https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification/
  10. How to develop a CNN from scratch for CIFAR-10 photo classification: https://machinelearningmastery.com/how-to-develop-a-cnn-from-scratch-for-cifar-10-photo-classification/
  11. Fundamentals of Python programming: https://python.cs.southern.edu/pythonbook/pythonbook.pdf
  12. pandas, Python Data Analysis Library: https://pandas.pydata.org/
  13. numpy, NumPy is the fundamental package for scientific computing with Python: https://numpy.org/
  14. scikit-learn, Machine Learning in Python: https://scikit-learn.org/stable/
  15. Keras: The Python Deep Learning Library: https://keras.io/

APPENDIX: A

System Configuration:

Hardware & OS:

Software:

Python 3 and the latest versions of the libraries mentioned in the code are used.
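
For reference, the libraries can be installed with pip; pinned versions were not recorded, so the unversioned command below is an assumption:

pip install pandas numpy scikit-learn keras matplotlib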

CODE

names = ['ID', 'DIAG', 'F1', 'F2', 'F3', 'F4', 'F5', 'F6', 'F7', 'F8', 'F9', 'F10', 'F11', 'F12', 
'F13', 'F14', 'F15', 'F16', 'F17', 'F18', 'F19', 'F20', 'F21', 'F22', 'F23', 'F24', 'F25',
'F26', 'F27', 'F28', 'F29', 'F30']
###############
import pandas
###############
# Constant defining the number of features
NUM_FEATURES = 30
wdbc_df = pandas.read_csv('wdbc.data', names = names)
# Encode the diagnosis: M (malignant) -> 1, B (benign) -> 0
wdbc_df = wdbc_df.replace({'M' : 1, 'B' : 0})
# Get numpy array and slice out features and output
array = wdbc_df.values
X = array[:, 2:32]
Y = array[:, 1]
###############
from sklearn import model_selection
###############
# split data
test_size = 0.20
seed = 7
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size = test_size, random_state = seed)
###############
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import scale
###############
# Scale features: "mms" => min-max scaling to [0, 1],
# "std" => standardization to zero mean and unit variance
def scale_input(data, sctype):
    if sctype == "mms":
        scaler = MinMaxScaler()
        scaler.fit(data)
        data = scaler.transform(data)
    elif sctype == "std":
        data = scale(data)

    return data
###############
from keras.models import Sequential
from keras.layers import Dense
###############
def get_dl_model(num_layers, num_units_per_layer, x_train_data, y_train_data):
    model = Sequential()

    # First hidden layer also defines the input dimension
    model.add(Dense(NUM_FEATURES, input_dim = NUM_FEATURES, activation = 'relu'))

    # Remaining hidden layers all use 'relu' activation
    for i in range(num_layers - 1):
        model.add(Dense(num_units_per_layer, activation = 'relu'))
    # Single-unit 'sigmoid' output for binary classification
    model.add(Dense(1, activation = 'sigmoid'))
    # metrics=['accuracy'] is added so accuracy can be read off evaluate()
    model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
    history = model.fit(x=x_train_data, y=y_train_data, batch_size=None, epochs=400,
                        validation_split=0.20, shuffle=True, verbose = 0)

    return model, history
###############
from sklearn.metrics import roc_curve
from sklearn.metrics import auc
import matplotlib.pyplot as plt
###############
# This function is heavily based on code from:
# https://www.dlology.com/blog/simple-guide-on-how-to-generate-roc-plot-for-keras-classifier/
###############
def plot_roc_curves_comp(models):
    # Standardize the test features the same way as the training data
    xte = scale_input(X_test, "std")
    y_preds_keras = []
    for _, model in models:
        y_preds_keras.append(model.predict(xte).ravel())
    fprs_keras = []
    tprs_keras = []
    thresholds_keras = []
    for y_pred_keras in y_preds_keras:
        temp_fpr_keras, temp_tpr_keras, temp_threshold_keras = roc_curve(Y_test, y_pred_keras)
        fprs_keras.append(temp_fpr_keras)
        tprs_keras.append(temp_tpr_keras)
        thresholds_keras.append(temp_threshold_keras)
    # AUC value can also be calculated like this.
    aucs_keras = []
    for i in range(len(fprs_keras)):
        aucs_keras.append(auc(fprs_keras[i], tprs_keras[i]))

    for i in range(len(aucs_keras)):
        print("AUC VALUE " + str(i) + ": " + str(aucs_keras[i]))
    plt.figure(1)
    plt.plot([0, 1], [0, 1], 'k--')  # diagonal reference line (random classifier)
    for i in range(len(fprs_keras)):
        plt.plot(fprs_keras[i], tprs_keras[i],
                 label=models[i][0] + ' (AUC = {:.3f})'.format(aucs_keras[i]))

    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    plt.title('ROC curve')
    plt.legend(loc='best')
    plt.show()
    # Zoomed-in view of the upper left corner
    plt.figure(2)
    plt.xlim(0, 0.2)
    plt.ylim(0.8, 1)
    plt.plot([0, 1], [0, 1], 'k--')
    for i in range(len(fprs_keras)):
        plt.plot(fprs_keras[i], tprs_keras[i],
                 label=models[i][0] + ' (AUC = {:.3f})'.format(aucs_keras[i]))
    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    plt.title('ROC curve (zoomed in at top left)')
    plt.legend(loc='best')
    plt.show()
# scale training data and get dl models
xtr = scale_input(X_train, "std")
model34, h = get_dl_model(7, 184, xtr, Y_train)
model02, _ = get_dl_model(6, 55, xtr, Y_train)
model13, _ = get_dl_model(6, 110, xtr, Y_train)
# plot ROC curves
models = []
models.append(('Keras_34', model34))
models.append(('Keras_02', model02))
models.append(('Keras_13', model13))
plot_roc_curves_comp(models)