Source: Deep Learning on Medium

Many deep learning tutorials are designed with two datasets: one for training and one for validation. We train our model on the training set and evaluate it on the validation set. The “Hello World” datasets, such as MNIST and CIFAR-10, all come with these two sets. What if we have only one big training set? In reality, this is the case most of the time: you go around collecting and labeling data, and you end up with one dataset, not two separate ones.

The typical solution in most deep learning tutorials is to somehow split the dataset into two: one part for training and one for validation. In a machine learning course that doesn’t specifically focus on deep neural networks, we almost always do *cross-validation*: that is, we shuffle the data and split it into *k* partitions called *folds*. Let’s say *k* is 5. Then, each time we take 4 folds as the training set and the remaining one as the validation set.
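The fold assignment can be sketched in plain Python. This is a minimal illustration of the idea, not Fenwicks code; in practice you would shuffle the indices first and use a library routine:

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k folds; yield (train, val) index lists.

    Minimal sketch of k-fold cross-validation. Assumes n is divisible
    by k; real implementations also handle the remainder and shuffling.
    """
    fold_sz = n // k
    for i in range(k):
        val = list(range(i * fold_sz, (i + 1) * fold_sz))
        val_set = set(val)
        train = [j for j in range(n) if j not in val_set]
        yield train, val

# With 10 samples and 5 folds, each round holds out 2 samples:
for train, val in kfold_indices(10, 5):
    print(val, '<- validation fold')
```

Each of the 5 rounds trains a fresh model on 4 folds and validates on the held-out one, so every sample is used for validation exactly once.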

If the dataset fits in memory, cross-validation is easy with Scikit-Learn. What about a large dataset that doesn’t fit in memory? Fenwicks has you covered. In this tutorial, we do cross-validation on TensorFlow’s Flower dataset. Let’s first download and decompress it:

data_dir_local = fw.datasets.untar_data(
    fw.datasets.URLs.FLOWER_PHOTOS, './flower_photos')

Similar to the last tutorial, we use transfer learning and fine-tune a model pre-trained on ImageNet. Here, we use a simple structure, with only one Dense layer on top of the base model:

class TransferLearningNet(tf.keras.Model):
    def __init__(self, base_model_func, num_cls):
        super().__init__()
        self.base_model = base_model_func()
        self.flatten = tf.keras.layers.Flatten()
        self.linear = tf.keras.layers.Dense(num_cls, use_bias=False)

    def call(self, x):
        return self.linear(self.flatten(self.base_model(x)))

For the base model, we again use Google’s BFN, InceptionResNetV2. Many tutorials use smallish networks such as VGG16 and ResNet50. I personally like big guns such as InceptionResNetV2. It is slower, but it gives me the peace of mind that I’m already using the strongest base model. Similar to our last tutorial, we’ll train our model with the popular Adam optimizer and cosine learning rate schedule.
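The idea behind a cosine schedule is that the learning rate starts at its maximum and decays to (near) zero over training, following a half cosine wave. Here is a rough sketch of that decay curve; the actual schedule lives in `fw.train.adam_sgdr_one_cycle`, whose warm-restart and one-cycle details may differ:

```python
import math

def cosine_lr(step, total_steps, max_lr=1e-3):
    """Cosine learning-rate decay: max_lr at step 0, approaching 0 at the end.

    Sketch of the decay curve only; max_lr=1e-3 is an illustrative default,
    not the value Fenwicks uses.
    """
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))

# The rate falls slowly at first, fastest in the middle, slowly at the end:
print(cosine_lr(0, 100))    # max_lr
print(cosine_lr(50, 100))   # ~ half of max_lr
print(cosine_lr(100, 100))  # ~ 0
```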

get_model = lambda: TransferLearningNet(
    fw.keras_models.get_InceptionResNetV2, len(labels))

opt_func = fw.train.adam_sgdr_one_cycle(total_steps)

model_func = fw.tpuest.get_clf_model_func(get_model, opt_func)

Next, let’s build our input pipelines for cross-validation. We do 5-fold CV, so the size of the validation set `val_sz` should be 1/5 of the size of the entire dataset, `data_sz`. One small complication is that we put the validation set in a single batch, and to use the TPU our batch size has to be a multiple of 8, since the TPU has 8 cores. So, we round `val_sz` down to the nearest multiple of 8, as follows.

val_sz = data_sz // 5 // 8 * 8

trn_sz = data_sz - val_sz
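To see the rounding concretely: the Flower dataset contains 3,670 images (if your copy differs, the arithmetic adjusts accordingly), so:

```python
data_sz = 3670                    # number of images in the Flower dataset
val_sz = data_sz // 5 // 8 * 8    # 3670 // 5 = 734; rounded down to 728
trn_sz = data_sz - val_sz         # 3670 - 728 = 2942

assert val_sz % 8 == 0  # one validation batch, divisible among 8 TPU cores
print(val_sz, trn_sz)   # 728 2942
```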

In our first model, the first fold (Fold 0) is our validation set, and the rest (Folds 1–4) are our training set:

train_input_func = lambda params: fw.io.tfrecord_ds(data_fn,
    parser_train, params['batch_size'], n_folds=5, val_fold_idx=0,
    training=True)

valid_input_func = lambda params: fw.io.tfrecord_ds(data_fn,
    parser_eval, params['batch_size'], n_folds=5, val_fold_idx=0,
    training=False)

Now we train and evaluate our `TPUEstimator` with the above inputs:

est = fw.tpuest.get_tpu_estimator(trn_sz, val_sz, model_func,
    work_dir, ws_dir, ws_vars, BATCH_SIZE)

est.train(train_input_func, steps=total_steps)

result0 = est.evaluate(input_fn=valid_input_func, steps=1)

We get 95.8% accuracy. Not bad for a first try. Next, we use the second fold (Fold 1) as the validation set:

train_input_func = lambda params: fw.io.tfrecord_ds(data_fn,
    parser_train, params['batch_size'], n_folds=5, val_fold_idx=1,
    training=True)

valid_input_func = lambda params: fw.io.tfrecord_ds(data_fn,
    parser_eval, params['batch_size'], n_folds=5, val_fold_idx=1,
    training=False)

After that, we continue with Folds 2, 3, and 4 as the validation set, respectively. Once we finish all 5 folds, we have a thorough evaluation of our model, with 5 different accuracy values. We have also built 5 instances of our model. In a deep learning competition, a common trick is to build an ensemble of these 5 models to get slightly better test accuracy.
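The ensembling trick itself is simple: average the per-class probabilities predicted by the 5 models and take the argmax. A minimal sketch in plain Python, with made-up probabilities for illustration:

```python
def ensemble_predict(per_model_probs):
    """Average class probabilities across models; return the winning class.

    per_model_probs: list of per-model probability vectors for one example.
    """
    n_models = len(per_model_probs)
    n_cls = len(per_model_probs[0])
    avg = [sum(p[c] for p in per_model_probs) / n_models
           for c in range(n_cls)]
    return max(range(n_cls), key=lambda c: avg[c])

# Hypothetical predictions from 5 models on one image, 3 classes.
# Most models lean towards class 2, and the averaged vote agrees:
probs = [
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
]
print(ensemble_predict(probs))  # 2
```

Averaging smooths out the idiosyncratic mistakes of individual folds, which is why the ensemble usually edges out any single model.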

Here is the complete notebook: