Original article was published by karan sindwani on Deep Learning on Medium
How to train an Image Classifier on TFRecord files
Improving the performance of pipelines using TFRecords
Introduction to TFRecords
TFRecords store a sequence of binary records, which are read linearly. They are a useful format for storing data because they can be read efficiently. You can learn more about TFRecords here.
Using a binary file format to store voluminous data can have a significant impact on the performance of your import pipeline and, as a consequence, on the training time of your model. Binary data takes up less space on disk, takes less time to copy, and can be read much more efficiently from disk. This is especially true if your data is stored on spinning disks, due to their much lower read/write performance compared with SSDs.
Apart from performance, TFRecords are optimized for use with TensorFlow in multiple ways. First, they make it easy to combine multiple datasets and integrate seamlessly with the data import and preprocessing functionality provided by the library. This is an advantage especially for datasets that are too large to be stored fully in memory, as only the data that is required at the time (e.g. a batch) is loaded from disk and then processed. Another major advantage of TFRecords is that they can store sequence data, for instance a time series or word encodings, in a way that allows for very efficient and (from a coding perspective) convenient import of this type of data.
The next few sections focus on converting an image dataset to TFRecords, loading the data, model training, and prediction.
For this illustration, I shall be using the ubiquitous CIFAR-10 image dataset. More details on it can be found here. The code snippet below loads the CIFAR-10 dataset directly from TensorFlow.
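A minimal sketch of that load, using the CIFAR-10 loader bundled with tf.keras:

```python
import tensorflow as tf

# Load CIFAR-10 directly from the datasets bundled with TensorFlow.
# Returns 50,000 training and 10,000 test images of shape 32x32x3,
# with integer labels of shape (n, 1).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)
```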
Convert dataset to TFRecords
A TFRecord file stores the data as a sequence of binary strings. We need to specify the structure of our data before writing it to the file.
We shall be using tf.train.Example for this. While writing an image to a TFRecord, we need the image itself and the corresponding label. In the code snippet below, we define two features, image and label, within the tf.train.Example.
The image feature stores the image array as bytes and the label feature stores the label as an int. You may choose to store the label as a string instead, or you could store additional information such as height, width, and depth.
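A sketch of what such a writer might look like. The helper names (`_bytes_feature`, `_int64_feature`, `serialize_example`) and the file name `train.tfrecord` are illustrative choices, and a few random uint8 arrays stand in for the CIFAR-10 images:

```python
import numpy as np
import tensorflow as tf

def _bytes_feature(value):
    # Wrap a byte string in a tf.train.Feature.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    # Wrap an integer in a tf.train.Feature.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def serialize_example(image, label):
    # Store the raw image bytes and the integer label as two features.
    feature = {
        'image': _bytes_feature(image.tobytes()),
        'label': _int64_feature(int(label)),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

# Random 32x32x3 uint8 arrays stand in for CIFAR-10 images here.
images = np.random.randint(0, 256, (4, 32, 32, 3), dtype=np.uint8)
labels = np.array([0, 1, 2, 3])

with tf.io.TFRecordWriter('train.tfrecord') as writer:
    for image, label in zip(images, labels):
        writer.write(serialize_example(image, label))
```

In the real pipeline, the loop would run over `x_train` and `y_train` from the loading step.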
Load the dataset
The process of reading TFRecords is straightforward:
- Read the TFRecord using a tf.data.TFRecordDataset
- Define the features you expect in the TFRecord by using tf.io.FixedLenFeature and tf.io.VarLenFeature, depending on what has been defined during the definition of the tf.train.Example.
- Parse one tf.train.Example (one record) at a time using tf.io.parse_single_example.
- Shuffle the dataset and extract by batch_size
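The four steps above can be sketched as follows; the feature spec mirrors the `image` and `label` features defined earlier, and `train.tfrecord`, the buffer size, and the batch size are illustrative choices:

```python
import tensorflow as tf

def parse_example(serialized):
    # The feature spec mirrors what was written: raw bytes + int64 label.
    feature_spec = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    # Decode the raw bytes back into a 32x32x3 uint8 image tensor.
    image = tf.io.decode_raw(parsed['image'], tf.uint8)
    image = tf.reshape(image, (32, 32, 3))
    return image, parsed['label']

dataset = (
    tf.data.TFRecordDataset('train.tfrecord')  # 1. read the file
    .map(parse_example)                        # 2-3. parse each record
    .shuffle(buffer_size=1024)                 # 4. shuffle ...
    .batch(32)                                 #    ... and batch
)
```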
Visualize input images
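A quick way to sanity-check the decoded images is to plot a grid of them with their class names. In this sketch, a random stand-in batch replaces `next(iter(dataset))` so the snippet runs on its own:

```python
import matplotlib.pyplot as plt
import numpy as np

# Stand-in batch; in the real pipeline this would be
# images, labels = next(iter(dataset))
images = np.random.randint(0, 256, (9, 32, 32, 3), dtype=np.uint8)
labels = np.random.randint(0, 10, 9)

# The ten CIFAR-10 class names, in label order.
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

fig = plt.figure(figsize=(6, 6))
for i in range(9):
    ax = fig.add_subplot(3, 3, i + 1)
    ax.imshow(images[i])
    ax.set_title(class_names[int(labels[i])])
    ax.axis('off')
plt.show()
```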
Model Training and Prediction
Since we are using a relatively small and balanced dataset with 10 classes, we decided to write a custom model architecture similar to LeNet.
Our evaluation metric will be accuracy, but one may choose other metrics such as average precision or area under the ROC curve.
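A LeNet-style architecture for 32x32x3 inputs might look like the following sketch; the layer sizes follow the classic LeNet-5 layout and are illustrative, and `sparse_categorical_crossentropy` matches the integer labels stored in the TFRecords:

```python
import tensorflow as tf

# A small LeNet-style CNN for 32x32x3 CIFAR-10 images.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, 5, activation='relu',
                           input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(16, 5, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='relu'),
    tf.keras.layers.Dense(84, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),  # one unit per class
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```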
The next step is fitting the model, saving it, and predicting on the test set.
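Those three steps can be sketched as below. A tiny stand-in model and random arrays keep the snippet self-contained; in the article's pipeline, `model` is the LeNet-style network, the data comes from the TFRecord datasets, and the file name `cifar10_model.h5` is illustrative:

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model so the sketch runs end to end.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Random arrays stand in for the TFRecord training data.
x = np.random.rand(64, 32, 32, 3).astype('float32')
y = np.random.randint(0, 10, 64)

model.fit(x, y, epochs=1, batch_size=32, verbose=0)   # fit
model.save('cifar10_model.h5')                        # save

restored = tf.keras.models.load_model('cifar10_model.h5')
preds = restored.predict(x[:5], verbose=0)            # class probabilities
predicted_classes = preds.argmax(axis=1)              # predicted labels
```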
Though the task of converting raw data to TFRecords may seem arduous, it is worth the effort. A few notable advantages are:
- TFRecords occupy less disk space than raw JPEGs/PNGs/CSVs.
- TFRecords improve I/O performance, as they take less time to copy and can be read much more efficiently from disk.
- TensorFlow provides seamless integration of TFRecords with import pipelines.
- TFRecords can store sequential data.
- No change is required in the model definition and fit statements when working with TFRecords.
- As a consequence of improved I/O performance, TFRecords reduce training time.