Coursera DL Specialization Course in TF 2.x

Coursera course converted to TF 2.x

TF 2.x is the new norm!

The Deep Learning Specialization from deeplearning.ai (available on Coursera) is outstanding, and I would highly recommend it to anyone interested in this field. In the course, you start off by coding neural networks from scratch, without any libraries, and then move on to the advanced features of the TensorFlow library. Unfortunately (as of writing this article), all the coding exercises are implemented in TF < 2.0. TF 2+ will be the standard in the near future, and TF 2.x is substantially different from TF 1.x; most of the exercises would require a complete code rewrite.

Since the programming exercises of this course are brilliant, I have decided to make them compatible with TF 2.x. I am maintaining the converted TF 2.x code for the exercises at the following (link). This post assumes some familiarity with TF.

In this blog post, I will review the basics of converting TF 1.x code (especially from this course) into TF 2+ code. I have implemented a deep NN from the week 7 exercise in TF 1 (link) and in TF 2 (link). Below, I explain some of the key differences between the two versions. For TF 2+ versions of the other exercises, please visit my GitHub repo: (link)

1) Session() is gone = Enter Eager Execution

Let us first have a look at the following simple code snippet, which calculates a squared-error loss in TF 1.x:
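
A minimal sketch of what such a snippet looks like (the constant values 36 and 39 are illustrative):

import tensorflow as tf                             # assumes TF 1.x

y_hat = tf.constant(36, name='y_hat')               # prediction
y = tf.constant(39, name='y')                       # label
loss = tf.Variable((y - y_hat) ** 2, name='loss')   # squared error

init = tf.global_variables_initializer()            # needed because loss is a Variable
with tf.Session() as session:                       # nothing runs until a session executes the graph
    session.run(init)
    print(session.run(loss))                        # 9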

TF 2+ gets rid of sessions and explicit graph building. In TF 1.x you would typically define the relations between variables (create a graph) and then a session would execute that pathway: session.run() takes the inputs as arguments and spits out the output. In TF 2+, eager execution is enabled by default, which means operations are evaluated immediately as they are encountered, without the need for a session/graph.

The converted TF 2+ code is:
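
Again a minimal sketch, mirroring the snippet above:

import tensorflow as tf       # assumes TF 2.x

y_hat = tf.constant(36, name='y_hat')
y = tf.constant(39, name='y')
loss = (y - y_hat) ** 2       # evaluated immediately under eager execution
print(loss)                   # tf.Tensor(9, shape=(), dtype=int32)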

Difference: there is no need to create a session object or call tf.global_variables_initializer(). As soon as the interpreter hits the loss = ... line, it is executed.

2) Eager Execution (another example)

Here I will implement the multiplication of two tensors (scalars) in both TF 1.x and TF 2+.

Multiplication in TF 1.x
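
A minimal sketch (the factors 2 and 10 are illustrative, chosen so the product is 20):

import tensorflow as tf       # assumes TF 1.x

a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a, b)

print(c)                      # Tensor("Mul:0", shape=(), dtype=int32) -- no value yet
with tf.Session() as sess:
    print(sess.run(c))        # 20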

c only outputs tensor(20) when it is run via tf.Session(). In TF 2+, eager execution is enabled by default, and the following code automatically outputs tensor(20).

Multiplication in TF 2.x
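
The eager equivalent, as a sketch:

import tensorflow as tf       # assumes TF 2.x

a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a, b)
print(c)                      # tf.Tensor(20, shape=(), dtype=int32)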

3) Placeholders are gone

Placeholders are, well, placeholding variables in TF < 2: they are basically containers for input values that get filled in at run time. They have been removed in TF 2+. Let us first look at this example of a function that calculates the sigmoid of an input variable/tensor. Here x is a placeholder that stores the argument of the function, and the sigmoid is then applied to it.

Placeholder in TF 1.x
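
A minimal sketch of such a function:

import tensorflow as tf       # assumes TF 1.x

def sigmoid(z):
    # x is a placeholder: an empty container that is filled via feed_dict at run time
    x = tf.placeholder(tf.float32, name="x")
    sig = tf.sigmoid(x)
    with tf.Session() as sess:
        result = sess.run(sig, feed_dict={x: z})
    return result

print(sigmoid(0.0))           # 0.5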

In TF2+ placeholders are not required and sigmoid can be directly applied to the input variable.

Working without placeholder in TF2.x
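
A sketch of the same function without a placeholder:

import tensorflow as tf         # assumes TF 2.x

def sigmoid(z):
    z = tf.cast(z, tf.float32)  # just make sure the input is a float tensor
    return tf.sigmoid(z)        # computed immediately, no placeholder/session

print(sigmoid(0.0))             # tf.Tensor(0.5, shape=(), dtype=float32)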

4) Perform Gradient Descent

If you use tf.keras to build models, you can train on your dataset directly with a pre-defined loss function via model.compile() and model.fit(). However, to do anything custom you have to implement the gradient descent step yourself. Let's first review gradient descent in TF < 2.

Gradient Descent in TF1.x
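
A condensed, self-contained sketch of the TF 1.x pattern (the tiny linear model, data and hyperparameters are illustrative stand-ins for the exercise's deep network):

import tensorflow as tf                        # assumes TF 1.x

X = tf.placeholder(tf.float32, shape=[None, 2], name="X")
Y = tf.placeholder(tf.float32, shape=[None, 1], name="Y")
W = tf.Variable(tf.zeros([2, 1]), name="W")
b = tf.Variable(tf.zeros([1]), name="b")

cost = tf.reduce_mean(tf.square(tf.matmul(X, W) + b - Y))   # illustrative cost

# one line builds the whole update step: gradients + parameter updates
optimizer = tf.train.AdamOptimizer(learning_rate=0.01).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(100):
        _, c = sess.run([optimizer, cost],
                        feed_dict={X: [[1., 2.], [3., 4.]], Y: [[1.], [0.]]})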

Apart from all the .session() business, gradient descent is implemented via the tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost) call. The minimize() function builds an update step that tunes the parameters so as to minimize its argument. In TF 2+ this one-liner on a pre-computed cost tensor is no longer available, and one needs to implement the gradient step at a much lower level. This gives more control over the whole procedure.

Gradient Descent in TF2.x
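
A condensed sketch of the TF 2.x version (again with an illustrative linear model and cost; the structure with get_grad(), a parameters dict and apply_gradients follows the converted exercise):

import tensorflow as tf                        # assumes TF 2.x

# parameters kept in a dict of tf.Variables, as in the converted exercise
parameters = {"W": tf.Variable(tf.zeros([2, 1])),
              "b": tf.Variable(tf.zeros([1]))}

def compute_cost(X, Y, parameters):
    Z = tf.matmul(X, parameters["W"]) + parameters["b"]
    return tf.reduce_mean(tf.square(Z - Y))    # illustrative cost

def get_grad(X, Y, parameters):
    # GradientTape records the forward pass so gradients can be taken afterwards
    with tf.GradientTape() as tape:
        cost = compute_cost(X, Y, parameters)
    grads = tape.gradient(cost, list(parameters.values()))
    return grads, cost

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
X = tf.constant([[1., 2.], [3., 4.]])
Y = tf.constant([[1.], [0.]])

for epoch in range(100):
    grads, cost = get_grad(X, Y, parameters)
    # pair each gradient with its variable and apply the update
    optimizer.apply_gradients(zip(grads, list(parameters.values())))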

We first define a function, get_grad(), which calculates the gradients with respect to the parameters. This function uses tf.GradientTape() for automatic differentiation; more details can be found at: link. We then apply the gradients using optimizer.apply_gradients(zip(grads, list(parameters.values()))). This is basically the long-winded replacement for the .minimize() function.

5) Other Minor Changes

In addition to the above, there are a lot of minor changes, such as reorganized namespaces. Let me list some quick examples (a short sketch follows the list):

a) All pre-defined layers have been moved to:
tf.keras.layers.*

b) Some math functionality moved into a sub-module:
tf.math.*

c) More can be found at: https://www.tensorflow.org/guide/migrate#top_of_page
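
For instance, a couple of illustrative one-liners:

import tensorflow as tf       # assumes TF 2.x

# a) layers: tf.layers.dense(x, 25, activation=tf.nn.relu) in TF 1.x becomes a Keras layer
dense = tf.keras.layers.Dense(25, activation="relu")
h = dense(tf.zeros([1, 10])) # illustrative input

# b) math ops: tf.log(p) in TF 1.x becomes tf.math.log(p)
p = tf.constant([0.5, 0.25])
log_p = tf.math.log(p)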

Bottom Line

The aim of this article was to provide a quick summary of the code differences between TF < 2 and TF 2+, focusing especially on the Coursera Deep Learning Specialization. I have converted all the exercises of the specialization into TF 2+ code and have uploaded them at (link). I hope this blog was useful.

References

  1. TF Documentation on migration:
    https://www.tensorflow.org/guide/migrate#top_of_page
  2. DL Specialization in TF2:
    https://drive.google.com/drive/folders/1a9A9O04ODfYJgslYbklFqR58GDuoOeYQ?usp=sharing