Book review: Deep Learning with Python, 2017, François Chollet, Manning

A great tour of the world of deep learning in less than 10 minutes.

It took me a long time to open this book, more for fear of discovering that I knew nothing than for fear of being frustrated at already knowing everything. I regularly receive newsletters on the “best of” or “most read” books about artificial intelligence, machine learning, or deep learning, and Deep Learning with Python is consistently cited as one of the most recommended. Its stated level varies between advanced, intermediate, and expert, but I don’t think we should stop at a level estimated by someone else. You have to pick up and read a book out of interest, out of need.

This book is accessible to anyone wishing to learn. The fluidity and simplicity of the writing make all the concepts understandable, whatever your level.

Part 1: Fundamentals of deep learning

Chapter 1 — What is deep learning?

Chapter 1 of the book gives the reader a good overview of the definitions used in the domain and of what exactly we mean when we say an algorithm learns. The author tells the history of artificial intelligence with simplicity and concision.

There are plenty of contextual explanations of what deep learning is and why it has emerged only recently in industry and in the real world (hardware, data, Moore’s law…).

The author provides the reader with a clear explanation of different machine learning algorithms and how neural networks work.

I appreciated the context given for the use of each algorithm and the parallel with their use in competitions (why this type of model for a specific task).

Chapter 2 — Before we begin: the mathematical building blocks of neural networks

So here we are, in the math part of the book. Most people are scared off by those few words, but the author has a particular method for explaining the math behind a neural network. In his own words:

Although this section deals entirely with linear algebra expressions, you won’t find any mathematical notation here. I’ve found that mathematical concepts can be more readily mastered by programmers with no mathematical background if they’re expressed as short Python snippets instead of mathematical equations. So we’ll use Numpy code throughout.

He’s right: no mathematical formulas. So don’t be afraid of the math, and don’t drive yourself crazy thinking it will be hard or impossible. The author cuts the math behind the learning process into very simple parts and explains each of them with Numpy code.

With this chapter, you’ll become familiar with vectors, matrices, and tensors (and the real-world data each one represents), with the different operations used on tensors, and with how they map onto the workings of a neural network. You’ll also learn about differentiable functions and gradients, what gradient descent and stochastic gradient descent are, and the corresponding parameters of a neural network.
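To give a flavor of the approach, here is a minimal sketch in the same spirit (my own example, not a listing from the book): an element-wise relu written out as explicit Python loops, checked against the vectorized Numpy equivalent.

import numpy as np

def naive_relu(x):
    # Element-wise relu on a 2D tensor, written as explicit loops
    assert len(x.shape) == 2
    x = x.copy()  # avoid mutating the input tensor
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0.)
    return x

x = np.random.randn(2, 3)
assert np.allclose(naive_relu(x), np.maximum(x, 0.))  # vectorized equivalent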

Clear and concise, this chapter is very well explained and leaves you with a good understanding of tensor manipulation and representation learning.

The author uses geometry to help the reader visualize the concepts he explains.

Bonus: in just a few lines you’ll be familiar with backpropagation and the math behind it. The author reuses an example from the previous chapter and breaks it into parts that map onto the mathematical concepts you’ve just learned.

I was a little hesitant at the beginning, but I eagerly went through these thirty pages.

Chapter 3 — Getting started with neural networks

With this chapter, the reader becomes capable of handling the most common problems tackled with neural networks: classification (binary and multiclass) and regression.

The author explains the different parts of a neural network (layers, input data, loss function, optimizer). He provides Keras code blocks with explanations of how they work.
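As a taste of what those parts look like together, here is a minimal sketch in the chapter’s Keras style (the layer sizes and input shape are illustrative, not the book’s exact listing):

from keras import models, layers

# A small binary classifier: layers, loss function, and optimizer in one place
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))  # one sigmoid unit for a yes/no output

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',  # the appropriate loss for binary classification
              metrics=['accuracy'])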

You’ll find very relevant information about loss functions and optimizers.

Now, the examples. Three are provided in this chapter, one for each common problem (binary classification, multiclass classification, regression). The author shares best practices and what to look at through visualization. He highlights overfitting, how to evaluate your model, and which loss function to use for each type of problem.

You’ll come away with a good understanding of the network’s output in all three cases. One highlight concerns the size of the network: it’s important to scale your network to the size of your dataset. If the dataset is small, choose a small network with few layers to avoid overfitting.

These 36 pages are very useful and full of information and best practices.

Chapter 4 — Fundamentals of machine learning

This chapter gives the reader a survey of the different branches of the machine learning world. You’ll get a quick description of its four branches (supervised, unsupervised, self-supervised, and reinforcement learning).

The author presents the distinction between classification and regression, along with the relevant metrics and loss functions.

You’ll know how to evaluate your model and all the steps needed to set up a project, from preparing the data through to hyperparameter tuning.

The author stresses the importance of separate training, validation, and test sets. You’ll learn the common ways to evaluate your model and the code behind the different methods. You’ll also find solid information about feature engineering and the format of the data.
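To illustrate one of those evaluation methods, here is a minimal, self-contained sketch of k-fold validation; the model, data, and hyperparameters are all stand-ins of my own, not the book’s.

import numpy as np
from keras import models, layers

def build_model():
    # A tiny regression model, compiled fresh for every fold
    model = models.Sequential()
    model.add(layers.Dense(16, activation='relu', input_shape=(10,)))
    model.add(layers.Dense(1))
    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    return model

train_data = np.random.randn(400, 10)   # stand-in data, purely illustrative
train_targets = np.random.randn(400)

k = 4
num_val = len(train_data) // k
val_scores = []
for fold in range(k):
    # Hold out the fold-th slice for validation, train on the rest
    val_data = train_data[fold * num_val:(fold + 1) * num_val]
    val_targets = train_targets[fold * num_val:(fold + 1) * num_val]
    partial_data = np.concatenate([train_data[:fold * num_val],
                                   train_data[(fold + 1) * num_val:]])
    partial_targets = np.concatenate([train_targets[:fold * num_val],
                                      train_targets[(fold + 1) * num_val:]])
    model = build_model()
    model.fit(partial_data, partial_targets, epochs=5, batch_size=16, verbose=0)
    val_scores.append(model.evaluate(val_data, val_targets, verbose=0))

print(np.mean(val_scores, axis=0))  # average score across the k folds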

You’ll find solid information about overfitting and underfitting, what they mean for your model, and ways to avoid them. The author provides a great explanation of dropout and how to use it.
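In Keras, dropout is simply a layer you slot between the others; a minimal sketch (sizes illustrative):

from keras import models, layers

# The same kind of small classifier, regularized with dropout
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dropout(0.5))  # randomly zero out 50% of activations during training
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))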

And the last thing: the machine learning workflow that every project needs to follow. You’ll find the hypotheses to formulate and the step-by-step process to apply when starting a machine learning problem.

So it’s a small chapter (26 pages), but with a lot of important information to keep in mind during a machine learning project.

Summary of part 1

At this point, you’ve reached the end of the first part of the book. You’ve passed through all the fundamentals of the deep learning world and learned how to use them precisely in your projects. You know where to start, what to look for, how to prepare your data, and how to train, evaluate, and test your model, as well as the importance of understanding the math behind every part of the algorithm. But most importantly, you have the context of this discipline and definitions of its processes.

Part 2: Deep learning in practice

Chapter 5 — Deep learning for computer vision

In this chapter, the author explains all the concepts behind convolutional neural networks (CNNs). There is a lot of information in this section.

The author begins with the operations that make a CNN work (filters, padding, strides, pooling…). Then you’ll see how to use one on a small dataset and still obtain great results. You’ll become familiar with overfitting in computer vision and how to avoid it (data augmentation, dropout…).
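To make the stacking concrete, here is a minimal convnet sketch in Keras (the input shape and layer sizes are illustrative, not the book’s exact listing):

from keras import models, layers

# Alternating convolution + pooling stages, then a dense classifier on top
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))  # halve the spatial resolution
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())  # flatten the 3D feature maps into a 1D vector
model.add(layers.Dense(1, activation='sigmoid'))  # binary output, e.g. dog vs. cat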

The next step is illustrated with pre-trained models: their goal, how they are obtained, and how to use them in different settings to increase the accuracy of your predictions. The author explains feature extraction and fine-tuning (retraining the last layers of a pre-trained CNN so they learn filters or representations specific to your data) and how to use both when your dataset is small.
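A minimal sketch of the feature-extraction idea, reusing a pre-trained VGG16 convolutional base with a fresh classifier on top (shapes and sizes illustrative):

from keras.applications import VGG16
from keras import models, layers

# Reuse the convolutional base trained on ImageNet, train only the new top layers
conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))
conv_base.trainable = False  # freeze the pre-trained filters

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

For fine-tuning, you would later unfreeze the last few layers of conv_base and retrain them with a very low learning rate.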

The chapter ends with visualizations of how convnets see the world: three methods for looking into the abstract representations learned by each layer of your model.

The author works through an image-classification example (dog vs. cat) that comes with plenty of code explaining each step of everything he describes.

Chapter 6 — Deep learning for text and sequences

After exploring the world of image recognition, the author leads us into the world of text and sequence data. This part is really amazing for those interested in NLP and time series problems.

The chapter begins with explanations of how the data need to be prepared. You’ll learn about tokenization, n-grams, one-hot encoding, and word embeddings (the best explanation of embeddings I had seen before this was in Coursera’s Deep Learning Specialization). The author provides lots of code for these different techniques and for pretrained embeddings (GloVe, …).
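A minimal sketch of that preparation pipeline in Keras (the example texts and dimensions are mine, not the book’s):

from keras.preprocessing.text import Tokenizer
from keras.layers import Embedding

texts = ['The cat sat on the mat.', 'The dog ate my homework.']

tokenizer = Tokenizer(num_words=1000)  # keep only the 1,000 most common words
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)  # each text becomes a list of word indices

# An embedding layer then maps each word index to a dense 64-dimensional vector
embedding = Embedding(input_dim=1000, output_dim=64)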

After the part about data, you’ll explore the world of recurrent neural networks (RNNs) and their variants: LSTM, GRU, and bidirectional layers. The author provides enough theory to understand the Keras code, with concrete examples. You’ll be able to use the different models on your own text data and easily reproduce the results shown in this first part of the chapter.
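The models follow a common pattern; a minimal sketch (vocabulary size and dimensions illustrative):

from keras import models, layers

# A small recurrent classifier: word embeddings feeding an LSTM
model = models.Sequential()
model.add(layers.Embedding(10000, 32))  # 10,000-word vocabulary, 32-d vectors
model.add(layers.LSTM(32))              # swap in layers.GRU, or wrap in layers.Bidirectional
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])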

The end of the chapter covers more advanced applications of RNNs, such as time series forecasting. You’ll also learn how to use dropout in RNNs and how to use CNNs on text or time series, along with the disadvantages of that type of model when dealing with time series.

Chapter 7 — Advanced deep learning best practices

At this point in the book, you reach the advanced features of Keras. The author advises you on moving from the Sequential model to the functional API. Why is this important? Because the functional API lets you build multi-input and multi-output models.

You’ll get an explanation of directed acyclic graphs of layers (as in the Inception models), residual connections (ResNet models), sharing weights between layers, and using models as layers.
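A minimal sketch of the functional style, where tensors flow explicitly from inputs to outputs (the sizes are illustrative):

from keras import Input, layers
from keras.models import Model

# Each layer is called on a tensor and returns a tensor,
# which is what makes arbitrary graphs of layers possible
text_input = Input(shape=(None,), dtype='int32')
embedded = layers.Embedding(10000, 64)(text_input)
encoded = layers.LSTM(32)(embedded)
output = layers.Dense(1, activation='sigmoid')(encoded)

model = Model(text_input, output)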

You’ll also get best practices for monitoring your model with TensorBoard and callbacks (the different types and how to use them), and you’ll learn how and when to use batch normalization and depthwise separable convolutions.
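A minimal sketch of a callbacks setup (the file paths are illustrative):

from keras import callbacks

# Stop when the validation loss stalls, keep the best weights, and log for TensorBoard
callbacks_list = [
    callbacks.EarlyStopping(monitor='val_loss', patience=3),
    callbacks.ModelCheckpoint(filepath='best_model.h5', save_best_only=True),
    callbacks.TensorBoard(log_dir='logs'),
]

# then: model.fit(x, y, epochs=50, validation_split=0.2, callbacks=callbacks_list)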

The most important part of this chapter, for me, is the one on hyperparameter optimization. The author presents different ways to tune hyperparameters and how to use model ensembling.

It’s a lot of condensed information in 36 pages, but it’s very helpful.

Chapter 8 — Generative deep learning

Now this is an exciting chapter. The first part is about text generation: you’ll learn how to code a language model that generates text in different ways. The most important takeaway is knowing how to reweight the probability distribution over the next character or word in order to inject creativity into the generated text.
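That reweighting boils down to a few lines of Numpy; a sketch along the lines the chapter describes, where temperature controls how surprising the samples are:

import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    # Lower temperature -> more conservative sampling; higher -> more surprising
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)  # renormalize so it sums to 1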

Then you’ll get a lot of information on DeepDream and how to use it with pre-trained models. You’ll learn how to create new pictures and what mechanism drives this kind of generation.

Neural style transfer, what is it? You’ll learn in this chapter that it’s the ability to take the content of one picture and the style of another and mix them into a new picture. Yes, you’ll learn that content and style can be written mathematically and fed into a neural network that combines them.

Variational autoencoders and generative adversarial networks, here we are. In about 20 pages you get a lot of information on VAEs and GANs for image generation, with what you need to tell them apart and to choose the right one for your case. GANs are the more difficult of the two to train, so the author provides various tricks to give you a better grasp of the model’s behavior. The GAN part was the most interesting for me, because I had never used them.

I finished this chapter with good intuition about the different techniques for generating data. Image generation is the biggest part, and, with GANs, the most interesting.

Chapter 9 — Conclusions

The author opens this chapter with a concise summary of the book. He then gives a good account of the limitations of deep learning and places it within the wider world of AI. You’ll also find the author’s views on the future of deep learning and where to look next in your own exploration. One point, about AutoML, is interesting to set against recent advances in that field. You’ll also find advice on how to stay up to date in the field.

Final words

I hope this short summary of the book gives you relevant information about its content and helps you find what you need. I had this book in my library for a long time, purchased primarily for Chapter 6, which deals with NLP.

I am currently on a different quest: to fix all the weaknesses of my self-taught education. So I’ve launched into challenges that take me beyond my comfort zone. The first step was to put my ego aside and admit that I do not know everything (far from it). The rest: to conquer the textbooks that can help. Deep Learning with Python is a foundational stone, long listed as a benchmark. After exploring it and taking countless notes and Post-its, I feel ready to continue my journey into the subdomains that interest me.

About the author:

François Chollet is an AI researcher on the Google Brain Team and author of the Keras deep-learning library.

François Chollet, Deep learning with Python (2017), Manning

Now waiting for the publication of the second edition.