BLiTZ — A Bayesian Neural Network library for PyTorch

Blitz — Bayesian Layers in Torch Zoo is a simple and extensible library to create Bayesian Neural Network layers on the top of PyTorch.

Illustration for Bayesian Regression. Source: https://ericmjl.github.io/bayesian-deep-learning-demystified/images/linreg-bayesian.png (Accessed in 2020–03–30)

This is a post on the usage of a library for Deep Bayesian Learning. If you are new to the theme, you may want to look at one of the many posts on Medium about it, or at the Bayesian Deep Learning documentation section of our lib's repo.

As the need to quantify uncertainty over neural network predictions grows, Bayesian Neural Network layers have become one of the most intuitive approaches, as confirmed by the rise of Bayesian Networks as a field of study within Deep Learning.

It turns out that, despite PyTorch's position as a main Deep Learning framework (for research, at least), no library lets the user introduce Bayesian Neural Network layers into their models as easily as they can use nn.Linear and nn.Conv2d, for example.

Logically, that creates a bottleneck for anyone who wants to iterate flexibly with Bayesian approaches for their data modeling, as the user has to implement the whole Bayesian-layer machinery themselves rather than focusing on the architecture of their model.

BLiTZ was created to solve this bottleneck. By being fully integrated with PyTorch (including nn.Sequential modules) and easy to extend as a Bayesian Deep Learning library, BLiTZ lets the user introduce uncertainty into their neural networks with no more effort than tuning hyper-parameters.

In this post, we discuss how to create, train and run inference with uncertainty-aware Neural Networks, using BLiTZ layers and sampling utilities.

Bayesian Deep Learning layers

As we know, the main idea in Bayesian Deep Learning is that, rather than having deterministic weights, Bayesian layers sample their weights from a normal distribution at each feed-forward operation.

Consequently, the trainable parameters of the layer are the ones that determine the mean and variance of this distribution.

Mathematically, the operations would go from:

Deterministic "vanilla" neural network feed-forward operation: z = W·x + b, with fixed, deterministic weights W and biases b.

To:

Feed-forward operation for a Bayesian Neural Network layer: on each pass we sample the weights, W ~ N(μ, σ²) with σ = log(1 + exp(ρ)), and then compute z = W·x + b with the sampled values.

Implementing layers where ρ and μ are the trainable parameters can be hard in Torch, and beyond that, hyper-parameter-tunable layers are even harder to craft. BLiTZ has a built-in BayesianLinear layer that can be introduced into a model this easily:
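A minimal sketch of what that can look like (the hidden layer width of 512 is an illustrative choice, not something the library prescribes):

```python
import torch
import torch.nn as nn
from blitz.modules import BayesianLinear

class BayesianNetwork(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        # BayesianLinear is used exactly like nn.Linear, but its weights are
        # sampled from a learned Gaussian on every forward pass
        self.blinear1 = BayesianLinear(input_dim, 512)
        self.blinear2 = BayesianLinear(512, output_dim)

    def forward(self, x):
        x = torch.relu(self.blinear1(x))
        return self.blinear2(x)
```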

It works as a normal Torch nn.Module network, but its BayesianLinear modules perform training and inference with the previously explained uncertainty on their weights.

Loss calculation

As proposed in the original paper (Blundell et al., "Weight Uncertainty in Neural Networks"), the Bayesian Neural Network cost function is a combination of a "complexity cost" with a "fitting-to-data cost". After all the algebra wrangling, for each feed-forward operation we have:

Cost function for Bayesian Neural Networks: loss = KL[q(w | θ) ‖ P(w)] − E_{q(w | θ)}[log P(D | w)], that is, the complexity cost plus the fitting-to-data cost (the negative log-likelihood).

The complexity cost consists of the divergence, summed over the sampled weights of every Bayesian layer in the network, between their distribution and a much simpler, predefined prior pdf P(w). By penalizing it while optimizing, we ensure that the variance of our model's predictions diminishes.

To do that, BLiTZ brings us the variational_estimator decorator, which introduces methods such as nn_kl_divergence into our nn.Module. Given data points, their labels and a criterion, we could get the loss over a prediction by doing:
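A sketch of how that loss could be assembled, reusing the hypothetical BayesianNetwork class from above (the dimensions and the dummy batch are only for illustration; the decorator is what adds nn_kl_divergence to the module):

```python
from blitz.utils import variational_estimator

# applying the decorator adds the Bayesian utility methods to the class
BayesianNetwork = variational_estimator(BayesianNetwork)

model = BayesianNetwork(input_dim=10, output_dim=1)   # hypothetical dimensions
criterion = torch.nn.MSELoss()

datapoints = torch.randn(32, 10)   # dummy batch, just for illustration
labels = torch.randn(32, 1)

predictions = model(datapoints)
# fitting-to-data cost plus the complexity cost gathered from every Bayesian layer
loss = criterion(predictions, labels) + model.nn_kl_divergence()
```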

Easy model optimizing

Bayesian Neural Networks are often optimized by sampling the loss many times on the same batch before optimizing and proceeding, which compensates for the randomness over the weights and avoids optimizing them over a loss influenced by outliers.

BLiTZ's variational_estimator decorator also powers the neural network with the sample_elbo method. Given the inputs, labels, criterion and sample_nbr, it performs the iterative process of calculating the loss over the batch sample_nbr times, gathers its mean, and returns the sum of the complexity loss with the fitting one.

It is very easy to optimize a Bayesian Neural Network model:
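A minimal sketch of such a training step, reusing the hypothetical model, criterion and dummy data from the previous snippet (learning rate and sample_nbr are hyper-parameters you would tune):

```python
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(100):
    optimizer.zero_grad()
    # sample_elbo runs sample_nbr forward passes and averages
    # (complexity cost + fitting cost) over them
    loss = model.sample_elbo(inputs=datapoints,
                             labels=labels,
                             criterion=criterion,
                             sample_nbr=3)
    loss.backward()
    optimizer.step()
```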

Going through one example:

We will now go through an example of using BLiTZ to create a Bayesian Neural Network that estimates confidence intervals for the house prices of the Boston housing sklearn built-in dataset. If you want other examples, there are more on the repository.

Necessary imports

Besides the well-known modules, we bring from BLiTZ the variational_estimator decorator, which helps us handle the Bayesian layers of the module while keeping it fully integrated with the rest of Torch, and, of course, BayesianLinear, which is our layer featuring weight uncertainty.
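Our imports look roughly like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

from blitz.modules import BayesianLinear
from blitz.utils import variational_estimator

from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
```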

Loading and scaling data

Nothing new under the sun here: we import and standard-scale the data to help with training.
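A sketch of that step (the train/test split ratio is an arbitrary choice here):

```python
X, y = load_boston(return_X_y=True)

# standard-scale both features and targets to ease training
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y.reshape(-1, 1))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# turn the numpy arrays into float tensors for Torch
X_train, y_train = torch.tensor(X_train).float(), torch.tensor(y_train).float()
X_test, y_test = torch.tensor(X_test).float(), torch.tensor(y_test).float()
```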

Creating our regressor class

We create our class by inheriting from nn.Module, as we would do with any Torch network. Our decorator introduces the methods needed to handle the Bayesian features, calculating the complexity cost of the Bayesian layers and performing many feed-forward passes (sampling different weights on each one) to sample our loss.
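A sketch of the regressor (again, the hidden layer width is just an illustrative choice):

```python
@variational_estimator
class BayesianRegressor(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        # two Bayesian layers whose weights are sampled on every forward pass
        self.blinear1 = BayesianLinear(input_dim, 512)
        self.blinear2 = BayesianLinear(512, output_dim)

    def forward(self, x):
        x = F.relu(self.blinear1(x))
        return self.blinear2(x)
```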

Defining a confidence interval evaluating function

This function creates a confidence interval for each prediction in the batch for which we are trying to sample the label value. We can then measure the accuracy of our predictions by checking how many of the prediction distributions include the correct label for the datapoint.
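One way such a function could look (the number of samples and the standard-deviation multiplier are tunable choices):

```python
def evaluate_regression(regressor, X, y, samples=100, std_multiplier=2):
    # run several stochastic forward passes to build a predictive distribution
    preds = torch.stack([regressor(X) for _ in range(samples)])
    means = preds.mean(dim=0)
    stds = preds.std(dim=0)

    # interval of +/- std_multiplier standard deviations around the predicted mean
    ci_upper = means + (std_multiplier * stds)
    ci_lower = means - (std_multiplier * stds)

    # fraction of true labels that fall inside their predicted interval
    ic_acc = ((ci_lower <= y) & (ci_upper >= y)).float().mean()
    return ic_acc, (ci_upper >= y).float().mean(), (ci_lower <= y).float().mean()
```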

Creating our regressor and loading data

Notice here that we create our BayesianRegressor as we would do with any other neural network.
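A sketch of that setup (batch size and learning rate are hyper-parameters you would tune):

```python
regressor = BayesianRegressor(input_dim=13, output_dim=1)  # Boston housing has 13 features
optimizer = optim.Adam(regressor.parameters(), lr=0.01)
criterion = torch.nn.MSELoss()

ds_train = TensorDataset(X_train, y_train)
dataloader_train = DataLoader(ds_train, batch_size=16, shuffle=True)
```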

Our main training and evaluating loop

We do a training loop that only differs from a common Torch training loop by having its loss sampled with the sample_elbo method. Everything else can be done as usual, as our purpose with BLiTZ is to ease your life when iterating on your data with different Bayesian Neural Networks.

Here is our very simple training loop:
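A sketch of such a loop, evaluating the confidence intervals on the held-out test set every few hundred iterations:

```python
iteration = 0
for epoch in range(100):
    for datapoints, labels in dataloader_train:
        optimizer.zero_grad()
        # sample_elbo averages (complexity cost + fitting cost) over sample_nbr forward passes
        loss = regressor.sample_elbo(inputs=datapoints,
                                     labels=labels,
                                     criterion=criterion,
                                     sample_nbr=3)
        loss.backward()
        optimizer.step()

        iteration += 1
        if iteration % 100 == 0:
            ic_acc, upper_acc, lower_acc = evaluate_regression(regressor,
                                                               X_test,
                                                               y_test,
                                                               samples=25,
                                                               std_multiplier=3)
            print("CI acc: {:.2f} | upper-bound acc: {:.2f} | lower-bound acc: {:.2f}".format(
                ic_acc.item(), upper_acc.item(), lower_acc.item()))
```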

Conclusion

BLiTZ is a useful library for iterating on Deep Learning experiments with Bayesian layers while making very little change to the usual code. Its layers and decorators are tightly integrated with Torch neural network modules, and they also make it easy to create custom networks and extract their complexity cost with no difficulty.

And of course, here is the link for our repo: https://github.com/piEsposito/blitz-bayesian-deep-learning

References