Understanding Federated Learning

Original article was published by Arunkumar L on Deep Learning on Medium



A focus on privacy in Machine Learning

Image by Darwin Laganzon from Pixabay

With an increasing focus on privacy, federated learning has become one of the essential concepts in modern machine learning. Federated learning aims to train a model without uploading personal or identifiable data to a cloud server. As you might already know, a machine learning model needs a lot of data to train. But sometimes the training data is sensitive, and people are increasingly reluctant to share their personal data with a third party. As a result, federated learning is becoming an important tool for machine learning applications that depend on personal data.

Data is born at the edge

There are over a billion edge devices worldwide, such as phones, tablets, and IoT devices, continually generating data. For companies and developers, this data can be used to train better models and so improve their products and the user experience. In a typical cloud setup, the client sends its data to a server, where the model runs inference and returns a prediction. The client then sends feedback on that prediction, which is used to correct the model. This approach makes data collection easy and reduces the computational load on the edge device, but it suffers from problems with offline usage, latency, and, most importantly, privacy.

Model Inference on cloud

Alternatively, the model can be trained and run locally on the device, but this has limitations too. A single user provides too little data to train a model adequately, and because data from other devices does not contribute, the resulting model does not generalize well. Until recently, this was a trade-off between privacy and better intelligence, with almost no solution in between.

How does Federated Learning work?

In federated learning, the server distributes the current model (M1) to the clients. Each client trains the model on its locally available data. The updated models, rather than the data, are then sent back to the server, where they are averaged to produce a new model (M2). This new model (M2) becomes the primary model and is distributed to the clients again. The process repeats until the model reaches satisfactory performance, with each round ideally improving the model a little. In this way, federated learning enables better intelligence while the user's personal data stays on their device.
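The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's implementation. Each client holds a tiny hypothetical dataset for a one-variable linear model and trains locally with plain stochastic gradient descent. The server then averages the returned parameters, weighted by each client's number of examples (the weighting used by the Federated Averaging algorithm). The names `local_train` and `federated_averaging`, the learning rate, and the round counts are all illustrative choices.

```python
import random
from typing import List, Tuple

Params = Tuple[float, float]         # (w, b) of the model y = w * x + b
Dataset = List[Tuple[float, float]]  # one client's private (x, y) pairs


def local_train(params: Params, data: Dataset, lr: float = 0.05, epochs: int = 5) -> Params:
    """Runs on a client: a few epochs of SGD. Only the parameters are returned, never the data."""
    w, b = params
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # gradient of the loss 0.5 * err**2
            w -= lr * err * x
            b -= lr * err
    return w, b


def federated_averaging(global_params: Params, clients: List[Dataset], rounds: int = 20) -> Params:
    """Each round: send the global model to every client, train locally, average the results."""
    for _ in range(rounds):
        results = [(local_train(global_params, data), len(data)) for data in clients]
        total = sum(n for _, n in results)
        # Weight each client's model by how many examples it trained on.
        w = sum(p[0] * n for p, n in results) / total
        b = sum(p[1] * n for p, n in results) / total
        global_params = (w, b)  # the new model (M2) becomes the primary model
    return global_params


def make_client(rng: random.Random, n: int) -> Dataset:
    """Hypothetical client data: y = 2x + 1 plus a little noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.1)) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    clients = [make_client(rng, n) for n in (10, 30, 60)]
    w, b = federated_averaging((0.0, 0.0), clients)
    print(f"w={w:.2f}, b={b:.2f}")  # expected to be close to w=2, b=1
```

Weighting by example count keeps a client with 10 examples from counting as much as one with 60. A plain unweighted mean also works, but it lets small clients pull the model around more.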

Federated Learning
Gboard predicting my next word

The most common example of federated learning is Google's keyboard app, Gboard. Gboard uses machine learning models to improve features such as swipe typing, autocorrect, next-word prediction, and voice-to-text. Federated learning plays a major role here because what you type is very personal, and you would not want that data sent to a server. Your local model is trained on your data and sent to the server, along with models from many other clients. The server averages these models to produce a new model and distributes it back to the clients. This cycle repeats continuously.

Security Protocols

Even though user data is never uploaded to the server, a model sent by a client could potentially be reverse-engineered to recover information about that user's data. Model aggregation and client-side data encryption are used to combat this problem.

The federated learning protocol combines and sums the models sent by the clients, so the server has access only to the aggregate model and not to any individual model. Each device reports only the data required for this calculation, and the server then distributes the aggregated model to all clients.

Masking makes this possible. Pairs of clients add and subtract matching random masks to their values, so the masks cancel out when the server sums the contributions. Because each client sends only its masked value, anyone who intercepts an individual value would find it difficult to reverse-engineer any personal data from it.
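Here is a simplified sketch of pairwise masking in Python that shows the cancellation concretely. It assumes every pair of clients already shares a secret seed. Real protocols derive that seed with a key agreement and also handle clients that drop out mid-round; both steps are omitted here. Updates are assumed to be quantized to integers so the masks can be added modulo a fixed value. `MODULUS`, `pairwise_mask`, `mask_update`, and `aggregate` are hypothetical names.

```python
import random
from typing import Dict, List

MODULUS = 2**32  # assumed modulus; all masked arithmetic wraps around this value


def pairwise_mask(seed: int, length: int) -> List[int]:
    """Mask that both clients of a pair can regenerate from their shared seed.

    random.Random is not cryptographically secure; it is used here only for illustration.
    """
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(length)]


def mask_update(client_id: int, update: List[int], shared_seeds: Dict[int, int]) -> List[int]:
    """Client side: add the mask shared with each higher-id peer, subtract it for each lower-id peer."""
    masked = list(update)
    for peer, seed in shared_seeds.items():
        mask = pairwise_mask(seed, len(update))
        sign = 1 if client_id < peer else -1
        masked = [(v + sign * m) % MODULUS for v, m in zip(masked, mask)]
    return masked


def aggregate(masked_updates: List[List[int]]) -> List[int]:
    """Server side: summing all masked vectors cancels every pairwise mask."""
    return [sum(column) % MODULUS for column in zip(*masked_updates)]


if __name__ == "__main__":
    updates = {0: [5, 1, 7], 1: [2, 2, 2], 2: [0, 4, 1]}  # hypothetical quantized updates
    seed_rng = random.Random(42)
    # Stand-in for key agreement: one shared seed per unordered pair of clients.
    pair_seed = {(i, j): seed_rng.randrange(2**31) for i in updates for j in updates if i < j}
    masked = []
    for cid, upd in updates.items():
        seeds = {p: pair_seed[tuple(sorted((cid, p)))] for p in updates if p != cid}
        masked.append(mask_update(cid, upd, seeds))
    print(aggregate(masked))  # [7, 7, 10], the element-wise sum of the original updates
```

Each masked vector on its own looks like random numbers. Even so, the total the server computes matches the sum of the original updates.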

TensorFlow Federated

TensorFlow Federated (TFF) is an open-source framework from Google's TensorFlow team for federated learning on decentralized data. TFF is still in its early stages and has a lot of room to improve. At the time of writing, TFF provides only a local simulation runtime, with no options for deployment.

You can install TensorFlow Federated using the pip package manager.

pip install tensorflow_federated

TensorFlow Federated offers two sets of interfaces: Federated Learning (FL) and Federated Core (FC). The Federated Learning interface lets developers apply federated training or federated evaluation to their existing TensorFlow models. The Federated Core interface is a lower-level interface for expressing and testing new federated algorithms and running them in the local simulation runtime.
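Federated evaluation follows the same pattern as federated training. Each client computes its metrics locally and reports only the numbers the server needs. The sketch below is plain Python, not the TFF API. The threshold classifier, the `(feature, label)` data format, and the function names are hypothetical.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # hypothetical (feature, label) pair
Model = Callable[[float], int]


def local_eval(predict: Model, data: List[Example]) -> Tuple[int, int]:
    """Runs on the client: reports (number correct, number of examples), never the examples."""
    correct = sum(1 for x, y in data if predict(x) == y)
    return correct, len(data)


def federated_accuracy(predict: Model, clients: List[List[Example]]) -> float:
    """Server side: combines the per-client counts into one accuracy over all examples."""
    reports = [local_eval(predict, data) for data in clients]
    total_correct = sum(c for c, _ in reports)
    total = sum(n for _, n in reports)
    return total_correct / total


def threshold_model(x: float) -> int:
    """Hypothetical global model: predicts 1 when the feature exceeds 0.5."""
    return int(x > 0.5)


if __name__ == "__main__":
    clients = [
        [(0.9, 1), (0.1, 0)],            # client A: 2 of 2 correct
        [(0.7, 0), (0.2, 0), (0.8, 1)],  # client B: 2 of 3 correct
    ]
    print(federated_accuracy(threshold_model, clients))  # 0.8
```

Summing the counts before dividing gives accuracy over all examples. Averaging per-client accuracies instead would give the smaller client the same weight as the larger one.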

I will write more on TensorFlow Federated in an upcoming blog post.

Conclusion

This blog post is a quick rundown of federated learning, and I will publish more in-depth posts soon. I'm also planning to write about some lesser-known concepts in machine learning. Follow me to get the latest updates.

You can find me on Twitter, LinkedIn, and GitHub.