Source: Deep Learning on Medium
Federated learning is a machine learning training approach in which data is not stored centrally on a cloud but remains decentralized across multiple devices such as cell phones and IoT devices. This approach effectively addresses the major privacy issue, because the data never leaves the device. We can think of it as applying the same philosophy as MapReduce: bring the computation to the data rather than the data to the code.
Though the concept seems simple, implementing it in the real world is quite difficult: devices have limited computing power, and communication between phones and the server is unreliable. Since machine learning training is an iterative process, using the same methods as centralized machine learning would add a lot of communication overhead, so federated learning uses modifications such as the federated averaging algorithm.
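The core of federated averaging is simple: the server combines the model weights returned by each device, weighted by how many local examples the device trained on. Here is a minimal sketch of that averaging step (the data layout is illustrative, not the paper's actual implementation):

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted average of client model weights (the FedAvg aggregation step).

    client_weights: one list of np.ndarray layers per client
    client_sizes: number of local training examples per client
    """
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    averaged = []
    for layer in range(num_layers):
        # Each client's contribution is proportional to its share of the data.
        layer_sum = sum(
            w[layer] * (n / total)
            for w, n in zip(client_weights, client_sizes)
        )
        averaged.append(layer_sum)
    return averaged

# Two toy clients with one-layer "models"; client 2 has 3x the data.
clients = [[np.array([1.0, 2.0])], [np.array([3.0, 4.0])]]
sizes = [1, 3]
print(federated_average(clients, sizes)[0])  # -> [2.5 3.5]
```

The weighting by example count is what lets devices with very different amounts of local data contribute fairly to the global model.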
Though the federated learning concept was presented in 2016, in March 2019 Google published a paper on how they implement it at large scale, up to hundreds of millions of users.
In the paper, Google describes a protocol with three phases:
Selection, configuration and reporting are the three phases of the FL training protocol. In the selection phase, devices report their availability for training a federated learning task to the server, which they do only when idle and connected to stable WiFi. Based on certain pre-defined conditions, the server either selects a device for training or not. Once devices are selected, in the configuration phase the server sends them the federated learning plan, which contains a graph of the model along with various hyperparameters and instructions for batching. The model is then trained on the local data, and in the reporting phase the updates are sent back to the server, which aggregates them using the federated averaging algorithm and modifies the global model used in the next round.
On the device, the major constraint to take into consideration is limited computing power. This is how the solution is designed:
The application supporting FL stores a set of records in an "example store". Think of the example store as an SQLite database with a certain size constraint, so only the most recent records are kept. Once sufficient data is available, the app process configures the federated learning runtime by providing the example store and the FL population name. This, in turn, schedules an FL job which is invoked when the device is idle. On invocation, the device informs the server of its availability and, if selected, receives a federated learning plan. The plan consists of a model graph and hyperparameters such as the number of epochs and how to batch the data. The model is then trained for that many epochs, and the updates are sent back to the server.
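A size-capped example store can be sketched with a few lines of SQLite. The table and column names here are illustrative assumptions, not the actual on-device schema:

```python
import sqlite3

MAX_RECORDS = 1000  # assumed cap: keep only the most recent records

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE examples (id INTEGER PRIMARY KEY AUTOINCREMENT, payload BLOB)"
)

def add_example(payload: bytes):
    conn.execute("INSERT INTO examples (payload) VALUES (?)", (payload,))
    # Evict the oldest rows once the cap is exceeded.
    conn.execute(
        "DELETE FROM examples WHERE id NOT IN "
        "(SELECT id FROM examples ORDER BY id DESC LIMIT ?)",
        (MAX_RECORDS,),
    )
    conn.commit()

for i in range(1500):
    add_example(str(i).encode())
count = conn.execute("SELECT COUNT(*) FROM examples").fetchone()[0]
print(count)  # -> 1000
```

Capping the store keeps both disk usage and per-round training time bounded on a resource-constrained phone.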
On the server side, there are three main types of actors: coordinators, selectors and aggregators.
Coordinators are the actors that enable global synchronization and advance rounds in lock step. Each FL population a coordinator manages is registered in a shared locking service, so there is always a single owner for every FL population.
Selectors are persistent actors to which devices connect. The coordinator periodically tells them how many devices are needed, and using this information they make local decisions about accepting or rejecting devices.
The coordinator then spawns a master aggregator and aggregators, after which it instructs the selectors to forward a subset of their connected devices to the aggregators.
The master aggregator manages a round of the federated learning task. As the number of devices varies, master aggregators spawn more aggregator instances as needed, and aggregators can be distributed across data centres globally.
This approach optimizes performance, as the selection phase of the next round of the protocol can overlap with the aggregation and reporting phases of the previous round.
Secure aggregation: secure aggregation is a four-round protocol that makes individual updates from devices uninspectable. During the first two rounds, the aggregator and the devices exchange secret keys. The third round is the commit phase, in which model updates are masked on the device and sent to the aggregators. In the last round, the devices share "sufficient secrets" to unmask the aggregated model update, which is then used to update the global model that is sent back to users.
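The key idea behind the masking can be shown in a few lines: each pair of devices agrees on a shared random mask, one adds it and the other subtracts it, so the masks cancel in the sum while individual updates stay hidden. This is only the core trick; the real protocol uses cryptographic key exchange and secret sharing to survive device dropouts, which this sketch omits:

```python
import random

def masked_updates(updates, seed=0):
    """Add cancelling pairwise masks to scalar updates (illustration only)."""
    rng = random.Random(seed)
    n = len(updates)
    masked = list(updates)
    for i in range(n):
        for j in range(i + 1, n):
            # In the real protocol this mask is derived from a key agreed
            # between devices i and j; here it is just a shared random number.
            mask = rng.uniform(-100, 100)
            masked[i] += mask
            masked[j] -= mask
    return masked

updates = [1.0, 2.0, 3.0]
masked = masked_updates(updates)
# The server sees only the masked values...
print(masked)
# ...but their sum still equals the true sum of the updates.
print(round(sum(masked), 6))  # -> 6.0
```

Because the server only ever needs the sum for federated averaging, it never has to see any single device's update in the clear.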
One more important aspect is how the FL plan is generated:
The plan is generated from the FL task and the training configuration provided by the engineer. We can consider the plan to have two parts, one for the server side and one for the device side. The device side contains a graph of the model, selection criteria for the training data, and instructions such as how to batch the data and how many epochs to run. The server side contains information about how many devices to select and how to aggregate the updates from each device.
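The two halves of the plan can be pictured as a pair of configuration objects. All field names below are illustrative assumptions, not the actual schema from the paper:

```python
from dataclasses import dataclass

@dataclass
class DevicePlan:
    model_graph: str          # serialized model graph shipped to the device
    selection_criteria: str   # which example-store records to train on
    batch_size: int           # how to batch the local data
    epochs: int               # how many local epochs to run

@dataclass
class ServerPlan:
    devices_per_round: int    # how many devices to select each round
    aggregation: str          # how to combine updates from devices

plan = (
    DevicePlan(model_graph="next_word_model.pb",
               selection_criteria="most_recent_1000",
               batch_size=32, epochs=1),
    ServerPlan(devices_per_round=400,
               aggregation="federated_averaging"),
)
print(plan[1].devices_per_round)  # -> 400
```

Splitting the plan this way means the device never needs to know the server's orchestration details, and vice versa.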
That is mostly it for Google's FL-at-scale paper. In further articles we will dive deep into the modifications needed to train traditional learning algorithms in a federated fashion.
Google has implemented it in Gboard to predict the next word. To train their RNN model with 1.4 million parameters, they required 3000 FL rounds, processing around 6×10⁸ sentences from 1.5×10⁶ users over 5 days.
As mentioned earlier, federated learning holds a lot of promise, and approaches such as differential privacy, which provides probabilistic guarantees of user privacy, can be integrated with it. There are also concerns: because the data is unavailable for analysis, it is hard for developers to interpret the model's performance, and if a large portion of the population has malicious data, the model can be badly affected.
Many organizations are actively researching this domain, and with increasing concerns about privacy, there will be great demand for federated training of models. It is surely a technique to watch.
Also, do check out OpenMined, an open-source community working on privacy preservation using federated learning.
Thanks for reading; here we tried to give you a brief idea of federated learning and how Google has implemented it at large scale. If you enjoyed this article, hit that 👏 button below. ❤ It would mean a lot to me and it helps other people see the story.
Google paper: https://arxiv.org/pdf/1902.01046.pdf
Federated learning paper: https://arxiv.org/abs/1602.05629
Openmined community: https://www.openmined.org/