Federated Learning

A well-known example of Federated Learning is Gboard on Android, the Google keyboard. When Gboard shows a suggested query, your phone locally stores information about the current context and whether you clicked the suggestion. Federated Learning processes that history on-device to suggest improvements to the next iteration of Gboard’s query suggestion model. The same approach is used to improve what the keyboard predicts as you type and to rank photos based on what kinds of photos people look at, share, or delete. It can also aggregate offline user data, for example local temperature readings from different locations, and report the average temperature without uploading the raw measurements. Federated learning is likewise used in speech recognition: as people talk to their devices, the model trains automatically on their utterances and learns to recognize speech regardless of accent (US, UK, Indian, and so on). Federated algorithms build on optimization methods such as Stochastic Gradient Descent (SGD), which traditionally runs on a large dataset partitioned homogeneously across servers in the cloud.

Nowadays, Federated Learning frameworks include TensorFlow Federated, an open-source framework by Google for experimenting with machine learning and other computations on decentralized data.

How Federated Learning Works

A random subset of members of the Federation (known as clients) is selected to receive the global model synchronously from the server.

Each selected client computes an updated model using its local data.

The model updates are sent from the selected clients to the server.

The server aggregates these models (typically by averaging) to construct an improved global model.
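As a rough illustration of one such round, here is a minimal NumPy sketch, not production code, in which three hypothetical clients fit a simple linear model on synthetic local data and the server averages the returned weights, weighting each client by the size of its local dataset. All names, shapes, and constants are illustrative assumptions.

```python
import numpy as np

# Hypothetical local datasets: each client holds (features, targets) for a
# simple linear model y = X @ w. The raw data never leaves the client.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 80, 120):                        # different amounts of local data
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    clients.append((X, y))

def local_update(w, X, y, lr=0.05, epochs=5):
    """Run a few epochs of plain gradient descent on one client's data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)   # gradient of the MSE loss
        w -= lr * grad
    return w

global_w = np.zeros(2)
for _ in range(20):
    # 1) The server broadcasts global_w; 2) each selected client trains locally.
    updates = [local_update(global_w, X, y) for X, y in clients]
    # 3) Clients send updated weights back; 4) the server averages them,
    #    weighting each client by its local dataset size.
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    global_w = np.average(updates, axis=0, weights=sizes)

print("learned weights:", global_w)            # should approach [2, -1]
```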

Aim:

The motivation for federated learning is preserving the privacy of the data owned by the clients. However, even when the actual data is not exposed, repeated model weight updates can be exploited to reveal properties that are not global to the data but are specific to individual contributors.

Role of federated learning:

In image classification, a Convolutional Neural Network (CNN) is trained on a large dataset and then tested on single images. With a federated learning algorithm, the CNN can classify images offline and train on-device, without training on the cloud.

Federated learning also uses Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) architectures for speech recognition and voice input commands, so that the model understands user queries and trains itself. There are two ways of sending information to the server (sketched in code after this list):

FederatedSGD

FederatedAvg
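The difference between the two is easiest to see on the client side. Assuming the same toy linear model and mean-squared-error loss as in the earlier sketch (both illustrative), a FederatedSGD client sends back a single gradient, while a FederatedAvg client runs several local steps and sends back the updated weights:

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of the mean-squared error for a linear model y ~ X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def client_update_fedsgd(w_global, X, y):
    # FederatedSGD: the client computes one gradient on its local data
    # and sends that gradient (not the data) to the server.
    return mse_grad(w_global, X, y)

def client_update_fedavg(w_global, X, y, lr=0.05, local_epochs=5):
    # FederatedAvg: the client takes several local gradient steps and sends
    # the resulting weights; the server then averages the weights.
    w = w_global.copy()
    for _ in range(local_epochs):
        w -= lr * mse_grad(w, X, y)
    return w
```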

This research work is exciting and will broaden my skills and knowledge. It can be implemented using Generative Adversarial Networks (GAN), Long Short-Term Memory (LSTM), Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN), as described below:

Generative Adversarial Network (GAN)

GANs are capable of learning to mimic any data distribution. The generator combines rectified linear (ReLU) and sigmoid activations, while the discriminator uses maxout activations. Dropout can be applied in the intermediate layers of the generator. During training, the weights and biases of the generative network are updated to increase the classification error, whereas the weights and biases of the discriminative network are updated to decrease it. This can also be tried with Conditional GANs (CGAN) and Deep Convolutional GANs (DCGAN), which put some conditional parameters in place. In a CGAN, an additional parameter y is fed to the generator so that it generates the corresponding data, and the labels are also given to the discriminator to help it distinguish the real data from the fake generated data. A DCGAN, in contrast, is composed of ConvNets in place of multi-layer perceptrons: the ConvNets are implemented without max pooling, which is replaced by convolutional stride, and the layers are not fully connected.
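As a rough sketch of this setup, here is a minimal Keras version. The layer sizes, dropout rate, and 28x28 image shape are assumptions, and LeakyReLU stands in for maxout, which is not a built-in Keras activation.

```python
import tensorflow as tf

LATENT_DIM = 100  # illustrative size of the noise vector

# Generator: ReLU in the hidden layers, sigmoid on the output (values in [0, 1]),
# with dropout in the intermediate layers as described above.
generator = tf.keras.Sequential([
    tf.keras.Input(shape=(LATENT_DIM,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(28 * 28, activation="sigmoid"),  # a flattened image
])

# Discriminator: outputs the probability that a sample is real.
# LeakyReLU is used here as a stand-in for maxout.
discriminator = tf.keras.Sequential([
    tf.keras.Input(shape=(28 * 28,)),
    tf.keras.layers.Dense(512),
    tf.keras.layers.LeakyReLU(0.2),
    tf.keras.layers.Dense(256),
    tf.keras.layers.LeakyReLU(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# The discriminator is trained to reduce its classification error on real vs.
# generated samples; the generator is trained through the frozen discriminator
# to increase that error.
discriminator.compile(optimizer="adam", loss="binary_crossentropy")
discriminator.trainable = False
gan = tf.keras.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")
```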

Long short-term memory

LSTM combines recurrent and feed-forward neural network computation. It has numerous applications in text and document processing where the surrounding context matters for the task at hand: it predicts the next term from the relative context, so words are not treated as independent individuals but as units dependent on their immediate neighborhood in the text. An advantage of LSTM is that the output does not depend on the length of the input, because the input is entered sequentially, one element per time step. The basic LSTM architecture is built around a memory block consisting of several memory cells, with which one communicates through the input gate, forget gate, and output gate of each cell.
Training an LSTM network is performed in two phases, a forward pass and a backward pass. The important feature to note is that LSTM memory cells give different roles to addition and multiplication in the transformation of inputs; the central plus sign in the usual cell diagrams is essentially the secret of LSTM.
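To make the roles of addition and multiplication concrete, here is a minimal NumPy sketch of a single LSTM time step using the standard gate equations; the parameter names and shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a single LSTM memory cell."""
    Wf, Wi, Wo, Wc, bf, bi, bo, bc = params
    z = np.concatenate([h_prev, x_t])      # previous hidden state + new input

    f = sigmoid(Wf @ z + bf)               # forget gate
    i = sigmoid(Wi @ z + bi)               # input gate
    o = sigmoid(Wo @ z + bo)               # output gate
    c_tilde = np.tanh(Wc @ z + bc)         # candidate cell content

    # The "central plus sign": old memory (scaled by the forget gate) is ADDED
    # to the new candidate (scaled by the input gate), while the gates
    # themselves act MULTIPLICATIVELY.
    c_t = f * c_prev + i * c_tilde
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```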

Recurrent neural network

RNNs have a powerful sequence-learning capability, which can finely describe dependency relationships within data. RNNs such as the LSTM are powerful models capable of learning effective feature representations of sequences when given enough training data. An RNN is a layered neural network that includes recurrent connections between layers. The recurrence creates a temporal effect in the network, allowing certain network connections (parameters) to be reused at different time steps. This lets RNNs capture temporal relationships in a data sequence, which makes them appropriate for predicting sequential data.
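A minimal NumPy sketch of a vanilla RNN unrolled over a sequence illustrates this parameter reuse; the weight names and shapes are illustrative assumptions.

```python
import numpy as np

def simple_rnn(inputs, W_xh, W_hh, b_h):
    """Unroll a vanilla RNN over a sequence of input vectors.

    The SAME weight matrices (W_xh, W_hh) are reused at every time step,
    which is the recurrence that carries information forward in time.
    """
    h = np.zeros(W_hh.shape[0])
    hidden_states = []
    for x_t in inputs:                       # one input vector per time step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return hidden_states                     # one hidden state per time step
```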

Convolutional neural network

A CNN is a combination of convolution and pooling layers at the beginning, a few fully connected layers at the end, and finally a softmax classifier that assigns the input to one of several categories. The main features of a CNN are parameter sharing and sparsity of connections. With parameter sharing, a filter learned while extracting features in one part of the input can be convolved over the entire input. With sparse connections, each output value in a layer depends on a small number of inputs, instead of taking all the inputs into account.
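A minimal Keras sketch of such a network might look like the following; the input shape, filter counts, layer sizes, and number of classes are assumptions.

```python
import tensorflow as tf

num_classes = 10  # illustrative, e.g. 10 image categories

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),               # e.g. small grayscale images
    # Convolution + pooling at the beginning: each 3x3 filter is shared across
    # the whole image (parameter sharing), and every output value depends only
    # on a small patch of the input (sparse connections).
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Conv2D(64, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    # A few fully connected layers at the end, then a softmax classifier.
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```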