Source: Deep Learning on Medium
DeepPavlov Ruberta conversion to PyTorch
Russian is a very complex language that is hard to fit into a neural network. But it is possible, and DeepPavlov has done an incredible job of making it happen by providing pretrained weights for TensorFlow and Keras. It is an enormous piece of work by a group from MIPT.
Though I appreciate the beauty of TensorFlow more and more every day, I am more inclined to use PyTorch for initial research. The problem is that DeepPavlov provides weights for TensorFlow and Keras, not for PyTorch.
Steps to convert Ruberta TensorFlow, Keras weights to PyTorch
- Find a file
- Read the documentation on how to call this function with parameters. Note that the checkpoint ending with index should be provided.
- Run it and you will receive a file that PyTorch accepts as weights.
- Copy all files to your working directory.
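Before running the conversion, it can help to confirm which checkpoint file was actually downloaded. The sketch below is a minimal, standard-library-only helper. The name find_index_checkpoint and the idea of a single download directory are assumptions for illustration, not part of DeepPavlov or of any conversion tool. It only locates the file ending with .index that the steps above ask for.

```python
from pathlib import Path


def find_index_checkpoint(directory):
    """Return the single *.index file in `directory` (hypothetical helper)."""
    candidates = sorted(Path(directory).glob("*.index"))
    if not candidates:
        raise FileNotFoundError(f"no checkpoint file ending with .index in {directory}")
    if len(candidates) > 1:
        names = [c.name for c in candidates]
        raise ValueError(f"several .index files found: {names}")
    return candidates[0]
```

The returned path can then be passed to whichever conversion function you are using.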
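import torch
from transformers import BertConfig, BertModel
# Argument name used by older transformers releases; newer ones call it vocab_size.
bertcf = BertConfig(vocab_size_or_config_json_file=119547)
model = BertModel(bertcf)
rubert = torch.load('rubert.pt')  # converted weights, loaded once
keys = list(rubert.keys())  # assumes the file holds a name-to-tensor mapping
for index, par_bert_pytorch in enumerate(model.parameters()):
    par_rubert = rubert[keys[index]]
    if par_rubert.shape == par_bert_pytorch.shape:
        par_bert_pytorch.data = par_rubert  # copy the converted tensor into the model
        print('Executed substitution of parameters.')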
The TensorFlow weights are now loaded into your PyTorch model.
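The loop above pairs parameters by position, which only works when the checkpoint and the model list their tensors in the same order. A more defensive variant is to pair them by name and report anything that does not line up. The sketch below works on plain name-to-shape dictionaries, so it runs without PyTorch. The function plan_substitution and the "bert." prefix are assumptions for illustration; check the prefix against the keys in your own rubert.pt.

```python
def plan_substitution(model_shapes, checkpoint_shapes, prefix="bert."):
    """Pair model parameters with checkpoint tensors by name.

    Both arguments map parameter names to shape tuples. `prefix` is an
    assumed prefix that the checkpoint may add to every model name.
    """
    matched, mismatched, missing = [], [], []
    for name, shape in model_shapes.items():
        # Try the name with the assumed prefix first, then the bare name.
        key = next((k for k in (prefix + name, name) if k in checkpoint_shapes), None)
        if key is None:
            missing.append(name)
        elif tuple(checkpoint_shapes[key]) == tuple(shape):
            matched.append((name, key))
        else:
            mismatched.append((name, key))
    return matched, mismatched, missing


# Toy shapes standing in for a real model and a real checkpoint.
model_shapes = {
    "embeddings.word_embeddings.weight": (119547, 768),
    "pooler.dense.bias": (768,),
    "encoder.extra.weight": (4, 4),
}
checkpoint_shapes = {
    "bert.embeddings.word_embeddings.weight": (119547, 768),
    "bert.pooler.dense.bias": (1024,),
}
matched, mismatched, missing = plan_substitution(model_shapes, checkpoint_shapes)
```

With a real model, the two dictionaries could be built as {n: tuple(p.shape) for n, p in model.named_parameters()} and {k: tuple(v.shape) for k, v in torch.load('rubert.pt').items()}. Only the matched pairs would then be copied, while the mismatched and missing lists show what was left untouched.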
Use a preprocessor from DeepPavlov to encode Russian text for your model:
from deeppavlov.models.preprocessors import bert_preprocessor
tokenizer = bert_preprocessor.BertPreprocessor('vocab.txt')
Everything is working. Congratulations!
The same steps can be repeated if you have pretrained weights for your own language.