Facebook AI’s WAT19 Myanmar-English Translation Task Submission

Source: Deep Learning on Medium

Machine translation (MT) has recently proven very successful for high-resource language pairs. However, how to make it work well for the vast majority of language pairs, which are low-resource, is still an open research question. English-Myanmar is a challenging example: Myanmar is a morphologically rich language that is very different from English. It is also not closely related to any high-resource language and hence does not benefit from multilingual training. In addition, very little parallel data is available, and monolingual Myanmar data is difficult to gather because the language is written in multiple encodings.

Facebook AI Myanmar-English 2019 Competition

Facebook AI Research participated in the 2019 edition of the Myanmar-English translation competition organized by the Workshop on Asian Translation (WAT). Their paper describes the submission, which achieved the highest human evaluation and BLEU scores in the competition.

My→En leaderboard. The values are BLEU scores (second column) and adequacy scores (third column). Rows highlighted in yellow identify systems that make use of additional monolingual data. Facebook is tagged as FBAI.

Their system ranks first in both directions according to human evaluation and BLEU, with a gain of over 8 BLEU points over the second-best system. The paper explores methods that leverage monolingual data to improve generalization: self-training, back-translation, and their combination. First, they use back-translation to regularize the model and adapt it to the test domain. Second, they use self-training to better leverage in-domain source-side monolingual data. Third, because the two approaches are complementary, they combine them iteratively. Finally, they improve decoding with noisy-channel reranking and ensembling.
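To make the data flow concrete, here is a minimal sketch of how back-translation and self-training can be alternated. It is not the authors' training recipe: the function names, the number of rounds, and the `toy_train` lookup-table "model" are all hypothetical stand-ins, and a real system would train neural translation models at each step.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]               # (source sentence, target sentence)
Translator = Callable[[str], str]    # maps one sentence to its translation
Trainer = Callable[[List[Pair]], Translator]


def swap(pairs: List[Pair]) -> List[Pair]:
    """Flip (source, target) pairs into (target, source) pairs."""
    return [(t, s) for s, t in pairs]


def back_translate(reverse: Translator, target_mono: List[str]) -> List[Pair]:
    """Back-translation: synthetic source paired with a real target."""
    return [(reverse(t), t) for t in target_mono]


def self_train(forward: Translator, source_mono: List[str]) -> List[Pair]:
    """Self-training: real source paired with a synthetic target."""
    return [(s, forward(s)) for s in source_mono]


def iterate_bt_st(train: Trainer, parallel: List[Pair],
                  source_mono: List[str], target_mono: List[str],
                  rounds: int = 2) -> Tuple[Translator, Translator]:
    """Alternate synthetic-data generation and retraining (assumed scheme)."""
    forward = train(parallel)        # source -> target
    reverse = train(swap(parallel))  # target -> source
    for _ in range(rounds):
        # Both synthetic sets are built with the models from the previous round.
        data = (parallel
                + back_translate(reverse, target_mono)
                + self_train(forward, source_mono))
        # Swapped, the forward model's back-translated pairs act as
        # self-training pairs for the reverse model, and vice versa.
        forward, reverse = train(data), train(swap(data))
    return forward, reverse


def toy_train(pairs: List[Pair]) -> Translator:
    """Hypothetical stand-in for real training: a lookup table."""
    table = dict(pairs)
    return lambda sentence: table.get(sentence, "<unk>")


forward, reverse = iterate_bt_st(toy_train, [("a", "A")], ["a"], ["A"])
print(forward("a"), reverse("A"))
```

The point of the sketch is only the bookkeeping: back-translated pairs keep real text on the target side, while self-training pairs keep real text on the source side.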

En→My leaderboard. The values are BLEU scores (second column) and adequacy scores (third column). Rows highlighted in yellow identify systems that make use of additional monolingual data.

They demonstrate that noisy-channel reranking and ensembling significantly improve not only a system trained with additional monolingual data but also the baseline system trained exclusively on the small parallel dataset provided.
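As a rough illustration of these two decoding techniques, the sketch below reranks an n-best list with the usual noisy-channel combination of a forward model score log P(y|x), a reverse (channel) model score log P(x|y), and a target language model score log P(y). It also shows one common way to ensemble models, by averaging their probabilities. The class and function names, the weights, and the absence of length normalization are assumptions made for the example, not details taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    direct: float   # log P(y | x) from the forward model
    channel: float  # log P(x | y) from the reverse model
    lm: float       # log P(y) from a target-side language model


def noisy_channel_score(h: Hypothesis, channel_weight: float = 1.0,
                        lm_weight: float = 1.0) -> float:
    """Weighted sum of the three log-probabilities (weights are assumed)."""
    return h.direct + channel_weight * h.channel + lm_weight * h.lm


def rerank(nbest: List[Hypothesis], channel_weight: float = 1.0,
           lm_weight: float = 1.0) -> List[Hypothesis]:
    """Sort candidates from best to worst noisy-channel score."""
    return sorted(
        nbest,
        key=lambda h: noisy_channel_score(h, channel_weight, lm_weight),
        reverse=True,
    )


def ensemble_log_prob(log_probs: List[float]) -> float:
    """Log of the average probability assigned by several models."""
    m = max(log_probs)  # subtract the max for numerical stability
    total = sum(math.exp(lp - m) for lp in log_probs)
    return m + math.log(total) - math.log(len(log_probs))


nbest = [
    Hypothesis("candidate A", direct=-1.0, channel=-5.0, lm=-4.0),
    Hypothesis("candidate B", direct=-1.5, channel=-2.0, lm=-2.0),
]
print(rerank(nbest)[0].text)
```

In this toy list the forward model alone prefers candidate A, but candidate B wins once the channel and language model scores are added. In practice the two weights would be tuned on a validation set.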

Potential Uses and Effects

As the work that achieved the best performance in the WAT 2019 Myanmar-English translation competition, this deserves attention. The approach blends several methods, and its decoding improvements help even the baseline system trained without additional monolingual data.

The work could be improved further by leveraging noisy parallel data, by better combining the different sources of additional data, and by designing better ways to exploit source-side monolingual data.

Read more: Myanmar-English Translation Task Submission
