In a recent study, researchers from the National Key Laboratory for Novel Software Technology at Nanjing University, China, and ByteDance explored the translation ability of large language models (LLMs), finding that their potential is even greater than previously thought. Through a method called Multilingual Finetuning with Translation Instructions (mFTI), the researchers improved translation performance across a range of languages compared to existing approaches.

Advancements in Machine Translation

Machine translation has seen considerable progress, and LLMs have shown strong multilingual translation abilities. Previously, most research focused on an approach called in-context learning (ICL), which places a few parallel example sentences in the prompt to guide a frozen LLM at inference time. This study instead proposed mFTI, which directly trains LLMs to follow translation instructions. mFTI elicited stronger translation ability from LLMs than ICL, particularly for non-English languages.
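To make the contrast concrete, here is a minimal sketch of a few-shot ICL translation prompt; the demonstration pairs and label format are illustrative assumptions, not the exact prompt used in the paper.

```python
# Few-shot ICL for translation: demonstration pairs are placed in the
# context and the frozen LLM is asked to continue the pattern.
# The sentence pairs below are made-up examples.

demonstrations = [
    ("Das Haus ist klein.", "The house is small."),
    ("Ich trinke gerne Kaffee.", "I like to drink coffee."),
]

def build_icl_prompt(demos, test_sentence, src="German", tgt="English"):
    """Concatenate the demonstrations, then leave the test pair unfinished."""
    blocks = [f"{src}: {s}\n{tgt}: {t}" for s, t in demos]
    blocks.append(f"{src}: {test_sentence}\n{tgt}:")
    return "\n\n".join(blocks)

print(build_icl_prompt(demonstrations, "Die Katze schläft auf dem Sofa."))
```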

The authors also found that LLMs can generalize their instruction-following abilities to unseen language pairs and learn to align languages through pivot languages, positioning mFTI as a powerful method for enhancing translation performance.

A New Approach: Multilingual Finetuning with Translation Instructions (mFTI)

The mFTI approach takes a parallel corpus of sentences in different languages and uses an instruction template to turn it into a language modeling dataset. Each example in the dataset is an instantiation of the translation instruction with a specific sentence pair, and the LLM's parameters are optimized on this dataset with a standard next-token prediction objective. For their experiments, the research team used XGLM-7.5B, a multilingual language model pretrained on a corpus of 500 billion tokens covering 30 languages, and evaluated performance across 13 languages.
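As a concrete illustration, here is a minimal sketch of that dataset construction step, assuming a simple English instruction template (the paper's actual template wording may differ):

```python
# Turn a parallel corpus into a language modeling dataset by instantiating
# a translation instruction template with each sentence pair.
# The template text and corpus contents are illustrative assumptions.

parallel_corpus = [
    ("German", "English", "Das Haus ist klein.", "The house is small."),
    ("English", "French", "The weather is nice today.",
     "Il fait beau aujourd'hui."),
]

TEMPLATE = ("Translate the following sentence from {src} to {tgt}.\n"
            "{src}: {src_sent}\n{tgt}: ")

def build_training_examples(corpus):
    """Yield one training string per pair: the instruction prompt followed
    by the reference translation, so a standard next-token prediction loss
    over the whole string trains the model to follow the instruction."""
    for src, tgt, src_sent, tgt_sent in corpus:
        yield TEMPLATE.format(src=src, tgt=tgt, src_sent=src_sent) + tgt_sent

for example in build_training_examples(parallel_corpus):
    print(example, end="\n---\n")
```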

By finetuning XGLM on 156 language pairs with a small number of parallel sentences per pair, mFTI proved more effective than 8-shot ICL. The improved translation performance even extended to language pairs unseen during instruction tuning, showcasing the method's ability to generalize.
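For readers curious what the finetuning itself might look like, below is a heavily simplified single training step using the publicly available facebook/xglm-7.5B checkpoint from Hugging Face transformers. The toy batch, optimizer settings, and single-step loop are all assumptions; a real run at this scale would need proper data loading and multi-GPU sharding.

```python
# Simplified next-token-prediction finetuning step on XGLM.
# This is a sketch, not the paper's training code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/xglm-7.5B")
model = AutoModelForCausalLM.from_pretrained("facebook/xglm-7.5B")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Toy batch of instruction-formatted examples (see the template sketch above).
texts = [
    "Translate the following sentence from German to English.\n"
    "German: Das Haus ist klein.\nEnglish: The house is small.",
]

batch = tokenizer(texts, return_tensors="pt", padding=True)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # exclude padding from the loss

outputs = model(**batch, labels=labels)  # standard causal LM objective
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```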

Exploring the Effects of mFTI on Translation Performance

The research team conducted an in-depth analysis to better understand the translation improvements brought by mFTI. They found that adding more language pairs and monolingual samples reduced instruction-following errors and improved translation performance across language pairs; in other words, mFTI helps LLMs better understand and carry out translation instructions, enhancing their overall performance.
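One way to picture this part of the analysis: the finetuning corpus is a mixture of instruction-formatted translation examples and plain monolingual sentences used as ordinary language-modeling data. The mixing function below is a hypothetical sketch; the paper's actual sampling scheme and proportions may differ.

```python
# Hypothetical data mixing: interleave plain monolingual sentences (used as
# ordinary language-modeling examples) with translation instructions.
import random

def mix_corpora(instruction_examples, monolingual_sentences,
                mono_ratio=0.25, seed=0):
    """Add roughly mono_ratio * len(instruction_examples) monolingual
    sentences to the training data, then shuffle."""
    rng = random.Random(seed)
    n_mono = min(int(len(instruction_examples) * mono_ratio),
                 len(monolingual_sentences))
    mixed = list(instruction_examples) + rng.sample(monolingual_sentences, n_mono)
    rng.shuffle(mixed)
    return mixed
```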

To investigate whether the model can establish meaningful alignments between a language pair through a pivot language, the researchers added X→En and En→Y parallel sentences to the training corpus and ran mFTI on the augmented data. Adding these pivot parallel sentences significantly boosted performance on the direct X→Y pair, demonstrating mFTI's potential to improve direct alignment between languages.
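A small sketch of that augmentation, using German→French as a hypothetical direct pair with no parallel data and English as the pivot; all sentences here are illustrative.

```python
# Pivot augmentation: with no direct German→French data, German→English and
# English→French pairs are added to the mFTI corpus, letting the model
# align German and French through English. Data below is illustrative.

german_english = [("Das Haus ist klein.", "The house is small.")]
english_french = [("The house is small.", "La maison est petite.")]

def build_pivot_corpus(x_to_en, en_to_y, x="German", y="French"):
    """Combine the two pivot directions into one instruction-tuning corpus."""
    corpus = [(x, "English", s, t) for s, t in x_to_en]
    corpus += [("English", y, s, t) for s, t in en_to_y]
    return corpus

for pair in build_pivot_corpus(german_english, english_french):
    print(pair)
```

These tuples match the (source language, target language, source, target) format of the dataset sketch above, so they could be fed straight into the same template instantiation.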

The Impact of the mFTI Method on AI and Future Research

The findings of this study show that mFTI can substantially enhance the translation capabilities of LLMs in multilingual settings. Since the research revealed that the translation potential of LLMs is greater than previously thought, it opens new doors for future research and development in AI. First, efforts can be made to instill more language knowledge during pretraining and to design better regularization terms that mitigate remaining translation errors. Second, researchers can pay closer attention to the relationships between languages when collecting and sampling data for pretraining multilingual LLMs.

By enhancing the ability of LLMs to follow translation instructions and improving language alignment, mFTI stands to make a significant impact on the field of artificial intelligence. As a result, more advanced and accurate machine translations will become attainable, leading to improved AI capabilities for communication and understanding in our increasingly connected world.

Original Paper