Artificial intelligence in Yandex Browser: Yandex begins using neural networks in Translator to improve translation quality

Yandex.Translator has learned to work with a neural network and now delivers better texts to users. Yandex has switched to a hybrid translation system: the statistical system that worked before is now supplemented by the CatBoost machine learning technology. There is one caveat, though: for now this only applies to translation from English into Russian.

Yandex claims that this is the most popular translation direction, accounting for 80% of all translations.

CatBoost is the clever part: given two versions of a translation, it compares them and picks the one that reads more like human speech.

A statistical system usually breaks the translation down into separate words and phrases. The neural network does not do this: it analyzes the sentence as a whole, taking the context into account whenever possible. That is why its output looks much more like a human translation, since the neural network can handle agreement between words. The statistical approach has its advantages too: it does not fantasize when it encounters a rare or unfamiliar word, whereas a neural network may attempt some creativity.

After today's announcement, the number of grammatical errors in automatic translations should drop, since translations now pass through a language model. There should no longer be agreement errors in the spirit of "daddy went" or "severe pain."

At the moment, users of the web version can choose the translation they consider the most accurate and successful; there is a separate toggle for this.


Or: does quantity grow into quality?

An article based on a talk given at the RIF+KIB 2017 conference.

Neural Machine Translation: Why Only Now?

Neural networks have been talked about for a long time, and it would seem that machine translation, one of the classical problems of artificial intelligence, was simply begging to be solved with this technology.

Nevertheless, here is how the search popularity of queries about neural networks in general, and about neural machine translation in particular, has changed over time:

It is clearly visible that until recently neural machine translation was nowhere to be seen on the radar, and then at the end of 2016 several companies, including Google, Microsoft and SYSTRAN, demonstrated new machine translation technologies and systems based on neural networks. The announcements came almost simultaneously, a few weeks or even days apart. Why is that?

To answer this question, we need to understand what neural-network-based machine translation is and how it fundamentally differs from the classical statistical or analytical systems used for machine translation today.

At the heart of a neural translator is the mechanism of bidirectional recurrent neural networks, built on matrix computations, which makes it possible to build significantly more complex probabilistic models than statistical machine translators can.
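As a rough illustration of what such an encoder looks like in code, here is a minimal sketch of a bidirectional recurrent sentence encoder in PyTorch. The dimensions and layer choices are arbitrary and purely illustrative; this is not Yandex's or Google's actual architecture.

```python
# Minimal sketch of a bidirectional recurrent sentence encoder (illustrative only).
import torch
import torch.nn as nn

class BiRNNEncoder(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # One GRU reads the sentence left-to-right, the other right-to-left.
        self.rnn = nn.GRU(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> states: (batch, seq_len, 2 * hidden_dim)
        embedded = self.embedding(token_ids)
        states, _ = self.rnn(embedded)
        return states  # contextual representation of every word in the sentence

# Example: encode a batch of two "sentences" of 5 token ids each.
encoder = BiRNNEncoder(vocab_size=10000)
print(encoder(torch.randint(0, 10000, (2, 5))).shape)  # torch.Size([2, 5, 1024])
```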


Like statistical translation, neural translation requires parallel corpora for training, so that the automatic translation can be compared with a reference "human" one; the difference is that during training it operates not on individual words and phrases but on whole sentences. The main problem is that training such a system requires much more computing power.

To speed up the process, developers use NVIDIA GPUs as well as Google's Tensor Processing Unit (TPU), proprietary chips adapted specifically for machine learning. Graphics chips are optimized for matrix computations from the start, so the performance gain over a CPU is 7-15x.

Even so, training a single neural model takes one to three weeks, while a statistical model of roughly the same size can be tuned in one to three days, and the gap grows with model size.

However, it is not only technological problems that have held back neural networks in machine translation. After all, it was possible to train language models earlier, if more slowly; there were no fundamental obstacles.

The fashion for neural networks also played a role. Many companies were developing such systems internally but were in no hurry to announce them, fearing they might not deliver the quality gain that society expects from the phrase "neural networks." This may explain why several neural translators were announced one after another.

Translation quality: whose BLEU score is thicker?

Let's try to understand whether the increase in translation quality corresponds to the accumulated expectations and the increase in costs that accompany the development and support of neural networks for translation.
Google's research demonstrates that neural machine translation gives a Relative Improvement of 58% to 87%, depending on the language pair, compared to the classical statistical approach (or Phrase Based Machine Translation, PBMT, as it is also called).


SYSTRAN conducted a study in which translation quality was assessed by having people choose among several options produced by different systems, as well as a "human" translation. The company claims that its neural translation is preferred over human translation 46% of the time.

Translation quality: is there a breakthrough?

Even though Google claims an improvement of 60% or more, there is a small catch in this metric. The company's representatives talk about "Relative Improvement", that is, how far the neural approach managed to close the gap toward Human Translation quality relative to the classic statistical translator.
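To make the metric concrete, here is a small illustrative calculation; the BLEU numbers below are invented solely to show how "relative improvement" can look large while the absolute gain stays modest.

```python
# Illustrative only: the BLEU values below are invented to show how the metric works.
def relative_improvement(pbmt: float, nmt: float, human: float) -> float:
    """How much of the gap between the old system and human quality the new system closes."""
    return (nmt - pbmt) / (human - pbmt)

pbmt_bleu, nmt_bleu, human_bleu = 30.0, 33.0, 35.0   # hypothetical scores
print(f"Absolute BLEU gain: {nmt_bleu - pbmt_bleu:.1f}")   # 3.0 points (~10% over PBMT)
print(f"Relative improvement: {relative_improvement(pbmt_bleu, nmt_bleu, human_bleu):.0%}")  # 60%
```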


Industry experts analyzing the results Google presented in the paper "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation" are rather skeptical and say that in fact the BLEU score improved by only about 10%, and that significant progress is visible only on fairly simple tests from Wikipedia, which most likely were also used when training the network.

At PROMT, we regularly compare our systems' translations of various texts with competitors', so we always have examples at hand to check whether neural translation is really as superior to the previous generation as the vendors claim.

Original Text (EN): Worrying never did anyone any good.
Google translation PBMT: Worrying didn't do anything good to anyone.
Google Translate NMT: Worrying has never helped anyone.

Incidentally, the translation of the same phrase on Translate.Ru is "The excitement never did any good to anyone"; it was and remains the same even without any neural networks.

Microsoft Translator is not far behind either. Unlike their colleagues at Google, they even made a website where you can translate a text and compare two results, neural and pre-neural, to see for yourself that the claims about quality growth are not unfounded.


In this example we see that there is progress, and it is really noticeable. At first glance it seems that the developers' claim that machine translation has almost caught up with "human" translation is true. But is that really so, and what does it mean for the practical application of the technology in business?

In general, translation using neural networks is superior to statistical translation, and the technology has huge potential for development. But if we look at the issue carefully, we can see that progress is not universal, and that not every task allows neural networks to be used without regard for the task itself.

Machine translation: what is the challenge

Throughout the entire history of its existence - more than 60 years already! - people have expected some kind of magic from the automatic translator, picturing it as the machine from science fiction films that instantly translates any speech into an alien whistle and back.

In reality, the tasks come at different levels, one of which involves "universal" or, so to speak, "everyday" translation for day-to-day tasks and easier understanding. Online translation services and many mobile products cope with this level perfectly well.

These tasks include:

Fast translation of words and short texts for various purposes;
automatic translation in the process of communication on forums, social networks and messengers;
automatic translation when reading news, Wikipedia articles;
travel translator (mobile).

All the examples of translation quality improvement with neural networks that we considered above relate precisely to these tasks.

However, when it comes to business goals and objectives for machine translation, things are a little different. Here, for example, are some of the requirements for corporate machine translation systems:

Translation of business correspondence with clients, partners, investors, foreign employees;
localization of sites, online stores, product descriptions, instructions;
translation of user-generated content (reviews, forums, blogs);
the ability to integrate translation into business processes and software products and services;
accuracy of translation with respect to terminology, confidentiality and security.

Let's use examples to see whether specific business translation tasks can be solved with neural networks, and how exactly.

Case: Amadeus

Amadeus is one of the world's largest global distribution systems for airline tickets. Air carriers are connected to it on one side; on the other are agencies, which must receive all information about changes in real time and relay it to their customers.

The task is to localize the fare application conditions (Fare Rules), which are generated in the booking system automatically from different sources. These rules are always written in English. Manual translation is practically impossible here because there is a lot of information and it changes frequently. An airline ticket agent wants to read the Fare Rules in Russian in order to advise their clients promptly and competently.

What is needed is an understandable translation that conveys the meaning of the fare rules, taking typical terms and abbreviations into account, and the automatic translation has to be integrated directly into the Amadeus booking system.
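For a sense of what such an integration can look like, here is a minimal sketch of calling a cloud translation service over HTTP from a booking workflow. The endpoint, parameters and key are hypothetical placeholders, not the actual PROMT Cloud API.

```python
# Illustrative only: the endpoint, parameters and API key below are hypothetical placeholders.
import requests

API_URL = "https://translation.example.com/api/v1/translate"  # placeholder URL
API_KEY = "YOUR_API_KEY"

def translate_fare_rules(text: str, source: str = "en", target: str = "ru") -> str:
    """Send a fare-rules fragment to a translation service and return the translated text."""
    response = requests.post(
        API_URL,
        json={"text": text, "from": source, "to": target, "profile": "fare-rules"},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["translation"]

# print(translate_fare_rules("ROUND TRIP INSTANT PURCHASE FARES"))  # would call the service
```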

→ The task and implementation of the project are detailed in the document.

Let's compare the translation made through the PROMT Cloud API, integrated into the Amadeus Fare Rules Translator, with the "neural" translation from Google.

Original: ROUND TRIP INSTANT PURCHASE FARES

PROMT (Analytical Approach): RATES FOR INSTANT PURCHASE OF FLIGHTS THERE AND BACK

GNMT: ROUND SHOPPING

Obviously, the neural translator fails here, and a little further on it will become clear why.

Case: TripAdvisor

TripAdvisor is one of the world's largest travel services and needs no introduction. According to an article published by The Telegraph, 165,600 new reviews of various tourist sites appear on the site every day, in different languages.

The task is to translate tourist reviews from English into Russian with quality sufficient to understand the meaning of the review. The main difficulty: the typical features of user-generated content (texts with errors, typos, missing words).

Another part of the task was to assess the quality of each translation automatically before publishing it on TripAdvisor. Since manually evaluating all translated content is impossible, a machine translation solution must provide an automatic confidence score so that TripAdvisor can publish only high-quality translated reviews.

The solution used PROMT DeepHybrid technology, which makes it possible to obtain a higher-quality translation that is understandable to the end reader, in part through statistical post-editing of the translation results.

Let's look at examples:

Original: We ate there last night on a whim and it was a lovely meal. The service was attentive without being over bearing.

PROMT (Hybrid translation): We ate there last night by accident and it was lovely food. The staff were attentive but not overbearing.

GNMT: We ate there last night on a whim and it was lovely food. The service was attentive without having more bearings.

Here things are not as bleak in terms of quality as in the previous example. In general, judging by its parameters, this task can potentially be solved with neural networks, which could further improve the translation quality.

Challenges of using NMT for business

As mentioned earlier, a "universal" translator does not always provide acceptable quality and cannot support domain-specific terminology. To integrate neural networks for translation into your processes, you need to meet some basic requirements:

Sufficient volumes of parallel texts for training a neural network. Often the customer simply has few of them, or texts on the topic do not exist in nature at all; they may be classified, or in a state poorly suited to automatic processing.

To create a model, you need a corpus of at least 100 million tokens, and to get a translation of more or less acceptable quality, about 500 million tokens. Not every company has that volume of material.

A mechanism or algorithms for automatically assessing the quality of the result.

Sufficient computing power.
A "universal" neural translator often does not deliver the required quality, and deploying a private neural network capable of acceptable quality and speed requires a "small cloud" of its own.

It is unclear what to do about privacy.
Not every customer is ready to hand over their content to the cloud for translation for security reasons, while NMT is first and foremost a cloud story.

Conclusions

In general, neural machine translation produces a higher-quality result than a "purely" statistical approach;
automatic translation via a neural network is better suited to the "universal translation" task;
no single MT approach is by itself an ideal universal tool for solving any translation task;
for translation tasks in business, only specialized solutions can guarantee compliance with all requirements.

We arrive at the quite obvious and logical conclusion that for your translation tasks you should use the translator that suits them best. It does not matter whether there is a neural network inside or not; understanding the task itself is more important.


The Yandex.Translator service has begun to use neural network technologies to translate texts, which makes it possible to improve the quality of translation, the Yandex website reported.


The service operates on a hybrid system, Yandex explained: the translation technology using a neural network has been added to the statistical model that has been working in Translator since its launch.

"Unlike a statistical translator, the neural network does not break texts down into separate words and phrases. It receives the whole sentence as input and produces its translation," a company representative explained. According to him, this approach makes it possible to take context into account and convey the meaning of the translated text better.

The statistical model, in turn, copes better with rare words and phrases, Yandex emphasized. "If the meaning of a sentence is not clear, it does not fantasize the way a neural network can," the company said.

When translating, the service uses both models; a machine learning algorithm then compares the results and suggests what it considers the best option. "The hybrid system allows us to take the best from each method and improve the quality of translation," Yandex says.

During the day on September 14, a switch should appear in the web version of Translator that makes it possible to compare translations made by the hybrid and statistical models. Sometimes the service may leave the text unchanged, the company noted: "This means the hybrid model has decided that the statistical translation is better."

The modern internet contains more than 630 million sites, but only 6% of them have Russian-language content. The language barrier is the main obstacle to the spread of knowledge between network users, and we believe it should be addressed not only by teaching foreign languages but also with automatic machine translation in the browser.

Today we will tell Habr's readers about two important technological changes in the Yandex Browser translator. First, translation of selected words and phrases now uses a hybrid model, and we will recall how this approach differs from using neural networks alone. Second, the translator's neural networks now take into account the structure of web pages, whose features we will also discuss under the cut.

Hybrid word and phrase translator

The first machine translation systems were based on dictionaries and rules (essentially hand-written regular expressions), and these determined the quality of the translation. Professional linguists worked for years to come up with ever more detailed manual rules. The work was so labor-intensive that serious attention was paid only to the most popular language pairs, but even within those, machines did poorly. A living language is a very complex system that obeys rules poorly; it is even harder to describe the rules of correspondence between two languages.

The only way for a machine to constantly adapt to changing conditions is to learn on its own from a large number of parallel texts (the same in meaning, but written in different languages). This is the statistical approach to machine translation. The computer compares parallel texts and independently identifies patterns.

A statistical translator has both advantages and disadvantages. On the one hand, it is good at memorizing rare and difficult words and phrases: if they occurred in the parallel texts, the translator will remember them and keep translating them correctly. On the other hand, the result resembles a completed puzzle: the overall picture seems clear, but look closely and you can see that it is assembled from separate pieces. The reason is that the translator represents individual words as identifiers that do not reflect the relationships between them in any way. This does not match how people perceive language, where words are defined by how they are used, how they relate to other words and how they differ from them.

Neural networks help solve this problem. Word embedding, used in neural machine translation, typically associates each word with a vector of several hundred numbers. Unlike the simple identifiers of the statistical approach, these vectors are formed while training the neural network and take the relationships between words into account. For example, the model might recognize that since "tea" and "coffee" often appear in similar contexts, both words should be possible in the context of a new word such as "spill", even if only one of them appeared with it in the training data.
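As a toy illustration of the idea (the vectors below are invented three-dimensional stand-ins, not real embeddings), words that occur in similar contexts end up with similar vectors, which we can check with cosine similarity:

```python
# Toy illustration: hand-made 3-dimensional "embeddings", not vectors from a real model.
import numpy as np

embeddings = {
    "tea":    np.array([0.81, 0.62, 0.10]),
    "coffee": np.array([0.78, 0.65, 0.12]),
    "carpet": np.array([0.05, 0.20, 0.95]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["tea"], embeddings["coffee"]))  # close to 1: similar contexts
print(cosine(embeddings["tea"], embeddings["carpet"]))  # noticeably smaller
```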

However, learning vector representations clearly requires far more data than rote memorization of examples. In addition, it is not clear what to do with rare input words that do not occur often enough for the network to build an acceptable vector representation for them. In this situation it makes sense to combine both methods.

Since last year, Yandex.Translate has been using a hybrid model. When Translator receives a text from a user, it passes it to both systems, the neural network and the statistical translator, and an algorithm based on machine learning then evaluates which translation is better. The scoring takes dozens of factors into account, from sentence length (short phrases are translated better by the statistical model) to syntax. The best translation is shown to the user.
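The article does not disclose the exact features or model, but as a rough sketch of the idea, here is how a CatBoost classifier could be trained to pick the better of two candidate translations from a handful of made-up features (sentence length, which system produced the candidate, a language-model score). Everything here is illustrative.

```python
# Illustrative sketch: made-up features and labels, not Yandex's actual ranking model.
from catboost import CatBoostClassifier

# Each row describes one candidate translation of a sentence:
# [source sentence length, candidate came from NMT (1) or SMT (0), language-model score]
X_train = [
    [3, 0, -12.5],
    [3, 1, -15.0],
    [27, 1, -48.2],
    [27, 0, -71.9],
]
# Label: 1 if human annotators preferred this candidate, 0 otherwise (toy data).
y_train = [1, 0, 1, 0]

model = CatBoostClassifier(iterations=200, depth=4, verbose=False)
model.fit(X_train, y_train)

# At translation time: score both candidates and show the one with the higher probability.
candidates = [[5, 0, -18.3], [5, 1, -16.1]]
probs = model.predict_proba(candidates)[:, 1]
best = int(probs.argmax())
print(f"Show candidate {best} (preference probabilities: {probs})")
```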

It is the hybrid model that is now used in Yandex Browser, when the user selects specific words and phrases on the page for translation.

This mode is especially useful for those who generally know the foreign language and only want to translate unknown words. But if, instead of the familiar English, you run into, say, Chinese, it is hard to get by without a page translator. It might seem the only difference is the volume of the translated text, but things are not that simple.

Neural web page translator

From the time of the Georgetown experiment almost to the present day, all machine translation systems have been trained to translate each sentence of the source text separately. But a web page is not just a set of sentences; it is structured text with fundamentally different elements. Let's look at the main elements of most pages.

Heading. Usually bright, large text that we see immediately upon entering the page. The headline often contains the essence of the news, so it is important to translate it correctly. That is hard to do, because a heading contains little text and, without understanding the context, it is easy to make a mistake. English is even trickier, because English headlines often use non-standard grammar, infinitives, or even drop verbs, as in "Game of Thrones prequel announced".

Navigation. Words and phrases that help us move around the site. For example, Home, Back and My account are hardly worth translating literally if they sit in the site menu rather than in the body of a publication.

Main text. Everything is simpler here: it differs little from the ordinary texts and sentences we find in books. But even here it is important to ensure translation consistency, that is, to make sure the same terms and concepts are translated the same way within a single web page.

For high-quality translation of web pages it is not enough to use a neural network or a hybrid model; you also need to take the structure of the pages into account, and for that we had to deal with a lot of technological difficulties.

Classification of text segments. For this we again use CatBoost and factors based both on the text itself and on the HTML markup of the documents (tag, text size, number of links per unit of text, ...). The factors are quite heterogeneous, which is why CatBoost (based on gradient boosting) shows the best results (classification accuracy above 95%). But segment classification alone is not enough.
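The exact factor set is not listed in the article beyond the examples above, but a sketch of the idea might look like this: pull simple features out of the HTML for each text block and feed them to a gradient-boosting classifier, treating the tag name as a categorical feature. The feature set and data here are invented for illustration.

```python
# Illustrative sketch: invented features and toy labels, not the production classifier.
from bs4 import BeautifulSoup
from catboost import CatBoostClassifier

def segment_features(element) -> list:
    """Very simple per-element features: tag name, text length, link density."""
    text = element.get_text(" ", strip=True)
    n_links = len(element.find_all("a"))
    return [element.name, len(text), n_links / max(len(text.split()), 1)]

html = """
<h1>Game of Thrones prequel announced</h1>
<h2>New season delayed</h2>
<nav><a href='/'>Home</a> <a href='/account'>My account</a></nav>
<ul class='menu'><a href='/back'>Back</a></ul>
<p>Long article text about the announcement, with full sentences and context.</p>
<p>Another paragraph of ordinary body text that reads like a book.</p>
"""
soup = BeautifulSoup(html, "html.parser")
X = [segment_features(el) for el in soup.find_all(["h1", "h2", "nav", "ul", "p"])]
y = ["heading", "heading", "navigation", "navigation", "content", "content"]  # toy labels

clf = CatBoostClassifier(iterations=100, verbose=False)
clf.fit(X, y, cat_features=[0])   # the tag name is a categorical feature
print(clf.predict(X))
```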

Data skew. Traditionally, Yandex.Translator algorithms are trained on texts from the Internet. That might seem like an ideal solution for training a web page translator (in other words, the network learns from texts of the same nature as those it will be used on). But as soon as we learned to separate different segments from each other, we discovered an interesting feature: on average, content accounts for about 85% of all text on sites, while headings and navigation together make up only 7.5%. Remember, too, that headings and navigation elements differ noticeably in style and grammar from the rest of the text. Together these two factors create a data skew problem: it is more profitable for a neural network simply to ignore the features of segments that are so poorly represented in the training sample. The network learns to translate only the main text well, and the translation quality of headings and navigation suffers. To neutralize this unpleasant effect we did two things: we assigned each pair of parallel sentences one of three segment types (content, heading or navigation) as meta-information, and we artificially raised the share of the latter two in the training corpus to 33% by showing such examples to the learning neural network more often. A sketch of both steps is shown below.
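As a rough sketch of those two steps (the tag format and target share here are our own illustration, not Yandex's exact implementation):

```python
# Illustrative sketch: prepend a segment-type tag to each example and oversample rare segments.
import random

corpus = [
    ("Game of Thrones prequel announced", "heading"),
    ("Home", "navigation"),
    ("The announcement was made on Tuesday ...", "content"),
    # ... millions of (sentence, segment_type) pairs in reality
]

def tag(sentence: str, segment_type: str) -> str:
    """Attach the segment type as meta-information the network can condition on."""
    return f"<{segment_type}> {sentence}"

def oversample(corpus, rare_types=("heading", "navigation"), target_share=0.33):
    """Duplicate rare-segment examples until they make up roughly target_share of the corpus."""
    rare = [ex for ex in corpus if ex[1] in rare_types]
    common = [ex for ex in corpus if ex[1] not in rare_types]
    need = int(target_share * len(common) / (1 - target_share))
    boosted = [random.choice(rare) for _ in range(max(need, len(rare)))]
    return common + boosted

training_data = [tag(sentence, segment_type) for sentence, segment_type in oversample(corpus)]
print(training_data[:3])
```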

Multi-task learning. Since we now know how to divide text on web pages into three classes of segments, it might seem natural to train three separate models, each handling its own type of text - headings, navigation or content. That does work well, but the scheme in which we train one neural network to translate all types of text at once works even better. The key to understanding this lies in the idea of multi-task learning (MTL): if several machine learning tasks share an internal connection, then a model that learns to solve them simultaneously can learn to solve each of them better than a narrowly specialized model!

Fine-tuning. We already had fairly good machine translation, so it would have been unwise to train a new translator for Yandex Browser from scratch. It is more logical to take a basic system for translating ordinary texts and retrain it to work with web pages; in the context of neural networks this is often called fine-tuning. But if you attack the problem head-on, i.e. simply initialize the weights of the neural network with values from the finished model and start training on the new data, you may run into a domain shift effect: as training proceeds, the translation quality of web pages (in-domain) grows, but the translation quality of ordinary (out-of-domain) texts falls. To get rid of this unpleasant feature, during the additional training we impose an extra constraint on the neural network, preventing its weights from changing too much relative to the initial state.

Mathematically, this is expressed by adding a term to the loss function: the KL divergence between the next-word probability distributions produced by the original and the retrained network. As can be seen in the illustration, this means the improvement in the quality of web page translation no longer degrades the translation of plain text.
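A minimal sketch of such a loss in PyTorch might look as follows; the weighting coefficient and tensor shapes are illustrative, since the actual training code is not described in the article.

```python
# Illustrative sketch of a KL-regularized fine-tuning loss; the coefficient is arbitrary.
import torch
import torch.nn.functional as F

def finetune_loss(student_logits, frozen_logits, target_ids, kl_weight=0.1):
    """Cross-entropy on the new data plus a penalty for drifting away from the original model.

    student_logits: (batch, vocab) logits of the network being fine-tuned
    frozen_logits:  (batch, vocab) logits of the original, frozen network
    target_ids:     (batch,) reference next-word ids
    """
    ce = F.cross_entropy(student_logits, target_ids)
    # KL(p_original || p_student): keeps the fine-tuned distribution close to the original one.
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),   # log-probabilities of the student
        F.softmax(frozen_logits, dim=-1),        # probabilities of the frozen original
        reduction="batchmean",
    )
    return ce + kl_weight * kl

student = torch.randn(4, 1000, requires_grad=True)
frozen = torch.randn(4, 1000)
targets = torch.randint(0, 1000, (4,))
print(finetune_loss(student, frozen, targets))
```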

Polishing frequent navigation phrases. While working on the new translator we collected statistics on the texts of different web page segments and noticed something interesting: texts belonging to navigation elements are quite standardized and often consist of the same template phrases. The effect is so strong that just 2 thousand of the most frequent navigation phrases account for more than half of all navigation phrases found on the Internet.

We naturally took advantage of this and gave several thousand of the most frequent phrases and their translations to our translators for verification, to be absolutely sure of their quality.
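Collecting that list is straightforward in principle; a sketch of the counting step (with a toy corpus) could look like this:

```python
# Toy sketch: count navigation phrases and keep the most frequent ones for manual review.
from collections import Counter

navigation_segments = ["Home", "Back", "My account", "Home", "Log in", "Back", "Home"]  # toy data

counts = Counter(phrase.strip().lower() for phrase in navigation_segments)
most_frequent = [phrase for phrase, _ in counts.most_common(2000)]
print(most_frequent[:5])  # these would go to human translators for verification
```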

External alignments. There was another important requirement for the web page translator in the Browser: it must not distort the markup. When HTML tags sit outside sentences or on their boundaries, no problems arise. But if a sentence contains, for example, two underlined words, then in the translation we want the corresponding two words to be underlined as well. That is, the result of the translation must satisfy two conditions:

  1. The underlined fragment in the translation must correspond exactly to the underlined fragment in the source text.
  2. Translation consistency at the boundaries of the underlined section should not be violated.
To ensure this behavior, we first translate the text as usual and then use statistical word alignment models to determine the correspondences between fragments of the source and translated text. This helps us understand exactly what needs to be underlined (or italicized, turned into a hyperlink, ...).
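As a rough sketch of that last step (with a hand-made alignment; real alignments would come from a statistical word alignment model):

```python
# Illustrative sketch: map a marked-up source span onto the translation using word alignments.
source = ["two", "underlined", "words", "in", "a", "sentence"]
target = ["два", "подчёркнутых", "слова", "в", "предложении"]

# (source_index, target_index) pairs; in practice produced by a statistical alignment model.
alignments = [(0, 0), (1, 1), (2, 2), (3, 3), (5, 4)]

def project_span(src_span, alignments):
    """Find the target-side span covering all words aligned to the marked source words."""
    targets = [t for s, t in alignments if src_span[0] <= s <= src_span[1]]
    return (min(targets), max(targets)) if targets else None

underlined_src = (0, 2)                          # "two underlined words"
print(project_span(underlined_src, alignments))  # (0, 2): "два подчёркнутых слова"
```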

Intersection observer. The powerful neural translation models we have trained require significantly more computing resources on our servers (both CPU and GPU) than the statistical models of previous generations. At the same time, users do not always read pages to the end, so sending all the text of a web page to the cloud looks unnecessary. To save server resources and user traffic, we taught Translator to use the Intersection Observer mechanism and translate only the content that is actually visible on the screen.