Breaking Down the Innovative Deep Learning Behind Google Translate
What Google Translate does is nothing short of amazing. To engineer the ability to translate between any pair of the dozens of languages it supports, Google Translate's creators utilized some of the most advanced and recent developments in NLP in exceptionally creative ways.
In machine translation, there are generally two approaches: a rule-based approach and a machine learning-based approach. Rule-based translation involves collecting a massive dictionary of translations, perhaps word-by-word or phrase-by-phrase, which are pieced together into a translation. This approach, however, runs into serious problems.
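To see why, consider a naive word-by-word translator (a toy Python sketch; the dictionary entries are illustrative, not from any real system):

```python
# Toy word-by-word dictionary translation (entries are illustrative).
# It ignores gender, number, and word order: exactly the problems
# described below.
en_to_es = {"the": "el", "big": "grande", "red": "rojo", "apples": "manzanas"}

def naive_translate(sentence: str) -> str:
    return " ".join(en_to_es.get(word, word) for word in sentence.split())

print(naive_translate("the big red apples"))
# -> "el grande rojo manzanas" (wrong: Spanish needs "las grandes manzanas rojas")
```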
For one, grammar structures differ significantly between languages. Consider Spanish, in which nouns have a masculine or feminine gender. All adjectives and articles like 'the' or 'a' must agree with the gender of the noun they describe. Translating 'the big red apples' into Spanish would require each of the words 'the', 'big', and 'red' to be written in plural and feminine form, since those are the attributes of the word 'apples'. In addition, in Spanish adjectives usually follow the noun, but sometimes they go before it.
[Image created by author]

The result is 'las [the] grandes [big] manzanas [apples] rojas [red]'. This grammar and the necessity of changing all the adjectives don't make any sense to a monolingual English speaker. Just within English-to-Spanish translation, there are too many disparities in fundamental structure to keep track of. However, a truly global translation service requires translation between every pair of languages.
Within this task arises another problem: to translate between, say, French and Mandarin, the only feasible rule-based solution would be to translate French into a base language — probably English — which would be then translated into Mandarin. This is like playing telephone: the nuance of a phrase said in one language is trampled over by noise and heavy-handed generalization.
[Image created by author]

The complete hopelessness of rule- or dictionary-based translation, and the need for some kind of universal model that can learn the vocabulary and structure of two languages, should be clear. Building this model is a difficult task for a few reasons, however:
- The model needs to be lightweight enough to work offline, so users can access it even without an Internet connection. Moreover, translation between any two languages should be supported, all downloaded onto the user's phone (or PC).
- The model must be fast enough to generate live translations.
- Elaborating on the example above: in English, the words 'big red apples' are sequential. If the model processes the data strictly left-to-right, however, the Spanish translation would be inaccurate, since the forms of the adjectives (which appear before the noun in English) depend on a noun that has not been read yet. The model needs to consider non-sequential translation.
- Machine learning-based systems are always heavily reliant on the dataset, which means that words not represented in the data are words the model knows nothing about (it needs robustness and a good memory for rare words). Where would one find a collection of high-quality translated data representative of the entire grammar and vocabulary of a language?
- A lightweight model cannot memorize the vocabulary of an entire language. How does the model deal with unknown words?
- Many Asian languages like Japanese or Mandarin are based on characters instead of letters, so there is one highly specific character for each word. A machine learning model must be able to translate from a letter-based system like that of English, Spanish, or German — which, even containing accented letters, still uses letters — to a character-based one like Korean, and vice versa.
When Google Translate was initially released, it used a phrase-based algorithm, which is essentially a rule-based method with more complexity. Soon after, however, it drastically improved in quality with the development of Google Neural Machine Translation (GNMT).
[Source: Google Translate. Image free to share.]

Google's engineers considered each of the problems above and came up with innovative solutions, creating an improved Google Translate — now the world's most popular free translation service.
Creating one model for every pair of languages is obviously ridiculous: the number of deep models needed would reach into the hundreds, each of which would need to be stored on a user's phone or PC for efficient usage and/or offline use. Instead, Google decided to create one large neural network that could translate between any two languages, given tokens (indicator inputs) representing the languages.
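The token idea can be sketched in a couple of lines (illustrative Python only, not Google's preprocessing code; the `<2es>`-style token format follows the convention in the published multilingual-GNMT work, but the function itself is hypothetical):

```python
def add_language_token(sentence: str, target_lang: str) -> str:
    """Prepend a target-language token so one shared model knows which
    language to emit; the model itself is identical for every pair."""
    return f"<2{target_lang}> {sentence}"

print(add_language_token("the big red apples", "es"))
# -> "<2es> the big red apples"
```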
The fundamental structure of the model is encoder-decoder. One segment of the neural network seeks to reduce one language into a fundamental, machine-readable 'universal representation', whereas the other takes this universal representation and repeatedly transforms the underlying ideas into the output language. This is a sequence-to-sequence ('seq2seq') architecture; the following graphic gives a good intuition of how it works, how previously generated content plays a role in generating subsequent outputs, and its sequential nature.
[AnalyticsIndiaMag. Image free to share.]

Consider an alternative visualization of this encoder-decoder relationship (a seq2seq model). The intermediate attention between the encoder and decoder will be discussed later.
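In code, the high-level shape of such an encoder-decoder model might look like the following (a toy PyTorch sketch with invented dimensions; GNMT's actual implementation is vastly larger and TPU-optimized):

```python
import torch
import torch.nn as nn

class ToySeq2Seq(nn.Module):
    """Toy encoder-decoder: the encoder compresses the source sentence
    into hidden states; the decoder emits the target one token at a time."""
    def __init__(self, vocab_size=32000, emb=256, hidden=512):
        super().__init__()
        self.src_emb = nn.Embedding(vocab_size, emb)
        self.tgt_emb = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encode the source; the final state is the 'universal
        # representation' the decoder starts from.
        _, state = self.encoder(self.src_emb(src_ids))
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # logits over the target vocabulary
```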
[Google AI. Image free to share.]

The encoder consists of eight stacked LSTM layers. In a nutshell, an LSTM is an improvement upon an RNN — a neural network designed for sequential data — that allows the network to 'remember' useful information to make better future predictions. In order to address the non-sequential nature of language, the first two layers add bidirectionality: pink nodes indicate a left-to-right reading, whereas green nodes indicate a right-to-left reading. This allows GNMT to accommodate different grammar structures.
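A rough PyTorch sketch of this layout (layer sizes are invented; a single `bidirectional=True` LSTM plays the role of the paired left-to-right and right-to-left layers in the figure, and the residual connections discussed below are omitted):

```python
import torch.nn as nn

class ToyGNMTEncoder(nn.Module):
    """Sketch of a GNMT-style encoder: a bidirectional bottom layer
    followed by a stack of unidirectional LSTM layers."""
    def __init__(self, emb=256, hidden=512, depth=8):
        super().__init__()
        # Bottom layer reads the sentence in both directions.
        self.bottom = nn.LSTM(emb, hidden, bidirectional=True,
                              batch_first=True)
        # The remaining layers read left-to-right only.
        self.stack = nn.LSTM(2 * hidden, hidden, num_layers=depth - 1,
                             batch_first=True)

    def forward(self, x):            # x: (batch, seq_len, emb)
        h, _ = self.bottom(x)        # (batch, seq_len, 2 * hidden)
        out, _ = self.stack(h)
        return out                   # one vector per source position
```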
[Source: GNMT paper. Image free to share.]

The decoder model is also composed of eight LSTM layers. These seek to translate the encoded content into the new language.
An ‘attention mechanism’ is placed between the two models. In humans, attention helps us stay focused on a task by seeking information relevant to that task rather than irrelevant details. In the GNMT model, the attention mechanism helps identify and amplify the importance of particular segments of the source message, which are then prioritized during decoding. This solves a large part of the ‘rare words problem’, in which words that appear less often in the dataset are compensated for with more attention.
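A minimal sketch of the mechanism (toy dot-product attention in PyTorch; GNMT itself uses a learned, feed-forward scoring function, but the weighting-and-averaging pattern is the same):

```python
import torch
import torch.nn.functional as F

def toy_attention(decoder_state, encoder_outputs):
    """Score each source position against the current decoder state,
    then return a weighted average of the encoder outputs."""
    # decoder_state:   (batch, hidden)
    # encoder_outputs: (batch, seq_len, hidden)
    scores = torch.bmm(encoder_outputs, decoder_state.unsqueeze(2))
    weights = F.softmax(scores, dim=1)   # how much to attend to each token
    context = (weights * encoder_outputs).sum(dim=1)  # (batch, hidden)
    return context, weights.squeeze(2)
```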
Skip connections, or connections that jump over certain layers, were used to stimulate healthy gradient flow. As with the ResNet (Residual Network) model, gradients may get stuck at one particular layer during updates, affecting all the layers before it. With such a deep network, comprising 16 LSTMs in total, it is imperative not only for training time but for performance that skip connections be employed, allowing gradients to cross potentially problematic layers.
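In code, a residual wrapper around an LSTM layer is only a few lines (a toy PyTorch sketch, not GNMT's implementation):

```python
import torch.nn as nn

class ResidualLSTMLayer(nn.Module):
    """One LSTM layer whose input is added to its output, so gradients
    can flow around the layer as well as through it."""
    def __init__(self, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)

    def forward(self, x):
        out, _ = self.lstm(x)
        return x + out  # the skip connection
```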
[Source: GNMT paper. Image free to share.]

The builders of GNMT invested a great deal of effort into developing an efficient low-level system for optimal training that ran on the TPU (Tensor Processing Unit), a specialized machine-learning hardware processor designed by Google.
An interesting benefit of using one model to learn all the translations was that translations could be learned indirectly. For instance, if GNMT were trained only on English-to-Korean, Korean-to-English, Japanese-to-English, and English-to-Japanese data, the model could still yield good translations for Japanese-to-Korean and Korean-to-Japanese, even though it had never been directly trained on those pairs. This is known as zero-shot learning, and it significantly reduced the training time required for deployment.
[AnalyticsIndiaMag. Image free to share.]

Heavy pre-processing and post-processing are done on the inputs and outputs of the GNMT model in order to support, for example, the highly specialized characters often found in Asian languages. Inputs are tokenized according to a custom-designed system, with word segmentation and markers for the beginning, middle, and end of a word. These additions made the bridge between different fundamental representations of language more fluid.
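A toy greedy sub-word segmenter gives a feel for this (the tiny vocabulary and the '##' continuation marker are illustrative stand-ins for GNMT's learned wordpiece vocabulary and its begin/middle/end markers):

```python
def wordpiece_segment(word: str, vocab: set) -> list:
    """Greedy longest-match segmentation: split a word into known
    sub-word pieces so rare or unseen words still map to tokens the
    model has representations for."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        else:
            return ["<unk>"]  # no known piece matched
        start = end
    return pieces

vocab = {"trans", "##lat", "##ion", "##s"}
print(wordpiece_segment("translations", vocab))
# -> ['trans', '##lat', '##ion', '##s']
```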
For training data, Google used documents and transcripts from the United Nations and the European Parliament. Since these organizations produce information professionally translated between many languages — with high quality (imagine the dangers of a badly translated declaration) — this data was a good starting point. Later on, Google began using user ('community') input to strengthen culture-specific, slang, and informal language in its model.
GNMT was evaluated on a variety of metrics. During training, GNMT used log perplexity. Perplexity is closely related to entropy, particularly Shannon entropy, so it may be easier to start from there. Entropy is the average number of bits needed to encode the information contained in a variable, and perplexity measures how well a probability model can predict a sample. One intuitive example of perplexity: the number of characters a user must type into a search box before a query suggester is at least 70% sure of which query the user will type. It is a natural choice for evaluating NLP tasks and models.
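Concretely, perplexity is the exponential of the average negative log-probability the model assigns to each token, and the log perplexity GNMT minimized during training is just that average itself. A few lines of Python (with made-up token probabilities) make this tangible:

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability per token;
    lower is better, and a perfect model scores 1.0."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model fairly confident about a four-token sentence:
print(perplexity([0.5, 0.4, 0.6, 0.3]))  # ≈ 2.3
```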
The standard BLEU score for language translation attempts to measure how close a translation is to a human one, on a scale from 0 to 1, using a string-matching algorithm. It is still widely used because it has shown strong correlation with human-rated performance: correct words are rewarded, with bonuses for consecutive correct words and longer/more complex words.
However, it assumes that a professional human translation is the ideal translation, only evaluates a model on select sentences, and does not have much robustness to different phrasing or synonyms. This is why a high BLEU score (>0.7) is usually a sign of overfitting.
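For a quick feel of the metric, NLTK exposes a sentence-level BLEU implementation (assuming `nltk` is installed; the sentences are made up, and GNMT's reported scores are corpus-level on standard test sets):

```python
from nltk.translate.bleu_score import sentence_bleu

reference = ["the big red apples are on the table".split()]
candidate = "the big red apples are on a table".split()

# One wrong word out of eight already costs a large chunk of the score,
# since BLEU matches 1-grams through 4-grams against the reference.
print(sentence_bleu(reference, candidate))  # ≈ 0.71
```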
Regardless, an increase in BLEU score (represented as a fraction) corresponds to an increase in language-modelling power, as demonstrated below:
[Google AI. Image free to share.]

Using the developments of GNMT, Google launched an extension that could perform visual, real-time translation of foreign text. One network identifies potential letters, which are fed into a convolutional neural network for recognition. The recognized words are then fed into GNMT for translation and rendered in the same font and style as the original.
[Source: Google Translate. Image free to share.]

One can only imagine the difficulties that abound in creating such a service: identifying individual letters, piecing together words, determining the size and font of text, properly rendering the image.
GNMT appears in many other applications, sometimes with a different architecture. Fundamentally, however, GNMT represents a milestone in NLP: a lightweight yet effective design, built upon years of NLP breakthroughs, made incredibly accessible to everyone.
Key Points
- There are many challenges when it comes to providing a truly global translation service. The model must be lightweight, but it must also understand the vocabulary, grammar structures, and relationships between dozens of languages.
- Rule-based translation systems, even more complex phrase-based ones, fail to perform well at translation tasks.
- GNMT uses a sequence-to-sequence architecture, in which an encoder and a decoder are each composed of 8 LSTM layers. The first two layers of the encoder allow for bidirectional reading to accommodate non-sequential grammar.
- The GNMT model uses skip connections to promote healthy gradient flow.
- GNMT demonstrated zero-shot learning, translating between language pairs it was never directly trained on, which allowed for significantly faster training and deployment.
- The model was trained on log perplexity and evaluated formally using the standard BLEU score.
With the advancements of GNMT — beyond text-to-text translation to image-to-image and sound-to-sound translation — deep learning has made one huge leap towards the understanding of human language. Its applications, not as an esoteric and impractical model but as an innovative, lightweight, and highly usable one, are unbounded. In many ways, GNMT is one of the most accessible and practical culminations of years of cutting-edge NLP research.
This was just a peek into the fascinating machine learning behind Google Translate. You can read the full-length paper here and visit the interface for yourself here.
Translated from: https://towardsdatascience.com/breaking-down-the-innovative-deep-learning-behind-google-translate-355889e104f1