Introduction to Word Embeddings and its Applications
Before I give you an introduction to Word Embeddings, take a look at the following examples and ask yourself what they have in common:
You have probably guessed it: all of these applications deal with large amounts of text. Obviously, it would be a waste of resources to rely on manual labor for applications that involve millions of sentences or documents.
So what can we do? We could feed this text to Machine Learning and Deep Learning models and let them learn to handle these applications. But most of these models cannot be fed raw text, since they cannot interpret it the way humans do; they generally require numerical representations to perform any task. This is where Word Embeddings come into use.
In this article, we are going to address the following:

* What are Word Embeddings?
* Why exactly do we prefer Word Embeddings?
* Different types of Word Embeddings
* Applications of Word Embeddings
What are Word Embeddings?
Word Embeddings are numerical vector representations of the text in a corpus that map each word in the corpus vocabulary to a real-valued vector in a pre-defined N-dimensional space.
These real-valued vector representations for each word in the corpus vocabulary are learned either through supervised techniques, such as neural network models trained on tasks like sentiment analysis and document classification, or through unsupervised techniques, such as statistical analysis of documents.
Word Embeddings try to capture the semantic, contextual and syntactic meaning of each word in the corpus vocabulary based on how these words are used in sentences. Words that have similar semantic and contextual meaning have similar vector representations, while at the same time each word in the vocabulary has its own unique vector representation.
Source: https://www.tensorflow.org/tutorials/representation/word2vec

The above image displays examples of vocabulary words with similar contextual, semantic and syntactic meaning being mapped into a 3-dimensional vector space. In the Verb Tense example of the picture, we can observe that the vector differences between the word pairs (walking & walked) and (swimming & swam) are roughly equal.
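To make the idea of "similar relationships, similar vector differences" concrete, here is a minimal sketch that compares embedding vectors with cosine similarity. The vectors below are made-up toy values for illustration only, not real learned embeddings.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means the same direction.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-dimensional "embeddings" (illustrative values only).
embeddings = {
    "walking":  np.array([0.9, 0.1, 0.3]),
    "walked":   np.array([0.8, 0.1, -0.4]),
    "swimming": np.array([0.1, 0.9, 0.3]),
    "swam":     np.array([0.0, 0.8, -0.4]),
}

# The tense relationship shows up as a similar *difference* vector.
tense_walk = embeddings["walking"] - embeddings["walked"]
tense_swim = embeddings["swimming"] - embeddings["swam"]
print(cosine_similarity(tense_walk, tense_swim))  # close to 1.0 for these toy values
```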
Why exactly do we prefer Word Embeddings?
Certain questions might have popped into your mind by now.
Let's address these questions one by one.
What is the simplest way to represent words numerically, and why isn't that sufficient?
The simplest way to represent words numerically is to one-hot encode each unique word in a corpus of text. We can understand this better with an example. Suppose my corpus has only two documents:
* The King takes his Queen for dinner.
* The husband takes his wife for dinner.
You might notice that these two documents have the same contextual meaning. When we apply one-hot encoding to the documents, here's what happens:
We first construct an exhaustive vocabulary — {“The”, “King”, “husband”, “takes”, “his”, “Queen”, “wife”, “for”, “dinner”}. There are nine unique words in the document text, so each word will be represented as a vector of length 9. The vector consists of a "1" at the position corresponding to the word's index in the vocabulary, with a "0" everywhere else. Here is what those vectors look like:
The - [1,0,0,0,0,0,0,0,0]
King - [0,1,0,0,0,0,0,0,0]
husband - [0,0,1,0,0,0,0,0,0]
takes - [0,0,0,1,0,0,0,0,0]
his - [0,0,0,0,1,0,0,0,0]
Queen - [0,0,0,0,0,1,0,0,0]
wife - [0,0,0,0,0,0,1,0,0]
for - [0,0,0,0,0,0,0,1,0]
dinner - [0,0,0,0,0,0,0,0,1]
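As a quick illustration, here is a minimal Python sketch that builds such one-hot vectors from the two example sentences. The tokenization (splitting on whitespace and stripping the trailing period) is a simplifying assumption, and the index order follows first appearance, so it may differ slightly from the listing above.

```python
documents = [
    "The King takes his Queen for dinner.",
    "The husband takes his wife for dinner.",
]

# Build the vocabulary: unique tokens in order of first appearance.
vocab = []
for doc in documents:
    for token in doc.rstrip(".").split():
        if token not in vocab:
            vocab.append(token)

def one_hot(word):
    # A vector of zeros with a single 1 at the word's index in the vocabulary.
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(vocab)            # ['The', 'King', 'takes', 'his', 'Queen', 'for', 'dinner', 'husband', 'wife']
print(one_hot("King"))  # [0, 1, 0, 0, 0, 0, 0, 0, 0]
```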
Let's address some disadvantages of this method.
1. Scalability Issue - The above example contained only 2 sentences and a 9-word vocabulary. In a real-world scenario, we will have millions of sentences and millions of words in the vocabulary. You can imagine how the dimensionality of the one-hot-encoded vector for each word explodes into the millions. This leads to scalability issues when the data is fed to our models, and in turn wastes time and computational resources.
2. Sparsity Issue - Since each vector contains 0s everywhere except for a single 1 at the correct position, models have a very hard time learning from this data, so they may not generalize well to test data.
3. No Context Captured - Since one-hot encoding blindly creates vectors without taking into account the shared dependencies and contexts in which the vocabulary words appear, we lose contextual and semantic information. In our example, that means we lose the relationship between similar word pairs: "King" relates to "Queen" the way "husband" relates to "wife".
If Word Embeddings are so complex, why do we prefer them to simpler methods?
There are other, simpler methods than Word Embeddings, such as the Term-Frequency Matrix, the TF-IDF Matrix and the Co-occurrence Matrix. But each of these methods still faces one or more of the same issues in terms of scalability, sparsity and contextual dependency.
Therefore, we prefer Word Embeddings, since they resolve all the issues mentioned above. Embeddings map each word to an N-dimensional space, where N typically ranges from 50 to 1000, in contrast to a million-dimensional space; this resolves the scalability issue. Since each embedding vector is densely populated, in contrast to a vector containing 0s almost everywhere, the sparsity issue is also resolved, so the model can learn better and generalize well. Finally, these vectors are learned in a way that captures the shared context and dependencies among the words.
Different Types of Word Embeddings
In this section, we will be reviewing the following State-of-the-Art (SOTA) Word Embeddings:

* Word2Vec
* GloVe
* ELMo
This is not an exhaustive list, but it is a great place to start. There are many other SOTA Word Embeddings, such as BERT (developed by Jacob Devlin and colleagues at Google) and GPT (developed at OpenAI), that have also made major breakthroughs in NLP applications.
Word2Vec
Word2Vec is an algorithm developed by Tomas Mikolov et al. at Google in 2013. It is built on the distributional hypothesis, which suggests that words occurring in similar linguistic contexts also have similar semantic meaning. Word2Vec uses this idea to map words with similar semantic meaning geometrically close to each other in an N-dimensional vector space.
Word2Vec works by training shallow, 2-layer neural networks to reconstruct the linguistic context of words. It takes a large corpus of text as input and produces a vector space with a dimensionality on the order of hundreds. Each unique word in the corpus vocabulary is assigned a corresponding vector in this space.
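As a rough idea of what this looks like in practice, here is a minimal sketch that trains Word2Vec on a tiny toy corpus using the gensim library (a toolkit choice of mine, not something this article prescribes). On such a tiny corpus the learned vectors are essentially noise; the point is only the shape of the workflow.

```python
from gensim.models import Word2Vec

# Toy corpus: each document is a list of tokens (a real corpus would have millions of sentences).
sentences = [
    ["the", "king", "takes", "his", "queen", "for", "dinner"],
    ["the", "husband", "takes", "his", "wife", "for", "dinner"],
]

# vector_size: dimensionality N of the embedding space; window: context window size;
# sg=0 selects CBOW, sg=1 would select Skip-Gram; min_count=1 keeps every word.
model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=0)

vector = model.wv["queen"]                      # the learned 100-dimensional vector for "queen"
print(vector.shape)                             # (100,)
print(model.wv.most_similar("queen", topn=3))   # nearest neighbours in the toy space
```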
Word2Vec can be implemented using either of two techniques: Continuous Bag of Words (CBOW) or Skip-Gram.
a) Continuous Bag of Words (CBOW)

This technique uses a shallow 2-layer neural network to predict the probability of a word given its context. The context can be a single word or a group of words. The following diagram illustrates the concept:
Source: word2vec Parameter Learning Explained by Xin Rong

The input is the context: each context word is one-hot encoded and fed to the network, and the output is a probability distribution over the words in the vocabulary.
b) Skip-Gram

Skip-Gram is the flipped version of CBOW. We feed the model a single word, and the model tries to predict the words surrounding it. The input is the one-hot-encoded vector of the word, and the output is a series of probability distributions over the words in the vocabulary. For example, take the sentence "I am going for a walk".
The vocabulary is ["I", "am", "going", "for", "a", "walk"], so the vocabulary length is V = 6. To set the number of surrounding words the model tries to predict, we define a context window.
Let the context window be C = 4, and let the input be one of the words in the vocabulary. We feed in the one-hot-encoded representation of this word, of dimension V, and the model is expected to produce a series of probability distributions over the vocabulary, with an overall output dimension of C * V.
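To make the Skip-Gram setup more concrete, here is a small sketch that generates (target, context) training pairs from the example sentence. The choice of a symmetric window (two words on each side, giving up to C = 4 context words per target) is an illustrative assumption.

```python
sentence = ["I", "am", "going", "for", "a", "walk"]
window = 2  # words considered on each side of the target (up to C = 4 context words in total)

pairs = []
for i, target in enumerate(sentence):
    # Collect every word within the window around position i, excluding the target itself.
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((target, sentence[j]))

print(pairs[:6])
# [('I', 'am'), ('I', 'going'), ('am', 'I'), ('am', 'going'), ('am', 'for'), ('going', 'I')]
```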
Source: word2vec Parameter Learning Explained by Xin Rong

Word2Vec was a major breakthrough in the field of Word Embeddings because it captured relations in algebraic representations that had never been captured before. For example, if we take words such as "King", "Queen", "man" and "woman" and map them into the vector space, we find that the vector difference between "King" and "Queen" is roughly the same as the vector difference between "man" and "woman", which allows us to produce outputs like the following:
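Here is a minimal, hedged sketch of such an analogy query using gensim and a pretrained Word2Vec model; the gensim-data model name below is an assumption on my part, not something this article prescribes.

```python
import gensim.downloader as api

# Downloads a pretrained Word2Vec model trained on Google News (large download on first use).
wv = api.load("word2vec-google-news-300")

# "king" - "man" + "woman" should land near "queen" in the embedding space.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Expected to return something like [('queen', 0.71...)]
```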
GloVe
While Word2Vec relies on local statistics (the local context surrounding a word) to derive its semantics, GloVe goes one step further by combining local statistics with global statistics, such as those used in Latent Semantic Analysis (a word co-occurrence matrix), to capture the global semantic relationships of a word. GloVe was developed by Pennington et al. at Stanford.
For example, consider the co-occurrence probabilities of the target words ice and steam with various probe words from the vocabulary. Here are some actual probabilities from a 6-billion-word corpus:
Source: GloVe: Global Vectors for Word Representation — Jeffrey Pennington

From the above table, we can observe that ice co-occurs with solid more frequently than it does with gas, while steam co-occurs with gas more frequently than with solid. Both words co-occur frequently with their shared property water, and both co-occur infrequently with the unrelated word fashion. Looking at the ratio of probabilities, non-discriminative words (water and fashion) have a ratio approximately equal to 1, whereas discriminative words (solid and gas) have a ratio that is either very high or very low. In this way, the ratio of co-occurrence probabilities encodes a crude form of meaning associated with the abstract concept of thermodynamic phases. These ratios are then encoded as vector differences in the N-dimensional space.
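To show the mechanics behind such a table, here is a small sketch that computes co-occurrence counts and probability ratios from a toy corpus. With a corpus this tiny the numbers are not meaningful; it only illustrates the kind of statistic GloVe is built on, and the window size is an arbitrary choice.

```python
from collections import Counter, defaultdict

corpus = [
    "ice is a solid and very cold".split(),
    "steam is a gas and very hot".split(),
    "ice melts into water".split(),
    "steam condenses into water".split(),
]
window = 3  # symmetric co-occurrence window

cooc = defaultdict(Counter)   # cooc[target][context] = number of co-occurrences
totals = Counter()            # total co-occurrence count per target word
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                cooc[w][sent[j]] += 1
                totals[w] += 1

def p(context, target):
    # P(context | target): how often 'context' appears near 'target'.
    return cooc[target][context] / totals[target]

for probe in ["solid", "gas", "water"]:
    p_ice, p_steam = p(probe, "ice"), p(probe, "steam")
    ratio = p_ice / p_steam if p_steam > 0 else float("inf")
    print(f"P({probe}|ice)={p_ice:.2f}  P({probe}|steam)={p_steam:.2f}  ratio={ratio:.2f}")
```

Even on this toy corpus, the discriminative probes (solid, gas) give very high or very low ratios while the shared probe (water) gives a ratio of about 1, which is exactly the pattern described above.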
Source: https://nlp.stanford.edu/projects/glove/

In the visualization from the GloVe project page, we notice that the vector differences between word pairs such as man & woman and king & queen are roughly equal. The distinguishing factor between these word pairs is gender. Beyond this pattern, we can observe many other interesting patterns in the GloVe visualization.
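A hedged sketch of how one might check this with the pretrained glove.6B vectors distributed on the Stanford page follows; the file name and local path are assumptions.

```python
import numpy as np

def load_glove(path):
    # Each line of the GloVe text file is: word followed by its vector components.
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.array(parts[1:], dtype=np.float32)
    return vectors

glove = load_glove("glove.6B.100d.txt")  # assumed local copy of the 100-dimensional vectors

diff_royalty = glove["king"] - glove["queen"]
diff_people = glove["man"] - glove["woman"]

# If the "gender direction" is shared, the two difference vectors point roughly the same way.
cos = np.dot(diff_royalty, diff_people) / (np.linalg.norm(diff_royalty) * np.linalg.norm(diff_people))
print(f"cosine similarity of the difference vectors: {cos:.2f}")
```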
ELMo
Before we jump into ELMo, consider this example:
Her favorite fruit to eat is a date.
Joe took Alexandria out on a date.
We can observe that date has different meanings in the two contexts. Word embeddings such as GloVe and Word2Vec produce the same vector for the word date in both sentences. Hence, our models would fail to distinguish between polysemous words (words with multiple meanings and senses). These word embeddings simply cannot grasp the context in which a word is used.
ELMo resolves this issue by taking the whole sentence as input, rather than a particular word, and generating distinct ELMo vectors for the same word when it is used in different sentence contexts.
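As a rough sketch of what this looks like in practice, the snippet below uses the older allennlp ElmoEmbedder interface (available in allennlp versions up to roughly 0.9; the interface and default pretrained weights are assumptions of this sketch, not something this article prescribes) to embed the word date in the two example sentences and compare the resulting vectors.

```python
import numpy as np
from allennlp.commands.elmo import ElmoEmbedder  # assumed: classic allennlp (<=0.9) ELMo interface

elmo = ElmoEmbedder()  # downloads the default pretrained ELMo weights on first use

sent1 = ["Her", "favorite", "fruit", "to", "eat", "is", "a", "date", "."]
sent2 = ["Joe", "took", "Alexandria", "out", "on", "a", "date", "."]

# embed_sentence returns an array of shape (3 layers, num_tokens, 1024); take the top layer.
vecs1 = elmo.embed_sentence(sent1)[2]
vecs2 = elmo.embed_sentence(sent2)[2]

date_1 = vecs1[sent1.index("date")]
date_2 = vecs2[sent2.index("date")]

cos = np.dot(date_1, date_2) / (np.linalg.norm(date_1) * np.linalg.norm(date_2))
print(f"cosine similarity between the two 'date' vectors: {cos:.2f}")  # expected well below 1.0
```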
ELMo was developed by NLP researchers (Peters et al., 2017; McCann et al., 2017; and Peters et al., 2018 in the ELMo paper) at the Allen Institute for AI.
Source: The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) by Jay Alammar

ELMo uses a bi-directional LSTM that is pre-trained on a large text corpus to produce word vectors. It is trained to predict the next word given a sequence of words, a task also known as Language Modeling.
Source: The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) by Jay Alammar

ELMo representations are:
Contextual: The ELMo vector produced for a word depends on the context of the sentence in which the word is used.
Character-based: ELMo representations are purely character-based, allowing the network to use morphological clues to form robust representations for out-of-vocabulary tokens unseen in training.
Applications of Word Embeddings
Word Embeddings have played a huge role across the entire spectrum of NLP applications. The following are some well-known applications that use Word Embeddings:
Word Embeddings have been integral to improving Document Search and Information Retrieval. An intuitive approach is to calculate Word Centroid Similarity: the representation of each document is the centroid of its word vectors. Since word vectors carry the semantic information of words, one can assume that the centroid of the word vectors within a document encodes its meaning to some extent. At query time, the centroid of the query's word vectors is computed, and its cosine similarity to the centroids of the (matching) documents is used as a measure of relevance. This speeds up retrieval and removes the requirement that search keywords match the document's words exactly. — Vec4ir by Lukas Galke
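A minimal sketch of Word Centroid Similarity, assuming some pretrained word-vector lookup `wv` (for example, the gensim KeyedVectors loaded earlier) is available:

```python
import numpy as np

def centroid(tokens, wv):
    # Average the vectors of the tokens that exist in the embedding vocabulary.
    vectors = [wv[t] for t in tokens if t in wv]
    return np.mean(vectors, axis=0)

def word_centroid_similarity(query_tokens, doc_tokens, wv):
    q, d = centroid(query_tokens, wv), centroid(doc_tokens, wv)
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

# Usage sketch: rank documents by the similarity of their centroid to the query centroid.
# scores = {doc_id: word_centroid_similarity(query, doc, wv) for doc_id, doc in corpus.items()}
```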
Word Embeddings have also improved Language Translation Systems. Facebook released multi-lingual word embeddings (fastText) with word vectors for 157 languages trained on Wikipedia and Common Crawl. Given training data, for example a text corpus available in two languages (source language: Japanese; target language: English), we can feed the word vectors of both languages to a Deep Learning model, say a Seq2Seq model, and let it learn accordingly. During the evaluation phase, we feed the Japanese test corpus to the trained Seq2Seq model and evaluate the results. fastText is considered one of the most efficient SOTA baselines.
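As a sketch of the first step only, here is how the pretrained fastText vectors could be loaded with gensim to build an embedding matrix for a translation model's encoder. The file names follow fastText's published naming scheme, the toy vocabulary is made up, and the rest of the Seq2Seq model is omitted.

```python
import numpy as np
from gensim.models import KeyedVectors

# Pretrained fastText vectors in word2vec text format (cc.ja.300.vec / cc.en.300.vec from the
# fastText website, assumed to be downloaded locally); limit keeps memory manageable for a sketch.
ja_vectors = KeyedVectors.load_word2vec_format("cc.ja.300.vec", limit=200_000)
en_vectors = KeyedVectors.load_word2vec_format("cc.en.300.vec", limit=200_000)

def embedding_matrix(vocab, wv, dim=300):
    # Row i of the matrix is the pretrained vector for vocab[i]; unknown words stay zero.
    matrix = np.zeros((len(vocab), dim), dtype=np.float32)
    for i, word in enumerate(vocab):
        if word in wv:
            matrix[i] = wv[word]
    return matrix

# This matrix would initialize the encoder's embedding layer in a Seq2Seq model.
ja_vocab = ["私", "は", "散歩", "に", "行く"]  # toy source-side vocabulary
encoder_embeddings = embedding_matrix(ja_vocab, ja_vectors)
print(encoder_embeddings.shape)  # (5, 300)
```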
Original article: https://medium.com/compassred-data-blog/introduction-to-word-embeddings-and-its-applications-8749fd1eb232