Transformer models… how did it all start?
Transformer models have revolutionised the field of Natural Language Processing, but how did it all start? To understand current state-of-the-art architectures and genuinely appreciate why these models became a breakthrough in the field, we must go further back in time, to where NLP as we know it started: when neural networks were first introduced to NLP.
The introduction of neural models to NLP opened ways to overcome challenges that traditional methods couldn’t solve. One of the most remarkable advances was Sequence-to-Sequence models: such models generate an output sequence by predicting one word at a time. Sequence-to-Sequence models encode the source text to reduce ambiguity and achieve context-awareness.
In any language task, context plays an essential role. To understand what words mean, we have to know something about the situation in which they are used. Seq2Seq models capture context at the token level: they look at the previous words/sentences to generate the next ones. Representing context as embeddings in a continuous space brought multiple advantages, such as reducing data sparsity (similar contexts are mapped close to each other) and providing a way to generate synthetic data.
However, context in language is very sophisticated. Most of the time, you can’t find context by focusing only on the previous sentence; long-range dependencies are needed to achieve context-awareness. Seq2Seq models are built on Recurrent Neural Networks: LSTMs or GRUs. These networks have gating mechanisms that regulate the flow of information when processing sequences, giving them a form of “long-term memory.” Despite this, if a sequence is long enough, they’ll have a hard time carrying information from earlier time steps to later ones.
RNNs fall short when trying to process entire paragraphs of text: they suffer from the vanishing gradient problem. Gradients are the values used to update the weights of a neural network and thus to learn. The vanishing gradient problem occurs when the gradient shrinks as it backpropagates through time; if a gradient value becomes extremely small, it doesn’t contribute much to learning. Moreover, RNNs are very time-consuming to train because, for every backpropagation step, the network needs to see the entire sequence of words.
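A toy calculation makes the vanishing gradient concrete. The backpropagated gradient through an RNN involves a product of one term per time step; in this hypothetical sketch each term has magnitude 0.9, so the gradient shrinks exponentially with sequence length:

```python
# Illustrative only: the true per-step terms are Jacobians that depend on
# the weights and activations, but if their magnitude stays below 1 the
# product decays exponentially, as shown here.
def gradient_magnitude(per_step_factor, num_steps):
    grad = 1.0
    for _ in range(num_steps):
        grad *= per_step_factor
    return grad

short = gradient_magnitude(0.9, 10)    # ~0.35 -- still useful for learning
long = gradient_magnitude(0.9, 100)   # ~0.000027 -- effectively vanished
print(short, long)
```

After 100 steps the early tokens contribute almost nothing to the weight updates, which is exactly why long documents defeat plain RNNs.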
CNN logarithmic path (image by author)

As a way to address these problems, Convolutional Neural Networks were introduced in NLP. Stacked convolutions create a logarithmic path: the network can “observe” the entire sequence with only a logarithmic number of convolutional layers. However, this raised a new challenge: positional bias. How do we make sure that the positions we are observing in the text are the ones that give the most insight? Why focus on position X of the sequence and not X-1?
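The logarithmic path can be sketched with a back-of-the-envelope calculation. Assuming a kernel size of 3 and a dilation that doubles at every layer (as in dilated-convolution architectures; the exact numbers are illustrative), the receptive field roughly doubles per layer, so covering a sequence of length n needs on the order of log2(n) layers:

```python
# Count how many stacked dilated-convolution layers are needed before one
# output position can "see" the whole input sequence.
def layers_to_cover(seq_len, kernel_size=3):
    receptive_field, dilation, layers = 1, 1, 0
    while receptive_field < seq_len:
        receptive_field += (kernel_size - 1) * dilation  # growth this layer
        dilation *= 2                                    # dilation doubles
        layers += 1
    return layers

print(layers_to_cover(1000))  # 9 layers suffice for 1000 tokens
```

Compare this with an RNN, which needs 1000 sequential steps to carry information across the same span.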
Besides, the challenge is not only to find a way of encoding long text sequences, but also to determine which parts of that text are essential to gain context-awareness. Not all of the text is equally important for understanding. To address this, the attention mechanism was introduced into Seq2Seq models.
The attention mechanism is inspired by the visual attention of animals, which focus on specific parts of their visual input to compute adequate responses. Attention in Seq2Seq architectures seeks to give more contextual information to the decoder: at every decoding step, the decoder is informed how much “attention” it should give to each input word.
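A minimal sketch of one decoding step, using random vectors as stand-ins for learned hidden states: the decoder scores every encoder hidden state, turns the scores into attention weights with a softmax, and takes a weighted sum as its context vector.

```python
import numpy as np

# Hypothetical toy setup: 4 encoder hidden states (one per input word) and
# one decoder state, each a 3-dimensional vector.
np.random.seed(0)
encoder_states = np.random.randn(4, 3)
decoder_state = np.random.randn(3)

scores = encoder_states @ decoder_state          # alignment scores
weights = np.exp(scores) / np.exp(scores).sum()  # softmax -> sums to 1
context = weights @ encoder_states               # weighted sum of inputs

print(weights)  # how much "attention" each input word receives
```

The `weights` vector is exactly the per-word attention the decoder is given at this step; it is recomputed at every decoding step.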
Attention in Seq2Seq models (image by author)

Despite the improvements in context-awareness, there was still substantial room for improvement. The most significant drawback of these methods is the complexity of the architectures.
This is where the transformer model came into the picture. The transformer introduces a simple idea: instead of adding yet another complex mechanism (attention) to an already complex Seq2Seq model, we can simplify the solution by forgetting about everything else and focusing only on attention.
This model removes recurrence and uses only matrix multiplications. It processes all the inputs at once rather than sequentially. To avoid losing order, it uses positional embeddings that provide information about the position of each element in the sequence. And despite removing recurrence, it still provides an encoder-decoder architecture like the one seen in Seq2Seq models.
So, after seeing all the challenges we face with previous models, let’s dive deep into what the transformer model solves in comparison to Seq2Seq models.
Transformer technical deep dive
Transformer architecture (image by author)

While RNNs fell short when we needed to process entire paragraphs to gain context, transformers are able to identify long-range dependencies, achieving context-awareness. We also saw that RNNs by themselves struggle to determine which parts of the text give more information; to do so, they needed an extra layer, a bidirectional RNN, to implement the attention mechanism. The transformer, on the contrary, works only with attention, so it can determine the essential parts of context at different levels.
Another critical difference is that the transformer model removes recurrence. By eliminating recurrence, the number of sequential operations is reduced and the computational complexity decreases. In RNNs, for every backpropagation step the network needs to see the entire sequence of words; in the transformer, all the input is processed at once. This also brings a new advantage: we can now parallelise training. Being able to split training examples into several tasks processed independently boosts training efficiency.
So how does the model keep the sequence order without using recurrence?
By using positional embeddings. The model takes a sequence of n word embeddings, and to model position information, a positional embedding is added to each word embedding.
Positional embeddings (image by author)

Positional embeddings are created using sine and cosine functions at different frequencies across the embedding dimensions. Each position is encoded by the pattern created by the combination of these functions; this results in a continuous analogue of a binary encoding of positions in a sequence.
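A small sketch of these sinusoidal embeddings, following the formulation in the original transformer paper (the sequence length and model dimension below are arbitrary choices for illustration):

```python
import numpy as np

# Even dimensions use sine, odd dimensions use cosine, each pair at a
# different frequency, so every position gets a unique, smooth code.
def positional_embeddings(seq_len, d_model):
    positions = np.arange(seq_len)[:, np.newaxis]     # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]    # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_embeddings(seq_len=50, d_model=16)
# Each position's code is then simply added to its word embedding.
```

Because the functions are smooth, nearby positions receive similar codes, and the model can learn to attend by relative position.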
The transformer model uses multi-head attention to encode the input embeddings. When doing so, it attends to the inputs in a forward and backward manner, so the order of the sequence would be lost; because of this, it relies on the positional embeddings just explained.
The transformer has three different attention mechanisms: the encoder attention, the encoder-decoder attention, and the decoder attention. So how does the attention mechanism work? It is basically vector multiplication: depending on the angle between vectors, one can determine the importance of each value. If the angle between two vectors is close to 90 degrees, their dot product will be close to zero, but if the vectors point in the same direction, the dot product will return a greater value.
Each key has a value associated with it, and for every new input vector (the query), we can determine how strongly it relates to each key, and weight the corresponding values using a softmax function.
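A minimal scaled dot-product attention sketch in NumPy illustrates this mechanism (the query, key, and value matrices here are random stand-ins for learned projections):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Queries are matched against keys with a dot product (large when the
# vectors point the same way, near zero when they are close to
# orthogonal); the scaled, softmaxed scores then weight the values.
def attention(queries, keys, values):
    d_k = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)  # query-key similarity
    weights = softmax(scores)                 # each row sums to 1
    return weights @ values, weights

np.random.seed(1)
Q = np.random.randn(2, 4)  # 2 queries of dimension 4
K = np.random.randn(5, 4)  # 5 keys
V = np.random.randn(5, 4)  # 5 values, one per key
output, weights = attention(Q, K, V)
```

The division by the square root of the key dimension keeps the dot products in a range where the softmax still spreads weight across several values rather than collapsing onto one.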
Multi-Head Attention (image by author)

Transformers use multi-head attention; we can think of the heads as filters in a CNN, where each one learns to pay attention to a specific group of words. One head can learn to identify short-range dependencies while others learn to identify long-range dependencies. This improves context-awareness: we can understand what terms refer to when it’s not clear, for example with words such as pronouns.
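The multi-head idea can be sketched on top of the same dot-product attention. This is a simplified illustration, with random projection matrices standing in for the learned weights of a real model: each head projects the inputs into its own subspace, runs attention there, and the heads’ outputs are concatenated and projected back.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Each head works in a subspace of size d_model / num_heads, so different
# heads can specialise, e.g. in short-range vs long-range dependencies.
def multi_head_attention(x, num_heads, rng):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    head_outputs = []
    for _ in range(num_heads):
        # Per-head projections (learned in a real model, random here).
        W_q, W_k, W_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))
        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        weights = softmax(Q @ K.T / np.sqrt(d_head))
        head_outputs.append(weights @ V)
    W_o = rng.standard_normal((d_model, d_model))  # output projection
    return np.concatenate(head_outputs, axis=-1) @ W_o

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8))  # 6 tokens, d_model = 8
out = multi_head_attention(x, num_heads=2, rng=rng)
```

Note that self-attention preserves the shape of its input: the output is again one vector per token, which is what lets transformer layers stack.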
The transformer architecture facilitates the creation of powerful models trained on massive datasets. Even though it is not feasible for everyone to train these models, we can now leverage transfer learning to use these pre-trained language models and fine-tune them for our specific tasks.
Transformer models have revolutionised the field. They have outperformed RNN-based architectures in a wide range of tasks, and they will continue to have a tremendous impact in the area of NLP.
Translated from: https://towardsdatascience.com/transformer-models-how-did-it-all-start-2e5b385ddd93