The Simple Mathematics Behind Deep Learning
Deep learning is one of the most important pillars of machine learning. It is based on artificial neural networks. Deep learning is extremely popular because of its rich applications in areas such as image recognition, speech recognition, natural language processing (NLP), bioinformatics, drug design, and many more. Although rich and efficient deep learning libraries and packages are offered by major tech companies and are ready to use without much background knowledge, it is still worth understanding the small but impressive mathematics behind these models, especially the rules that work inside artificial neural networks (ANNs). There are many ways to understand how ANNs work, but we will begin with a very basic example of data fitting which explains the working of neural networks perfectly.
Suppose some data about land fertility in a region is given to us; see figure 1, where the circles represent fertile land and the crosses represent infertile land. Obviously, this data is available only for finitely many sites in the given region. If we wish to know the characteristics of the land at any random point in the region, we would like a mathematical transformation which takes the location of a site as input and maps it onto either a circle or a cross, i.e. if the land is fertile it will map it onto the circle (category A), otherwise it will map it onto the cross (category B). So, the idea is to utilize the given data and provide information about those points where information is not available. Mathematically, this is what we call curve fitting. It is possible by creating a transformation rule which sends every point in R2 to either a circle (fertile) or a cross (infertile). There may be many ways to construct such transformations, and it is an open area for users and researchers. Here, we will use a magical function called the sigmoid function. The sigmoid function is like a step function, but it is continuous and differentiable, which makes it very interesting and important. Its mathematical expression is σ(x) = 1/(1 + e^(−x)).
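To make this concrete, here is a minimal sketch in Python with NumPy (the original article contains no code, so this is only an illustration of the formula above):

```python
import numpy as np

def sigmoid(x):
    """The sigmoid (logistic) function: maps any real input smoothly into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))    # 0.5, the midpoint of the transition
print(sigmoid(10.0))   # close to 1: enough input activates the "neuron"
print(sigmoid(-10.0))  # close to 0: the "neuron" stays inactive
```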
Figure 2 shows the graph of the sigmoid function, which is sometimes called the logistic function. Actually, it is a smoothed version of the step function. It is a widely used function in ANNs; the probable reason behind this may be its similarity to real neurons in the brain. When there is enough input (x is large) it gives output 1; otherwise, it remains inactive.
The steepness and the transition of the sigmoid function in its current form may not be helpful in every situation, therefore we play with its steepness and transition simply by scaling and shifting the argument, e.g. if we draw σ(ax + b)
then it looks like the curves in figure 3.
Figure 3 shows that we can control the steepness and the transition point of the sigmoid function by choosing suitable values of a and b.
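As a small illustrative sketch (the particular values of a and b below are arbitrary, chosen only to show the effect), the scaled and shifted sigmoid can be explored like this:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scaled_sigmoid(x, a, b):
    """sigma(a*x + b): 'a' controls the steepness, 'b' shifts the transition point."""
    return sigmoid(a * x + b)

x = np.linspace(-5.0, 5.0, 11)
print(scaled_sigmoid(x, a=1.0, b=0.0))    # the plain sigmoid
print(scaled_sigmoid(x, a=5.0, b=0.0))    # much steeper transition around x = 0
print(scaled_sigmoid(x, a=5.0, b=-10.0))  # same steepness, transition shifted to x = 2
```

The transition happens where ax + b = 0, i.e. at x = −b/a, which is how the weight and bias move the curve around.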
This shifting and scaling in neural networks is called weighting and biasing of the input. Therefore, here 'x' is the input, 'a' is the weight and 'b' is the bias. The optimal values of 'a' and 'b' are of extreme importance in developing any efficient neural network model. To be clear, everything explained so far is a single-input structure, i.e. 'x' is a scalar.
Now we will use linear algebra to scale this concept to more than one input, i.e. instead of taking x as a single input, we can take 'X' as a vector. Here we define the sigmoid function for a vector input as σ(X) = [σ(x1), σ(x2), …, σ(xm)], where X = [x1, x2, …, xm].
This definition is important to understand, as it picks the components of 'X' (the input vector) and maps them componentwise using the sigmoid function. Now, to introduce the weight and bias into the input, which is now a vector, we need to replace 'a' by a weight matrix 'W' and 'b' by a bias vector 'B'. Therefore this scaled system becomes σ(WX + B).
Here 'W' is a weight matrix of order m x m, 'X' is the input vector of length m and 'B' is the bias vector of length m. Now the recursive use of the defined sigmoid function will lead us to the magical world of neurons, layers, inputs, and outputs. Let us try to understand with an example by taking an input vector of length 2, say X=[x1, x2], bias B=[b1, b2] and weight matrix W=[w11 w12; w21 w22]. Here X is the input layer, or the input neurons, or simply the input, which works as follows:
Figure 4 shows this first operation. After it, we obtain a new layer, which is σ(WX + B) = [σ(w11 x1 + w12 x2 + b1), σ(w21 x1 + w22 x2 + b2)].
Here the cross arrows represent that both x1 and x2 are involved in the creation of every new neuron. This whole procedure helped us create new neurons from the existing neurons (the input data). This simple idea can be scaled to an input vector of any finite length (in the above example it was a vector of length two), say 'm'; in that case, the i-th component of the same sigmoid function can be written as σ(wi1 x1 + wi2 x2 + … + wim xm + bi), for i = 1, …, m.
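A minimal sketch of this single operation, using the length-2 example above (the numerical values are placeholders, not data from the article):

```python
import numpy as np

def sigmoid(x):
    # NumPy applies the function componentwise, exactly as the vector
    # definition of sigma above requires.
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative values only.
X = np.array([0.5, -1.0])        # input neurons [x1, x2]
W = np.array([[0.2, -0.4],
              [1.0,  0.3]])      # weight matrix [w11 w12; w21 w22]
B = np.array([0.1, -0.2])        # bias vector [b1, b2]

new_layer = sigmoid(W @ X + B)   # sigma(WX + B): two new neurons
print(new_layer)
```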
To connect everything once again: so far we just applied the sigmoid function to the given input vector (in terms of ANNs, these are called neurons) and created another layer of neurons as explained above. Now we will again use the sigmoid function on the newly created neurons; here we can play with the weight matrix, and it will change the number of neurons in the next layer. For example, while applying the sigmoid function on the new layer, let us choose a weight matrix of order 3 by 2 and a bias vector of length three; this will produce three new neurons in the next layer. This flexibility in the choice of the order of the weight matrix gives us the desired number of neurons in the next layer. Once we get the second layer of neurons, apply the sigmoid function again on the second layer and you will get a third; keep using this idea recursively and one can have as many layers as one likes. Since these layers work as intermediate layers, they are known as hidden layers, and the recursive use of the sigmoid function (or any activation function) takes us deep down into learning the data; probably this is the reason we call it deep learning. If we repeat this whole process so that the neural network model has four layers, we obtain the following mathematical function: F(X) = σ(W4 σ(W3 σ(W2 X + B2) + B3) + B4), where W2, B2, W3, B3 and W4, B4 are the weight matrices and bias vectors of the successive layers.
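The following sketch strings these operations together. The layer sizes (2 input neurons, hidden layers of 2 and 3 neurons, 2 output neurons) are an assumption made to match the 3-by-2 weight matrix mentioned above and the count of 23 parameters below; the random values stand in for the weights and biases that training will eventually determine:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed layer sizes: 2 -> 2 -> 3 -> 2. The values are random placeholders;
# they are what training (discussed below) is supposed to optimize.
rng = np.random.default_rng(0)
W2, B2 = rng.standard_normal((2, 2)), rng.standard_normal(2)  # 4 + 2 parameters
W3, B3 = rng.standard_normal((3, 2)), rng.standard_normal(3)  # 6 + 3 parameters
W4, B4 = rng.standard_normal((2, 3)), rng.standard_normal(2)  # 6 + 2 parameters

def F(X):
    """Recursive, layer-by-layer application of the sigmoid: the network output."""
    layer2 = sigmoid(W2 @ X + B2)       # first hidden layer, 2 neurons
    layer3 = sigmoid(W3 @ layer2 + B3)  # second hidden layer, 3 neurons
    return sigmoid(W4 @ layer3 + B4)    # output layer, 2 neurons

print(F(np.array([0.7, 0.3])))
```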
Finally, we have a mathematical function F(X) which actually fits the given data. How many hidden layers and how many neurons one creates depends entirely on the user. Naturally, more hidden layers and intermediate neurons will give a more complex F(X). Coming back to our F(X): if one wishes to count how many weight coefficients and bias components are used, there are 23 of them. All these 23 parameters need to be optimized to get the best fit of the given data.
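With the layer sizes assumed in the sketch above, the count works out as 4 + 2 (first hidden layer: a 2x2 weight matrix and a length-2 bias) + 6 + 3 (second hidden layer: a 3x2 matrix and a length-3 bias) + 6 + 2 (output layer: a 2x3 matrix and a length-2 bias) = 23.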
To reconnect with our example of the fertile-land classifier: if the value of F(X) is close to 1, X will be mapped into category A (fertile land); if F(X) is close to 0, X will be mapped into category B (infertile land). In practice, we will establish a cutoff rule which will be helpful to classify the data. If one looks carefully at figure 1, there are 20 data points in the data; these data shall be used as target outputs to train the model. Here, training the model means finding the optimal values of all 23 parameters which provide the best fit of the given data. There are two types of target data: category A (circles) and category B (crosses). Let x(i), i=1,…,20 be the data points whose images are either circles or crosses in figure 1. Now we classify them by setting y(x(i)) = [1, 0] if x(i) belongs to category A (a circle) and y(x(i)) = [0, 1] if x(i) belongs to category B (a cross).
Here (x(i), y(x(i))) are the given data points (see figure 1). Now these y(x(i)), i=1,…,20 shall be used as target vectors to obtain the optimal values of all parameters (weights and biases). We define the following cost function (objective function): Cost = (1/20) Σ (1/2) ||y(x(i)) − F(x(i))||^2, where the sum runs over i = 1, …, 20.
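A sketch of this cost function in the same Python setting (F is the network function from the earlier sketch; the 20 data points and their labels would be read off figure 1, which is not reproduced here):

```python
import numpy as np

def cost(F, data_points, targets):
    """Average over the given points of (1/2) * ||y(x_i) - F(x_i)||^2."""
    total = 0.0
    for x_i, y_i in zip(data_points, targets):
        residual = np.asarray(y_i) - F(np.asarray(x_i))
        total += 0.5 * np.sum(residual ** 2)
    return total / len(data_points)
```

Calling cost(F, data_points, targets) with the 20 sites and their [1, 0] / [0, 1] labels gives the number that training tries to push as close to zero as possible.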
If one looks carefully at this cost function, it has two important aspects: firstly, it uses the given data points y(x(i)) together with the function F(x) created from the recursive application of the sigmoid function, and F(x) involves all the weights and biases, which are still unknown. The other factors are there for convenience: 1/20 normalizes over the 20 data points and 1/2 simplifies the expressions when differentiating. They do not matter to us from the optimization point of view (why?). Now our objective is to find the minimum value of this cost function; ideally this would be zero, but in reality it cannot be zero. The values of all the weights and biases for which this cost function is minimal are the optimal values of the weights and biases. Determining the optimal values of these parameters (weights and biases) is what is actually termed training the neural network. To obtain these optimal values, one needs to use an optimization algorithm to minimize the cost function; such algorithms include gradient descent, stochastic gradient descent, etc. How these algorithms work and how we minimize this cost function is another day's story.
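Those optimization details are indeed a separate story, but the basic idea of gradient descent can be seen in a toy, self-contained sketch: repeatedly step against the gradient of the objective. Here a simple one-dimensional quadratic stands in for the real 23-parameter cost:

```python
def objective(p):
    """Toy stand-in for the cost function: minimized at p = 3."""
    return (p - 3.0) ** 2

def gradient(p):
    """Derivative of the toy objective."""
    return 2.0 * (p - 3.0)

p = 0.0              # initial guess for the parameter
learning_rate = 0.1  # step size (illustrative)
for _ in range(100):
    p = p - learning_rate * gradient(p)

print(p, objective(p))  # p approaches 3.0 and the objective approaches 0
```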
After minimizing the cost function, we will have the optimal values of the parameters, which can be put into F(x), and then we can compute the value of F(x) for every input x. If F(x) is near 1, then x falls into category A; if F(x) is near 0, then x falls into category B. Our classifier is ready to use. We can even draw a boundary line on the data set which separates the two categories.
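Once trained, the classifier can be used roughly as follows. Reading the first output component of F as the "category A score" and cutting it off at 0.5 is one possible convention, matching the [1, 0] / [0, 1] targets assumed earlier:

```python
import numpy as np

def classify(F, x, cutoff=0.5):
    """Assign x to category A (fertile) or B (infertile) using the trained F."""
    score = np.atleast_1d(F(np.asarray(x)))[0]  # first output component as the A-score
    return "A (fertile)" if score >= cutoff else "B (infertile)"
```

Scanning a grid of points in the region and recording where classify flips from A to B traces out the boundary line mentioned above.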
In this study, we used the sigmoid function to train the model, but in general there are many more activation functions that may be used in a similar way.
Congratulations, you have learned the fundamental mathematics behind deep-learning-based classifiers and how it is carried out.
Higham, Catherine F., and Desmond J. Higham. “Deep learning: An introduction for applied mathematicians.” SIAM Review 61.4 (2019): 860–891.
Translated from: https://medium.com/artifical-mind/simple-mathematics-behind-deep-learning-c38152c8b534