Exploring Activation and Loss Functions in Machine Learning
In this post, we’re going to discuss the most widely-used activation and loss functions for machine learning models. We’ll take a brief look at the foundational mathematics of these functions and discuss their use cases, benefits, and limitations.
Without further ado, let’s get started!
Source: https://missinglink.ai/guides/neural-network-concepts/7-types-neural-network-activation-functions-right/

What is an Activation Function?
To learn complex data patterns, the input data of each node in a neural network passes through a function that limits and defines that same node's output value. In other words, it takes in the output signal from the previous node and converts it into a form interpretable by the next node. This is what an activation function allows us to do.
Need for an Activation Function
Restricting values: The activation function keeps a node's values restricted within a certain range, because they can otherwise become extremely small or extremely large as a result of the multiplications and other operations they pass through in the various layers (i.e. the vanishing and exploding gradient problem).
Adding non-linearity: In the absence of an activation function, the operations done by the various layers can be considered as stacked on top of one another, which ultimately amounts to a single linear combination of operations performed on the input (see the sketch below). Thus, a neural network without an activation function is essentially a linear regression model.
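As a quick illustration of the point above, here is a minimal NumPy sketch (the shapes and values are arbitrary) showing that two stacked layers with no activation function collapse into a single linear transformation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))      # a small batch of 4 inputs with 3 features
W1 = rng.normal(size=(3, 5))     # weights of "layer 1"
W2 = rng.normal(size=(5, 2))     # weights of "layer 2"

# Two layers applied one after the other, with no activation in between...
two_layers = (x @ W1) @ W2

# ...give exactly the same result as one layer whose weights are W1 @ W2.
one_layer = x @ (W1 @ W2)

print(np.allclose(two_layers, one_layer))  # True
```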
Types of Activation Functions
Various types of activation functions are listed below:
Sigmoid Activation Function
The sigmoid function squashes its input into the range (0, 1) and was traditionally used for binary classification problems (along the lines of "if the output ≤ 0.5, predict y = 0, else y = 1"). But it tends to cause the vanishing gradient problem: if the output values get too close to 0 or +1, the curve becomes almost flat, the gradient is nearly zero, and learning becomes very slow.
It’s also computationally expensive, since there are a lot of complex mathematical operations involved.
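Below is a minimal NumPy sketch of the sigmoid and its gradient. Notice how the gradient shrinks toward zero for inputs of large magnitude, which is the vanishing-gradient behaviour described above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # peaks at 0.25 when x = 0

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid(x))                   # outputs squashed into (0, 1)
print(sigmoid_grad(x))              # near-zero gradients at the extremes
```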
Tanh Activation Function
The tanh function was also traditionally used for binary classification problems (along the lines of "if the output ≤ 0, predict y = 0, else y = 1").
It differs from sigmoid in that it's zero-centred, restricting output values to between -1 and +1. It's even more computationally expensive than sigmoid, since the complex mathematical operations involved need to be performed repeatedly for every input and every iteration.
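A similar sketch for tanh (using NumPy's built-in np.tanh). The outputs are zero-centred in (-1, +1), but the gradient still flattens out for inputs of large magnitude:

```python
import numpy as np

def tanh_grad(x):
    t = np.tanh(x)
    return 1.0 - t ** 2             # peaks at 1.0 when x = 0

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(np.tanh(x))                   # zero-centred outputs in (-1, +1)
print(tanh_grad(x))                 # still vanishes at the extremes, like sigmoid
```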
ReLU Activation Function
ReLU is a famous, widely-used non-linear activation function, which stands for Rectified Linear Unit (it goes along the lines of "if x ≤ 0, y = 0, else y = x").
Thus, it’s only activated when the values are positive. ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations.
But it faces what's known as the "dying ReLU problem": when inputs are zero or negative, the gradient of the function becomes zero, so the affected neurons stop updating and the model learns slowly. ReLU is still considered a go-to function if one is new to activation functions or is unsure about which one to choose.
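A minimal sketch of ReLU and its gradient; the gradient being exactly zero for non-positive inputs is what can cause the dying ReLU problem:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)    # exactly zero for all non-positive inputs

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))                      # negative inputs are clamped to 0
print(relu_grad(x))                 # no gradient flows for the negative side
```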
Leaky ReLU Activation Function
The leaky ReLU function offers a solution to the dying ReLU problem. It has a small positive slope in the negative region, so it enables the model to keep learning even for negative input values.
Leaky ReLUs are widely used with generative adversarial networks. They use a value "alpha" (often around 0.1) to determine the slope of the function in the negative region; in the parametric variant (PReLU), alpha is learned during training.
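A sketch of leaky ReLU with alpha = 0.1 (the exact value of alpha is a tunable choice); unlike plain ReLU, the gradient never becomes exactly zero:

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    return np.where(x > 0, x, alpha * x)     # small slope for negative inputs

def leaky_relu_grad(x, alpha=0.1):
    return np.where(x > 0, 1.0, alpha)       # never exactly zero

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(leaky_relu(x))                         # negative inputs are scaled, not clamped
print(leaky_relu_grad(x))                    # small but non-zero gradient on the negative side
```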
Softmax Activation Function
Source: https://medium.com/data-science-bootcamp/understand-the-softmax-function-in-minutes-f3a59641e86d

The softmax function helps us represent inputs in terms of a discrete probability distribution. According to the formula, we apply an exponential function to each element of the output layer and normalize the values to ensure their sum is 1. The output class is the one with the highest confidence score.
This function is mostly used as the last layer in classification problems—especially multi-class classification problems—where the model ultimately outputs a probability for each of the available classes, and the most probable one is chosen as the answer.
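A minimal softmax sketch in NumPy (the logits below are made-up scores for three classes); we exponentiate each score and normalize so the outputs sum to 1:

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)   # subtracting the max is a standard numerical-stability trick
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])      # hypothetical raw scores for 3 classes
probs = softmax(logits)
print(probs)                            # a valid probability distribution (sums to 1)
print(np.argmax(probs))                 # index of the most probable class -> 0
```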
The future of machine learning is on the edge. Subscribe to the Fritz AI Newsletter to discover the possibilities and benefits of embedding ML models inside mobile apps.
What is a Loss Function?
Source: https://www.analyticsvidhya.com/blog/2019/08/detailed-guide-7-loss-functions-machine-learning-python-code/

To understand how well (or poorly) our model is working, we monitor the value of the loss function over several iterations. It helps us measure the accuracy of our model and understand how it behaves for certain inputs. The loss can thus be considered as the error, or deviation of the prediction from the correct classes or values. The larger the value of the loss function, the further our model strays from making the correct prediction.
Types of Loss Functions
Depending on the type of learning task, loss functions can be broadly classified into two categories:
Regression loss functions
Classification loss functions
Regression Loss Functions
In this sub-section, we’ll discuss some of the more widely-used regression loss functions:
Mean Absolute Error or L1 Loss (MAE)
The mean absolute error is the average of the absolute differences between the values predicted by the model and the actual values. The absolute value matters here: with a plain (signed) mean of the errors, if some values were underestimated (negative error) and some were almost equally overestimated (positive error), they would cancel each other out and give us the wrong idea about the net error. Taking the absolute value of each error prevents this cancellation.
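A minimal sketch of MAE with made-up target and prediction values:

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, -0.5, 2.0, 7.0])    # made-up target values
y_pred = np.array([2.5,  0.0, 2.0, 8.0])    # made-up predictions
print(mean_absolute_error(y_true, y_pred))  # 0.5
```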
Mean Squared Error or L2 Loss (MSE)
The mean squared error is the average of the squared differences between the values predicted by the model and the actual values. Squaring the errors likewise prevents positive and negative errors from nullifying each other.
MSE is also used to emphasize the error terms in cases where the input and output values are on small scales. Because the error terms are squared, large errors have a relatively greater influence on MSE than smaller errors.
However, this can be a gamble when there are a lot of outliers in your data. Since outliers carry greater weight once their large error values are squared, they can bias the loss function. Thus, outlier removal should be performed before applying MSE.
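The same made-up values with MSE; note how a single large error dominates the loss once it is squared:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])
print(mean_squared_error(y_true, y_pred))           # 0.375

# With one outlier prediction, the squared term dominates the loss:
y_pred_outlier = np.array([2.5, 0.0, 2.0, 20.0])
print(mean_squared_error(y_true, y_pred_outlier))   # 42.375
```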
Huber Loss
Huber loss behaves like an absolute (L1) error for large errors and, as you can see from the formula, becomes quadratic as the error grows smaller and smaller. In the formula, y is the expected value, ŷ is the predicted value, and δ is a user-defined hyperparameter that sets the threshold between the two regimes.
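The formula image from the original post is not reproduced here, so the following is the standard definition, with δ as the threshold between the two regimes:

L(y, ŷ) = 0.5·(y − ŷ)² if |y − ŷ| ≤ δ, and δ·(|y − ŷ| − 0.5·δ) otherwise.

A minimal NumPy sketch:

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    error = y_true - y_pred
    small = np.abs(error) <= delta
    squared = 0.5 * error ** 2                       # quadratic regime for small errors
    linear = delta * (np.abs(error) - 0.5 * delta)   # linear (absolute) regime for large errors
    return np.mean(np.where(small, squared, linear))

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 20.0])     # includes one large outlier
print(huber_loss(y_true, y_pred, delta=1.0))  # the outlier is penalized only linearly
```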
Classification Loss Functions
In this sub-section, we’ll discuss some of the more widely-used loss functions for classification tasks:
Cross-Entropy Loss
This loss is also called log loss. To understand cross-entropy loss, let’s first understand what entropy is. Entropy refers to the disorder or uncertainty in data. The larger the entropy value, the higher the level of disorder.
As you can see in the formula, entropy is basically the negative sum, over all possible outcomes, of the probability of an event multiplied by its log. Thus, cross-entropy as a loss function amounts to reducing the entropy, or uncertainty, about the class to be predicted.
Cross-entropy loss is therefore defined as the negative sum, over all possible classes, of the product of the expected (true) class probability and the natural log of the predicted class probability. The negative sign is used because the log of numbers < 1 is negative, which would be confusing to work with while evaluating model performance.
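In symbols (the formula images from the original post are not reproduced here, so these are the standard forms), entropy and cross-entropy over classes c are:

H(p) = −Σ p(c)·log(p(c))

CE(y, ŷ) = −Σ y(c)·log(ŷ(c))

where y(c) is the expected (true) probability of class c and ŷ(c) is the predicted probability.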
For example, if the problem at hand is binary classification, the value of y can be 0 or 1. In such a case, the above loss formula reduces to:
−(y·log(p) + (1−y)·log(1−p))
where p is the predicted probability that observation O belongs to class C.
Thus, the loss function over the complete set of samples would be:
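(The formula image is missing from the original post; the standard form is simply the average of the per-sample loss over all N samples.)

−(1/N)·Σ [ yᵢ·log(pᵢ) + (1−yᵢ)·log(1−pᵢ) ]

A minimal sketch of binary cross-entropy in NumPy (the clipping constant is just a common safeguard against taking log(0)):

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y_true = np.array([1, 0, 1, 1])           # made-up binary labels
p_pred = np.array([0.9, 0.1, 0.8, 0.3])   # made-up predicted probabilities of class 1
print(binary_cross_entropy(y_true, p_pred))   # larger when confident predictions are wrong
```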
Hinge Loss
Hinge loss penalizes wrongly-predicted values, as well as values that were correctly predicted but with a low confidence score. It is primarily used with Support Vector Machines (SVMs), since penalizing both kinds of predictions encourages the formation of a large-margin classifier.
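A sketch of the standard hinge loss used for max-margin classification, with labels encoded as −1/+1 (the scores below are made-up decision-function outputs):

```python
import numpy as np

def hinge_loss(y_true, scores):
    # y_true in {-1, +1}; scores are raw (unbounded) classifier outputs
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))

y_true = np.array([1, -1, 1, 1])
scores = np.array([2.3, -0.8, 0.4, -1.1])   # made-up decision-function values
print(hinge_loss(y_true, scores))
# Confident correct predictions (margin >= 1) contribute 0;
# unconfident correct predictions and wrong predictions are penalized.
```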
Conclusion
In this post we discussed various activation functions like sigmoid, tanh, ReLU, leaky ReLU, and softmax, along with their primary use cases. These are the most widely-used activation functions and are essential for developing efficient neural networks.
We also discussed a few major loss functions like mean squared error, mean absolute error, Huber loss, cross-entropy loss, and hinge loss.
I hope this article has helped you learn and understand more about these fundamental ML concepts. All feedback is welcome. Please help me improve!
Until next time!😊
Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to exploring the emerging intersection of mobile app development and machine learning. We’re committed to supporting and inspiring developers and engineers from all walks of life.
Editorially independent, Heartbeat is sponsored and published by Fritz AI, the machine learning platform that helps developers teach devices to see, hear, sense, and think. We pay our contributors, and we don’t sell ads.
If you’d like to contribute, head on over to our call for contributors. You can also sign up to receive our weekly newsletters (Deep Learning Weekly and the Fritz AI Newsletter), join us on Slack, and follow Fritz AI on Twitter for all the latest in mobile machine learning.
Translated from: https://heartbeat.fritz.ai/exploring-activation-and-loss-functions-in-machine-learning-39d5cb3ba1fc