Derivative of the Cross-Entropy Loss
1. The input is a vector $z=[z_{1},z_{2},...,z_{n}]$ of dimension (1,n). The output is $s=[\frac{e^{z_{1}}}{\sum_{k=1}^{n}e^{z_{k}}},\frac{e^{z_{2}}}{\sum_{k=1}^{n}e^{z_{k}}},...,\frac{e^{z_{n}}}{\sum_{k=1}^{n}e^{z_{k}}}]$, also of dimension (1,n).
2. Each output component comes from the softmax function: $s_{i}=\frac{e^{z_{i}}}{\sum_{k=1}^{n}e^{z_{k}}}$
3. The Softmax Loss function L is defined as $L=-\sum_{i=1}^{n}y_{i}\ln \left ( s_{i}\right )$. L is a scalar of dimension (1,1).
Here the vector y is the model's label. It also has dimension (1,n), is a known quantity, and is usually in one-hot form.
Suppose class j is the correct one. Then $y=[0,0,...,1,...,0]$: only $y_{j}=1$, and $y_{i}=0$ for every $i\neq j$.
$L=-y_{j}\ln \left ( s_{j}\right )=-\ln \left ( s_{j}\right )$
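The definitions above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names `softmax` and `cross_entropy` and the example values of `z` and `y` are assumptions made here, not part of the original post. Subtracting the maximum before exponentiating is a common stability trick and does not change the result.

```python
import math

def softmax(z):
    """s_i = exp(z_i) / sum_k exp(z_k), computed on a list of floats."""
    m = max(z)  # shifting by the max leaves the ratio unchanged but avoids overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(s, y):
    """L = -sum_i y_i * ln(s_i); terms with y_i == 0 contribute nothing."""
    return -sum(yi * math.log(si) for si, yi in zip(s, y) if yi != 0)

z = [1.0, 2.0, 3.0]   # example input vector (made up for illustration)
y = [0.0, 0.0, 1.0]   # one-hot label: the third class is the correct one
s = softmax(z)
loss = cross_entropy(s, y)

# With a one-hot label the loss reduces to -ln(s_j) for the correct class j.
print(s, loss, -math.log(s[2]))
```

With a one-hot label, `loss` and `-math.log(s[2])` agree, which is the simplification $L=-\ln(s_{j})$ used in the rest of the derivation.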
Our goal is the derivative of the scalar L with respect to the vector z, $\frac{\partial L}{\partial z}$.
By the chain rule, $\frac{\partial L}{\partial z}=\frac{\partial L}{\partial s}\cdot\frac{\partial s}{\partial z}$
where s and z are both vectors of dimension (1,n).
$\frac{\partial L}{\partial s}=[0,0,...,-\frac{1}{s_{j}},0,...,0]$, dim = [1*n]
$\frac{\partial s}{\partial z}$ is the matrix below, dim = [n*n]. Its entry in row i, column k is $\frac{\partial s_{i}}{\partial z_{k}}$, which equals $s_{i}(1-s_{i})$ when $i=k$ and $-s_{i}s_{k}$ when $i\neq k$.
$$\frac{\partial s}{\partial z}=\begin{bmatrix} s_{1}(1-s_{1}) & -s_{1}s_{2} & -s_{1}s_{3} & \cdots & -s_{1}s_{j} & \cdots & -s_{1}s_{n} \\ -s_{2}s_{1} & s_{2}(1-s_{2}) & -s_{2}s_{3} & \cdots & -s_{2}s_{j} & \cdots & -s_{2}s_{n} \\ -s_{3}s_{1} & -s_{3}s_{2} & s_{3}(1-s_{3}) & \cdots & -s_{3}s_{j} & \cdots & -s_{3}s_{n} \\ \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\ -s_{j}s_{1} & -s_{j}s_{2} & -s_{j}s_{3} & \cdots & s_{j}(1-s_{j}) & \cdots & -s_{j}s_{n} \\ \vdots & \vdots & \vdots & & \vdots & \ddots & \vdots \\ -s_{n}s_{1} & -s_{n}s_{2} & -s_{n}s_{3} & \cdots & -s_{n}s_{j} & \cdots & s_{n}(1-s_{n}) \end{bmatrix}$$
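As a sanity check on the matrix, the sketch below builds the Jacobian from the softmax output and compares it with a central finite-difference estimate. The helper names `softmax_jacobian` and `numeric_jacobian`, the step size `eps`, and the test vector `z` are all assumptions chosen for illustration.

```python
import math

def softmax(z):
    m = max(z)  # shift for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(s):
    """J[i][k] = ds_i/dz_k = s_i * (delta_ik - s_k)."""
    n = len(s)
    return [[s[i] * ((1.0 if i == k else 0.0) - s[k]) for k in range(n)]
            for i in range(n)]

def numeric_jacobian(z, eps=1e-6):
    """Central-difference estimate of ds_i/dz_k, one column per perturbed z_k."""
    n = len(z)
    J = [[0.0] * n for _ in range(n)]
    for k in range(n):
        z_plus, z_minus = list(z), list(z)
        z_plus[k] += eps
        z_minus[k] -= eps
        s_plus, s_minus = softmax(z_plus), softmax(z_minus)
        for i in range(n):
            J[i][k] = (s_plus[i] - s_minus[i]) / (2 * eps)
    return J

z = [0.5, -1.0, 2.0]  # arbitrary example input
s = softmax(z)
J = softmax_jacobian(s)
J_num = numeric_jacobian(z)
max_err = max(abs(J[i][k] - J_num[i][k]) for i in range(3) for k in range(3))
print(max_err)  # should be tiny if the analytic entries are right
```

Each row of `J` sums to zero, because the softmax outputs always sum to 1 and so their total cannot change when z changes.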
The [1*n] row vector $\frac{\partial L}{\partial s}$ left-multiplies the [n*n] matrix $\frac{\partial s}{\partial z}$. Only the j-th entry of $\frac{\partial L}{\partial s}$ is nonzero, so the product is row j of the matrix scaled by $-\frac{1}{s_{j}}$:
$\frac{\partial L}{\partial z}=\frac{\partial L}{\partial s}\cdot\frac{\partial s}{\partial z}=[s_{1},s_{2},...,s_{j}-1,...,s_{n}]=s-y$
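The result $s-y$ can be checked numerically. The following sketch, under the assumption of an arbitrary example `z` and a 0-based correct-class index `j` (both made up here), computes the gradient three ways: by the chain-rule product, by the closed form $s-y$, and by central finite differences on the loss.

```python
import math

def softmax(z):
    m = max(z)  # shift for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss(z, j):
    """L = -ln(s_j) for a one-hot label whose correct class is j (0-based)."""
    return -math.log(softmax(z)[j])

z = [0.5, -1.0, 2.0]  # arbitrary example input
j = 1                 # 0-based index of the correct class (illustrative)
n = len(z)
s = softmax(z)
y = [1.0 if i == j else 0.0 for i in range(n)]

# Chain rule: row vector dL/ds times the n*n Jacobian ds/dz.
dL_ds = [-y[i] / s[i] for i in range(n)]
J = [[s[i] * ((1.0 if i == k else 0.0) - s[k]) for k in range(n)]
     for i in range(n)]
grad_chain = [sum(dL_ds[i] * J[i][k] for i in range(n)) for k in range(n)]

# Closed form from the derivation.
grad_closed = [s[i] - y[i] for i in range(n)]

# Central finite differences on the scalar loss.
eps = 1e-6
grad_num = []
for k in range(n):
    z_plus, z_minus = list(z), list(z)
    z_plus[k] += eps
    z_minus[k] -= eps
    grad_num.append((loss(z_plus, j) - loss(z_minus, j)) / (2 * eps))

print(grad_chain, grad_closed, grad_num)
```

In practice only the closed form is needed. Building the full Jacobian costs O(n^2) work, while `s - y` costs O(n).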