Artificial Neural Networks: Making Your Own Artificial Neural Network
In this article, I present fully vectorized code for an artificial neural network with Dropout and L2 regularization.
The network is implemented in Python with NumPy and has been tested on several datasets; the Dropout and L2 regularization techniques are implemented and explained in detail as well.
Before reading on, I strongly recommend being familiar with the basic workings of an artificial neural network, forward propagation and backpropagation.
This article is divided into 10 sections.
1. Introduction
Artificial neural networks are one of the simplest and most fundamental concepts in supervised deep learning. They can be used for several tasks, such as binary or multi-class classification. They look easy to understand and implement, yet while coding such a network, small problems crop up that cause big errors and help you understand concepts you had previously overlooked. So, in this article, I try to implement an artificial neural network in a way that may save you the hours of work needed to code it correctly and to understand every concept of the topic.
2. Prerequisites
I assume you know what neural networks are and how they learn. It will be easy to follow if you are comfortable with Python and libraries such as NumPy. Also, a good grasp of linear algebra and calculus is needed to follow the forward- and backward-propagation sections comfortably. In addition, I strongly recommend Andrew Ng's course videos on Coursera (https://www.coursera.org/ ; https://www.deeplearning.ai/ ).
3. Importing Our Libraries
Now we can start coding the neural network. The first thing to do is import all the libraries we need to implement the network.
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import warnings
import time
warnings.filterwarnings('ignore')
import os
import sys

We will use pandas to import and clean our dataset. NumPy is the most important library here; it performs the matrix algebra and the heavy numerical computations.
4. Activation Functions and Their Derivatives
Later in this article we will need activation functions to perform forward propagation. We will also need their derivatives during backpropagation.
So, let's write some activation functions.
def sigmoid(z):
    """ Returns the element-wise sigmoid function. """
    return 1./(1 + np.exp(-z))

def sigmoid_prime(z):
    """ Returns the derivative of the sigmoid function. """
    return sigmoid(z)*(1-sigmoid(z))

def ReLU(z):
    """ Returns the element-wise ReLU function. """
    return (z*(z > 0))

def ReLU_prime(z):
    """ Returns the derivative of the ReLU function. """
    return 1*(z>=0)

def lReLU(z):
    """ Returns the element-wise leaky ReLU function. """
    return np.maximum(z/100,z)

def lReLU_prime(z):
    """ Returns the derivative of the leaky ReLU function. """
    z = 1*(z>=0)
    z[z==0] = 1/100
    return z

def tanh(z):
    """ Returns the element-wise hyperbolic tangent function. """
    return np.tanh(z)

def tanh_prime(z):
    """ Returns the derivative of the tanh function. """
    return (1-tanh(z)**2)

# A dictionary of our activation functions
PHI = {'sigmoid':sigmoid, 'relu':ReLU, 'lrelu':lReLU, 'tanh':tanh}

# A dictionary containing the derivatives of our activation functions
PHI_PRIME = {'sigmoid':sigmoid_prime, 'relu':ReLU_prime, 'lrelu':lReLU_prime, 'tanh':tanh_prime}

We have the four most popular activation functions. First is the regular sigmoid activation function.
We have ReLU, or the "rectified linear unit". We will mostly use this activation function. Note that the derivative of ReLU is not defined at the point 0, so we simply fix a value for it there (the code above uses z >= 0, which gives 1 at z = 0).
We also have an extended version of ReLU called Leaky ReLU. It works like ReLU and can give better results on some datasets (not necessarily all of them).
We also have the tanh (hyperbolic tangent) activation function. It is widely used as well and almost always performs better than sigmoid.
Also, PHI and PHI_PRIME are Python dictionaries containing the activation functions and their derivatives, respectively.
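As a quick sanity check (a standalone snippet, not part of the article's pipeline), an activation and its derivative can be looked up by name from these dictionaries:

z = np.array([-2., 0., 3.])
relu_out  = PHI['relu'](z)          # negative entries are zeroed out
relu_grad = PHI_PRIME['relu'](z)    # 0 for z < 0 and 1 for z >= 0 (including z = 0)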
5. Our Neural Network Class
In this section we will create and initialize our neural network class. First, we decide which parameters the network needs during initialization; they are listed in the class docstring below.
Keeping that in mind, let's start writing the class for our neural network:
class NeuralNet:
    """
    This is a class for making Artificial Neural Networks. L2 and Dropout are the default
    regularization methods implemented in this class. It takes the following parameters:

    1. layers      : A python list containing the number of neurons in each layer
                     (including the output layer). Eg - [64,32,16,16,1]
    2. X           : Matrix of features with rows as features and columns as different examples.
    3. y           : Numpy array containing the outputs of corresponding examples.
    4. ac_funcs    : A python list containing the activation function of each layer.
                     Eg - ['relu','relu','lrelu','tanh','sigmoid']
    5. init_method : Method to initialize the weights of the network. Can be 'gaussian','random','zeros'.
    6. loss_func   : Currently not implemented.
    7. W           : Weights of a pretrained neural network with the same architecture.
    8. B           : Biases of a pretrained neural network with the same architecture.
    """

Now that we have a properly documented neural network class, we can move on to initializing the other variables of the network.
    def __init__(self, layers, X, y, ac_funcs, init_method='gaussian',
                 loss_func='b_ce', W=np.array([]), B=np.array([])):
        """ Initialize the network. """
        # Store the layers of the network
        self.layers = layers
        # Weights and biases of the network (set at the end of initialization)
        self.W = None
        self.B = None
        # Store the number of examples in the dataset as m
        self.m = X.shape[1]
        # Store the full layer list as n
        self.n = [X.shape[0], *layers]
        # Save the dataset
        self.X = X
        # Save corresponding output
        self.y = y
        # List to store the cost of the model calculated during training
        self.cost = []
        # Stores the accuracy obtained on the test set.
        self.acc = 0
        # Activation function of each layer
        self.ac_funcs = ac_funcs
        self.loss = loss_func
        # Initialize the weights by the provided method if they are not supplied.

We use `self.m` to store the number of examples in the dataset. `self.n` stores the number of neurons in each layer, including the input layer. `self.ac_funcs` is a Python list with the activation function of each layer. `self.cost` records the values of the cost function during training, and `self.acc` stores the accuracy obtained on the dataset after training. Having initialized all the variables of the network, let's go ahead and initialize its weights and biases.
6. Initializing Weights and Biases
We know the weights cannot be initialized to zero, because then every neuron computes the same hypothesis and the network never learns. So we must break this symmetry for the network to learn anything. We can draw random values from a Gaussian (normal) distribution: since the distribution has zero mean, the weights are centered around zero and stay small, so the network starts learning quickly and efficiently. We can use the np.random.randn() function to generate random values from the standard normal distribution.
        # Initialize the weights by the provided method if they are not supplied.
        if len(W) and len(B):
            self.W = W
            self.B = B
        else:
            if init_method == 'gaussian':
                self.W = [np.random.randn(self.n[nl], self.n[nl-1]) for nl in range(1, len(self.n))]
                self.B = [np.zeros((nl, 1), 'float32') for nl in self.layers]
            elif init_method == 'random':
                self.W = [np.random.rand(self.n[nl], self.n[nl-1]) for nl in range(1, len(self.n))]
                self.B = [np.random.rand(nl, 1) for nl in self.layers]
            elif init_method == 'zeros':
                self.W = [np.zeros((self.n[nl], self.n[nl-1]), 'float32') for nl in range(1, len(self.n))]
                self.B = [np.zeros((nl, 1), 'float32') for nl in self.layers]

We have initialized the weights to random values from a normal distribution. The biases have been initialized to zero.
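As a quick illustration (a standalone sketch with made-up layer sizes, not part of the class), the same list comprehension produces one weight matrix per layer, shaped (neurons in the current layer, neurons in the previous layer), with values centered around zero:

n = [4, 32, 16, 1]   # number of input features followed by the layer sizes
W = [np.random.randn(n[nl], n[nl-1]) for nl in range(1, len(n))]
print([w.shape for w in W])   # [(32, 4), (16, 32), (1, 16)]
print(W[0].mean())            # close to 0: entries are drawn from a zero-mean Gaussian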
7. Forward Propagation
First, let's understand forward propagation without regularization.
We use Z to denote the weighted input that each neuron receives from the previous layer. Once Z has been computed, we apply the activation function f to it to obtain the activation A of every neuron in the layer. That is plain forward propagation.
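In vectorized notation, and matching the code that follows, the plain forward pass computes, for every layer $l$ (with $A^{[0]} = X$):

$$Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}, \qquad A^{[l]} = f^{[l]}\!\left(Z^{[l]}\right)$$

where $f^{[l]}$ is the activation function of layer $l$.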
Dropout is a wonderful technique for improving the generalization ability and robustness of a neural network, so let's first understand Dropout regularization.

The Essence of Dropout Regularization
Dropout, as the name suggests, refers to "dropping out" (deactivating) some of the neurons of the network and training with the remaining neurons.
To improve performance, we could train dozens or even hundreds of neural networks with different hyperparameter values, collect the outputs of all of them and average them to get the final result. That process is computationally very expensive and not practical to implement, so we need a way of doing something similar in a more optimized and cheaper manner. Dropout regularization achieves a similar effect in a very cheap and simple way. In fact, Dropout is such a simple way of boosting performance that it has received a lot of attention in recent years and has become nearly ubiquitous in many other deep learning models.
To implement Dropout, we will use the following approach:
We first draw random values, which amounts to sampling a Bernoulli mask: a neuron is kept intact when its uniform random value falls below the keep probability, and regular forward propagation is then performed on the surviving neurons. Note that we do not apply dropout when predicting values on a new dataset, i.e. at test time.
Code for Implementing Dropout
We use keep_prob as the probability that a neuron in each layer survives; a neuron is kept only when its random value falls below the survival probability keep_prob. Suppose its value is 0.8. That means we deactivate 20% of the neurons in each layer and train the remaining 80%. Note that a freshly chosen random set of neurons is deactivated at every iteration, which helps the neurons learn features that generalize to larger datasets.
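A minimal standalone sketch of this "inverted dropout" idea on one activation matrix (the variable names here are illustrative, not taken from the class):

keep_prob = 0.8
a = np.random.randn(16, 100)                        # activations: 16 neurons, 100 examples
mask = np.random.rand(a.shape[0], 1) < keep_prob    # keep each neuron with probability keep_prob
a = (mask * a) / keep_prob                          # rescale so the expected activation is unchanged

Dividing by keep_prob is what lets us skip any rescaling at test time.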
    def _feedForward(self, keep_prob):
        """ Forward pass """
        z = []
        a = []
        z.append(np.dot(self.W[0], self.X) + self.B[0])
        a.append(PHI[self.ac_funcs[0]](z[-1]))
        for l in range(1, len(self.layers)):
            z.append(np.dot(self.W[l], a[-1]) + self.B[l])
            # a.append(PHI[self.ac_funcs[l]](z[l]))
            _a = PHI[self.ac_funcs[l]](z[l])
            a.append(((np.random.rand(_a.shape[0], 1) < keep_prob)*_a)/keep_prob)
        return z, a

We first initialize the lists that will store the Z and A values. We append the linear value of the first layer to z, then the activations of the first layer's neurons to a. PHI is the Python dictionary of activation functions we wrote earlier. The Z and A values of all the other layers are computed similarly inside the for loop, with the dropout mask applied and the result rescaled by keep_prob. Note that we have not applied dropout to the input layer. We finally return the computed Z and A values.
8. The Cost Function
We will use the standard binary/categorical cross-entropy cost function.
    def _cost_func(self, a, _lambda):
        """ Binary Cross Entropy Cost Function """
        return (
            (-1/self.m)*np.sum(np.nan_to_num(self.y*np.log(a) + (1-self.y)*np.log(1-a)))
            + (_lambda/(2*self.m))*np.sum([np.sum(i**2) for i in self.W])
        )

    def _cost_derivative(self, a):
        """ The derivative of cost w.r.t z """
        return (a - self.y)

We have written our cost function with L2 regularization. The lambda parameter is called the "penalization parameter"; it keeps the weight values from growing too quickly, which leads to better generalization. Here, `a` holds the activation values of the output layer. We also have the function _cost_derivative to compute the derivative of the cost function with respect to the pre-activation of the output layer. We will use it later, during backpropagation.
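Written out, the regularized cost implemented above is the standard binary cross-entropy plus an L2 penalty:

$$J = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log a^{(i)} + \big(1-y^{(i)}\big)\log\big(1-a^{(i)}\big)\Big] + \frac{\lambda}{2m}\sum_{l}\big\lVert W^{[l]}\big\rVert_F^2$$

For a sigmoid output layer, the derivative of the (unregularized) cross-entropy with respect to the output pre-activation simplifies to $A^{[L]} - y$, which is exactly what _cost_derivative returns; the $1/m$ factor is applied later, in the gradient-descent update.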
9. Backpropagation
Below are the formulas we need to perform backpropagation.
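Since the original figure is not reproduced here, the equations are restated to match the code below (the $1/m$ factor is folded into the update step):

$$\delta^{[L]} = A^{[L]} - y$$
$$dW^{[l]} = \delta^{[l]}\big(A^{[l-1]}\big)^{T} + \lambda W^{[l]}, \qquad db^{[l]} = \sum_{\text{examples}} \delta^{[l]}$$
$$\delta^{[l-1]} = \big(W^{[l]}\big)^{T}\delta^{[l]} \odot f'^{\,[l-1]}\!\big(Z^{[l-1]}\big)$$
$$W^{[l]} \leftarrow W^{[l]} - \frac{\alpha}{m}\,dW^{[l]}, \qquad b^{[l]} \leftarrow b^{[l]} - \frac{\alpha}{m}\,db^{[l]}$$

with $A^{[0]} = X$ and $\odot$ denoting element-wise multiplication.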
We will implement this for a deep neural network; the formulas above are fully vectorized. Once these formulas are clear, we can go ahead and code them.
    def startTraining(self, epochs, alpha, _lambda, keep_prob=0.5, interval=100):
        """
        Start training the neural network. It takes the following parameters:

        1. epochs    : Number of epochs for which you want to train the network.
        2. alpha     : The learning rate of your network.
        3. _lambda   : L2 regularization parameter or the penalization parameter.
        4. keep_prob : Dropout regularization parameter. The probability of keeping a neuron active.
                       Eg - 0.8 means 20% of the neurons are deactivated.
        5. interval  : The interval between updates of cost and accuracy.
        """
        start = time.time()
        for i in range(epochs+1):
            z, a = self._feedForward(keep_prob)
            delta = self._cost_derivative(a[-1])
            for l in range(1, len(z)):
                delta_w = np.dot(delta, a[-l-1].T) + (_lambda)*self.W[-l]
                delta_b = np.sum(delta, axis=1, keepdims=True)
                delta = np.dot(self.W[-l].T, delta)*PHI_PRIME[self.ac_funcs[-l-1]](z[-l-1])
                self.W[-l] = self.W[-l] - (alpha/self.m)*delta_w
                self.B[-l] = self.B[-l] - (alpha/self.m)*delta_b
            delta_w = np.dot(delta, self.X.T) + (_lambda)*self.W[0]
            delta_b = np.sum(delta, axis=1, keepdims=True)
            self.W[0] = self.W[0] - (alpha/self.m)*delta_w
            self.B[0] = self.B[0] - (alpha/self.m)*delta_b

To implement backpropagation, we take epochs, alpha (the learning rate), _lambda, keep_prob and interval as parameters of the function.
We start with forward propagation. Then we compute the derivative of the cost function and store it in delta. Now, for each layer, we compute delta_w and delta_b, which hold the derivatives of the cost function with respect to that layer's weights and biases. We then update delta, the weights and the biases according to their respective formulas. After updating the weights and biases from the last layer down to the second layer, we update the weights and biases of the first layer. We repeat this for many iterations until the values of the weights and biases converge.
Important note: a major mistake that can creep in here is updating delta after updating the weights and biases. Doing so propagates the error backwards through the already-updated weights and can cause a very bad vanishing/exploding gradient problem.
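Putting it together, a typical training call looks like this (a sketch; the same values are used in the complete code at the end of the article, where the instance is named neural_net_sigmoid):

net = NeuralNet([32, 16, 1], X_train, y_train, ac_funcs=['relu', 'relu', 'sigmoid'])
net.startTraining(epochs=5000, alpha=0.01, _lambda=0.2, keep_prob=0.5, interval=100)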
Most of our work is done at this point, but we still need a function that can predict results on a new dataset. So, as our final step, we will write a function to predict the labels of a new dataset.
10. Predicting Labels for a New Dataset
This step is very simple. We just need to perform forward propagation, but without Dropout regularization. We do not apply Dropout at test time because we need all the neurons of all the layers to give us a proper result, not just a random subset of them.
    def predict(self, X_test):
        """ Predict the labels for a new dataset. Returns probability. """
        a = PHI[self.ac_funcs[0]](np.dot(self.W[0], X_test) + self.B[0])
        for l in range(1, len(self.layers)):
            a = PHI[self.ac_funcs[l]](np.dot(self.W[l], a) + self.B[l])
        return a

We return the activations of the output layer as the result.
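For binary classification, the returned probabilities can then be thresholded at 0.5, as done in the complete code below (a sketch, assuming a trained instance net and a test matrix X_test with examples as columns):

probs = net.predict(X_test)   # probabilities from the sigmoid output layer
preds = probs > 0.5           # boolean class predictions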
The Complete Code
Below is the complete code so you can implement the artificial neural network yourself. I have added some code to print the cost and accuracy of the network during training; other than that, everything is the same.
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import warnings
import time
warnings.filterwarnings('ignore')
import os
import sys

# Importing our dataset
os.chdir("C:/Users/Hilak/Desktop/INTERESTS/Machine Learning A-Z Template Folder/Part 3 - Classification/Section 14 - Logistic Regression")
training_set = pd.read_csv("Social_Network_Ads.csv")

# Splitting our dataset into matrix of features and output values.
X = training_set.iloc[:, 1:4].values
y = training_set.iloc[:, 4].values

# Encoding our object features.
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
le_x = LabelEncoder()
X[:,0] = le_x.fit_transform(X[:,0])
ohe = OneHotEncoder(categorical_features = [0])
X = ohe.fit_transform(X).toarray()

# Performing Feature scaling
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X[:,2:4] = ss.fit_transform(X[:, 2:4])

# Splitting the dataset into train and test set.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
X_train = X_train.T
X_test = X_test.T

# # Alternate Dataset for test purposes. Not used in the example shown
# os.chdir("C:/Users/Hilak/Desktop/INTERESTS/Machine Learning A-Z Template Folder/Part 8 - Deep Learning/Section 39 - Artificial Neural Networks (ANN)")
# dataset = pd.read_csv('Churn_Modelling.csv')
# X = dataset.iloc[:, 3:13].values
# y = dataset.iloc[:, 13].values
# # Encoding categorical data
# from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# labelencoder_X_1 = LabelEncoder()
# X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
# labelencoder_X_2 = LabelEncoder()
# X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
# onehotencoder = OneHotEncoder(categorical_features = [1])
# X = onehotencoder.fit_transform(X).toarray()
# X = X[:, 1:]
# # Splitting the dataset into the Training set and Test set
# from sklearn.model_selection import train_test_split
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
# X_test, X_CV, y_test, y_CV = train_test_split(X, y, test_size = 0.5)
# # Feature Scaling
# from sklearn.preprocessing import StandardScaler
# sc = StandardScaler()
# X_train = sc.fit_transform(X_train)
# X_test = sc.transform(X_test)
# X_train = X_train.T
# X_test = X_test.T
# X_CV = X_CV.T

def sigmoid(z):
    """ Returns the element-wise sigmoid function. """
    return 1./(1 + np.exp(-z))

def sigmoid_prime(z):
    """ Returns the derivative of the sigmoid function. """
    return sigmoid(z)*(1-sigmoid(z))

def ReLU(z):
    """ Returns the element-wise ReLU function. """
    return (z*(z > 0))

def ReLU_prime(z):
    """ Returns the derivative of the ReLU function. """
    return 1*(z>=0)

def lReLU(z):
    """ Returns the element-wise leaky ReLU function. """
    return np.maximum(z/100,z)

def lReLU_prime(z):
    """ Returns the derivative of the leaky ReLU function. """
    z = 1*(z>=0)
    z[z==0] = 1/100
    return z

def tanh(z):
    """ Returns the element-wise hyperbolic tangent function. """
    return np.tanh(z)

def tanh_prime(z):
    """ Returns the derivative of the tanh function. """
    return (1-tanh(z)**2)

# A dictionary of our activation functions
PHI = {'sigmoid':sigmoid, 'relu':ReLU, 'lrelu':lReLU, 'tanh':tanh}

# A dictionary containing the derivatives of our activation functions
PHI_PRIME = {'sigmoid':sigmoid_prime, 'relu':ReLU_prime, 'lrelu':lReLU_prime, 'tanh':tanh_prime}

class NeuralNet:
    """
    This is a class for making Artificial Neural Networks. L2 and Dropout are the default
    regularization methods implemented in this class. It takes the following parameters:

    1. layers      : A python list containing the number of neurons in each layer
                     (including the output layer). Eg - [64,32,16,16,1]
    2. X           : Matrix of features with rows as features and columns as different examples.
    3. y           : Numpy array containing the outputs of corresponding examples.
    4. ac_funcs    : A python list containing the activation function of each layer.
                     Eg - ['relu','relu','lrelu','tanh','sigmoid']
    5. init_method : Method to initialize the weights of the network. Can be 'gaussian','random','zeros'.
    6. loss_func   : Currently not implemented.
    7. W           : Weights of a pretrained neural network with the same architecture.
    8. B           : Biases of a pretrained neural network with the same architecture.
    """

    def __init__(self, layers, X, y, ac_funcs, init_method='gaussian',
                 loss_func='b_ce', W=np.array([]), B=np.array([])):
        """ Initialize the network. """
        # Store the layers of the network
        self.layers = layers
        # Weights and biases of the network (set at the end of initialization)
        self.W = None
        self.B = None
        # Store the number of examples in the dataset as m
        self.m = X.shape[1]
        # Store the full layer list as n
        self.n = [X.shape[0], *layers]
        # Save the dataset
        self.X = X
        # Save corresponding output
        self.y = y
        # List to store the cost of the model calculated during training
        self.cost = []
        # Stores the accuracy obtained on the test set.
        self.acc = 0
        # Activation function of each layer
        self.ac_funcs = ac_funcs
        self.loss = loss_func
        # Initialize the weights by the provided method if they are not supplied.
        if len(W) and len(B):
            self.W = W
            self.B = B
        else:
            if init_method == 'gaussian':
                self.W = [np.random.randn(self.n[nl], self.n[nl-1]) for nl in range(1, len(self.n))]
                self.B = [np.zeros((nl, 1), 'float32') for nl in self.layers]
            elif init_method == 'random':
                self.W = [np.random.rand(self.n[nl], self.n[nl-1]) for nl in range(1, len(self.n))]
                self.B = [np.random.rand(nl, 1) for nl in self.layers]
            elif init_method == 'zeros':
                self.W = [np.zeros((self.n[nl], self.n[nl-1]), 'float32') for nl in range(1, len(self.n))]
                self.B = [np.zeros((nl, 1), 'float32') for nl in self.layers]

    def startTraining(self, epochs, alpha, _lambda, keep_prob=0.5, interval=100):
        """
        Start training the neural network. It takes the following parameters:

        1. epochs    : Number of epochs for which you want to train the network.
        2. alpha     : The learning rate of your network.
        3. _lambda   : L2 regularization parameter or the penalization parameter.
        4. keep_prob : Dropout regularization parameter. The probability of keeping a neuron active.
                       Eg - 0.8 means 20% of the neurons are deactivated.
        5. interval  : The interval between updates of cost and accuracy.
        """
        start = time.time()
        for i in range(epochs+1):
            z, a = self._feedForward(keep_prob)
            delta = self._cost_derivative(a[-1])
            for l in range(1, len(z)):
                delta_w = np.dot(delta, a[-l-1].T) + (_lambda)*self.W[-l]
                delta_b = np.sum(delta, axis=1, keepdims=True)
                delta = np.dot(self.W[-l].T, delta)*PHI_PRIME[self.ac_funcs[-l-1]](z[-l-1])
                self.W[-l] = self.W[-l] - (alpha/self.m)*delta_w
                self.B[-l] = self.B[-l] - (alpha/self.m)*delta_b
            delta_w = np.dot(delta, self.X.T) + (_lambda)*self.W[0]
            delta_b = np.sum(delta, axis=1, keepdims=True)
            self.W[0] = self.W[0] - (alpha/self.m)*delta_w
            self.B[0] = self.B[0] - (alpha/self.m)*delta_b
            # Every `interval` epochs, record the cost and accuracy and print the progress
            if not i%interval:
                aa = self.predict(self.X)
                if self.loss == 'b_ce':
                    aa = aa > 0.5
                    self.acc = sum(sum(aa == self.y)) / self.m
                    cost_val = self._cost_func(a[-1], _lambda)
                    self.cost.append(cost_val)
                elif self.loss == 'c_ce':
                    aa = np.argmax(aa, axis=0)
                    yy = np.argmax(self.y, axis=0)
                    self.acc = np.sum(aa == yy)/(self.m)
                    cost_val = self._cost_func(a[-1], _lambda)
                    self.cost.append(cost_val)
                sys.stdout.write(f'Epoch[{i}] : Cost = {cost_val:.2f} ; Acc = {(self.acc*100):.2f}% ; Time Taken = {(time.time()-start):.2f}s')
                print('')
        return None

    def predict(self, X_test):
        """ Predict the labels for a new dataset. Returns probability. """
        a = PHI[self.ac_funcs[0]](np.dot(self.W[0], X_test) + self.B[0])
        for l in range(1, len(self.layers)):
            a = PHI[self.ac_funcs[l]](np.dot(self.W[l], a) + self.B[l])
        return a

    def _feedForward(self, keep_prob):
        """ Forward pass """
        z = []
        a = []
        z.append(np.dot(self.W[0], self.X) + self.B[0])
        a.append(PHI[self.ac_funcs[0]](z[-1]))
        for l in range(1, len(self.layers)):
            z.append(np.dot(self.W[l], a[-1]) + self.B[l])
            # a.append(PHI[self.ac_funcs[l]](z[l]))
            _a = PHI[self.ac_funcs[l]](z[l])
            a.append(((np.random.rand(_a.shape[0], 1) < keep_prob)*_a)/keep_prob)
        return z, a

    def _cost_func(self, a, _lambda):
        """ Binary Cross Entropy Cost Function """
        return (
            (-1/self.m)*np.sum(np.nan_to_num(self.y*np.log(a) + (1-self.y)*np.log(1-a)))
            + (_lambda/(2*self.m))*np.sum([np.sum(i**2) for i in self.W])
        )

    def _cost_derivative(self, a):
        """ The derivative of cost w.r.t z """
        return (a - self.y)

    @property
    def summary(self):
        return self.cost, self.acc, self.W, self.B

    def __repr__(self):
        return f''

# Initializing our neural network
neural_net_sigmoid = NeuralNet([32,16,1], X_train, y_train, ac_funcs=['relu','relu','sigmoid'])
# Starting the training of our network.
neural_net_sigmoid.startTraining(5000, 0.01, 0.2, 0.5, 100)
# Predicting on new dataset using our trained network.
preds = neural_net_sigmoid.predict(X_test)
preds = preds > 0.5
acc = (sum(sum(preds == y_test)) / y_test.size)*100
# Accuracy (metric of evaluation) obtained by the network.
print(f'Test set Accuracy ( r-t-s ) : {acc}%')
# Plotting our cost vs epochs relationship
sigmoid_summary = neural_net_sigmoid.summary
plt.plot(range(len(sigmoid_summary[0])), sigmoid_summary[0], label='Sigmoid Cost')
plt.title('Cost')
plt.show()
# Comparing our results with the library keras.
from keras.models import Sequential
from keras.layers import Dense
X_train, X_test = X_train.T, X_test.T
classifier = Sequential()
# NOTE: the Keras comparison is cut off at this point in the original listing.
classifier.add(Dense(input_dim=4, units = 32, kernel_initializer="uniform