Siamese Network Object Tracking: A Friendly Introduction to Siamese Networks
In the modern deep learning era, neural networks perform well on almost every task, but they rely on large amounts of data to do so. For certain problems, such as face recognition and signature verification, we can't always count on getting more data. To solve this kind of task, there is a type of neural network architecture called Siamese networks.
A Siamese network needs only a small number of images to make good predictions. This ability to learn from very little data has made Siamese networks more popular in recent years. In this article, we will explore what they are and how to develop a signature verification system with PyTorch using Siamese networks.
What are Siamese Networks?
[Image: Siamese network used in SigNet]
A Siamese neural network is a class of neural network architectures that contain two or more identical subnetworks. "Identical" here means that they have the same configuration with the same parameters and weights. Parameter updates are mirrored across the subnetworks. The network finds the similarity of its inputs by comparing their feature vectors, which is why these networks are used in many applications.
Traditionally, a neural network learns to predict multiple classes. This poses a problem when we need to add classes to, or remove classes from, the data. In that case, we have to update the neural network and retrain it on the whole dataset. Deep neural networks also need a large volume of data to train on. Siamese neural networks (SNNs), on the other hand, learn a similarity function. Thus, we can train one to decide whether two images are the same (which we will do here). This enables us to classify new classes of data without training the network again.
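To make the last point concrete, here is a minimal sketch of classification by similarity. It assumes the embeddings have already been produced by a trained network; the 2-D vectors, the names, and the `classify` helper are all made up for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(query, references):
    """Return the label whose reference embedding is closest to the query."""
    return min(references, key=lambda label: euclidean(query, references[label]))


# Hypothetical reference embeddings, one per known class.
references = {"alice": [0.9, 0.1], "bob": [0.1, 0.8]}
print(classify([0.8, 0.2], references))

# Adding a class only means adding a reference embedding; nothing is retrained.
references["carol"] = [0.5, 0.5]
print(classify([0.55, 0.45], references))
```

The design choice worth noticing is that the set of classes lives in the `references` dictionary, not in the network, which is why it can grow without retraining.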
Pros and Cons of Siamese Networks:
The main advantages of Siamese networks are:
More robust to class imbalance: with the aid of one-shot learning, a few images per class are sufficient for a Siamese network to recognize those images in the future.
Works well in an ensemble with the best classifier: because its learning mechanism is somewhat different from classification, simply averaging it with a classifier can do much better than averaging two correlated supervised models (e.g. a GBM and an RF classifier).
Learning from semantic similarity: a Siamese network focuses on learning embeddings (in the deeper layers) that place the same classes or concepts close together. Hence, it can learn semantic similarity.
The downsides of Siamese networks can be:
Needs more training time than normal networks: a Siamese network learns from pairs, and the number of possible pairs grows quadratically with the number of samples if it is to see all the available information, so training is slower than normal classification-style (pointwise) learning.
Doesn't output probabilities: since training involves pairwise learning, the network doesn't output the probability of a prediction, only the distance from each class.
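The quadratic growth in training pairs mentioned above is easy to check with a short sketch. The sample names are placeholders, and `count_pairs` is a helper introduced only for this illustration.

```python
from itertools import combinations


def count_pairs(n):
    """Number of unordered pairs that can be formed from n samples."""
    return n * (n - 1) // 2


# Hypothetical sample identifiers standing in for signature images.
samples = ["sig_%d" % i for i in range(5)]
pairs = list(combinations(samples, 2))
print(len(pairs), "pairs from", len(samples), "samples")

# Pointwise learning sees n samples; pairwise learning can see n(n-1)/2 pairs.
for n in (10, 100, 1000):
    print(n, "samples ->", count_pairs(n), "pairs")
```

In practice only a subset of pairs is usually sampled, which is one way to keep training time under control.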
Loss functions used in Siamese Networks:
[Image: Contrastive loss, created by the author]
Since training a Siamese network usually involves pairwise learning, cross-entropy loss cannot be used in this case. Two loss functions are mainly used to train Siamese networks: triplet loss and contrastive loss.
Triplet loss is a loss function in which a baseline (anchor) input is compared to a positive (truthy) input and a negative (falsy) input. The distance from the anchor to the positive input is minimized, and the distance from the anchor to the negative input is maximized.
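For reference, triplet loss is commonly written in the following form. This is the standard formulation, assumed here rather than taken from the article's own figure; the use of squared Euclidean distance is likewise an assumption, since some formulations use the unsquared distance.

```latex
\mathcal{L}(a, p, n) = \max\left( \lVert f_a - f_p \rVert^2 - \lVert f_a - f_n \rVert^2 + \alpha,\; 0 \right)
```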
In the triplet loss equation, alpha is a margin term used to "stretch" the distance differences between similar and dissimilar pairs in the triplet, and f_a, f_p, and f_n are the feature embeddings for the anchor, positive, and negative images.
During training, an image triplet (anchor image, negative image, positive image) is fed into the model as a single sample. The idea is that the distance between the anchor and positive images should be smaller than the distance between the anchor and negative images.
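A small numeric sketch may help here. It assumes the common squared-distance form of triplet loss and uses made-up 2-D embeddings; the margin value of 0.2 is an arbitrary choice for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, alpha=0.2):
    """max(d(a, p) - d(a, n) + alpha, 0), with d the squared distance."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + alpha, 0.0)


anchor = [0.0, 0.0]
near = [0.1, 0.0]   # hypothetical embedding close to the anchor
far = [1.0, 0.0]    # hypothetical embedding far from the anchor

# Positive is near and negative is far: the constraint is satisfied.
print(triplet_loss(anchor, positive=near, negative=far))
# Roles swapped: the positive is farther than the negative, so the loss is large.
print(triplet_loss(anchor, positive=far, negative=near))
```

When the negative is already farther away than the positive by at least the margin, the loss is zero and that triplet contributes no gradient.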
Contrastive loss is a popular loss function in wide use today. It is a distance-based loss, as opposed to more conventional error-prediction losses. This loss is used to learn embeddings in which two similar points have a small Euclidean distance and two dissimilar points have a large Euclidean distance.
Here D_w is simply the Euclidean distance between the outputs of the network for the two images in a pair, and G_w is the output of our network for one image.
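Written out, the distance and the contrastive loss commonly take the following form. This is the standard formulation, assumed here rather than copied from the article's figure. The label convention (Y = 0 for a similar pair, Y = 1 for a dissimilar pair) and the margin symbol m are assumptions as well.

```latex
D_w(X_1, X_2) = \lVert G_w(X_1) - G_w(X_2) \rVert_2

L(Y, X_1, X_2) = (1 - Y)\,\tfrac{1}{2}\,D_w^2 + Y\,\tfrac{1}{2}\,\bigl(\max(0,\; m - D_w)\bigr)^2
```

The first term pulls similar pairs together; the second pushes dissimilar pairs apart until they are at least m away from each other, after which they contribute nothing.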
Signature verification with Siamese Networks:
[Image: Siamese Network for Signature Verification, created by the author]
As Siamese networks are mostly used in verification systems such as face recognition and signature verification, let's implement a signature verification system using Siamese neural networks in PyTorch.
Dataset and Preprocessing the Dataset:
[Image: Signatures in the ICDAR dataset, created by the author]
We are going to use the ICDAR 2011 dataset, which consists of signatures of Dutch users, both genuine and forged. The dataset itself is split into train and test folders. Inside each folder are per-user folders separated into genuine and forgery. The labels of the dataset are available as CSV files. You can download the dataset from here
Now, to feed this raw data into our neural network, we have to turn all the images into tensors and attach the labels from the CSV files to the images. To do this we can use a custom dataset class in PyTorch.
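A custom dataset class for image pairs could look roughly like the sketch below. It is not the article's original listing. The class name `SignaturePairDataset` is hypothetical, and the CSV layout (no header, one row per pair: first image path, second image path, label) is an assumption that should be checked against the actual files.

```python
import csv
import os

import torch
from PIL import Image
from torch.utils.data import Dataset


class SignaturePairDataset(Dataset):
    """Yields (image1, image2, label) triples listed in a CSV file."""

    def __init__(self, csv_path, image_dir, transform=None):
        # Assumed CSV layout: image1_path,image2_path,label (no header row).
        with open(csv_path, newline="") as handle:
            self.rows = [row for row in csv.reader(handle) if row]
        self.image_dir = image_dir
        self.transform = transform

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, index):
        path1, path2, label = self.rows[index][:3]
        # Load both signatures as single-channel (grayscale) images.
        image1 = Image.open(os.path.join(self.image_dir, path1)).convert("L")
        image2 = Image.open(os.path.join(self.image_dir, path2)).convert("L")
        if self.transform is not None:
            image1 = self.transform(image1)
            image2 = self.transform(image2)
        return image1, image2, torch.tensor([float(label)], dtype=torch.float32)
```

Passing a `transform` that ends in a tensor conversion makes each item ready for batching by a `DataLoader`.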
After preprocessing the dataset, we have to load it in PyTorch using the DataLoader class. We will use transforms to resize the images to 105 pixels in height and width, which keeps the computation manageable.
Neural Network Architecture:
Now let's create a neural network in PyTorch. We will use an architecture similar to the one described in the SigNet paper.
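A sketch of such a network is shown below. The kernel counts, kernel sizes, stride, and fully connected sizes follow the description in this section. The padding, pooling windows, response-normalization parameters, and dropout rates are assumptions, chosen so that a 105x105 input reduces to an 11x11 feature map.

```python
import torch
import torch.nn as nn


class SiameseNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 96, kernel_size=11, stride=1),      # 105 -> 95
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(5, alpha=1e-4, beta=0.75, k=2),
            nn.MaxPool2d(3, stride=2),                       # 95 -> 47
            nn.Conv2d(96, 256, kernel_size=5, padding=2),    # 47 -> 47
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(5, alpha=1e-4, beta=0.75, k=2),
            nn.MaxPool2d(3, stride=2),                       # 47 -> 23
            nn.Dropout2d(p=0.3),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),   # 23 -> 23
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),   # 23 -> 23
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),                       # 23 -> 11
            nn.Dropout2d(p=0.3),
        )
        self.fc = nn.Sequential(
            nn.Linear(256 * 11 * 11, 1024),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(1024, 128),                            # 128-d embedding
        )

    def forward_once(self, x):
        """Embed one batch of images."""
        features = self.cnn(x)
        return self.fc(features.flatten(start_dim=1))

    def forward(self, input1, input2):
        # The same weights process both images of a pair.
        return self.forward_once(input1), self.forward_once(input2)
```

Calling `SiameseNetwork()(a, b)` with two batches of shape `(N, 1, 105, 105)` returns two embeddings of shape `(N, 128)`.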
The network is built as follows. The first convolutional layer filters the 105×105 input signature image with 96 kernels of size 11 with a stride of 1 pixel. The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5. The third and fourth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third layer has 384 kernels of size 3 connected to the (normalized, pooled, and dropout) output of the second convolutional layer. The fourth convolutional layer has 256 kernels of size 3. This leads to the neural network learning fewer lower-level features for smaller receptive fields and more features for higher-level, more abstract features. The first fully connected layer has 1024 neurons, whereas the second fully connected layer has 128 neurons. This means that the highest learned feature vector from each side of SigNet has a dimension of 128. So where is the other network?
Since the weights are constrained to be identical for both networks, we use one model and feed it two images in succession. After that, we calculate the loss value using both images and then backpropagate. This saves a lot of memory and also improves computational efficiency.
Loss Function:
For this task, we will use contrastive loss, which learns embeddings in which two similar points have a small Euclidean distance and two dissimilar points have a large Euclidean distance. Contrastive loss can be implemented directly in PyTorch.
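One way to write contrastive loss as a PyTorch module is sketched below. The label convention (0 for a similar pair, 1 for a dissimilar pair), the default margin of 2.0, and the factor of one half on each term are assumptions based on the standard formulation, not details confirmed by the article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContrastiveLoss(nn.Module):
    def __init__(self, margin=2.0):
        super().__init__()
        self.margin = margin

    def forward(self, output1, output2, label):
        # Euclidean distance between the two embeddings of each pair.
        distance = F.pairwise_distance(output1, output2)
        label = label.view(-1).float()
        # Similar pairs (label 0) are penalized for being far apart.
        similar_term = (1 - label) * 0.5 * distance.pow(2)
        # Dissimilar pairs (label 1) are penalized only inside the margin.
        dissimilar_term = label * 0.5 * torch.clamp(self.margin - distance, min=0.0).pow(2)
        return torch.mean(similar_term + dissimilar_term)


criterion = ContrastiveLoss(margin=2.0)
a = torch.zeros(2, 128)
b = torch.ones(2, 128)
print(criterion(a, a, torch.zeros(2)).item())  # similar pair, identical embeddings
print(criterion(a, b, torch.ones(2)).item())   # dissimilar pair, far apart
print(criterion(a, b, torch.zeros(2)).item())  # similar pair, far apart
```

Flattening the label with `view(-1)` avoids accidental broadcasting when labels arrive with shape `(N, 1)` while the distances have shape `(N,)`.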
Training the Network:
The training process of a Siamese network is as follows:
- Initialize the network, loss function, and optimizer (we will be using Adam for this project).
- Pass the first image of the image pair through the network.
- Pass the second image of the image pair through the network.
- Calculate the loss using the outputs from the first and second images.
- Backpropagate the loss to calculate the gradients of our model.
- Update the weights using the optimizer.
- Save the model.
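The steps above can be sketched as a short training loop. To keep the example small and self-contained, it uses a tiny linear encoder and random tensors as stand-ins for the real network and dataset. The learning rate, margin, batch size, and the `train` helper itself are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def contrastive_loss(out1, out2, label, margin=2.0):
    """Contrastive loss with label 0 = similar pair, 1 = dissimilar pair."""
    distance = F.pairwise_distance(out1, out2)
    label = label.view(-1).float()
    pull = (1 - label) * 0.5 * distance.pow(2)
    push = label * 0.5 * torch.clamp(margin - distance, min=0.0).pow(2)
    return torch.mean(pull + push)


def train(model, loader, optimizer, epochs=1, save_path=None):
    model.train()
    history = []
    for _ in range(epochs):
        for image1, image2, label in loader:
            optimizer.zero_grad()
            out1 = model(image1)   # first image through the network
            out2 = model(image2)   # second image through the same network
            loss = contrastive_loss(out1, out2, label)
            loss.backward()        # gradients of the shared weights
            optimizer.step()       # weight update
            history.append(loss.item())
    if save_path is not None:
        torch.save(model.state_dict(), save_path)
    return history


torch.manual_seed(0)
# Stand-ins: a tiny encoder and random "image" pairs with random labels.
model = nn.Sequential(nn.Flatten(), nn.Linear(105 * 105, 16))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
data = TensorDataset(
    torch.rand(8, 1, 105, 105),
    torch.rand(8, 1, 105, 105),
    torch.randint(0, 2, (8, 1)).float(),
)
loader = DataLoader(data, batch_size=4)
history = train(model, loader, optimizer, epochs=2)
print(history)
```

Passing a file name as `save_path` writes the model's `state_dict` once training finishes, which covers the final step of the list.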
The model was trained for 20 epochs on Google Colab, which took about an hour. The graph of the loss over time is shown below.
[Image: Graph of loss over time]
Testing the model:
Now let's test our signature verification system on the test dataset:
- Load the test dataset using the DataLoader class from PyTorch.
- Pass the image pairs and the labels.
- Find the Euclidean distance between the images.
- Based on the Euclidean distance, print the output.
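The distance-based decision in the last two steps can be sketched as follows. The `verify` helper and the threshold of 1.0 are hypothetical. A usable threshold would have to be tuned on held-out pairs, and `nn.Flatten` stands in for a trained network only so that the example runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def verify(model, image1, image2, threshold=1.0):
    """Label each pair by comparing the embedding distance to a threshold."""
    model.eval()
    with torch.no_grad():
        distance = F.pairwise_distance(model(image1), model(image2))
    return [
        ("genuine" if d <= threshold else "forged", d)
        for d in distance.tolist()
    ]


# Stand-in "network" that just flattens the image into a vector.
model = nn.Flatten()
a = torch.zeros(1, 1, 4, 4)
b = torch.ones(1, 1, 4, 4)

for verdict, distance in verify(model, a, a) + verify(model, a, b):
    print(verdict, round(distance, 4))
```

Because the network outputs a distance rather than a probability, the threshold is the only thing that turns its output into a yes-or-no decision.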
The predictions were as follows,
Conclusion:
In this article, we discussed how Siamese networks differ from normal deep learning networks and implemented a signature verification system using Siamese networks. You can find the entire code here
Translated from: https://towardsdatascience.com/a-friendly-introduction-to-siamese-networks-85ab17522942