Learn AI Today 03: Potato Classification Using Convolutional Neural Networks
LEARN AI TODAY
This is the 3rd story in the Learn AI Today series! These stories, or at least the first few, are based on a series of Jupyter notebooks I’ve created while studying/learning PyTorch and Deep Learning. I hope you find them as useful as I did!
If you have not already, make sure to check the previous story!
What you will learn in this story:
- Potatoes Are Not All the Same
- Using Kaggle Datasets
- How Convolutional Neural Networks Work
- Using fastai2 to Make Your Life Easier
1. Kaggle Datasets
The Kaggle Datasets page is a good place to start if you want to find a public dataset. There are almost 50 thousand datasets on Kaggle, a number that grows every day as users create and upload new datasets to share with the world.
After having the idea of creating a potato classifier for this lesson, I quickly found this dataset, which contains 4 classes of potatoes as well as a lot of other fruits and vegetables.
Sample images from the fruits 360 dataset.
2. Convolutional Neural Networks (CNNs)
Convolutional Neural Networks are the building blocks of computer vision. These networks usually combine several layers of kernel convolution operations and downscaling.
The animation below is a great visualization of the kernel convolution operation. The kernel, a small matrix, usually 3x3, moves over the entire image. Instead of calling it an image, let’s refer to it as the input feature map to be more general.
Convolution example from the Theano documentation.
At each step, the values of the 3x3 kernel matrix are multiplied elementwise with the corresponding values of the input feature map (the blue matrix in the animation above), and the sum of those 9 products is the value for the output, resulting in the green matrix in the animation. The numbers in the kernel are parameters of the model to be learned. That way the model can learn to identify the spatial patterns that are the basis of computer vision. By having multiple layers and gradually downscaling the images, the patterns learned by each convolutional layer become more and more complex. To get a deeper intuition of CNNs I recommend this story by Irhum Shafkat.
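To make the arithmetic concrete, here is a minimal PyTorch sketch (mine, not from the original notebook) that slides a single hand-picked 3x3 kernel over a 5x5 input, just like in the animation:

```python
import torch
import torch.nn.functional as F

# 5x5 input feature map (the blue matrix), shape (batch, channels, height, width)
img = torch.arange(25.).reshape(1, 1, 5, 5)

# One 3x3 kernel, fixed by hand here; in a real CNN these 9 numbers are learned
kernel = torch.tensor([[[[0., 1., 0.],
                         [1., -4., 1.],
                         [0., 1., 0.]]]])

# Each output value is the sum of the 9 elementwise products at that position
out = F.conv2d(img, kernel)
print(out.shape)  # torch.Size([1, 1, 3, 3]) -- the green matrix
```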
The idea of CNNs has been around since the 80s, but it started to gain momentum in 2012 when the winners of the ImageNet competition used such an approach and ‘crushed’ the competition. Their paper describing the solution has the following abstract:
“We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.”
A top-5 error rate of 15.3% compared to 26.2% for the second-best entry is a huge breakthrough. Fast forward to today and the current best top-5 accuracy is 98.7% (an error rate of 1.3%).
Let’s now code a very simple CNN with just two convolutional layers and use it to create a potato classifier!
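The original story embeds the model as a gist, which is not reproduced here. The following is a minimal sketch reconstructed from the description below; the layer names and padding=1 are my assumptions (padding=1 keeps a 64x64 input at exactly 16x16 after the two poolings, matching the sizes discussed later):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicCNN(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        # 3 input channels (RGB) -> 32 output channels, 3x3 kernel
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        # 32 input channels (matching conv1's outputs) -> 64 output channels
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.linear = nn.Linear(64, n_classes)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))  # 64x64 -> 32x32
        x = F.relu(F.max_pool2d(self.conv2(x), 2))  # 32x32 -> 16x16
        x = F.adaptive_avg_pool2d(x, 1)             # average pool to (batch, 64, 1, 1)
        x = x.view(x.size(0), -1)                   # reshape to (batch, 64)
        return self.linear(x)                       # (batch, n_classes)
```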
The first convolutional layer, nn.Conv2d, has 3 input channels and 32 output channels with a kernel size of 3x3. The 3 input channels correspond to the RGB image channels. The number of output channels is simply a design choice.
The second convolutional layer has 32 input channels, matching the number of output channels of the previous layer, and 64 output channels.
Notice in the forward method that after each convolutional layer I apply F.max_pool2d and F.relu. The max-pooling operation simply downscales the image by selecting the maximum value of each 2x2 block of pixels, so the resulting image has half the size. The ReLU is a non-linear activation function, as I mentioned in lesson 1 of this series.
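A quick check of what these two operations do (again a sketch of mine, not the original code):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 32, 64, 64)            # one 64x64 feature map with 32 channels
print(F.max_pool2d(x, 2).shape)           # torch.Size([1, 32, 32, 32]) -- spatial size halved
print(F.relu(torch.tensor([-1.0, 0.5])))  # tensor([0.0000, 0.5000]) -- negatives zeroed out
```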
After two convolutions and max-poolings of size 2, the resulting feature map has 1/4 the width and height of the original image. I will be working with 64x64 images, so this results in a 16x16 feature map. I could add more of these convolutional layers, but at some point, when the feature map is already quite small, the usual next step is to use an Average Pooling to reduce the feature map to 1x1 simply by computing the average. Notice that, as we have 64 channels, the resulting tensor will have a shape of (batch-size, 64, 1, 1) that is then reshaped to (batch-size, 64) before applying the final linear layer.
The final linear layer has an input size of 64 and an output size equal to the number of classes to predict. For this case, it will be 4 types of potatoes.
Note: A good way to understand how everything works is to use the Python debugger. You can import pdb and include pdb.set_trace() right in the forward method. Then you can move step by step and check the shapes at each layer to build a better intuition or to help debug problems.
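For instance, dropping a breakpoint into the forward method of the sketch above could look like this (a hypothetical snippet, just to illustrate the idea):

```python
import pdb

def forward(self, x):
    pdb.set_trace()  # execution pauses here; 'n' steps one line, 'p x.shape' prints a shape
    x = F.relu(F.max_pool2d(self.conv1(x), 2))
    x = F.relu(F.max_pool2d(self.conv2(x), 2))
    x = F.adaptive_avg_pool2d(x, 1)
    x = x.view(x.size(0), -1)
    return self.linear(x)
```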
3. Using fastai2 to Make Your Life Easier
It isn’t worth wasting your time coding every step of the deep learning pipeline when there are tools that can make your life easier. That’s why in this story I’ll use the fastai2 library to do most of the work. Nevertheless, I will use the basic CNN model defined in the previous section. Note that fastai2 uses PyTorch and makes customization of every step easy, making it useful for both beginner and advanced deep learning practitioners and researchers.
The following 12 lines of code are the entire deep learning pipeline in fastai2, using the BasicCNN defined in the previous section! You can find the notebook with all the code for this lesson here.
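The original gist is not embedded here, so the following is a sketch reconstructed from the line-by-line walkthrough below; the exact argument values (such as lr_max) are my assumptions, and the line numbers referenced below map to the 12 lines after the import:

```python
from fastai.vision.all import *  # assumed fastai2-style star import

dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   get_x=ColReader('file'),
                   get_y=ColReader('id'),
                   splitter=RandomSplitter(),
                   item_tfms=Resize(64),
                   batch_tfms=aug_transforms() + [Normalize.from_stats(*imagenet_stats)])

dls = dblock.dataloaders(train_df)
model = BasicCNN(n_classes=dls.c)
learn = Learner(dls, model, loss_func=nn.CrossEntropyLoss(), metrics=accuracy)

learn.fit_one_cycle(30, lr_max=3e-3, wd=0.01)
```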
Lines 1-6: The fastai DataBlock is defined. I covered the topic of the fastai DataBlock in this story and this one. The ImageBlock and CategoryBlock indicate that the dataloaders will have an input of image type and a target of categorical type.
Lines 2 and 3: get_x and get_y are the arguments that take the functions used to process the inputs and targets. In this case, I will be reading from a pandas dataframe the column ‘file’ (with the path to each image file) and the column ‘id’ (with the type of potato).
Line 4: The splitter is the argument where you specify how to split the data into train and validation sets. Here I used RandomSplitter, which by default randomly selects 20% of the data for the validation set.
Line 5: A transformation is added to resize the images to 64x64.
Line 6: Normalization and image augmentations are included. Notice that I’m using the default augmentations. One nice thing about fastai is that most of the time you can use the default and it works. This is very good for learning because you don’t need to understand all the details before you start doing interesting work.
Line 8: The dataloaders object is created. (train_df is the dataframe with the file and id columns; check the full code here.)
Line 9: An instance of the BasicCNN model is created with 4 classes (notice that dls.c gives the number of classes automatically).
Line 10: The fastai Learner object is defined. This is where you indicate the model, loss function, optimizer and validation metrics. The loss function I will use is nn.CrossEntropyLoss which, as covered in the previous lesson, is the first choice for classification problems with more than 2 categories.
Line 12: The model is trained for 30 epochs using a one-cycle learning rate schedule (the learning rate increases quickly up to lr_max and then gradually decreases) and a weight decay of 0.01.
After training for 30 epochs I got a validation accuracy of 100% with this simple CNN model! This is what the training and validation loss look like as training progresses:
Train and validation loss evolution over the training. Image by the author.
And that’s it! If you followed along with the code, you can now identify 4 types of potatoes very accurately. And most importantly, nothing in this example is specific to potatoes! You can apply a similar approach to virtually anything you want to classify!
Homework
I can show you a thousand examples but you will learn the most if you do one or two experiments by yourself! The complete code for this story is available in this notebook.
- As in the previous lesson, try to play with the learning rate, number of epochs, weight decay and the size of the model.
- Instead of the BasicCNN model, try using a Resnet34 pretrained on ImageNet (take a look at fastai cnn_learner; see the sketch after this list). How do the results compare? You can try larger image sizes and activate the GPU on the Kaggle kernel to make training faster! (Kaggle provides you with 30h/week of GPU usage for free.)
- Now train the model using all the fruits and vegetables in the dataset and take a look at the results. The dataset also includes a test set that you can use to further test the trained model!
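For the pretrained Resnet34 item, a hedged sketch of what this could look like in fastai2 (the epoch count is arbitrary; dls is the dataloaders object from earlier):

```python
from fastai.vision.all import *

# cnn_learner downloads a pretrained Resnet34 and replaces its head to match dls.c classes
learn = cnn_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(5)  # train the new head first, then unfreeze and train the whole model
```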
And as always, if you create interesting notebooks with nice animations as a result of your experiments, go ahead and share them on GitHub, Kaggle or write a Medium story!
Final remarks
This ends the third story in the Learn AI Today series!
Please consider joining my mailing list in this link so that you won’t miss any of my upcoming stories!
I will also be listing the new stories at learn-ai-today.com, the page I created for this learning journey, and at this GitHub repository!
And in case you missed it before, this is the link for the Kaggle notebook with the code for this story!
Feel free to give me some feedback in the comments. What did you find most useful or what could be explained better? Let me know!
You can read more about my Deep Learning journey on the following stories!
Thanks for reading! Have a great day!
Source: https://towardsdatascience.com/learn-ai-today-03-potato-classification-using-convolutional-neural-networks-4481222f2806