Ocular Disease Recognition Using Convolutional Neural Networks
About this project
This project is part of the Algorithms for Massive Data course organized by the University of Milan, which I recently had the chance to attend. The task was to develop a deep learning model able to recognize eye diseases from eye-fundus images using the TensorFlow library. An important requirement was to make the training process scalable, i.e. to create a data pipeline able to handle massive amounts of data points. In this article, I summarize my findings on convolutional neural networks and on methods for building efficient data pipelines using the TensorFlow dataset object. The entire code, with reproducible experiments, is available in my GitHub repository: https://github.com/GrzegorzMeller/AlgorithmsForMassiveData
Introduction
Early ocular disease detection is an economical and effective way to prevent blindness caused by diabetes, glaucoma, cataract, age-related macular degeneration (AMD), and many other diseases. According to the World Health Organization (WHO), at least 2.2 billion people around the world currently have vision impairments, of whom at least 1 billion have an impairment that could have been prevented[1]. Rapid, automatic detection of diseases is critical and urgent for reducing ophthalmologists' workload and preventing vision damage in patients. Given high-quality medical eye-fundus images, computer vision and deep learning can detect ocular diseases automatically. In this article, I show different experiments and approaches towards building an advanced classification model using convolutional neural networks written with the TensorFlow library.
Dataset
Ocular Disease Intelligent Recognition (ODIR) is a structured ophthalmic database of 5,000 patients, containing patient age, color fundus photographs of the left and right eyes, and doctors' diagnostic keywords. This dataset is meant to represent a ‘‘real-life’’ set of patient information collected by Shanggong Medical Technology Co., Ltd. from different hospitals/medical centers in China. In these institutions, fundus images are captured by various cameras on the market, such as Canon, Zeiss, and Kowa, resulting in varied image resolutions. Annotations were labeled by trained human readers with quality control management[2]. They classify patients into eight labels: normal (N), diabetes (D), glaucoma (G), cataract (C), AMD (A), hypertension (H), myopia (M), and other diseases/abnormalities (O).
After preliminary data exploration, I found the following main challenges in the ODIR dataset:
· Highly unbalanced data. Most images are classified as normal (1140 examples), while specific diseases such as hypertension have only 100 occurrences in the dataset.
· The dataset contains multi-label diseases, because each eye can have not just a single disease but a combination of several.
· Images labeled as “other diseases/abnormalities” (O) are associated with more than 10 different diseases, stretching the variability even further.
· Very large and inconsistent image resolutions. Most images are around 2976x2976 or 2592x1728 pixels.
All these issues take a significant toll on accuracy and other metrics.
Data Pre-Processing
Firstly, all images are resized. In the beginning, I wanted to resize images “on the fly”, using the TensorFlow dataset object, so that images would be resized while training the model; I thought this would avoid a time-consuming one-off resizing step. Unfortunately, it was not a good decision: one epoch could take as long as 15 minutes, so I created a separate function to resize the images before creating the TensorFlow dataset object. As a result, the data are resized only once and saved to a different directory, and I could experiment with different training approaches with much faster training execution. Initially, all images were resized to 32x32 pixels, but I quickly realized that compressing to such a small size, even though it speeds up training significantly, loses a lot of important image information, so accuracy was very low. After several experiments I found that 250x250 pixels was the best compromise between training speed and accuracy, so I kept this size for all images in all further experiments.
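The one-off resizing step can be sketched as follows (a minimal sketch using PIL; the directory names are assumptions, and the actual helper is in the linked repository):

```python
from pathlib import Path
from PIL import Image

def resize_images(src_dir, dst_dir, size=(250, 250)):
    """Resize every image in src_dir once and save it to dst_dir,
    so later training runs can skip this expensive step."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.jpg"):
        with Image.open(path) as img:
            img.resize(size).save(dst / path.name)
```

Because the resized copies live in a separate directory, the original high-resolution images stay untouched and the step never has to be repeated between experiments.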
Secondly, images are labeled. There is a problem with the image annotations in the data.csv file: the labels relate to both eyes (left and right) at once, whereas each eye can have a different disease. For example, if the left eye has a cataract and the right eye has a normal fundus, the label would be cataract, saying nothing about the diagnosis of the right eye. Fortunately, the diagnostic keywords relate to a single eye. The dataset was designed so that the model receives both left and right eye images as input and returns an overall (both eyes) cumulative diagnosis, ignoring the fact that one eye can be healthy. In my opinion, this does not make sense from the perspective of a final user of such a model; it is better to get predictions separately for each eye, to know, for example, which eye should be treated. So, I enriched the dataset by creating a mapping from the diagnostic keywords to disease labels. This way, each eye is assigned a proper label. A fragment of this mapping, in the form of a dictionary, is presented in Fig. 1. Label information is added by renaming the image files, more specifically by appending to each file name one or more letters corresponding to the specific diseases. I chose this solution because it means I do not need to store an additional data frame with all the labels. Renaming files is a very fast operation, and in the official TensorFlow documentation, TensorFlow datasets are created directly from files, with the label information retrieved from the file name[3]. Moreover, some images whose annotations related not to a disease but to the low quality of the image, such as “lens dust” or “optic disk photographically invisible”, were removed from the dataset, as they play no role in determining the patient's disease.
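The keyword-to-label mapping and the renaming step can be sketched like this (the keywords and letter codes shown are an illustrative fragment only; the real dictionary is shown in Fig. 1, and the file-naming scheme is my assumption about the repository's convention):

```python
from pathlib import Path

# Illustrative fragment of the mapping; the real dictionary (Fig. 1)
# covers many more diagnostic keywords.
KEYWORD_TO_LABEL = {
    "normal fundus": "N",
    "cataract": "C",
    "pathological myopia": "M",
    "moderate non proliferative retinopathy": "D",
}

def label_image(path, keywords):
    """Encode the labels of one eye directly in the file name,
    e.g. '1_left.jpg' with ['cataract'] -> '1_left_C.jpg'."""
    labels = sorted({KEYWORD_TO_LABEL[k] for k in keywords if k in KEYWORD_TO_LABEL})
    p = Path(path)
    return p.with_name(f"{p.stem}_{''.join(labels)}{p.suffix}")

# A file would then be renamed on disk with Path.rename(label_image(...)).
```

Keeping the labels in the file names avoids carrying a separate label table alongside the images.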
Fig. 1: Fragment of the dictionary mapping specific diagnostic keywords to disease labels

Thirdly, the validation set is created by randomly selecting 30% of all available images. I chose 30% because this dataset is relatively small (only 7,000 images in total) and I wanted the validation set to be representative enough that evaluation would not be biased by image variants or classes missing from it. The ODIR dataset provides testing images, but unfortunately no label information for them is provided in the data.csv file, so I could not use them to evaluate the model.
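The random 30% hold-out can be sketched as follows (the directory layout and the fixed seed are assumptions for illustration):

```python
import random
import shutil
from pathlib import Path

def split_validation(train_dir, val_dir, fraction=0.3, seed=42):
    """Move a random fraction of the images into a validation directory."""
    files = sorted(Path(train_dir).glob("*.jpg"))
    random.seed(seed)
    val_files = random.sample(files, int(len(files) * fraction))
    Path(val_dir).mkdir(parents=True, exist_ok=True)
    for f in val_files:
        shutil.move(str(f), str(Path(val_dir) / f.name))
```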
Next, data augmentation was applied to the minority classes in the training set to balance the dataset: random zoom, random rotation, left-right flips, and top-bottom flips. In the beginning, I used the TensorFlow dataset object to apply data augmentation “on the fly” while training the model[4], in order to keep the solution as scalable as possible. Unfortunately, it lacks many features, such as random rotation, so I performed data augmentation before creating the TensorFlow dataset object, using other image-processing libraries such as OpenCV. I also considered enhancing all images with contrast-limited adaptive histogram equalization (CLAHE) to increase the visibility of local details, but since it added a lot of extra noise to the images (especially to the background, which is originally black), I decided not to follow that direction. Examples of data augmentation produced by my function, written with the PIL and OpenCV libraries, are presented in Fig. 2.
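A PIL-only sketch of such an augmentation function is shown below (the rotation range and zoom margins are assumed values, not the ones from the original code):

```python
import random
from PIL import Image

def augment(img, seed=None):
    """Apply one random transform: rotation, horizontal/vertical flip,
    or a central zoom; the output keeps the original size."""
    rng = random.Random(seed)
    w, h = img.size
    choice = rng.choice(["rotate", "flip_lr", "flip_tb", "zoom"])
    if choice == "rotate":
        return img.rotate(rng.uniform(-30, 30))   # size preserved (expand=False)
    if choice == "flip_lr":
        return img.transpose(Image.FLIP_LEFT_RIGHT)
    if choice == "flip_tb":
        return img.transpose(Image.FLIP_TOP_BOTTOM)
    # Zoom: crop a central region and resize it back to the original size.
    m = int(min(w, h) * rng.uniform(0.05, 0.15))
    return img.crop((m, m, w - m, h - m)).resize((w, h))
```

For each minority-class image, the function would be called several times with different seeds to generate the extra samples needed for balancing.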
Fig. 2: Exemplary data augmentation results

Finally, the TensorFlow dataset object is created. It is built very similarly to the one presented in the official TensorFlow documentation for loading images[5]. Since the library is complex and not easy for TensorFlow beginners to use, I would like to share a summary of my findings on building scalable and fast input pipelines. The tf.data API enables you to build complex input pipelines from simple, reusable pieces; for example, the pipeline for an image model might aggregate data from files in a distributed file system. The tf.data API introduces a tf.data.Dataset abstraction that represents a sequence of elements, in which each element consists of one or more components. In my image pipeline, an element is a single training example, with a pair of tensor components representing the image and its label[6]. Building on the idea of mini-batches, TensorFlow uses an iterative learning process: a portion of the data (not the entire dataset), called a batch, is fed to the model, the model is trained on it, and the process repeats with the next portion. The batch size defines how many examples are extracted at each training step; after each step, the weights are updated. I selected a batch size of 32 in order to avoid overfitting: with a small batch size, the weights keep updating regularly and often. The downside of a small batch size is that training takes much longer than with a bigger one. One important element of tf.data is the ability to shuffle the dataset. In shuffling, the dataset fills a buffer with elements, then randomly samples elements from this buffer, replacing the selected elements with new ones[7]. This prevents situations where images of the same class repeatedly fill a batch, which is not beneficial for training the model.
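A minimal version of such an input pipeline, following the official image-loading guide, could look like this (the file pattern is an assumption; in the full code the label would additionally be parsed from the letters appended to the file name, which is omitted here for brevity):

```python
import tensorflow as tf

IMG_SIZE = 250
BATCH_SIZE = 32

def decode_image(path):
    """Read one pre-resized image file and scale pixels to [0, 1]."""
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    return tf.image.convert_image_dtype(img, tf.float32)

def make_dataset(pattern):
    ds = tf.data.Dataset.list_files(pattern, shuffle=True)
    ds = ds.map(decode_image, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.shuffle(buffer_size=1000)       # sample from a 1000-element buffer
    ds = ds.batch(BATCH_SIZE)
    return ds.prefetch(tf.data.AUTOTUNE)    # overlap training and data loading
```

The `shuffle`, `batch`, and `prefetch` stages are exactly the pieces discussed above: the buffer-based shuffle breaks up runs of same-class images, and prefetching keeps the GPU fed while the next batch is being decoded.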
Building Convolutional Neural Network
In deep learning, a convolutional neural network (CNN) is a class of deep neural networks most commonly applied to analyzing visual imagery[8]. The input layer takes 250x250 RGB images. The first 2D convolution layer slides a window of 5x5 pixels over the input image to extract features and saves them in a multi-dimensional array; in my example, the number of filters in the first layer is 32, so the output is a (250, 250, 32) cube.
After each convolution layer, a rectified linear activation function (ReLU) is applied. The activation function decides whether a neuron should be activated, based on the weighted sum of its inputs. ReLU returns the input value directly, or 0.0 if the input is 0.0 or less. Because rectified linear units are nearly linear, they preserve many of the properties that make linear models easy to optimize with gradient-based methods. They also preserve many of the properties that make linear models generalize well[9].
To progressively reduce the spatial size of the input representation and minimize the number of parameters and computation in the network, a max-pooling layer is added. In short, for each region covered by a filter of a specific size, (5, 5) in my example, it takes the maximum value of that region and creates a new output matrix in which each element is the maximum of the corresponding region in the original input.
To avoid overfitting, two dropout layers of 45% were added, along with several batch normalization layers. Batch normalization is a technique for improving the speed, performance, and stability of artificial neural networks[10]. It shifts the distribution of neuron outputs so that they better fit the activation function.
Finally, the “cube” is flattened. No additional fully connected layers are implemented, to keep the network simple and training fast. The last layer is a dense layer with 8 units, because 8 is the number of labels (diseases) present in the dataset. Since we face a multi-label classification problem (a data sample can belong to multiple classes), a sigmoid activation function is applied to the last layer. The sigmoid function maps each score to a value between 0 and 1 independently of the other scores (in contrast to functions such as softmax), which is why sigmoid works best for multi-label classification problems. Since we use the sigmoid activation function, we must use the binary cross-entropy loss. The selected optimizer is Adam with a low learning rate of 0.0001, because of the overfitting problems I faced during training. The entire architecture of my CNN is presented in Fig. 3.
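Since Fig. 3 is not reproduced here, the exact number of convolution blocks and filters per layer are assumptions; a Keras sketch of a network matching the description above (5x5 convolutions with ReLU, (5, 5) max pooling, batch normalization, two 45% dropouts, an 8-unit sigmoid output, Adam at 0.0001 with binary cross-entropy) could look like:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_classes=8):
    model = models.Sequential([
        layers.Input(shape=(250, 250, 3)),
        layers.Conv2D(32, (5, 5), activation="relu"),
        layers.MaxPooling2D((5, 5)),
        layers.BatchNormalization(),
        layers.Conv2D(64, (5, 5), activation="relu"),   # filter count assumed
        layers.MaxPooling2D((5, 5)),
        layers.BatchNormalization(),
        layers.Dropout(0.45),
        layers.Flatten(),
        layers.Dropout(0.45),
        # Sigmoid (not softmax): each disease scored independently,
        # so one eye can be assigned several labels at once.
        layers.Dense(num_classes, activation="sigmoid"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```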
Fig. 3: Model summary

Experiments and Results
For simplicity, I wanted to start my research with easy proof-of-concept experiments on smaller, less challenging datasets, to test whether all the previous assumptions were correct. So I started by training a simple model to detect whether an eye has a normal fundus or a cataract, training only on images labeled as N (normal) or C (cataract). The results were very satisfactory: using a relatively simple network, after 12 epochs my model reached 93% validation accuracy. This already shows that with a CNN it is possible to detect eye cataracts automatically! In each subsequent experiment, I added images of another class to the dataset. The fourth experiment was performed on the entire ODIR dataset, achieving almost 50% validation accuracy. The results of the experiments are presented in Table 1. As we can clearly see, the overall model scores low because it is hard to train it to detect diabetes correctly: an eye with diabetes looks almost the same as an eye with a normal fundus. Detecting myopia or cataract is a much easier task, because those images vary a lot both from each other and from the normal fundus. Illustrations of the selected diseases are presented in Fig. 4.
Table 1: Experiment results. Legend: N — normal, C — cataract, M — myopia, A — AMD, D — diabetes, ALL — model trained on the entire ODIR dataset

Fig. 4: Illustration of different eye diseases. Diabetes is clearly the most challenging to detect, while cataract is the easiest, as it differs the most from the normal fundus.

For all experiments, the same neural network architecture was used. The only difference is the number of epochs each experiment needed to reach the presented results (some had to be stopped early, others needed more epochs to learn). Also, for the experiments that did not include the entire dataset, a softmax activation function and categorical cross-entropy loss were used, since those are multi-class, not multi-label, classification problems.
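The difference between the two output activations can be seen numerically; a small sketch in plain NumPy (the logits are made-up values for illustration):

```python
import numpy as np

logits = np.array([2.0, -1.0, 0.5])

# Sigmoid scores each class independently: the values need not sum to 1,
# so several diseases can be predicted at once (multi-label).
sigmoid = 1.0 / (1.0 + np.exp(-logits))

# Softmax couples the scores into one distribution that sums to 1,
# so the classes compete and exactly one dominates (multi-class).
softmax = np.exp(logits) / np.exp(logits).sum()

print(np.round(sigmoid, 3))   # each value independently in (0, 1)
print(softmax.sum())          # sums to 1 (up to float error)
```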
Final Considerations on Model Scalability
Nowadays, in the world of Big Data, it is crucial to evaluate every IT project on its scalability and reproducibility. From the beginning of this project's implementation, I put a lot of emphasis on the idea that, even though it is a research project, the model could be re-trained in the future with more eye-disease data points, and it would certainly achieve much better results with more images to train on. So the main goal was to build a universal data pipeline able to handle many more data points. This goal was mostly achieved by using the TensorFlow library, especially the dataset object, which supports ETL processes (Extract, Transform, Load) on large datasets. Unfortunately, some transformations had to be performed before creating the TensorFlow dataset object, namely image resizing and augmentation of the minority classes. Perhaps in the future it will be possible to resize images “on the fly” faster, and more augmentation functions, such as the random rotation mentioned earlier, will be added. And if more data points become available, augmentation may not be necessary at all, as enough image variation would already be present. Compared with other popular datasets used in deep learning projects, ODIR is a small one; that is why the data points had to be augmented and oversampled in order to achieve sensible results.
Summary
In this project, I have shown that it is possible to detect various eye diseases using convolutional neural networks. The most satisfying result is detecting cataracts with 93% accuracy. Examining all the diseases at once gave significantly lower results: with the ODIR dataset, it was not always possible to provide the training model with all the important variations of a specific disease, which affects the final metrics. Still, I am sure that a bigger dataset would increase the accuracy of the predictions and finally automate the process of detecting ocular diseases.
Translated from: https://towardsdatascience.com/ocular-disease-recognition-using-convolutional-neural-networks-c04d63a7a2da