Deep Learning vs. Traditional Computer Vision Techniques: Which Should You Choose?
Deep Learning (DL) is undeniably one of the most popular tools used in the field of Computer Vision (CV). It’s popular enough to be deemed the current de facto standard for training models to be later deployed in CV applications. But is DL the only available option for developing CV applications? What about the Traditional techniques that have served the CV community for decades? Has the time already arrived to move ahead & drop Traditional CV techniques altogether in favor of DL? In this article, we try to answer some of these questions with comprehensive use-case scenarios in support of both DL & Traditional CV implementations.
A Little Bit of History First
The field of Computer Vision started gaining traction as far back as the late 1950s through the late 1960s, when researchers wanted to teach computers “to be…h(huán)uman”. It was around this time that researchers tried to mimic the human visual system as a stepping stone toward endowing machines with human intelligence. Thanks to the extensive research done back then, Traditional CV techniques like Edge Detection, Motion Estimation & Optical Flow were developed.
It wasn’t until the 1980s that Convolutional Neural Networks (CNNs) were developed. The aptly named Neocognitron was the first CNN with the multilayered & shared-weight architecture we see in Neural Nets today. But the popularity of DL skyrocketed only after LeNet was developed by Yann LeCun & his colleagues in the 1990s. It was a pioneering moment, since no previous algorithm could ever achieve accuracy as incredibly high as CNNs.
A decade and a half later, CNNs have made such vast developments that they even outperform human beings, with accuracy rates as high as 99% on the popular MNIST dataset![1] No wonder it’s so easy to believe CNNs would come down as a messiah for the CV community. But I doubt that’s happening any time soon.
CNNs & DL techniques come with certain caveats which, when compared to older Traditional techniques, might make the latter look like a godsend for us mortals. This article should throw some light on those differences. We’ll gradually dive deeper into what those differences are & how they fare for certain use cases in the following sections.
The Differences
To properly contemplate the differences between the two approaches, Deep Learning & Traditional techniques, I like to give the example of comparing an SUV and a hatchback. They’re both four-wheelers, & an experienced driver can differentiate the pain points as well as the ease of use of both vehicles. While the hatchback might be preferable for dropping your child at school & bringing him/her back, the SUV would be perfect for cross-country travel if comfort & fuel efficiency are your concerns. The vice versa is possible but, in general, probably not advisable.
In this context, consider DL techniques the SUV among all ML tools used in Computer Vision, & the Traditional techniques the hatchback. Just as the SUV & the hatchback differ, DL & other Traditional techniques have their merits & demerits too. So what are those differences? Let’s take a look at some of the prominent ones, followed by a detailed description of the merits & demerits below.
The following infographic features some of the differences between the two approaches very briefly.
Comparison between Deep Learning & Traditional Techniques

The infographic makes it much clearer why Deep Learning is getting all the attention. A quick glance & you’ll notice so many tick marks that obviously DL has to be the best approach of the two, right? But is that really the case?
Let’s dig deeper & analyze the differences mentioned in the infographic.
Requirement of Manually Extracting Features From the Image by an Expert: A major drawback of the first few Machine Learning algorithms created during the early 1960s was the painstaking manual feature extraction they required. What little automation was employed back then also required careful tuning by an expert, since classifiers like SVM and/or KNN were applied on top of the hand-picked features. This requirement meant building a dataset with set features for the model to learn from. Hence, DL techniques were a real life-saver for practitioners, since they no longer have to worry about manually selecting the features for the model to learn from.
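To make the contrast concrete, here is a minimal sketch of the Traditional workflow: a hand-crafted feature (a gradient-orientation histogram, a simplified cousin of HOG) fed to a nearest-neighbour classifier. The feature design, the toy stripe images & the bin count are all illustrative assumptions, not a production recipe.

```python
import numpy as np

def edge_histogram(image, bins=8):
    """Hand-crafted feature: a histogram of gradient orientations,
    weighted by gradient magnitude (a toy cousin of HOG)."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)  # radians in [-pi, pi]
    hist, _ = np.histogram(orientation, bins=bins,
                           range=(-np.pi, np.pi), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist

def nearest_neighbor(train_feats, train_labels, query_feat):
    """1-NN classifier: predict the label of the closest training feature."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    return train_labels[int(np.argmin(dists))]

# Toy data: a vertical-stripe and a horizontal-stripe "image".
vertical = np.tile([0., 1.] * 8, (16, 1))       # 16x16, vertical edges
horizontal = vertical.T                          # horizontal edges
train = np.stack([edge_histogram(vertical), edge_histogram(horizontal)])
labels = np.array(["vertical", "horizontal"])

query = np.tile([0., 0., 1., 1.] * 4, (16, 1))  # wider, still vertical stripes
print(nearest_neighbor(train, labels, edge_histogram(query)))
```

A real pipeline would swap in HOG or SIFT features & an SVM, but the division of labour is the same: a human designs the features, and the classifier merely compares them.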
The Requirement of Heavy Computational Resources: Deep Learning is a computationally heavy task, which is why the mid-1950s barely saw any advancements in the field. Fast forward a few decades, & with the major advancements made in GPU capabilities & other related computational resources, the present time has never been more perfect for advancing DL research. But it comes with a caveat: bigger & better computational resources carry a hefty price tag, which might not be very pocket-friendly for most people & enterprises. Thus, in the context of easily available resources, the Traditional approach might look like a clear winner over Deep Learning.
The Need for Huge Labeled Datasets: We live at a point in time when, every second, thousands upon thousands of petabytes of data are being created & stored globally. That must be good news then, right? Sadly, contrary to popular belief, storing huge amounts of data, especially image data, is neither economically viable nor a sustainable business opportunity. You would be surprised to know that most enterprises are often sitting on a gold mine of a dataset, yet either they lack the expertise to benefit from it or the business can’t legally get rid of it. Hence, finding a useful, labeled & in-context dataset is no easy task, which is why Deep Learning is most often an overkill approach for a simple CV solution.
Black-Box Models Which Are Difficult to Interpret: Traditional approaches make use of easy-to-understand & interpretable statistical methods like SVM & KNN to find features for resolving common CV problems. DL, on the other hand, involves using very complex layers of Multilayered Perceptrons (MLPs). These MLPs extract informative features from the images by activating the relevant areas of the images, & those activations are often difficult to interpret. In other words, you’ll have no clue why certain areas of an image were activated while others weren’t.
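That said, the black box isn’t completely opaque. One simple probe, occlusion sensitivity, masks one patch of the image at a time & records how much the model’s score drops; big drops mark the regions the model relies on. The sketch below uses a dummy scoring function in place of a real network, so the numbers are purely illustrative.

```python
import numpy as np

def model_score(image):
    """Dummy 'model': scores how bright the centre region is.
    Stands in for a real network's class score."""
    return float(image[2:6, 2:6].mean())

def occlusion_map(image, patch=2):
    """Occlusion sensitivity: zero out each patch in turn and record how
    much the score drops. Large drops mark regions the model relies on."""
    base = model_score(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            masked = image.copy()
            masked[i:i + patch, j:j + patch] = 0
            heat[i // patch, j // patch] = base - model_score(masked)
    return heat

img = np.ones((8, 8))
heat = occlusion_map(img)
print(heat)  # non-zero only where patches overlap the centre region
```

With a real CNN you would replace `model_score` with the network’s class probability; the loop stays the same, which is exactly why this probe is popular despite being slow.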
Small & Easy Enough to Be Shipped and/or Deployed Inside a Microprocessor: Besides being computationally heavy, the models used in a DL approach are huge in size compared to those of simple Traditional approaches. These models often range from a few hundred megabytes to a gigabyte or two, which is absolutely massive. Traditional approaches, on the other hand, often output a model just a few megabytes in size.
How Accurate Are the Predictions From the Two Approaches: One of the winning factors that let DL completely overshadow the achievements of the Traditional approaches is how extremely accurate its predictions are. It was a massive leap in the late 1990s when Yann LeCun & his colleagues came up with LeNet. It completely blitzed the previous accuracy rates achieved using Traditional approaches. Ever since then, DL has almost become the de facto go-to tool for any Computer Vision problem.
Challenges of Deep Learning Techniques
Both DL & Traditional approaches have their trade-offs depending on the use case: Traditional approaches are more well established, while DL techniques show promising results with incredibly high accuracy rates. Without a doubt, DL-based techniques are the poster child of the Computer Vision community. But DL techniques have their own set of drawbacks.
In the following section, I describe a few of those challenges faced by DL techniques comprehensively.
With the advent of Cloud Computing services like GCP & other Cloud Machine Learning platforms like the Google AI Platform, high-performance resources are readily available at the click of a button. But the ease of access comes with a caveat: significant cost build-up. At first glance, a $3/hr high-performance GPU instance doesn’t sound too costly. But the expenditures build up over time as the business grows, & DL techniques take a lot of time to train as well.
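To see how quickly that builds up, here is a back-of-the-envelope calculation. The hourly rate is the example figure above; the hours per run & runs per month are made-up assumptions for illustration, not quotes from any provider.

```python
# Back-of-the-envelope cloud training cost. All figures except the
# hourly rate (the article's example) are illustrative assumptions.
GPU_RATE_USD_PER_HR = 3.0   # example rate from the text
HOURS_PER_RUN = 12          # assumed length of one training run
RUNS_PER_MONTH = 20         # assumed experiments per month

monthly = GPU_RATE_USD_PER_HR * HOURS_PER_RUN * RUNS_PER_MONTH
yearly = monthly * 12
print(f"~${monthly:,.0f}/month, ~${yearly:,.0f}/year")  # ~$720/month, ~$8,640/year
```

Even these modest assumptions land near five figures a year for a single team, before storage, egress & failed runs are counted.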
There are still certain fields of CV where DL techniques are yet to make any significant developments. Some of those fields include 3D Vision, 360° Cameras & SLAM, among many others. Until & unless DL techniques make progress towards resolving problems in those sub-fields, traditional techniques are here to stay for a long time. [2]
Quite surprisingly, certain individuals in the community appear to advocate a data-driven approach to resolving most CV problems. “Just increase the dataset size” is common knowledge in the community as of writing this article. But quite contrary to that opinion, the fundamental problem at the root is the quality of the data the models are trained on. There’s a popular saying in the community right now: “Garbage In, Garbage Out”. So until & unless a proper alternative to the data-driven approach is discovered, current DNNs will not perform better than what they’re already capable of.
Some Possible Solutions to the Aforementioned Challenges
Hybrid techniques can be leveraged extensively across various fields of implementation by using traditional techniques for only a portion of the computation process, while DNNs are employed for the identification and/or classification process. In other words, the end-to-end ML job can be divided into CPU-bound jobs & GPU-bound jobs. For example, preprocessing on the CPU while training on the GPU.
As multi-threaded CPUs become more common, I doubt it will take much longer for data pipelines to make preprocessing before training a breeze by taking advantage of the multi-threaded environment. Besides, it is observed that DNNs tend to be more accurate when the input data is preprocessed. Hence, it goes without saying, there’s a need to develop a system of data pipelines to be run on the CPU instead of the GPU.
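A minimal sketch of what such a CPU-side pipeline could look like, using only the Python standard library. The `normalize` step is a stand-in assumption for real preprocessing such as resizing or augmentation.

```python
# CPU-side preprocessing pipeline: fan the work out across worker
# threads while the GPU stays free for training.
from concurrent.futures import ThreadPoolExecutor

def normalize(image):
    """Scale pixel values to [0, 1]: a stand-in preprocessing step."""
    lo, hi = min(image), max(image)
    span = hi - lo or 1
    return [(p - lo) / span for p in image]

def preprocess_batch(images, workers=4):
    """Apply preprocessing to a batch of images on a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(normalize, images))

raw = [[0, 128, 255], [10, 20, 30]]   # two tiny one-row "images"
batch = preprocess_batch(raw)
print(batch)
```

Note that pure-Python threads share the GIL, so for CPU-bound work a real pipeline would reach for `ProcessPoolExecutor` or for libraries like NumPy or OpenCV that release the GIL inside their C routines.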
Today, Transfer Learning, or using a pre-trained model, is almost the de facto standard for training a new Image Classifier and/or Object Detection model. But the caveat is, this kind of model performs even better when the new input data is somewhat similar to that of the pre-trained model. So, once again, preprocessing the input data on the CPU & then training with a pre-trained model can significantly reduce Cloud Computing expenditures without any loss in performance.
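The idea can be sketched end to end in a few lines. Below, a frozen random projection stands in for the pre-trained convolutional base (a deliberate simplification), & only the new classification head is trained, which is the essence of transfer learning. The sizes, learning rate & toy task are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" feature extractor: a frozen random ReLU projection
# standing in for the convolutional base of a real pre-trained network.
W_frozen = rng.normal(size=(16, 2))

# Toy binary task: classify 2-D points by the sign of the first coordinate.
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)
feats = np.maximum(X @ W_frozen.T, 0.0)  # frozen weights + ReLU, never updated

# Train ONLY the new head (logistic regression) on top of frozen features.
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid
    grad = p - y                                 # gradient of the log-loss
    w -= 0.1 * feats.T @ grad / len(y)
    b -= 0.1 * grad.mean()

accuracy = (((feats @ w + b) > 0) == (y == 1)).mean()
print(f"head-only training accuracy: {accuracy:.2f}")
```

Training the tiny head converges in milliseconds on a CPU, which is the whole economic argument: the expensive part (the base) is paid for once, upstream.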
Employing a data-driven approach for certain business ventures might pay off in the future. But there’s always a logic-driven alternative, albeit one which mightn’t sound very attractive. So sticking with age-old, tried-and-tested logic-driven techniques cannot go wrong. The worst that could happen is you mightn’t make more money than you’re already earning.
Wrapping Up!
The developments made over the past two decades in Deep Learning techniques for Computer Vision applications are no doubt enticing. I mean, a research paper boasting of beating the human baseline on the MNIST dataset sounds amazing, almost futuristic. No wonder some entrepreneurs out there with a sky-high vision would talk big about the next big thing with their product. But we shouldn’t forget the fact that the Machine Learning research community is facing a reproducibility crisis.[3] Researchers tend to publish only the best experiment out of the many that worked as expected.
What does it mean for businesses & entrepreneurs looking forward to taking advantage of these supposedly bleeding-edge proofs-of-concept?
Simple: tread carefully.
At the end of the day, you’ll come back to employing Traditional techniques for your product anyway. The tried & tested techniques will almost always suit your needs.
So what’s the lesson here?
When in doubt, stick to Traditional techniques; Deep Learning has a long way to go & will take another eternity to be REALLY production-ready for your business.
[1] Savita Ahlawat, Amit Choudhary, et al., Improved Handwritten Digit Recognition Using Convolutional Neural Networks (CNN) (2018), MDPI
[2] Niall O’Mahony, et al., Deep Learning vs. Traditional Computer Vision, Institute of Technology Tralee (2019)
[3] Shlomo Engelson Argamon, People Cause Replication Problems, Not Machine Learning (2019), American Scientist (accessed on 14th August 2020)
Translated from: https://medium.com/discover-computer-vision/deep-learning-vs-traditional-techniques-a-comparison-a590d66b63bd