Moving AI to the Real World
If you fit one of these profiles, this article is for you:
● You are a data science manager. You’d like to improve your team’s productivity with some best practices.
● You are a data scientist. You’d like to learn what happens downstream: How your model turns into a product.
● You are a software architect. You are designing or expanding a platform to support data science use cases.
I recently completed an online course that I think you should check out. It’s called Full Stack Deep Learning. It covers the full lifecycle of an AI application, from ideation through deployment but it does not cover theory or model fitting. If you are an intermediate data scientist and want to “zoom out” from your niche, this course will show you how the sausage is made, tracking it from one station to the next.
The course started out as a pricey SF-based bootcamp in 2018 but is now available for free. It features some industry heavyweights, including Tesla’s Andrej Karpathy and fast.ai’s Jeremy Howard. I took it because I wanted to compare my own practices against what the celebrities do. By way of background, I am a partner at Genpact, a consulting company. I help clients transform their processes and sometimes their business using AI. In practice, this means I create a proof of concept (POC) to demonstrate potential value and then run the team that implements the solution to capture that value.
“Full Stack Deep Learning” exceeded my expectations. It is organized into six content areas as well as hands-on labs and guest lectures from AI luminaries. Here is what I found most new and useful:
1. Setting up ML Projects
This is an “executive” module that discusses planning, prioritizing, staffing, and scheduling AI projects.
I did not find much new here, but it is a good, concise executive overview. Since the course focuses on deep learning as opposed to traditional ML, it brings up three important points:
● Deep learning (DL), unlike more traditional machine learning, is “still research.” You should not plan for a 100% success rate
● If you are “graduating” from “classical” ML to DL, plan on spending a lot more time and money on labeling than you are used to…
● …but don’t throw out your playbook. In both cases, you are looking for settings where cheap prediction will have a large business impact
2. Infrastructure and Tooling
This is the module I found most helpful. It sets up a comprehensive framework for developing an AI/ML application, from the lab through production. At each layer or category, it covers key functionality, how it fits with other layers, and the major tool choices.
Let me emphasize: what makes this course different is how comprehensive the framework is. Most public AI/ML content is focused on model development. Some sources cover just data management or just deployment. Commercial vendors often understate the complexity of the process and skip steps. This is the most “panoramic” picture I’ve seen if you are trying to understand the AI/ML pipeline from alpha to omega.
The course is “opinionated” — it sometimes calls “category winners,” which is helpful if you’re placing bets. For example, it calls Kubernetes the winner in the “resource management” category. I agree with most of these calls, but not all of them. For example, among cloud providers it picks AWS and pans Azure as having a “bad user experience.” While AWS is excellent, several of our clients (rightly) chose Azure, particularly those that already have a Microsoft stack (Excel, MS SQL, etc.).
After setting up the overall framework, this module digs into Development and Training/Evaluation. I found three areas particularly interesting:
● Prototyping: I’m always looking for quick and easy ways to create proofs of concept (POCs) for clients. I need to produce a visually attractive, interactive POC that is easily accessible over a public or semi-public URL. My ideal solution would give me code-level control over the model while not making me code a lot of HTML or Javascript. One-click deployment is a plus. I’ve been using Shiny but would like to do something similar with Python. The course introduced me to streamlit, which I will be investigating further. Also interesting is dash, which is curiously not covered.
● Experiment Management is an interesting category: It keeps track of how well your model performs under a variety of configuration options (experiments). I coded my own version of this for competing on Kaggle. I didn’t know this was a category with a name. I will be checking out a few of the tools recommended by this course, including Weights & Biases.
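The core idea behind this category can be sketched in a few lines of stdlib Python. The class and file names below are made up for illustration; dedicated tools such as Weights & Biases layer a UI, artifact storage, and collaboration on top of the same log-and-query loop:

```python
import json

# Toy experiment tracker: record each (config, metric) pair and query the best run.
# A stand-in for the concept, not the API of any real tool.
class ExperimentLog:
    def __init__(self, path="experiments.jsonl"):
        self.path = path
        self.runs = []

    def log(self, config, metric):
        run = {"config": config, "metric": metric}
        self.runs.append(run)
        with open(self.path, "a") as f:   # append each run to disk as a JSON line
            f.write(json.dumps(run) + "\n")

    def best(self, maximize=True):
        key = lambda r: r["metric"]
        return max(self.runs, key=key) if maximize else min(self.runs, key=key)

log = ExperimentLog()
log.log({"lr": 0.1, "depth": 6}, metric=0.81)
log.log({"lr": 0.01, "depth": 8}, metric=0.86)
print(log.best()["config"])  # {'lr': 0.01, 'depth': 8}
```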
● All-in-one: There was a nice, informative comparison between all the all-in-one platforms available. AWS SageMaker and GCP AI look like the best choices at the moment. If pressed, I would bet others will be acquired or copied by the cloud providers.
3. Data Management
This module discusses how to store and manage datasets related to your pipeline. I did not find much new here. The material on data augmentation was interesting, but mostly applies to computer vision, which I have not done much of.
4. Machine Learning Teams
This module discusses the HR portion of the project: roles, team structure, managing projects, etc. In my view, this content belongs in module 1 above — Setting up ML Projects.
There were some interesting points about how to get a job in the field — for hiring managers and candidates. There is also a good summary of the typical roles in an ML project.
5. Training and Debugging
This module discusses the process of getting a model to work in the lab. It should really be required reading for every data scientist, and is similar to the workflow I used to win several Kaggle contests. You can also get this content in many other places, but this is a well-organized and succinct presentation.
The discussion around debugging DL models was particularly good: Get your model to run, overfit a single batch, and compare to a known result. Just to illustrate the depth, the course devotes a whole subsection to overfitting a single batch.
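As a rough illustration of the overfit-a-single-batch check, here is a minimal sketch using a one-parameter linear model and hand-coded gradient descent. In practice you would run your actual network on one real batch; the point is the same: if the loss will not go to near zero on a single batch, something is wired wrong.

```python
# Sanity check: a model that cannot drive the loss on ONE batch toward zero
# usually has a bug (wrong loss, broken gradients, bad data wiring).
batch_x = [1.0, 2.0, 3.0, 4.0]
batch_y = [2.0, 4.0, 6.0, 8.0]           # true relationship: y = 2x

w = 0.0                                   # toy model: y_hat = w * x

def mse(w):
    return sum((w * x - y) ** 2 for x, y in zip(batch_x, batch_y)) / len(batch_x)

initial_loss = mse(w)
for _ in range(200):                      # repeatedly fit the SAME batch
    grad = sum(2 * (w * x - y) * x for x, y in zip(batch_x, batch_y)) / len(batch_x)
    w -= 0.05 * grad
final_loss = mse(w)

print(f"loss {initial_loss:.3f} -> {final_loss:.6f}, w = {w:.3f}")
```

If the final loss plateaus well above zero here, you would start hunting for a bug rather than tuning hyperparameters.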
More tips on overfitting are at the end of the article.
6. Testing and Deployment
This module discusses how to get your model from the lab to the real world. It’s the module I was originally looking for when I took the class. I found several useful nuggets here:
Testing an ML system is very different from testing traditional software because its behavior is driven by the data as well as the algorithm.
You need to adjust your test suite accordingly. The course provides an excellent checklist for doing just that, taken from the now-famous paper Hidden Technical Debt in Machine Learning Systems.
The course recommends you check that training-time and production-time variables have approximately consistent distributions (Monitoring Test 3, above). This is a critical test. It can help you detect a runtime error, such as blanks in the data feed. It can also tell you it may be time to re-train the model because the input is different than what you expected. A simple way to accomplish this is to plot training data vs production-time data, variable by variable. The Domino Data Lab tool does this.
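A minimal sketch of that distribution check, using a hand-rolled two-sample Kolmogorov-Smirnov statistic on synthetic data. In practice you would likely reach for `scipy.stats.ks_2samp` or a monitoring platform rather than this stdlib version:

```python
import bisect
import random

def ks_statistic(a, b):
    """Two-sample KS statistic: max gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)

    def cdf(sample, x):
        # fraction of sample values <= x
        return bisect.bisect_right(sample, x) / len(sample)

    return max(abs(cdf(a, x) - cdf(b, x)) for x in a + b)

random.seed(0)
train = [random.gauss(0, 1) for _ in range(1000)]        # training-time feature
prod_ok = [random.gauss(0, 1) for _ in range(1000)]      # production feed, same distribution
prod_drift = [random.gauss(1.5, 1) for _ in range(1000)] # production feed, shifted

print(f"no drift:   {ks_statistic(train, prod_ok):.3f}")   # small gap
print(f"with drift: {ks_statistic(train, prod_drift):.3f}") # large gap -> alert
```

Run per variable, a large statistic flags exactly the situations described above: a broken feed or inputs that have drifted enough to justify retraining.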
A better way, which is not covered in the course, is to use adversarial validation: Train an auxiliary model (in production) which tries to classify an observation as belonging to train or prod data. If this model is successful at distinguishing the two, you have a significant distribution shift. You can then inspect the model to find the most important variables that drive that shift.
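Here is a toy sketch of adversarial validation on a single synthetic feature. The threshold “classifier” is a deliberately crude stand-in for the real model (e.g. gradient boosting) you would fit in practice, and the feature values are made up:

```python
import random

random.seed(1)
# Label each row by its source: 0 = training data, 1 = production data.
# Each row here is a single hypothetical feature value.
train_rows = [(random.gauss(0.0, 1), 0) for _ in range(1000)]
prod_rows = [(random.gauss(1.0, 1), 1) for _ in range(1000)]  # drifted feature
rows = train_rows + prod_rows

def best_threshold_accuracy(rows):
    """Crude one-feature classifier: pick the cut that best separates the sources."""
    best = 0.5
    for t in [x / 10 for x in range(-30, 31)]:
        correct = sum((x > t) == bool(label) for x, label in rows)
        best = max(best, correct / len(rows))
    return best

acc = best_threshold_accuracy(rows)
print(f"source-classifier accuracy: {acc:.2f}")
```

An accuracy near 0.5 means train and production are indistinguishable; well above 0.5 (as here) signals a shift, and inspecting the fitted model's feature importances tells you which variables drive it.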
Deployment is covered with a good introduction to Kubernetes and Docker, as well as GPU-based model serving.
Guest Lectures
The course includes guest lectures from industry heavyweights. The quality is highly variable. Some speakers are polished and prepared, others… not so much. I was most impressed with two guests:
● Jeremy Howard of fast.ai: This talk provided lots of “news you can use” in terms of improving model performance.
o The Fast.ai library is designed to use fewer resources (human and machine) to get good results. For example, training ImageNet in 3 hours for $25. This focus on efficiency is very much aligned with what our clients are looking for.
o Howard asks “Why are people trying to automate machine learning?” The idea is we can get much better results working together. He calls this “AugmentML” vs. “AutoML.” Platform.ai is a case in point. It is a labeling product that allows the labeler to have an interactive “conversation” with a neural network. Each iteration improves both the labels and the model. I’ve never seen anything like it, and it seems to work, at least on the video he shared.
o Howard shares a box of tricks for improving model performance, particularly for computer vision tasks. I found Test Time Augmentation (TTA) particularly eye-opening. I will have to try it in my next project.
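Conceptually, TTA just averages the model’s predictions over several augmented views of the same input instead of predicting once. A toy sketch, where the model and the augmentations are made up for illustration:

```python
# Test-Time Augmentation (TTA), conceptually: score several augmented copies
# of an input and average. The "model" and "image" are stand-ins.

def stub_model(image):
    # Pretend score in [0, 1]; a real model would be a trained network.
    return sum(image) / (len(image) * 10)

def horizontal_flip(image):
    return image[::-1]

def brighten(image, delta=1):
    return [min(p + delta, 9) for p in image]

def predict_with_tta(model, image, augmentations):
    views = [image] + [aug(image) for aug in augmentations]
    scores = [model(v) for v in views]
    return sum(scores) / len(scores)      # average over all views

image = [3, 5, 7, 9]
plain = stub_model(image)
tta = predict_with_tta(stub_model, image, [horizontal_flip, brighten])
print(f"single view: {plain:.3f}, TTA: {tta:.3f}")
```

With a real network, the averaging tends to smooth out prediction noise from any single view, which is where the accuracy gain comes from.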
● Andrej Karpathy of Tesla: This talk was interesting as well, although the audio wasn’t great. Karpathy discussed his Software 2.0 concept, the idea that we will increasingly use optimization methods like gradient descent to solve problems probabilistically rather than devising fixed software rules or heuristics to solve them. Like many others, I found this mental model compelling.
Parting Thoughts
The course is not perfect. A lot of this material was created in 2018 and is starting to show its age. Three examples:
● Richard Socher, chief scientist at Salesforce.com, is arguing for a unified NLP model with something called decaNLP. BERT has since taken over this niche, and GPT3 is an exciting recent development.
● Model Explainability has developed rapidly over the past few years, but is not well represented
● As mentioned above, Microsoft Azure has been making strides since then and does not get a fair shake in my view
Despite these nits, I think the course packs a lot of value into a compact and well-organized frame. The price is right, and I recommend it to anyone interested in understanding how AI/ML applications are built.
Lastly, and for no particular reason, I hope you will enjoy this thrilling conclusion.
Translated from: https://towardsdatascience.com/moving-ai-to-the-real-world-e5f9d4d0f8e8