AI: "Why is DevOps for Machine Learning so Different?" - Translation and Commentary
Contents
"Why is DevOps for Machine Learning so Different?" - Translation and Commentary
Current State of DevOps vs MLOps
Why So Different?
Workflows
Training
Live Predictions and Model Serving
Rollout
Monitoring
Governance
Summary
"Why is DevOps for Machine Learning so Different?" - Translation and Commentary
Original article: Why is DevOps for Machine Learning so Different? | Hacker Noon
Published: November 12, 2019
The term 'MLOps' is appearing more and more. Many from a traditional DevOps background might wonder why this isn't just called 'DevOps'. In this article we'll explain why MLOps is so different from mainstream DevOps and see why it poses new challenges for the industry. We'll see that the key differences between DevOps and MLOps come from how machine learning uses data, and that the need to handle data volume, transformation and quality affects the whole MLOps lifecycle.
Current State of DevOps vs MLOps
(Image source: https://yunyaniu.blog.csdn.net/article/details/79767367)
DevOps is a well-established set of practices to ensure smooth build-deploy-monitor cycles. It is based around CI/CD and infrastructure. The tool space includes git, Jenkins, Jira, Docker, Kubernetes, etc.
MLOps has not achieved the same level of maturity. As many as 87% of machine learning projects never go live. ML infrastructure is complex, and workflows extend beyond the production of artifacts to include data collection, preparation and validation. The hardware resources involved can be specialised (e.g. GPUs) and require management. The data flowing through the model and the quality of predictions can also require monitoring, resulting in a complex MLOps landscape.
Why So Different?
The driver behind all these differences can be found in what machine learning is and how it is practised. Software performs actions in response to inputs, and in this respect ML and mainstream programming are alike. But the way actions are codified differs greatly. Traditional software codifies actions as explicit rules. The simplest programming examples tend to be 'hello world' programs that simply codify that a program should output 'hello world'. Control structures can then be added for more complex ways to perform actions in response to inputs; as we add more control structures, we learn more of the programming language. This rule-based input-output pattern is easy to understand in relation to older terminal systems, where inputs are all via the keyboard and outputs are almost all text. But it is also true of most of the software we interact with, though the types of inputs and outputs can be very diverse and complex.
ML does not codify explicitly. Instead, rules are set indirectly by capturing patterns from data. This makes ML more suitable for a more focused type of problem that can be treated numerically. For example, predicting salary from data points/features such as experience, education, location, etc. This is a regression problem, where the aim is to predict the value of one variable (salary) from the values of other variables by use of previous data. Machine learning is also used for classification problems, where instead of predicting a value for a variable, the model outputs a probability that a data point falls into a particular class. Example classification problems are:

1. Given hand-written samples of numbers, predict which number is which.
2. Classify images of objects according to category, e.g. types of flowers.

We don't need to understand all the details here of how ML is done. However, it will help to have a picture of how ML models are trained. So let's consider at a high level what is involved in a regression problem such as predicting salary from data for experience, education, location, etc. This can be addressed by programmatically drawing a line through the data points. The line is embodied in an equation, in the simplest case something of the form salary = w * experience + b, where w and b are the model's coefficients/weights.
The coefficients/weights are set to initial values (e.g. at random). The equation can then be used on the training data set to make predictions. In the first run the predictions are likely to be poor. Exactly how poor can be measured as the error, which is the sum of the distances of all the output variable (e.g. salary) samples from the prediction line. We can then update the weights to try to reduce the error, and repeat the process of making new predictions and updating the weights. This process is called 'fitting' or 'training', and the end result is a set of weights that can be used to make predictions.

So the basic picture centres on running training iterations that update weights to progressively improve predictions. This helps to reveal how ML is different from traditional programming. The key points to take away from a DevOps perspective are:

1. The training data and the code together drive fitting.
2. The closest thing to an executable is a trained/weighted model. These vary by ML toolkit (TensorFlow, scikit-learn, R, H2O, etc.) and model type.
3. Retraining can be necessary. For example, if your model is making predictions for data that varies a lot by season, such as how many items of each type of clothing will sell in a month, then training on data from summer may give good predictions in summer but not in winter.
4. Data volumes can be large and training can take a long time.
5. The data scientist's working process is exploratory, and visualisations can be an important part of it.

This leads to different workflows for traditional programming and ML development.
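The fitting loop just described can be sketched in a few lines of Python. The salary figures and the learning-rate/iteration settings below are made-up illustrations, not values from the article:

```python
# Minimal gradient-descent fit of a line y = w * x + b, as in the
# fitting/training loop described above. Data and settings are illustrative.

def fit_line(xs, ys, lr=0.01, iterations=5000):
    """Repeatedly update weights to reduce the mean squared error."""
    w, b = 0.0, 0.0  # initial weights (could equally be random)
    n = len(xs)
    for _ in range(iterations):
        preds = [w * x + b for x in xs]
        # gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical training data: years of experience -> salary (in thousands)
experience = [1, 2, 3, 4, 5, 6]
salary = [35, 42, 49, 56, 63, 70]

w, b = fit_line(experience, salary)  # converges near w = 7, b = 28
```

The end result really is just a set of weights; everything downstream (serialisation, serving, monitoring) handles those weights rather than explicit rules.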
Workflows
Consider a traditional programming workflow:

1. User story
2. Write code
3. Submit PR
4. Tests run automatically
5. Review and merge
6. New version builds
7. Built executable deployed to an environment
8. Further tests
9. Promote to next environment
10. More tests, etc.
11. PROD
12. Monitor - stacktraces or error codes
The trigger for a build is a code change in git. The packaging for an executable is normally Docker. With machine learning the driver for a build might be a code change, or it might be new data. The data likely won't be in git due to its size. Tests on ML are not likely to be a simple pass/fail, since you're looking for quantifiable performance. One might choose to express performance numerically with an error level, and the acceptable level can vary a lot by business context. For example, consider a model that predicts the likelihood of a financial transaction being fraudulent. There may be little risk in predicting good transactions as fraudulent so long as the customer is not impacted directly (there may be a manual follow-up), but predicting bad transactions as good could be very high risk.
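As an illustration of such quantifiable, asymmetric test criteria, here is a hypothetical CI-style gate for the fraud example. The cost weights, labels and threshold are all assumptions a business would have to choose:

```python
# Gate a hypothetical fraud model on a cost-weighted error rather than a
# simple pass/fail. Cost weights and threshold are illustrative choices.

def weighted_error(y_true, y_pred, fn_cost=10.0, fp_cost=1.0):
    """Missing fraud (false negative) costs far more than a false alarm."""
    cost = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 0:    # fraud predicted as good: high risk
            cost += fn_cost
        elif truth == 0 and pred == 1:  # good flagged as fraud: manual follow-up
            cost += fp_cost
    return cost / len(y_true)

# 1 = fraudulent, 0 = legitimate (made-up evaluation labels)
actual    = [0, 0, 1, 0, 1, 0, 0, 1]
predicted = [0, 1, 1, 0, 1, 0, 0, 1]  # one false positive, no false negatives

THRESHOLD = 0.5  # maximum acceptable cost per transaction (a business decision)
score = weighted_error(actual, predicted)
assert score <= THRESHOLD, f"model rejected: cost {score} exceeds {THRESHOLD}"
```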
The ML workflow can also differ depending on whether the model can learn while it is being used (online learning) or the training takes place separately from making live predictions (offline learning). For simplicity, let's assume the training takes place separately. A high-level MLOps workflow could look like:

1. Data inputs and outputs. Preprocessed. Large.
2. Data scientist tries things locally with a slice of data.
3. Data scientist tries with more data as long-running experiments.
4. Collaboration - often in jupyter notebooks and git.
5. Model may be pickled/serialized.
6. Integrate into a running app, e.g. add a REST API (serving).
7. Integration test with the app.
8. Rollout and monitor performance metrics.
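The "model may be pickled/serialized" step above can be shown in miniature. A real workflow would pickle a toolkit model such as a scikit-learn estimator; the stand-in class here keeps the sketch dependency-free:

```python
# Serialize a trained model with pickle and load it back, as a serving
# solution would. A stand-in class replaces a real toolkit model here.
import pickle

class SalaryModel:
    """Holds learned weights and makes predictions (illustrative)."""
    def __init__(self, w, b):
        self.w, self.b = w, b
    def predict(self, x):
        return self.w * x + self.b

model = SalaryModel(w=7.0, b=28.0)
blob = pickle.dumps(model)     # in practice written to a file or object store
restored = pickle.loads(blob)  # what a serving solution does on load
assert restored.predict(3) == model.predict(3)
```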
The monitoring of performance metrics can be particularly challenging and may involve business decisions. Let's say we have a model being used in an online store and we've produced a new version. In these cases it is common to check the performance of the new version by performing an A/B test: a percentage of live traffic is given to the existing model (A) and a percentage to the new model (B). Say that over the period of the A/B test we find that B leads to more conversions/purchases. But what if it also correlates with more negative reviews, with more users leaving the site entirely, or is just slower to respond to requests? A business decision may be needed. The role of MLOps is to support the whole flow of training, serving, rollout and monitoring. Let's better understand the differences from mainstream DevOps by looking at some MLOps practices and tools for each stage of this flow.
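The A/B dilemma described above can be made concrete with toy numbers. All counts are invented for illustration; the point is that the metrics can pull in opposite directions:

```python
# Compare A/B variants on more than one metric. All counts are invented.
variant_a = {"requests": 10_000, "purchases": 420, "negative_reviews": 30}
variant_b = {"requests": 10_000, "purchases": 510, "negative_reviews": 55}

def rate(variant, metric):
    return variant[metric] / variant["requests"]

b_converts_better = rate(variant_b, "purchases") > rate(variant_a, "purchases")
b_reviews_worse = (rate(variant_b, "negative_reviews")
                   > rate(variant_a, "negative_reviews"))
# Both are True: the metrics disagree, so a business decision is needed.
```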
Training
Initially, training jobs can be run on the data scientist's local machine. But as the size of the dataset or the processing grows, a tool will be needed that can leverage specialised cloud hardware, parallelise steps and allow long-running jobs to run unattended. One tool for this is kubeflow pipelines.
Steps can be broken out as reusable operations and run in parallel. This helps address the need for steps that split the data into segments and apply cleaning and pre-processing. The UI allows for inspecting the progress of steps. Runs can be given different parameters and executed in parallel, which lets data scientists experiment with different parameters and see which result in a better model. Similar functionality is provided by MLflow experiments, polyaxon and others.

Many training platforms can be hooked up with continuous integration. For example, a training run could be triggered on a commit to git and the model could be pushed to make live predictions. But deciding whether a model is good for live use can involve a mixture of factors. It might be that the main factors can be tested adequately at the training stage (e.g. model accuracy on test data). Or it might be that only initial checks are done at the training stage and the new version is only cautiously rolled out for live predictions. We'll look at rollout and monitoring later - first we should understand what live predictions can mean.
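The "runs with different parameters" idea can be sketched without any pipeline tooling. The training routine below is a toy stand-in whose validation error happens to be minimised near 0.1; real platforms such as kubeflow pipelines or MLflow would run and track such experiments at scale and in parallel:

```python
# Launch the same toy training routine with several hyperparameter settings
# and keep the one with the lowest validation error.

def train_and_validate(regularisation):
    """Toy stand-in: pretend validation error is minimised near 0.1."""
    return (regularisation - 0.1) ** 2 + 0.5

runs = {reg: train_and_validate(reg) for reg in [0.001, 0.01, 0.1, 1.0]}
best_setting = min(runs, key=runs.get)  # the lowest-error run wins
```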
Live Predictions and Model Serving
For some models, predictions are made on a file of data points, or on a new file each week. This scenario is called offline prediction. In other cases predictions need to be made on demand; for these live use-cases the model is typically made available to respond to HTTP requests. This is called real-time serving. One approach to serving is to package a model by serializing it as a python pickle file and hosting that for the serving solution to load. For example, this is a serving manifest for Kubernetes using the Seldon serving solution (a tool on which I work):

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: sklearn
spec:
  name: iris
  predictors:
  - graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/sklearn/iris
      name: classifier
    name: default
    replicas: 1
```
The 'SeldonDeployment' is a Kubernetes custom resource. Within that resource we specify which toolkit was used to build the model (here scikit-learn) and where to obtain it (in this case a Google Cloud Storage bucket). Some serving solutions also cater for the model being baked into a docker image, but python pickles are common as a convenient option for data scientists. Submitting the serving resource to Kubernetes makes an HTTP endpoint available that can be called to get predictions. Often the serving solution will automatically apply any needed routing/gateway configuration, so that data scientists don't have to do so manually.
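A call to that endpoint might be constructed as below. The URL path follows Seldon's v1 REST convention, but the ingress host and namespace are placeholders (assumptions, not values from the article), and the request is only built here, not sent:

```python
# Building (not sending) a prediction request for the deployed model above.
import json
import urllib.request

host = "http://<ingress-host>"   # placeholder: the cluster's entry point
namespace = "<namespace>"        # placeholder: the kubernetes namespace

url = f"{host}/seldon/{namespace}/sklearn/api/v1.0/predictions"
payload = {"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}}  # one iris-style row

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would return the prediction as JSON.
```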
Rollout
Self-service for data scientists can also be important for rollout. Rollout can need careful handling because the model has been trained on a particular slice of data, and that data might turn out to differ from live data. The key strategies used to reduce this risk are:

1) Canary rollouts. With a canary rollout, a percentage of live traffic is routed to the new model while most of the traffic goes to the existing version. This runs for a short period as a check before switching all traffic to the new model.

2) A/B tests. With an A/B test, traffic is split between two versions of a model for a longer period of time. The test may run until a sufficient sample size is obtained to compare metrics for the two models. For some serving solutions (e.g. Seldon, KFServing) the traffic-splitting part can be handled by setting percentage values in the serving resource/descriptor. Again, this enables data scientists to set this up without getting into the details of traffic-routing or having to make a request to DevOps.

3) Shadowing. With shadowing, all traffic is sent to both the existing and new versions of the model. Only the existing/live version's predictions are returned as responses to live requests. The non-live model's predictions are not returned and are instead just tracked to see how well it is performing.
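The traffic-splitting that canary rollouts and A/B tests depend on boils down to probabilistic routing. Serving solutions implement this in their routing layer; this sketch uses stand-in model names and an assumed 10% canary share:

```python
# Probabilistic routing between the live model and a canary version.
import random

def route(new_model_share=0.1, rng=random.random):
    """Send roughly new_model_share of requests to the canary version."""
    return "canary" if rng() < new_model_share else "live"

random.seed(0)  # fixed seed so the sketch is repeatable
counts = {"canary": 0, "live": 0}
for _ in range(10_000):
    counts[route()] += 1
# counts["canary"] ends up close to 1,000, i.e. about 10% of traffic
```

An A/B test uses the same mechanism with a longer window and typically a larger share for B; shadowing instead sends every request to both versions and discards the non-live version's responses.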
Monitoring
Deciding between different versions of a model naturally requires monitoring. With mainstream web apps it is common to monitor requests to pick up on HTTP error codes or an increase in latency. With machine learning, monitoring can need to go much deeper into domain-specific metrics. For example, for a model making recommendations on a website it can be important to track metrics such as how often a customer makes a purchase vs chooses not to, or goes to another page vs leaves the site. It can also be important to monitor the data points in requests to see whether they are approximately in line with the data the model was trained on. If a particular data point is radically different from any in the training set, then the quality of the prediction for that data point could be poor. Such a point is termed an 'outlier', and in cases where poor predictions carry high risk it can be valuable to monitor for outliers. If a large number of data points differ radically from the training data, then the model risks giving poor predictions across the board - this is termed 'concept drift'. Monitoring for these can be advanced, as the boundaries for outliers or drift may take some experimenting to decide upon.
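A minimal sketch of outlier detection along these lines: record simple statistics of a feature over the training data, then flag live points that fall far outside. Production detectors are far richer; the 3-sigma rule and the numbers here are purely illustrative:

```python
# Flag live data points that sit far outside the training distribution.
import statistics

training_feature = [30, 35, 40, 45, 50, 55, 60]  # e.g. salaries seen in training
mean = statistics.mean(training_feature)
std = statistics.stdev(training_feature)

def is_outlier(value, threshold=3.0):
    """3-sigma rule: an illustrative boundary, tuned in practice."""
    return abs(value - mean) / std > threshold

live_values = [42, 48, 250]  # 250 is radically unlike the training data
outliers = [v for v in live_values if is_outlier(v)]

# If a large share of recent points are outliers, suspect concept drift.
drift_suspected = len(outliers) / len(live_values) > 0.5
```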
For metrics that can be monitored in real time, it may be sufficient to expose dashboards with a tool such as grafana. However, sometimes the information that reveals whether a prediction was good or not only becomes available much later. For example, there may be a customer account-opening process that flags a customer as risky. This could lead to a human investigation, and only later will it be decided whether the customer really was risky. For this reason it can be important to log the entire request and the prediction, and also store the final decision. Offline analysis run over a longer period can then provide a wider view of how well the model is performing. Support for custom metrics, request logging and advanced monitoring varies across serving solutions. In some cases a serving solution comes with out-of-the-box integrations (e.g. Seldon); in other cases the necessary infrastructure may have to be set up and configured separately.
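The "log the entire request and the prediction" idea can be sketched as follows. An in-memory list stands in for what would really be a log store or database, and the ids and outcomes are invented:

```python
# Append each request + prediction to a log keyed by id, so the (possibly
# much later) human decision can be joined back for offline analysis.
import json
import time

prediction_log = []  # stand-in for a log store or database

def log_prediction(request_id, features, prediction):
    prediction_log.append(json.dumps({
        "id": request_id,
        "features": features,
        "prediction": prediction,
        "timestamp": time.time(),
    }))

def record_outcome(request_id, outcome):
    """Attach the final decision to the logged request, once known."""
    for i, line in enumerate(prediction_log):
        entry = json.loads(line)
        if entry["id"] == request_id:
            entry["outcome"] = outcome
            prediction_log[i] = json.dumps(entry)

log_prediction("req-1", {"amount": 900}, "risky")
record_outcome("req-1", "legitimate")  # investigation concluded much later
```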
Governance
If something goes wrong with running software, then we need to be able to recreate the circumstances of the failure. With mainstream applications this means tracking which code version was running (docker image), which code commit produced it, and the state of the system at the time. That enables a developer to recreate the execution path in the source code. This is reproducibility. Achieving reproducibility for machine learning involves much more. It means knowing what data was sent in (full request logging), which version of the model was running (likely a python pickle), what source code was used to build it, what parameters were set on the training run, and what data was used for training. The data part can be particularly challenging, as it means retaining the data from every training run that goes live, in a form that can be used to recreate models. So any transformations on the training data would also need to be tracked and reproducible.
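A reproducibility record of the kind described might look like the sketch below. The commit value is a placeholder and the structure is an assumption; tools such as DVC or ModelDB formalise this kind of tracking:

```python
# Record enough metadata alongside a trained model to recreate the run.
import hashlib
import json

training_data = [(1, 35), (2, 42), (3, 49)]  # illustrative (experience, salary)

run_record = {
    # fingerprint of the exact training data used
    "data_sha256": hashlib.sha256(json.dumps(training_data).encode()).hexdigest(),
    "code_commit": "<git-commit-hash>",  # placeholder, not a real commit
    "params": {"learning_rate": 0.01, "iterations": 5000},
}
```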
The tool scene for tracking across the ML lifecycle is currently dynamic. There are tools such as ModelDB, kubeflow metadata, pachyderm and Data Version Control (DVC), among others. As yet, few standards have emerged as to what to track and how to track it. Typically, platforms currently just integrate with a particular chosen tool, or leave it to users of the platform to build any tracking they need into their own code.

There are also wider governance challenges for ML concerning bias and ethics. Without care, models might end up being trained using data points that a human would consider unethical to use in decision-making. For instance, a loan approval system might be trained on historic loan repayment data. Without a conscious decision about which data points are to be used, it might end up making decisions based on race or gender. Given concerns about bias, some organisations are putting an emphasis on being able to explain why a model made the prediction it did in a given circumstance. This goes beyond reproducibility, as being able to explain why a prediction was made can be a data science problem in itself ('explainability'). Some types of models, such as neural networks, are referred to as 'black box' because it is not easy to see why a prediction comes about from inspecting their internal structure. Black-box explanation techniques are emerging (such as Seldon's Alibi library), but for now many organisations for whom explainability is a key concern are sticking to white-box modelling techniques.
Summary
MLOps is an emerging area. MLOps practices are distinct from mainstream DevOps because the ML development lifecycle and artifacts are different. Machine learning works by using patterns from training data - this makes the whole MLOps workflow sensitive to data changes, volumes and quality. There is a wide range of MLOps tools available, but most are young, and compared with mainstream DevOps the tools may not yet interoperate very well. There are some initiatives towards standardisation, but currently the landscape is quite splintered, with big commercial players (including major cloud providers) each focusing primarily on their own end-to-end ML platform offering. Large organisations are having to choose whether an end-to-end offering meets their machine learning platform needs, or whether they instead want to assemble a platform themselves from individual (likely open source) tools.