Machine learning testing: Test-First machine learning

Published: 2023/12/15

Testing software is one of the most complex tasks in software engineering. While traditional software engineering has principles that define unambiguously how software should be tested, the same does not hold for machine learning, where testing strategies are not always defined. In this post, I describe a testing approach that is heavily influenced by one of the most recognized testing strategies in software engineering, test-driven development. It also appears to be agnostic to the family of machine learning models under test, and it adapts well to the typical production environments behind today's large-scale AI/ML services.

After reading this post, you will know how to set up a testing strategy for machine learning models with production in mind. "With production in mind" means that the team you work in is heterogeneous: the project under test is developed together with other data scientists, data engineers, business customers, developers, and testers. The goals of a good testing strategy are to achieve production readiness and to improve code maintainability.

An appropriate name for the approach is Test-First machine learning (TFML for short), because everything starts with writing tests rather than models.

Steps of TFML

A characteristic of TFML is that it starts with writing tests instead of machine learning models. The approach is based on mocking whatever is not yet available, so that the different actors involved in the project can proceed with their tasks anyway. Data scientists and data engineers are known to work at different paces. Mocking a particular aspect of the world that is not yet available not only mitigates this difference but also reduces blockers within larger teams. This, in turn, increases efficiency. Below are the five essential steps of a TFML approach.

1. Write a test

As the name suggests, Test-First in TFML means that everything starts with writing a test, even for a feature that does not yet exist. Such a test is usually very short and should stay that way. Larger and more complex tests should be broken down into their essential, testable components. A test can be written once the feature's specs and requirements are understood; these are usually discussed earlier, during requirement analysis (e.g. use cases and user stories).

A working test fails or passes for the right reasons. This is the step in which those reasons are defined. Defining the happy path is essential to defining what should be observed and what counts as a success.

3. Write the code

In this step, the code that leads to the happy path is actually written. This code will cause the test to pass. No code beyond the test's happy path should be provided. For example, if a machine learning model is expected to return 42, one can simply return 42 and force the test to succeed. If time constraints are needed, adding sleep(milliseconds) is also acceptable. Such mocked values give engineers visible constraints, so that they can proceed with their tasks as if the model were complete and working.
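
The mocked model from this step could look like the sketch below. It is only an illustration: `predict`, `MOCKED_PREDICTION`, and `MOCKED_LATENCY_S` are names I made up, and the 50 ms delay is an assumed value standing in for the sleep(milliseconds) call mentioned above.

```python
import time

MOCKED_PREDICTION = 42    # the value the test expects
MOCKED_LATENCY_S = 0.05   # assumed latency of the future model, in seconds


def predict(features):
    """Mock model: always returns the agreed value after a fixed delay."""
    time.sleep(MOCKED_LATENCY_S)  # makes the time constraint visible
    return MOCKED_PREDICTION


def test_predict_returns_expected_value():
    assert predict([0.1, 0.2]) == 42


def test_predict_latency_is_visible():
    start = time.perf_counter()
    predict([0.1, 0.2])
    elapsed = time.perf_counter() - start
    # Loose bounds: roughly the mocked delay, and well under one second.
    assert MOCKED_LATENCY_S * 0.8 <= elapsed < 1.0
```

Engineers who consume the model can already build against `predict` and its latency, long before a real model is trained.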

4. Run tests

Adding new tests should never break the previous ones. Having tests that depend on each other is considered an anti-pattern in software engineering.

5. Add functionality (+ cleanup + refactor)

Once values are mocked, success conditions are defined, and tests are running, it is time to show that the ML model under test is training and making predictions. Continuing the example above, some questions that should be answered in this step are:

Is the test breaking the constraints we set previously?
Is our ML model returning 84 rather than 42?
How about time constraints?

Traditionally, in this step developers perform code cleanup, deduplication, and refactoring (whenever it applies) to improve both readability and maintainability. The same strategy applies to ML developers.
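
To make the questions above concrete, here is a sketch in which the mock is replaced by a tiny trained model and the same constraints are checked again. Everything in it is an assumption for illustration: the toy data, the hand-written least-squares fit, the tolerance, and the latency budget.

```python
import time

LATENCY_BUDGET_S = 0.5  # assumed budget carried over from the mocking step
TOLERANCE = 1.0         # assumed acceptable deviation from 42


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def train():
    # Toy training data that roughly follows y = 2x
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.0, 8.1, 9.9]
    return fit_line(xs, ys)


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


def test_trained_model_respects_mocked_constraints():
    model = train()
    start = time.perf_counter()
    y = predict(model, 21.0)
    elapsed = time.perf_counter() - start
    assert abs(y - 42) <= TOLERANCE    # close to 42, and certainly not 84
    assert elapsed < LATENCY_BUDGET_S  # the time constraint still holds
```

If the real model drifted to 84 or blew the latency budget, this test would fail for exactly the reasons defined earlier.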

Falling into the trap of alternative approaches is easier in machine learning, due to its nature and to the enthusiasm of data scientists who connect, train, and analyze data in no time.

The most common approach in the data science community is probably the Test-Last approach, a.k.a. "code now, test later". This approach can be extremely risky in ML model development, since even a trivial linear regression may involve far more moving parts than traditional software does (e.g. UI, API calls, data streams, databases, preprocessing steps). In fact, the Test-First approach encourages, and even forces, developers to put the minimum amount of code into modules that depend on such moving parts (e.g. UIs and databases) and to implement the logic in the testable section of the codebase.
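
One way to picture this split is the sketch below, where the logic lives in a pure function and the moving part is reached only through a thin adapter. `normalize`, `preprocess_from`, and `FakeSource` are hypothetical names, and the `fetch_values` method is an assumed interface rather than any real database API.

```python
def normalize(values):
    """Pure logic: min-max scale to [0, 1]. No I/O, so it is easy to test."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess_from(source):
    """Thin adapter: the only code that touches the moving part."""
    return normalize(source.fetch_values())


class FakeSource:
    """Stands in for a database or data stream that may not exist yet."""

    def __init__(self, values):
        self._values = values

    def fetch_values(self):
        return list(self._values)


def test_normalize_needs_no_database():
    assert normalize([10, 20, 30]) == [0.0, 0.5, 1.0]


def test_adapter_only_delegates():
    assert preprocess_from(FakeSource([2, 4])) == [0.0, 1.0]
```

The adapter is kept so small that almost nothing in it can go wrong, while the logic worth testing sits in `normalize`.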

One important pitfall to avoid is developer bias. Tests created in a Test-First environment are usually created by the same developer who is writing the code being tested. This can be a problem e.g. if a developer does not consider certain input parameters to be checked. In that case, neither the test nor the code will verify such parameters. There is a reason why in traditional software development, testing engineers and developers are usually not the same individuals.

TFML anti-patterns

Below are some anti-patterns in TFML.

Test dependence

Tests should be standalone. Tests that depend on others can lead to cascading failures, or successes, that are outside the developer's control.
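
A minimal sketch of standalone tests, using Python's standard `unittest` module, could look like this. `TinyModel` is a made-up stand-in for a model with mutable state, and `run_in_order` is a hypothetical helper that runs the tests in whatever order it is given.

```python
import io
import unittest


class TinyModel:
    """Hypothetical stand-in for a model with mutable state."""

    def __init__(self):
        self.seen = 0

    def update(self, n):
        self.seen += n
        return self.seen


class TestTinyModel(unittest.TestCase):
    def setUp(self):
        # A fresh model per test: no state leaks from one test to the next.
        self.model = TinyModel()

    def test_update_once(self):
        self.assertEqual(self.model.update(3), 3)

    def test_update_twice(self):
        self.model.update(3)
        self.assertEqual(self.model.update(4), 7)


def run_in_order(names):
    """Run the named test methods in the given order; True if all pass."""
    suite = unittest.TestSuite(TestTinyModel(name) for name in names)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite).wasSuccessful()
```

Because every test builds its own fixture in `setUp`, the suite passes in any order, which is a quick check that no test depends on another.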

Test model precisely

As in traditional software engineering, testing precise execution behavior, timing, or performance can lead to needless test failures. In machine learning, it is even more important to consider soft constraints, because models can be probabilistic. Moreover, the ranges of output variables and input data can change. Such dynamic and sometimes loosely defined behavior is the norm rather than the exception in ML.
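
A soft constraint can be expressed as a tolerance or a range instead of an exact value. The sketch below assumes a made-up probabilistic model, `noisy_predict`, whose output is a signal plus Gaussian noise; the bounds are illustrative and would have to be agreed on per project.

```python
import random
import statistics


def noisy_predict(x, rng):
    """Hypothetical probabilistic model: signal 2 * x plus Gaussian noise."""
    return 2.0 * x + rng.gauss(0.0, 1.0)


def test_prediction_within_soft_bounds():
    rng = random.Random(0)  # fixed seed keeps the test reproducible
    samples = [noisy_predict(21.0, rng) for _ in range(1000)]
    mean = statistics.fmean(samples)
    # Soft constraint: the average is near 42, not exactly 42.
    assert abs(mean - 42.0) < 0.5
    # Range constraint: every sample stays within a generous band.
    assert all(30.0 < s < 54.0 for s in samples)
```

An exact check such as `noisy_predict(21.0, rng) == 42` would fail on almost every run, even though the model behaves as intended.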

Test model's mathematical details

Testing model implementation details, such as statistical and mathematical soundness, is not part of the TFML strategy. Such details should be tested separately and are specific to the family of the model under consideration.

Large testing unit

The testing surface should always be minimal for the functionality under test. Keeping the testing unit small gives the developer more control. Larger testing units should be broken down into smaller tests, each specialized in one particular aspect of the models to be tested.

Conclusion

The TFML approach forces developers to spend time up front defining the testing strategy for their models. This in turn facilitates the integration of such models into the bigger picture of complex engineering systems where larger teams are involved. It has been observed that programmers who write more tests tend to be more productive. Testing code is as important as developing the software's core functionality, and it should be produced and maintained with the same rigor as production code. In ML all this becomes even more critical, due to the heterogeneity of the systems and of the people involved in ML projects.

Originally published at https://codingossip.github.io on August 4, 2020.

Translated from: https://medium.com/swlh/test-first-machine-learning-8d2cadc3ffe