How We Use AutoML, Multi-Task Learning, and Multi-Tower Models for Pinterest Ads
Ernest Wang | Software Engineer, Ads Ranking
People come to Pinterest in an exploration mindset, often engaging with ads the same way they do with organic Pins. Within ads, our mission is to help Pinners go from inspiration to action by introducing them to the compelling products and services that advertisers have to offer. A core component of the ads marketplace is predicting Pinners' engagement with the ads we show them. In addition to click prediction, we look at how likely a user is to save or hide an ad. We make these predictions for different ad formats (image, video, carousel) and in the context of the user (e.g., browsing the home feed, performing a search, or looking at a specific Pin).
In this blog post, we explain how key technologies, such as AutoML, DNN, Multi-Task Learning, Multi-Tower models, and Model Calibration, allow for highly performant and scalable solutions as we build out the ads marketplace at Pinterest. We also discuss the basics of AutoML and how it’s used for Pinterest Ads.
AutoML
Pinterest’s AutoML is a self-contained deep learning framework that powers feature injection, feature transformation, model training, and serving. AutoML provides a simple descriptive template for fusing a variety of pre-implemented feature transforms so that deep neural networks can learn from raw signals. This significantly reduces the human labor in feature engineering. AutoML also provides rich model representations that employ state-of-the-art machine learning techniques. We developed our ads CTR prediction models with AutoML, which has yielded substantial gains.
特征處理 (Feature processing)
While many data scientists and machine learning engineers believe that feature engineering is more of an art than a science, AutoML finds many common patterns in this work and automates the process as much as possible. Deep learning theory has demonstrated that deep neural networks (DNNs) can approximate arbitrary functions given enough resources. AutoML leverages this advantage and enables us to learn directly from raw features by applying a series of predefined feature transform rules.
AutoML first characterizes features into generic signal formats:
Continuous: a single floating-point-valued feature that can be consumed directly
OneHot: single-valued categorical data that usually goes through an embedding lookup layer, e.g., user country and language
Indexed: multi-hot categorical features that usually go through embedding and then projection/MLP summarization layers
Hash_OneHot: one-hot data with unbounded vocabulary size
Hash_Indexed: indexed data with unbounded vocabulary size
Dense: a dense floating-point-valued vector, e.g., GraphSage [6] embeddings
Then the feature transforms are performed according to the signal format and the statistical distribution of the raw signal:
- Continuous and dense features usually go through squashing or normalization
- One-hot and multi-hot encoded signals are looked up as embeddings and then projected
- Categorical signals with unbounded vocabulary are hashed and converted to one-hot and multi-hot signals
This saves the usually tedious feature engineering work, letting machine learning engineers focus more on signal quality and modeling techniques.
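As an illustration, the characterize-then-transform flow above can be sketched as a small dispatch table. The function names, transform choices, and configuration shape here are hypothetical, not Pinterest's actual AutoML internals:

```python
# Hypothetical sketch: each raw signal is tagged with a generic format, and a
# predefined transform is applied based on that tag.

def squash(x):
    """Squash an unbounded continuous value into (-1, 1)."""
    return x / (1.0 + abs(x))

def hash_to_index(token, vocab_size):
    """Map an unbounded-vocabulary token to a bounded index (for one-hot use)."""
    return hash(token) % vocab_size

TRANSFORMS = {
    "continuous":  lambda v, cfg: squash(v),
    "onehot":      lambda v, cfg: cfg["vocab"].index(v),         # -> embedding lookup index
    "hash_onehot": lambda v, cfg: hash_to_index(v, cfg["size"]),
}

def transform_feature(fmt, value, cfg=None):
    return TRANSFORMS[fmt](value, cfg or {})
```

In this sketch, adding a new signal format is just another entry in the dispatch table, which is the property that lets a descriptive template replace hand-written feature code.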
Model structure
AutoML leverages state-of-the-art deep learning technologies to empower ranking systems. The model consists of multiple layers that have distinct, yet powerful, learning capabilities.
The representation layer: The input features are formulated in the representation layer. The feature transforms described in the previous section are applied on this layer.
The summarization layer: Features of the same type (e.g., Pin’s category vector and Pinner’s category vector) are grouped together. A common representation (embedding) is learned to summarize the signal group.
The latent cross layer: The latent cross layers concatenate features from multiple signal groups and conduct feature crossing with multiplicative layers. Latent crossing enables high-degree interactions among features.
The fully connected layer: The fully connected (FC) layers implement the classic deep feed-forward neural network.
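The latent cross and fully connected layers can be sketched in a few lines. This is a toy illustration of the multiplicative-crossing idea, not the production model:

```python
# Toy illustration of a latent cross: concatenate two signal-group embeddings
# and append their element-wise product, so multiplicative interactions are
# exposed to the fully connected layers that follow.

def latent_cross(group_a, group_b):
    """Concatenate two group embeddings plus their element-wise product."""
    assert len(group_a) == len(group_b)
    product = [a * b for a, b in zip(group_a, group_b)]
    return group_a + group_b + product

def fully_connected(x, weights, bias):
    """A single dense unit with ReLU, as in the classic feed-forward stack."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return max(0.0, z)
```

The appended product terms are what let the following layers learn feature crosses without enumerating them by hand.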
Key learnings
As sophisticated as AutoML is, the framework can be sensitive to errors or noise introduced into the system. It’s critical to ensure the model’s stability to maximize its learning power. We found several factors that significantly affect the quality of AutoML models during our development:
Feature importance: AutoML gives us a chance to revisit the signals used in our models. Some signals that stood out in the old GBDT models (see the Calibration section) are not necessarily significant in DNNs, and vice versa. Bad features are not only useless to the model; the noise they introduce can actively degrade it. We therefore built a feature importance report using the random permutation [7] technique, which greatly facilitates model development.
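A minimal sketch of the random-permutation importance technique [7]: shuffle one feature column at a time and measure how much a fixed model's metric degrades. The model and metric here are caller-supplied stand-ins, not the real ranking model:

```python
import random

# Sketch of random-permutation feature importance: the metric drop after
# shuffling a column estimates how much the model relies on that feature.

def permutation_importance(model, rows, labels, n_features, metric, seed=0):
    rng = random.Random(seed)
    base = metric(model, rows, labels)
    importances = []
    for j in range(n_features):
        shuffled = [row[:] for row in rows]
        column = [row[j] for row in shuffled]
        rng.shuffle(column)
        for row, v in zip(shuffled, column):
            row[j] = v
        # Importance = metric drop when feature j carries no information.
        importances.append(base - metric(model, shuffled, labels))
    return importances
```

A feature whose permutation barely moves the metric is a candidate for removal, since it may only be adding noise.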
The distribution of feature values: AutoML relies on a “normal” distribution of feature values since it skips human engineering. The feature transforms defined in the representation layer may, however, sometimes fail to capture extreme values. These will disrupt the stability of the subsequent neural networks, especially the latent cross layers, where extreme values are amplified and passed to the next layer. Outliers in both training and serving data must be properly managed.
Normalization: (Batch) normalization is one of the most commonly used deep learning techniques. Beyond that, we find that min-max normalization with value clipping is particularly useful for the input layer. It’s a simple yet effective treatment for the outliers in feature values mentioned above.
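The min-max normalization with value clipping can be sketched as follows, assuming the clipping bounds (e.g., low/high percentiles) have been estimated from training data beforehand:

```python
# Sketch of min-max normalization with value clipping: extreme values are
# clipped to precomputed bounds before scaling into [0, 1], so outliers
# cannot destabilize the layers above.

def minmax_clip(x, lo, hi):
    """Clip x into [lo, hi], then rescale to [0, 1]."""
    if hi <= lo:
        return 0.0  # degenerate bounds: emit a constant
    x = min(max(x, lo), hi)
    return (x - lo) / (hi - lo)
```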
Multi-task and multi-tower
Besides clickthrough rate (CTR) prediction, we also estimate other user engagement rates as a proxy for comprehensive user satisfaction. Those user engagements include but are not limited to good clicks (clickthroughs where the user doesn’t bounce back immediately) and scroll-ups (the user scrolls up on the ad to reach more content on the landing page). DNNs allow us to learn multi-task models. Multi-task learning (MTL) [1] has several advantages:
Simplify the system: Learning a model for each of the engagement types and maintaining them all can be a difficult and tedious task. The system is much simplified and engineering velocity improves if we can train more than one of them at the same time.
Save infra cost: With a common underlying model shared by multiple heads, repeated computation can be minimized at both serving and training time.
Transfer knowledge across objectives: Learning different yet correlated objectives simultaneously enables the tasks to share knowledge with each other.
Each of the engagement types is defined as an output of the model. The loss function of an MTL model looks like:
L = −(1/n) Σ_{i=1..n} Σ_{j=1..k} [ y_ij · log ŷ_ij + (1 − y_ij) · log(1 − ŷ_ij) ]

where n denotes the number of examples, k the number of heads, and ŷ and y the prediction and true label, respectively.
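As a sketch, this multi-task loss (summed binary cross-entropy over n examples and k engagement heads) translates directly into code:

```python
import math

# Sketch of the multi-task loss: summed binary cross-entropy over n examples
# and k engagement heads (click, good click, scroll-up, ...).

def mtl_loss(preds, labels, eps=1e-7):
    """preds and labels are n x k matrices of per-head predictions / 0-1 labels."""
    n = len(preds)
    total = 0.0
    for pred_row, label_row in zip(preds, labels):
        for y_hat, y in zip(pred_row, label_row):
            y_hat = min(max(y_hat, eps), 1.0 - eps)  # numerical safety
            total += -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
    return total / n
```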
Apart from MTL, we face another challenge with the Pinterest Shopping and Standard ads products, which are distinct in many ways:
Creatives: The images of Standard Ads are partners’ creatives; Shopping Ads are crawled from partners’ catalogs.
Inventory size: Shopping Ads’ inventory is many times larger than Standard Ads’.
Features: Shopping Ads have unique product features such as texture, color, and price.
User behavior patterns: Pinners with stronger purchase intent tend to engage with Shopping Ads.
We had been training and serving the Shopping and Standard models separately before adopting DNNs. With the help of AutoML, we started consolidating the two models. We then encountered a paradox: although the individual DNN models trained on Shopping or Standard data respectively outperformed the old models, a consolidated model that learned from a combination of Shopping and Standard data did not outperform either of the old individual models. Our hypothesis was that a single-tower structure fails to learn the distinct characteristics of the two data sources simultaneously.
The shared-bottom, multi-tower model architecture [1] was hence employed to tackle the problem. We use the existing AutoML layers as the shared bottom of the two data sources. The multi-tower structure is implemented as separate multilayer perceptrons (MLP) on top of that. Examples from each source only go through a single tower. Those from other sources are masked. For each tower, every objective (engagement type) is trained with an MLP. Figure 3 illustrates the model architecture.
Figure 3: Multi-task, multi-tower structure

The multi-tower structure is effective in isolating the interference between training examples from different data sources, while the shared bottom captures the common knowledge of all the data sources.
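The routing logic can be sketched as follows; the shared bottom and towers below are trivial stand-in functions rather than the real AutoML layers and MLPs:

```python
# Sketch of shared-bottom, multi-tower routing: every example passes through
# the shared bottom, then only the tower matching its data source produces
# predictions, so the other tower is effectively masked.

def shared_bottom(features):
    return [f * 0.5 for f in features]  # placeholder shared representation

TOWERS = {
    "shopping": lambda h: {"ctr": sum(h) * 0.1},  # per-objective heads would
    "standard": lambda h: {"ctr": sum(h) * 0.2},  # each be a separate MLP
}

def forward(example):
    hidden = shared_bottom(example["features"])
    tower = TOWERS[example["source"]]  # examples only go through one tower
    return tower(hidden)
```

During training, the masked tower receives no gradient from an example, which is what isolates the two data sources' interference.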
We evaluated the offline AUC and log-loss of the proposed multi-tower model against a single-tower baseline model. The results are summarized in Table 1. We found that the performance of the proposed model is better on both Shopping and Standard Ads; on the Shopping Ads slice in particular, we observed significant improvement. We further validated the results through online A/B tests, which demonstrated positive gains consistent with the offline evaluation.
Calibration
Calibration represents the confidence in the probability predictions, which is essential to Ads ranking. For CTR prediction models, calibration is defined as:
calibration = Σᵢ p̂ᵢ / Σᵢ yᵢ

i.e., the ratio of the sum of predicted click probabilities p̂ᵢ to the number of observed clicks yᵢ; a perfectly calibrated model yields a ratio of 1.
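As a sketch, this calibration metric (ratio of summed predicted CTRs to observed clicks) is a one-liner:

```python
# Sketch of the calibration metric: the ratio of the sum of predicted CTRs
# to the number of observed clicks; a well-calibrated model yields ~1.

def calibration(predictions, clicks):
    """predictions: per-impression predicted CTRs; clicks: 0/1 click labels."""
    return sum(predictions) / sum(clicks)
```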
The calibration model of the Pinterest Ads ranking system has evolved through three stages:
GBDT + LR hybrid [5]: Gradient boosted decision trees (GBDT) are trained against the CTR objective. The GBDT model is featurized and embedded into a logistic regression (LR) model that optimizes the same objective. LRs by nature generate calibrated predictions.
Wide & deep: We rely on the wide component (also an LR model) of the wide & deep model [2] for calibration.
AutoML + calibration layer: A lightweight Platt Scaling model [3] is trained for each of the heads of the AutoML model.
The AutoML + calibration layer approach is the latest milestone for the calibration models.
As described above, we have been relying on the LR models to calibrate the prediction of engagement rates. The solution has several drawbacks:
We push all the sparse features to the AutoML model. AutoML’s DNN models tend not to be well calibrated [4]. We then create a lightweight Platt Scaling model (essentially an LR model) with a relatively small number of signals for calibration. The signals in the calibration layer include contextual signals (country, device, time of day, etc.), creative signals (video vs. image), and user profile signals (language, etc.). The model is both lean and dense, which enables it to converge fast. We are able to update the calibration layer hourly, while the DNNs are updated daily.
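A sketch of fitting such a Platt Scaling layer [3] on the DNN's raw scores with plain gradient descent; the real calibration layer would also consume the contextual, creative, and user profile signals listed above:

```python
import math

# Sketch of a lightweight Platt Scaling layer: a one-feature logistic
# regression fit on the model's raw scores by plain gradient descent.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(scores, labels, lr=0.1, epochs=500):
    a, b = 1.0, 0.0
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # gradient of the log-loss
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / len(scores)
        b -= lr * grad_b / len(scores)
    return a, b

def calibrate(score, a, b):
    return sigmoid(a * score + b)
```

Because the model has so few parameters, it converges on little data, which is what makes hourly refreshes feasible.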
The new calibration solution reduced the day-to-day calibration error by as much as 80%.
More specifically, we found two technical nuances about the calibration model: negative downsampling and selection bias.
Negative downsampling
Negative examples in the training data are downsampled to keep the labels balanced [5]. The prediction p generated by the model is rescaled with the downsampling rate w to ensure the final prediction q is calibrated:

q = p / (p + (1 − p) / w)
This formula doesn’t hold under multi-task learning, because the ratios between the different user engagement types are non-deterministic. Our solution is to set a base downsampling rate on one of the tasks (say, the CTR head); the rescaling multipliers of the other tasks are estimated dynamically during each training batch from the base rate and the ratio of engagement counts between each task and the base.
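The rescaling of p by downsampling rate w can be sketched directly (q = p / (p + (1 − p) / w), following [5]); the per-task dynamic multiplier depends on batch statistics we don't reproduce here, so only the basic rescale is shown:

```python
# Sketch of rescaling a prediction after negative downsampling: with negatives
# kept at rate w, the raw prediction p overestimates the true probability,
# and q recovers the calibrated value.

def rescale(p, w):
    """Undo negative downsampling at rate w (fraction of negatives kept)."""
    return p / (p + (1.0 - p) / w)
```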
Selection bias
Our ads models are trained on user action logs. Selection bias is inevitable when the training examples are generated by other models. The intuition is that a new model is never exposed to the examples that were not selected by the old model. As a result, we often observe that newly trained models are mis-calibrated when first put into experimentation. The calibration usually corrects itself after ramping up, with the hypothesis that the models are less affected by selection bias once a larger portion of the examples are generated by themselves.
While we don’t aim to fundamentally fix the selection bias, a small trick helps us mitigate the issue: we train the calibration layer only with the examples generated by the model itself. The underlying DNN models are still trained with all available examples to ensure convergence. The lightweight calibration model, however, doesn’t need a lot of training data to converge. That way the calibration layer can learn from its own mistakes, and the results are surprisingly good: the newly trained models are as well calibrated as the production model during A/B testing, even with lower traffic.
Conclusion
AutoML has equipped our multi-task ads CTR models with automatic feature engineering and state-of-the-art machine learning techniques. The multi-tower structure enables us to learn from data sources with distinct characteristics by isolating their interference from one another. This innovation has driven significant value for Pinners, advertisers, and Pinterest. We also learned that a lightweight Platt Scaling model can effectively calibrate the DNN predictions and mitigate selection bias. As future work, we will make the AutoML framework more extensible so we can try more deep learning techniques, such as sequence models.
Acknowledgements
This article summarizes three quarters of work involving multiple teams at Pinterest. The author wants to thank Minzhe Zhou, Wangfan Fu, Xi Liu, Yi-Ping Hsu and Aayush Mudgal for their tireless contributions. Thanks to Xiaofang Chen, Ning Zhang, Crystal Lee and Se Won Jang for many meaningful discussions. Thanks to Jiajing Xu, Xin Liu, Mark Otuteye, Roelof van Zwol, Ding Zhou and Randall Keller for the leadership.
Translated from: https://medium.com/pinterest-engineering/how-we-use-automl-multi-task-learning-and-multi-tower-models-for-pinterest-ads-db966c3dc99e