On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models

TL;DR

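This paper collects best practices from Google's ads team for iterating on and optimizing CTR models. It covers tricks for improving accuracy, the trade-off between quality gains and model size, improving reproducibility, and handling bias.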

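I originally planned to excerpt only the highlights of this paper, but halfway through I felt it deserved more than that. This is a paper that can stand alongside Wide&Deep.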

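About six years ago, while writing my undergraduate thesis, I was lucky enough to pick the Wide&Deep paper and translated it word by word. I hope to translate this paper the same way and digest it properly.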

Abstract

For industrial-scale advertising systems, prediction of ad click-through rate (CTR) is a central problem. Ad clicks constitute a significant class of user engagements and are often used as the primary signal for the usefulness of ads to users. Additionally, in cost-per-click advertising systems where advertisers are charged per click, click rate expectations feed directly into value estimation. Accordingly, CTR model development is a significant investment for most Internet advertising companies. Engineering for such problems requires many machine learning (ML) techniques suited to online learning that go well beyond traditional accuracy improvements, especially concerning efficiency, reproducibility, calibration, credit attribution. We present a case study of practical techniques deployed in Google’s search ads CTR model. This paper provides an industry case study highlighting important areas of current ML research and illustrating how impactful new ML methods are evaluated and made useful in a large-scale industrial setting.

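In my own words: for industrial-scale advertising systems, predicting ad click-through rate (CTR) is a central problem. Clicks are an important class of user engagement and are often used as the primary signal of how useful an ad is to users. In addition, in cost-per-click advertising systems, the expected click rate feeds directly into value estimation. CTR model development is therefore a major investment for most Internet advertising companies.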

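Solving this class of problems requires many machine learning techniques suited to online learning. These go well beyond the usual tricks for improving model accuracy and focus in particular on efficiency, reproducibility, calibration, and credit attribution (plain "attribution" may not capture this term precisely; it is explained in detail later).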

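The paper presents a case study of practical techniques deployed in Google's search ads CTR model. As an industry case study, it highlights important areas of current ML research and shows how impactful new ML methods are evaluated and made useful in a large-scale industrial setting.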

Introduction