ate_decomposition

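Today I came across a post on our internal company tech forum about estimating the ATE with tree-based methods. In the spirit of rigor (read: nitpicking), I read it carefully and found one formula that looked rather odd. I worked through it on the side and found it fairly interesting, so I am writing it down here.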

I will skip the Potential Outcome Model (POM) framework. The post sets out to estimate the average causal effect \(\tau\), and notes that the simplest approach is to subtract the control-group mean from the treatment-group mean:
\[
\begin{aligned}
\hat{\tau} &= \mathbb{E}\Big[Y(1)|W=1\Big] - \mathbb{E}\Big[Y(0)|W=0\Big] \\
&= \mathbb{E}_{q\in\{0,1\}}\mathbb{E}\Big[Y(1)-Y(0)|W=q\Big] + \mathbb{E}\Big[Y(0)|W=1\Big] - \mathbb{E}\Big[Y(0)|W=0\Big] \\
&\quad\quad + \mathbb{P}(W=0) \Bigg( \mathbb{E}\Big[Y(1)-Y(0)|W=1\Big] - \mathbb{E}\Big[Y(1)-Y(0)|W=0\Big] \Bigg)
\end{aligned}
\]
At first glance, a very short expression has turned into a long chain. I will write out the detailed derivation first and then add some thoughts. Start by adding and subtracting the missing potential outcome inside each expectation:
\[
\begin{aligned}
\hat{\tau} &= \mathbb{E}\Big[Y(1)|W=1\Big] - \mathbb{E}\Big[Y(0)|W=0\Big] \\
&= \mathbb{E}\Big[Y(1) {\color{red}{- Y(0) +Y(0)}}|W=1\Big] - \mathbb{E}\Big[Y(0) {\color{red}{- Y(1) + Y(1)}}|W=0\Big] \\
&= \mathbb{E}\Big[Y(1) - Y(0)|W=1\Big] + \mathbb{E}\Big[Y(0)|W=1\Big] \\
&\quad \quad + \mathbb{E}\Big[Y(1) - Y(0) |W=0\Big] - \mathbb{E}\Big[Y(1)|W=0\Big] \\
&= \sum_{q\in\{0,1\}}\Bigg[\mathbb{E}\Big[Y(1) - Y(0)|W=q\Big]\Bigg] + \mathbb{E}\Big[Y(0)|W=1\Big] - \mathbb{E}\Big[Y(1)|W=0\Big] \\
&\quad \quad {\color{red}{ + \mathbb{E}\Big[Y(0)|W=0\Big] - \mathbb{E}\Big[Y(0)|W=0\Big] }} \\
&= \sum_{q\in\{0,1\}}\Bigg[\mathbb{E}\Big[Y(1) - Y(0)|W=q\Big]\Bigg] \\
&\quad\quad + \mathbb{E}\Big[Y(0)|W=1\Big] - \mathbb{E}\Big[Y(0)|W=0\Big] \\
&\quad\quad - \mathbb{E}\Big[Y(1)|W=0\Big] + \mathbb{E}\Big[Y(0)|W=0\Big] \\
&= \sum_{q\in\{0,1\}}\Bigg[\mathbb{E}\Big[Y(1) - Y(0)|W=q\Big]\Bigg] \\
&\quad\quad + \mathbb{E}\Big[Y(0)|W=1\Big] - \mathbb{E}\Big[Y(0)|W=0\Big] \\
&\quad\quad - \mathbb{E}\Big[Y(1) - Y(0) |W=0\Big] \\
&= \underbrace{\sum_{q\in\{0,1\}}\Bigg[\mathbb{E}\Big[Y(1) - Y(0)|W=q\Big]\Bigg]}_{\color{red}{\text{term A}}} \\
&\quad\quad \underbrace{+ \mathbb{E}\Big[Y(0)|W=1\Big] - \mathbb{E}\Big[Y(0)|W=0\Big]}_{\color{red}{\text{term B}}} \\
&\quad\quad \underbrace{- \mathbb{E}\Big[Y(1) - Y(0) |W=0\Big]}_{\color{red}{\text{term C}}}
\end{aligned}
\]
Term A is an unweighted sum over \(q\), not yet an expectation. Multiplying each summand by \(\mathbb{P}(W=1)+\mathbb{P}(W=0)=1\) and regrouping pulls out the ATE:
\[
\begin{aligned}
\text{term A} &= \Bigg[ \mathbb{E}\Big[Y(1) - Y(0)|W=1\Big] + \mathbb{E}\Big[Y(1) - Y(0)|W=0\Big] \Bigg] \\
&= \Bigg[ \mathbb{E}\Big[Y(1) - Y(0)|W=1\Big] \cdot \mathbb{P}(W=1) + \mathbb{E}\Big[Y(1) - Y(0)|W=1\Big] \cdot \mathbb{P}(W=0) \\
& \quad\quad + \mathbb{E}\Big[Y(1) - Y(0)|W=0\Big] \cdot \mathbb{P}(W=1) + \mathbb{E}\Big[Y(1) - Y(0)|W=0\Big] \cdot \mathbb{P}(W=0) \Bigg] \\
&= \Bigg[ \mathbb{E}\Big[Y(1) - Y(0)|W=1\Big] \cdot \mathbb{P}(W=1) + \mathbb{E}\Big[Y(1) - Y(0)|W=0\Big] \cdot \mathbb{P}(W=0) \Bigg] \\
& \quad\quad + \Bigg[ \mathbb{E}\Big[Y(1) - Y(0)|W=0\Big] \cdot \mathbb{P}(W=1) + \mathbb{E}\Big[Y(1) - Y(0)|W=1\Big] \cdot \mathbb{P}(W=0) \Bigg] \\
&= \mathbb{E}\Big[Y(1) - Y(0) \Big] + \Bigg[ \mathbb{E}\Big[Y(1) - Y(0)|W=0\Big] \cdot \mathbb{P}(W=1) + \mathbb{E}\Big[Y(1) - Y(0)|W=1\Big] \cdot \mathbb{P}(W=0) \Bigg]
\end{aligned}
\]
Substituting term A back into \(\hat{\tau}\) and merging term C into the bracket:
\[
\begin{aligned}
\hat{\tau} &= \underbrace{\sum_{q\in\{0,1\}}\Bigg[\mathbb{E}\Big[Y(1) - Y(0)|W=q\Big]\Bigg]}_{\color{red}{\text{term A}}} \\
&\quad\quad \underbrace{+ \mathbb{E}\Big[Y(0)|W=1\Big] - \mathbb{E}\Big[Y(0)|W=0\Big]}_{\color{red}{\text{term B}}} \\
&\quad\quad \underbrace{- \mathbb{E}\Big[Y(1) - Y(0) |W=0\Big]}_{\color{red}{\text{term C}}} \\
&= \mathbb{E}\Big[Y(1) - Y(0) \Big] + \Bigg[ \mathbb{E}\Big[Y(1) - Y(0)|W=0\Big] \cdot \mathbb{P}(W=1) + \mathbb{E}\Big[Y(1) - Y(0)|W=1\Big] \cdot \mathbb{P}(W=0) \Bigg] \\
&\quad\quad \underbrace{+ \mathbb{E}\Big[Y(0)|W=1\Big] - \mathbb{E}\Big[Y(0)|W=0\Big]}_{\color{red}{\text{term B}}} \\
&\quad\quad \underbrace{- \mathbb{E}\Big[Y(1) - Y(0) |W=0\Big]}_{\color{red}{\text{term C}}} \\
&= \mathbb{E}\Big[Y(1) - Y(0) \Big] \\
&\quad\quad \underbrace{+ \mathbb{E}\Big[Y(0)|W=1\Big] - \mathbb{E}\Big[Y(0)|W=0\Big]}_{\color{red}{\text{term B}}} \\
& \quad \quad + \Bigg[ \mathbb{E}\Big[Y(1) - Y(0)|W=0\Big] \cdot \mathbb{P}(W=1) - \mathbb{E}\Big[Y(1) - Y(0) |W=0\Big] + \mathbb{E}\Big[Y(1) - Y(0)|W=1\Big] \cdot \mathbb{P}(W=0) \Bigg] \\
&= \mathbb{E}\Big[Y(1) - Y(0) \Big] \\
&\quad\quad \underbrace{+ \mathbb{E}\Big[Y(0)|W=1\Big] - \mathbb{E}\Big[Y(0)|W=0\Big]}_{\color{red}{\text{term B}}} \\
& \quad \quad + \Bigg[ - \mathbb{E}\Big[Y(1) - Y(0) |W=0\Big] \cdot \mathbb{P}(W=0) + \mathbb{E}\Big[Y(1) - Y(0)|W=1\Big] \cdot \mathbb{P}(W=0) \Bigg] \\
&= \underbrace{\mathbb{E}\Big[Y(1) - Y(0) \Big]}_{\text{ATE}} \\
&\quad\quad + \underbrace{\mathbb{E}\Big[Y(0)|W=1\Big] - \mathbb{E}\Big[Y(0)|W=0\Big]}_{\text{selection bias: the two groups differ at baseline}} \\
& \quad \quad + \mathbb{P}(W=0) \cdot \Bigg[ \underbrace{\mathbb{E}\Big[Y(1) - Y(0)|W=1\Big] - \mathbb{E}\Big[Y(1) - Y(0) |W=0\Big]}_{\text{treatment effect differs between the two groups}} \Bigg]
\end{aligned}
\]
So, written out in full, the expression ends up as
\[
\begin{aligned}
\hat{\tau}&= \underbrace{\mathbb{E}\Big[Y(1) - Y(0) \Big]}_{\text{ATE}} + \underbrace{\mathbb{E}\Big[Y(0)|W=1\Big] - \mathbb{E}\Big[Y(0)|W=0\Big]}_{\text{selection bias: the two groups differ at baseline}} \\
& \quad \quad + \mathbb{P}(W=0) \cdot \Bigg[ \underbrace{\mathbb{E}\Big[Y(1) - Y(0)|W=1\Big] - \mathbb{E}\Big[Y(1) - Y(0) |W=0\Big]}_{\text{treatment effect differs between the two groups}} \Bigg]
\end{aligned}
\]
It consists of three parts:

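  1. The ATE, i.e. the expectation of the difference between the potential outcomes. Nothing more to say here: this is the quantity we want.
  2. Selection bias: the difference between the treatment and control groups in the distribution of \(Y(0)\), i.e. the gap in their baseline levels before the experiment. Checking whether the groups are balanced in an A/A test before running an experiment is exactly a check on this difference in \(Y(0)\). If we assume a stable mapping \(f\) such that \(Y(0)=f(X)\), it is also enough to check that the key features \(X\) are the same in the treatment and control groups.
  3. Heterogeneity of the treatment effect between the treatment and control groups, i.e. the treatment effect on the treatment group differs from the treatment effect on the control group. I thought about when term 3 shows up. My feeling is that, compared with term 2, it arises when unconfoundedness, \(\Big(Y(1),Y(0)\Big)\perp W|X\), fails more severely: on top of \(\Big(Y(1),Y(0)\Big)\not \perp W|X\) we also have \(\Big(Y(1)-Y(0)\Big)\not \perp W|X\). [Personal view, may well be wrong, corrections welcome.]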
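
The three-part decomposition above can be checked numerically. Below is a minimal sketch under an assumed toy data-generating process, which is not from the forum post: a single confounder `u` raises the baseline outcome, the unit-level effect, and the probability of treatment. The helper names `simulate` and `decompose` are hypothetical. In a simulation both potential outcomes are visible, so every term can be computed directly.

```python
import numpy as np


def simulate(n=100_000, seed=0):
    """Toy model (an assumption for illustration): one confounder u drives everything."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)
    y0 = u + rng.normal(size=n)                     # baseline outcome Y(0)
    y1 = y0 + 1.0 + 0.5 * u                         # Y(1); unit-level effect is 1 + 0.5 * u
    w = rng.random(n) < 1.0 / (1.0 + np.exp(-u))    # P(W=1) increases with u
    return y0, y1, w


def decompose(y0, y1, w):
    """Return the naive difference and the three terms of the decomposition."""
    effect = y1 - y0
    naive = y1[w].mean() - y0[~w].mean()            # E[Y(1)|W=1] - E[Y(0)|W=0]
    ate = effect.mean()                             # E[Y(1) - Y(0)]
    selection_bias = y0[w].mean() - y0[~w].mean()   # E[Y(0)|W=1] - E[Y(0)|W=0]
    # P(W=0) * (E[Y(1)-Y(0)|W=1] - E[Y(1)-Y(0)|W=0])
    heterogeneity = (1.0 - w.mean()) * (effect[w].mean() - effect[~w].mean())
    return naive, ate, selection_bias, heterogeneity


y0, y1, w = simulate()
naive, ate, selection_bias, heterogeneity = decompose(y0, y1, w)
print(f"naive difference  : {naive:.4f}")
print(f"ATE               : {ate:.4f}")
print(f"selection bias    : {selection_bias:.4f}")
print(f"heterogeneity term: {heterogeneity:.4f}")
print(f"sum of three terms: {ate + selection_bias + heterogeneity:.4f}")
```

With sample means and the sample share of controls plugged in, the identity holds exactly in-sample (up to floating-point error), so the first and last printed lines should agree. In this toy model both bias terms are positive, which is why the naive difference overshoots the ATE.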

As a side note, here is a quick review of the role the three basic assumptions play in CATE estimation:
\[
\begin{aligned}
\tau(x) &= \mathbb{E}\Big[Y(1)-Y(0)|X=x\Big] \\
&= \underbrace{\mathbb{E}\Big[Y(1)|W=1, X=x\Big] -\mathbb{E}\Big[Y(0)|W=0, X=x\Big]}_{\text{observed}} \\
& \quad\quad + \mathbb{P}(W=1|X=x)\cdot\Bigg( \underbrace{\mathbb{E}\Big[Y(0)|W=0, X=x\Big] }_{\text{observed}} - \underbrace{\mathbb{E}\Big[Y(0)|W=1, X=x\Big]}_{\text{unobserved}} \Bigg) \\
& \quad\quad + \mathbb{P}(W=0|X=x)\cdot\Bigg( \underbrace{\mathbb{E}\Big[Y(1)|W=0, X=x\Big] }_{\text{unobserved}} - \underbrace{\mathbb{E}\Big[Y(1)|W=1, X=x\Big]}_{\text{observed}} \Bigg)
\end{aligned}
\]
I will skip SUTVA. When the Overlap assumption fails, the CATE cannot be estimated, because at such an \(x\) one of the two observed conditional means is undefined. When Unconfoundedness holds, the last two terms of the expression above satisfy
\[
\begin{aligned}
\mathbb{E}\Big[Y(0)|W=0, X=x\Big]- \mathbb{E}\Big[Y(0)|W=1, X=x\Big] &=\mathbb{E}\Big[Y(0)|X=x\Big] - \mathbb{E}\Big[Y(0)|X=x\Big] = 0 \\
\mathbb{E}\Big[Y(1)|W=0, X=x\Big]- \mathbb{E}\Big[Y(1)|W=1, X=x\Big] &=\mathbb{E}\Big[Y(1)|X=x\Big] - \mathbb{E}\Big[Y(1)|X=x\Big] = 0
\end{aligned}
\]
so the observed difference alone gives an unbiased estimate of the CATE.
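
To make the role of Unconfoundedness and Overlap concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: a discrete covariate `x` with three strata, a true CATE of `1 + x`, and a treatment probability that depends on `x` only (so assignment is unconfounded given `x`) and stays strictly between 0 and 1 (so overlap holds). The helper names `simulate_strata` and `stratified_difference` are hypothetical.

```python
import numpy as np


def simulate_strata(n=200_000, seed=0):
    """Toy model: assignment depends on x only, so it is unconfounded given x."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 3, size=n)                  # discrete covariate, strata 0, 1, 2
    y0 = 2.0 * x + rng.normal(size=n)               # baseline outcome depends on x
    y1 = y0 + 1.0 + x                               # true CATE: tau(x) = 1 + x
    propensity = np.array([0.2, 0.5, 0.8])[x]       # overlap: 0 < P(W=1|X=x) < 1
    w = rng.random(n) < propensity
    y_obs = np.where(w, y1, y0)                     # only one potential outcome is observed
    return x, w, y_obs


def stratified_difference(x, w, y_obs):
    """Observed treated-minus-control mean within each stratum of x."""
    return {
        int(v): y_obs[w & (x == v)].mean() - y_obs[~w & (x == v)].mean()
        for v in np.unique(x)
    }


x, w, y_obs = simulate_strata()
for v, est in stratified_difference(x, w, y_obs).items():
    print(f"x={v}: observed difference {est:.3f}, true CATE {1 + v}")
print(f"pooled naive difference: {y_obs[w].mean() - y_obs[~w].mean():.3f}")
```

Within each stratum the observed difference should land close to `1 + x`, while the pooled difference that ignores `x` is pulled away from the true ATE (2 in this toy model) because treated units are concentrated in high-`x` strata.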