The Conditional Expectation Function (CEF)

The CEF for a dependent variable $Y_i$, given a $k \times 1$ vector of covariates $X_i$ (with elements $x_{ki}$), is the population average of $Y_i$ with $X_i$ held fixed (the average we would compute in an infinitely large sample):

$$E[Y_i \vert X_i=x]$$

The CEF is a function of $X_i$ alone. The potential outcomes framework presented an important special case of the CEF, where the conditioning variable is binary, $D_i \in \{0, 1\}$. Because $X_i$ is random, the CEF $E[Y_i \vert X_i]$ is itself a random variable, but evaluating it at a particular value $X_i = x$ gives a concrete number.

Assume $Y_i$ is continuously distributed, with conditional probability density function $f_y(t \vert X_i = x)$ at $Y_i = t$. Then

$$E[Y_i \vert X_i = x] = \int tf_y(t\vert X_i = x) dt$$

In the discrete case
$$E[Y_i \vert X_i = x] = \sum_{t} t \cdot P(Y_i = t|X_i=x)$$
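To make the discrete formula concrete, here is a minimal NumPy sketch (the data-generating process and all variable names are my own invention) showing that the probability-weighted sum over the support of $Y_i$ coincides with the conditional group mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated discrete data: X takes values in {0, 1, 2}, Y in {0, 1},
# with P(Y = 1 | X = x) = 0.2 + 0.2x
n = 100_000
x = rng.integers(0, 3, size=n)
y = rng.binomial(1, p=0.2 + 0.2 * x)

# CEF at x = 2 via the discrete formula: sum_t t * P(Y = t | X = x)
mask = x == 2
t_vals, counts = np.unique(y[mask], return_counts=True)
cef_formula = np.sum(t_vals * counts / mask.sum())

# Equivalently, the conditional sample mean
cef_group_mean = y[mask].mean()

print(cef_formula, cef_group_mean)  # both approximately 0.6 = 0.2 + 0.2 * 2
```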

Law of Iterated Expectations

$$E[Y_i] = E\{E[Y_i \vert X_i]\}$$

A proof of LIE for the continuous case

$$ \begin{equation} \begin{split} E\{E[Y_i \vert X_i ]\} &= \int E[Y_i \vert X_i = u] g_x(u) du \\\
&= \int \left[ \int t f_y(t\vert X_i = u) dt\right] g_x(u) du \\\
&= \int \int t f_y(t\vert X_i = u) g_x(u) \, dt \, du \\\
&= \int t \left[ \int f_y(t\vert X_i = u) g_x(u) \, du\right] dt \\\
&= \int t \left[ \int f_{xy}(u, t) \, du\right] dt \\\
&= \int t g_y(t)dt = E[Y_i] \end{split} \end{equation} $$ where $g_x$ and $g_y$ denote the marginal densities of $X_i$ and $Y_i$, and $f_{xy}$ their joint density. In the discrete case, I start by noting that

$$P(Y_i = y_t\vert X_i=x_j) = \frac{P(Y_i = y_t, X_i=x_j)}{P(X_i=x_j)}$$

$$ \begin{equation} \begin{split} E\{E[Y_i \vert X_i] \} &= \sum_{j=1}^{d_x} E[Y_i \vert X_i = x_j] \cdot P(X_i = x_j) \\\
& = \sum_{j=1}^{d_x}\left [\sum_{k=1}^{d_y} y_k\cdot P(Y_i = y_k \vert X_i=x_j) \right] \cdot P(X_i = x_j) \\\
& = \sum_{k=1}^{d_y} y_k\sum_{j=1}^{d_x} P(Y_i = y_k, X_i = x_j) \\\
& = \sum_{k=1}^{d_y} y_k P(Y_i = y_k) = E[Y_i] \end{split} \end{equation} $$ where the third line applies the definition of conditional probability above and swaps the order of summation.
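The LIE is easy to verify numerically. A sketch, reusing the invented data-generating process from the earlier example: averaging the conditional means $E[Y_i \vert X_i = x_j]$ with weights $P(X_i = x_j)$ recovers the unconditional mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.integers(0, 3, size=n)
y = rng.binomial(1, p=0.2 + 0.2 * x)

# E{E[Y|X]}: weight each conditional mean by P(X = x_j)
vals, counts = np.unique(x, return_counts=True)
lie = sum(y[x == v].mean() * c / n for v, c in zip(vals, counts))

print(lie, y.mean())  # the two agree: E{E[Y|X]} = E[Y]
```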

The CEF Decomposition Property

$$Y_i = E[Y_i \vert X_i] + \varepsilon_i$$ where, by construction,

  • (1) $\varepsilon_i$ is mean independent of $X_i$: $E[\varepsilon_i \vert X_i] = E[Y_i \vert X_i] - E[Y_i \vert X_i] = 0$
  • (2) $\varepsilon_i$ is uncorrelated with any function of $X_i$: for any $h(X_i)$, $E[h(X_i)\varepsilon_i] = E\{h(X_i)E[\varepsilon_i \vert X_i]\} = 0$ by the LIE and (1); both properties are verified numerically below
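A sketch of the numerical check, again on the invented discrete example, using $h(x) = e^x$ as an arbitrary stand-in for "any function of $X_i$":

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
x = rng.integers(0, 3, size=n)
y = rng.binomial(1, p=0.2 + 0.2 * x)

# Construct the CEF residual eps = Y - E[Y|X], using group means as E[Y|X]
cef = np.array([y[x == v].mean() for v in range(3)])
eps = y - cef[x]

# (1) mean independence: E[eps | X = x] is ~0 for every x
print([eps[x == v].mean() for v in range(3)])

# (2) eps is uncorrelated with an arbitrary function h(X), e.g. h(x) = exp(x)
h = np.exp(x)
print(np.mean(h * eps) - h.mean() * eps.mean())  # ~0
```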

The CEF Prediction Property

Let $m(X_i)$ be any function of $X_i$. The CEF solves

$$E[Y_i \vert X_i ] = \arg \min_{m(X_i)}E[(Y_i - m(X_i))^2]$$

Proof

$$ \begin{split} (Y_i - m(X_i))^2 &= ((Y_i - E[Y_i \vert X_i]) + (E[Y_i\vert X_i] - m(X_i)))^2 \\\
&=(Y_i - E[Y_i\vert X_i])^2 + 2[(Y_i - E[Y_i\vert X_i]) \times (E[Y_i \vert X_i] - m(X_i))] + (E[Y_i \vert X_i] - m(X_i))^2 \\\
&=\varepsilon_i^2 + 2h(X_i)\varepsilon_i + (E[Y_i\vert X_i] - m(X_i))^2 \end{split} $$ where $h(X_i) \equiv E[Y_i \vert X_i] - m(X_i)$. Taking the expectation we arrive at

$$\begin{split} E[(Y_i - m(X_i))^2] &= E[\varepsilon_i^2] + 2E[h(X_i)\varepsilon_i] + E[(E[Y_i|X_i] - m(X_i))^2] \\\
& = E[\varepsilon_i^2] + 0 + E[(E[Y_i|X_i] - m(X_i))^2] \\\ & = E[\varepsilon_i^2] + E[E[Y_i|X_i]^2 ] - 2E[E[Y_i\vert X_i]m(X_i)] + E[m(X_i)^2] \end{split} $$

where the second line uses the decomposition property: $\varepsilon_i$ is uncorrelated with any function of $X_i$, so $E[h(X_i)\varepsilon_i] = 0$.

Taking the first order conditions

$$\begin{split} \frac{\partial E[(Y_i - m(X_i))^2]}{\partial m(X_i)} &= -2E[Y_i\vert X_i] + 2m(X_i) = 0 \\\
\therefore m(X_i) &= E[Y_i\vert X_i] \end{split} $$
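A short simulation illustrates the prediction property: among several candidate functions $m(X_i)$ (the alternatives below are arbitrary choices of mine), the CEF attains the smallest mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
x = rng.integers(0, 3, size=n)
y = rng.binomial(1, p=0.2 + 0.2 * x)

# E[Y|X] estimated by group means
cef = np.array([y[x == v].mean() for v in range(3)])

candidates = {
    "CEF": cef[x],
    "linear 0.1 + 0.25x": 0.1 + 0.25 * x,
    "constant E[Y]": np.full(n, y.mean()),
}
for name, m in candidates.items():
    print(name, np.mean((y - m) ** 2))
# The CEF attains the smallest mean squared error
```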

The ANOVA Theorem

$$V(Y_i) = V(E[Y_i\vert X_i]) + E[V(Y_i\vert X_i)]$$

Proof

By the CEF decomposition property we know that

$$\varepsilon_i = Y_i - E[Y_i \vert X_i]$$ Because $\varepsilon_i$ and $E[Y_i \vert X_i]$ are by construction uncorrelated, $V(Y_i) = V(E[Y_i \vert X_i]) + V(\varepsilon_i)$, so it remains to show that $V(\varepsilon_i) = E[V(Y_i \vert X_i)]$:

$$\begin{split}V(\varepsilon_i) &= E(\varepsilon_i^2) - E(\varepsilon_i)^2 = E(\varepsilon_i^2) \\\
&= E[E[\varepsilon_i^2|X_i]] = E[V(Y_i \vert X_i)] \end{split}$$

where the last step holds because $E[\varepsilon_i \vert X_i] = 0$ implies $E[\varepsilon_i^2 \vert X_i] = V(\varepsilon_i \vert X_i) = V(Y_i \vert X_i)$, since $E[Y_i \vert X_i]$ is a constant conditional on $X_i$.
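The ANOVA theorem can likewise be confirmed on the simulated data; with the population-variance convention (`ddof=0`, NumPy's default), the decomposition holds exactly in-sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
x = rng.integers(0, 3, size=n)
y = rng.binomial(1, p=0.2 + 0.2 * x)

vals = np.arange(3)
p = np.array([(x == v).mean() for v in vals])           # P(X = x_j)
cond_mean = np.array([y[x == v].mean() for v in vals])  # E[Y | X = x_j]
cond_var = np.array([y[x == v].var() for v in vals])    # V(Y | X = x_j)

v_explained = np.sum(p * (cond_mean - y.mean()) ** 2)   # V(E[Y|X])
v_residual = np.sum(p * cond_var)                       # E[V(Y|X)]

print(y.var(), v_explained + v_residual)  # equal up to floating-point error
```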