Econometrics Series - Regression Basics
The Conditional Expectation Function (CEF) The CEF for a dependent variable $Y_i$, given a $k \times 1$ vector of covariates $X_i$ (with elements $x_{ki}$), is the expectation, or population average, of $Y_i$ with $X_i$ held fixed. The population average can be thought of as the average in an infinitely large sample. $$E[Y_i \vert X_i=x]$$ The CEF is a function of $X_i$ only. The potential outcomes framework presented an important special case of the CEF, where the conditioning variable is the binary treatment $D_i \in \{0, 1\}$. If $X_i$ is random then the CEF is also random, but evaluating it at a particular value $X_i = x$ gives a fixed number.
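To make the "average with $X_i$ held fixed" idea concrete, here is a minimal sketch that estimates a CEF by averaging $Y_i$ within each value of a discrete $X_i$. The data-generating process (a true CEF of $2x$ plus standard normal noise) and the helper names `simulate` and `empirical_cef` are assumptions made up for illustration, not something from the post.

```python
import random
from collections import defaultdict
from statistics import mean


def simulate(n, seed=0):
    """Draw (x, y) pairs from an assumed model: Y = 2*X + noise, X in {0, 1, 2}."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.choice([0, 1, 2])
        y = 2.0 * x + rng.gauss(0.0, 1.0)  # assumed true CEF: E[Y | X = x] = 2x
        data.append((x, y))
    return data


def empirical_cef(data):
    """Estimate E[Y | X = x] as the sample mean of y within each value of x."""
    groups = defaultdict(list)
    for x, y in data:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in groups.items()}


# A large sample stands in for the "infinitely large sample" of the definition.
cef = empirical_cef(simulate(20000))
for x in sorted(cef):
    print(f"E[Y | X = {x}] is approximately {cef[x]:.3f}")
```

The dictionary `cef` plays the role of the function itself: before you pick an $x$ it is a mapping over all values $X_i$ can take, and looking up one key gives a single number.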
Econometrics Series - The Experimental Ideal
The Experimental Ideal Do hospitals have a positive impact on health? Let $Y_i$ be the observed health status of an individual and let $D_i$ indicate whether they went to the hospital. $$ \text{Potential outcome} = \begin{cases} Y_{1i} & \text{if } D_i = 1 \\ Y_{0i} & \text{if } D_i = 0 \end{cases}$$ If we naively take the difference between the average health status of people who go to the hospital and people who don’t, we end up with the answer that going to the hospital actually makes you sicker.
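A small simulation can show how the naive comparison goes wrong. This is only a sketch under assumptions I made up for illustration: a constant treatment effect of $+0.5$, baseline health $Y_{0i}$ drawn from a standard normal, and a selection rule where people with poor baseline health are more likely to go to the hospital. The function names `simulate_hospital` and `naive_difference` are hypothetical.

```python
import random
from statistics import mean


def simulate_hospital(n, effect=0.5, seed=0):
    """Simulate (D, Y) pairs where sicker people select into the hospital."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)  # potential outcome without the hospital
        y1 = y0 + effect          # potential outcome with the hospital (assumed constant effect)
        # Assumed selection rule: low baseline health, plus some noise, sends you to the hospital.
        d = 1 if y0 + rng.gauss(0.0, 0.5) < 0.0 else 0
        y = y1 if d == 1 else y0  # we only ever observe one potential outcome
        rows.append((d, y))
    return rows


def naive_difference(rows):
    """Average observed health of hospital-goers minus that of everyone else."""
    treated = [y for d, y in rows if d == 1]
    control = [y for d, y in rows if d == 0]
    return mean(treated) - mean(control)


naive = naive_difference(simulate_hospital(20000))
print(f"true effect: 0.5, naive difference: {naive:.3f}")
```

In this setup the hospital helps everyone by construction, yet the naive difference comes out negative, because the people who go were less healthy to begin with.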
Setting up Python for GNNs
It seems that in the years that have passed since my first hello.py I have forgotten the harrowing experience of setting up a Python environment. Compared to R + RStudio, Stata, or even Julia, Python installation seems unnecessarily complex. Here I’ll briefly talk about how to manage your Python environments, and how to develop in Python effectively. I’m just grateful that the language of choice for data science wasn’t JavaScript. Getting Around Your Computer with The Terminal The terminal is a powerful and precise tool, and key to developing effective applications.
On The Checklist Manifesto
The Checklist Manifesto Why Checklists? The Checklist Manifesto is a short book by Atul Gawande that details the use of checklists in medical interventions. The idea behind checklists is simple: build a system that ensures the most common errors are caught, which greatly reduces overload. Like many distributions, the distribution of system errors has a high density in the top $n$ errors, but also a long tail. We should build checklists to guard against the top $n$ errors, and use domain expertise on a case-by-case basis for the long tail.