# 18 Instrumental variables

## 18.1 Motivation

Suppose that we want to estimate the effect of watching the TV show *Sesame Street* on preschool children's recognition of English letters. We might consider an experiment in which the treatment is watching Sesame Street. However, it would be difficult for us to force the children to watch the TV show or to prevent them from watching it.

Instead, we can encourage a random subgroup of the children to watch the show. Now we have an additional variable, the *encouragement*, that affects the treatment variable.

Let us take a look at the data, which is contained in `sesame.csv`.

```
set.seed(0)
library(brms)
library(rstanarm)
```

```
sesame <- read.csv("data/sesame.csv")
head(sesame[, c("encouraged", "watched", "setting", "site", "postlet")])
```

```
encouraged watched setting site postlet
1 1 0 2 1 30
2 1 1 2 1 37
3 0 1 2 1 46
4 0 0 2 1 14
5 0 1 2 1 63
6 0 1 2 1 36
```

The outcome is `postlet`, the post-treatment measurement of the letter recognition task.

The encouragement is an example of an *instrumental variable* (IV), a variable that affects the treatment but affects the outcome only through the treatment. Here are some examples of situations in which an instrumental variable may arise.

- Encouragement or advertisement as an instrumental variable. For example,
  - The livestreaming service Twitch studied the effect of the number of friends on a user's likelihood of returning to the site. The instrumental variable is the random assignment of receiving a prompt to add more friends (Forter 2017).
  - The online gaming company Roblox studied the impact of the Avatar Shop on community engagement. The instrumental variable is the random assignment of receiving a recommendation for items in the Avatar Shop (Kharel 2021).

- Judge assignment as an instrumental variable. For example, Farre-Mensa, Hegde, and Ljungqvist (2017) studied the impact of having patents on a startup's subsequent growth. The judges are the patent examiners, who have different levels of leniency, and the instrumental variable is the random assignment of judges to the patent applications.

These are examples of *quasi-experiments*, that is, studies of causal effects without random treatment assignment.

## 18.2 Terminology for instrumental variables

In addition to the usual causal notation of the treatment variable \(T\), the outcome variable \(y\), and the confounders \(x\), we also have:

- the instrumental variable \(z\) affecting the treatment,
- the *potential* treatments \(T^0\) and \(T^1\), which are observed only if \(z=0\) and \(z=1\), respectively,
- the *potential* outcomes \(y^0\) and \(y^1\), which are observed only if \(z=0\) and \(z=1\), respectively.

We summarize the variables in the graphical model below.

We can estimate the effect of the instrument on the outcome—sometimes called the *intent-to-treat* (ITT) effect—using the difference in the means of the encouraged and the not-encouraged units. This is different from the treatment effect, which is usually the quantity of interest. We will discuss the formulation and assumptions of this model and ways to estimate the average treatment effect.

We now categorize the units into four groups by their treatment response to the instrument.

- **Compliers**: the instrument has a positive effect on their treatment, that is, \(T^1_i=1\) and \(T^0_i=0\).
- **Defiers**: the instrument has a negative effect on their treatment, that is, \(T^1_i=0\) and \(T^0_i=1\).
- **Never-takers**: units who *never* take the treatment, that is, \(T^1_i=T^0_i=0\).
- **Always-takers**: units who *always* take the treatment, that is, \(T^1_i=T^0_i=1\).
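As a small illustration (a Python sketch, not part of the original analysis), the four groups are determined entirely by the pair of potential treatments \((T^0_i, T^1_i)\):

```python
def unit_type(t0: int, t1: int) -> str:
    """Classify a unit by its potential treatments (T^0, T^1)."""
    if (t0, t1) == (0, 1):
        return "complier"        # instrument turns the treatment on
    if (t0, t1) == (1, 0):
        return "defier"          # instrument turns the treatment off
    if (t0, t1) == (0, 0):
        return "never-taker"     # untreated under both instrument values
    return "always-taker"        # treated under both instrument values

print(unit_type(0, 1))  # complier
```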

We are only interested in estimating the average treatment effect on the compliers—this is called the *complier average causal effect* (CACE).

A good instrument induces many compliers and no defiers; we will discuss more about desired properties of an instrument in the next section.

## 18.3 Assumptions for instrumental variables

In addition to SUTVA, the method of instrumental variables relies on several assumptions.

**Ignorability of the instrument.** We assume that the randomization in the instrumental variable is independent of the potential treatments and the outcomes:

\[ y^1,y^0,T^1,T^0 \perp z. \]

Unlike in standard causal analysis, ignorability of the treatment does not have to be satisfied.

**Monotonicity.** We assume that there are no defiers, that is, no units who would have taken the treatment if they were not encouraged but would not have taken it if they were encouraged.

**Relevance.** Of course, in an instrumental variable design, there would be no hope of estimating the treatment effect if the instrument were unrelated to the treatment. This assumption can be written as:

\[ \mathbb{E}[T\vert z=1] \neq \mathbb{E}[T\vert z=0]. \]

**Exclusion restriction.** We assume that there is no effect of the instrument on the outcomes of the never-takers (who would not have taken the treatment either way) and always-takers (who would have taken the treatment either way).

We can come up with a story that violates the exclusion restriction. In the Sesame Street example, we could have parents who prohibited their children from watching television (so the children were never-takers). But after being encouraged to have their children watch Sesame Street, they decided to purchase similar educational material for their children to read instead.

If there are some covariates \(x\), we might instead assume the conditional ignorability:

\[ y^1,y^0,T^1,T^0 \perp z\vert x. \]

We will see that these assumptions lead to unbiasedness of the difference-in-means estimation in the next section.

## 18.4 Intent-to-treat (ITT) effect and complier average causal effect (CACE)

The following table shows hypothetical data of the Sesame Street experiment. The bold numbers are observed values.

| Unit \(i\) | Potential treatment \(T^0_i\) | Potential treatment \(T^1_i\) | Unit type \(c_i\) | Encouragement \(z_i\) | Potential outcome \(y^0_i\) | Potential outcome \(y^1_i\) | Instrument effect \(y^1_i-y^0_i\) |
|---|---|---|---|---|---|---|---|
| 1 | **0** | 1 | complier | 0 | **67** | 76 | 9 |
| 2 | **0** | 1 | complier | 0 | **72** | 80 | 8 |
| 3 | **0** | 0 | never-taker | 0 | **68** | 68 | 0 |
| 4 | **1** | 1 | always-taker | 0 | **76** | 76 | 0 |
| 5 | **1** | 1 | always-taker | 0 | **74** | 74 | 0 |
| 6 | 0 | **1** | complier | 1 | 74 | **81** | 7 |
| 7 | 0 | **1** | complier | 1 | 68 | **78** | 10 |
| 8 | 0 | **0** | never-taker | 1 | 70 | **70** | 0 |
| 9 | 1 | **1** | always-taker | 1 | 80 | **80** | 0 |
| 10 | 1 | **1** | always-taker | 1 | 82 | **82** | 0 |

Assuming that the instrument assignments are random, these data satisfy all four assumptions above. Specifically, there are no defiers (no unit with \(T^0_i=1\) and \(T^1_i=0\)) and no instrument effect on the never-takers and always-takers.

Now let us calculate the intent-to-treat (ITT) effect, the average of the individual instrument effects \(y^1_i-y^0_i\) over all ten units.

\[ \text{ITT} = \frac{9+8+0+0+0+7+10+0+0+0}{10} =3.4. \]

We can also calculate the *complier average causal effect* (CACE), the average instrument effect among the compliers.

\[ \text{CACE} = \frac{9+8+7+10}{4} = 8.5. \]

In our hypothetical data, we can write CACE in terms of ITT as follows:

\[ \text{CACE} = \frac{\text{ITT}}{4/10} = \frac{\text{ITT}}{\Pr[c=\text{complier}]} = \frac{\mathbb{E}[y\vert z=1] - \mathbb{E}[y\vert z=0]}{\Pr[c=\text{complier}]}. \]
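As a quick sanity check on the arithmetic, we can recompute the ITT, the CACE, and their ratio directly from the hypothetical table (sketched in Python; the numbers are copied from the table above):

```python
# (T0, T1, z, y0, y1) for the ten hypothetical units in the table above
units = [
    (0, 1, 0, 67, 76), (0, 1, 0, 72, 80), (0, 0, 0, 68, 68),
    (1, 1, 0, 76, 76), (1, 1, 0, 74, 74), (0, 1, 1, 74, 81),
    (0, 1, 1, 68, 78), (0, 0, 1, 70, 70), (1, 1, 1, 80, 80),
    (1, 1, 1, 82, 82),
]
# ITT: average instrument effect over all units
itt = sum(y1 - y0 for _, _, _, y0, y1 in units) / len(units)
# CACE: average instrument effect over the compliers (T0=0, T1=1)
complier_effects = [y1 - y0 for t0, t1, _, y0, y1 in units if (t0, t1) == (0, 1)]
cace = sum(complier_effects) / len(complier_effects)
pr_complier = len(complier_effects) / len(units)
print(itt, cace, round(itt / pr_complier, 1))  # 3.4 8.5 8.5
```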

### 18.4.1 Compute CACE using ITT

In an instrumental variable design, our quantity of interest is the average treatment effect over the compliers, that is, the CACE. However, we generally do not observe which units are compliers, so we cannot compute the CACE directly from the definition. The following computation shows that, under the four assumptions above, we can compute the CACE from the ITT.

First, we have

\[\begin{align*} \text{ITT} &=\mathbb{E}[y\vert z=1]-\mathbb{E}[y\vert z=0] \\ &= \mathbb{E}[y^1-y^0]\\ &=\mathbb{E}[y^1-y^0\vert c=\text{complier}]\Pr[c=\text{complier}] \\ &\quad + \mathbb{E}[y^1-y^0\vert c=\text{defier}]\Pr[c=\text{defier}] \\ &\quad + \mathbb{E}[y^1-y^0\vert c=\text{never-taker}]\Pr[c=\text{never-taker}] \\ &\quad + \mathbb{E}[y^1-y^0\vert c=\text{always-taker}]\Pr[c=\text{always-taker}] \\ &= \mathbb{E}[y^1-y^0\vert c=\text{complier}]\Pr[c=\text{complier}]\\ & = \text{CACE}\cdot\Pr[c=\text{complier}], \end{align*}\]

where the second equality follows from the ignorability of the instrument, and the second-to-last equality follows from the monotonicity assumption (\(\Pr[c=\text{defier}]=0\)) and the exclusion restriction (\(y^1-y^0=0\) for the never-takers and always-takers). To remove \(\Pr[c=\text{complier}]\), we consider \(\mathbb{E}[T\vert z=1]-\mathbb{E}[T\vert z=0]\). Using the monotonicity assumption as before, we obtain

\[\begin{align*} \mathbb{E}[T\vert z=1]-\mathbb{E}[T\vert z=0] &= \mathbb{E}[T^1-T^0]\\ &=\mathbb{E}[T^1-T^0\vert c=\text{complier}]\Pr[c=\text{complier}] \\ &\quad + \mathbb{E}[T^1-T^0\vert c=\text{defier}]\Pr[c=\text{defier}] \\ &= \mathbb{E}[T^1-T^0\vert c=\text{complier}]\Pr[c=\text{complier}] \\ &= \Pr[c=\text{complier}], \end{align*}\]

which cannot be zero by the relevance assumption: the instrument must have an effect on the treatment. (The never-taker and always-taker terms vanish because \(T^1-T^0=0\) for those units, and the defier term vanishes by monotonicity.) Dividing these two expressions yields

\[ \text{CACE} = \frac{\mathbb{E}[y\vert z=1] - \mathbb{E}[y\vert z=0]}{\mathbb{E}[T\vert z=1]-\mathbb{E}[T\vert z=0]}. \]

Now, using the ignorability assumption, we can estimate the numerator using the difference in the means of the outcomes of the encouraged children (\(z=1\)) and not encouraged children (\(z=0\)), which can be computed with a linear regression.

```
itt_zy <- stan_glm(postlet ~ encouraged, data=sesame, refresh=0)
coef(itt_zy)["encouraged"]
```

The coefficient of `encouraged` is our estimate of the ITT. We can estimate the denominator using the difference in the means of the treatment variable between those two groups of children.

```
itt_zt <- stan_glm(watched ~ encouraged, data=sesame, refresh=0)
coef(itt_zt)["encouraged"]
```

```
encouraged
0.3626182
```

The coefficient of `encouraged` is the estimated proportion of compliers in the data. Dividing the ITT estimate by this proportion, we obtain a CACE estimate, sometimes called a *Wald estimate*.

```
wald_est <- coef(itt_zy)["encouraged"] / coef(itt_zt)["encouraged"]
wald_est
```

```
encouraged
7.901558
```

When there are covariates \(x\), the CACE needs to be defined for each level of \(x\), and all of our derivations above have to be conditioned on \(x\). More precisely, with the conditional ignorability \(y^1,y^0,T^1,T^0 \perp z\vert x\) and the corresponding assumptions, we have

\[ \text{CACE}(x) = \frac{\mathbb{E}[y\vert z=1,x] - \mathbb{E}[y\vert z=0,x]}{\mathbb{E}[T\vert z=1,x]-\mathbb{E}[T\vert z=0,x]}. \]

## 18.5 Two-stage least squares

We discuss a general method of estimating the CACE, called *two-stage least squares* (2SLS).

In 2SLS, we perform two linear regressions:

- Regression of the treatment variable on the instrument:

\[ \hat{T} = \alpha_0 + \alpha_1 z. \]

- Regression of the outcome on the first model's *predicted* treatment:

\[ y = \beta_0 + \beta_1\hat{T} + \varepsilon. \]
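To see the two stages in action, here is a toy simulation (in Python with NumPy, not the Sesame Street data; all coefficients are invented for illustration) in which a naive regression is biased by a confounder but 2SLS recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.integers(0, 2, n)                     # randomized binary instrument
u = rng.normal(size=n)                        # unobserved confounder
# treatment depends on both the instrument and the confounder
t = (z + 0.3 * u + 0.3 * rng.normal(size=n) > 0.5).astype(float)
y = 2.0 * t + u + rng.normal(size=n)          # true treatment effect is 2

def ols(x, y):
    """Least-squares fit of y on [1, x]; returns (intercept, slope)."""
    X = np.column_stack([np.ones(len(y)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols(t, y)[1]                 # biased upward by the confounder u
a0, a1 = ols(z, t)                   # stage 1: regress T on z
beta = ols(a0 + a1 * z, y)[1]        # stage 2: regress y on predicted T
print(f"naive OLS: {naive:.2f}, 2SLS: {beta:.2f}")
```

The naive slope is pulled away from the true effect of 2 by the confounder, while the 2SLS slope should land near 2.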

Here is an example of 2SLS on the Sesame Street data.

```
fit_2a <- stan_glm(watched ~ encouraged, data=sesame, refresh=0)
sesame$watched_hat <- fit_2a$fitted
fit_2b <- stan_glm(postlet ~ watched_hat, data=sesame, refresh=0)
fit_2b
```

```
stan_glm
family: gaussian [identity]
formula: postlet ~ watched_hat
observations: 240
predictors: 2
------
Median MAD_SD
(Intercept) 20.5 4.0
watched_hat 8.1 5.0
Auxiliary parameter(s):
Median MAD_SD
sigma 13.3 0.6
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg
```

Here, the estimated treatment effect is \(8.1\). However, the second regression does not give the correct standard error, as the standard error has to be computed from the observed treatments, not the predicted ones. Instead, we can fit 2SLS using the `brms` library, which returns the correct standard error.

```
f1 <- bf(watched ~ encour)
f2 <- bf(postlet ~ watched)
IV_brm_a <- brm(f1 + f2, data=sesame, refresh=0)
```

Printing `IV_brm_a` gives the following output.

```
Family: MV(gaussian, gaussian)
Links: mu = identity; sigma = identity
mu = identity; sigma = identity
Formula: watched ~ encour
postlet ~ watched
Data: sesame (Number of observations: 240)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
watched_Intercept 0.55 0.04 0.47 0.63 1.00 3977 3069
postlet_Intercept 20.44 3.71 13.37 28.21 1.00 2076 2310
watched_encour 0.36 0.05 0.26 0.46 1.00 4128 2913
postlet_watched 8.10 4.69 -1.61 17.05 1.00 1956 1862
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_watched 0.38 0.02 0.35 0.42 1.00 4661 2943
sigma_postlet 12.64 0.69 11.47 14.19 1.00 2790 2448
Residual Correlations:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
rescor(watched,postlet) 0.16 0.15 -0.13 0.45 1.00 1839
Tail_ESS
rescor(watched,postlet) 2117
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

We can also incorporate covariates by adding them to both regression models. For example, let us add the covariates `prelet`, `setting`, and `site` to our models.

```
f1 <- bf(watched ~ encour + prelet + setting + factor(site))
f2 <- bf(postlet ~ watched + prelet + setting + factor(site))
IV_brm_b <- brm(f1 + f2, data=sesame, refresh=0)
```

Printing `IV_brm_b` gives the following output.

```
Family: MV(gaussian, gaussian)
Links: mu = identity; sigma = identity
mu = identity; sigma = identity
Formula: watched ~ encour + prelet + setting + factor(site)
postlet ~ watched + prelet + setting + factor(site)
Data: sesame (Number of observations: 240)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
watched_Intercept 0.66 0.11 0.45 0.87 1.00 3995 3119
postlet_Intercept 1.33 4.51 -7.62 10.27 1.00 1803 1713
watched_encour 0.34 0.05 0.24 0.44 1.00 5514 3055
watched_prelet 0.01 0.00 -0.00 0.01 1.00 6257 3176
watched_setting -0.05 0.05 -0.16 0.05 1.00 5656 2961
watched_factorsite2 0.03 0.07 -0.10 0.17 1.00 3359 2622
watched_factorsite3 -0.11 0.07 -0.24 0.02 1.00 3467 2980
watched_factorsite4 -0.34 0.07 -0.49 -0.20 1.00 3891 3161
watched_factorsite5 -0.29 0.10 -0.49 -0.10 1.00 3778 3169
postlet_watched 13.90 3.86 6.40 21.33 1.00 1721 1917
postlet_prelet 0.70 0.08 0.56 0.85 1.00 5399 2797
postlet_setting 1.60 1.42 -1.20 4.39 1.00 3497 3150
postlet_factorsite2 8.39 1.87 4.64 12.05 1.00 4126 3028
postlet_factorsite3 -3.98 1.76 -7.40 -0.43 1.00 3237 2916
postlet_factorsite4 0.87 2.32 -3.67 5.58 1.00 2094 2162
postlet_factorsite5 2.69 2.81 -2.85 8.26 1.00 3138 2528
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_watched 0.36 0.02 0.33 0.39 1.00 5986 2795
sigma_postlet 9.44 0.54 8.51 10.63 1.00 3014 2271
Residual Correlations:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
rescor(watched,postlet) -0.18 0.15 -0.46 0.13 1.00 1692
Tail_ESS
rescor(watched,postlet) 2050
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

The instrumental variables' estimate of the treatment effect is the coefficient of `postlet_watched`, which is \(13.9\) with standard error \(3.9\).

The 2SLS method extends easily to continuous instruments and treatment variables. But we must be careful in the case of a binary instrument and a continuous treatment, as there is no universal way—at least without parametric assumptions—to quantify the instrument's effect on the treatment.

## 18.6 Multiple instruments, treatments and covariates

We can have multiple variables playing different roles in the instrumental variables model. In general, we have

- The outcome \(y\)
- Multiple treatments \(T_1,\ldots,T_k\)
- Multiple instruments \(z_1,\ldots,z_m\); these variables only affect the outcome through the treatments.
- Multiple covariates \(x_1,\ldots,x_l\); these variables affect the outcomes directly.

For each treatment and each instrument, we categorize the units by their compliance as follows:

- **Compliers**: the instrument has a positive (negative) effect on their treatment.
- **Never-takers** and **always-takers**: the instrument has no effect on their treatment.
- **Defiers**: the instrument has a negative (positive) effect on their treatment.

In order to estimate the CACE of each treatment, the following assumptions must be satisfied:

- **Ignorability of the instrument**: \(y^1,y^0,T_i^1,T_i^0 \perp z_j \vert x_1,\ldots,x_l\) for \(i=1,\ldots,k\) and \(j=1,\ldots,m\).
- **Monotonicity**: there are no defiers, that is, no units who react in the opposite direction of what we expect.
- **Relevance**: the instruments and the treatments must be related.
- **Exclusion restriction**: the instruments have no effect on the outcomes of the never-takers and always-takers.

In addition, to identify all treatment effects, we have to ensure that any combination of the observed treatments can be attained by adjusting the instruments; this can only happen when the number of instruments is greater than or equal to the number of treatments.

- **Full rank**: \(m=k\) (exactly identified) or \(m>k\) (overidentified).

With these assumptions, we can estimate the treatment effect(s) using the following 2SLS.

\[\begin{align*} \hat{T}_{1} &= \alpha_1 + \sum_{j=1}^{m}\alpha_{1j}z_j + \sum_{j=1}^{l}\delta_{1j}x_j \\ \hat{T}_{2} &= \alpha_2 + \sum_{j=1}^{m}\alpha_{2j}z_j + \sum_{j=1}^{l}\delta_{2j}x_j \\ &\vdots \\ \hat{T}_{k} &= \alpha_k + \sum_{j=1}^{m}\alpha_{kj}z_j + \sum_{j=1}^{l}\delta_{kj}x_j \\ y &= \beta_0 + \sum_{i=1}^{k}\beta_i\hat{T}_i +\sum_{i=1}^{l}\gamma_ix_i + \varepsilon. \end{align*}\]

As an example, we apply this method to cigarette sales data from the 48 continental US states in 1995, from Webel (2011).
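Before turning to the real data, the system above can be sketched numerically. The following toy simulation (Python with NumPy; all coefficients are invented for illustration) has \(k=2\) treatments, \(m=2\) instruments, and \(l=1\) covariate, and recovers the true treatment effects:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
z1, z2 = rng.normal(size=(2, n))   # two instruments (m = 2)
x = rng.normal(size=n)             # one covariate (l = 1)
u = rng.normal(size=n)             # unobserved confounder
# two treatments (k = 2), each driven by the instruments and the confounder
t1 = 1.0 * z1 + 0.2 * z2 + 0.5 * x + 0.5 * u + rng.normal(size=n)
t2 = 0.3 * z1 + 1.0 * z2 + 0.5 * x + 0.5 * u + rng.normal(size=n)
y = 1.5 * t1 - 1.0 * t2 + 0.5 * x + u + rng.normal(size=n)

def fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# stage 1: predict each treatment from all instruments and the covariate
Z = np.column_stack([np.ones(n), z1, z2, x])
t1_hat, t2_hat = Z @ fit(Z, t1), Z @ fit(Z, t2)
# stage 2: regress the outcome on the predicted treatments and the covariate
X2 = np.column_stack([np.ones(n), t1_hat, t2_hat, x])
beta = fit(X2, y)
print(np.round(beta[1:3], 2))      # close to the true effects (1.5, -1.0)
```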

```
cigarette <- read.csv("data/cigarette.csv")
head(cigarette[, c("packs", "rprice", "rincome", "salestax", "cigtax")])
```

```
packs rprice rincome salestax cigtax
1 101.08543 103.9182 12.91535 0.9216975 26.57481
2 111.04297 115.1854 12.16907 5.4850193 36.41732
3 71.95417 130.3199 13.53964 6.2057067 42.86964
4 56.85931 138.1264 16.07359 9.0363074 40.02625
5 82.58292 109.8097 16.31556 0.0000000 28.87139
6 79.47219 143.2287 20.96236 8.1072834 48.55643
```

We would like to estimate the treatment effect of cigarette price (`rprice`) on the number of cigarette packs sold (`packs`). However, the price itself might be affected by the demand for cigarettes, so we instead introduce two instrumental variables that are driven by fiscal policy rather than demand, namely the sales tax (`salestax`) and the cigarette tax (`cigtax`). We also have income (`rincome`) as a covariate. Our model is summarized in the following graph:

With this, we can now perform 2SLS using `brm`.

```
lm1 <- bf(log(rprice) ~ salestax + cigtax + log(rincome))
lm2 <- bf(log(packs) ~ log(rprice) + log(rincome))
IV_cig <- brm(lm1 + lm2, data=cigarette, refresh=0)
```

Printing `IV_cig` gives the following output.

```
Family: MV(gaussian, gaussian)
Links: mu = identity; sigma = identity
mu = identity; sigma = identity
Formula: log(rprice) ~ salestax + cigtax + log(rincome)
log(packs) ~ log(rprice) + log(rincome)
Data: cigarette (Number of observations: 48)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
logrprice_Intercept 4.10 0.10 3.89 4.30 1.00 3290
logpacks_Intercept 9.90 1.10 7.75 12.02 1.00 4381
logrprice_salestax 0.01 0.00 0.01 0.02 1.00 6481
logrprice_cigtax 0.01 0.00 0.01 0.01 1.00 4017
logrprice_logrincome 0.11 0.04 0.03 0.19 1.00 3254
logpacks_logrprice -1.28 0.28 -1.83 -0.74 1.00 2792
logpacks_logrincome 0.29 0.25 -0.18 0.77 1.00 2662
Tail_ESS
logrprice_Intercept 2875
logpacks_Intercept 2757
logrprice_salestax 3128
logrprice_cigtax 3607
logrprice_logrincome 2812
logpacks_logrprice 2425
logpacks_logrincome 2699
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma_logrprice 0.03 0.00 0.03 0.04 1.00 3110 2668
sigma_logpacks 0.20 0.02 0.16 0.25 1.00 2959 2399
Residual Correlations:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
rescor(logrprice,logpacks) -0.25 0.15 -0.53 0.05 1.00 2788
Tail_ESS
rescor(logrprice,logpacks) 2582
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

The point estimate of the coefficient is \(-1.28\) with standard error \(0.28\), which implies that a \(1\%\) increase in cigarette price decreases cigarette consumption by about \(1.28\%\) (between \(1.00\%\) and \(1.56\%\) within one standard error). Note that what we just estimated is the treatment effect on the complier states, that is, the states that would increase cigarette prices as a result of tax raises.

## 18.7 Testing the assumptions

The method of instrumental variables requires many assumptions, some of which can be justified by prior knowledge about the variables. Even when this is not the case, we can still test some of the assumptions from the data.

### 18.7.1 Testing relevance

Not only must the instrument be related to the treatment, but the relationship must also be strong enough that we can estimate the treatment effect reliably. A common problem is a *weak instrument*, an instrument that does not have a strong relationship with the treatment, which can lead to biased estimates.

To test relevance, we can use the *joint \(F\)-test* to compare the first-stage regression with and without the instruments; we reject the null hypothesis if the model with the instruments has better predictive power. For more details on the joint \(F\)-test, see e.g. James et al. (2021, chap. 3.2) and Hanck et al. (2019, chap. 7.3).

The usual statistical test with a specified type I error does not rule out weak instruments, so we need a criterion on the \(F\)-statistic to decide whether the instrument is strong enough. A common rule of thumb is that an \(F\)-statistic greater than 10 is sufficient. For more reliable cutoffs and other testing methods for weak instruments, see the extensive study by Stock and Yogo (2002).
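The first-stage \(F\)-statistic compares the residual sums of squares of the first-stage regression with and without the instruments. A minimal sketch (Python with NumPy on simulated data; the coefficients 0.8 and 0.02 are invented to represent a strong and a weak instrument):

```python
import numpy as np

def first_stage_F(t, Z, X):
    """Joint F-statistic for the instruments Z in the first-stage
    regression of t on [1, X, Z] versus the restricted model [1, X]."""
    n = len(t)
    full = np.column_stack([np.ones(n), X, Z])
    restricted = np.column_stack([np.ones(n), X])
    rss = lambda M: np.sum((t - M @ np.linalg.lstsq(M, t, rcond=None)[0]) ** 2)
    q = Z.shape[1]                  # number of instruments being tested
    df = n - full.shape[1]          # residual degrees of freedom
    return ((rss(restricted) - rss(full)) / q) / (rss(full) / df)

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=(n, 1))        # covariate
z = rng.normal(size=(n, 1))        # instrument
t_strong = 0.8 * z[:, 0] + 0.5 * x[:, 0] + rng.normal(size=n)
t_weak = 0.02 * z[:, 0] + 0.5 * x[:, 0] + rng.normal(size=n)
print(round(first_stage_F(t_strong, z, x)), round(first_stage_F(t_weak, z, x)))
```

Under the rule of thumb, the strong instrument's \(F\)-statistic should be far above 10 and the weak instrument's far below it.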

### 18.7.2 Testing exclusion restriction

One way to test the exclusion restriction is to perform 2SLS only on the group of always-takers and never-takers and see whether the effect of the instrument is negligible.

Another way is by fitting the second-stage regression with the *observed* treatment and the instrument:

\[ y = \beta_0 +\sum_{i=1}^{m}\alpha_i z_i + \sum_{i=1}^{k}\beta_i T_i +\sum_{i=1}^{l}\gamma_i x_i + \varepsilon. \]

The exclusion restriction says that the instruments only affect the outcome through the treatments, so we can use the joint \(F\)-test to check whether some \(\alpha_i \neq 0\), in which case the assumption is violated.

When the 2SLS model is overidentified, that is, when there are more instruments than treatments, there is also the *Sargan test*, which looks at the relationship between the residuals of the second-stage model and the instruments; we reject the null hypothesis of valid instruments if the relationship is significant.
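A minimal sketch of the Sargan test (Python with NumPy on simulated data; the data-generating coefficients are invented): with \(m=2\) instruments and \(k=1\) treatment, the statistic is \(n R^2\) from regressing the 2SLS residuals on the instruments, approximately \(\chi^2_{m-k}\) under the null that all instruments are valid.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
z1, z2 = rng.normal(size=(2, n))              # two instruments
u = rng.normal(size=n)                        # unobserved confounder
t = z1 + z2 + 0.5 * u + rng.normal(size=n)    # one treatment
Z = np.column_stack([z1, z2])

def tsls(y, t, Z):
    """2SLS of y on t with instruments Z; returns (intercept, slope)."""
    Zc = np.column_stack([np.ones(len(y)), Z])
    t_hat = Zc @ np.linalg.lstsq(Zc, t, rcond=None)[0]
    X = np.column_stack([np.ones(len(y)), t_hat])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def sargan(y, t, Z):
    """n * R^2 from regressing the 2SLS residuals on the instruments."""
    b0, b1 = tsls(y, t, Z)
    resid = y - b0 - b1 * t
    Zc = np.column_stack([np.ones(len(y)), Z])
    fitted = Zc @ np.linalg.lstsq(Zc, resid, rcond=None)[0]
    r2 = 1 - ((resid - fitted) ** 2).sum() / ((resid - resid.mean()) ** 2).sum()
    return len(y) * r2

y_valid = 2 * t + u + rng.normal(size=n)            # both instruments valid
y_invalid = 2 * t + z2 + u + rng.normal(size=n)     # z2 also affects y directly
print(round(sargan(y_valid, t, Z), 1), round(sargan(y_invalid, t, Z), 1))
```

For the valid case the statistic behaves like a \(\chi^2_1\) draw, while the direct effect of `z2` on the outcome in the invalid case produces a large statistic.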

## References

Forter (2017). *Twitch Blog*. https://blog.twitch.tv/en/2017/06/30/two-stage-least-squares-for-a-b-tests-669d07f904f7/.

Hanck et al. (2019). *Introduction to Econometrics with R*. University of Duisburg-Essen.

James, G., D. Witten, T. Hastie, and R. Tibshirani (2021). *An Introduction to Statistical Learning*. Springer US. https://doi.org/10.1007/978-1-0716-1418-1.

Kharel (2021). *Roblox Blog*. https://blog.roblox.com/2021/09/causal-inference-using-instrumental-variables/.

Stock, J. H., and M. W. Watson. *Introduction to Econometrics*. Springer Nature BV.