# 2 Linear regression with a single predictor

As usual, we load the `rstanarm` package.

`library("rstanarm")`

## 2.2 Checking the model’s fit via simulation

We will demonstrate how to check the model’s fit using the elections data above.

#### 2.2.0.1 Step 1: Creating parameters from the fitted model

```
a <- 46.3
b <- 3.0
sigma <- 3.9
x <- hibbs$growth
n <- length(x)
```

#### 2.2.0.2 Step 2: Simulating fake data

```
y <- a + b*x + rnorm(n, 0, sigma)
fake <- data.frame(x, y)
```

#### 2.2.0.3 Step 3: Fitting the model and comparing fitted to assumed parameters

`fit <- stan_glm(y ~ x, data=fake, refresh=0)`

`print(fit)`

```
stan_glm
 family:       gaussian [identity]
 formula:      y ~ x
 observations: 16
 predictors:   2
------
            Median MAD_SD
(Intercept) 46.3   1.8
x            4.1   0.7

Auxiliary parameter(s):
      Median MAD_SD
sigma 4.1    0.8

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg
```

The estimated coefficients (46.3 and 4.1) seem close enough to the assumed true values (46.3 and 3.0).

We can compare the coefficients formally by checking whether the true values fall within the 68% and 95% confidence intervals of the estimated coefficients. For simplicity, we will only do this for the slope \(b\).

```
b_hat <- coef(fit)["x"]
b_se <- se(fit)["x"]
print(b_hat)
```

```
x
4.105981
```

`print(b_se)`

```
x
0.7335738
```

We then check whether the assumed true value of \(b\) falls within the 68% and 95% confidence intervals. However, since the original dataset is small, with a sample size of only 16, we use the \(t\)-distribution with \(n-2\) degrees of freedom instead of the normal distribution. (A central 68% interval leaves 16% in each tail, hence the 0.84 quantile below.)

```
t_68 <- qt(0.84, n-2)
t_95 <- qt(0.975, n-2)
cover_68 <- abs(b - b_hat) < t_68 * b_se
cover_95 <- abs(b - b_hat) < t_95 * b_se
paste("68% coverage: ", mean(cover_68))
```

`[1] "68% coverage: 0"`

`paste("95% coverage: ", mean(cover_95))`

`[1] "95% coverage: 1"`

#### 2.2.0.4 Step 4: Repeating the simulation in a loop

Now we repeat the simulation many times and check whether the *coverage probabilities*, that is, the probabilities that the confidence intervals contain the true coefficient, are close to 68% and 95%, respectively.

```
n_fake <- 1000
cover_68 <- rep(NA, n_fake) # c(NA, ... , NA)
cover_95 <- rep(NA, n_fake) # c(NA, ... , NA)
t_68 <- qt(0.84, n-2)
t_95 <- qt(0.975, n-2)
pb <- txtProgressBar(min=0, max=n_fake, initial=0, style=3)
for (s in 1:n_fake){
  setTxtProgressBar(pb, s)
  y <- a + b*x + rnorm(n, 0, sigma)
  fake <- data.frame(x, y)
  fit <- stan_glm(y ~ x, data = fake, refresh = 0)
  b_hat <- coef(fit)["x"]
  b_se <- se(fit)["x"]
  cover_68[s] <- abs(b - b_hat) < t_68 * b_se
  cover_95[s] <- abs(b - b_hat) < t_95 * b_se
}
close(pb)
```

`paste("68% coverage: ", mean(cover_68))`

`[1] "68% coverage: 0.702"`

`paste("95% coverage: ", mean(cover_95))`

`[1] "95% coverage: 0.949"`

This simulation gives the desired result: close to 68% of the 68% confidence intervals, and close to 95% of the 95% confidence intervals, contain the true coefficient.
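Since each `stan_glm` call takes a second or two, the full loop above is slow. As a quicker variant (our own sketch, not part of the original walkthrough), the same coverage check can be run with least squares via `lm()`, whose `summary()` table provides the classical standard errors directly. Here we use a made-up predictor `x` in place of `hibbs$growth`, so everything is self-contained:

```r
# Hypothetical faster coverage check using lm() instead of stan_glm().
# Assumed true parameters match those used in the text.
a <- 46.3; b <- 3.0; sigma <- 3.9
n <- 16
x <- runif(n, -1, 4)            # stand-in predictor; the text uses hibbs$growth
n_fake <- 1000
cover_68 <- rep(NA, n_fake)
cover_95 <- rep(NA, n_fake)
t_68 <- qt(0.84, n - 2)         # 68% interval leaves 16% in each tail
t_95 <- qt(0.975, n - 2)
for (s in 1:n_fake) {
  y <- a + b*x + rnorm(n, 0, sigma)
  fit <- lm(y ~ x)
  b_hat <- coef(fit)["x"]
  b_se <- summary(fit)$coefficients["x", "Std. Error"]
  cover_68[s] <- abs(b - b_hat) < t_68 * b_se
  cover_95[s] <- abs(b - b_hat) < t_95 * b_se
}
mean(cover_68)   # should come out near 0.68
mean(cover_95)   # should come out near 0.95
```

Because `lm()` fits are nearly instantaneous, this version finishes in well under a second, which makes it convenient for experimenting with different sample sizes or noise levels.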