Linear regression and logistic regression are examples of a more general class of models: generalized linear models (GLMs). As in logistic regression, we can modify the link function and the model of the outcomes to handle various types of data, such as data with bounded outcomes, count data, and data with multi-valued outcomes.

12.1 Definition of generalized linear models

Generalized linear models (GLMs) are a class of models that share a common design: a linear predictor transformed through a link function. Specifically, a GLM consists of:

A vector of outcome data \(y=(y_1,\ldots,y_n)\).

A matrix of predictors \(X=(\mathbf{1}_n,X_1,\ldots,X_p)\) and a vector of coefficients \(\beta=(\beta_0,\ldots,\beta_p)^T\). The linear predictor is \(X\beta\).

A link function \(g\) that transforms the linear predictor into the model’s prediction through its inverse: \(\hat{y} = g^{-1}(X\beta)\).

A distribution of the outcome, given the prediction: \(p(y\vert\hat{y})\).

Possibly other parameters, such as variances, overdispersion parameters, and upper and/or lower bounds of the outcome.

Here are the link functions and outcome distributions that we have used in linear and logistic regression:

In linear regression, we used \(g(x)=x\) (and so \(g^{-1}(x)=x\)) and \(p(y\vert \hat{y})=p(y\vert X\beta)=\mathcal{N}(y\vert X\beta,\sigma^2)\), the normal density with mean \(X\beta\) and variance \(\sigma^2\).

In the logistic regression, we used the logit function \(g(x)=\text{logit}(x)\) with the logistic inverse: \(g^{-1}(x) = \text{logit}^{-1}(x)= e^x/(1+e^x)\). With the prediction \(\hat{y}=\text{logit}^{-1}(X\beta)\), the outcome distribution is the Bernoulli distribution: \(p(y\vert\hat{y}) = \hat{y}^y(1-\hat{y})^{1-y}\).
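As a quick sketch, the logit link and its inverse from the logistic model above are available in base R as qlogis and plogis:

```r
# logit link and its inverse, as used in logistic regression
g     <- function(p) qlogis(p)   # logit: log(p / (1 - p))
g_inv <- function(x) plogis(x)   # inverse logit: exp(x) / (1 + exp(x))

g(0.5)          # 0: probability 1/2 maps to the middle of the real line
g_inv(g(0.3))   # recovers 0.3
```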

12.2 Poisson and negative binomial regression

12.2.1 Poisson regression

We start with the simplest regression model for count data.

\[
y \sim \operatorname{Poisson}(e^{X\beta}).
\]

This is a GLM in which:

The link function is \(g(x)=\log x\).

The prediction is \(\hat{y} = g^{-1}(X\beta) = e^{X\beta}\).

The outcome distribution is \(p(y\vert\hat{y})\sim \operatorname{Poisson}(\hat{y})\).

From the properties of the Poisson distribution, the mean and the variance of the outcome coincide:

\[
\mathbb{E}[y] = \operatorname{Var}[y] = e^{X\beta}.
\]

Thus, in a Poisson model, the standard deviation of the outcome is already determined by the model: \(\operatorname{sd}(y)=\sqrt{\mathbb{E}[y]}\). If the expected outcome is \(\mathbb{E}[y] = 10\), then the prediction errors are mostly in the range of \(\pm\sqrt{\mathbb{E}[y]} \approx\pm 3.16\). But for many datasets, the prediction errors fall outside this range.
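A quick base-R simulation (seed fixed for reproducibility) illustrates that the standard deviation of Poisson outcomes is roughly the square root of the mean:

```r
set.seed(1)
lambda <- 10                 # expected outcome E[y]
y <- rpois(1e5, lambda)      # 100,000 simulated counts
mean(y)                      # close to 10
sd(y)                        # close to sqrt(10), about 3.16
```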

12.2.2 Overdispersion and underdispersion

Overdispersion and underdispersion refer to data that show more or less variation than the Poisson model implies. In other words, when fitting a Poisson model, the residuals of overdispersed data are often greater, while those of underdispersed data are mostly smaller, than the square root of the predicted value. For such data, a Poisson model would be inappropriate. In the next section, we introduce a model that allows for larger prediction errors.

12.2.3 Negative binomial regression

We introduce another model for the count data.

\[
y \sim \operatorname{NegBinomial}(e^{X\beta},\phi),
\]

where \(\operatorname{NegBinomial}(\mu,\phi)\) is the negative binomial distribution parametrized by its mean \(\mu\) and a shape parameter \(\phi\). (Classically, the negative binomial counts the number of failures in a sequence of iid Bernoulli trials before observing \(\phi\) successes, but the range of \(\phi\) can be extended to positive real numbers.) Its variance is \(\mu+\mu^2/\phi\), which is always larger than the Poisson variance \(\mu\).

In the context of count data modeling, \(\phi\) is the “reciprocal dispersion” parameter:

Lower values of \(\phi\) correspond to more overdispersion.

Higher values of \(\phi\) correspond to less overdispersion.

The negative binomial distribution becomes the Poisson distribution in the limit \(\phi\to\infty\) (that is, when there is no overdispersion).
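Base R's dnbinom supports this mean-shape parametrization through its mu and size (\(=\phi\)) arguments, so we can check both the variance formula \(\mu+\mu^2/\phi\) and the Poisson limit numerically:

```r
mu <- 5
# variance mu + mu^2/phi shrinks toward the Poisson variance mu as phi grows
sapply(c(1, 10, 1000), function(phi) mu + mu^2 / phi)   # 30.000 7.500 5.025

# with a very large phi, the pmf is essentially Poisson
max(abs(dnbinom(0:30, mu = mu, size = 1e8) - dpois(0:30, mu)))
```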

12.2.4 Exposure and offset

In many cases, the outcomes depend on the amount of exposure to the environment, and so two different outcomes may not be directly comparable. For example, the number of daily deaths by country depends on population size. The number of car accidents at an intersection depends on the number of cars running through that intersection.

If this is the case, we may instead let \(e^{X\beta}\) be the expected rate of outcomes \(r\), so that the expected outcome is the product of the exposure \(u\) and the rate \(r\):

\[
\mathbb{E}[y] = ur = ue^{X\beta},
\tag{12.1}\]

which leads to a new model with the additional input \(u\). We can separate the exposure from the linear predictor by applying the logarithm to both sides of Equation 12.1:

\[
\log\mathbb{E}[y] = \log u +X\beta.
\]

The term \(\log u\) is called the offset; it enters the linear predictor as a term whose coefficient is fixed to \(1\).
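We can verify this with base R's glm, which accepts an offset term; here we simulate counts with known (made-up) rate coefficients and varying exposures, and recover the coefficients:

```r
set.seed(123)
n <- 500
x <- rnorm(n)
u <- runif(n, 1, 10)                    # exposure varies by observation
y <- rpois(n, u * exp(0.5 + 0.3 * x))   # E[y] = u * exp(X beta)

# log(u) enters as an offset: a term with coefficient fixed to 1
fit <- glm(y ~ x, family = poisson, offset = log(u))
coef(fit)                               # close to the true values (0.5, 0.3)
```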

12.2.5 Example: effect of pest management on reducing cockroach levels

We consider the Roaches dataset, which was used to study the effect of pest management on reducing cockroach levels in urban apartments. In the experiment, there were 158 apartments in the treatment group and 104 apartments in the control group. The data consists of the following variables:

| Name | Description |
|------|-------------|
| y | post-treatment roach count |
| roach1 | pre-treatment roach level |
| treatment | 0 if control, 1 if treatment |
| senior | 1 if the apartment is restricted to the elderly |
| exposure2 | number of days the traps had been laid |

We create a new predictor named roach100, which is roach1 scaled down by a factor of 100.

First, we fit a Poisson regression by specifying family=poisson in stan_glm. The number of post-treatment roaches depends on the number of days the traps had been laid, so it makes sense to let exposure2 be the model’s exposure.

stan_glm
family: poisson [log]
formula: y ~ roach100 + treatment + senior
observations: 262
predictors: 4
------
Median MAD_SD
(Intercept) 3.1 0.0
roach100 0.7 0.0
treatment -0.5 0.0
senior -0.4 0.0
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

Interpreting the coefficients. The fitted Poisson model for the number of post-treatment roaches is

\[
y \sim \operatorname{Poisson}(e^{3.1+0.7*\texttt{roach100}-0.5*\texttt{treatment}-0.4*\texttt{senior}}).
\]

We can interpret the coefficients of the regression as follows:

The intercept is the prediction for \(\texttt{roach100}=0\), \(\texttt{treatment}=0\) and \(\texttt{senior}=0\). More precisely, for a non-senior apartment that was roach-free before and did not receive pest management, the expected number of roaches is \(e^{3.1}\approx 22\). Note that sometimes some of the predictors cannot be zero, in which case the intercept does not have a direct interpretation.

The coefficient \(0.7\) of \(\texttt{roach100}\) indicates that, for each additional \(100\) pre-treatment roaches (while keeping treatment and senior at the same levels), the expected number of post-treatment roaches increases by a factor of \(e^{0.7}\approx 2.01\), or roughly a \(100\%\) increase. (The approximation \(e^{\beta}\approx 1+\beta\) is accurate only for small \(|\beta|\).)

The coefficient \(-0.5\) of \(\texttt{treatment}\) indicates that the number of post-treatment roaches in an apartment with pest management is lower than that of an apartment without pest management (with the same levels of roach100 and senior) by a factor of \(e^{-0.5}\approx 0.61\), or about a \(39\%\) decrease.

The coefficient \(-0.4\) of \(\texttt{senior}\) indicates that the number of post-treatment roaches in a senior apartment is lower than that of a non-senior apartment (with the same levels of roach100 and treatment) by a factor of \(e^{-0.4}\approx 0.67\), or about a \(33\%\) decrease.

Checking the fit via simulation. Now we check the model’s fit by looking at the posterior predictive distribution. As usual, we use the posterior_predict function to generate 4000 replicated datasets of post-treatment roach counts, then we sample 400 of them.

After that, we use ppc_dens_overlay to compare the distributions of the original and simulated data. Here, we plot both on the logarithmic scale to make the difference between them more pronounced.

The plots show that the original data are overdispersed and contain a lot of zeros; this indicates that the Poisson model might not be suitable, as it allows only a relatively small number of zeros.

We thus turn to the negative binomial regression, which can be done by specifying family=neg_binomial_2. As before, we let exposure2 be the exposure. There is no need to specify the reciprocal dispersion parameter \(\phi\)—it can be estimated from the data.

stan_glm
family: neg_binomial_2 [log]
formula: y ~ roach100 + treatment + senior
observations: 262
predictors: 4
------
Median MAD_SD
(Intercept) 2.8 0.2
roach100 1.3 0.3
treatment -0.8 0.2
senior -0.3 0.3
Auxiliary parameter(s):
Median MAD_SD
reciprocal_dispersion 0.3 0.0
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

Let us see how the posterior predictive fits our data.

The negative binomial looks like a better fit, with a high probability of zero. However, the model allows the number of roaches to be as high as \(10^5\), which is unrealistic. We will see how to adjust this model later in the chapter.

12.3 Logistic-binomial and beta-binomial models

12.3.1 Logistic-binomial model

The logistic regression can be used to model the number of successes from \(n\) Bernoulli trials. In this setting, we can use the following GLM design:

The link function is \(g(x)=\text{logit}(x)\).

The prediction is \(\hat{p} = g^{-1}(X\beta) = \text{logit}^{-1}(X\beta)\).

The outcome distribution is \(p(y\vert\hat{p})\sim \operatorname{Binomial}(n,\hat{p})\), where \(n\) is the number of trials.

Let us try this model on simulated data of basketball shooting. The following code produces \(N =100\) players, each taking \(n=20\) shots. We also encode our assumption that the field goal percentage is negatively correlated with the player’s weight.

N <- 100
weight <- rnorm(N, 216, 31)
p <- 0.6 - 0.1*(weight - 216)/31
n <- rep(20, N)
y <- rbinom(N, n, p)
data <- data.frame(n=n, y=y, weight=weight)
head(data)

stan_glm
family: binomial [logit]
formula: cbind(y, 20 - y) ~ weight
observations: 100
predictors: 2
------
Median MAD_SD
(Intercept) 3.735 0.383
weight -0.016 0.002
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

Since the data were generated from the binomial model, we expect the residuals to be small—within the range implied by the binomial variation. Let us see if that is the case.

y <- fit_1a$y[,1]
p_hat <- fitted(fit_1a)
y_hat <- 20 * p_hat
plot(y_hat, y - y_hat,
     xlab="Predicted number of field goals",
     ylab="Residual", pch=20)
abline(0, 0)

12.3.2 Overdispersion

Logistic-binomial models often suffer from overdispersion, that is, the variation in the data is greater than the model indicates. To detect overdispersion, we first recall that the standard deviation of \(y\sim \operatorname{Binomial}(n,\hat{p})\) is \(\sqrt{n\hat{p}(1-\hat{p})}\), where \(\hat{p}=\text{logit}^{-1}(X\beta)\). Then, we consider the standardized residuals:

\[
z_i = \frac{y_i - n_i\hat{p}_i}{\sqrt{n_i\hat{p}_i(1-\hat{p}_i)}},
\]

which have mean \(0\) and standard deviation \(1\) if the model is correct. We then formally test for overdispersion by comparing \(\sum_{i=1}^N z^2_i\) to the \(\chi^2_{N-p}\) distribution, where \(p\) is the number of predictors.
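A minimal base-R sketch of this check, using data simulated from a correctly specified binomial model (so the chi-squared statistic should be unremarkable); the probabilities and coefficients here are made up for illustration:

```r
set.seed(2)
N <- 100; n <- rep(20, N)
x <- rnorm(N)
p_hat <- plogis(0.2 + 0.5 * x)   # stand-in for fitted probabilities
y <- rbinom(N, n, p_hat)         # data generated from the model itself

# standardized residuals and the chi-squared overdispersion statistic
z <- (y - n * p_hat) / sqrt(n * p_hat * (1 - p_hat))
chi2 <- sum(z^2)

# compare to chi-squared with N - p degrees of freedom (p = 2 predictors here);
# small p-values indicate overdispersion
pchisq(chi2, df = N - 2, lower.tail = FALSE)
```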

12.3.3 Beta-binomial model

To handle the overdispersion, we modify the outcome distribution of the logistic-binomial model to obtain the beta-binomial model, which has the following GLM design:

The link function is \(g(x)=\text{logit}(x)\).

The prediction is \(\hat{p} = g^{-1}(X\beta) = \text{logit}^{-1}(X\beta)\).

The outcome distribution is \[
p(y\vert\hat{p})\sim \text{Beta-Binomial}\left(n,\hat{p}\phi,\left(1-\hat{p}\right)\phi\right),
\] where \(n\) is the number of trials and \(\phi>0\) is an overdispersion parameter.

The parameters of the beta-binomial distribution are chosen so that the mean of \(y\) is \(n\hat{p}\) and the standard deviation of \(y\) is controlled by \(\phi\): \(\operatorname{sd}(y)=\sqrt{n\hat{p}(1-\hat{p})\tfrac{n+\phi}{1+\phi}}\).

Therefore, lower \(\phi\) allows for more overdispersion, and higher \(\phi\) allows for less. In the limit \(\phi\to \infty\) (no overdispersion), we recover the logistic-binomial model.
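The beta-binomial can be simulated in base R by first drawing \(p_i\) from \(\text{Beta}(\hat{p}\phi,(1-\hat{p})\phi)\) and then \(y_i\) from \(\text{Binomial}(n,p_i)\); the simulation below (made-up \(\hat{p}\) and \(\phi\)) checks the mean and standard deviation against the formulas above:

```r
set.seed(3)
n <- 20; p <- 0.6; phi <- 10
p_i <- rbeta(1e5, p * phi, (1 - p) * phi)  # per-observation success probabilities
y <- rbinom(1e5, n, p_i)                   # beta-binomial draws

mean(y)                                    # close to n * p = 12
sd(y)                                      # close to the formula below
sqrt(n * p * (1 - p) * (n + phi) / (1 + phi))
```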

To fit the beta-binomial model, we use the brms library, whose brm function allows us to select the beta_binomial family. We specify the number of trials via the trials() function; in this example, the number of trials (i.e. the number of shots) for each basketball player is 20. The input of trials can be a vector when the number of trials varies across data points. The dispersion parameter \(\phi\) is estimated directly from the data, so there is no need to specify it in brm.

Family: beta_binomial
Links: mu = logit; phi = identity
Formula: y | trials(20) ~ weight
Data: data (Number of observations: 100)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 3.723 0.403 2.937 4.506 1.000 4797 3090
weight -0.016 0.002 -0.019 -0.012 0.999 5058 3181
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi 129.642 77.237 43.188 339.890 1.003 2204 2323
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The dispersion parameter \(\phi\approx 129.6\) tells us that the predictive standard deviation of the beta-binomial model is \(\sqrt{\frac{20+129.6}{1+129.6}}\approx 1.07\) times that of the logistic-binomial model.

12.4 Ordered and unordered categorical regression

Sometimes the outcomes can have more than two categories, which can be ordered or unordered.

Examples of ordered categories are:

Easy, Medium, Hard

Slow, Normal, Fast

Urban, Suburban, Rural

Examples of unordered categories are:

Football, Basketball, Baseball

Car, Bus, Train

A, B, O, AB

Here, we introduce two models: one to handle ordered categories, and another to handle unordered categories.

12.4.1 Ordered logistic regression

For the data with ordered categories \(1,\ldots,K\), we may use the following logistic model:

The link function \(g\) can be any strictly increasing function that maps \((0,1)\) to \((-\infty, \infty)\).

If \(g(x)=\text{logit}(x)\), then the resulting model is called the ordered logit model or proportional odds model.

If \(g(x)=\Phi^{-1}(x)\), where \(\Phi\) is the cdf of the standard normal distribution, then the resulting model is called the ordered probit model.

The model predicts cumulative probabilities: \(\hat{p}_{\leq k} = g^{-1}(c_{k\vert k+1}-X\beta)\) for \(1\leq k \leq K-1\). Here, we have additional parameters \(c_{1\vert 2}<c_{2\vert 3}<\ldots<c_{K-1\vert K}\), called the cutpoints.

This formulation is well-defined: since \(\{c_{k\vert k+1}\}_{k=1}^{K-1}\) is increasing and \(g\) (and hence \(g^{-1}\)) is increasing, \(\hat{p}_{\leq k}\) is increasing in \(k\) as well. The two variations of the model differ mainly in the tails: the logistic distribution has heavier tails than the normal, so the ordered logit model is somewhat more tolerant of observations far from the cutpoints than the ordered probit model.

Let us try the ordered logit model on the Storable data. The data contain records of voting games played by groups of 2 to 6 college students. A summary of the game is as follows:

Each student was given a total of 4 votes to play in two rounds.

The students had the choice to cast 1, 2 or 3 votes on the first round, and the remaining votes on the second round.

Before casting the first votes, the students were told the payoffs for the winner, which were drawn from the uniform distribution on the interval \([1,100]\).

In addition, the students were told the distribution of the payoffs.

Here, we only take the results of the games played between two students. The vote column contains the number of votes cast in the first round and the value column contains the payoffs.

As expected, with higher payoffs, the students were willing to cast more votes in the first round.

To fit the ordered logit model, we can use the stan_polr function (proportional odds logistic regression) provided in rstanarm. Here, we specify the prior mean of the \(R^2\) of the prediction to be \(0.3\).

stan_polr
family: ordered [logistic]
formula: factor(vote) ~ value
observations: 20
------
Median MAD_SD
value 0.1 0.0
Cutpoints:
Median MAD_SD
1|2 2.7 1.4
2|3 5.8 2.1
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

The results show two cutpoints: \(c_{1\vert 2} = 2.7\) and \(c_{2\vert 3}=5.8\). Below is the plot of the expected vote given the payoffs from the model.
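Given the fitted cutpoints and slope, we can compute the per-category probabilities by hand. Following the polr/stan_polr convention \(P(y\leq k)=\text{logit}^{-1}(c_{k\vert k+1}-X\beta)\), a sketch for one (arbitrarily chosen) payoff value:

```r
# fitted values from the stan_polr output above
c_12 <- 2.7; c_23 <- 5.8; beta <- 0.1
value <- 50                        # an example payoff
eta <- beta * value

# cumulative probabilities P(vote <= k)
p_le1 <- plogis(c_12 - eta)
p_le2 <- plogis(c_23 - eta)

# per-category probabilities for casting 1, 2 or 3 votes
probs <- c(p_le1, p_le2 - p_le1, 1 - p_le2)
probs
sum(probs)   # 1
```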

12.4.2 Unordered logistic regression

For the data with unordered categories \(1,\ldots,K\), we can use the categorical logit model, which has the following GLM specification:

The predictions \((\hat{p}_{1},\ldots,\hat{p}_K)\) are obtained from the linear predictors by the softmax transform: fixing one reference category, say \(\beta_K=0\), we set \(\hat{p}_k = e^{X\beta_k}\big/\sum_{j=1}^{K}e^{X\beta_j}\). Notice that we now have \(K-1\) free sets of coefficients: \(\beta_1,\ldots,\beta_{K-1}\).
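The inverse link here is the softmax; a minimal sketch with the last category as reference (brms instead treats the first category as the reference, which is why the output below reports coefficients only for versicolor and virginica):

```r
# softmax inverse link for K categories, with the K-th as reference
softmax_ref <- function(eta) {      # eta: K-1 linear predictors X beta_k
  e <- exp(c(eta, 0))               # append 0 for the reference category
  e / sum(e)
}

p <- softmax_ref(c(1.2, -0.5))      # K = 3 categories, made-up predictors
p
sum(p)   # 1
```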

Let us fit this model on the iris dataset, which contains measurements of sepal length, sepal width, petal length and petal width for three different species of iris.

To fit the unordered logistic regression model of the species on the other predictors, we again use the brm function.

fit_2 <- brm(Species ~ ., family = categorical(link = "logit"),
             data = iris, prior = set_prior("normal(0, 8)"),
             refresh = 0)
print(fit_2)

Family: categorical
Links: muversicolor = logit; muvirginica = logit
Formula: Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
Data: iris (Number of observations: 150)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
muversicolor_Intercept -14.25 20.10 -55.11 22.78 1.00 2062
muvirginica_Intercept -40.24 23.24 -88.70 2.94 1.00 2270
muversicolor_Sepal.Length 1.38 4.25 -6.94 9.90 1.00 1642
muversicolor_Sepal.Width -4.02 4.05 -12.36 3.49 1.00 1731
muversicolor_Petal.Length 6.90 3.25 0.99 13.70 1.00 1338
muversicolor_Petal.Width 0.16 5.18 -10.23 9.84 1.00 1723
muvirginica_Sepal.Length -0.91 4.31 -9.14 7.49 1.00 1715
muvirginica_Sepal.Width -6.82 4.24 -15.56 1.13 1.00 1763
muvirginica_Petal.Length 13.44 3.90 6.34 21.65 1.00 1327
muvirginica_Petal.Width 10.00 5.41 -0.73 20.90 1.00 1962
Tail_ESS
muversicolor_Intercept 2212
muvirginica_Intercept 2368
muversicolor_Sepal.Length 2441
muversicolor_Sepal.Width 2005
muversicolor_Petal.Length 1965
muversicolor_Petal.Width 2154
muvirginica_Sepal.Length 2283
muvirginica_Sepal.Width 1692
muvirginica_Petal.Length 1635
muvirginica_Petal.Width 2483
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Since the data has three categories, the fitted model has two sets of coefficients associated with two categories: versicolor and virginica.

12.5 Models with unequal error standard deviations

The usual linear regression model \(y\sim\mathcal{N}(X\beta,\sigma^2)\) assumes that the error standard deviation \(\sigma\) is fixed. We can instead allow the standard deviation to vary with the predictors—a condition called heteroscedasticity. For example, it is possible to fit the model \(y\sim\mathcal{N}(X\beta_1,e^{X\beta_2})\), where \(\beta_1\) and \(\beta_2\) are the model’s parameters. In the earnings example, we could fit a linear regression of log-earnings with the error standard deviation of the form \(e^{c+d\,\texttt{male}}\), allowing different error standard deviations for women and men. This can be done using brm, with bf as a formula helper.

Family: gaussian
Links: mu = identity; sigma = log
Formula: log(earn) | subset(earn > 0) ~ height + male
sigma ~ male
Data: earnings (Number of observations: 1816)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 7.93 0.51 6.97 8.95 1.00 2666 2943
sigma_Intercept -0.12 0.02 -0.16 -0.07 1.00 4280 2922
height 0.02 0.01 0.01 0.04 1.00 2592 2944
male 0.37 0.06 0.25 0.49 1.00 2665 2945
sigma_male -0.06 0.04 -0.13 0.01 1.00 4101 2939
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
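As a sanity check on the idea of group-specific error standard deviations, here is a base-R simulation with made-up coefficients (loosely inspired by the fit above); the residual standard deviation differs by group and can be recovered empirically:

```r
set.seed(4)
n <- 2000
male <- rbinom(n, 1, 0.5)
height <- rnorm(n, 66 + 4 * male, 3)

# group-specific error sd: exp(-0.1) for women, exp(-0.2) for men (made up)
sigma <- exp(-0.1 - 0.1 * male)
log_earn <- 8 + 0.02 * height + 0.4 * male + rnorm(n, 0, sigma)

# residual sd by group (using the true mean structure for simplicity)
resid <- log_earn - (8 + 0.02 * height + 0.4 * male)
tapply(resid, male, sd)   # close to exp(-0.1) = 0.905 and exp(-0.2) = 0.819
```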

We can allow heteroscedasticity in other models as well. For example, the negative binomial model can be extended to \(y\sim\operatorname{NegBinomial}(e^{X\beta_1},e^{X\beta_2})\), in which \(\phi\) depends on the predictors. Let us try this on the roaches data, with \(\phi\) (the shape parameter) depending on treatment and senior.

Family: negbinomial
Links: mu = log; shape = log
Formula: y ~ treatment + senior + offset(log(exposure2))
shape ~ treatment + senior
Data: roaches (Number of observations: 262)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 3.75 0.21 3.36 4.18 1.00 4948 2693
shape_Intercept -1.27 0.15 -1.56 -0.99 1.00 4524 2910
treatment -0.54 0.27 -1.06 -0.01 1.00 4729 2661
senior -0.74 0.34 -1.37 -0.03 1.00 5038 2875
shape_treatment -0.17 0.19 -0.54 0.20 1.00 4903 3170
shape_senior -0.55 0.21 -0.97 -0.16 1.00 4836 3303
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

12.6 Mixture models for data with many zeros

12.6.1 Hurdle models

We briefly mentioned in Section 7.1.1 how to handle data with many zero-valued outcomes, specifically for log-linear models. In summary, we use a logistic regression to classify whether the outcome is zero, and then a linear regression to model the non-zero outcomes. We demonstrate here how to do this on the earnings data with stan_glm. First, we fit a logistic regression model on whether the earnings are zero.

# (earn == 0) indicates whether the earnings are zero
fit_1a <- stan_glm((earn == 0) ~ height + male,
                   family=binomial(link="logit"),
                   data=earnings, refresh=0)
print(fit_1a)

stan_glm
family: binomial [logit]
formula: (earn == 0) ~ height + male
observations: 1816
predictors: 3
------
Median MAD_SD
(Intercept) 2.9 1.9
height -0.1 0.0
male -1.7 0.3
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

Then, we fit a linear regression model on the data with non-zero earnings.

stan_glm
family: gaussian [identity]
formula: log(earn) ~ height + male
observations: 1629
predictors: 3
subset: earn > 0
------
Median MAD_SD
(Intercept) 8.0 0.5
height 0.0 0.0
male 0.4 0.1
Auxiliary parameter(s):
Median MAD_SD
sigma 0.9 0.0
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

The resulting model is a mixture of the linear and logistic models. Suppose that we want to simulate the earnings of a randomly chosen 68-inch-tall woman from this model; this can be done in two steps: first, we use the logistic model to predict whether her earnings are zero, and if not, we use the linear model to predict her earnings. Both predictions can be obtained using posterior_predict.

new <- data.frame(height=68, male=0)
pred_1a <- posterior_predict(fit_1a, newdata=new)
pred_1b <- posterior_predict(fit_1b, newdata=new)
pred <- ifelse(pred_1a==1, 0, exp(pred_1b))
print(pred[1:10])

Alternatively, both parts of the model can be fitted jointly with brm using the hurdle_lognormal family:

Family: hurdle_lognormal
Links: mu = identity; sigma = identity; hu = logit
Formula: earn ~ height + male
hu ~ height + male
Data: earnings (Number of observations: 1816)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 7.99 0.50 7.01 8.99 1.00 3170 2735
hu_Intercept 2.91 2.01 -0.98 6.88 1.00 3628 3229
height 0.02 0.01 0.01 0.04 1.00 3106 2731
male 0.37 0.06 0.25 0.49 1.00 2992 2943
hu_height -0.07 0.03 -0.13 -0.01 1.00 3660 3077
hu_male -1.67 0.32 -2.33 -1.06 1.00 2728 2445
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.87 0.02 0.84 0.90 1.00 4786 2594
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Notice that the coefficients of the linear regression (Intercept, height and male) and the logistic regression (hu_Intercept, hu_height and hu_male) are almost identical to the ones from the direct fits above. For count data with many zeros, one can use the hurdle_poisson or hurdle_negbinomial family in brm.

12.6.2 Zero-inflated models

As in hurdle models, a zero-inflated model also consists of a logistic regression model that predicts whether the outcome is zero, followed by a regression model of our choice. The difference is that, in a hurdle model, the outputs of the second model must be non-zero (e.g. log-linear models), while in a zero-inflated model, they can be zero (e.g. Poisson or negative binomial models).
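A minimal base-R simulation of a zero-inflated Poisson outcome (made-up parameters) shows why this matters: zeros come both from the inflation component and from the Poisson itself, so the overall zero rate exceeds \(\operatorname{Poisson}(0\,\vert\,\lambda)\):

```r
set.seed(5)
n <- 1e5
pi0 <- 0.3        # probability of a structural zero (made up)
lambda <- 4       # Poisson mean of the count component (made up)

zero <- rbinom(n, 1, pi0)                  # structural-zero indicator
y <- ifelse(zero == 1, 0, rpois(n, lambda))

mean(y == 0)                               # empirical zero rate
pi0 + (1 - pi0) * dpois(0, lambda)         # theoretical zero rate
```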

In Section 12.2.5, we fitted a negative binomial model to the roaches data. Let us now fit a zero-inflated negative binomial model on these data using brm. Here, we use all predictors, including the exposure variable, to predict whether the number of post-treatment roaches is zero.

The fit of this model is slightly better than that of the negative binomial model; specifically, fewer apartments have predicted post-treatment roach counts larger than 10,000.