10 Random Effects: Generalized Linear Mixed Models
10.1 Random Effects Modeling of Clustered Categorical Data
10.1.1 The Generalized Linear Mixed Model (GLMM)
\[\begin{equation} g(u_{it}) = u_i + \alpha + \beta_1 x_{it1}+ \dots + \beta_p x_{itp},\ i = 1, \dots, n,\ t = 1, \dots, T. \tag{26} \end{equation}\]
10.1.2 A Logistic GLMM for Binary Matched Pairs
\[\begin{equation} \mathrm{logit}[P(Y_{i1}= 1)] = u_i + \alpha + \beta,\ \ \mathrm{logit}[P(Y_{i2}= 1)] = u_i + \alpha \tag{27} \end{equation}\]
10.1.3 Example: Environmental Opinions Revised
<- read.table("http://users.stat.ufl.edu/~aa/cat/data/Envir_opinions.dat",
Opinions header = TRUE, stringsAsFactors = TRUE)
# Make contingency table
<- as.matrix(addmargins(xtabs(~y1 + y2, data = Opinions)))
tab
library(tibble)
# Add label as the first column and variable names
`Table 10.1` <- tibble(`Pay Higher Taxes` = c("Yes", "No", "Total"),
Yes = tab[,1], No = tab[,2], Total = tab[,3])
library(flextable)
<- data.frame(
my_header col_keys = colnames(`Table 10.1`),
line1 = c("Pay Higher Taxes", rep("Cut Living Standards", 2), "Total"),
line2 = colnames(`Table 10.1`)
)
flextable(`Table 10.1`, col_keys = my_header$col_keys) %>%
set_header_df(
mapping = my_header,
key = "col_keys"
%>%
) theme_booktabs() %>%
autofit(part = "all") %>%
align(align = "center", part = "all") %>%
merge_h(part = "header") %>%
merge_v(part = "header") %>%
merge_h(part = "body") %>%
merge_v(part = "body") %>%
align_nottext_col(align = "center") %>%
set_caption(caption = "Opinions relating to the environment")
Pay Higher Taxes | Cut Living Standards | Total | |
---|---|---|---|
Yes | No | ||
Yes | 227 | 132 | 359 |
No | 107 | 678 | 785 |
Total | 334 | 810 | 1,144 |
<- read.table("http://users.stat.ufl.edu/~aa/cat/data/Opinions.dat",
Opinions header = TRUE, stringsAsFactors = TRUE)
%>%
Opinions filter(row_number() %in% c(1, 2, n()-1, n()))
person question y
1 1 1 1
2 1 0 1
3 1144 1 0
4 1144 0 0
# library(lme4)
# fit GLMM by adaptive Gaussian quadrature,
# with nAGQ quadrature points, as 10.1.5 explains
# (1|person) is random intercept for person
<- lme4::glmer(y ~ (1|person) + question, family = binomial, nAGQ = 50,
fit data = Opinions)
summary(fit)
Generalized linear mixed model fit by maximum likelihood (Adaptive
Gauss-Hermite Quadrature, nAGQ = 50) [glmerMod]
Family: binomial ( logit )
Formula: y ~ (1 | person) + question
Data: Opinions
AIC BIC logLik deviance df.resid
2526.8 2544.0 -1260.4 2520.8 2285
Scaled residuals:
Min 1Q Median 3Q Max
-0.8872 -0.2691 -0.2423 0.4646 1.2519
Random effects:
Groups Name Variance Std.Dev.
person (Intercept) 8.143 2.854
Number of obs: 2288, groups: person, 1144
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.8343 0.1624 -11.295 <2e-16 ***
question 0.2100 0.1301 1.614 0.106
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
question -0.452
10.2 Examples: Random Effects Models for Binary Data
10.2.1 Small-Area Estimation of Binomial Probabilities
\[\begin{equation} \mathrm{logit}[P(Y_{it} = 1)] = \mathrm{logit}(\pi_i) = u_i + \alpha, \tag{28} \end{equation}\]
10.2.2 Example: Estimating Basketball Free Throw Success
<- read.table("http://users.stat.ufl.edu/~aa/cat/data/FreeThrow.dat",
FreeThrow header = TRUE, stringsAsFactors = TRUE)
%>%
FreeThrow filter(row_number() %in% c(1, n()))
player y T
1 Davis.A 32 39
2 Gobert.R 11 14
library(lme4)
Loading required package: Matrix
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
# (1|player) = random intercepts for each player
# nAGQ = number of points for adaptive Gaussian quadrature
<-
fit glmer(y/T ~ 1 + (1|player), family=binomial, weights = T, nAGQ= 100,
data = FreeThrow)
summary(fit)
Generalized linear mixed model fit by maximum likelihood (Adaptive
Gauss-Hermite Quadrature, nAGQ = 100) [glmerMod]
Family: binomial ( logit )
Formula: y/T ~ 1 + (1 | player)
Data: FreeThrow
Weights: T
AIC BIC logLik deviance df.resid
33.8 35.7 -14.9 29.8 18
Scaled residuals:
Min 1Q Median 3Q Max
-1.5111 -0.6988 0.1050 0.5467 1.3008
Random effects:
Groups Name Variance Std.Dev.
player (Intercept) 0.1807 0.4251
Number of obs: 20, groups: player, 20
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.174 0.181 6.484 8.95e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fitted(fit) # estimated prob's for 20 players using predicted random effects
1 2 3 4 5 6 7 8
0.7948554 0.7709709 0.6592042 0.7654111 0.7929072 0.7921630 0.7235316 0.7829412
9 10 11 12 13 14 15 16
0.7110775 0.6664667 0.8087839 0.7550424 0.8005694 0.8053209 0.7341520 0.8087839
17 18 19 20
0.7475986 0.7341520 0.7726096 0.7706175
<- c(0.80, 0.77, 0.63, 0.81, 0.84, 0.81, 0.83, 0.78, 0.57, 0.39,
season 0.81, 0.82, 0.81, 0.61, 0.79, 0.74, 0.80, 0.67, 0.77, 0.65)
`Table 10.2` <- bind_cols(Player = word(FreeThrow$player, 1, sep = fixed(".")), T_i = FreeThrow$T,
p_i = round(FreeThrow$y/ FreeThrow$T, 2),
hat_pi_i = round(fitted(fit), 2), pi_i = season)
library(flextable)
flextable(`Table 10.2`) %>%
theme_booktabs() %>%
fix_border_issues() %>%
set_caption(caption = "Estimates of probability of making a free throw, based on data from centers from week 1 of an NBA season.") %>%
set_table_properties(width = .5, layout = "autofit") %>%
::compose(part = "header", j = "T_i",
flextablevalue = as_paragraph("T", as_sub("i"))) %>%
::compose(part = "header", j = "p_i",
flextablevalue = as_paragraph("p", as_sub("i"))) %>%
::compose(part = "header", j = "hat_pi_i",
flextablevalue = as_paragraph("\U1D70B\U0302", as_sub("i"))) %>%
::compose(part = "header", j = "pi_i",
flextablevalue = as_paragraph("\U1D70B", as_sub("i")))
Player | Ti | pi | 𝜋̂i | 𝜋i |
---|---|---|---|---|
Davis | 39 | 0.82 | 0.79 | 0.80 |
Cousins | 49 | 0.78 | 0.77 | 0.77 |
Whiteside | 21 | 0.52 | 0.66 | 0.63 |
Turner | 13 | 0.77 | 0.77 | 0.81 |
Gasol | 19 | 0.84 | 0.79 | 0.84 |
Valanciunas | 14 | 0.86 | 0.79 | 0.81 |
Towns | 3 | 0.33 | 0.72 | 0.83 |
Embiid | 12 | 0.83 | 0.78 | 0.78 |
Nurkic | 19 | 0.63 | 0.71 | 0.57 |
Drummond | 16 | 0.50 | 0.67 | 0.39 |
Lopez | 13 | 0.92 | 0.81 | 0.81 |
Jokic | 3 | 0.67 | 0.76 | 0.82 |
Dieng | 6 | 1.00 | 0.80 | 0.81 |
Adams | 7 | 1.00 | 0.81 | 0.61 |
Kanter | 8 | 0.62 | 0.73 | 0.79 |
Monroe | 13 | 0.92 | 0.81 | 0.74 |
Horford | 6 | 0.67 | 0.75 | 0.80 |
Vucevic | 8 | 0.62 | 0.73 | 0.67 |
Muscala | 10 | 0.80 | 0.77 | 0.77 |
Gobert | 14 | 0.79 | 0.77 | 0.65 |
# compose is also used by purrr (and igraph)
# Unicode is a hat over the next character and U1D70B is pi
summary(glm(y/T ~ 1, family = binomial, weights = T, data = FreeThrow))
Call:
glm(formula = y/T ~ 1, family = binomial, data = FreeThrow, weights = T)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3217 -0.8336 0.2710 0.9268 1.9710
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.1400 0.1363 8.361 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 31.322 on 19 degrees of freedom
Residual deviance: 31.322 on 19 degrees of freedom
AIC: 81.032
Number of Fisher Scoring iterations: 4
10.2.3 Example: Opinions about Legalizing Abortion Revised
<- read.table("http://users.stat.ufl.edu/~aa/cat/data/Abortion.dat",
Abortion header = TRUE, stringsAsFactors = TRUE) %>%
mutate(sit = factor(situation, levels = c(3,1,2)))
%>%
Abortion filter(row_number() %in% c(1:3, n()-2, n()-1, n()))
person gender situation response sit
1 1 1 1 1 1
2 1 1 2 1 2
3 1 1 3 1 3
4 1850 0 1 0 1
5 1850 0 2 0 2
6 1850 0 3 0 3
<-
fit ::glmer(response ~ (1 | person) + sit + gender, family = binomial,
lme4nAGQ = 100, data = Abortion)
summary(fit)
Generalized linear mixed model fit by maximum likelihood (Adaptive
Gauss-Hermite Quadrature, nAGQ = 100) [glmerMod]
Family: binomial ( logit )
Formula: response ~ (1 | person) + sit + gender
Data: Abortion
AIC BIC logLik deviance df.resid
4588.5 4621.6 -2289.3 4578.5 5545
Scaled residuals:
Min 1Q Median 3Q Max
-1.7810 -0.1223 -0.1055 0.1396 1.7149
Random effects:
Groups Name Variance Std.Dev.
person (Intercept) 76.49 8.746
Number of obs: 5550, groups: person, 1850
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.61937 0.37816 -1.638 0.101
sit1 0.83478 0.16007 5.215 1.84e-07 ***
sit2 0.29245 0.15670 1.866 0.062 .
gender 0.01262 0.48970 0.026 0.979
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) sit1 sit2
sit1 -0.218
sit2 -0.211 0.508
gender -0.725 0.000 0.000
10.3 Extensions to Multinomial Responses and Multiple Random Effect Terms
10.3.1 Example: Insomnia Study Revisited
<- read.table("http://users.stat.ufl.edu/~aa/cat/data/Insomnia.dat",
Insomnia header = TRUE, stringsAsFactors = TRUE)
%>%
Insomnia filter(row_number() %in% c(1, 2, n()-1, n()))
case treat occasion response
1 1 1 0 1
2 1 1 1 1
3 127 0 0 4
4 127 0 1 4
# response var. from clmm must be a factor
<-
fit ::clmm(factor(response) ~ (1|case) + occasion + treat + occasion:treat,
ordinalnAGQ = 20, data = Insomnia)
summary(fit)
Cumulative Link Mixed Model fitted with the adaptive Gauss-Hermite
quadrature approximation with 20 quadrature points
formula: factor(response) ~ (1 | case) + occasion + treat + occasion:treat
data: Insomnia
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 478 -592.97 1199.94 401(1549) 8.28e-05 6.1e+01
Random effects:
Groups Name Variance Std.Dev.
case (Intercept) 3.628 1.905
Number of groups: case 239
Coefficients:
Estimate Std. Error z value Pr(>|z|)
occasion -1.60158 0.28336 -5.652 1.59e-08 ***
treat -0.05785 0.36630 -0.158 0.87450
occasion:treat -1.08129 0.38046 -2.842 0.00448 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Threshold coefficients:
Estimate Std. Error z value
1|2 -3.4896 0.3588 -9.727
2|3 -1.4846 0.2903 -5.114
3|4 0.5613 0.2702 2.078
library(multgee)
<-
fit_multgee ordLORgee(response ~ occasion + treat + occasion:treat, id = case,
LORstr = "independence", data = Insomnia)
<-
fit_ordinal ::clmm(factor(response) ~ (1|case) + occasion + treat + occasion:treat,
ordinalnAGQ = 20, data = Insomnia)
`Table 10.4` <-
data.frame(Effect = c("Occasion", "Treatment", "Treatment x Occasion"),
mm_coef = c(coef(fit_multgee)["occasion"],
coef(fit_multgee)["treat"],
coef(fit_multgee)["occasion:treat"]),
mm_se = c(coef(summary(fit_multgee))["occasion", "san.se"],
coef(summary(fit_multgee))["treat", "san.se"],
coef(summary(fit_multgee))["occasion:treat", "san.se"]),
# note the -1 because of the (weird negative) model specification
re_coef = c(coef(fit_ordinal)["occasion"],
coef(fit_ordinal)["treat"],
coef(fit_ordinal)["occasion:treat"]) * -1,
re_se = c(coef(summary(fit_ordinal))["occasion", "Std. Error"],
coef(summary(fit_ordinal))["treat", "Std. Error"],
coef(summary(fit_ordinal))["occasion:treat",
"Std. Error"])) %>%
mutate(across(where(is.numeric), round, 3)) %>%
mutate(`Marginal Model GEE` = paste0(mm_coef, " (", mm_se, ")")) %>%
mutate(`Random Effects Model (GLMM) ML` = paste0(re_coef, " (", re_se, ")")) %>%
select(Effect, `Marginal Model GEE`, `Random Effects Model (GLMM) ML`)
library(flextable)
flextable(`Table 10.4`) %>%
set_caption(caption = "Results of fitting cumulative logit marginal model and random effects model to Table 9.2, with standard errors in parentheses.") %>%
set_table_properties(width = .75, layout = "autofit") %>%
align(align = "center", part = "all")
Effect | Marginal Model GEE | Random Effects Model (GLMM) ML |
---|---|---|
Occasion | 1.038 (0.168) | 1.602 (0.283) |
Treatment | 0.034 (0.238) | 0.058 (0.366) |
Treatment x Occasion | 0.708 (0.244) | 1.081 (0.38) |
10.3.2 Meta-Analysis: Bivariate Random Effects for Association Heterogeneity
<- read.table("http://users.stat.ufl.edu/~aa/cat/data/Ulcers.dat",
Ulcers header = TRUE, stringsAsFactors = TRUE)
%>%
Ulcers filter(row_number() %in% c(1, 2, n()-1, n()))
study treat y n
1 1 1 7 15
2 1 0 11 13
3 41 1 0 9
4 41 0 0 16
<-
fit ::glmer(y/n ~ (1|study) + treat, family = binomial, weights = n, nAGQ = 50,
lme4data = Ulcers)
summary(fit)
Generalized linear mixed model fit by maximum likelihood (Adaptive
Gauss-Hermite Quadrature, nAGQ = 50) [glmerMod]
Family: binomial ( logit )
Formula: y/n ~ (1 | study) + treat
Data: Ulcers
Weights: n
AIC BIC logLik deviance df.resid
306.1 313.3 -150.0 300.1 79
Scaled residuals:
Min 1Q Median 3Q Max
-4.1808 -0.7646 0.0229 0.9771 4.5243
Random effects:
Groups Name Variance Std.Dev.
study (Intercept) 0.6721 0.8198
Number of obs: 82, groups: study, 41
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.3212 0.1500 -2.142 0.0322 *
treat -1.1728 0.1176 -9.969 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
treat -0.313
\[\begin{equation} \mathrm{logit}[P(Y_{i1}= 1)] = u_i + \alpha + (\beta + v_1),\ \ \mathrm{logit}[P(Y_{i2}= 1)] = u_i + \alpha \tag{29} \end{equation}\]
<-
fit2 ::glmer(y/n ~ (1 + treat|study), family = binomial, weights = n,
lme4data = Ulcers)
summary(fit2)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: y/n ~ (1 + treat | study)
Data: Ulcers
Weights: n
AIC BIC logLik deviance df.resid
466.9 476.5 -229.4 458.9 78
Scaled residuals:
Min 1Q Median 3Q Max
-1.4935 -0.1543 0.0495 0.2850 1.1408
Random effects:
Groups Name Variance Std.Dev. Corr
study (Intercept) 3.307 1.819
treat 4.257 2.063 -0.93
Number of obs: 82, groups: study, 41
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2494 0.1626 -7.682 1.57e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
10.4 Multilevel (Hierarchical) Models
10.4.2 Example: Smoking Prevention and Cessation Study
<- read.table("http://users.stat.ufl.edu/~aa/cat/data/Smoking.dat",
Smoking header = TRUE, stringsAsFactors = TRUE)
`Table 10.6` <- Smoking %>%
filter(row_number() %in% c(1, 2, n())) %>%
select(-y)
library(flextable)
flextable(`Table 10.6`) %>%
set_caption(caption = "Part of smoking prevention and cessation data file.") %>%
set_table_properties(width = .75, layout = "autofit") %>%
align(align = "center", part = "all")
student | school | class | SC | TV | PTHK | THK |
---|---|---|---|---|---|---|
1 | 403 | 403,101 | 1 | 0 | 2 | 3 |
2 | 403 | 403,101 | 1 | 0 | 4 | 4 |
1,600 | 515 | 515,113 | 0 | 0 | 3 | 3 |
<- read.table("http://users.stat.ufl.edu/~aa/cat/data/Smoking.dat",
Smoking header = TRUE, stringsAsFactors = TRUE)
%>%
Smoking filter(row_number() %in% c(1, n()))
student school class SC TV PTHK THK y
1 1 403 403101 1 0 2 3 1
2 1600 515 515113 0 0 3 3 1
<-
fit ::glmer(y ~ (1|class) + (1|school) + PTHK + SC + TV, family = binomial,
lme4data = Smoking)
summary(fit)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: y ~ (1 | class) + (1 | school) + PTHK + SC + TV
Data: Smoking
AIC BIC logLik deviance df.resid
2070.2 2102.5 -1029.1 2058.2 1594
Scaled residuals:
Min 1Q Median 3Q Max
-2.9905 -0.8823 0.4645 0.8379 2.0923
Random effects:
Groups Name Variance Std.Dev.
class (Intercept) 0.16728 0.4090
school (Intercept) 0.06413 0.2532
Number of obs: 1600, groups: class, 135; school, 28
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.13163 0.17827 -6.348 2.18e-10 ***
PTHK 0.39512 0.04627 8.539 < 2e-16 ***
SC 0.80014 0.16893 4.737 2.17e-06 ***
TV 0.10786 0.16819 0.641 0.521
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) PTHK SC
PTHK -0.576
SC -0.498 0.082
TV -0.494 0.023 -0.005
<- glm(y ~ PTHK + SC + TV, family = binomial, data = Smoking)
fit.glm summary(fit.glm)
Call:
glm(formula = y ~ PTHK + SC + TV, family = binomial, data = Smoking)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.1313 -1.0944 0.6746 1.0936 1.6744
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.11920 0.13127 -8.526 < 2e-16 ***
PTHK 0.39886 0.04401 9.063 < 2e-16 ***
SC 0.76532 0.10563 7.245 4.32e-13 ***
TV 0.12305 0.10462 1.176 0.24
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2212.5 on 1599 degrees of freedom
Residual deviance: 2077.2 on 1596 degrees of freedom
AIC: 2085.2
Number of Fisher Scoring iterations: 4
10.5 Latent Class Models *
10.5.2 Example: Latent Class Model for Rater Agreement
<- read.table("http://users.stat.ufl.edu/~aa/cat/data/Carcinoma.dat",
Carcinoma header = TRUE, stringsAsFactors = TRUE)
<- -Carcinoma + 2
Carcinoma
%>%
Carcinoma filter(row_number() %in% c(1, n()))
library(poLCA)
poLCA(cbind(A, B, C, D, E, F, G) ~ 1, nclass = 1, data = Carcinoma)
poLCA(cbind(A, B, C, D, E, F, G) ~ 1, nclass = 2, nrep = 9, data = Carcinoma)
poLCA(cbind(A, B, C, D, E, F, G) ~ 1, nclass = 3, nrep = 9, data = Carcinoma)
poLCA(cbind(A, B, C, D, E, F, G) ~ 1, nclass = 4, nrep = 9, data = Carcinoma)