10 Random Effects: Generalized Linear Mixed Models

10.1 Random Effects Modeling of Clustered Categorical Data

10.1.1 The Generalized Linear Mixed Model (GLMM)

\[\begin{equation} g(u_{it}) = u_i + \alpha + \beta_1 x_{it1}+ \dots + \beta_p x_{itp},\ i = 1, \dots, n,\ t = 1, \dots, T. \tag{26} \end{equation}\]

10.1.2 A Logistic GLMM for Binary Matched Pairs

\[\begin{equation} \mathrm{logit}[P(Y_{i1}= 1)] = u_i + \alpha + \beta,\ \ \mathrm{logit}[P(Y_{i2}= 1)] = u_i + \alpha \tag{27} \end{equation}\]

10.1.3 Example: Environmental Opinions Revised

Opinions <- read.table("http://users.stat.ufl.edu/~aa/cat/data/Envir_opinions.dat",
                        header = TRUE, stringsAsFactors = TRUE)
# Make contingency table
tab <- as.matrix(addmargins(xtabs(~y1 + y2, data = Opinions)))

library(tibble)
# Add label as the first column and variable names
`Table 10.1` <- tibble(`Pay Higher Taxes` = c("Yes", "No", "Total"), 
                      Yes = tab[,1], No = tab[,2], Total = tab[,3])


library(flextable)
my_header <- data.frame(
  col_keys = colnames(`Table 10.1`),
  line1 = c("Pay Higher Taxes", rep("Cut Living Standards", 2), "Total"),
  line2 = colnames(`Table 10.1`)
)

flextable(`Table 10.1`, col_keys = my_header$col_keys) %>%
  set_header_df(
    mapping = my_header,
    key = "col_keys"
  ) %>% 
  theme_booktabs() %>% 
  autofit(part = "all") %>%    
  align(align = "center", part = "all") %>% 
  merge_h(part = "header") %>% 
  merge_v(part = "header") %>% 
  merge_h(part = "body") %>% 
  merge_v(part = "body") %>%
  align_nottext_col(align = "center") %>% 
  set_caption(caption = "Opinions relating to the environment") 
Table 11: Opinions relating to the environment

Pay Higher Taxes

Cut Living Standards

Total

Yes

No

Yes

227

132

359

No

107

678

785

Total

334

810

1,144

Opinions <- read.table("http://users.stat.ufl.edu/~aa/cat/data/Opinions.dat",
                        header = TRUE, stringsAsFactors = TRUE)

Opinions %>% 
  filter(row_number() %in% c(1, 2, n()-1, n())) 
  person question y
1      1        1 1
2      1        0 1
3   1144        1 0
4   1144        0 0
# library(lme4)
# fit GLMM by adaptive Gaussian quadrature, 
# with nAGQ quadrature points, as 10.1.5 explains 
# (1|person) is random intercept for person
fit <- lme4::glmer(y ~ (1|person) + question, family = binomial, nAGQ = 50, 
                   data = Opinions)
summary(fit)
Generalized linear mixed model fit by maximum likelihood (Adaptive
  Gauss-Hermite Quadrature, nAGQ = 50) [glmerMod]
 Family: binomial  ( logit )
Formula: y ~ (1 | person) + question
   Data: Opinions

     AIC      BIC   logLik deviance df.resid 
  2526.8   2544.0  -1260.4   2520.8     2285 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.8872 -0.2691 -0.2423  0.4646  1.2519 

Random effects:
 Groups Name        Variance Std.Dev.
 person (Intercept) 8.143    2.854   
Number of obs: 2288, groups:  person, 1144

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.8343     0.1624 -11.295   <2e-16 ***
question      0.2100     0.1301   1.614    0.106    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
         (Intr)
question -0.452

10.1.4 Differing Effects in GLMMs and Marginal Models

10.1.5 Model Fitting for GLMMs

10.1.6 Inference for Model Parameters and Prediction

10.2 Examples: Random Effects Models for Binary Data

10.2.1 Small-Area Estimation of Binomial Probabilities

\[\begin{equation} \mathrm{logit}[P(Y_{it} = 1)] = \mathrm{logit}(\pi_i) = u_i + \alpha, \tag{28} \end{equation}\]

10.2.2 Example: Estimating Basketball Free Throw Success

FreeThrow <- read.table("http://users.stat.ufl.edu/~aa/cat/data/FreeThrow.dat",
                        header = TRUE, stringsAsFactors = TRUE)

FreeThrow %>% 
  filter(row_number() %in% c(1, n()))  
    player  y  T
1  Davis.A 32 39
2 Gobert.R 11 14
library(lme4)
Loading required package: Matrix

Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':

    expand, pack, unpack
# (1|player) = random intercepts for each player
# nAGQ = number of points for adaptive Gaussian quadrature
fit <- 
  glmer(y/T ~ 1 + (1|player), family=binomial, weights = T, nAGQ= 100,
              data = FreeThrow)
summary(fit)
Generalized linear mixed model fit by maximum likelihood (Adaptive
  Gauss-Hermite Quadrature, nAGQ = 100) [glmerMod]
 Family: binomial  ( logit )
Formula: y/T ~ 1 + (1 | player)
   Data: FreeThrow
Weights: T

     AIC      BIC   logLik deviance df.resid 
    33.8     35.7    -14.9     29.8       18 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.5111 -0.6988  0.1050  0.5467  1.3008 

Random effects:
 Groups Name        Variance Std.Dev.
 player (Intercept) 0.1807   0.4251  
Number of obs: 20, groups:  player, 20

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)    1.174      0.181   6.484 8.95e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fitted(fit)  # estimated prob's for 20 players using predicted random effects
        1         2         3         4         5         6         7         8 
0.7948554 0.7709709 0.6592042 0.7654111 0.7929072 0.7921630 0.7235316 0.7829412 
        9        10        11        12        13        14        15        16 
0.7110775 0.6664667 0.8087839 0.7550424 0.8005694 0.8053209 0.7341520 0.8087839 
       17        18        19        20 
0.7475986 0.7341520 0.7726096 0.7706175 
season <-  c(0.80, 0.77, 0.63, 0.81, 0.84, 0.81, 0.83, 0.78, 0.57, 0.39, 
             0.81, 0.82, 0.81, 0.61, 0.79, 0.74, 0.80, 0.67, 0.77, 0.65)

`Table 10.2` <- bind_cols(Player = word(FreeThrow$player, 1, sep = fixed(".")), T_i = FreeThrow$T, 
                          p_i = round(FreeThrow$y/ FreeThrow$T, 2), 
                          hat_pi_i = round(fitted(fit), 2), pi_i = season) 


library(flextable)
flextable(`Table 10.2`) %>% 
  theme_booktabs() %>% 
  fix_border_issues()  %>%
  set_caption(caption = "Estimates of probability of making a free throw, based on data from centers from week 1 of an NBA season.") %>% 
    set_table_properties(width = .5, layout = "autofit") %>% 
  flextable::compose(part = "header", j = "T_i",
                     value = as_paragraph("T", as_sub("i"))) %>% 
  flextable::compose(part = "header", j = "p_i", 
                     value = as_paragraph("p", as_sub("i"))) %>% 
  flextable::compose(part = "header", j = "hat_pi_i", 
                     value = as_paragraph("\U1D70B\U0302", as_sub("i"))) %>% 
  flextable::compose(part = "header", j = "pi_i", 
                     value = as_paragraph("\U1D70B", as_sub("i"))) 
Table 12: Estimates of probability of making a free throw, based on data from centers from week 1 of an NBA season.

Player

Ti

pi

𝜋̂i

𝜋i

Davis

39

0.82

0.79

0.80

Cousins

49

0.78

0.77

0.77

Whiteside

21

0.52

0.66

0.63

Turner

13

0.77

0.77

0.81

Gasol

19

0.84

0.79

0.84

Valanciunas

14

0.86

0.79

0.81

Towns

3

0.33

0.72

0.83

Embiid

12

0.83

0.78

0.78

Nurkic

19

0.63

0.71

0.57

Drummond

16

0.50

0.67

0.39

Lopez

13

0.92

0.81

0.81

Jokic

3

0.67

0.76

0.82

Dieng

6

1.00

0.80

0.81

Adams

7

1.00

0.81

0.61

Kanter

8

0.62

0.73

0.79

Monroe

13

0.92

0.81

0.74

Horford

6

0.67

0.75

0.80

Vucevic

8

0.62

0.73

0.67

Muscala

10

0.80

0.77

0.77

Gobert

14

0.79

0.77

0.65

# compose is also used by purrr (and igraph)
# Unicode is a hat over the next character and U1D70B is pi
summary(glm(y/T ~ 1, family = binomial, weights = T, data = FreeThrow))

Call:
glm(formula = y/T ~ 1, family = binomial, data = FreeThrow, weights = T)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3217  -0.8336   0.2710   0.9268   1.9710  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.1400     0.1363   8.361   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 31.322  on 19  degrees of freedom
Residual deviance: 31.322  on 19  degrees of freedom
AIC: 81.032

Number of Fisher Scoring iterations: 4

10.2.3 Example: Opinions about Legalizing Abortion Revised

Abortion <- read.table("http://users.stat.ufl.edu/~aa/cat/data/Abortion.dat",
                        header = TRUE, stringsAsFactors = TRUE) %>% 
  mutate(sit = factor(situation, levels = c(3,1,2)))

Abortion %>% 
  filter(row_number() %in% c(1:3, n()-2, n()-1, n()))  
  person gender situation response sit
1      1      1         1        1   1
2      1      1         2        1   2
3      1      1         3        1   3
4   1850      0         1        0   1
5   1850      0         2        0   2
6   1850      0         3        0   3
fit <- 
  lme4::glmer(response ~ (1 | person) + sit + gender, family = binomial, 
              nAGQ = 100, data = Abortion)
summary(fit)
Generalized linear mixed model fit by maximum likelihood (Adaptive
  Gauss-Hermite Quadrature, nAGQ = 100) [glmerMod]
 Family: binomial  ( logit )
Formula: response ~ (1 | person) + sit + gender
   Data: Abortion

     AIC      BIC   logLik deviance df.resid 
  4588.5   4621.6  -2289.3   4578.5     5545 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.7810 -0.1223 -0.1055  0.1396  1.7149 

Random effects:
 Groups Name        Variance Std.Dev.
 person (Intercept) 76.49    8.746   
Number of obs: 5550, groups:  person, 1850

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.61937    0.37816  -1.638    0.101    
sit1         0.83478    0.16007   5.215 1.84e-07 ***
sit2         0.29245    0.15670   1.866    0.062 .  
gender       0.01262    0.48970   0.026    0.979    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
       (Intr) sit1   sit2  
sit1   -0.218              
sit2   -0.211  0.508       
gender -0.725  0.000  0.000

10.2.4 Item Response Models: The Rasch Model

10.2.5 Choice of Marginal Model or Random Effects Model

10.3 Extensions to Multinomial Responses and Multiple Random Effect Terms

10.3.1 Example: Insomnia Study Revisited

Insomnia <- read.table("http://users.stat.ufl.edu/~aa/cat/data/Insomnia.dat",
                        header = TRUE, stringsAsFactors = TRUE)

Insomnia %>% 
  filter(row_number() %in% c(1, 2, n()-1, n()))  
  case treat occasion response
1    1     1        0        1
2    1     1        1        1
3  127     0        0        4
4  127     0        1        4
# response var. from clmm must be a factor
fit <- 
  ordinal::clmm(factor(response) ~ (1|case) + occasion + treat + occasion:treat,
                nAGQ = 20, data = Insomnia)  
summary(fit)
Cumulative Link Mixed Model fitted with the adaptive Gauss-Hermite 
quadrature approximation with 20 quadrature points 

formula: factor(response) ~ (1 | case) + occasion + treat + occasion:treat
data:    Insomnia

 link  threshold nobs logLik  AIC     niter     max.grad cond.H 
 logit flexible  478  -592.97 1199.94 401(1549) 8.28e-05 6.1e+01

Random effects:
 Groups Name        Variance Std.Dev.
 case   (Intercept) 3.628    1.905   
Number of groups:  case 239 

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
occasion       -1.60158    0.28336  -5.652 1.59e-08 ***
treat          -0.05785    0.36630  -0.158  0.87450    
occasion:treat -1.08129    0.38046  -2.842  0.00448 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
    Estimate Std. Error z value
1|2  -3.4896     0.3588  -9.727
2|3  -1.4846     0.2903  -5.114
3|4   0.5613     0.2702   2.078
library(multgee)
fit_multgee <- 
  ordLORgee(response ~ occasion + treat + occasion:treat, id = case,
                 LORstr = "independence", data = Insomnia)

fit_ordinal <- 
  ordinal::clmm(factor(response) ~ (1|case) + occasion + treat + occasion:treat,
                nAGQ = 20, data = Insomnia)

`Table 10.4` <- 
  data.frame(Effect = c("Occasion", "Treatment", "Treatment x Occasion"),
             
             mm_coef = c(coef(fit_multgee)["occasion"],
                         coef(fit_multgee)["treat"],
                         coef(fit_multgee)["occasion:treat"]),
             mm_se = c(coef(summary(fit_multgee))["occasion", "san.se"],
                       coef(summary(fit_multgee))["treat", "san.se"],
                       coef(summary(fit_multgee))["occasion:treat", "san.se"]),
             # note the -1 because of the (weird negative) model specification
             re_coef = c(coef(fit_ordinal)["occasion"],
                         coef(fit_ordinal)["treat"],
                         coef(fit_ordinal)["occasion:treat"]) * -1,
             re_se = c(coef(summary(fit_ordinal))["occasion", "Std. Error"],
                       coef(summary(fit_ordinal))["treat", "Std. Error"],
                       coef(summary(fit_ordinal))["occasion:treat", 
                                                  "Std. Error"])) %>%
  mutate(across(where(is.numeric), round, 3)) %>% 
  mutate(`Marginal Model GEE` = paste0(mm_coef, " (", mm_se, ")")) %>% 
  mutate(`Random Effects Model (GLMM) ML` = paste0(re_coef, " (", re_se, ")")) %>% 
  select(Effect, `Marginal Model GEE`, `Random Effects Model (GLMM) ML`)
  
library(flextable)
flextable(`Table 10.4`) %>% 
   set_caption(caption = "Results of fitting cumulative logit marginal model and random effects model to Table 9.2, with standard errors in parentheses.") %>% 
  set_table_properties(width = .75, layout = "autofit") %>% 
  align(align = "center", part = "all") 
Table 13: Results of fitting cumulative logit marginal model and random effects model to Table 9.2, with standard errors in parentheses.

Effect

Marginal Model GEE

Random Effects Model (GLMM) ML

Occasion

1.038 (0.168)

1.602 (0.283)

Treatment

0.034 (0.238)

0.058 (0.366)

Treatment x Occasion

0.708 (0.244)

1.081 (0.38)

10.3.2 Meta-Analysis: Bivariate Random Effects for Association Heterogeneity

Ulcers <- read.table("http://users.stat.ufl.edu/~aa/cat/data/Ulcers.dat",
                        header = TRUE, stringsAsFactors = TRUE)

Ulcers %>% 
  filter(row_number() %in% c(1, 2, n()-1, n()))  
  study treat  y  n
1     1     1  7 15
2     1     0 11 13
3    41     1  0  9
4    41     0  0 16
fit <- 
  lme4::glmer(y/n ~ (1|study) + treat, family = binomial, weights = n, nAGQ = 50,
              data = Ulcers)
summary(fit)
Generalized linear mixed model fit by maximum likelihood (Adaptive
  Gauss-Hermite Quadrature, nAGQ = 50) [glmerMod]
 Family: binomial  ( logit )
Formula: y/n ~ (1 | study) + treat
   Data: Ulcers
Weights: n

     AIC      BIC   logLik deviance df.resid 
   306.1    313.3   -150.0    300.1       79 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.1808 -0.7646  0.0229  0.9771  4.5243 

Random effects:
 Groups Name        Variance Std.Dev.
 study  (Intercept) 0.6721   0.8198  
Number of obs: 82, groups:  study, 41

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.3212     0.1500  -2.142   0.0322 *  
treat        -1.1728     0.1176  -9.969   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
      (Intr)
treat -0.313

\[\begin{equation} \mathrm{logit}[P(Y_{i1}= 1)] = u_i + \alpha + (\beta + v_1),\ \ \mathrm{logit}[P(Y_{i2}= 1)] = u_i + \alpha \tag{29} \end{equation}\]

fit2 <-  
  lme4::glmer(y/n ~ (1 + treat|study), family = binomial, weights = n, 
              data = Ulcers)
summary(fit2)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: y/n ~ (1 + treat | study)
   Data: Ulcers
Weights: n

     AIC      BIC   logLik deviance df.resid 
   466.9    476.5   -229.4    458.9       78 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.4935 -0.1543  0.0495  0.2850  1.1408 

Random effects:
 Groups Name        Variance Std.Dev. Corr 
 study  (Intercept) 3.307    1.819         
        treat       4.257    2.063    -0.93
Number of obs: 82, groups:  study, 41

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.2494     0.1626  -7.682 1.57e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

10.4 Multilevel (Hierarchical) Models

10.4.1 Example: Two-Level Model for Student Performance

10.4.2 Example: Smoking Prevention and Cessation Study

Smoking <- read.table("http://users.stat.ufl.edu/~aa/cat/data/Smoking.dat",
                        header = TRUE, stringsAsFactors = TRUE)

`Table 10.6` <- Smoking %>% 
  filter(row_number() %in% c(1, 2, n()))  %>% 
  select(-y)

library(flextable)
flextable(`Table 10.6`) %>% 
   set_caption(caption = "Part of smoking prevention and cessation data file.") %>% 
  set_table_properties(width = .75, layout = "autofit") %>% 
  align(align = "center", part = "all") 
Table 14: Part of smoking prevention and cessation data file.

student

school

class

SC

TV

PTHK

THK

1

403

403,101

1

0

2

3

2

403

403,101

1

0

4

4

1,600

515

515,113

0

0

3

3

Smoking <- read.table("http://users.stat.ufl.edu/~aa/cat/data/Smoking.dat",
                        header = TRUE, stringsAsFactors = TRUE)

Smoking %>% 
  filter(row_number() %in% c(1, n()))  
  student school  class SC TV PTHK THK y
1       1    403 403101  1  0    2   3 1
2    1600    515 515113  0  0    3   3 1
fit <-  
  lme4::glmer(y ~ (1|class) + (1|school) + PTHK + SC + TV, family = binomial, 
              data = Smoking)
summary(fit)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: y ~ (1 | class) + (1 | school) + PTHK + SC + TV
   Data: Smoking

     AIC      BIC   logLik deviance df.resid 
  2070.2   2102.5  -1029.1   2058.2     1594 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.9905 -0.8823  0.4645  0.8379  2.0923 

Random effects:
 Groups Name        Variance Std.Dev.
 class  (Intercept) 0.16728  0.4090  
 school (Intercept) 0.06413  0.2532  
Number of obs: 1600, groups:  class, 135; school, 28

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.13163    0.17827  -6.348 2.18e-10 ***
PTHK         0.39512    0.04627   8.539  < 2e-16 ***
SC           0.80014    0.16893   4.737 2.17e-06 ***
TV           0.10786    0.16819   0.641    0.521    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
     (Intr) PTHK   SC    
PTHK -0.576              
SC   -0.498  0.082       
TV   -0.494  0.023 -0.005
fit.glm <- glm(y ~ PTHK + SC + TV, family = binomial, data = Smoking)
summary(fit.glm)

Call:
glm(formula = y ~ PTHK + SC + TV, family = binomial, data = Smoking)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1313  -1.0944   0.6746   1.0936   1.6744  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.11920    0.13127  -8.526  < 2e-16 ***
PTHK         0.39886    0.04401   9.063  < 2e-16 ***
SC           0.76532    0.10563   7.245 4.32e-13 ***
TV           0.12305    0.10462   1.176     0.24    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2212.5  on 1599  degrees of freedom
Residual deviance: 2077.2  on 1596  degrees of freedom
AIC: 2085.2

Number of Fisher Scoring iterations: 4

10.5 Latent Class Models *

10.5.1 Independence Given a Latent Categorical Variable

10.5.2 Example: Latent Class Model for Rater Agreement

Carcinoma <- read.table("http://users.stat.ufl.edu/~aa/cat/data/Carcinoma.dat",
                        header = TRUE, stringsAsFactors = TRUE)

Carcinoma <- -Carcinoma + 2

Carcinoma %>% 
  filter(row_number() %in% c(1, n()))  

library(poLCA)
poLCA(cbind(A, B, C, D, E, F, G) ~ 1, nclass = 1, data = Carcinoma)

poLCA(cbind(A, B, C, D, E, F, G) ~ 1, nclass = 2, nrep = 9, data = Carcinoma)

poLCA(cbind(A, B, C, D, E, F, G) ~ 1, nclass = 3, nrep = 9, data = Carcinoma)

poLCA(cbind(A, B, C, D, E, F, G) ~ 1, nclass = 4, nrep = 9, data = Carcinoma)

Exercises