5 Building and Applying Logistic Regression Models

5.1 Strategies in Model Selection

5.1.1 How Many Explanatory Variables Can the Model Handle?

5.1.2 Example: Horseshoe Crab Satellites Revisited

\[\mathrm{logit}[P(Y=1)] = \alpha + \beta_1\,\mathrm{weight} + \beta_2\,\mathrm{width} + \beta_3 c_2 + \beta_4 c_3 + \beta_5 c_4 + \beta_6 s_2 + \beta_7 s_3\]

Crabs <- read.table("http://users.stat.ufl.edu/~aa/cat/data/Crabs.dat",
  header = TRUE, stringsAsFactors = TRUE
)

fit <- glm(y ~ weight + width + factor(color) + factor(spine),
  family = binomial, data = Crabs
)

summary(fit)

Call:
glm(formula = y ~ weight + width + factor(color) + factor(spine), 
    family = binomial, data = Crabs)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1977  -0.9424   0.4849   0.8491   2.1198  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)  
(Intercept)    -8.06501    3.92855  -2.053   0.0401 *
weight          0.82578    0.70383   1.173   0.2407  
width           0.26313    0.19530   1.347   0.1779  
factor(color)2 -0.10290    0.78259  -0.131   0.8954  
factor(color)3 -0.48886    0.85312  -0.573   0.5666  
factor(color)4 -1.60867    0.93553  -1.720   0.0855 .
factor(spine)2 -0.09598    0.70337  -0.136   0.8915  
factor(spine)3  0.40029    0.50270   0.796   0.4259  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 225.76  on 172  degrees of freedom
Residual deviance: 185.20  on 165  degrees of freedom
AIC: 201.2

Number of Fisher Scoring iterations: 4
pValue <- 1 - pchisq(225.76 - 185.20, 172 - 165)  # LR test of all 7 effects: 40.56 on 7 df
format(round(pValue, 8), scientific = FALSE)
[1] "0.00000098"
library(car)
Anova(fit)
Analysis of Deviance Table (Type II tests)

Response: y
              LR Chisq Df Pr(>Chisq)  
weight          1.4099  1    0.23507  
width           1.7968  1    0.18010  
factor(color)   7.5958  3    0.05515 .
factor(spine)   1.0091  2    0.60377  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

5.1.3 Stepwise Variable Selection Algorithms

5.1.4 Purposeful Selection of Explanatory Variables

5.1.5 Example: Variable Selection for Horseshoe Crabs

5.1.6 AIC and the Bias/Variance Tradeoff

\[\mathrm{AIC} = -2(\text{log likelihood}) + 2(\text{number of parameters in model}).\]

Crabs <- read.table("http://users.stat.ufl.edu/~aa/cat/data/Crabs.dat",
  header = TRUE, stringsAsFactors = TRUE
)

fit <- glm(y ~ width + factor(color), family = binomial, data = Crabs)

-2*logLik(fit)
'log Lik.' 187.457 (df=5)
AIC(fit)  # adds 2(number of parameters) = 2(5) = 10  to -2*logLik(fit)
[1] 197.457
fit <- glm(y ~ width + factor(color) + factor(spine), 
           family = binomial, data = Crabs)

library(MASS)
stepAIC(fit)  # backward elimination using AIC
Start:  AIC=200.61
y ~ width + factor(color) + factor(spine)

                Df Deviance    AIC
- factor(spine)  2   187.46 197.46
<none>               186.61 200.61
- factor(color)  3   194.43 202.43
- width          1   208.83 220.83

Step:  AIC=197.46
y ~ width + factor(color)

                Df Deviance    AIC
<none>               187.46 197.46
- factor(color)  3   194.45 198.45
- width          1   212.06 220.06

Call:  glm(formula = y ~ width + factor(color), family = binomial, data = Crabs)

Coefficients:
   (Intercept)           width  factor(color)2  factor(color)3  factor(color)4  
     -11.38519         0.46796         0.07242        -0.22380        -1.32992  

Degrees of Freedom: 172 Total (i.e. Null);  168 Residual
Null Deviance:      225.8 
Residual Deviance: 187.5    AIC: 197.5
library(tidyverse)
# bestglm requires the response variable in the last column of the data frame
Crabs2 <- Crabs %>% 
  select(weight, width, color, spine, y)

library(leaps)
library(bestglm)
bestglm(Crabs2, family = binomial, IC = "AIC")  # can also use IC="BIC"
Morgan-Tatar search since family is non-gaussian.
AIC
BICq equivalent for q in (0.477740316103793, 0.876695783647898)
Best Model:
               Estimate Std. Error   z value     Pr(>|z|)
(Intercept) -10.0708390  2.8068339 -3.587971 3.332611e-04
width         0.4583097  0.1040181  4.406056 1.052696e-05
color        -0.5090467  0.2236817 -2.275763 2.286018e-02

5.2 Model Checking

5.2.1 Goodness of Fit: Model Comparison Using the Deviance

\[G^2 = 2\sum \mathrm{observed}\left[\log\left(\mathrm{observed}/\mathrm{fitted}\right)\right]\]

5.2.2 Example: Goodness of Fit for Marijuana Use Survey

library(tidyverse)
Marijuana <- read.table("http://users.stat.ufl.edu/~aa/cat/data/Marijuana.dat",
                        header = TRUE, stringsAsFactors = TRUE)
Marijuana
   race gender yes  no
1 white female 420 620
2 white   male 483 579
3 other female  25  55
4 other   male  32  62
fit <- glm(yes/(yes+no) ~ gender + race, weights = yes + no, family = binomial, 
           data = Marijuana)
summary(fit)  # deviance and residual df are extracted individually below

Call:
glm(formula = yes/(yes + no) ~ gender + race, family = binomial, 
    data = Marijuana, weights = yes + no)

Deviance Residuals: 
       1         2         3         4  
-0.04513   0.04402   0.17321  -0.15493  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.83035    0.16854  -4.927 8.37e-07 ***
gendermale   0.20261    0.08519   2.378  0.01739 *  
racewhite    0.44374    0.16766   2.647  0.00813 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 12.752784  on 3  degrees of freedom
Residual deviance:  0.057982  on 1  degrees of freedom
AIC: 30.414

Number of Fisher Scoring iterations: 3
fit$deviance  # deviance goodness-of-fit statistic
[1] 0.05798151
fit$df.residual  # residual df
[1] 1
1 - pchisq(fit$deviance, fit$df.residual)
[1] 0.8097152
fitted(fit)
        1         2         3         4 
0.4045330 0.4541297 0.3035713 0.3480244 
library(dplyr)

Marijuana %>% 
  mutate(fit.yes = (yes+no)*fitted(fit)) %>% 
  mutate(fit.no = (yes+no)*(1-fitted(fit))) %>% 
  select(race, gender, yes, fit.yes, no, fit.no)
   race gender yes   fit.yes  no    fit.no
1 white female 420 420.71429 620 619.28571
2 white   male 483 482.28571 579 579.71429
3 other female  25  24.28571  55  55.71429
4 other   male  32  32.71429  62  61.28571
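These observed and fitted counts let us verify the deviance formula of Section 5.2.1 by hand; a minimal check, assuming fit and Marijuana from above:

obs <- c(Marijuana$yes, Marijuana$no)
exp.counts <- c((Marijuana$yes + Marijuana$no) * fitted(fit),
                (Marijuana$yes + Marijuana$no) * (1 - fitted(fit)))
2 * sum(obs * log(obs / exp.counts))  # reproduces fit$deviance = 0.05798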

5.2.3 Goodness of Fit: Grouped versus Ungrouped Data and Continuous Predictors
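To illustrate the grouped-versus-ungrouped distinction, here is a minimal sketch (not from the text) that expands the grouped Marijuana counts into one record per subject: the ML estimates agree, but the deviances differ because the saturated model changes.

# one row per subject, with a 0/1 response (assumes tidyverse and the objects above)
Long <- Marijuana %>%
  pivot_longer(c(yes, no), names_to = "resp", values_to = "count") %>%
  uncount(count) %>%
  mutate(y = as.numeric(resp == "yes"))
fit.ungrouped <- glm(y ~ gender + race, family = binomial, data = Long)
c(deviance(fit), deviance(fit.ungrouped))  # same estimates, different deviances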

5.2.4 Residuals for Logistic Models with Categorical Predictors

\[\mathrm{Standardized\ residual} = \frac{y_i - n_i\hat\pi_i}{SE}.\]

5.2.5 Example: Graduate Admissions at University of Florida

library(tidyverse)
Admissions <- read.table("http://users.stat.ufl.edu/~aa/cat/data/Admissions.dat",
                        header = TRUE, stringsAsFactors = TRUE)

theModel <- glm(yes/(yes+no) ~ department, family = binomial, 
                data = Admissions, 
                weights = yes+no)

theResiduals <- tibble(`Std. Res.` = rstandard(theModel, type = "pearson")) %>% 
  mutate(`Std. Res.` = round(`Std. Res.`, 2)) %>% 
  dplyr::filter(row_number() %% 2 == 0)  # every other record is female


theTable <- Admissions %>%                  
  mutate(gender = case_when(gender == 1 ~ "Female",
                            gender == 0 ~ "Male")) %>% 
  pivot_wider(id_cols = department, names_from = gender, 
              values_from = c(yes, no), names_sep = "") %>% 
  select(department, yesFemale, noFemale, yesMale, noMale) %>% 
  rename("Dept" = department, "Females (Yes)" = yesFemale, "Females (No)" =
           noFemale, "Males (Yes)" = yesMale, "Males (No)" = noMale) %>% 
  bind_cols(theResiduals)

knitr::kable(theTable)
Dept Females (Yes) Females (No) Males (Yes) Males (No) Std. Res.
anthropol 32 81 21 41 -0.76
astronomy 6 0 3 8 2.87
chemistry 12 43 34 110 -0.27
classics 3 1 4 0 -1.07
communicat 52 149 5 10 -0.63
computersci 8 7 6 12 1.16
english 35 100 30 112 0.94
geography 9 1 11 11 2.17
geology 6 3 15 6 -0.26
germanic 17 0 4 1 1.89
history 9 9 21 19 -0.18
latinamer 26 7 25 16 1.65
linguistics 21 10 7 8 1.37
mathematics 25 18 31 37 1.29
philosophy 3 0 9 6 1.34
physics 10 11 25 53 1.32
polisci 25 34 39 49 -0.23
psychology 2 123 4 41 -2.27
religion 3 3 0 2 1.26
romancelang 29 13 6 3 0.14
sociology 16 33 7 17 0.30
statistics 23 9 36 14 -0.01
zoology 4 62 10 54 -1.76

\[\mathrm{logit}(\pi_{ik}) = \alpha + \beta_k.\]

5.2.6 Standardized versus Pearson and Deviance Residuals

fit <- glm(yes/(yes+no) ~ gender + race, weights = yes + no, family = binomial, 
           data = Marijuana)

summary(fit)

Call:
glm(formula = yes/(yes + no) ~ gender + race, family = binomial, 
    data = Marijuana, weights = yes + no)

Deviance Residuals: 
       1         2         3         4  
-0.04513   0.04402   0.17321  -0.15493  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.83035    0.16854  -4.927 8.37e-07 ***
gendermale   0.20261    0.08519   2.378  0.01739 *  
racewhite    0.44374    0.16766   2.647  0.00813 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 12.752784  on 3  degrees of freedom
Residual deviance:  0.057982  on 1  degrees of freedom
AIC: 30.414

Number of Fisher Scoring iterations: 3
knitr::kable(
bind_cols(Standardized = rstandard(fit, type = "pearson"), 
          Pearson = residuals(fit, type = "pearson"),
          deviance = residuals(fit, type = "deviance"), 
          `std dev.` = rstandard(fit, type = "deviance"),
          Race = Marijuana$race, Gender = Marijuana$gender) %>% 
  mutate_if(is.numeric, round, digits = 3)
)
Standardized Pearson deviance std dev. Race Gender
-0.241 -0.045 -0.045 -0.241 white female
0.241 0.044 0.044 0.241 white male
0.241 0.174 0.173 0.240 other female
-0.241 -0.155 -0.155 -0.241 other male
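The table shows that each standardized residual is the corresponding Pearson residual inflated by its leverage; a minimal verification using the hat values from the fit above:

residuals(fit, type = "pearson") / sqrt(1 - hatvalues(fit))  # equals rstandard(fit, type = "pearson")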

5.2.7 Influence Diagnostics for Logistic Regression

5.2.8 Example: Heart Disease and Blood Pressure

HeartBP <- read.table("http://users.stat.ufl.edu/~aa/cat/data/HeartBP.dat",
                      header = TRUE, stringsAsFactors = TRUE)

theModel <- glm(y/n ~ bp, family = binomial, data = HeartBP, weights = n)
summary(theModel)

Call:
glm(formula = y/n ~ bp, family = binomial, data = HeartBP, weights = n)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.0617  -0.5977  -0.2245   0.2140   1.8501  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -6.082033   0.724320  -8.397  < 2e-16 ***
bp           0.024338   0.004843   5.025 5.03e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 30.0226  on 7  degrees of freedom
Residual deviance:  5.9092  on 6  degrees of freedom
AIC: 42.61

Number of Fisher Scoring iterations: 4
predicted <- round(fitted(theModel) * HeartBP$n, 1)  # fitted (expected) disease counts
std_res <- round(rstandard(theModel, type = "pearson"), 2)  # standardized Pearson residuals

# influence.measures(theModel) would report all influence diagnostics at once
rDfbeta <- data.frame(dfbetas(theModel)) %>% 
  select(bp) %>% 
  transmute(DFbeta = round(bp, 2))  # standardized change in the bp coefficient when each observation is removed

HeartBP %>% 
  rename("Blood Presure" = bp,
         "Sample size" = n,
         "Observed Disease" = y) %>% 
  bind_cols("Fitted Disease" = predicted,
            "Standardized Residual" = std_res,
            "Dfbeta (not SAS)" = rDfbeta)
  Blood Pressure Sample size Observed Disease Fitted Disease
1         111.5         156                3            5.2
2         121.5         252               17           10.6
3         131.5         284               12           15.1
4         141.5         271               16           18.1
5         151.5         139               12           11.6
6         161.5          85                8            8.9
7         176.5          99               16           14.2
8         191.5          43                8            8.4
  Standardized Residual DFbeta
1                 -1.11   0.56
2                  2.37  -2.24
3                 -0.95   0.34
4                 -0.57   0.08
5                  0.13   0.01
6                 -0.33  -0.06
7                  0.65   0.38
8                 -0.18  -0.11
library(ggthemes)
predicted <- predict(theModel, type = "response")
HeartBP %>% 
  mutate(proportion=y/n) %>% 
  ggplot(aes(x= bp, y = proportion)) +
  geom_point(shape = 4) +
  geom_smooth(aes(y = predicted)) +
  theme_few() +
  scale_x_continuous(breaks = seq(110, 200, by = 10), limits = c(110, 200)) +
  ggtitle("Figure 5.1")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

5.3 Infinite Estimates in Logistic Regression

5.3.1 Complete and Quasi-Complete Separation: Perfect Discrimination

library(tidyverse)
library(ggthemes)
data.frame(x = c(10, 20, 30, 40, 60, 70, 80, 90),
           y = c(0, 0, 0, 0, 1, 1, 1, 1)) %>% 
  ggplot(aes(x = x, y = y)) +
  geom_point() +
  theme_few() +
  ggtitle("Figure 5.2") +
  scale_y_continuous(breaks = c(0,1)) +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.y = element_text(angle = 0, vjust = 0.5),
        plot.title = element_text(hjust = 0.5)) 

5.3.2 Example: Infinite Estimate for Toy Example

x <- c(10, 20, 30, 40, 60, 70, 80, 90)
y <- c(0, 0, 0, 0, 1, 1, 1, 1)

fit <- glm(y ~ x, family = binomial)
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# In the summary below: the Wald test of H0: beta = 0 has P-value 1.0,
# the residual deviance of essentially 0 indicates a perfect fit, and
# 25 Fisher scoring iterations signal very slow convergence
summary(fit)

Call:
glm(formula = y ~ x, family = binomial)

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-1.045e-05  -2.110e-08   0.000e+00   2.110e-08   1.045e-05  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -118.158 296046.187       0        1
x                2.363   5805.939       0        1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1.1090e+01  on 7  degrees of freedom
Residual deviance: 2.1827e-10  on 6  degrees of freedom
AIC: 4

Number of Fisher Scoring iterations: 25
# maximized log-likelihood = 0, so maximized likelihood = 1
logLik(fit)
'log Lik.' -1.09134e-10 (df=2)
#library(car)
# the likelihood-ratio test of beta = 0 is far more informative here than the Wald test (P = 1)
car::Anova(fit)
Analysis of Deviance Table (Type II tests)

Response: y
  LR Chisq Df Pr(>Chisq)    
x    11.09  1  0.0008678 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(profileModel) # ordinary confint function fails for infinite estimates
confintModel(fit, objective = "ordinaryDeviance", method = "zoom")
Preliminary iteration .. Done

Profiling for parameter (Intercept) ... Done
Profiling for parameter x ... Done
Zooming for parameter (Intercept) ...
Zooming for parameter x ...
                 Lower    Upper
(Intercept)       -Inf -2.66963
x           0.05876605      Inf
attr(,"fitted object")
fit

5.3.3 Sparse Data and Infinite Effects with Categorical Predictors

5.3.4 Example: Risk Factors for Endometrial Cancer Grade

library(tidyverse)
Endo <- read.table("http://users.stat.ufl.edu/~aa/cat/data/Endometrial.dat",
                   header = TRUE, stringsAsFactors = TRUE)

Endo %>% 
  filter(row_number() %in% c(1, 2, n())) 
  NV PI   EH HG
1  0 13 1.64  0
2  0 16 2.26  0
3  0 33 0.85  1
xtabs(~NV + HG, data = Endo)  # quasi-complete separation: no HG = 0 cases occur when NV = 1
   HG
NV   0  1
  0 49 17
  1  0 13
fit <- glm(HG ~ NV + PI + EH, family = binomial, data = Endo)

# the ML estimate of the NV effect is actually infinite
summary(fit)

Call:
glm(formula = HG ~ NV + PI + EH, family = binomial, data = Endo)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.50137  -0.64108  -0.29432   0.00016   2.72777  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    4.30452    1.63730   2.629 0.008563 ** 
NV            18.18556 1715.75089   0.011 0.991543    
PI            -0.04218    0.04433  -0.952 0.341333    
EH            -2.90261    0.84555  -3.433 0.000597 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 104.903  on 78  degrees of freedom
Residual deviance:  55.393  on 75  degrees of freedom
AIC: 63.393

Number of Fisher Scoring iterations: 17
logLik(fit)
'log Lik.' -27.69663 (df=4)
library(car)
Anova(fit)  # LR test for NV gives P = 0.0022, versus Wald P = 0.99
Analysis of Deviance Table (Type II tests)

Response: HG
   LR Chisq Df Pr(>Chisq)    
NV   9.3576  1   0.002221 ** 
PI   0.9851  1   0.320934    
EH  19.7606  1  8.777e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(profileModel)  # ordinary confint function fails for infinite est
confintModel(fit, objective = "ordinaryDeviance", method = "zoom")  # 95% profile likelihood CIs; the NV interval is unbounded above
Preliminary iteration .... Done

Profiling for parameter (Intercept) ... Done
Profiling for parameter NV ... Done
Profiling for parameter PI ... Done
Profiling for parameter EH ... Done
Zooming for parameter (Intercept) ...
Zooming for parameter NV ...
Zooming for parameter PI ...
Zooming for parameter EH ...
                 Lower       Upper
(Intercept)  1.4290183  7.94717425
NV           1.2856920         Inf
PI          -0.1370717  0.03810556
EH          -4.7866976 -1.43682244
attr(,"fitted object")
fit
library(detectseparation) # new package
# 0 denotes finite est, Inf denotes infinite est.
glm(HG ~ NV + PI + EH, family = binomial, data = Endo, method = "detectSeparation")
Implementation: ROI | Solver: lpsolve 
Separation: TRUE 
Existence of maximum likelihood estimates
(Intercept)          NV          PI          EH 
          0         Inf           0           0 
0: finite value, Inf: infinity, -Inf: -infinity

5.4 Bayesian Inference, Penalized Likelihood, and Conditional Likelihood for Logistic Regression *

5.4.1 Bayesian Modeling: Specification of Prior Distributions

5.4.2 Example: Risk Factors for Endometrial Cancer Revisited

Endo2 <- 
  Endo %>% 
  mutate(PI2 = scale(PI),
         EH2 = scale(EH),
         NV2 = NV - 0.5) 
  
fit.ML <- glm(HG ~ NV2 + PI2 + EH2, family = binomial, data = Endo2)
summary(fit.ML)

Call:
glm(formula = HG ~ NV2 + PI2 + EH2, family = binomial, data = Endo2)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.50137  -0.64108  -0.29432   0.00016   2.72777  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)    7.8411   857.8755   0.009 0.992707    
NV2           18.1856  1715.7509   0.011 0.991543    
PI2           -0.4217     0.4432  -0.952 0.341333    
EH2           -1.9219     0.5599  -3.433 0.000597 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 104.903  on 78  degrees of freedom
Residual deviance:  55.393  on 75  degrees of freedom
AIC: 63.393

Number of Fisher Scoring iterations: 17
library(MCMCpack)  # b0 = prior mean, B0 = prior precision = 1/variance
Loading required package: coda
##
## Markov Chain Monte Carlo Package (MCMCpack)
## Copyright (C) 2003-2023 Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park
##
## Support provided by the U.S. National Science Foundation
## (Grants SES-0350646 and SES-0350613)
##
fitBayes <- MCMClogit(HG ~ NV2 + PI2 + EH2, mcmc = 100000, b0 = 0, B0 = 0.01, 
                      data = Endo2) # prior var = 1/0.01 = 100, sd = 10

summary(fitBayes)  # posterior means, standard deviations, and quantiles

Iterations = 1001:101000
Thinning interval = 1 
Number of chains = 1 
Sample size per chain = 1e+05 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

               Mean     SD Naive SE Time-series SE
(Intercept)  3.2356 2.5948 0.008206       0.026682
NV2          9.1626 5.1639 0.016330       0.053109
PI2         -0.4711 0.4546 0.001437       0.006376
EH2         -2.1332 0.5879 0.001859       0.008450

2. Quantiles for each variable:

               2.5%     25%     50%     75%   97.5%
(Intercept) -0.3391  1.2619  2.7161  4.6972  9.4865
NV2          2.1105  5.2437  8.1064 12.1133 21.6077
PI2         -1.4163 -0.7676 -0.4445 -0.1538  0.3538
EH2         -3.3781 -2.5087 -2.0977 -1.7227 -1.0786
mean(fitBayes[,2] < 0)  # posterior probability that the 2nd model parameter (NV2) is below 0
[1] 0.00016

5.4.3 Penalized Likelihood Reduces Bias in Logistic Regression

\[L^*(\beta) = L(\beta) - s(\beta),\]

5.4.4 Example: Risk Factors for Endometrial Cancer Revisited

library(logistf)
fit.penalized <- logistf(HG ~ NV2 + PI2 + EH2, family=binomial, data=Endo2)
options(digits = 4)
summary(fit.penalized)
logistf(formula = HG ~ NV2 + PI2 + EH2, data = Endo2, family = binomial)

Model fitted by Penalized ML
Coefficients:
               coef se(coef) lower 0.95 upper 0.95   Chisq         p method
(Intercept)  0.3080   0.7585    -0.9755     2.7888  0.1690 6.810e-01      2
NV2          2.9293   1.4650     0.6097     7.8546  6.7985 9.124e-03      2
PI2         -0.3474   0.3788    -1.2443     0.4045  0.7468 3.875e-01      2
EH2         -1.7243   0.4990    -2.8903    -0.8162 17.7593 2.507e-05      2

Method: 1-Wald, 2-Profile penalized log-likelihood, 3-None

Likelihood ratio test=43.66 on 3 df, p=1.786e-09, n=79
Wald test = 21.67 on 3 df, p = 7.641e-05
options(digits = 7)

5.4.5 Conditional Likelihood and Conditional Logistic Regression

\[\mathrm{logit}[P(Y_{it} = 1)] = \alpha_i + \beta x_{it},\]
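A minimal sketch of conditional ML fitting for this model, using clogit() from the survival package; the matched-pairs data frame toy below is hypothetical:

library(survival)
# each pair (stratum) gets its own alpha_i, which the conditional
# likelihood eliminates by conditioning on the within-pair totals
toy <- data.frame(pair = rep(1:4, each = 2),
                  y    = c(1, 0, 1, 0, 1, 0, 1, 0),
                  x    = c(1, 0, 0, 1, 1, 0, 0, 0))
clogit(y ~ x + strata(pair), data = toy)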

5.4.6 Conditional Logistic Regression and Exact Tests for Contingency Tables
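For a single 2×2 table, conditioning on both margins reduces this approach to Fisher's exact test; a minimal illustration on the NV-by-HG table from the endometrial cancer example, assuming Endo is still loaded:

fisher.test(xtabs(~ NV + HG, data = Endo))  # exact test of the NV effect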

5.6 Sample Size and Power for Logistic Regression *

5.6.1 Sample Size for Comparing Two Proportions
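Base R's power.prop.test() handles this calculation; a minimal sketch with illustrative values (not from the text):

power.prop.test(p1 = 0.20, p2 = 0.30, power = 0.80, sig.level = 0.05)  # about 293 per group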

5.6.2 Sample Size in Logistic Regression Modeling

5.6.3 Example: Modeling the Probability of Heart Disease