Quasi-likelihood models

With the generalized linear model, you have the option of choosing a model that works well with a variety of different distributions (normal, Poisson, Gamma, etc.). But you also have the option of working outside of these distributions. You can fit models that do not specify the distribution, but which fit a particular transformation of the linear effect and particular mean-variance relationship.

library(broom)

Warning: package 'broom' was built under R version 4.5.1

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(ggplot2)
library(foreign)
library(magrittr)

Link and variance functions

A distribution is said to be in the exponential family

$ln f(y, \theta)=d(\theta)e(y)+g(\theta)+h(y)$

but sometimes this is rewritten in the form

$ln f(y, \theta)=\frac{\theta y+b(\theta)}{a(\phi)}+c(y, \phi)$

where $\phi$ is a dispersion or scale parameter. This is called the canonical form. There are names for these parameters and functions.

$\theta$ is the natural parameter,
$\phi$ is the dispersion parameter,
$a( ) is the variance function, and
$g( ) = b’( )^{-1} is the link function.

If you also define

$\mu=E[Y]$

then the link function, g, and its inverse, b, define how the regression model works.

For Poisson regression, the link function is

$ln(f(y, \lambda) = ln(\frac{\lambda^y e^{-\lambda}}{y!})=\lambda ln(\lambda)-\lambda-ln(y!)$

To write this in the canonical form, you need to define the natural parameter as

$\theta=ln(\lambda)$

and

$a(\phi)=1$

The link function is

$g(\mu)=ln(\mu)$

and the variance function is

$V(\mu)=\mu$

These functions are derived from the form of the Poisson distribution, and the estimation methods in the generalized linear model depend only on the link and variance functions.

In 1974, Robert Weddeburn developed a theory that would work for any link and variance function, even if it could not be derived from a particular probability distribution. His approach used the same principles as maximum likelihood, but it could not be called a likelihood approach since there was no probability distribution. He called his approach quasi-likelihood and showed that it created estimates with reasonable properties.

The quasi Poisson model

If you modify the variance function of Poisson regression, from

$V(\mu)=\mu$

$V(\mu)=\phi \mu$

and keep the link function the same, then you get a quasi Poisson model. There is no simple probability distribution that would produce this link and variance function, but it seems like a more reasonable approach, because there is more flexibility.

For technical reasons, the parameter $\phi$ has to be greater than one, implying that the dispersion of the quasiPoisson model is greater than the dispersion of the Poissson model. The quasiPoisson model is often called an overdispersion model.

There are two ways to estimate $\phi$. The first is very simple and direct.

$\hat{\phi}= dispersion / df$

The amount of overdispersion is equal to much much larger the dispersion is than the degrees of freedom.

A second approach estimates $\phi$ iteratively along with the other terms in the regression model.

One important limitation of the quasi likelihood approach is that there is no natural way to estimate AIC (Akaike’s Information Criteria) an important statistic for comparing different models.

Start with Poisson regression

It is easiest to understand the quasiPoisson model by comparing it to the Poisson model.

Francis Huang has a nice web page showing two datasets and how to analyze them using the Poisson regression model as well as several more sophisticated models. The first data set is available for anyone to download. The link I originally used no longer works.

fn <- "http://faculty.missouri.edu/huangf/data/poisson/articles.csv"
articles <- read.csv(fn)
head(articles)

Here is a different link for the same file, used at German Rodriguez’s website. I’ve placed a copy on my website in case this link breaks. Also, this dataset is available with the pscl library of R under the name, bioChemists.

fn <- "http://www.pmean.com/00files/couart2.dta"
articles <- read.dta(fn)
head(articles)

  art   fem     mar kid5  phd ment
1   0   Men Married    0 2.52    7
2   0 Women  Single    0 2.05    6
3   0 Women  Single    0 3.75    6
4   0   Men Married    1 1.18    3
5   0 Women  Single    0 3.75   26
6   0 Women Married    2 3.59    2

The variables are + gender (fem), + marital status (mar), + number of children (kid5), + prestige of graduate program (phd), + the number of articles published by the individual’s mentor (ment) + the number of articles published by the scientist (art: the outcome).

To fit a Poisson regression model in R, use the glm function with the argument, family=“poisson”.

pois1 <- glm(art ~ fem + ment + phd + mar + kid5, family=poisson, data = articles)
summary(pois1)


Call:
glm(formula = art ~ fem + ment + phd + mar + kid5, family = poisson, 
    data = articles)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.304617   0.102981   2.958   0.0031 ** 
femWomen    -0.224594   0.054613  -4.112 3.92e-05 ***
ment         0.025543   0.002006  12.733  < 2e-16 ***
phd          0.012823   0.026397   0.486   0.6271    
marMarried   0.155243   0.061374   2.529   0.0114 *  
kid5        -0.184883   0.040127  -4.607 4.08e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1817.4  on 914  degrees of freedom
Residual deviance: 1634.4  on 909  degrees of freedom
AIC: 3314.1

Number of Fisher Scoring iterations: 5

The interpretation of the coefficients should be done after exponentiating their values.

pois1 %>%
  tidy %>%
  mutate(lower_limit=estimate-1.96*std.error) %>%
  mutate(upper_limit=estimate+1.96*std.error) %>%
  select(term, estimate, lower_limit, upper_limit) %>%
  mutate(estimate=exp(estimate)) %>%
  mutate(lower_limit=exp(lower_limit)) %>%
  mutate(upper_limit=exp(upper_limit)) -> pois1_coefficients
pois1_coefficients

# A tibble: 6 × 4
  term        estimate lower_limit upper_limit
  <chr>          <dbl>       <dbl>       <dbl>
1 (Intercept)    1.36        1.11        1.66 
2 femWomen       0.799       0.718       0.889
3 ment           1.03        1.02        1.03 
4 phd            1.01        0.962       1.07 
5 marMarried     1.17        1.04        1.32 
6 kid5           0.831       0.768       0.899

Fitting a quasi-Poisson model

quasi1 <- glm(art ~ fem + ment + phd + mar + kid5, family=quasipoisson, data = articles)
summary(quasi1)


Call:
glm(formula = art ~ fem + ment + phd + mar + kid5, family = quasipoisson, 
    data = articles)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.304617   0.139273   2.187 0.028983 *  
femWomen    -0.224594   0.073860  -3.041 0.002427 ** 
ment         0.025543   0.002713   9.415  < 2e-16 ***
phd          0.012823   0.035700   0.359 0.719544    
marMarried   0.155243   0.083003   1.870 0.061759 .  
kid5        -0.184883   0.054268  -3.407 0.000686 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 1.829006)

    Null deviance: 1817.4  on 914  degrees of freedom
Residual deviance: 1634.4  on 909  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5

Notice that there is no value listed for AIC.

quasi1 %>%
  tidy %>%
  mutate(lower_limit=estimate-1.96*std.error) %>%
  mutate(upper_limit=estimate+1.96*std.error) %>%
  select(term, estimate, lower_limit, upper_limit) %>%
  mutate(estimate=exp(estimate)) %>%
  mutate(lower_limit=exp(lower_limit)) %>%
  mutate(upper_limit=exp(upper_limit)) -> quasi1_coefficients
quasi1_coefficients

# A tibble: 6 × 4
  term        estimate lower_limit upper_limit
  <chr>          <dbl>       <dbl>       <dbl>
1 (Intercept)    1.36        1.03        1.78 
2 femWomen       0.799       0.691       0.923
3 ment           1.03        1.02        1.03 
4 phd            1.01        0.944       1.09 
5 marMarried     1.17        0.993       1.37 
6 kid5           0.831       0.747       0.924

If you compare these confidence intervals to the earlier ones, they are wider. This reflects the overdispersion that is only accounted for properly by the quasi Poisson model.

References

David Lillis. Generalized Linear Models in R, Part 7: Checking for Overdispersion in Count Regression. The Analysis Factor blog. Available in html format

An earlier version of this page was published on new.pmean.com.