cumulative               package:VGAM               R Documentation

_O_r_d_i_n_a_l _R_e_g_r_e_s_s_i_o_n _w_i_t_h _C_u_m_u_l_a_t_i_v_e _P_r_o_b_a_b_i_l_i_t_i_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Fits a cumulative link regression model to a (preferably ordered)
     factor response.

_U_s_a_g_e:

     cumulative(link = "logit", earg = list(),
                parallel = FALSE, reverse = FALSE,
                mv = FALSE, intercept.apply = FALSE)

_A_r_g_u_m_e_n_t_s:

     In the following, the response Y is assumed to be a factor with
     ordered values 1,2,...,J+1. Hence M is the number of
     linear/additive predictors eta_j; for 'cumulative()' M=J.

    link: Link function applied to the J cumulative probabilities.  See
          'Links' for more choices, e.g., for the cumulative
          'probit'/'cloglog'/'cauchit'/... models.

    earg: List. Extra argument for the link function. See 'earg' in
          'Links' for general information.

parallel: A logical  or formula specifying which terms have
          equal/unequal coefficients. See below for more information
          about the parallelism assumption.

 reverse: Logical. By default, the cumulative probabilities used are
          P(Y<=1), P(Y<=2), ..., P(Y<=J). If 'reverse' is 'TRUE', then 
          P(Y>=2), P(Y>=3), ..., P(Y>=J+1) will be used.

          This should be set to 'TRUE' for 'link=' 'golf', 'polf',
          'nbolf'. For these links the cutpoints must be an increasing
          sequence; if 'reverse=FALSE' for then the cutpoints must be
          an decreasing sequence.

      mv: Logical. Multivariate response? If 'TRUE' then the input
          should be a matrix with values 1,2,...,L, where L=J+1 is the
          number of levels. Each column of the matrix is a response,
          i.e., multivariate response. A suitable matrix can be
          obtained from 'Cut'.

intercept.apply: Logical. Whether the 'parallel' argument should be
          applied to the intercept term. This should be set to 'TRUE'
          for 'link=' 'golf', 'polf', 'nbolf'.

_D_e_t_a_i_l_s:

     This 'VGAM' family function fits the class of cumulative link
     models to (hopefully) an ordinal response. By default, the
     non-parallel cumulative logit model is fitted, i.e.,

                        eta_j = logit(P[Y<=j])

     where j=1,2,...,M and the eta_j are not constrained to be
     parallel. This is also known as the _non-proportional odds model_.
     If the logit link is replaced by a complementary log-log link 
     ('cloglog') then this is known as the _proportional-hazards
     model_.

     In almost all the literature, the constraint matrices associated
     with this family of models are known. For example, setting
     'parallel=TRUE' will make all constraint matrices (except for the
     intercept) equal to a vector of M 1's. If the constraint matrices
     are equal, unknown and to be estimated, then this can be achieved
     by fitting the model as a reduced-rank vector generalized linear
     model (RR-VGLM; see 'rrvglm'). Currently, reduced-rank vector
     generalized additive models (RR-VGAMs) have not been implemented
     here.

_V_a_l_u_e:

     An object of class '"vglmff"' (see 'vglmff-class'). The object is
     used by modelling functions such as 'vglm', 'rrvglm' and 'vgam'.

_W_a_r_n_i_n_g:

     No check is made to verify that the response is ordinal; see
     'ordered'.

_N_o_t_e:

     The response should be either a matrix of counts (with row sums
     that are all positive), or a factor. In both cases, the 'y' slot
     returned by 'vglm'/'vgam'/'rrvglm' is the matrix of counts. The
     formula must contain an intercept term. Other 'VGAM' family
     functions for an ordinal response include 'acat', 'cratio',
     'sratio'.

     For a nominal (unordered) factor response, the multinomial logit
     model ('multinomial') is more appropriate.

     With the logit link, setting 'parallel=TRUE' will fit a
     proportional odds model. Note that the 'TRUE' here does not apply
     to the intercept term.  In practice, the validity of the
     proportional odds assumption needs to be checked, e.g., by a
     likelihood ratio test (LRT). If acceptable on the data, then
     numerical problems are less likely to occur during the fitting,
     and there are less parameters. Numerical problems occur when the
     linear/additive predictors cross, which results in probabilities
     outside of (0,1); setting 'parallel=TRUE' will help avoid this
     problem.

     Here is an example of the usage of the 'parallel' argument. If
     there are covariates 'x2', 'x3' and 'x4', then 'parallel = TRUE ~
     x2 + x3 -1' and 'parallel = FALSE ~ x4' are equivalent. This would
     constrain the regression coefficients for 'x2' and 'x3' to be
     equal; those of the intercepts and 'x4' would be different.

     If the data is inputted in _long_ format (not _wide_ format, as in
     'pneumo' below) and the self-starting initial values are not good
     enough then try using 'mustart', 'coefstart' and/or  'etatstart'.
     See the example below.

_A_u_t_h_o_r(_s):

     Thomas W. Yee

_R_e_f_e_r_e_n_c_e_s:

     Agresti, A. (2002) _Categorical Data Analysis_, 2nd ed. New York:
     Wiley.

     Dobson, A. J. (2001) _An Introduction to Generalized Linear
     Models_, 2nd ed. Boca Raton: Chapman & Hall/CRC Press.

     McCullagh, P. and Nelder, J. A. (1989) _Generalized Linear
     Models_, 2nd ed. London: Chapman & Hall.

     Simonoff, J. S. (2003) _Analyzing Categorical Data_, New York:
     Springer-Verlag.

     Yee, T. W. and Wild, C. J. (1996) Vector generalized additive
     models. _Journal of the Royal Statistical Society, Series B,
     Methodological_, *58*, 481-493.

     Documentation accompanying the 'VGAM' package at <URL:
     http://www.stat.auckland.ac.nz/~yee> contains further information
     and examples.

_S_e_e _A_l_s_o:

     'margeff', 'acat', 'cratio', 'sratio', 'multinomial', 'pneumo',
     'Links', 'logit', 'probit', 'cloglog', 'cauchit', 'golf', 'polf',
     'nbolf', 'logistic1'.

_E_x_a_m_p_l_e_s:

     # Fit the proportional odds model, p.179, in McCullagh and Nelder (1989)
     pneumo = transform(pneumo, let=log(exposure.time))
     (fit = vglm(cbind(normal, mild, severe) ~ let,
                 cumulative(parallel=TRUE, reverse=TRUE), pneumo))
     fit@y   # Sample proportions
     weights(fit, type="prior")   # Number of observations
     coef(fit, matrix=TRUE)
     constraints(fit)   # Constraint matrices

     # Check that the model is linear in let ----------------------
     fit2 = vgam(cbind(normal, mild, severe) ~ s(let, df=2),
                 cumulative(reverse=TRUE), pneumo)
     ## Not run: 
     plot(fit2, se=TRUE, overlay=TRUE, lcol=1:2, scol=1:2)
     ## End(Not run)

     # Check the proportional odds assumption with a LRT ----------
     (fit3 = vglm(cbind(normal, mild, severe) ~ let,
                  cumulative(parallel=FALSE, reverse=TRUE), pneumo))
     1 - pchisq(2*(logLik(fit3)-logLik(fit)),
                df=length(coef(fit3))-length(coef(fit)))

     # A factor() version of fit ----------------------------------
     # This is in long format (cf. wide format above)
     nobs = round(fit@y * c(weights(fit, type="prior")))
     sumnobs = colSums(nobs) # apply(nobs, 2, sum)

     pneumo.long = data.frame(symptoms=ordered(rep(rep(colnames(nobs),
                                       nrow(nobs)),
                                       times=c(t(nobs))),
                                       levels = colnames(nobs)),
                              let = rep(rep(with(pneumo, let), each=ncol(nobs)),
                                        times=c(t(nobs))))
     with(pneumo.long, table(let, symptoms)) # check it; should be same as pneumo

     (fit.long1 = vglm(symptoms ~ let, data=pneumo.long,
                  cumulative(parallel=TRUE, reverse=TRUE), trace=TRUE))
     coef(fit.long1, matrix=TRUE) # Should be same as coef(fit, matrix=TRUE)
     # Could try using mustart if fit.long1 failed to converge.
     mymustart = matrix(sumnobs / sum(sumnobs),
                        nrow(pneumo.long), ncol(nobs), byrow=TRUE)
     fit.long2 = vglm(symptoms ~ let,
                     fam = cumulative(parallel=TRUE, reverse=TRUE),
                     mustart=mymustart, data = pneumo.long, trace=TRUE)
     coef(fit.long2, matrix=TRUE) # Should be same as coef(fit, matrix=TRUE)

