multinomial               package:VGAM               R Documentation

_M_u_l_t_i_n_o_m_i_a_l _L_o_g_i_t _M_o_d_e_l

_D_e_s_c_r_i_p_t_i_o_n:

     Fits a multinomial logit model to a (preferably unordered) factor
     response.

_U_s_a_g_e:

     multinomial(zero = NULL, parallel = FALSE, nointercept = NULL,
                 refLevel = "last")

_A_r_g_u_m_e_n_t_s:

     In the following, the response Y is assumed to be a factor with
     unordered values 1,2,...,M+1, so that M is the number of
     linear/additive predictors eta_j.

    zero: An integer-valued vector specifying which linear/additive
          predictors are modelled as intercepts only. Any values must
          be from the set {1,2,...,M}. The default value means none are
          modelled as intercept-only terms.

parallel: A logical, or formula specifying which terms have
          equal/unequal coefficients.

nointercept: An integer-valued vector specifying which linear/additive
          predictors have no intercepts. Any values must be from the
          set {1,2,...,M}.

refLevel: Either a single positive integer or a value of the factor. If
          an integer then it specifies which column of the response
          matrix is the reference or baseline level. The default is the
          last one (the (M+1)th one). If used, this argument will be
          often assigned the value '1'. If inputted as a value of a
          factor then beware of missing values of certain levels of the
          factor ('drop.unused.levels=TRUE' or
          'drop.unused.levels=FALSE'). See the example below.

_D_e_t_a_i_l_s:

     The default model can be written

                    eta_j =  log(P[Y=j]/ P[Y=M+1])

     where eta_j is the jth linear/additive predictor. Here, j=1,...,M,
     and eta_{M+1} is 0 by definition.  That is, the last level of the
     factor, or last column of the response matrix, is taken as the
     reference level or baseline-this is for identifiability of the
     parameters. The reference or baseline level can be changed with
     the 'refLevel' argument.

     In almost all the literature, the constraint matrices associated
     with this family of models are known. For example, setting
     'parallel=TRUE' will make all constraint matrices (except for the
     intercept) equal to a vector of M 1's.  If the constraint matrices
     are unknown and to be estimated, then this can be achieved by
     fitting the model as a reduced-rank vector generalized linear
     model (RR-VGLM; see 'rrvglm').  In particular, a multinomial logit
     model with unknown constraint matrices is known as a _stereotype_
     model (Anderson, 1984), and can be fitted with 'rrvglm'.

_V_a_l_u_e:

     An object of class '"vglmff"' (see 'vglmff-class'). The object is
     used by modelling functions such as 'vglm', 'rrvglm' and 'vgam'.

_W_a_r_n_i_n_g:

     No check is made to verify that the response is nominal.

     The arguments 'zero' and 'nointercept' can be inputted with values
     that fail. For example, 'multinomial(zero=2, nointercept=1:3)'
     means the second linear/additive predictor is identically zero,
     which will cause a failure.

     Be careful about the use of other potentially contradictory
     constraints, e.g., 'multinomial(zero=2, parallel = TRUE ~ x3)'. 
     If in doubt, apply 'constraints()' to the fitted object to check.

_N_o_t_e:

     The response should be either a matrix of counts (with row sums
     that are all positive), or a factor. In both cases, the 'y' slot
     returned by 'vglm'/'vgam'/'rrvglm' is the matrix of sample
     proportions.

     The multinomial logit model is more appropriate for a nominal
     (unordered) factor response than for an ordinal (ordered) factor
     response. Models more suited for the latter include those based on
     cumulative probabilities, e.g., 'cumulative'.

     'multinomial' is prone to numerical difficulties if the groups are
     separable and/or the fitted probabilities are close to 0 or 1. The
     fitted values returned are estimates of the probabilities P[Y=j]
     for j=1,...,M+1.

     Here is an example of the usage of the 'parallel' argument. If
     there are covariates 'x2', 'x3' and 'x4', then 'parallel = TRUE ~
     x2 + x3 - 1' and 'parallel = FALSE ~ x4' are equivalent. This
     would constrain the regression coefficients for 'x2' and 'x3' to
     be equal; those of the intercepts and 'x4' would be different.

     In Example 4 below, a conditional logit model is fitted to an
     artificial data set that explores how cost and travel time affect
     people's decision about how to travel to work.  Walking is the
     baseline group. The variable 'Cost.car' is the difference between
     the cost of travel to work by car and walking, etc.  The variable
     'Time.car' is the difference between the travel duration/time to
     work by car and walking, etc. For other details about the 'xij'
     argument see 'vglm.control' and 'fill'.

     The 'multinom' function in the 'nnet' package uses the first level
     of the factor as baseline, whereas the last level of the factor is
     used here. Consequently the estimated  regression coefficients
     differ.

_A_u_t_h_o_r(_s):

     Thomas W. Yee

_R_e_f_e_r_e_n_c_e_s:

     Yee, T. W. and Hastie, T. J. (2003) Reduced-rank vector
     generalized linear models. _Statistical Modelling_,  *3*, 15-41.

     McCullagh, P. and Nelder, J. A. (1989) _Generalized Linear
     Models_, 2nd ed. London: Chapman & Hall.

     Agresti, A. (2002) _Categorical Data Analysis_, 2nd ed. New York:
     Wiley.

     Hastie, T. J., Tibshirani, R. J. and Friedman, J. H. (2009) _The
     Elements of Statistical Learning: Data Mining, Inference and
     Prediction_, 2nd ed. New York: Springer-Verlag.

     Simonoff, J. S. (2003) _Analyzing Categorical Data_, New York:
     Springer-Verlag.

     Anderson, J. A. (1984) Regression and ordered categorical
     variables.  _Journal of the Royal Statistical Society, Series B,
     Methodological_, *46*, 1-30.

     Documentation accompanying the 'VGAM' package at <URL:
     http://www.stat.auckland.ac.nz/~yee> contains further information
     and examples.

_S_e_e _A_l_s_o:

     'margeff', 'cumulative', 'acat', 'cratio', 'sratio', 'dirichlet',
     'dirmultinomial', 'rrvglm', 'fill1', 'Multinomial', 'iris'. The
     author's homepage has further documentation about categorical data
     analysis using 'VGAM'.

_E_x_a_m_p_l_e_s:

     # Example 1: fit a multinomial logit model to Edgar Anderson's iris data
     data(iris)
     ## Not run: 
     fit = vglm(Species ~ ., multinomial, iris)
     coef(fit, matrix=TRUE) 
     ## End(Not run)

     # Example 2a: a simple example 
     ymat = t(rmultinom(10, size = 20, prob=c(0.1,0.2,0.8))) # Counts
     fit = vglm(ymat ~ 1, multinomial)
     head(fitted(fit))   # Proportions
     fit@prior.weights # Not recommended for extraction of prior weights
     weights(fit, type="prior", matrix=FALSE) # The better method
     fit@y   # Sample proportions
     constraints(fit)   # Constraint matrices

     # Example 2b: Different reference level used as the baseline 
     fit2 = vglm(ymat ~ 1, multinomial(refLevel=2))
     coef(fit2, matrix=TRUE)
     coef(fit , matrix=TRUE) # Easy to reconcile this output with fit2

     # Example 2c: Different input to Example 2a but same result
     w = apply(ymat, 1, sum) # Prior weights
     yprop = ymat / w    # Sample proportions
     fitprop = vglm(yprop ~ 1, multinomial, weights=w)
     head(fitted(fitprop))   # Proportions
     weights(fitprop, type="prior", matrix=FALSE)
     fitprop@y # Same as the input

     # Example 3: The response is a factor.
     nn = 10
     dframe3 = data.frame(yfactor = gl(3, nn, labels=c("Control","Trt1","Trt2")),
                          x = runif(3 * nn))
     myrefLevel = with(dframe3, yfactor[12])
     fit3a = vglm(yfactor ~ x, multinomial(refLevel=myrefLevel), data=dframe3)
     fit3b = vglm(yfactor ~ x, multinomial(refLevel=2), data=dframe3)
     coef(fit3a, matrix=TRUE)  # "Treatment1" is the reference level
     coef(fit3b, matrix=TRUE)  # "Treatment1" is the reference level
     margeff(fit3b)

     # Example 4: Fit a rank-1 stereotype model 
     data(car.all)
     fit4 = rrvglm(Country ~ Width + Height + HP, multinomial, car.all, Rank=1)
     coef(fit4)   # Contains the C matrix
     constraints(fit4)$HP     # The A matrix 
     coef(fit4, matrix=TRUE)  # The B matrix
     Coef(fit4)@C             # The C matrix 
     ccoef(fit4)              # Better to get the C matrix this way
     Coef(fit4)@A             # The A matrix 
     svd(coef(fit4, matrix=TRUE)[-1,])$d    # This has rank 1; = C 

     # Example 5: The use of the xij argument (aka conditional logit model)
     set.seed(111)
     nn = 100  # Number of people who travel to work
     M = 3  # There are M+1 models of transport to go to work
     ymat = matrix(0, nn, M+1)
     ymat[cbind(1:nn, sample(x=M+1, size=nn, replace=TRUE))] = 1
     dimnames(ymat) = list(NULL, c("bus","train","car","walk"))
     gotowork = data.frame(cost.bus  = runif(nn), time.bus  = runif(nn), 
                           cost.train= runif(nn), time.train= runif(nn),
                           cost.car  = runif(nn), time.car  = runif(nn),
                           cost.walk = runif(nn), time.walk = runif(nn))
     gotowork = round(gotowork, dig=2) # For convenience
     gotowork = transform(gotowork,
                          Cost.bus   = cost.bus   - cost.walk,
                          Cost.car   = cost.car   - cost.walk,
                          Cost.train = cost.train - cost.walk,
                          Cost       = cost.train - cost.walk, # for labelling
                          Time.bus   = time.bus   - time.walk,
                          Time.car   = time.car   - time.walk,
                          Time.train = time.train - time.walk,
                          Time       = time.train - time.walk) # for labelling
     fit = vglm(ymat ~ Cost + Time,
                multinomial(parall=TRUE ~ Cost + Time - 1),
                xij = list(Cost ~ Cost.bus + Cost.train + Cost.car,
                           Time ~ Time.bus + Time.train + Time.car),
                form2 =  ~ Cost + Cost.bus + Cost.train + Cost.car +
                           Time + Time.bus + Time.train + Time.car,
                data=gotowork, trace=TRUE)
     head(model.matrix(fit, type="lm"))   # LM model matrix
     head(model.matrix(fit, type="vlm"))  # Big VLM model matrix
     coef(fit)
     coef(fit, matrix=TRUE)
     constraints(fit)
     summary(fit)
     max(abs(predict(fit)-predict(fit, new=gotowork))) # Should be 0

