negbinomial               package:VGAM               R Documentation

_N_e_g_a_t_i_v_e _B_i_n_o_m_i_a_l _D_i_s_t_r_i_b_u_t_i_o_n _F_a_m_i_l_y _F_u_n_c_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Maximum likelihood estimation of the two parameters of a negative
     binomial distribution.

_U_s_a_g_e:

     negbinomial(lmu = "loge", lk = "loge",
                 emu = list(), ek = list(),
                 ik = NULL, nsimEIM = 100,
                 cutoff = 0.995, Maxiter = 5000,
                 deviance.arg = FALSE, method.init = 1,
                 shrinkage.init = 0.95, zero = -2)

_A_r_g_u_m_e_n_t_s:

 lmu, lk: Link functions applied to the mu and k parameters. See
          'Links' for more choices. Note that the k parameter is the
          'size' argument of 'rnbinom' etc.

 emu, ek: List. Extra argument for each of the links. See 'earg' in
          'Links' for general information.

      ik: Optional initial values for k. If convergence fails, try
          different values (and/or use 'method.init'). For an S-column
          response, 'ik' can be of length S. The value 'NULL' means
          that an initial value for each response is computed
          internally using a range of values. This argument is ignored
          if used within 'cqo'; see the 'iKvector' argument of
          'qrrvglm.control' instead.

 nsimEIM: This argument is used for computing the diagonal element of
          the _expected information matrix_ (EIM) corresponding to k.
          See 'CommonVGAMffArguments' for more information and the note
          below.

  cutoff: Used in the finite series approximation. A numeric which is
          close to 1 but never exactly 1. It determines how many terms
          of the infinite series for the second diagonal element of
          the EIM are actually used: the probabilities are summed
          until the total reaches this value or more (but no more
          than 'Maxiter' terms are allowed). It is like specifying 'p'
          in an imaginary function 'qnegbin(p)'.

 Maxiter: Used in the finite series approximation. Integer. The maximum
          number of terms allowed when computing the second diagonal
          element of the EIM. In theory, the value involves an infinite
          series. If this argument is too small then the value may be
          inaccurate.

deviance.arg: Logical. If 'TRUE', the deviance function is attached to
          the object. Under ordinary circumstances it should be left
          alone, because the deviance assumes that the index parameter
          is at its maximum likelihood estimate. Consequently, one
          cannot use that criterion to minimize within the IRLS
          algorithm. It should be set to 'TRUE' only when used with
          'cqo' under the fast algorithm.

method.init: An integer with value '1' or '2' or '3' which specifies
          the initialization method for the mu parameter. If
          convergence fails, try another value, and/or specify a
          value for 'shrinkage.init' and/or 'ik'.

shrinkage.init: How much shrinkage is used when initializing mu. The
          value must be between 0 and 1 inclusive. A value of 0 means
          the individual response values are used, and a value of 1
          means the median or mean is used. This argument is used in
          conjunction with 'method.init'. If convergence fails, try
          setting this argument to 1.

    zero: Integer valued vector, usually assigned -2 or 2 if used at
          all.  Specifies which of the two linear/additive predictors
          are modelled as an intercept only. By default, the k
          parameter (after 'lk' is applied) is modelled as a single
          unknown number that is estimated.  It can be modelled as a
          function of the explanatory variables by setting 'zero=NULL'.
          A negative value means that the value is recycled, so setting
          -2 means all k are intercept-only.
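
     The effect of 'shrinkage.init' described above can be pictured
     with a small base R sketch. The helper 'shrink_init_mu' is
     hypothetical and is not part of 'VGAM'; the convex-combination
     form is an assumption made for illustration, chosen only because
     it reproduces the two documented endpoints (0 gives the raw
     responses, 1 gives a single central value).

```r
# Hypothetical helper, NOT part of VGAM: blend each response value with
# an overall central value. The convex-combination form is an assumption.
shrink_init_mu <- function(y, shrinkage = 0.95, centre = mean(y)) {
  stopifnot(shrinkage >= 0, shrinkage <= 1)
  shrinkage * centre + (1 - shrinkage) * y
}

y <- c(0, 1, 1, 2, 5, 12)

mu_raw    <- shrink_init_mu(y, shrinkage = 0)     # the responses themselves
mu_pooled <- shrink_init_mu(y, shrinkage = 1)     # every value equals mean(y)
mu_blend  <- shrink_init_mu(y, shrinkage = 0.95)  # mostly pooled

# Note: with shrinkage = 0 a zero count gives a zero initial mean, which
# would have to be moved away from 0 before a log link could be applied.
rbind(mu_raw, mu_pooled, mu_blend)
```

     Under this reading, heavier shrinkage gives flatter and safer
     starting values, which is consistent with the advice to try
     'shrinkage.init = 1' when convergence fails.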

_D_e_t_a_i_l_s:

     The negative binomial distribution can be motivated in several
     ways, e.g., as a Poisson distribution with a mean that is gamma
     distributed. There are several common parametrizations of the
     negative binomial distribution. The one used here uses the mean mu
     and an _index_ parameter k, both of which are positive. Specifically,
     the density of a random variable Y is 

       f(y;mu,k) = C_{y}^{y + k - 1} [mu/(mu+k)]^y [k/(k+mu)]^k

     where y=0,1,2,..., and mu > 0 and k > 0. Note that the dispersion
     parameter is 1/k, so that as k approaches infinity the negative
     binomial distribution approaches a Poisson distribution. The
     response has variance Var(Y)=mu*(1+mu/k). When fitted, the
     'fitted.values' slot of the object contains the estimated value of
     the mu parameter, i.e., of the mean E(Y).
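
     The parametrization above can be checked numerically in base R,
     without 'VGAM'. The following is only a sketch: the values of
     'mu' and 'k' are arbitrary, and the variable names are chosen for
     illustration. It compares the density as written with 'dnbinom',
     recovers the same probabilities from a gamma mixture of Poisson
     distributions, and confirms the mean and variance.

```r
mu <- 4
k  <- 2.5
y  <- 0:10

# Density exactly as written above; choose() accepts a non-integer
# upper argument, which is needed because k is real-valued.
f <- choose(y + k - 1, y) * (mu / (mu + k))^y * (k / (k + mu))^k

# The same quantity from base R: k is the 'size' argument.
all.equal(f, dnbinom(y, size = k, mu = mu))

# Gamma-Poisson mixture: lambda ~ Gamma(shape = k, rate = k / mu) and
# Y | lambda ~ Poisson(lambda). Integrating out lambda gives f(y; mu, k).
mix_density <- function(y1) {
  integrate(function(lambda)
              dpois(y1, lambda) * dgamma(lambda, shape = k, rate = k / mu),
            lower = 0, upper = Inf, rel.tol = 1e-8)$value
}
f_mix <- sapply(y, mix_density)
max(abs(f_mix - f))

# Mean and variance from a long (effectively complete) sum.
yy <- 0:5000
p  <- dnbinom(yy, size = k, mu = mu)
m  <- sum(yy * p)
v  <- sum(yy^2 * p) - m^2
c(mean = m, variance = v, theory = mu * (1 + mu / k))
```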

     The negative binomial distribution can be coerced into the
     classical GLM framework, with one of the parameters being of
     interest and the other treated as a nuisance/scale parameter
     (this approach is implemented in the MASS library). This 'VGAM'
     family function 'negbinomial' treats both parameters on the same
     footing, and estimates them both by full maximum likelihood
     estimation. Simulated Fisher scoring is employed as the default
     (see the 'nsimEIM' argument).

     The parameters mu and k are independent (diagonal EIM), and the
     confidence region for k is extremely skewed so that its standard
     error is often of no practical use. The parameter 1/k has been
     used as a measure of aggregation.

     This 'VGAM' function handles _multivariate_ responses, so that a
     matrix can be used as the response. The number of columns is the
     number of species, say, and setting 'zero=-2' means that _all_
     species have a k equalling a (different) intercept only.

_V_a_l_u_e:

     An object of class '"vglmff"' (see 'vglmff-class'). The object is
     used by modelling functions such as 'vglm' and 'vgam'.

_W_a_r_n_i_n_g:

     The Poisson model corresponds to k equalling infinity. If the data
     are Poisson or close to Poisson, numerical problems will occur.
     Choosing a log-log link may help in such cases; otherwise use
     'poissonff'.
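
     The Poisson limit can be seen directly in base R. This is an
     illustrative sketch only; 'mu', the grid of k values and the
     helper 'max_gap' are arbitrary choices, not part of 'VGAM'. As k
     grows, the negative binomial probabilities become
     indistinguishable from the Poisson ones, so the likelihood
     carries almost no information about k while log(k) keeps
     increasing.

```r
mu <- 3
y  <- 0:30

# Largest pointwise difference between the two probability functions.
max_gap <- function(k) {
  max(abs(dnbinom(y, size = k, mu = mu) - dpois(y, lambda = mu)))
}

k_grid <- c(1, 10, 100, 1e4, 1e6)
gaps   <- sapply(k_grid, max_gap)

# The gap shrinks towards 0 while log(k) grows without bound.
data.frame(k = k_grid, log_k = log(k_grid), max_abs_gap = gaps)
```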

     This function is fragile; maximum likelihood estimation of the
     index parameter is fraught with difficulty (see Lawless, 1987).
     In general, 'quasipoissonff' is more robust than this function.
     Assigning values to the 'ik' argument may lead to a local
     solution, and smaller values are preferred over large values when
     using this argument.

     Yet to do: write a family function which uses the method of
     moments estimator for k.
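
     For orientation, a method of moments estimate of k follows from
     solving Var(Y) = mu*(1+mu/k) for k with the sample mean and
     variance plugged in. The sketch below is not a 'VGAM' family
     function; 'mom_k' is a hypothetical helper written in base R,
     applied to the frequency data of the first example. Returning
     'Inf' when there is no overdispersion is a choice made here for
     illustration.

```r
# Hypothetical helper, NOT part of VGAM: moment estimator of k from
# counts y with frequency weights w.
mom_k <- function(y, w = rep(1, length(y))) {
  n    <- sum(w)
  ybar <- sum(w * y) / n
  s2   <- sum(w * (y - ybar)^2) / (n - 1)
  # Without overdispersion the moment equation has no finite positive
  # solution; report the Poisson limit instead.
  if (s2 <= ybar) return(Inf)
  ybar^2 / (s2 - ybar)
}

# Apple tree data (counts and their frequencies), as in Example 1.
y <- 0:7
w <- c(70, 38, 17, 10, 9, 3, 2, 1)

k_mom <- mom_k(y, w)   # roughly 1.17 for these data
k_mom
```

     Such a value could be tried as 'ik', bearing in mind the caveat
     above that a supplied 'ik' may lead to a local solution.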

_N_o_t_e:

     Suppose the response is called 'ymat'. The diagonal element of the
     _expected information matrix_ (EIM) for parameter k involves an
     infinite series; consequently simulated Fisher scoring (see
     'nsimEIM') is the default. This algorithm should definitely be
     used if 'max(ymat)' is large, e.g., 'max(ymat) > 300' or there are
     any outliers in 'ymat'. A second algorithm involving a finite
     series approximation can be invoked by setting 'nsimEIM = NULL'.
     Then the arguments 'Maxiter' and 'cutoff' are pertinent.
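
     The two approaches can be sketched in base R for a single pair
     (mu, k). This is not the code used inside 'VGAM'; the helpers
     'eim_k_series' and 'eim_k_simulated' are hypothetical, and the
     simulated version (averaging the negative second derivative over
     random draws) is only one plausible reading of simulated Fisher
     scoring. Both rely on the identity that the expected information
     for k equals trigamma(k) - E[trigamma(Y+k)] - mu/(k*(k+mu)),
     obtained by differentiating the log-density twice in k.

```r
# Hypothetical helpers, NOT VGAM's internal code.

# Finite series: sum terms up to the 'cutoff' quantile, but never more
# than 'Maxiter' terms. The tail beyond the last term is simply dropped.
eim_k_series <- function(mu, k, cutoff = 0.995, Maxiter = 5000) {
  n_terms <- min(qnbinom(cutoff, size = k, mu = mu) + 1, Maxiter)
  y <- 0:(n_terms - 1)
  p <- dnbinom(y, size = k, mu = mu)
  trigamma(k) - sum(p * trigamma(y + k)) - mu / (k * (k + mu))
}

# Simulation: replace the expectation by an average over nsim draws.
eim_k_simulated <- function(mu, k, nsim = 100) {
  y <- rnbinom(nsim, size = k, mu = mu)
  trigamma(k) - mean(trigamma(y + k)) - mu / (k * (k + mu))
}

# Moderate counts: the default cutoff is already close to a
# near-complete series.
eim_default <- eim_k_series(mu = 5, k = 2)
eim_full    <- eim_k_series(mu = 5, k = 2, cutoff = 1 - 1e-12,
                            Maxiter = 1e5)
set.seed(1)
eim_sim     <- eim_k_simulated(mu = 5, k = 2, nsim = 100)
c(default = eim_default, full = eim_full, simulated = eim_sim)

# Large counts (as in Example 3): the number of terms needed to reach
# the cutoff far exceeds Maxiter = 5000, so the series is cut short.
needed <- qnbinom(0.995, size = exp(1), mu = exp(12)) + 1
needed
```

     In this sketch the truncated series for large counts would omit
     nearly all of the probability mass, which is in line with the
     advice to use 'nsimEIM' when 'max(ymat)' is large.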

     Regardless of the algorithm used, convergence problems may occur,
     especially when the response has large outliers or is large in
     magnitude. If convergence fails, try adjusting the following
     arguments (listed in decreasing order of recommendation):
     'nsimEIM', 'shrinkage.init', 'method.init', 'Maxiter', 'cutoff',
     'ik', 'zero'.

     This function can be used by the fast algorithm in 'cqo';
     however, setting 'EqualTolerances=TRUE' and 'ITolerances=FALSE'
     is recommended.

     In the first example below (Bliss and Fisher, 1953), 25 leaves
     were randomly selected from each of 6 McIntosh apple trees in an
     orchard that had been sprayed. On each leaf, the number of adult
     female European red mites was counted.

_A_u_t_h_o_r(_s):

     Thomas W. Yee

_R_e_f_e_r_e_n_c_e_s:

     Lawless, J. F. (1987) Negative binomial and mixed Poisson
     regression. _The Canadian Journal of Statistics_ *15*, 209-225.

     Hilbe, J. M. (2007) _Negative Binomial Regression_. Cambridge:
     Cambridge University Press.

     Bliss, C. and Fisher, R. A. (1953) Fitting the negative binomial
     distribution to biological data. _Biometrics_ *9*, 174-200.

_S_e_e _A_l_s_o:

     'quasipoissonff', 'poissonff', 'cao', 'cqo', 'zinegbinomial',
     'posnegbinomial', 'invbinomial', 'rnbinom', 'nbolf'.

_E_x_a_m_p_l_e_s:

     # Example 1: apple tree data
     y = 0:7
     w = c(70, 38, 17, 10, 9, 3, 2, 1)
     fit = vglm(y ~ 1, negbinomial, weights=w)
     summary(fit)
     coef(fit, matrix=TRUE)
     Coef(fit)

     # Example 2: simulated data with multivariate response
     x = runif(n <- 500)
     y1 = rnbinom(n, mu=exp(3+x), size=exp(1)) # k is size
     y2 = rnbinom(n, mu=exp(2-x), size=exp(0))
     fit = vglm(cbind(y1,y2) ~ x, negbinomial, trace=TRUE)
     coef(fit, matrix=TRUE)

     # Example 3: large counts so definitely use the nsimEIM argument
     x = runif(n <- 500)
     y = rnbinom(n, mu=exp(12+x), size=exp(1)) # k is size
     range(y)  # Large counts
     fit = vglm(y ~ x, negbinomial(nsimEIM=100), trace=TRUE)
     coef(fit, matrix=TRUE)

