mbinomial                package:VGAM                R Documentation

_T_h_e _M_a_t_c_h_e_d _B_i_n_o_m_i_a_l _D_i_s_t_r_i_b_u_t_i_o_n _F_a_m_i_l_y _F_u_n_c_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Estimation of a binomial regression in a matched case-control
     study.

_U_s_a_g_e:

     mbinomial(mvar=NULL, link="logit", earg=list(),
               parallel = TRUE, smallno = .Machine$double.eps^(3/4))

_A_r_g_u_m_e_n_t_s:

    mvar: Formula specifying the matching variable. This shows which
          observation belongs to which matching set. The intercept
          should be suppressed from the formula, and  the term must be
          a 'factor'.

    link: Parameter link function applied to the probability. See
          'Links' for more choices.

    earg: List. Extra arguments for the links. See 'earg' in 'Links'
          for general information.

parallel: This should always be set 'TRUE' otherwise there will be too
          many parameters to estimate. See 'CommonVGAMffArguments' for
          more information.

 smallno: Numeric, a small positive value. For a specific observation,
          used to nullify the linear/additive predictors that are not
          needed.

_D_e_t_a_i_l_s:

     By default, this 'VGAM' family function fits a logistic regression
     model to a binary response from a matched case-control study.
     Here, each case (Y=1) is matched with one or more controls (Y=0)
     with respect to some matching variables (confounders). For
     example, the first matched set is all women aged from 20 to 25,
     the second matched set is women aged between 26 to 30, etc. The
     logistic regression has a different intercept for each matched set
     but the other regression coefficients are assumed to be the same
     across matched sets ('parallel=TRUE').

     Let C be the number of matched sets. This 'VGAM' family function
     uses a trick by allowing M, the number of linear/additive
     predictors, to be equal to C, and then nullifying all but one of
     them for a particular observation. The term specified by the
     'mvar' argument must be a 'factor'. Consequently, the model matrix
     contains an intercept plus one column for each level of the factor
     (except the first (this is the default in R)). Altogether there
     are C columns. The algorithm here constructs a different
     constraint matrix for each of the C columns.

_V_a_l_u_e:

     An object of class '"vglmff"' (see 'vglmff-class'). The object is
     used by modelling functions such as 'vglm' and 'vgam'.

_W_a_r_n_i_n_g:

     Both the memory requirements and computational time of this 'VGAM'
     family function grows very quickly with respect to the number of
     matched sets. For example, the large model matrix of a data set
     with 100 matched sets consisting of one case and one control per
     set will take up at least (about) 20Mb of memory. For a constant
     number of cases and controls per matched set, the memory
     requirements are O(C^3) and the the computational time is O(C^4)
     flops.

     The example below has been run successfully with 'n=700' (this
     corresponds to C=350) but only on a big machine and it took over
     10 minutes. The large model matrix was 670Mb.

_N_o_t_e:

     The response is assumed to be in a format that can also be
     inputted into 'binomialff'.

_A_u_t_h_o_r(_s):

     Thomas W. Yee

_R_e_f_e_r_e_n_c_e_s:

     Section 8.2 of  Hastie, T. J. and Tibshirani, R. J. (1990)
     _Generalized Additive Models_, London: Chapman & Hall.

     Pregibon, D. (1984) Data analytic methods for matched case-control
     studies. _Biometrics_, *40*, 639-651.

     Chapter 7 of  Breslow, N. E. and Day, N. E. (1980) _Statistical
     Methods in Cancer Research I: The Analysis of Case-Control
     Studies_. Lyon: International Agency for Research on Cancer.

     Holford, T. R. and White, C. and Kelsey, J. L. (1978) Multivariate
     analysis for matched case-control studies. _American Journal of
     Epidemiology_, *107*, 245-256.

_S_e_e _A_l_s_o:

     'binomialff'.

_E_x_a_m_p_l_e_s:

     ## Not run: 
     # Cf. Hastie and Tibshirani (1990) p.209. The variable n must be even.
     # Here, the intercept for each matched set accounts for x3 which is
     # the confounder or matching variable.
     n = 700 # Requires a big machine with lots of memory. Expensive wrt time
     n = 100 # This requires a reasonably big machine.
     mydat = data.frame(x2 = rnorm(n), x3 = rep(rnorm(n/2), each=2))
     xmat = with(mydat, cbind(x2,x3))
     mydat = transform(mydat, eta = -0.1 + 0.2 * x2 + 0.3 * x3)
     etamat = with(mydat, matrix(eta, n/2, 2))
     condmu = exp(etamat[,1]) / (exp(etamat[,1]) + exp(etamat[,2]))
     y1 = ifelse(runif(n/2) < condmu, 1, 0)
     y = cbind(y1, 1-y1)
     mydat = transform(mydat, y = c(y1, 1-y1),
                              ID = factor(c(row(etamat))))
     fit = vglm(y ~ 1 + ID + x2, trace=TRUE,
                fam = mbinomial(mvar = ~ ID - 1), data=mydat)
     dimnames(coef(fit, mat=TRUE))
     coef(fit, mat=TRUE)
     summary(fit)
     head(fitted(fit))
     objsizemb = function(object) round(object.size(object) / 2^20, dig=2)
     objsizemb(fit) # in Mb

     VLMX = model.matrix(fit, type="vlm")  # The big model matrix
     dim(VLMX)
     objsizemb(VLMX) # in Mb
     rm(VLMX)
     ## End(Not run)

