mix2normal1               package:VGAM               R Documentation

_M_i_x_t_u_r_e _o_f _T_w_o _U_n_i_v_a_r_i_a_t_e _N_o_r_m_a_l _D_i_s_t_r_i_b_u_t_i_o_n_s

_D_e_s_c_r_i_p_t_i_o_n:

     Estimates the five parameters of a mixture of two univariate 
     normal distributions by maximum likelihood estimation.

_U_s_a_g_e:

     mix2normal1(lphi="logit", lmu="identity", lsd="loge",
                 ephi=list(), emu1=list(), emu2=list(), esd1=list(), esd2=list(),
                 iphi=0.5, imu1=NULL, imu2=NULL, isd1=NULL, isd2=NULL,
                 qmu=c(0.2, 0.8), equalsd=TRUE, nsimEIM=100, zero=1)

_A_r_g_u_m_e_n_t_s:

    lphi: Link function for the parameter phi. See 'Links' for more
          choices.

     lmu: Link function applied to each mu parameter. See 'Links' for
          more choices.

     lsd: Link function applied to each sd parameter. See 'Links' for
          more choices.

ephi, emu1, emu2, esd1, esd2: Lists. Extra arguments for each of the
          links. See 'earg' in 'Links' for general information. If
          'equalsd=TRUE' then 'esd1' must equal 'esd2'.

    iphi: Initial value for phi, whose value must lie between 0 and 1.

imu1, imu2: Optional initial values for mu1 and mu2. The default is to
          compute initial values internally using the argument 'qmu'.

isd1, isd2: Optional initial values for sd1 and sd2. The default is to
          compute initial values internally based on the argument
          'qmu'. These internally computed values are currently crude,
          so supplying 'isd1' and 'isd2' explicitly is a good idea
          where practical.

     qmu: Vector with two values giving the probabilities relating to
          the sample quantiles for obtaining initial values for mu1 and
          mu2. The two values are fed in as the 'probs' argument into
          'quantile'.

 equalsd: Logical indicating whether the two standard deviations should
          be constrained to be equal. If 'TRUE' then the appropriate
          constraint matrices will be used.

 nsimEIM: See 'CommonVGAMffArguments'.

    zero: An integer vector specifying which linear/additive predictors
          are modelled as intercepts only. If given, the values must
          be from the set 1,2,...,5. The default is the first one
          only, meaning phi is a single parameter even when there are
          explanatory variables. Set 'zero=NULL' to model all
          linear/additive predictors as functions of the explanatory
          variables. See 'CommonVGAMffArguments' for more information.
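
     The role of 'qmu' described above can be sketched in base R. This
     is only an illustration of feeding 'qmu' into 'quantile' as the
     'probs' argument; the simulated data and the name 'init.mu' are
     assumptions made for the example, and the family function's
     internal code may differ in detail.

```r
# Hypothetical data: two reasonably well separated normal groups
set.seed(1)
y <- c(rnorm(300, mean = 100, sd = 10), rnorm(700, mean = 150, sd = 10))

# The default value of 'qmu' in the usage above
qmu <- c(0.2, 0.8)

# Sample quantiles at those probabilities: candidate starting values
# for mu1 and mu2 (an illustration, not the package's internal code)
init.mu <- quantile(y, probs = qmu)
init.mu
```

     Moving the two probabilities further apart or closer together
     changes the starting means, which is one way of trying different
     initial values when a fit does not converge.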

_D_e_t_a_i_l_s:

     The probability density function can be loosely written as 

           f(y) = phi * N(mu1, sd1) + (1-phi) * N(mu2, sd2)

     where N(mu, sd) denotes the density of a normal distribution with
     mean mu and standard deviation sd, and phi is the probability
     that an observation belongs to the first group. The parameters
     mu1 and mu2 are the means, and sd1 and sd2 are the standard
     deviations. The parameter phi satisfies 0 < phi < 1. The mean of
     Y is phi*mu1 + (1-phi)*mu2 and this is returned as the fitted
     values. By default, the five linear/additive predictors are
     (logit(phi), mu1, log(sd1), mu2, log(sd2))^T. If 'equalsd=TRUE'
     then sd1=sd2 is enforced.
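
     The density and the mean above can be checked numerically with a
     minimal base R sketch. The function name 'dmix2normal' and the
     parameter values are assumptions chosen for illustration; the
     values on the predictor scale are back-transformed with the
     inverse logit ('plogis') and 'exp', matching the default links.

```r
# Mixture density f(y) = phi * N(mu1, sd1) + (1 - phi) * N(mu2, sd2)
# ('dmix2normal' is a hypothetical helper, not part of VGAM)
dmix2normal <- function(y, phi, mu1, sd1, mu2, sd2) {
  phi * dnorm(y, mu1, sd1) + (1 - phi) * dnorm(y, mu2, sd2)
}

# Illustrative values on the linear-predictor scale, back-transformed
phi <- plogis(-1)        # inverse logit of -1
sd1 <- sd2 <- exp(3)     # inverse log link
mu1 <- 99
mu2 <- 150

# Integrate over a finite range that covers essentially all the mass
lower <- mu1 - 10 * sd1
upper <- mu2 + 10 * sd2

# The density should integrate to 1
area <- integrate(dmix2normal, lower, upper, phi = phi, mu1 = mu1,
                  sd1 = sd1, mu2 = mu2, sd2 = sd2, rel.tol = 1e-10)$value

# The mean of Y should equal phi*mu1 + (1-phi)*mu2
mix.mean <- integrate(function(y) y * dmix2normal(y, phi, mu1, sd1, mu2, sd2),
                      lower, upper, rel.tol = 1e-10)$value

c(area = area, mean.numeric = mix.mean,
  mean.formula = phi * mu1 + (1 - phi) * mu2)
```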

_V_a_l_u_e:

     An object of class '"vglmff"' (see 'vglmff-class'). The object is
     used by modelling functions such as 'vglm' and 'vgam'.

_W_a_r_n_i_n_g:

     Numerical problems can occur and half-stepping is not uncommon. If
     the fit fails to converge, try supplying better initial values,
     e.g., by using 'iphi', 'qmu', 'imu1', 'imu2', 'isd1', 'isd2', etc.

     This 'VGAM' family function should be used with care.

_N_o_t_e:

     Fitting this model successfully to data can be difficult due to
     numerical problems and ill-conditioned data. It pays to fit the
     model several times with different initial values and check that
     the best fit looks reasonable. Plotting the results is
     recommended. This function works better the further apart mu1 and
     mu2 are.

     Convergence can be slow, especially when the two component
     distributions are not well separated. The control argument
     'trace=TRUE' is the default, to encourage monitoring of
     convergence. Having 'equalsd=TRUE' often makes the overall
     optimization problem easier.
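
     The advice to refit from several starting values can be sketched
     in base R. This is not the algorithm used by this family function:
     it swaps in 'optim' with BFGS on a hand-written negative
     log-likelihood for the equal-sd model, purely to illustrate
     comparing fits from different starts. The names 'negloglik',
     'qmu.list', 'starts' and 'fits', and the choice of sd(y)/2 as a
     starting standard deviation, are assumptions for the example.

```r
set.seed(123)
n <- 1000
phi.true <- plogis(-1); mu1.true <- 99; mu2.true <- 150; sd.true <- exp(3)
y <- ifelse(runif(n) < phi.true,
            rnorm(n, mu1.true, sd.true), rnorm(n, mu2.true, sd.true))

# Negative log-likelihood with a common standard deviation.
# theta = (logit(phi), mu1, log(sd), mu2), the default predictor scale.
negloglik <- function(theta, y) {
  s  <- exp(theta[3])
  l1 <- plogis(theta[1],  log.p = TRUE) + dnorm(y, theta[2], s, log = TRUE)
  l2 <- plogis(-theta[1], log.p = TRUE) + dnorm(y, theta[4], s, log = TRUE)
  m  <- pmax(l1, l2)                     # log-sum-exp for numerical stability
  -sum(m + log(exp(l1 - m) + exp(l2 - m)))
}

# Starting values built from different pairs of sample quantiles
qmu.list <- list(c(0.1, 0.9), c(0.2, 0.8), c(0.3, 0.7))
starts <- lapply(qmu.list, function(q) {
  unname(c(qlogis(0.5), quantile(y, q[1]), log(sd(y) / 2), quantile(y, q[2])))
})

# One fit per starting value
fits <- lapply(starts, function(s) {
  optim(s, negloglik, y = y, method = "BFGS", control = list(maxit = 500))
})

# Compare the fits: similar estimates and similar negative
# log-likelihoods across starts are reassuring
est <- t(sapply(fits, function(f) {
  c(phi = plogis(f$par[1]), mu1 = f$par[2], sd = exp(f$par[3]),
    mu2 = f$par[4], negloglik = f$value, convergence = f$convergence)
}))
round(est, 2)
```

     If the rows disagree noticeably, the fit with the smallest
     negative log-likelihood is the natural candidate, but it is still
     worth plotting it against the data before accepting it.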

_A_u_t_h_o_r(_s):

     T. W. Yee

_R_e_f_e_r_e_n_c_e_s:

     McLachlan, G. J. and Peel, D. (2000) _Finite Mixture Models_. New
     York: Wiley.

     Everitt, B. S. and Hand, D. J. (1981) _Finite Mixture
     Distributions_. London: Chapman & Hall.

_S_e_e _A_l_s_o:

     'normal1', 'Normal', 'mix2poisson'.

_E_x_a_m_p_l_e_s:

     # Simulate n observations from a two-component normal mixture
     n = 1000
     mu1 =  99
     mu2 = 150
     sd1 = sd2 = exp(3)
     (phi = logit(-1, inverse=TRUE))  # Probability of the first component
     y = ifelse(runif(n) < phi, rnorm(n, mu1, sd1), rnorm(n, mu2, sd2))

     # Fit an intercept-only model with a common standard deviation
     fit = vglm(y ~ 1, mix2normal1(equalsd=TRUE))

     # Compare the results
     cf = coef(fit)
     round(rbind('Estimated'=c(logit(cf[1], inverse=TRUE),
         cf[2], exp(cf[3]), cf[4]), 'Truth'=c(phi, mu1, sd1, mu2)), digits=2)

     ## Not run: 
     # Plot the results
     xx = seq(min(y), max(y), length.out=200)
     plot(xx, (1-phi)*dnorm(xx, mu2, sd2), type="l", xlab="y",
          main="Red=estimate, blue=truth", col="blue", ylab="Density")
     phi.est = logit(coef(fit)[1], inverse=TRUE)
     sd.est = exp(coef(fit)[3])
     lines(xx, phi*dnorm(xx, mu1, sd1), col="blue")
     lines(xx, phi.est * dnorm(xx, Coef(fit)[2], sd.est), col="red")
     lines(xx, (1-phi.est) * dnorm(xx, Coef(fit)[4], sd.est), col="red")
     abline(v=Coef(fit)[c(2,4)], lty=2, col="red")
     abline(v=c(mu1, mu2), lty=2, col="blue")
     ## End(Not run)

