seqhap              package:haplo.stats              R Documentation

_S_e_q_u_e_n_t_i_a_l _H_a_p_l_o_t_y_p_e _S_c_a_n _A_s_s_o_c_i_a_t_i_o_n _A_n_a_l_y_s_i_s _f_o_r _C_a_s_e-_C_o_n_t_r_o_l _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     seqhap implements sequential haplotype scan methods for
     association analysis of case-control data.  When each locus is
     evaluated, loci that contribute additional information to the
     haplotype association with disease status are added sequentially.
     This conditional evaluation is based on the Mantel-Haenszel (MH)
     test.  Two sequential methods are provided, a sequential haplotype
     method and a sequential summary method, along with results from
     the traditional single-locus method.  Currently, seqhap works only
     with biallelic loci (single nucleotide polymorphisms, or SNPs) and
     binary traits.
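
     The conditional step described above can be sketched with the
     base R function mantelhaen.test.  This is only an illustration on
     simulated carrier indicators, stratified by a single locus; the
     variable names (current, candidate, add.locus) are hypothetical,
     and seqhap itself conditions on the current haplotypes, not on
     this simplified coding.

```r
# Illustrative sketch only: simulated data, not the seqhap internals.
set.seed(1)
n <- 400
y <- rbinom(n, 1, 0.5)                     # 1 = case, 0 = control
current <- rbinom(n, 1, 0.4)               # carrier status at the locus being scanned
candidate <- rbinom(n, 1, 0.2 + 0.2 * y)   # carrier status at a candidate locus

# Mantel-Haenszel test of candidate vs. y, stratified by the current locus
mh <- mantelhaen.test(factor(candidate), factor(y), factor(current),
                      correct = FALSE)
mh.stat <- unname(mh$statistic)

# The candidate would be added when the statistic exceeds the threshold
# (3.84 is the default value of mh.threshold)
add.locus <- mh.stat > 3.84
```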

_U_s_a_g_e:

     seqhap(y, geno, pos, locus.label=NA, weight=NULL, 
            mh.threshold=3.84, r2.threshold=0.95, haplo.freq.min=0.005, 
            miss.val=c(0, NA), sim.control=score.sim.control(),
            control=haplo.em.control())

_A_r_g_u_m_e_n_t_s:

       y: vector of binary responses (1=case, 0=control). Its length
          must equal the number of rows in geno.

    geno: matrix of alleles, such that each locus has a pair of
          adjacent columns of alleles, and the order of columns
          corresponds to the order of loci on a chromosome. If there
          are K loci, then ncol(geno)=2*K. Rows represent the alleles
          for each subject. Currently, only bi-allelic loci (SNPs) are
          allowed.  

     pos: vector of physical positions (or relative physical positions)
          for loci. If there are K loci, length(pos)=K. The scale (kb,
          bp, etc.) does not affect the results.

locus.label : vector of labels for the set of loci  

  weight: weights for observations (rows of geno matrix). 

mh.threshold : threshold for the Mantel-Haenszel statistic that
          evaluates whether a locus contributes additional information
          about the haplotype association with disease, conditional on
          the current haplotypes. The default is 3.84, which is the
          95th percentile of the chi-square distribution with 1 degree
          of freedom.

r2.threshold : threshold for a locus to be skipped. When scanning locus
          k, any locus whose r-squared (the squared Pearson
          correlation) with locus k exceeds r2.threshold is ignored, so
          that the haplotype growing process continues with markers
          that are further away from locus k.

haplo.freq.min : the minimum haplotype frequency for a haplotype to be
          included in the association tests. The haplotype frequency is
          estimated by the EM algorithm, independently of the trait.

miss.val : vector of values that represent missing alleles. 

sim.control: A list of control parameters that determine how
          simulations are performed for permutation p-values, similar
          to the strategy in haplo.score.  The list is created by the
          function score.sim.control, and the default values of this
          function can be changed as desired.  Permutations are
          performed until a p.threshold accuracy rate is met for the
          three region-based p-values calculated in seqhap. See
          score.sim.control for details.

 control: A list of parameters that control the EM algorithm for
          estimating haplotype frequencies when phase is unknown.  The
          list is created by the function haplo.em.control; see that
          function for more details.
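
     A few of the arguments above can be illustrated with base R
     alone.  The sketch below builds a toy geno matrix in the paired
     column layout, recovers the default mh.threshold from the
     chi-square distribution, and computes an r-squared between two
     loci.  The toy data and the helper count2 are hypothetical, and
     computing r-squared on per-subject allele counts is an assumption
     made for illustration; seqhap's internal computation may differ.

```r
# Hypothetical toy data: 4 subjects, K = 3 SNPs, alleles coded 1/2.
# Each locus occupies a pair of adjacent columns.
K <- 3
geno <- matrix(c(1, 1,  1, 2,  2, 2,
                 1, 2,  1, 2,  1, 2,
                 2, 2,  2, 2,  1, 1,
                 1, 1,  1, 1,  2, 2),
               nrow = 4, byrow = TRUE)
pos <- c(10, 25, 60)                 # one position per locus
stopifnot(ncol(geno) == 2 * K, length(pos) == K)

# Default mh.threshold: 95th percentile of chi-square with 1 df
mh.default <- qchisq(0.95, df = 1)   # approximately 3.84

# r-squared between loci 1 and 2, using per-subject counts of allele 2
# (assumed coding, for illustration only)
count2 <- function(g, k) (g[, 2 * k - 1] == 2) + (g[, 2 * k] == 2)
r2.12 <- cor(count2(geno, 1), count2(geno, 2))^2

# With the default r2.threshold of 0.95, locus 2 is skipped only if TRUE
skip <- r2.12 > 0.95
```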

_V_a_l_u_e:

     list with components:

converge: indicator of convergence of the EM algorithm (see haplo.em);
          1 = converged, 0 = failed

locus.label: vector of labels for loci 

     pos: chromosome positions for loci, same as input. 

   n.sim: number of permutations performed for empirical p-values 

  inlist: matrix that shows which loci are combined for association
          analysis in the sequential scan. The non-zero values of the
          kth row of inlist are the indices of the loci combined when
          scanning locus k.  

chi.stat: chi-square statistics of single-locus analysis. 

chi.p.point: permuted pointwise p-values of single-locus analysis. 

chi.p.region: permuted regional p-value of single-locus analysis. 

hap.stat: chi-square statistics of sequential haplotype analysis. 

  hap.df: degrees of freedom of sequential haplotype analysis. 

hap.p.point: permuted pointwise p-values of sequential haplotype
          analysis. 

hap.p.region: permuted regional p-value of sequential haplotype analysis. 

sum.stat: chi-square statistics of sequential summary analysis. 

  sum.df: degrees of freedom of sequential summary analysis. 

sum.p.point: permuted pointwise p-values of sequential summary
          analysis. 

sum.p.region: permuted regional p-value of sequential summary analysis. 
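
     The inlist component can be read row by row.  The sketch below
     uses a made-up inlist matrix and made-up labels (they are not
     output from seqhap) to show how the non-zero entries of row k map
     to the loci combined when scanning locus k.

```r
# Hypothetical inlist for 4 loci; zeros pad rows to a common width.
inlist <- matrix(c(1, 2, 0,
                   2, 3, 4,
                   3, 0, 0,
                   4, 3, 0),
                 nrow = 4, byrow = TRUE)
locus.label <- c("snp1", "snp2", "snp3", "snp4")

# For each scanned locus k, collect the labels of the loci in row k
combined <- lapply(seq_len(nrow(inlist)), function(k) {
  idx <- inlist[k, inlist[k, ] != 0]   # drop the zero padding
  locus.label[idx]
})
names(combined) <- locus.label
```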

_R_e_f_e_r_e_n_c_e_s:

     Yu Z, Schaid DJ. (2007) Sequential haplotype scan methods for
     association analysis. Genet Epidemiol, in print.

_S_e_e _A_l_s_o:

     'haplo.em', 'print.seqhap', 'plot.seqhap', 'score.sim.control'

_E_x_a_m_p_l_e_s:

     # load example data: column 1 is the response, the rest are genotypes
     setupData(seqhap.dat)
     mydata.y <- seqhap.dat[,1]
     mydata.x <- seqhap.dat[,-1]
     # load locus positions
     setupData(seqhap.pos)
     pos <- seqhap.pos$pos
     # run seqhap with default settings
     myobj <- seqhap(y=mydata.y, geno=mydata.x, pos=pos)
     # print dispatches to the print.seqhap method
     print(myobj)

