Processing math: 52%
  • Model
  • Priors for (β,μ,τ)
  • Inference

Last updated: 2020-02-03

Checks: 2 0

Knit directory: Note/

This reproducible R Markdown analysis was created with workflowr (version 1.5.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.Rhistory

Untracked files:
    Untracked:  analysis/pi-MASS.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd f4e42db zouyuxin 2020-02-03 wflow_publish(“analysis/Bim-Bam.Rmd”)
html e4bec1a zouyuxin 2020-02-03 Build site.
Rmd c595afe zouyuxin 2020-02-03 wflow_publish(“analysis/Bim-Bam.Rmd”)

Servin and Stephens (2007) intriduces single effect model. The model includes intercept and they give an improper prior on it.

Model

The model is yi=μ+jxijβj+ϵi,ϵiN(0,1τ) yi is the phenotype measurement for individual i, μ is the phenotype mean of individuals carrying the ‘reference’ genotype. They assume the three possible genotypes at each SNP (major allele homozygote, heterozygote, and minor allele homozygote) have effects 0, a+ak and 2a, respectively. They achieve this by including two columns in the design matrix for each SNP, one column being the genotypes (coded as 0, 1, or 2 copies of the minor allele), and the other being indicators (0 or 1) for whether the genotype is heterozygous. The effect of SNP j is then determined by a pair of regression coefficients (βj1,βj2), which are, respectively, the SNP additive effect aj and dominance effect dj=ajkj.

Priors for (β,μ,τ)

  1. Inference should not depend on the units in which the phenotype is measured.
  2. Even if the phenotype is affected by SNPs in this region, the majority of SNPs will likely not be causal

Phenotype mean and variance

D1: Jeffrey’s prior P(μ,τ)1/τ

D2: τΓ(κ/2,λ/2)μ|τN(0,σ2μ/τ)

Prior on SNP effect

The prior on the SNP effects has two components: a prior on which SNPs are QTNs (quantitative trait nucleotides) and a prior on the QTN effect sizes.

They assume with some probability, p0, none of the SNPs is a QTN. With probability (1p0), they assume there are l QTNs, where l has some distribution p(l) on {1,2,,ns}, where ns denotes the number of SNPs in the region.

If SNP j is a QTN, then its effect is modeled by two parameters, aj and dj=ajkj. The prior on aj is N(0,σ2a/τ).

D1: kjN(0,σ2k)

D2: kj is not independent of aj djN(0,σ2d/τ),σd=0.5σa

Assuming there is a single causative SNP in the model, with prior D2, the posterior is τ|y,GΓ(n+κ2,12(y

B = \Omega X^\intercal y, \Omega = (\nu^{-1} + X^\intercal X)^{-1}, \nu = diag(\sigma_{\mu}^2, \sigma_{a}^2, \sigma_d^2)

As \sigma_{\mu} \rightarrow \infty, \lambda \rightarrow 0,

  1. The posterior mean for \beta changes appropriately with shifts in y. That is, adding c to each element of y will add c to the first element of B. \Omega^{-1}(B + (c, 0, 0)^\intercal) = X^\intercal (y + c 1).

  2. The posterior for \tau is invariant to shifts in y. y − XB does not change with shifts in y, and y − XB is orthogonal to 1.

  3. The posterior for \tau scales appropriately with y.

  4. The posterior for \beta changes appropriately with shifts and scaling of y.

P(y|\tau, G) = \frac{P(y|\beta, \tau, G)P(\beta|\tau)}{P(\beta|y, \tau, G)} = (2\pi)^{-n/2} \tau^{n/2} \frac{|\Omega|^{1/2}}{|\nu|^{1/2}}\exp\left[-\frac{\tau}{2} (y^\intercal y - B^\intercal \Omega^{-1} B)\right] P(y | G) = \int_{0}^{\infty} P(y | \tau, G) P(\tau) d \tau = (2\pi)^{-n/2} \frac{|\Omega|^{1/2}}{|\nu|^{1/2}} \int_0^{\infty} \tau^{(n + \kappa)/2 - 1} \exp\left[-\frac{\tau}{2} (y^\intercal y - B^\intercal \Omega^{-1} B + \lambda)\right] d \tau \\ = (2\pi)^{-n/2} \frac{|\Omega|^{1/2}}{|\nu|^{1/2}} (\frac{\lambda}{2})^{\kappa/2} \frac{\Gamma((n+\kappa)/2)}{\Gamma(\kappa/2)} \left(\frac{y^\intercal y - B^\intercal \Omega^{-1} B + \lambda}{2}\right)^{-(n+\kappa)/2}

Under the null, we can set \nu = \sigma_{\mu}^2. P_0(y|G) = (2\pi)^{-n/2} \frac{\Omega_0^{1/2}}{\sigma_{\mu}} (\frac{\lambda}{2})^{\kappa/2} \frac{\Gamma((n+\kappa)/2)}{\Gamma(\kappa/2)} \left(\frac{y^\intercal y - \Omega_0n^2\bar{y}^2 + \lambda}{2}\right)^{-(n+\kappa)/2} where \Omega_0 = (\frac{1}{\sigma_{\mu}^2} + n)^{-1} The BF is BF = \frac{|\Omega|^{1/2}}{\Omega_0^{1/2}} \frac{1}{\sigma_a \sigma_d} \left[ \frac{y^\intercal y - B^\intercal \Omega^{-1} B + \lambda}{y^\intercal y - \Omega_0n^2\bar{y}^2 + \lambda} \right]^{-(n+\kappa)/2}

Taking limit, \lim_{\lambda \rightarrow 0 \ \ \kappa \rightarrow 0} BF = \frac{|\Omega|^{1/2}}{\Omega_0^{1/2}} \frac{1}{\sigma_a \sigma_d} \left[ \frac{y^\intercal y - B^\intercal \Omega^{-1} B}{y^\intercal y - \Omega_0n^2\bar{y}^2} \right]^{-n/2} Finally, taking limit when \sigma_{\mu} \rightarrow \infty.

Inference

Detecting association \text{BF} = \frac{P(y | G, H_1)}{P(y | G, H_0)} H_0: a_j = d_j = 0 \quad \forall j. When we allow at most one QTN, BF becomes \text{BF} = \frac{1}{n_s} \sum_{j=1}^{n_s} \frac{P(y | G, H_j)}{P(y | G, H_0)}

Explaining and interpreting associations

To ‘explain’ observed associations we compute posterior distributions for SNP effects (a_j + d_j and 2a_j for the heterozygote and minor-allele homozygote, respectively), with particular focus on the posterior probability that each SNP is a QTN P(a_j \neq 0).

Servin, Bertrand, and Matthew Stephens. 2007. “Imputation-Based Analysis of Association Studies: Candidate Regions and Quantitative Traits.” PLoS Genetics 3 (7). Public Library of Science.