Last updated: 2020-02-03

Checks: 2 0

Knit directory: Note/

This reproducible R Markdown analysis was created with workflowr (version 1.5.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.Rhistory

Untracked files:
    Untracked:  analysis/pi-MASS.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File Version Author Date Message
Rmd f4e42db zouyuxin 2020-02-03 wflow_publish(“analysis/Bim-Bam.Rmd”)
html e4bec1a zouyuxin 2020-02-03 Build site.
Rmd c595afe zouyuxin 2020-02-03 wflow_publish(“analysis/Bim-Bam.Rmd”)

Servin and Stephens (2007) intriduces single effect model. The model includes intercept and they give an improper prior on it.

Model

The model is \[ y_i = \mu + \sum_j x_{ij} \beta_{j} + \epsilon_i, \quad \epsilon_i \sim N(0, \frac{1}{\tau}) \] \(y_i\) is the phenotype measurement for individual \(i\), \(\mu\) is the phenotype mean of individuals carrying the ‘reference’ genotype. They assume the three possible genotypes at each SNP (major allele homozygote, heterozygote, and minor allele homozygote) have effects 0, \(a + ak\) and \(2a\), respectively. They achieve this by including two columns in the design matrix for each SNP, one column being the genotypes (coded as 0, 1, or 2 copies of the minor allele), and the other being indicators (0 or 1) for whether the genotype is heterozygous. The effect of SNP \(j\) is then determined by a pair of regression coefficients (\(\beta_{j1}, \beta_{j2}\)), which are, respectively, the SNP additive effect \(a_j\) and dominance effect \(d_j = a_j k_j\).

Priors for (\(\beta, \mu, \tau\))

  1. Inference should not depend on the units in which the phenotype is measured.
  2. Even if the phenotype is affected by SNPs in this region, the majority of SNPs will likely not be causal

Phenotype mean and variance

\(D_1\): Jeffrey’s prior \[ P(\mu, \tau) \propto 1/\tau \]

\(D_2\): \[\begin{align*} \tau &\sim \Gamma(\kappa/2, \lambda/2) \\ \mu | \tau &\sim N(0, \sigma_{\mu}^2/\tau) \end{align*}\]

Prior on SNP effect

The prior on the SNP effects has two components: a prior on which SNPs are QTNs (quantitative trait nucleotides) and a prior on the QTN effect sizes.

They assume with some probability, \(p_0\), none of the SNPs is a QTN. With probability \((1 - p_0)\), they assume there are \(l\) QTNs, where \(l\) has some distribution \(p(l)\) on \(\{1, 2, \cdots, n_s\}\), where \(n_s\) denotes the number of SNPs in the region.

If SNP j is a QTN, then its effect is modeled by two parameters, \(a_j\) and \(d_j = a_j k_j\). The prior on \(a_j\) is \(N(0, \sigma_a^2/\tau)\).

\(D_1\): \[ k_j \sim N(0, \sigma_k^2) \]

\(D_2\): \(k_j\) is not independent of \(a_j\) \[ d_j \sim N(0, \sigma_d^2/\tau), \sigma_d = 0.5 \sigma_a \]

Assuming there is a single causative SNP in the model, with prior \(D_2\), the posterior is \[\begin{align*} \tau | y, G &\sim \Gamma(\frac{n+\kappa}{2}, \frac{1}{2}(y^\intercal y - B^\intercal \Omega^{-1} B + \lambda)) \\ \beta | \tau, y, G &\sim N(B, \Omega/\tau) \end{align*}\]

\(B = \Omega X^\intercal y\), \(\Omega = (\nu^{-1} + X^\intercal X)^{-1}\), \(\nu = diag(\sigma_{\mu}^2, \sigma_{a}^2, \sigma_d^2)\)

As \(\sigma_{\mu} \rightarrow \infty\), \(\lambda \rightarrow 0\),

  1. The posterior mean for \(\beta\) changes appropriately with shifts in y. That is, adding \(c\) to each element of \(y\) will add \(c\) to the first element of \(B\). \(\Omega^{-1}(B + (c, 0, 0)^\intercal) = X^\intercal (y + c 1)\).

  2. The posterior for \(\tau\) is invariant to shifts in y. \(y − XB\) does not change with shifts in y, and \(y − XB\) is orthogonal to 1.

  3. The posterior for \(\tau\) scales appropriately with y.

  4. The posterior for \(\beta\) changes appropriately with shifts and scaling of y.

\[ P(y|\tau, G) = \frac{P(y|\beta, \tau, G)P(\beta|\tau)}{P(\beta|y, \tau, G)} = (2\pi)^{-n/2} \tau^{n/2} \frac{|\Omega|^{1/2}}{|\nu|^{1/2}}\exp\left[-\frac{\tau}{2} (y^\intercal y - B^\intercal \Omega^{-1} B)\right] \] \[ P(y | G) = \int_{0}^{\infty} P(y | \tau, G) P(\tau) d \tau = (2\pi)^{-n/2} \frac{|\Omega|^{1/2}}{|\nu|^{1/2}} \int_0^{\infty} \tau^{(n + \kappa)/2 - 1} \exp\left[-\frac{\tau}{2} (y^\intercal y - B^\intercal \Omega^{-1} B + \lambda)\right] d \tau \\ = (2\pi)^{-n/2} \frac{|\Omega|^{1/2}}{|\nu|^{1/2}} (\frac{\lambda}{2})^{\kappa/2} \frac{\Gamma((n+\kappa)/2)}{\Gamma(\kappa/2)} \left(\frac{y^\intercal y - B^\intercal \Omega^{-1} B + \lambda}{2}\right)^{-(n+\kappa)/2} \]

Under the null, we can set \(\nu = \sigma_{\mu}^2\). \[ P_0(y|G) = (2\pi)^{-n/2} \frac{\Omega_0^{1/2}}{\sigma_{\mu}} (\frac{\lambda}{2})^{\kappa/2} \frac{\Gamma((n+\kappa)/2)}{\Gamma(\kappa/2)} \left(\frac{y^\intercal y - \Omega_0n^2\bar{y}^2 + \lambda}{2}\right)^{-(n+\kappa)/2} \] where \(\Omega_0 = (\frac{1}{\sigma_{\mu}^2} + n)^{-1}\) The BF is \[ BF = \frac{|\Omega|^{1/2}}{\Omega_0^{1/2}} \frac{1}{\sigma_a \sigma_d} \left[ \frac{y^\intercal y - B^\intercal \Omega^{-1} B + \lambda}{y^\intercal y - \Omega_0n^2\bar{y}^2 + \lambda} \right]^{-(n+\kappa)/2} \]

Taking limit, \[ \lim_{\lambda \rightarrow 0 \ \ \kappa \rightarrow 0} BF = \frac{|\Omega|^{1/2}}{\Omega_0^{1/2}} \frac{1}{\sigma_a \sigma_d} \left[ \frac{y^\intercal y - B^\intercal \Omega^{-1} B}{y^\intercal y - \Omega_0n^2\bar{y}^2} \right]^{-n/2} \] Finally, taking limit when \(\sigma_{\mu} \rightarrow \infty\).

Inference

Detecting association \[ \text{BF} = \frac{P(y | G, H_1)}{P(y | G, H_0)} \] \(H_0: a_j = d_j = 0 \quad \forall j\). When we allow at most one QTN, BF becomes \[ \text{BF} = \frac{1}{n_s} \sum_{j=1}^{n_s} \frac{P(y | G, H_j)}{P(y | G, H_0)} \]

Explaining and interpreting associations

To ‘explain’ observed associations we compute posterior distributions for SNP effects (\(a_j + d_j\) and \(2a_j\) for the heterozygote and minor-allele homozygote, respectively), with particular focus on the posterior probability that each SNP is a QTN \(P(a_j \neq 0)\).

Servin, Bertrand, and Matthew Stephens. 2007. “Imputation-Based Analysis of Association Studies: Candidate Regions and Quantitative Traits.” PLoS Genetics 3 (7). Public Library of Science.