Last updated: 2018-09-05
workflowr checks:
✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
✔ Seed: set.seed(20180529)
The command set.seed(20180529) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: ecdacb8
Great! You are using Git for version control. The version displayed above was the version of the Git repository at the time these results were generated. Note that all relevant files for the analysis should be committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.Rhistory
Ignored: docs/.DS_Store
Ignored: docs/figure/
Unstaged changes:
Modified: analysis/_site.yml
Note that any generated files, e.g. HTML, PNG, CSS, are not included in this status report because it is OK for generated content to have uncommitted changes.
File | Version | Author | Date | Message
---|---|---|---|---
Rmd | ecdacb8 | zouyuxin | 2018-09-05 | wflow_publish(c("analysis/VB_theory.Rmd", "analysis/susie_summarySTAT.Rmd", "analysis/SUmofSIngleEffect.Rmd"))
\[ \begin{align*} \mathbf{y} &= \sum_{l=1}^{L} X \boldsymbol{\beta}_{l} + \boldsymbol{\epsilon} \quad \boldsymbol{\epsilon}\sim N_{n}(0, \frac{1}{\tau}I) \\ \boldsymbol{\beta}_{l} &= \boldsymbol{\gamma}_{l} \beta_{l} \\ \boldsymbol{\gamma}_{l} &\sim Multinomial(1,\boldsymbol{\pi}) \\ \beta_{l} &\sim N(0, \frac{1}{\tau_{l}}) \end{align*} \]
The model is a sum of single effect regression (SER) models.
Note that exactly one element of each \(\boldsymbol{\beta}_{l}\) is non-zero, and the different vectors \(\boldsymbol{\beta}_{1}, \cdots, \boldsymbol{\beta}_{L}\) are independent. Thus at most L covariates have non-zero coefficients in this model.
Note that if \(L \ll p\) then the SuSiE model is approximately equivalent to a standard Bayesian multiple regression model in which L randomly-chosen covariates have non-zero coefficients. The only difference is that with some (small) probability some of the \(\boldsymbol{\beta}_{l}\) in the SuSiE model may overlap in which coordinate is non-zero, and so the number of non-zero effects has some (small) probability to be less than L.
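As a quick sanity check on the generative process, here is a small simulation from the model above (a hedged numpy sketch; the values of n, p, L, the precisions, and the uniform \(\boldsymbol{\pi}\) are illustrative choices, not values from the text):

```python
import numpy as np

# Illustrative simulation from the SuSiE model: y = sum_l X beta_l + eps,
# where each beta_l has exactly one non-zero coordinate.
rng = np.random.default_rng(20180529)

n, p, L = 100, 10, 3
tau = 1.0                      # residual precision, so Var(eps_i) = 1/tau
tau_l = 0.5                    # prior precision of each single effect
pi = np.full(p, 1.0 / p)       # uniform prior over which coordinate is active

X = rng.normal(size=(n, p))
B = np.zeros((L, p))
for l in range(L):
    gamma = rng.multinomial(1, pi)                 # one-hot indicator gamma_l
    b = rng.normal(0.0, np.sqrt(1.0 / tau_l))      # effect size beta_l ~ N(0, 1/tau_l)
    B[l] = gamma * b                               # vector beta_l = gamma_l * beta_l
beta = B.sum(axis=0)                               # total effect: sum of L single effects
y = X @ beta + rng.normal(0.0, np.sqrt(1.0 / tau), size=n)

# At most L covariates carry signal (fewer if two effects land on the same coordinate).
assert np.count_nonzero(beta) <= L
```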
The distinctive parameter structure of the SuSiE model leads to a very simple and intuitive fitting procedure. This procedure provides not only point estimates (approximate posterior means) for \(\boldsymbol{\beta}\), but also a variational approximation to the posterior distribution. Crucially, this variational approximation is both easy to interpret and capable of capturing important dependencies among elements of \(\boldsymbol{\beta}\) in the posterior distribution.
When L = 1, this is the single effect regression model, for which the maximized lower bound equals the exact log marginal likelihood: \[ \log p(\mathbf{y}) = L_{1}(q(\boldsymbol{\beta})) = \log \left[\sum_{j=1}^{p} \pi_{j} BF(\mathbf{y}, \mathbf{x}_{j})\right] + \log N(\mathbf{y}; 0, \frac{1}{\tau}I) \]
Moreover, \[ \begin{align*} L_{1}(q(\boldsymbol{\beta}); \mathbf{y}) &= \int q(\boldsymbol{\beta}) \log \frac{p(\boldsymbol{\beta}, \mathbf{y})}{q(\boldsymbol{\beta})} d \boldsymbol{\beta} \\ &= \mathbb{E}_{q}(\log p(\mathbf{y}|\boldsymbol{\beta})) + \mathbb{E}_{q}\left[\log \frac{p(\boldsymbol{\beta})}{q(\boldsymbol{\beta})}\right] \\ &= -\frac{n}{2}\log(2\pi) + \frac{n}{2}\log \tau - \frac{\tau \mathbf{y}^{T}\mathbf{y}}{2} - \frac{\tau}{2}\mathbb{E}_{q}\left[ \boldsymbol{\beta}^{T}X^{T}X\boldsymbol{\beta}\right] + \tau \mathbf{y}^{T}X\mathbb{E}_{q}\left[\boldsymbol{\beta}\right] + \mathbb{E}_{q}\left[\log \frac{p(\boldsymbol{\beta})}{q(\boldsymbol{\beta})}\right] \\ &= C_{1} - \frac{\tau}{2}\mathbb{E}_{q}\left[ \boldsymbol{\beta}^{T}X^{T}X\boldsymbol{\beta}\right] + \tau \mathbf{y}^{T}X\mathbb{E}_{q}\left[\boldsymbol{\beta}\right] + \mathbb{E}_{q}\left[\log \frac{p(\boldsymbol{\beta})}{q(\boldsymbol{\beta})}\right] \end{align*} \] which can be maximized by setting q to be the posterior.
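The L = 1 case can be made concrete in a few lines: the exact SER posterior has the familiar form \(q(\gamma_{j}=1) = \alpha_{j}\) with \(\beta|\gamma_{j}=1 \sim N(\mu_{j}, s_{j}^{2})\) (the same quantities derived at the end of this note). A hedged numpy sketch on simulated data (all names and hyperparameter values are illustrative):

```python
import numpy as np

# Illustrative exact posterior for the single effect regression (L = 1).
rng = np.random.default_rng(1)
n, p = 200, 5
tau, tau_l = 1.0, 1.0
pi = np.full(p, 1.0 / p)

X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[2] = 1.5
y = X @ beta_true + rng.normal(size=n)

d = np.einsum('ij,ij->j', X, X)         # diagonal of X^T X
s2 = 1.0 / (tau_l + tau * d)            # posterior variance given gamma_j = 1
mu = s2 * tau * (X.T @ y)               # posterior mean given gamma_j = 1
# log Bayes factor per covariate: log[ sqrt(tau_l * s2) * exp(mu^2 / (2 s2)) ]
logBF = 0.5 * np.log(tau_l * s2) + 0.5 * mu**2 / s2
w = np.log(pi) + logBF
alpha = np.exp(w - w.max())             # stabilized softmax over covariates
alpha /= alpha.sum()                    # posterior inclusion probabilities
```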
When L = 2, \[ \mathbf{y} = X \boldsymbol{\beta}_{1} + X \boldsymbol{\beta}_{2} +\boldsymbol{\epsilon} \]
\[ L_{2}(q(\boldsymbol{\beta}); \mathbf{y}) = \mathbb{E}_{q}(\log p(\mathbf{y}|\boldsymbol{\beta}_{1}, \boldsymbol{\beta}_{2})) + \mathbb{E}_{q_{1}}\left[\log \frac{p(\boldsymbol{\beta}_{1})}{q_{1}(\boldsymbol{\beta}_{1})}\right] + \mathbb{E}_{q_{2}}\left[\log \frac{p(\boldsymbol{\beta}_{2})}{q_{2}(\boldsymbol{\beta}_{2})}\right] \]
We first condition on the first effect, treating \(q_{1}\) as fixed, and maximize over \(q_{2}\): \[ \begin{align*} L_{2}(q_{2}(\boldsymbol{\beta}); \mathbf{y}) &= -\frac{n}{2}\log(2\pi) + \frac{n}{2}\log \tau - \frac{\tau \mathbf{y}_{2}^{T}\mathbf{y}_{2}}{2} + \mathbb{E}_{q_{1}}\left[\log \frac{p(\boldsymbol{\beta}_{1})}{q_{1}(\boldsymbol{\beta}_{1})}\right] - \frac{\tau}{2}\mathbb{E}_{q_{2}}\left[ \boldsymbol{\beta}_{2}^{T}X^{T}X\boldsymbol{\beta}_{2}\right] + \tau \mathbf{y}_{2}^{T}X\mathbb{E}_{q_{2}}\left[\boldsymbol{\beta}_{2}\right] + \mathbb{E}_{q_{2}}\left[\log \frac{p(\boldsymbol{\beta}_{2})}{q_{2}(\boldsymbol{\beta}_{2})}\right] \\ &= C_{2} - \frac{\tau}{2}\mathbb{E}_{q_{2}}\left[ \boldsymbol{\beta}_{2}^{T}X^{T}X\boldsymbol{\beta}_{2}\right] + \tau \mathbf{y}_{2}^{T}X\mathbb{E}_{q_{2}}\left[\boldsymbol{\beta}_{2}\right] + \mathbb{E}_{q_{2}}\left[\log \frac{p(\boldsymbol{\beta}_{2})}{q_{2}(\boldsymbol{\beta}_{2})}\right] \end{align*} \]
where \(\mathbf{y}_{2} = \mathbf{y} - X\mathbb{E}_{q_{1}}\left[\boldsymbol{\beta}_{1}\right]\) is the expected residual after removing the first effect. Therefore
\[ L_{2}(q_{2}(\boldsymbol{\beta}); \mathbf{y}) \propto L_{1}(q_{2}(\boldsymbol{\beta}); \mathbf{y}_{2}) \] which can be maximized by setting \(q_{2}\) to be the posterior.
In general, the lower bound is \[ L(q(\boldsymbol{\beta})) = -\frac{n}{2}\log(2\pi) + \frac{n}{2}\log \tau - \mathbb{E}_{q}\left[\frac{\tau}{2}\|\mathbf{y} - \sum_{l=1}^{L}X\boldsymbol{\beta}_{l}\|^{2}\right] + \sum_{l=1}^{L} \mathbb{E}_{q_{l}}\left[\log \frac{p(\boldsymbol{\beta}_{l})}{q_{l}(\boldsymbol{\beta}_{l})}\right] \]
For each of the L single effect regression models, with effect l fitted to the residual \(\mathbf{y}_{resid,-l} = \mathbf{y} - \sum_{l'\neq l} X\mathbb{E}_{q_{l'}}\left[\boldsymbol{\beta}_{l'}\right]\), \[ \begin{align*} \log p(\mathbf{y}_{resid,-l}) &= L_{l}(q_{l}(\boldsymbol{\beta})) = -\frac{n}{2}\log(2\pi) + \frac{n}{2}\log \tau - \mathbb{E}_{q_{l}}\left[\frac{\tau}{2}\|\mathbf{y}_{resid,-l} - X\boldsymbol{\beta}_{l}\|^{2}\right] + \mathbb{E}_{q_{l}}\left[\log \frac{p(\boldsymbol{\beta}_{l})}{q_{l}(\boldsymbol{\beta}_{l})}\right] \\ \mathbb{E}_{q_{l}}\left[\log \frac{p(\boldsymbol{\beta}_{l})}{q_{l}(\boldsymbol{\beta}_{l})}\right] &= \log p(\mathbf{y}_{resid,-l}) + \frac{n}{2}\log(2\pi) - \frac{n}{2}\log \tau + \mathbb{E}_{q_{l}}\left[\frac{\tau}{2}\|\mathbf{y}_{resid,-l} - X\boldsymbol{\beta}_{l}\|^{2}\right] \end{align*} \]
\[ \mathbb{E}_{q}\left[\|\mathbf{y} - X\boldsymbol{\beta}\|^{2}\right] = \mathbf{y}^{T}\mathbf{y} - 2\mathbf{y}^{T}X\mathbb{E}_{q}\left[\boldsymbol{\beta}\right] + \mathbb{E}_{q}\left[\boldsymbol{\beta}^{T}X^{T}X\boldsymbol{\beta} \right] = \mathbf{y}^{T}\mathbf{y} - 2\mathbf{y}^{T}X\mathbb{E}_{q}\left[\boldsymbol{\beta}\right] + \sum_{j=1}^{p}(X^{T}X)_{jj}\mathbb{E}_{q}\left[\beta_{j}^{2}\right] \] where the last equality uses the fact that exactly one element of \(\boldsymbol{\beta}\) is non-zero, so \(\mathbb{E}_{q}\left[\boldsymbol{\beta}\boldsymbol{\beta}^{T}\right]\) is diagonal. Let \(\mathbf{r}_{l} = \mathbb{E}_{q}(\boldsymbol{\beta}_{l})\), \(\mathbf{r} = \sum_{l=1}^{L}\mathbf{r}_{l}\)
\[ \begin{align*} \mathbb{E}_{q}\left( (\sum_{l=1}^{L}X\boldsymbol{\beta}_{l})^{T}(\sum_{l=1}^{L}X\boldsymbol{\beta}_{l}) \right) &= \sum_{l,l'}\mathbb{E}(\boldsymbol{\beta}_{l}^{T}X^{T}X\boldsymbol{\beta}_{l'}) \\ &= \sum_{l,l'} \mathbf{r}_{l}^{T}X^{T}X\mathbf{r}_{l'} - \sum_{l}\mathbf{r}_{l}^{T}X^{T}X\mathbf{r}_{l} + \sum_{l}\mathbb{E}(\boldsymbol{\beta}_{l}^{T}X^{T}X\boldsymbol{\beta}_{l}) \\ &= \|X\mathbf{r}\|^2 - \sum_{l=1}^{L}\|X\mathbf{r}_{l}\|^{2} + \sum_{l}\mathbb{E}(\boldsymbol{\beta}_{l}^{T}X^{T}X\boldsymbol{\beta}_{l}) \\ &= \|X\mathbf{r}\|^2 - \sum_{l=1}^{L}\|X\mathbf{r}_{l}\|^{2} + \sum_{l}\sum_{j=1}^{p}(X^{T}X)_{jj}\mathbb{E}(\beta_{lj}^2)\\ \end{align*} \]
\[ \begin{align*} \mathbb{E}_{q}\left[\|\mathbf{y} - \sum_{l=1}^{L}X\boldsymbol{\beta}_{l}\|^{2}\right] &= \mathbf{y}^{T}\mathbf{y} - 2\mathbf{y}^{T}X\sum_{l=1}^{L}\mathbb{E}_{q}(\boldsymbol{\beta}_{l}) + \mathbb{E}_{q}\left( (\sum_{l=1}^{L}X\boldsymbol{\beta}_{l})^{T}(\sum_{l=1}^{L}X\boldsymbol{\beta}_{l}) \right) \\ &= \mathbf{y}^{T}\mathbf{y} - 2\mathbf{y}^{T}X\mathbf{r} + \|X\mathbf{r}\|^2 - \sum_{l=1}^{L}\|X\mathbf{r}_{l}\|^{2} + \sum_{l}\sum_{j=1}^{p}(X^{T}X)_{jj}\mathbb{E}(\beta_{lj}^2) \\ &= \|\mathbf{y} - X\mathbf{r}\|^{2} - \sum_{l=1}^{L}\|X\mathbf{r}_{l}\|^{2} + \sum_{l}\sum_{j=1}^{p}(X^{T}X)_{jj}\mathbb{E}(\beta_{lj}^2) \end{align*} \]
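The expected residual sum of squares above depends on q only through \(\mathbb{E}_{q}[\beta_{lj}] = \alpha_{lj}\mu_{lj}\) and \(\mathbb{E}_{q}[\beta_{lj}^{2}] = \alpha_{lj}(\mu_{lj}^{2} + s_{lj}^{2})\), so it can be evaluated directly from the variational parameters. A hedged numpy sketch (the data and the values of \(\alpha\), \(\mu\), \(s^{2}\) are arbitrary, chosen only to exercise the formula):

```python
import numpy as np

# Illustrative evaluation of E_q || y - sum_l X beta_l ||^2 from variational parameters.
rng = np.random.default_rng(0)
n, p, L = 50, 4, 2
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

alpha = rng.dirichlet(np.ones(p), size=L)   # q(gamma_lj = 1), rows sum to 1
mu = rng.normal(size=(L, p))                # posterior means given inclusion
s2 = np.full((L, p), 0.1)                   # posterior variances given inclusion

Eb = alpha * mu                             # r_l = E_q[beta_l], shape (L, p)
Eb2 = alpha * (mu**2 + s2)                  # E_q[beta_lj^2]
r = Eb.sum(axis=0)                          # r = sum_l r_l

d = np.einsum('ij,ij->j', X, X)             # (X^T X)_{jj}
erss = (np.sum((y - X @ r)**2)
        - sum(np.sum((X @ Eb[l])**2) for l in range(L))
        + np.sum(d * Eb2))

# erss is an expectation of a squared norm, so it must come out positive.
assert erss > 0
```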
\[ L(q(\boldsymbol{\beta})) = -\frac{n}{2}\log(2\pi) + \frac{n}{2}\log \tau - \frac{\tau}{2}\left[\|\mathbf{y} - X\mathbf{r}\|^{2} - \sum_{l=1}^{L}\|X\mathbf{r}_{l}\|^{2} + \sum_{l=1}^{L}\sum_{j=1}^{p}(X^{T}X)_{jj}\mathbb{E}(\beta_{lj}^2)\right] + \sum_{l=1}^{L} \left[ \log p(\mathbf{y}_{resid,-l}) + \frac{n}{2}\log(2\pi) - \frac{n}{2}\log \tau + \frac{\tau}{2}\left(\mathbf{y}_{resid,-l}^{T}\mathbf{y}_{resid,-l} - 2\mathbf{y}_{resid,-l}^{T}X\mathbb{E}_{q_{l}}\left[\boldsymbol{\beta}_{l}\right] + \sum_{j=1}^{p}(X^{T}X)_{jj}\mathbb{E}_{q_{l}}\left[\beta_{lj}^{2}\right] \right) \right] \]
Updates in each iteration:
\[ \beta_{lj}|\gamma_{lj}=1 \sim_{q_{l}} N(\mu_{lj}, s_{lj}^{2}) \quad q_{l}(\gamma_{lj} = 1) = \alpha_{lj} \]
\[ s_{lj}^{2} = \left(\tau_{l} + \tau (X^{T}X)_{jj}\right)^{-1} \]
\[ \mu_{lj} = \frac{\tau(X^{T}\mathbf{y}_{res,-l})_{j}}{\tau_{l} + \tau (X^{T}X)_{jj}} = s_{lj}^{2}\tau (X^{T}\mathbf{y}_{res,-l})_{j} = s_{lj}^{2}\tau [(X^{T}\mathbf{y})_{j} - \sum_{k\neq j}(X^{T}X)_{kj}\alpha_{lk}\mu_{lk}] \]
\[ \alpha_{lj} \propto \pi_{j} N(\frac{(X^{T}\mathbf{y}_{res,-l})_{j}}{(X^{T}X)_{jj}}; 0, \frac{1}{\tau(X^{T}X)_{jj}} + \frac{1}{\tau_{l}} ) \]
\[ \frac{\alpha_{lj}}{1-\alpha_{lj}} = \frac{\pi_{j}}{1-\pi_{j}} \sqrt{\tau_{l}s_{lj}^2} \exp(\frac{\mu_{lj}^{2}}{2s_{lj}^{2}}) = \frac{\pi_{j}}{1-\pi_{j}} BF(\mathbf{y}, \mathbf{x}_{j}) \]
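Putting these updates together, one coordinate-ascent pass over the L effects can be sketched as follows (an illustrative numpy sketch of the updates above, not the reference susieR implementation; the data and hyperparameters are made up):

```python
import numpy as np

# Illustrative coordinate ascent: update each single effect l against the
# residual y_{res,-l}, using the s^2, mu, alpha formulas derived above.
rng = np.random.default_rng(2)
n, p, L = 150, 8, 2
tau, tau_l = 1.0, 1.0
pi = np.full(p, 1.0 / p)

X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[[1, 5]] = [2.0, -2.0]
y = X @ beta_true + rng.normal(size=n)

d = np.einsum('ij,ij->j', X, X)              # (X^T X)_{jj}
alpha = np.full((L, p), 1.0 / p)             # start from the prior
mu = np.zeros((L, p))

for _ in range(10):                          # a few full sweeps
    for l in range(L):
        # residual leaving out effect l: y - X * sum_{l' != l} E_q[beta_{l'}]
        r_other = (alpha * mu).sum(axis=0) - alpha[l] * mu[l]
        y_res = y - X @ r_other
        s2 = 1.0 / (tau_l + tau * d)         # posterior variance given gamma_lj = 1
        mu[l] = s2 * tau * (X.T @ y_res)     # posterior mean given gamma_lj = 1
        logBF = 0.5 * np.log(tau_l * s2) + 0.5 * mu[l]**2 / s2
        w = np.log(pi) + logBF
        alpha[l] = np.exp(w - w.max())       # stabilized softmax
        alpha[l] /= alpha[l].sum()

# each q_l(gamma_l) remains a valid distribution over the p covariates
assert np.allclose(alpha.sum(axis=1), 1.0)
```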
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] workflowr_1.1.1 Rcpp_0.12.18 digest_0.6.15
[4] rprojroot_1.3-2 R.methodsS3_1.7.1 backports_1.1.2
[7] git2r_0.23.0 magrittr_1.5 evaluate_0.11
[10] stringi_1.2.4 whisker_0.3-2 R.oo_1.22.0
[13] R.utils_2.6.0 rmarkdown_1.10 tools_3.5.1
[16] stringr_1.3.1 yaml_2.2.0 compiler_3.5.1
[19] htmltools_0.3.6 knitr_1.20
This reproducible R Markdown analysis was created with workflowr 1.1.1