Last updated: 2018-10-12
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(20180529)
The command set.seed(20180529)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: af6e01c
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.Rhistory
Ignored: docs/.DS_Store
Ignored: docs/figure/Test.Rmd/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | af6e01c | zouyuxin | 2018-10-12 | wflow_publish(c(“analysis/susie_z_imple.Rmd”, “analysis/susie_bhat_imple.Rmd”)) |
html | ca79ff3 | zouyuxin | 2018-10-10 | Build site. |
Rmd | be44281 | zouyuxin | 2018-10-10 | wflow_publish(“analysis/susie_z_imple.Rmd”) |
html | a990c1f | zouyuxin | 2018-10-08 | Build site. |
html | 3e79af1 | zouyuxin | 2018-10-08 | Build site. |
Rmd | cc8bca3 | zouyuxin | 2018-10-08 | wflow_publish(“analysis/susie_z_imple.Rmd”) |
We illustrate the methods in SER model. It’s straight forward to implement in the general susie
model.
\[ \begin{align*} \mathbf{y}_{c} &= X_{c} \boldsymbol{\beta} + \boldsymbol{\epsilon} \quad \boldsymbol{\epsilon}\sim N_{n}(0, \frac{1}{\tau}I) \\ \boldsymbol{\beta} &= \boldsymbol{\gamma} \beta \\ \boldsymbol{\gamma} &\sim Multinomial(1,\boldsymbol{\pi}) \\ \beta &\sim N(0, \frac{1}{\tau_{0}}) \end{align*} \] where \(\mathbf{y}_{c}\) and \(X_{c}\) are centered. By default we set \(\frac{1}{\tau} = var(\mathbf{y})\), \(\frac{1}{\tau_{0}} = \sigma_{\beta}^{2}var(\mathbf{y})\).
We use \(\mathbf{y}\) and \(X\) to denote the original data. \(\mathbf{y}_c\) and \(X_c\) denote the centered data. \(\mathbf{y}_{cs}\) and \(X_{cs}\) denote the centered and scaled data.
Data:
library(susieR)
set.seed(1)
n = 800
p = 1000
beta = rep(0,p)
beta[1] = 1
beta[50] = 1
beta[300] = 1
beta[400] = 1
X = matrix(rnorm(n*p),nrow=n,ncol=p)
y = c(X %*% beta + rnorm(n))
X.cs = susieR:::safe_colScale(X, center=TRUE, scale=TRUE)
X.c = susieR:::safe_colScale(X, center=TRUE, scale=FALSE)
# z scores
# summary stat
bhat = numeric(p)
se = numeric(p)
sigmahat = numeric(p)
zhat = numeric(p)
for(j in 1:p){
m = summary(lm(y~X[,j]))
bhat[j] = m$coefficients[2,1]
se[j] = m$coefficients[2,2]
sigmahat[j] = m$sigma
zhat[j] = abs(qnorm(m$coefficients[2,4]/2)) * sign(m$coefficients[2,1])
}
R = cor(X)
With z scores, we don’t care whether the summary statisitcs are from the scaled or unscaled \(X\), \(y\).
\[ \hat{z}_{j} = \frac{\hat{\beta}_{j}}{\hat{s}_{j}} = \frac{\mathbf{x}_{cj}^{T}\mathbf{y}_{c}}{\hat{\sigma}_{j}\sqrt{\mathbf{x}_{cj}^{T}\mathbf{x}_{cj}}} = \frac{\mathbf{x}_{csj}^{T}\frac{1}{\alpha_y}\mathbf{y}_{c}}{\frac{1}{\alpha_y}\hat{\sigma}_{j}\sqrt{\mathbf{x}_{csj}^{T}\mathbf{x}_{csj}}} \]
\(\mathbf{x}_{.j}\) could be \(\mathbf{x}_{cj}\) or \(\mathbf{x}_{csj}\)
As \(n \rightarrow \infty\) \[ \hat{\sigma}_{j}^{2} \rightarrow \sigma_{j}^{2} \quad \hat{z}_{j}\rightarrow z_{j} = \frac{\mathbf{x}_{.j}^{T}\mathbf{y}_{c}}{\sigma_{j}\sqrt{\mathbf{x}_{.j}^{T}\mathbf{x}_{.j}}} \]
We know \(z_{j}\) for \(j = 1,\cdots,p\). If we set \(\sigma_{j}^{2} = c\), \(\forall j\), we are conservative when making inference of effect.
Assume \[ z_{j} = \frac{\mathbf{x}_{.j}^{T}\mathbf{y}_{c}}{\sqrt{c}\sqrt{\mathbf{x}_{.j}^{T}\mathbf{x}_{.j}}} \]
Let \(XtX = (n-1)R\). Since we know R, \(\mathbf{x}_{.j}\) is \(\mathbf{x}_{csj}\)
\[ X_{cs}^{T}\mathbf{y}_{c} = \sqrt{c} \sqrt{diag(\mathbf{x}_{.j}^{T}\mathbf{x}_{csj})}\mathbf{z} \\ \frac{1}{\sqrt{n-1}}X_{cs}^{T}\frac{1}{\sqrt{c}}\mathbf{y}_{c} = \mathbf{z} \]
This is equivalent to \[ \begin{align*} \frac{1}{\sqrt{c}}\mathbf{y}_{c} &= \frac{1}{\sqrt{n-1}}X_{cs} \boldsymbol{\beta} + \boldsymbol{\epsilon} \quad \boldsymbol{\epsilon}\sim N_{n}(0, I) \\ \boldsymbol{\beta} &= \boldsymbol{\gamma} \beta \quad \boldsymbol{\gamma} \sim Multinomial(1,\boldsymbol{\pi}) \quad \beta \sim N(0, (n-1)\sigma_{\beta}^2) \end{align*} \] We have no information about n, so we estimate the prior variance in susie.
Fit susie model using individual level data:
res0 = susieR::susie(X.cs/sqrt(n-1), (y-mean(y))/sd(y),
estimate_residual_variance = FALSE,
estimate_prior_variance = TRUE,
intercept = FALSE, standardize = TRUE,
max_iter = 2)
Fit susie model using summary statistics with z and R:
res1 = susie_z(z = zhat, R = crossprod(X.c),
max_iter = 2)
Compare the fitted results:
all.equal(res0$alpha, res1$alpha)
[1] "Mean relative difference: 5.479774e-06"
all.equal(res0$pip, res1$pip)
[1] "Mean relative difference: 5.479774e-06"
{plot(coef(res0), coef(res1))
abline(0,1)}
Version | Author | Date |
---|---|---|
ca79ff3 | zouyuxin | 2018-10-10 |
a990c1f | zouyuxin | 2018-10-08 |
Fit susie model using summary statistics with z and covariance matrix:
res2 = susie_z(z = zhat, R = cov(X),
max_iter = 2)
The output is same as the one with R.
all.equal(res1, res2)
[1] TRUE
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] susieR_0.4.30.0332
loaded via a namespace (and not attached):
[1] workflowr_1.1.1 Rcpp_0.12.19 matrixStats_0.54.0
[4] lattice_0.20-35 digest_0.6.15 rprojroot_1.3-2
[7] R.methodsS3_1.7.1 grid_3.5.1 backports_1.1.2
[10] git2r_0.23.0 magrittr_1.5 evaluate_0.11
[13] stringi_1.2.4 whisker_0.3-2 R.oo_1.22.0
[16] R.utils_2.6.0 Matrix_1.2-14 rmarkdown_1.10
[19] tools_3.5.1 stringr_1.3.1 yaml_2.2.0
[22] compiler_3.5.1 htmltools_0.3.6 knitr_1.20
This reproducible R Markdown analysis was created with workflowr 1.1.1