Last updated: 2018-10-29
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(20180529)
The command set.seed(20180529)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: 6f56652
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.Rhistory
Ignored: docs/.DS_Store
Ignored: docs/figure/Test.Rmd/
Untracked files:
Untracked: analysis/mashMean.Rmd
Unstaged changes:
Modified: analysis/mashrEMlk.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 6f56652 | zouyuxin | 2018-10-29 | wflow_publish(“analysis/susie_bhat_imple.Rmd”) |
html | 50fd59d | zouyuxin | 2018-10-12 | Build site. |
Rmd | af6e01c | zouyuxin | 2018-10-12 | wflow_publish(c(“analysis/susie_z_imple.Rmd”, “analysis/susie_bhat_imple.Rmd”)) |
html | a990c1f | zouyuxin | 2018-10-08 | Build site. |
Rmd | 69abc7c | zouyuxin | 2018-10-08 | wflow_publish(c(“analysis/susie_z_imple.Rmd”, “analysis/susie_bhat_imple.Rmd”)) |
html | b1d36f4 | zouyuxin | 2018-10-08 | Build site. |
Rmd | 664133b | zouyuxin | 2018-10-08 | wflow_publish(“analysis/susie_bhat_imple.Rmd”) |
We illustrate the methods in SER model. It’s straight forward to implement in the general susie
model.
\[ \begin{align*} \mathbf{y}_{c} &= X_{c} \boldsymbol{\beta} + \boldsymbol{\epsilon} \quad \boldsymbol{\epsilon}\sim N_{n}(0, \frac{1}{\tau}I) \\ \boldsymbol{\beta} &= \boldsymbol{\gamma} \beta \\ \boldsymbol{\gamma} &\sim Multinomial(1,\boldsymbol{\pi}) \\ \beta &\sim N(0, \frac{1}{\tau_{0}}) \end{align*} \] where \(\mathbf{y}_{c}\) and \(X_{c}\) are centered. By default we set \(\frac{1}{\tau} = var(\mathbf{y})\), \(\frac{1}{\tau_{0}} = \sigma_{\beta}^{2}var(\mathbf{y})\).
We use \(\mathbf{y}\) and \(X\) to denote the original data. \(\mathbf{y}_c\) and \(X_c\) denote the centered data. \(\mathbf{y}_{cs}\) and \(X_{cs}\) denote the centered and scaled data.
Data:
library(susieR)
set.seed(1)
n = 800
p = 1000
beta = rep(0,p)
beta[1] = 1
beta[50] = 1
beta[300] = 1
beta[400] = 1
X = matrix(rnorm(n*p),nrow=n,ncol=p)
y = c(X %*% beta + rnorm(n))
X.cs = susieR:::safe_colScale(X, center=TRUE, scale=TRUE)
X.c = susieR:::safe_colScale(X, center=TRUE, scale=FALSE)
# summary stat
bhat = numeric(p)
se = numeric(p)
sigmahat = numeric(p)
for(j in 1:p){
m = summary(lm(y~X[,j]))
bhat[j] = m$coefficients[2,1]
se[j] = m$coefficients[2,2]
sigmahat[j] = m$sigma
}
R = cor(X)
We convert the \(XtX\) to Cor (R), since the variance from \(XtX\) might be wrong. We can infer the diagonal element of \(XtX\) using bhat and shat.
\[ \hat{z}_{j} = \frac{\hat{\beta}_{j}}{\hat{s}_{j}} = \frac{\mathbf{x}_{cj}^{T}\mathbf{y}_{c}}{\hat{\sigma}_{j}\sqrt{\mathbf{x}_{cj}^{T}\mathbf{x}_{cj}}} = \frac{\mathbf{x}_{csj}^{T}\frac{1}{\alpha_{y}}\mathbf{y}_{c}}{\frac{1}{\alpha_{y}}\hat{\sigma}_{j}\sqrt{\mathbf{x}_{csj}^{T}\mathbf{x}_{csj}}} \]
\(\hat{\Sigma} = diag(\hat{\sigma}_{1}^2, \cdots, \hat{\sigma}_{p}^2)\)
\[ R_{j}^2 = \frac{\hat{z}_{j}^2}{\hat{z}_{j}^2+n-2} \]
Then \[ 1-R_{j}^{2} = \frac{RSS}{SST} = \frac{\hat{\sigma}_{j}^2(n-2)}{\mathbf{y}_{c}^{T}\mathbf{y}_{c}} \quad \hat{\sigma}_{j}^2 = \frac{\mathbf{y}_{c}^{T}\mathbf{y}_{c}(1-R_{j}^2)}{n-2} \] \[ \hat{\sigma}_{j}^2 = var(\mathbf{y}_{c})\frac{(n-1)(1-R_{j}^2)}{n-2} = var(\mathbf{y}_{c})\tilde{\sigma}_{j}^2 \] \[ \Rightarrow \quad \hat{s}_{j}^{2} = \frac{\hat{\sigma}_{j}^2}{\mathbf{x}_{.j}^{T}\mathbf{x}_{.j}} = var(\mathbf{y}_{c})\frac{\tilde{\sigma}_{j}^2}{\mathbf{x}_{.j}^{T}\mathbf{x}_{.j}} \]
\[ \mathbf{x}_{.j}^{T}\mathbf{x}_{.j} = var(\mathbf{y}_{c})\frac{\tilde{\sigma}_{j}^2}{\hat{s}_{j}^{2}} \quad \mathbf{x}_{.j}^{T}\mathbf{y} = var(\mathbf{y}_{c})\frac{\tilde{\sigma}_{j}^2}{\hat{s}_{j}} \hat{z}_{j} \]
Using \(X_{.}^{T}X_{.}\) and \(X_{.}\mathbf{y}_{c}\) in susie_ss
is fitting \[
\begin{align*}
\mathbf{y}_c &= X_{.} \boldsymbol{\beta} + \boldsymbol{\epsilon} \quad \boldsymbol{\epsilon}\sim N_{n}(0, \frac{1}{\tau}I) \\
\boldsymbol{\beta} &= \boldsymbol{\gamma} \beta \\
\boldsymbol{\gamma} &\sim Multinomial(1,\boldsymbol{\pi}) \\
\beta &\sim N(0, \frac{1}{\tau_{0}})
\end{align*}
\]
In susieR
, if there is no information about the variance of y, we estimate the coefficients in the standardized X, y scale. Otherwise, we estimate the coefficients in the same scale of bhat, shat.
Fit susie model using individual level data (non standardize):
res0 = susie(X = X, Y = y,
estimate_residual_variance = FALSE, estimate_prior_variance = FALSE,
intercept = TRUE, standardize = FALSE,
scaled_prior_variance = 0.2, max_iter = 3)
Fit susie model using summary statistics (non standardize):
res1 = susie_bhat(bhat = bhat, shat = se, R = crossprod(X.c), n = n, var_y = var(y),
standardize = FALSE,
estimate_prior_variance = FALSE,
scaled_prior_variance = 0.2, max_iter = 3)
Compare the fitted \(\alpha\) and coefficients:
all.equal(res1$alpha, res0$alpha)
[1] TRUE
plot(coef(res1), coef(res0))
Version | Author | Date |
---|---|---|
a990c1f | zouyuxin | 2018-10-08 |
Fit susie model using individual level data (standardize):
res2 = susie(X=X.cs, Y = (y-mean(y))/sd(y),
estimate_residual_variance = FALSE, estimate_prior_variance = FALSE,
intercept = FALSE, standardize = FALSE, scaled_prior_variance = 0.2, max_iter = 3)
Fit susie model using summary statistics (standardize):
res3 = susie_bhat(bhat = bhat, shat = se, R = cor(X), n = n,
estimate_prior_variance = FALSE,
scaled_prior_variance = 0.2, max_iter = 3)
Compare the fitted \(\alpha\) and coefficients:
all.equal(res2$alpha, res3$alpha)
[1] TRUE
plot(coef(res2), coef(res3))
Version | Author | Date |
---|---|---|
a990c1f | zouyuxin | 2018-10-08 |
If we know z scores (intead of bhat, shat), with information about sample size n, we can fit susie
model as follows
z_scores = bhat/se
res4 = susie_z(z_scores, R = cor(X), n = n,
estimate_prior_variance = FALSE,
scaled_prior_variance = 0.2, max_iter = 3)
Compare the fitted \(\alpha\) and coefficients:
all.equal(res2$alpha, res4$alpha)
[1] TRUE
plot(coef(res2), coef(res4))
Version | Author | Date |
---|---|---|
50fd59d | zouyuxin | 2018-10-12 |
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] susieR_0.6.0.0364
loaded via a namespace (and not attached):
[1] workflowr_1.1.1 Rcpp_0.12.19 matrixStats_0.54.0
[4] lattice_0.20-35 digest_0.6.18 rprojroot_1.3-2
[7] R.methodsS3_1.7.1 grid_3.5.1 backports_1.1.2
[10] magrittr_1.5 git2r_0.23.0 evaluate_0.12
[13] stringi_1.2.4 whisker_0.3-2 R.oo_1.22.0
[16] R.utils_2.7.0 Matrix_1.2-14 rmarkdown_1.10
[19] tools_3.5.1 stringr_1.3.1 yaml_2.2.0
[22] compiler_3.5.1 htmltools_0.3.6 knitr_1.20
[25] expm_0.999-3
This reproducible R Markdown analysis was created with workflowr 1.1.1