Last updated: 2018-10-24
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(1)
The command set.seed(1)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: 0fe65ad
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/.Rhistory
Ignored: analysis/include/.DS_Store
Ignored: code/.DS_Store
Ignored: data/.DS_Store
Ignored: docs/.DS_Store
Ignored: output/.DS_Store
Untracked files:
Untracked: analysis/Classify.Rmd
Untracked: analysis/EstimateCorMaxEMGD.Rmd
Untracked: analysis/EstimateCorMaxGD.Rmd
Untracked: analysis/HierarchicalFlashSim.Rmd
Untracked: analysis/MashLowSignalGTEx4.Rmd
Untracked: analysis/Mash_GTEx.Rmd
Untracked: analysis/MeanAsh.Rmd
Untracked: analysis/OutlierDetection.Rmd
Untracked: analysis/OutlierDetection2.Rmd
Untracked: analysis/OutlierDetection3.Rmd
Untracked: analysis/OutlierDetection4.Rmd
Untracked: analysis/mash_missing_row.Rmd
Untracked: code/GTExNullModel.R
Untracked: code/MashClassify.R
Untracked: code/MashCorResult.R
Untracked: code/MashCormVResult.R
Untracked: code/MashNULLCorResult.R
Untracked: code/MashSource.R
Untracked: code/Weight_plot.R
Untracked: code/addemV.R
Untracked: code/estimate_cor.R
Untracked: code/generateDataV.R
Untracked: code/johnprocess.R
Untracked: code/mV.R
Untracked: code/sim_mean_sig.R
Untracked: code/summary.R
Untracked: data/Blischak_et_al_2015/
Untracked: data/scale_data.rds
Untracked: docs/figure/Classify.Rmd/
Untracked: docs/figure/OutlierDetection.Rmd/
Untracked: docs/figure/OutlierDetection2.Rmd/
Untracked: docs/figure/OutlierDetection3.Rmd/
Untracked: docs/figure/Test.Rmd/
Untracked: docs/figure/mash_missing_whole_row_5.Rmd/
Untracked: docs/include/
Untracked: output/AddEMV/
Untracked: output/CovED_UKBio_strong.rds
Untracked: output/CovED_UKBio_strong_Z.rds
Untracked: output/Flash_UKBio_strong.rds
Untracked: output/GTExNULLres/
Untracked: output/GTEx_2.5_nullData.rds
Untracked: output/GTEx_2.5_nullModel.rds
Untracked: output/GTEx_2.5_nullPermData.rds
Untracked: output/GTEx_2.5_nullPermModel.rds
Untracked: output/GTEx_3.5_nullData.rds
Untracked: output/GTEx_3.5_nullModel.rds
Untracked: output/GTEx_3.5_nullPermData.rds
Untracked: output/GTEx_3.5_nullPermModel.rds
Untracked: output/GTEx_3_nullData.rds
Untracked: output/GTEx_3_nullModel.rds
Untracked: output/GTEx_3_nullPermData.rds
Untracked: output/GTEx_3_nullPermModel.rds
Untracked: output/GTEx_4.5_nullData.rds
Untracked: output/GTEx_4.5_nullModel.rds
Untracked: output/GTEx_4.5_nullPermData.rds
Untracked: output/GTEx_4.5_nullPermModel.rds
Untracked: output/GTEx_4_nullData.rds
Untracked: output/GTEx_4_nullModel.rds
Untracked: output/GTEx_4_nullPermData.rds
Untracked: output/GTEx_4_nullPermModel.rds
Untracked: output/MASH.10.em2.result.rds
Untracked: output/MASH.10.mle.result.rds
Untracked: output/MashCorSim--midway/
Untracked: output/Mash_EE_Cov_0_plusR1.rds
Untracked: output/UKBio_mash_model.rds
Untracked: output/mVIterations/
Untracked: output/mVUlist/
Untracked: output/result.em.rds
Unstaged changes:
Deleted: analysis/EstimateCorEM3.Rmd
Modified: analysis/EstimateCorMaxEM2.Rmd
Modified: analysis/Mash_UKBio.Rmd
Modified: analysis/mash_missing_samplesize.Rmd
Modified: output/Flash_T2_0.rds
Modified: output/Flash_T2_0_mclust.rds
Modified: output/Mash_model_0_plusR1.rds
Modified: output/PresiAddVarCol.rds
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 0fe65ad | zouyuxin | 2018-10-24 | wflow_publish(“analysis/EstimateCorEM2.Rmd”) |
html | c3d87aa | zouyuxin | 2018-10-09 | Build site. |
Rmd | 8a7ee7c | zouyuxin | 2018-10-09 | wflow_publish(c(“analysis/EstimateCorEM.Rmd”, “analysis/EstimateCorEM2.Rmd”, “analysis/EstimateCorEM3.Rmd”)) |
html | e5e7da7 | zouyuxin | 2018-10-09 | Build site. |
Rmd | 4c4955b | zouyuxin | 2018-10-09 | wflow_publish(“analysis/EstimateCorEM2.Rmd”) |
html | 3c9f55e | zouyuxin | 2018-10-09 | Build site. |
Rmd | 596f070 | zouyuxin | 2018-10-09 | wflow_publish(“analysis/EstimateCorEM2.Rmd”) |
library(mashr)
Loading required package: ashr
source('../code/generateDataV.R')
source('../code/summary.R')
We use EM algorithm to update \(\rho\).
B is the \(n\times R\) true value matrix. \(\mathbf{z}\) is a length n vector.
\[ P(\hat{B},B, \mathbf{z}|\rho, \pi) = \prod_{j=1}^{J} \prod_{p=0}^{P}\left[\pi_{p}N(\hat{\mathbf{b}}_{j}; \mathbf{b}_{j}, \mathbf{V})N(\mathbf{b}_{i}; 0, \Sigma_{p})\right]^{\mathbb{I}(z_{i}=p)} \]
\[ \begin{align*} P(z_{j}=p, \mathbf{b}_{j}|\hat{\mathbf{b}}_{j}) &= \frac{P(z_{j}=p, \mathbf{b}_{j},\hat{\mathbf{b}}_{j})}{P(\hat{\mathbf{b}}_{j})} = \frac{P(\hat{\mathbf{b}}_{j}|\mathbf{b}_{j})P(\mathbf{b}_{j}|z_{j}=p) P(z_{j}=p)}{P(\hat{\mathbf{b}}_{j})} \\ &= \frac{\pi_{p} N_{R}(\hat{\mathbf{b}}_{j}; \mathbf{b}_{j},\mathbf{V})N_{R}(\mathbf{b}_{j}; \mathbf{0}, \Sigma_{p})}{\sum_{p'}\pi_{p'} N_{R}(\hat{\mathbf{b}}_{j}; \mathbf{0},\mathbf{V} + \Sigma_{p'})} \\ &= \frac{\pi_{p} N_{R}(\hat{\mathbf{b}}_{j}; \mathbf{0},\mathbf{V} + \Sigma_{p})}{\sum_{p'}\pi_{p'} N_{R}(\hat{\mathbf{b}}_{j}; \mathbf{0},\mathbf{V} + \Sigma_{p'})} \frac{N_{R}(\hat{\mathbf{b}}_{j}; \mathbf{b}_{j},\mathbf{V})N_{R}(\mathbf{b}_{j}; \mathbf{0}, \Sigma_{p})}{N_{R}(\hat{\mathbf{b}}_{j}; \mathbf{0},\mathbf{V} + \Sigma_{p})} \\ &= \gamma_{jp} P(\mathbf{b}_{j}|z_{j}=p, \hat{\mathbf{b}}_{j}) \\ &= P(z_{j}=p|\hat{\mathbf{b}}_{j}) P(\mathbf{b}_{j}|z_{j}=p, \hat{\mathbf{b}}_{j}) \end{align*} \]
\[ \mathbb{E}_{\mathbf{z}, \mathbf{B}|\hat{\mathbf{B}}}\log p(\hat{\mathbf{B}}, \mathbf{B}, \mathbf{z}) = \sum_{j=1}^{J} \sum_{p=1}^{P} \gamma_{jp} \left[\log \pi_{p} - \frac{1}{2}\log |\mathbf{V}| - \frac{1}{2}\mathbb{E}_{\mathbf{b}_{j}|\hat{\mathbf{b}}_{j}, z_{j}=p}\left[(\hat{\mathbf{b}}_{j}-\mathbf{b}_{j})^{T}\mathbf{V}^{-1}(\hat{\mathbf{b}}_{j}-\mathbf{b}_{j})\right] \\ - \frac{1}{2}\log |\Sigma_{p}| - \frac{1}{2}\mathbb{E}_{\mathbf{b}_{j}|\hat{\mathbf{b}}_{j}, z_{j}=p}\left[\mathbf{b}_{j}^{T}\Sigma_{p}^{-1}\mathbf{b}_{j} \right] \right] \]
\[ \begin{align*} f(\mathbf{V}) &= \sum_{j=1}^{J} \sum_{p=1}^{P} \gamma_{jp} \left[- \frac{1}{2}\log |\mathbf{V}| - \frac{1}{2}\mathbb{E}_{\mathbf{b}_{j}|\hat{\mathbf{b}}_{j}, z_{j}=p}\left[(\hat{\mathbf{b}}_{j}-\mathbf{b}_{j})^{T}\mathbf{V}^{-1}(\hat{\mathbf{b}}_{j}-\mathbf{b}_{j})\right]\right] \\ &= \sum_{j=1}^{J} \left[- \frac{1}{2}\log |\mathbf{V}| - \frac{1}{2}\mathbb{E}_{\mathbf{b}_{j}|\hat{\mathbf{b}}_{j}}\left[(\hat{\mathbf{b}}_{j}-\mathbf{b}_{j})^{T}\mathbf{V}^{-1}(\hat{\mathbf{b}}_{j}-\mathbf{b}_{j})\right]\right] \end{align*} \]
V has a specific form: \[ V = \left( \begin{matrix}1 & \rho \\ \rho & 1 \end{matrix} \right) \]
Let \(\mu_{j} = \mathbb{E}_{b_{j}|\hat{b}_{j}}(b_{j})\) \[ \begin{align*} \log N(\hat{b}_{j}; b_{j}, V) &= -\frac{1}{2}\log |V| - \frac{1}{2}(\hat{b}_{j}-b_{j})^{T}V^{-1}(\hat{b}_{j}-b_{j}) \\ &= -\frac{1}{2}\log |V| - \frac{1}{2}\hat{b}_{j}^{T}V^{-1}\hat{b}_{j} + \frac{1}{2}b_{j}^{T}V^{-1}\hat{b}_{j} + \frac{1}{2}\hat{b}_{j}^{T}V^{-1}b_{j} -\frac{1}{2}b_{j}^{T}V^{-1}b_{j} \\ \mathbb{E}_{b_{j}|\hat{b}_{j}} \log N(\hat{b}_{j}; b_{j}, V) &= -\frac{1}{2}\log |V| - \frac{1}{2}\hat{b}_{j}^{T}V^{-1}\hat{b}_{j} + \frac{1}{2}\mu_{j}^{T}V^{-1}\hat{b}_{j} + \frac{1}{2}\hat{b}_{j}^{T}V^{-1}\mu_{j} -\frac{1}{2}tr(V^{-1}\mathbb{E}_{b_{j}|\hat{b}_{j}}(b_{j}b_{j}^{T})) \end{align*} \]
\[ \begin{align*} \mathbb{E}_{b_{j}|\hat{b}_{j}} \log N(\hat{b}_{j}; b_{j}, V) &= -\frac{1}{2}\log |V| - \frac{1}{2}\hat{b}_{j}^{T}V^{-1}\hat{b}_{j} + \frac{1}{2}\mu_{j}^{T}V^{-1}\hat{b}_{j} + \frac{1}{2}\hat{b}_{j}^{T}V^{-1}\mu_{j} -\frac{1}{2}tr(V^{-1}\mathbb{E}_{b_{j}|\hat{b}_{j}}(b_{j}b_{j}^{T})) \\ &= -\frac{1}{2}\log(1-\rho^2) - \frac{1}{2(1-\rho^2)}\left(\hat{b}_{j1}^2 + \hat{b}_{j2}^2 - 2\hat{b}_{j1}\hat{b}_{j2}\rho -2\hat{b}_{j1} \mu_{j1} -2\hat{b}_{j2} \mu_{j2} + 2\hat{b}_{j2}\mu_{j1}\rho + 2\hat{b}_{j1}\mu_{j2}\rho + \mathbb{E}(b_{j1}^2|\hat{b}_{j}) + \mathbb{E}(b_{j2}^2|\hat{b}_{j}) - 2\rho\mathbb{E}(b_{j1}b_{j2}|\hat{b}_{j}) \right) \end{align*} \]
\[ f(\rho) = \sum_{j=1}^{J} -\frac{1}{2}\log(1-\rho^2) - \frac{1}{2(1-\rho^2)}\left(\hat{b}_{j1}^2 + \hat{b}_{j2}^2 - 2\hat{b}_{j1}\hat{b}_{j2}\rho -2\hat{b}_{j1} \mu_{j1} -2\hat{b}_{j2} \mu_{j2} + 2\hat{b}_{j2}\mu_{j1}\rho + 2\hat{b}_{j1}\mu_{j2}\rho + \mathbb{E}(b_{j1}^2|\hat{b}_{j}) + \mathbb{E}(b_{j2}^2|\hat{b}_{j}) - 2\rho\mathbb{E}(b_{j1}b_{j2}|\hat{b}_{j}) \right) \]
\[ \begin{align*} f(\rho)' &= \sum_{j=1}^{J} \frac{\rho}{1-\rho^2} -\frac{\rho}{(1-\rho^2)^2}\left( \hat{b}_{j1}^2 + \hat{b}_{j2}^2 -2\hat{b}_{j1} \mu_{j1} -2\hat{b}_{j2} \mu_{j2} + \mathbb{E}(b_{j1}^2|\hat{b}_{j}) + \mathbb{E}(b_{j2}^2|\hat{b}_{j}) \right) -\frac{\rho^2+1}{(1-\rho^2)^2}\left( -\hat{b}_{j1}\hat{b}_{j2} + \hat{b}_{j1}\mu_{j2} +\hat{b}_{j2}\mu_{j1} - \mathbb{E}(b_{j1}b_{j2}|\hat{b}_{j}) \right) = 0 \\ 0 &= \rho(1-\rho^2)n - \rho \sum_{j=1}^{n} \left( \hat{b}_{j1}^2 + \hat{b}_{j2}^2 -2\hat{b}_{j1} \mu_{j1} -2\hat{b}_{j2} \mu_{j2} + \mathbb{E}(b_{j1}^2|\hat{b}_{j}) + \mathbb{E}(b_{j2}^2|\hat{b}_{j}) \right) - (\rho^2 + 1) \sum_{j=1}^{n} \left( -\hat{b}_{j1}\hat{b}_{j2} + \hat{b}_{j1}\mu_{j2} +\hat{b}_{j2}\mu_{j1} - \mathbb{E}(b_{j1}b_{j2}|\hat{b}_{j}) \right) \\ 0 &=-n\rho^{3} - \rho^2 \sum_{j=1}^{n} \left( -\hat{b}_{j1}\hat{b}_{j2} + \hat{b}_{j1}\mu_{j2} +\hat{b}_{j2}\mu_{j1} - \mathbb{E}(b_{j1}b_{j2}|\hat{b}_{j}) \right) - \rho \sum_{j=1}^{n} \left( \hat{b}_{j1}^2 + \hat{b}_{j2}^2 -2\hat{b}_{j1} \mu_{j1} -2\hat{b}_{j2} \mu_{j2} + \mathbb{E}(b_{j1}^2|\hat{b}_{j}) + \mathbb{E}(b_{j2}^2|\hat{b}_{j}) -1\right) - \sum_{j=1}^{n} \left( -\hat{b}_{j1}\hat{b}_{j2} + \hat{b}_{j1}\mu_{j2} +\hat{b}_{j2}\mu_{j1} - \mathbb{E}(b_{j1}b_{j2}|\hat{b}_{j}) \right) \end{align*} \] The polynomial has either 1 or 3 real roots in (-1, 1).
Given \(\rho\), we estimate \(\boldsymbol{\pi}\) by max loglikelihood (convex problem)
Algorithm:
Input: X, Ulist, init_rho
Given rho, estimate pi by max loglikelihood (convex problem)
Compute loglikelihood
delta = 1
while delta > tol
M step: update rho
Given rho, estimate pi by max loglikelihood (convex problem)
Compute loglikelihood
Update delta
#' @param rho the off diagonal element of V, 2 by 2 correlation matrix
#' @param Ulist a list of covariance matrices, U_{k}
get_sigma <- function(rho, Ulist){
V <- matrix(c(1,rho,rho,1), 2,2)
lapply(Ulist, function(U) U + V)
}
penalty <- function(prior, pi_s){
subset <- (prior != 1.0)
sum((prior-1)[subset]*log(pi_s[subset]))
}
#' @title compute log likelihood
#' @param L log likelihoods,
#' where the (i,k)th entry is the log probability of observation i
#' given it came from component k of g
#' @param p the vector of mixture proportions
#' @param prior the weight for the penalty
compute.log.lik <- function(lL, p, prior){
p = normalize(pmax(0,p))
temp = log(exp(lL$loglik_matrix) %*% p)+lL$lfactors
return(sum(temp) + penalty(prior, p))
# return(sum(temp))
}
normalize <- function(x){
x/sum(x)
}
mixture.M.rho.times <- function(X, Ulist, init_rho=0, tol=1e-5, prior = c('nullbiased', 'uniform')){
times = length(init_rho)
result = list()
loglik = c()
rho = c()
time.t = c()
converge.status = c()
for(i in 1:times){
out.time = system.time(result[[i]] <- mixture.M.rho(X, Ulist,
init_rho=init_rho[i],
prior=prior,
tol=tol))
time.t = c(time.t, out.time['elapsed'])
loglik = c(loglik, tail(result[[i]]$loglik, 1))
rho = c(rho, result[[i]]$rho)
}
if(abs(max(loglik) - min(loglik)) < 1e-4){
status = 'global'
}else{
status = 'local'
}
ind = which.max(loglik)
return(list(result = result[[ind]], status = status, loglik = loglik, rho=rho, time = time.t))
}
mixture.M.rho <- function(X, Ulist, init_rho=0, tol=1e-5, prior = c('nullbiased', 'uniform')) {
prior <- match.arg(prior)
m.model = fit_mash(X, Ulist, rho = init_rho, prior=prior)
pi_s = get_estimated_pi(m.model, dimension = 'all')
prior.v <- mashr:::set_prior(length(pi_s), prior)
# compute loglikelihood
loglik <- c()
loglik <- c(loglik, get_loglik(m.model)+penalty(prior.v, pi_s))
delta.ll <- 1
niter <- 0
rho = init_rho
while(delta.ll > tol){
# max_rho
rho <- E_rho(X, m.model)
m.model = fit_mash(X, Ulist, rho, prior=prior)
pi_s = get_estimated_pi(m.model, dimension = 'all')
loglik <- c(loglik, get_loglik(m.model)+penalty(prior.v, pi_s))
# Update delta
delta.ll <- loglik[length(loglik)] - loglik[length(loglik)-1]
niter <- niter + 1
}
return(list(pihat = normalize(pi_s), rho = rho, loglik=loglik))
}
E_rho <- function(X, m.model){
n = nrow(X)
post.m = m.model$result$PosteriorMean
post.sec = plyr::laply(1:n, function(i) m.model$result$PosteriorCov[,,i] + tcrossprod(post.m[i,])) # nx2x2 array
temp2 = -sum(X[,1]*X[,2]) + sum(X[,1]*post.m[,2]) + sum(X[,2]*post.m[,1]) - sum(post.sec[,1,2])
temp1 = sum(X[,1]^2 + X[,2]^2) - 2*sum(X[,1]*post.m[,1]) - 2*sum(X[,2]*post.m[,2]) + sum(post.sec[,1,1] + post.sec[,2,2])
rts = polyroot(c(temp2, temp1-n, temp2, n))
# check complex number
is.real = abs(Im(rts))<1e-12
if(sum(is.real) == 1){
return(Re(rts[is.real]))
}else{
print('3 real roots')
return(Re(rts))
}
}
fit_mash <- function(X, Ulist, rho, prior=c('nullbiased', 'uniform')){
m.data = mashr::mash_set_data(Bhat=X, Shat=1, V = matrix(c(1, rho, rho, 1), 2, 2))
m.model = mashr::mash(m.data, Ulist, prior=prior, verbose = FALSE, outputlevel = 3)
return(m.model)
}
set.seed(1)
n = 4000; p = 2
Sigma = matrix(c(1,0.5,0.5,1),p,p)
U0 = matrix(0,2,2)
U1 = U0; U1[1,1] = 1
U2 = U0; U2[2,2] = 1
U3 = matrix(1,2,2)
Utrue = list(U0=U0, U1=U1, U2=U2, U3=U3)
data = generate_data(n, p, Sigma, Utrue)
m.data = mash_set_data(data$Bhat, data$Shat)
U.c = cov_canonical(m.data)
result.mrho <- mixture.M.rho.times(m.data$Bhat, U.c)
The estimated \(\rho\) is 0.5060756. The running time is 65.74 seconds.
m.data.mrho = mash_set_data(data$Bhat, data$Shat, V = matrix(c(1,result.mrho$rho,result.mrho$rho,1),2,2))
U.c.mrho = cov_canonical(m.data.mrho)
m.mrho = mash(m.data.mrho, U.c, verbose= FALSE)
null.ind = which(apply(data$B,1,sum) == 0)
The log likelihood is -12302.54. There are 26 significant samples, 0 false positives. The RRMSE is 0.5820749.
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mashr_0.2.18.0454 ashr_2.2-7
loaded via a namespace (and not attached):
[1] Rcpp_0.12.19 knitr_1.20 whisker_0.3-2
[4] magrittr_1.5 workflowr_1.1.1 REBayes_1.3
[7] MASS_7.3-50 pscl_1.5.2 doParallel_1.0.14
[10] SQUAREM_2017.10-1 lattice_0.20-35 foreach_1.4.4
[13] plyr_1.8.4 stringr_1.3.1 tools_3.5.1
[16] parallel_3.5.1 grid_3.5.1 R.oo_1.22.0
[19] rmeta_3.0 git2r_0.23.0 htmltools_0.3.6
[22] iterators_1.0.10 assertthat_0.2.0 abind_1.4-5
[25] yaml_2.2.0 rprojroot_1.3-2 digest_0.6.18
[28] Matrix_1.2-14 codetools_0.2-15 R.utils_2.7.0
[31] evaluate_0.12 rmarkdown_1.10 stringi_1.2.4
[34] compiler_3.5.1 Rmosek_8.0.69 backports_1.1.2
[37] R.methodsS3_1.7.1 mvtnorm_1.0-8 truncnorm_1.0-8
This reproducible R Markdown analysis was created with workflowr 1.1.1