Last updated: 2018-10-09

workflowr checks: (Click a bullet for more information)
  • R Markdown file: up-to-date

    Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

  • Environment: empty

    Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

  • Seed: set.seed(1)

    The command set.seed(1) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

  • Session information: recorded

    Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

  • Repository version: 6a7b52b

    Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

    Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
    
    Ignored files:
        Ignored:    .DS_Store
        Ignored:    .Rhistory
        Ignored:    .Rproj.user/
        Ignored:    analysis/.DS_Store
        Ignored:    analysis/.Rhistory
        Ignored:    analysis/figure/
        Ignored:    analysis/include/.DS_Store
        Ignored:    code/.DS_Store
        Ignored:    data/.DS_Store
        Ignored:    docs/.DS_Store
        Ignored:    output/.DS_Store
    
    Untracked files:
        Untracked:  analysis/Classify.Rmd
        Untracked:  analysis/EstimateCorEM3W2.Rmd
        Untracked:  analysis/EstimateCorMaxEMGD.Rmd
        Untracked:  analysis/EstimateCorMaxGD.Rmd
        Untracked:  analysis/EstimateCorPrior.Rmd
        Untracked:  analysis/EstimateCorSol.Rmd
        Untracked:  analysis/HierarchicalFlashSim.Rmd
        Untracked:  analysis/MashLowSignalGTEx4.Rmd
        Untracked:  analysis/Mash_GTEx.Rmd
        Untracked:  analysis/MeanAsh.Rmd
        Untracked:  analysis/OutlierDetection.Rmd
        Untracked:  analysis/OutlierDetection2.Rmd
        Untracked:  analysis/OutlierDetection3.Rmd
        Untracked:  analysis/OutlierDetection4.Rmd
        Untracked:  analysis/mash_missing_row.Rmd
        Untracked:  code/GTExNullModel.R
        Untracked:  code/MASH.result.1.rds
        Untracked:  code/MashClassify.R
        Untracked:  code/MashCorResult.R
        Untracked:  code/MashNULLCorResult.R
        Untracked:  code/MashSource.R
        Untracked:  code/Weight_plot.R
        Untracked:  code/addemV.R
        Untracked:  code/estimate_cor.R
        Untracked:  code/generateDataV.R
        Untracked:  code/johnprocess.R
        Untracked:  code/sim_mean_sig.R
        Untracked:  code/summary.R
        Untracked:  data/Blischak_et_al_2015/
        Untracked:  data/scale_data.rds
        Untracked:  docs/figure/Classify.Rmd/
        Untracked:  docs/figure/OutlierDetection.Rmd/
        Untracked:  docs/figure/OutlierDetection2.Rmd/
        Untracked:  docs/figure/OutlierDetection3.Rmd/
        Untracked:  docs/figure/Test.Rmd/
        Untracked:  docs/figure/mash_missing_whole_row_5.Rmd/
        Untracked:  docs/include/
        Untracked:  output/AddEMV/
        Untracked:  output/CovED_UKBio_strong.rds
        Untracked:  output/CovED_UKBio_strong_Z.rds
        Untracked:  output/Flash_UKBio_strong.rds
        Untracked:  output/GTExNULLres/
        Untracked:  output/GTEx_2.5_nullData.rds
        Untracked:  output/GTEx_2.5_nullModel.rds
        Untracked:  output/GTEx_2.5_nullPermData.rds
        Untracked:  output/GTEx_2.5_nullPermModel.rds
        Untracked:  output/GTEx_3.5_nullData.rds
        Untracked:  output/GTEx_3.5_nullModel.rds
        Untracked:  output/GTEx_3.5_nullPermData.rds
        Untracked:  output/GTEx_3.5_nullPermModel.rds
        Untracked:  output/GTEx_3_nullData.rds
        Untracked:  output/GTEx_3_nullModel.rds
        Untracked:  output/GTEx_3_nullPermData.rds
        Untracked:  output/GTEx_3_nullPermModel.rds
        Untracked:  output/GTEx_4.5_nullData.rds
        Untracked:  output/GTEx_4.5_nullModel.rds
        Untracked:  output/GTEx_4.5_nullPermData.rds
        Untracked:  output/GTEx_4.5_nullPermModel.rds
        Untracked:  output/GTEx_4_nullData.rds
        Untracked:  output/GTEx_4_nullModel.rds
        Untracked:  output/GTEx_4_nullPermData.rds
        Untracked:  output/GTEx_4_nullPermModel.rds
        Untracked:  output/MASH.10.em2.result.rds
        Untracked:  output/MASH.10.mle.result.rds
        Untracked:  output/MASHNULL.V.result.1.rds
        Untracked:  output/MASHNULL.V.result.10.rds
        Untracked:  output/MASHNULL.V.result.11.rds
        Untracked:  output/MASHNULL.V.result.12.rds
        Untracked:  output/MASHNULL.V.result.13.rds
        Untracked:  output/MASHNULL.V.result.14.rds
        Untracked:  output/MASHNULL.V.result.15.rds
        Untracked:  output/MASHNULL.V.result.16.rds
        Untracked:  output/MASHNULL.V.result.17.rds
        Untracked:  output/MASHNULL.V.result.18.rds
        Untracked:  output/MASHNULL.V.result.19.rds
        Untracked:  output/MASHNULL.V.result.2.rds
        Untracked:  output/MASHNULL.V.result.20.rds
        Untracked:  output/MASHNULL.V.result.3.rds
        Untracked:  output/MASHNULL.V.result.4.rds
        Untracked:  output/MASHNULL.V.result.5.rds
        Untracked:  output/MASHNULL.V.result.6.rds
        Untracked:  output/MASHNULL.V.result.7.rds
        Untracked:  output/MASHNULL.V.result.8.rds
        Untracked:  output/MASHNULL.V.result.9.rds
        Untracked:  output/MashCorSim--midway/
        Untracked:  output/Mash_EE_Cov_0_plusR1.rds
        Untracked:  output/UKBio_mash_model.rds
        Untracked:  output/result.em.rds
    
    Unstaged changes:
        Modified:   analysis/EstimateCorMaxEM2.Rmd
        Modified:   analysis/Mash_UKBio.Rmd
        Modified:   analysis/mash_missing_samplesize.Rmd
        Modified:   output/Flash_T2_0.rds
        Modified:   output/Flash_T2_0_mclust.rds
        Modified:   output/Mash_model_0_plusR1.rds
        Modified:   output/PresiAddVarCol.rds
    
    
    Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
Expand here to see past versions:
    File Version Author Date Message
    Rmd 6a7b52b zouyuxin 2018-10-09 wflow_publish(“analysis/EstimateCorMaxMash.Rmd”)
    html 43f718f zouyuxin 2018-10-09 Build site.
    Rmd 293f39b zouyuxin 2018-10-09 wflow_publish(“analysis/EstimateCorMaxMash.Rmd”)
    html 78b075a zouyuxin 2018-08-20 Build site.
    Rmd 88274cc zouyuxin 2018-08-20 wflow_publish(“analysis/EstimateCorMaxMash.Rmd”)
    html cb0eb4d zouyuxin 2018-08-16 Build site.
    Rmd 6357604 zouyuxin 2018-08-16 wflow_publish(c(“analysis/EstimateCorIndex.Rmd”, “analysis/EstimateCorMaxMash.Rmd”))
    html 6281062 zouyuxin 2018-08-15 Build site.
    Rmd 3e3e128 zouyuxin 2018-08-15 wflow_publish(c(“analysis/EstimateCor.Rmd”, “analysis/EstimateCorMax.Rmd”,
    html 055978d zouyuxin 2018-08-15 Build site.
    Rmd 40dc525 zouyuxin 2018-08-15 wflow_publish(“analysis/EstimateCorMaxMash.Rmd”)
    html 96ec142 zouyuxin 2018-08-15 Build site.
    Rmd 56743af zouyuxin 2018-08-15 wflow_publish(“analysis/EstimateCorMaxMash.Rmd”)
    html 802266a zouyuxin 2018-08-15 Build site.
    Rmd c7b3665 zouyuxin 2018-08-15 wflow_publish(“analysis/EstimateCorMaxMash.Rmd”)
    html b9e760f zouyuxin 2018-08-03 Build site.
    Rmd 1e6ff48 zouyuxin 2018-08-03 wflow_publish(c(“analysis/EstimateCorMaxMash.Rmd”))


library(mashr)
Loading required package: ashr
source('../code/generateDataV.R')
source('../code/estimate_cor.R')
source('../code/summary.R')
library(knitr)
library(kableExtra)
library(ggplot2)
library(reshape2)

Apply the max likelihood methods for correlation matrix on mash data.

After we estimate each pairwise correlation, the final resulting \(p\times p\) correlation matrix may not be positive definite. I estimate the nearest PD cor matrix with nearPD function. In the simulations, there is no non pd resulting matrix.

The estimated V from optim(optim function), \(M_{\rho}\) and \(MV\) perform better than the truncated correlation (error, mash log likelihood, ROC).

Comparing the estimated V from optim, \(M_{\rho}\) and \(MV\), \(MV\) algorithm is the fastest one, followed by \(M_{\rho}\). The estimated V from optim and \(M_{\rho}\) are very similar. They have smaller error to the true V than \(MV\), most of the time. The estimated V from \(MV\) has larger log likelihood in the final mash model. In terms of ROC and RRMSE, the estimated V from optim, \(M_{\rho}\) and \(MV\) performs similarly.

Based on the simulation results, I prefer the \(MV\) method.

One example: p = 3

\[ \hat{\beta}|\beta \sim N_{3}(\hat{\beta}; \beta, \left(\begin{matrix} 1 & 0.7 & 0.2 \\ 0.7 & 1 & 0.4 \\ 0.2 & 0.4 & 1 \end{matrix}\right)) \]

\[ \beta \sim \frac{1}{4}\delta_{0} + \frac{1}{4}N_{3}(0, \left(\begin{matrix} 1 & 0 &0\\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}\right)) + \frac{1}{4}N_{3}(0, \left(\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{matrix}\right)) + \frac{1}{4}N_{3}(0, \left(\begin{matrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{matrix}\right)) \]

set.seed(1)
Sigma = cbind(c(1,0.7,0.2), c(0.7,1,0.4), c(0.2,0.4,1))
U0 = matrix(0,3,3)
U1 = matrix(0,3,3); U1[1,1] = 1
U2 = diag(3); U2[3,3] = 0
U3 = matrix(1,3,3)
data = generate_data(n=4000, p=3, V=Sigma, Utrue = list(U0=U0, U1=U1,U2=U2,U3=U3))

We find the estimate of V with canonical covariances and the PCA covariances.

m.data = mash_set_data(data$Bhat, data$Shat)
m.1by1 = mash_1by1(m.data)
strong = get_significant_results(m.1by1)

U.pca = cov_pca(m.data, 3, subset = strong)
U.ed = cov_ed(m.data, U.pca, subset = strong)
U.c = cov_canonical(m.data)

The PCA correlation matrices are:

Expand here to see past versions of unnamed-chunk-3-1.png:
Version Author Date
78b075a zouyuxin 2018-08-20

  1. We run the algorithm in estimate cor optim with 3 different initial points for \(\rho\), (-0.5,0,0.5). The \(\rho\) in each iteration is estimated using optim function. The estimated correlation is
Vhat.optim = estimateV(m.data, c(U.c, U.ed), init_rho = c(-0.5,0,0.5), tol=1e-4, optmethod = 'optim')
Vhat.optim$V
          [,1]      [,2]      [,3]
[1,] 1.0000000 0.7004633 0.1571929
[2,] 0.7004633 1.0000000 0.4174847
[3,] 0.1571929 0.4174847 1.0000000
  1. The result uses algorithm in estimate cor mrho. \(\rho\) in each iteration is the root of a third degree polynomial.
Vhat.mrho = estimateV(m.data, c(U.c, U.ed), init_rho = c(-0.5,0,0.5), tol = 1e-4, optmethod = 'mrho')
Vhat.mrho$V
          [,1]      [,2]      [,3]
[1,] 1.0000000 0.6997223 0.1568695
[2,] 0.6997223 1.0000000 0.4171466
[3,] 0.1568695 0.4171466 1.0000000

The running time (in sec.) for each pairwise correlation is

table = data.frame(rbind(Vhat.optim$ttime, Vhat.mrho$ttime), row.names = c('optim', 'mrho'))
colnames(table) = c('12','13','23')
table %>% kable() %>% kable_styling()
12 13 23
optim 239.918 247.438 280.396
mrho 311.792 169.985 157.315

The time is the total running time with different initial point.

  1. The result uses algorithm in estimate cor mV.
Vhat.mV = estimateV(m.data, c(U.c, U.ed), init_V = list(diag(ncol(m.data$Bhat)), clusterGeneration::rcorrmatrix(3), clusterGeneration::rcorrmatrix(3)), tol = 1e-4, optmethod = 'mV')
Vhat.mV$V
          [,1]      [,2]      [,3]
[1,] 1.0000000 0.6714335 0.1961885
[2,] 0.6714335 1.0000000 0.3782850
[3,] 0.1961885 0.3782850 1.0000000
  1. Using the original truncated correlation:
Vhat.tru = estimate_null_correlation(m.data)
Vhat.tru
          [,1]      [,2]      [,3]
[1,] 1.0000000 0.4296283 0.1222433
[2,] 0.4296283 1.0000000 0.3324459
[3,] 0.1222433 0.3324459 1.0000000

The truncated correlation underestimates the correlations.

  1. mash 1by1: Run ash for each condition, and estimate correlation matrix based on the non-significant samples.
V.mash = cor((data$Bhat/data$Shat)[-strong,])
V.mash
          [,1]      [,2]      [,3]
[1,] 1.0000000 0.5313446 0.2445663
[2,] 0.5313446 1.0000000 0.4490049
[3,] 0.2445663 0.4490049 1.0000000

All the estimated correlation matrices above are positive definite.

Error

Check the estimation error:

FError = c(norm(Vhat.optim$V - Sigma, 'F'),
           norm(Vhat.mrho$V - Sigma, 'F'),
           norm(Vhat.mV$V - Sigma, 'F'),
           norm(Vhat.tru - Sigma, 'F'),
           norm(V.mash - Sigma, 'F'))
OpError = c(norm(Vhat.optim$V - Sigma, '2'),
           norm(Vhat.mrho$V - Sigma, '2'),
           norm(Vhat.mV$V - Sigma, '2'),
           norm(Vhat.tru - Sigma, '2'),
           norm(V.mash - Sigma, '2'))
table = data.frame(FrobeniusError = FError, SpectralError = OpError, row.names = c('optim','mrho','mV','trunc','m.1by1'))
table %>% kable() %>% kable_styling()
FrobeniusError SpectralError
optim 0.0653970 0.0464039
mrho 0.0656403 0.0465097
mV 0.0510317 0.0377794
trunc 0.4091712 0.3049974
m.1by1 0.2562509 0.1915171

mash log likelihood

In mash model, the model with correlation from optim has larger loglikelihood.

m.data.optim = mash_set_data(data$Bhat, data$Shat, V=Vhat.optim$V)
m.model.optim = mash(m.data.optim, c(U.c,U.ed), verbose = FALSE)
m.data.mrho = mash_set_data(data$Bhat, data$Shat, V=Vhat.mrho$V)
m.model.mrho = mash(m.data.mrho, c(U.c,U.ed), verbose = FALSE)
m.data.mV = mash_set_data(data$Bhat, data$Shat, V=Vhat.mV$V)
m.model.mV = mash(m.data.mV, c(U.c,U.ed), verbose = FALSE)
m.data.trunc = mash_set_data(data$Bhat, data$Shat, V=Vhat.tru)
m.model.trunc = mash(m.data.trunc, c(U.c,U.ed), verbose = FALSE)
m.data.1by1 = mash_set_data(data$Bhat, data$Shat, V=V.mash)
m.model.1by1 = mash(m.data.1by1, c(U.c,U.ed), verbose = FALSE)
m.data.correct = mash_set_data(data$Bhat, data$Shat, V=Sigma)
m.model.correct = mash(m.data.correct, c(U.c,U.ed), verbose = FALSE)

The results are summarized in table:

null.ind = which(apply(data$B,1,sum) == 0)
V.trun = c(get_loglik(m.model.trunc), length(get_significant_results(m.model.trunc)), sum(get_significant_results(m.model.trunc) %in% null.ind))
V.optim = c(get_loglik(m.model.optim), length(get_significant_results(m.model.optim)), sum(get_significant_results(m.model.optim) %in% null.ind))
V.mrho = c(get_loglik(m.model.mrho), length(get_significant_results(m.model.mrho)), sum(get_significant_results(m.model.mrho) %in% null.ind))
V.mV = c(get_loglik(m.model.mV), length(get_significant_results(m.model.mV)), sum(get_significant_results(m.model.mV) %in% null.ind))
V.1by1 = c(get_loglik(m.model.1by1), length(get_significant_results(m.model.1by1)), sum(get_significant_results(m.model.1by1) %in% null.ind))
V.correct = c(get_loglik(m.model.correct), length(get_significant_results(m.model.correct)), sum(get_significant_results(m.model.correct) %in% null.ind))
temp = cbind(V.optim, V.mrho, V.mV, V.trun, V.1by1, V.correct)
colnames(temp) = c('optim','Mrho','MV', 'Truncate', 'm.1by1', 'True')
row.names(temp) = c('log likelihood', '# significance', '# False positive')
temp %>% kable() %>% kable_styling()
optim Mrho MV Truncate m.1by1 True
log likelihood -17920.2 -17920.1 -17912.63 -17951.46 -17943.49 -17913.58
# significance 146.0 146.0 131.00 85.00 73.00 149.00
# False positive 1.0 1.0 0.00 1.00 0.00 1.00

ROC

m.optim.seq = ROC.table(data$B, m.model.optim)
m.mrho.seq = ROC.table(data$B, m.model.mrho)
m.mV.seq = ROC.table(data$B, m.model.mV)
m.trun.seq = ROC.table(data$B, m.model.trunc)
m.1by1.seq = ROC.table(data$B, m.model.1by1)
m.correct.seq = ROC.table(data$B, m.model.correct)

Expand here to see past versions of unnamed-chunk-19-1.png:
Version Author Date
43f718f zouyuxin 2018-10-09
802266a zouyuxin 2018-08-15

RRMSE

rrmse = rbind(RRMSE(data$B, data$Bhat, list(m.model.optim, m.model.mrho, m.model.mV, m.model.trunc, m.model.1by1, m.model.correct)))
colnames(rrmse) = c('optim','Mrho','MV', 'Truncate','m.1by1','True')
row.names(rrmse) = 'RRMSE'
rrmse %>% kable() %>% kable_styling()
optim Mrho MV Truncate m.1by1 True
RRMSE 0.528246 0.5282098 0.5285035 0.5592648 0.5442074 0.5283068
barplot(rrmse, ylim=c(0,(1+max(rrmse))/2), names.arg = c('optim','Mrho', 'MV','Truncate','m.1by1','True'), las=2, cex.names = 0.7, main='RRMSE')

Expand here to see past versions of unnamed-chunk-21-1.png:
Version Author Date
43f718f zouyuxin 2018-10-09
802266a zouyuxin 2018-08-15

More simulations: p = 5

I randomly generate 10 positive definite correlation matrices, V. The sample size is 4000.

\[ \hat{z}|z \sim N_{5}(z, V) \] \[ z\sim\frac{1}{4}\delta_{0} + \frac{1}{4}N_{5}(0,\left(\begin{matrix} 1 & \mathbf{0}_{1\times 4} \\ \mathbf{0}_{4\times 1} & \mathbf{0}_{4\times 4} \end{matrix}\right)) + \frac{1}{4}N_{5}(0,\left(\begin{matrix} \mathbf{1}_{2\times 2} & \mathbf{0}_{1\times 3} \\ \mathbf{0}_{3\times 1} & \mathbf{0}_{3\times 3} \end{matrix}\right)) + \frac{1}{4}N_{5}(0,\mathbf{1}_{5\times 5}) \]

set.seed(100)
n=4000; p = 5
U0 = matrix(0,p,p)
U1 = U0; U1[1,1] = 1
U2 = U0; U2[c(1:2), c(1:2)] = 1
U3 = matrix(1, p,p)
Utrue = list(U0 = U0, U1 = U1, U2 = U2, U3 = U3)
for(t in 1:20){
  Vtrue = clusterGeneration::rcorrmatrix(p)
  data = generate_data(n, p, Vtrue, Utrue)
  # mash cov
  m.data = mash_set_data(Bhat = data$Bhat, Shat = data$Shat)
  m.1by1 = mash_1by1(m.data)
  strong = get_significant_results(m.1by1)

  U.pca = cov_pca(m.data, 3, subset = strong)
  U.ed = cov_ed(m.data, U.pca, subset = strong)
  U.c = cov_canonical(m.data)
  Vhat.optim <- estimateV(m.data, c(U.c, U.ed), init_rho = c(-0.5,0,0.5), tol=1e-4, optmethod = 'optim')
  Vhat.mrho <- estimateV(m.data, c(U.c, U.ed), init_rho = c(-0.5,0,0.5), tol=1e-4, optmethod = 'mrho')
  Vhat.mV <- estimateV(m.data, c(U.c, U.ed), init_V = list(diag(ncol(m.data$Bhat)), clusterGeneration::rcorrmatrix(p), clusterGeneration::rcorrmatrix(p)),tol=1e-4, optmethod = 'mV')
  saveRDS(list(V.true = Vtrue, V.optim = Vhat.optim, V.mrho = Vhat.mrho, V.mV = Vhat.mV, data = data, strong=strong),
          paste0('../output/MASH.result.',t,'.rds'))
}
files = dir("../output/AddEMV/"); files = files[grep("MASH.result",files)]
times = length(files)
result = vector(mode="list",length = times)
for(i in 1:times) {
  result[[i]] = readRDS(paste("../output/AddEMV/", files[[i]], sep=""))
}
optim.pd = numeric(times)
mrho.pd = numeric(times)
for(i in 1:times){
  m.data = mash_set_data(result[[i]]$data$Bhat, result[[i]]$data$Shat)
  
  result[[i]]$V.trun = estimate_null_correlation(m.data, apply_lower_bound = FALSE)
  m.1by1 = mash_1by1(m.data)
  strong = get_significant_results(m.1by1)
  result[[i]]$V.1by1 = cor(m.data$Bhat[-strong,])
  U.c = cov_canonical(m.data)
  U.pca = cov_pca(m.data, 3, subset = strong)
  U.ed = cov_ed(m.data, U.pca, subset = strong)

  m.data.true = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.true)
  m.model.true = mash(m.data.true, c(U.c,U.ed), verbose = FALSE)
  
  m.data.trunc = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.trun)
  m.model.trunc = mash(m.data.trunc, c(U.c,U.ed), verbose = FALSE)
  
  m.data.1by1 = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.1by1)
  m.model.1by1 = mash(m.data.1by1, c(U.c,U.ed), verbose = FALSE)
  
  m.data.mV = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.mV$V)
  m.model.mV = mash(m.data.mV, c(U.c,U.ed), verbose = FALSE)
  
  # optim
  
  m.model.optim = m.model.optim.F = m.model.optim.2 = list()
  R <- tryCatch(chol(result[[i]]$V.optim$V),error = function (e) FALSE)
  if(is.matrix(R)){
    optim.pd[i] = 1
    m.data.optim = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.optim$V)
    m.model.optim = mash(m.data.optim, c(U.c,U.ed), verbose = FALSE)
  }else{
    V.optim.near.F = as.matrix(Matrix::nearPD(result[[i]]$V.optim$V, conv.norm.type = 'F', keepDiag = TRUE)$mat)
    V.optim.near.2 = as.matrix(Matrix::nearPD(result[[i]]$V.optim$V, conv.norm.type = '2', keepDiag = TRUE)$mat)
    
    result[[i]]$V.optim.F = V.optim.near.F
    result[[i]]$V.optim.2 = V.optim.near.2
    
    # mashmodel
    m.data.optim.F = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = V.optim.near.F)
    m.model.optim.F = mash(m.data.optim.F, c(U.c,U.ed), verbose = FALSE)
    
    m.data.optim.2 = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = V.optim.near.2)
    m.model.optim.2 = mash(m.data.optim.2, c(U.c,U.ed), verbose = FALSE)
  }
  
  # Mrho
  m.model.mrho = m.model.mrho.F = m.model.mrho.2 = list()
  R <- tryCatch(chol(result[[i]]$V.mrho$V),error = function (e) FALSE)
  if(is.matrix(R)){
    mrho.pd[i] = 1
    m.data.mrho = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = result[[i]]$V.mrho$V)
    m.model.mrho = mash(m.data.mrho, c(U.c,U.ed), verbose = FALSE)
  }else{
    V.mrho.near.F = as.matrix(Matrix::nearPD(result[[i]]$V.mrho$V, conv.norm.type = 'F', keepDiag = TRUE)$mat)
    V.mrho.near.2 = as.matrix(Matrix::nearPD(result[[i]]$V.mrho$V, conv.norm.type = '2', keepDiag = TRUE)$mat)
    
    result[[i]]$V.mrho.F = V.mrho.near.F
    result[[i]]$V.mrho.2 = V.mrho.near.2
    
    # mashmodel
    m.data.mrho.F = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = V.mrho.near.F)
    m.model.mrho.F = mash(m.data.mrho.F, c(U.c,U.ed), verbose = FALSE)
      
    m.data.mrho.2 = mash_set_data(Bhat = m.data$Bhat, Shat = m.data$Shat, V = V.mrho.near.2)
    m.model.mrho.2 = mash(m.data.mrho.2, c(U.c,U.ed), verbose = FALSE)
  }
  
  result[[i]]$m.model = list(m.model.true = m.model.true, m.model.trunc = m.model.trunc, 
                             m.model.1by1 = m.model.1by1, m.model.mV = m.model.mV,
                             m.model.optim = m.model.optim,
                             m.model.optim.F = m.model.optim.F, m.model.optim.2 = m.model.optim.2, 
                             m.model.mrho = m.model.mrho,
                             m.model.mrho.F = m.model.mrho.F, m.model.mrho.2 = m.model.mrho.2)
}

Error

The Frobenius norm is

temp = matrix(0,nrow = times, ncol = 7)
for(i in 1:times){
  temp[i, ] = error.cor(result[[i]], norm.type='F', optim.pd = optim.pd[i], mrho.pd = mrho.pd[i])
}
colnames(temp) = c('Trunc','m.1by1', 'optim','optim.F', 'Mrho', 'Mrho.F', 'MV')
temp = reshape2::melt(temp[,c(-4,-6)])
colnames(temp) = c('Data', 'Method', 'FrobError')
ggplot(temp, aes(x = Data, y=FrobError, group = Method, color = Method)) + geom_line()

Expand here to see past versions of unnamed-chunk-25-1.png:
Version Author Date
43f718f zouyuxin 2018-10-09

The spectral norm is

temp = matrix(0,nrow = times, ncol = 7)
for(i in 1:times){
  temp[i, ] = error.cor(result[[i]], norm.type='2', optim.pd = optim.pd[i], mrho.pd = mrho.pd[i])
}
colnames(temp) = c('Trunc','m.1by1', 'optim','optim.2', 'Mrho', 'Mrho.2', 'MV')
temp = reshape2::melt(temp[,c(-4,-6)])
colnames(temp) = c('Data', 'Method', 'SpecError')
ggplot(temp, aes(x = Data, y=SpecError, group = Method, color = Method)) + geom_line()

Expand here to see past versions of unnamed-chunk-26-1.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Time

The total running time for each matrix is

optim.time = mrho.time = mV.time = numeric(times)
for(i in 1:times){
  optim.time[i] = sum(result[[i]]$V.optim$ttime)
  mrho.time[i] = sum(result[[i]]$V.mrho$ttime)
  mV.time[i] = sum(result[[i]]$V.mV$ttime)
}
temp = cbind(optim.time, mrho.time, mV.time)
colnames(temp) = c('optim', 'Mrho', 'mV')
row.names(temp) = 1:20

temp = melt(temp)
colnames(temp) = c('Data', 'Method', 'Time')
ggplot(temp, aes(x = Data, y=Time, group = Method, color = Method)) + geom_line()

Expand here to see past versions of unnamed-chunk-27-1.png:
Version Author Date
43f718f zouyuxin 2018-10-09

mash log likelihood

temp = matrix(0,nrow = times, ncol = 10)
for(i in 1:times){
  temp[i, ] = loglik.cor(result[[i]]$m.model, optim.pd = optim.pd[i], mrho.pd = mrho.pd[i])
}
colnames(temp) = c('True', 'Trunc','m.1by1', 'optim','optim.F', 'optim.2', 'Mrho', 'Mrho.F', 'Mrho.2','MV')
temp = melt(temp[,-c(5,6,8,9)])
colnames(temp) = c('Data', 'Method', 'loglikelihood')
ggplot(temp, aes(x = Data, y=loglikelihood, group = Method, color = Method)) + geom_line()

Expand here to see past versions of unnamed-chunk-28-1.png:
Version Author Date
43f718f zouyuxin 2018-10-09

ROC

par(mfrow=c(1,2))
for(i in 1:times){
  plotROC(result[[i]]$data$B, result[[i]]$m.model, optim.pd = optim.pd[i], mrho.pd = mrho.pd[i], title=paste0('Data', i, ' '))
}

Expand here to see past versions of unnamed-chunk-29-1.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Expand here to see past versions of unnamed-chunk-29-2.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Expand here to see past versions of unnamed-chunk-29-3.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Expand here to see past versions of unnamed-chunk-29-4.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Expand here to see past versions of unnamed-chunk-29-5.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Expand here to see past versions of unnamed-chunk-29-6.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Expand here to see past versions of unnamed-chunk-29-7.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Expand here to see past versions of unnamed-chunk-29-8.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Expand here to see past versions of unnamed-chunk-29-9.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Expand here to see past versions of unnamed-chunk-29-10.png:
Version Author Date
43f718f zouyuxin 2018-10-09

RRMSE

par(mfrow=c(1,2))
for(i in 1:times){
  rrmse = rbind(RRMSE(result[[i]]$data$B, result[[i]]$data$Bhat, result[[i]]$m.model))
  barplot(rrmse, ylim=c(0,(1+max(rrmse))/2), las=2, cex.names = 0.7, main='RRMSE')
}

Expand here to see past versions of unnamed-chunk-30-1.png:
Version Author Date
43f718f zouyuxin 2018-10-09
78b075a zouyuxin 2018-08-20

Expand here to see past versions of unnamed-chunk-30-2.png:
Version Author Date
43f718f zouyuxin 2018-10-09
78b075a zouyuxin 2018-08-20

Expand here to see past versions of unnamed-chunk-30-3.png:
Version Author Date
43f718f zouyuxin 2018-10-09
78b075a zouyuxin 2018-08-20

Expand here to see past versions of unnamed-chunk-30-4.png:
Version Author Date
43f718f zouyuxin 2018-10-09
78b075a zouyuxin 2018-08-20

Expand here to see past versions of unnamed-chunk-30-5.png:
Version Author Date
43f718f zouyuxin 2018-10-09
78b075a zouyuxin 2018-08-20

Expand here to see past versions of unnamed-chunk-30-6.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Expand here to see past versions of unnamed-chunk-30-7.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Expand here to see past versions of unnamed-chunk-30-8.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Expand here to see past versions of unnamed-chunk-30-9.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Expand here to see past versions of unnamed-chunk-30-10.png:
Version Author Date
43f718f zouyuxin 2018-10-09

Session information

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reshape2_1.4.3    ggplot2_3.0.0     kableExtra_0.9.0  knitr_1.20       
[5] plyr_1.8.4        mashr_0.2.18.0454 ashr_2.2-19      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.19             mvtnorm_1.0-8           
 [3] lattice_0.20-35          Rmosek_8.0.69           
 [5] assertthat_0.2.0         rprojroot_1.3-2         
 [7] digest_0.6.15            foreach_1.4.4           
 [9] truncnorm_1.0-8          R6_2.2.2                
[11] backports_1.1.2          evaluate_0.11           
[13] httr_1.3.1               highr_0.7               
[15] pillar_1.3.0             rlang_0.2.2             
[17] lazyeval_0.2.1           pscl_1.5.2              
[19] rstudioapi_0.7           whisker_0.3-2           
[21] R.utils_2.6.0            R.oo_1.22.0             
[23] Matrix_1.2-14            rmarkdown_1.10          
[25] labeling_0.3             readr_1.1.1             
[27] stringr_1.3.1            REBayes_1.3             
[29] munsell_0.5.0            compiler_3.5.1          
[31] pkgconfig_2.0.2          SQUAREM_2017.10-1       
[33] htmltools_0.3.6          tidyselect_0.2.4        
[35] tibble_1.4.2             workflowr_1.1.1         
[37] codetools_0.2-15         viridisLite_0.3.0       
[39] crayon_1.3.4             dplyr_0.7.6             
[41] withr_2.1.2              MASS_7.3-50             
[43] R.methodsS3_1.7.1        grid_3.5.1              
[45] gtable_0.2.0             git2r_0.23.0            
[47] magrittr_1.5             ExtremeDeconvolution_1.3
[49] scales_1.0.0             stringi_1.2.4           
[51] doParallel_1.0.14        bindrcpp_0.2.2          
[53] xml2_1.2.0               rmeta_3.0               
[55] iterators_1.0.10         tools_3.5.1             
[57] glue_1.3.0               purrr_0.2.5             
[59] hms_0.4.2                parallel_3.5.1          
[61] yaml_2.2.0               colorspace_1.3-2        
[63] rvest_0.3.2              corrplot_0.84           
[65] clusterGeneration_1.3.4  bindr_0.1.1             

This reproducible R Markdown analysis was created with workflowr 1.1.1