Last updated: 2018-05-16

Loading required package: ashr

Simulation Design

In the simplest case, there is no true deviation exists. \[c_{j} = \mu_{j} 1\] \[\hat{c}_{j} \sim N_{8}(c_{j}, \frac{1}{2}I)\] Let L be the contrast matrix. Therefore, \[\hat{\delta}_{j} = L\hat{c}_{j} \sim N_{7}(0, \frac{1}{2}LL')\]

We first generate the data:

set.seed(1)
data = sim_contrast1(nsamp = 10000, ncond = 8)
colnames(data$C) = colnames(data$Chat) = colnames(data$Shat) = c('CTL', paste0('T', 1:7))

Set up the contrast matrix and the mash contrast data object

L = rbind(c(-1,1,0,0,0,0,0,0),
          c(-1,0,1,0,0,0,0,0),
          c(-1,0,0,1,0,0,0,0),
          c(-1,0,0,0,1,0,0,0),
          c(-1,0,0,0,0,1,0,0),
          c(-1,0,0,0,0,0,1,0),
          c(-1,0,0,0,0,0,0,1))
row.names(L) = c('T1-CTL','T2-CTL','T3-CTL','T4-CTL','T5-CTL','T6-CTL','T7-CTL')
mash_data = mash_set_data(Bhat=data$Chat, Shat=data$Shat)
mash_data_L = mash_set_data_contrast(mash_data, L)

Set up the covariance matrices:

There are two types of covariance matrix you can use in mash: “canonical” and “data-driven”.

# canonical
U.c = cov_canonical(mash_data_L)
# data driven
mash_data_L.z = mash_data_L
mash_data_L.z$Bhat = mash_data_L$Bhat/mash_data_L$Shat
m.1by1 = mash_1by1(mash_data_L.z)
strong = get_significant_results(m.1by1,0.05)
# no strong
# no data driven

Fit mashcontrast model

mashcontrast.model = mash(mash_data_L, U.c, algorithm.version = 'R')

 - Computing 10000 x 169 likelihood matrix.
 - Likelihood calculations took 1.04 seconds.
 - Fitting model with 169 mixture components.
 - Model fitting took 1.88 seconds.
 - Computing posterior matrices.
 - Computation allocated took 0.49 seconds.

length(get_significant_results(mashcontrast.model))

[1] 0

There is no discovery, which is as we expected. The true deviations are zero for all samples.

barplot(get_estimated_pi(mashcontrast.model),las = 2, cex.names = 0.7)

Comparing discover result with \(\hat{\delta}_{j} \sim N_{7}(0,I)\):

Indep.data = mash_set_data(Bhat = data$Chat%*%t(L))
U.c = cov_canonical(Indep.data)
Indep.model = mash(Indep.data, U.c, algorithm.version = 'R')

 - Computing 10000 x 169 likelihood matrix.
 - Likelihood calculations took 0.70 seconds.
 - Fitting model with 169 mixture components.
 - Model fitting took 1.64 seconds.
 - Computing posterior matrices.
 - Computation allocated took 0.08 seconds.

length(get_significant_results(Indep.model))

[1] 2818

There are more false positives.

barplot(get_estimated_pi(Indep.model),las = 2,cex.names = 0.7)

Comparing discover result with \(\hat{z} \sim N_{7}(0,I)\):

In this case, the observed deviations are truly independent.

# simulate true independent delta
D = matrix(0, nrow=10000, ncol = 7)
Shat = matrix(1, nrow=10000, ncol = 7)
set.seed(1)
E = matrix(rnorm(length(Shat), mean=0, sd=Shat), nrow=10000, ncol = 7)
Dhat = D + E
row_ids = paste0("sample_", 1:nrow(D))
col_ids = paste0("condition_", 1:ncol(D))
rownames(D) = row_ids
colnames(D) = col_ids
rownames(Dhat) = row_ids
colnames(Dhat) = col_ids
rownames(Shat) = row_ids
colnames(Shat) = col_ids

mash_delta = mash_set_data(Bhat = Dhat, Shat = Shat)
U.c = cov_canonical(mash_delta)
mash_delta_model = mash(mash_delta, U.c, algorithm.version = 'R')

 - Computing 10000 x 169 likelihood matrix.
 - Likelihood calculations took 0.83 seconds.
 - Fitting model with 169 mixture components.
 - Model fitting took 1.70 seconds.
 - Computing posterior matrices.
 - Computation allocated took 0.04 seconds.

length(get_significant_results(mash_delta_model))

[1] 0

barplot(get_estimated_pi(mash_delta_model),las = 2,cex.names = 0.7)

From the above comparisons, we show that the wrong assumption \(\hat{\delta}\sim N_{7}(0,I)\) produces many more false positives.

Session information

sessionInfo()

R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mashr_0.2-8 ashr_2.2-7 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16      knitr_1.20        magrittr_1.5     
 [4] REBayes_1.3       MASS_7.3-50       doParallel_1.0.11
 [7] pscl_1.5.2        SQUAREM_2017.10-1 lattice_0.20-35  
[10] foreach_1.4.4     plyr_1.8.4        stringr_1.3.0    
[13] tools_3.4.4       parallel_3.4.4    grid_3.4.4       
[16] rmeta_3.0         htmltools_0.3.6   iterators_1.0.9  
[19] assertthat_0.2.0  yaml_2.1.19       rprojroot_1.3-2  
[22] digest_0.6.15     Matrix_1.2-14     codetools_0.2-15 
[25] evaluate_0.10.1   rmarkdown_1.9     stringi_1.2.2    
[28] compiler_3.4.4    Rmosek_8.0.69     backports_1.1.2  
[31] mvtnorm_1.0-7     truncnorm_1.0-8

Simple Simulation Example

Yuxin Zou

2018-01-10

Simulation Design

Comparing discover result with \(\hat{\delta}_{j} \sim N_{7}(0,I)\):

Comparing discover result with \(\hat{z} \sim N_{7}(0,I)\):

Session information