Last updated: 2019-01-06
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date 
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
 ✔ Environment: empty 
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
 ✔ Seed: 
set.seed(1) 
The command set.seed(1) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
 ✔ Session information: recorded 
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
 ✔ Repository version: e518097 
wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/.Rhistory
    Ignored:    analysis/include/.DS_Store
    Ignored:    code/.DS_Store
    Ignored:    data/.DS_Store
    Ignored:    docs/.DS_Store
    Ignored:    output/.DS_Store
    Ignored:    output/.sos/
Untracked files:
    Untracked:  analysis/Classify.Rmd
    Untracked:  analysis/EstimateCorMash.Rmd
    Untracked:  analysis/EstimateCorMaxGD.Rmd
    Untracked:  analysis/EstimateCorMaxMCMash.Rmd
    Untracked:  analysis/HierarchicalFlashSim.Rmd
    Untracked:  analysis/MashLowSignalGTEx4.Rmd
    Untracked:  analysis/Mash_GTEx.Rmd
    Untracked:  analysis/MeanAsh.Rmd
    Untracked:  analysis/OutlierDetection.Rmd
    Untracked:  analysis/OutlierDetection2.Rmd
    Untracked:  analysis/OutlierDetection3.Rmd
    Untracked:  analysis/OutlierDetection4.Rmd
    Untracked:  analysis/mash_missing_row.Rmd
    Untracked:  code/GTExNullModel.R
    Untracked:  code/MashClassify.R
    Untracked:  code/MashCorResult.R
    Untracked:  code/MashCormVResult.R
    Untracked:  code/MashNULLCorResult.R
    Untracked:  code/MashSource.R
    Untracked:  code/Weight_plot.R
    Untracked:  code/addemV.R
    Untracked:  code/dsc-differentV/
    Untracked:  code/dsc-differentV_signal/
    Untracked:  code/estimate_cor.R
    Untracked:  code/generateDataV.R
    Untracked:  code/johnprocess.R
    Untracked:  code/mV.R
    Untracked:  code/sim_mean_sig.R
    Untracked:  code/summary.R
    Untracked:  data/Blischak_et_al_2015/
    Untracked:  data/scale_data.rds
    Untracked:  data/wasp_yuxin/
    Untracked:  docs/figure/Classify.Rmd/
    Untracked:  docs/figure/OutlierDetection.Rmd/
    Untracked:  docs/figure/OutlierDetection2.Rmd/
    Untracked:  docs/figure/OutlierDetection3.Rmd/
    Untracked:  docs/figure/Test.Rmd/
    Untracked:  docs/figure/mash_missing_whole_row_5.Rmd/
    Untracked:  docs/include/
    Untracked:  output/AddEMV/
    Untracked:  output/CovED_UKBio_strong.rds
    Untracked:  output/CovED_UKBio_strong_Z.rds
    Untracked:  output/EstCorMLECompare/
    Untracked:  output/Flash_UKBio_strong.rds
    Untracked:  output/GTExNULLres/
    Untracked:  output/GTEx_2.5_nullData.rds
    Untracked:  output/GTEx_2.5_nullModel.rds
    Untracked:  output/GTEx_2.5_nullPermData.rds
    Untracked:  output/GTEx_2.5_nullPermModel.rds
    Untracked:  output/GTEx_3.5_nullData.rds
    Untracked:  output/GTEx_3.5_nullModel.rds
    Untracked:  output/GTEx_3.5_nullPermData.rds
    Untracked:  output/GTEx_3.5_nullPermModel.rds
    Untracked:  output/GTEx_3_nullData.rds
    Untracked:  output/GTEx_3_nullModel.rds
    Untracked:  output/GTEx_3_nullPermData.rds
    Untracked:  output/GTEx_3_nullPermModel.rds
    Untracked:  output/GTEx_4.5_nullData.rds
    Untracked:  output/GTEx_4.5_nullModel.rds
    Untracked:  output/GTEx_4.5_nullPermData.rds
    Untracked:  output/GTEx_4.5_nullPermModel.rds
    Untracked:  output/GTEx_4_nullData.rds
    Untracked:  output/GTEx_4_nullModel.rds
    Untracked:  output/GTEx_4_nullPermData.rds
    Untracked:  output/GTEx_4_nullPermModel.rds
    Untracked:  output/MASH.10.em2.result.rds
    Untracked:  output/MASH.10.mle.result.rds
    Untracked:  output/MashCorSim--midway/
    Untracked:  output/Mash_EE_Cov_0_plusR1.rds
    Untracked:  output/UKBio_mash_model.rds
    Untracked:  output/WASP/
    Untracked:  output/diff_v/
    Untracked:  output/diff_v_signal/
    Untracked:  output/dsc-mashr-est_v/
    Untracked:  output/mVIterations/
    Untracked:  output/mVMLEsubset/
    Untracked:  output/mVUlist/
    Untracked:  output/result.em.rds
Unstaged changes:
    Modified:   analysis/EstimateCorMaxMVSample.Rmd
    Modified:   analysis/Mash_UKBio.Rmd
    Modified:   analysis/mash_missing_samplesize.Rmd
    Modified:   output/Flash_T2_0.rds
    Modified:   output/Flash_T2_0_mclust.rds
    Modified:   output/Mash_model_0_plusR1.rds
    Modified:   output/PresiAddVarCol.rds
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes. 
| File | Version | Author | Date | Message | 
|---|---|---|---|---|
| Rmd | e518097 | zouyuxin | 2019-01-06 | wflow_publish(“analysis/WASPmash.Rmd”) | 
| html | c40c19d | zouyuxin | 2019-01-06 | Build site. | 
| Rmd | d607f8d | zouyuxin | 2019-01-06 | wflow_publish(“analysis/WASPmash.Rmd”) | 
| html | 71d959e | zouyuxin | 2019-01-06 | Build site. | 
| Rmd | b04fb6c | zouyuxin | 2019-01-06 | wflow_publish(“analysis/WASPmash.Rmd”) | 
Loading required package: ashr
dat = readRDS('../data/wasp_yuxin/fastqtl_to_mash_output/wasp.mash.rds')
dat$strong.z[is.infinite(dat$strong.z)] = sign(dat$strong.z[is.infinite(dat$strong.z)]) * 10
dat$random.z[is.infinite(dat$random.z)] = sign(dat$random.z[is.infinite(dat$random.z)]) * 10
dat$strong.z = dat$strong.z[,c(1,8:16,2:7)]
dat$random.z = dat$random.z[,c(1,8:16,2:7)]
data.random = mash_set_data(dat$random.z)
data.strong = mash_set_data(dat$strong.z)
Flash:
my_init_fn <- function(Y, K = 1) {
  ret = flashr:::udv_si(Y, K)
  pos_sum = sum(ret$v[ret$v > 0])
  neg_sum = -sum(ret$v[ret$v < 0])
  if (neg_sum > pos_sum) {
    return(list(u = -ret$u, d = ret$d, v = -ret$v))
  } else
    return(ret)
}
flash_pipeline = function(data, ...) {
  ## current state-of-the art
  ## suggested by Jason Willwerscheid
  ## cf: discussion section of
  ## https://willwerscheid.github.io/MASHvFLASH/MASHvFLASHnn2.html
  ebnm_fn = "ebnm_ash"
  ebnm_param = list(l = list(mixcompdist = "normal",
                             optmethod = "mixSQP"),
                    f = list(mixcompdist = "+uniform",
                             optmethod = "mixSQP"))
  ##
  fl_g <- flashr:::flash_greedy_workhorse(data,
                                          var_type = "constant",
                                          ebnm_fn = ebnm_fn,
                                          ebnm_param = ebnm_param,
                                          init_fn = "my_init_fn",
                                          stopping_rule = "factors",
                                          tol = 1e-3,
                                          verbose_output = "odF")
  fl_b <- flashr:::flash_backfit_workhorse(data,
                                           f_init = fl_g,
                                           var_type = "constant",
                                           ebnm_fn = ebnm_fn,
                                           ebnm_param = ebnm_param,
                                           stopping_rule = "factors",
                                           tol = 1e-3,
                                           verbose_output = "odF")
  return(fl_b)
}
cov_flash = function(data, subset = NULL, non_canonical = FALSE, save_model = NULL) {
  if(is.null(subset)) subset = 1:mashr:::n_effects(data)
  b.center = apply(data$Bhat, 2, function(x) x - mean(x))
  ## Only keep factors with at least two values greater than 1 / sqrt(n)
  find_nonunique_effects <- function(fl) {
    thresh <- 1/sqrt(ncol(fl$fitted_values))
    vals_above_avg <- colSums(fl$ldf$f > thresh)
    nonuniq_effects <- which(vals_above_avg > 1)
    return(fl$ldf$f[, nonuniq_effects, drop = FALSE])
  }
  fmodel = flash_pipeline(b.center)
  if (non_canonical)
    flash_f = find_nonunique_effects(fmodel)
  else 
    flash_f = fmodel$ldf$f
  ## row.names(flash_f) = colnames(b)
  if (!is.null(save_model)) saveRDS(list(model=fmodel, factors=flash_f), save_model)
  if(ncol(flash_f) == 0){
    U.flash = list("tFLASH" = t(fmodel$fitted_values) %*% fmodel$fitted_values / nrow(fmodel$fitted_values))
  } else{
    U.flash = c(cov_from_factors(t(as.matrix(flash_f)), "FLASH"),
  list("tFLASH" = t(fmodel$fitted_values) %*% fmodel$fitted_values / nrow(fmodel$fitted_values)))
  }
  
  return(U.flash)
}
U.f = cov_flash(data.strong, non_canonical = TRUE, save_model = '../output/WASP/flash_model.rds')
saveRDS(U.f, '../output/WASP/flash_cov.rds')
fl_model = readRDS('../output/WASP/flash_model.rds')$model
factors = readRDS('../output/WASP/flash_model.rds')$factors
par(mfrow = c(1, 3))
for(k in 1:3){
  barplot(factors[,k], main=paste0("Factor ", k), names.arg = 0:15)
}

| Version | Author | Date | 
|---|---|---|
| 71d959e | zouyuxin | 2019-01-06 | 
fll_model = flash_pipeline(fl_model$ldf$l)
saveRDS(fll_model, '../output/WASP/flash_loading_model.rds')
U.pca = cov_pca(data.strong, 5)
U.ed = cov_ed(data.strong, c(U.f, U.pca))
U.ed = readRDS('../output/WASP/Ued.rds')
U.c = cov_canonical(data.random)
m.ignore = mash(data.random, c(U.c, U.ed), outputlevel = 1)
m.ignore$result = mash_compute_posterior_matrices(m.ignore, data.strong)
V.simple = estimate_null_correlation_simple(data.random)
data.random.V.simple = mash_update_data(data.random, V = V.simple)
m.simple = mash(data.random.V.simple, c(U.c, U.ed), outputlevel = 1)
data.strong.V.simple = mash_update_data(data.strong, V = V.simple)
m.simple$result = mash_compute_posterior_matrices(m.simple, data.strong.V.simple)
set.seed(1)
random.subset = sample(1:nrow(gtex$random.b),5000)
data.random.s = mash_set_data(gtex$random.b[random.subset,], gtex$random.s[random.subset,])
current = estimate_null_correlation(data.random.s, c(U.c, U.ed), max_iter = 20)
V.current = current$V
data.random.V.current = mash_update_data(data.random, V = V.current)
m.current = mash(data.random.V.current, c(U.c, U.ed), outputlevel = 1)
data.strong = mash_update_data(data.strong, V = V.current)
m.current$result = mash_compute_posterior_matrices(m.current, data.strong)
# read model
m_ignore = readRDS('../output/WASP/m_ignore_post.rds')
m_simple = readRDS('../output/WASP/m_simple_post.rds')
m_current = readRDS('../output/WASP/m_current_post.rds')
colnames(V.simple) = 0:15
row.names(V.simple) = 0:15
corrplot::corrplot(V.simple, method='color', type='upper', tl.col="black", tl.srt=45, tl.cex = 0.7, diag = FALSE, col=colorRampPalette(c("blue", "white", "red"))(200), cl.lim = c(-1,1), title = 'Simple', mar=c(0,0,5,0))

| Version | Author | Date | 
|---|---|---|
| 71d959e | zouyuxin | 2019-01-06 | 
V.current = readRDS('../output/WASP/currentV.rds')
V.current = V.current$V
colnames(V.current) = 0:15
row.names(V.current) = 0:15
corrplot::corrplot(V.current, method='color', type='upper', tl.col="black", tl.srt=45, tl.cex = 0.7, diag = FALSE, col=colorRampPalette(c("blue", "white", "red"))(200), cl.lim = c(-1,1), title = 'Current', mar=c(0,0,5,0))

| Version | Author | Date | 
|---|---|---|
| 71d959e | zouyuxin | 2019-01-06 | 
tmp = cbind(c(get_loglik(m_ignore), get_loglik(m_simple), get_loglik(m_current)))
row.names(tmp) = c('Ignore', 'Simple', 'Current')
colnames(tmp) = 'log likelihood'
tmp %>% kable() %>% kable_styling()
| log likelihood | |
|---|---|
| Ignore | -442320.6 | 
| Simple | -429581.7 | 
| Current | -428237.2 | 
par(mfrow=c(1,3))
barplot(get_estimated_pi(m_ignore), las=2, cex.names = 0.7, main = 'Ignore')
barplot(get_estimated_pi(m_simple), las=2, cex.names = 0.7, main = 'Simple')
barplot(get_estimated_pi(m_current), las=2, cex.names = 0.7, main = 'Current')

| Version | Author | Date | 
|---|---|---|
| 71d959e | zouyuxin | 2019-01-06 | 
Number of significant:
numsig = c(length(get_significant_results(m_ignore)), 
              length(get_significant_results(m_simple)), 
              length(get_significant_results(m_current)))
tmp = cbind(numsig)
row.names(tmp) = c('Ignore', 'Simple', 'Current')
colnames(tmp) = c('# significance')
tmp %>% kable() %>% kable_styling()
| # significance | |
|---|---|
| Ignore | 5872 | 
| Simple | 2983 | 
| Current | 1617 | 
The intersection of significance results:
length(intersect(get_significant_results(m_simple), get_significant_results(m_current)))
[1] 1526
length(intersect(get_significant_results(m_ignore), get_significant_results(m_simple)))
[1] 2981
length(intersect(get_significant_results(m_current), get_significant_results(m_ignore)))
[1] 1593
stronggene = data.frame(dat$strong.z[739,])
colnames(stronggene) = 'EffectSize'
stronggene$Group = 0:15
stronggene$se = dat$strong.s[739,]
p1 = ggplot(stronggene, aes(y = EffectSize, x = Group)) + 
  geom_point(show.legend = FALSE) + coord_flip() + ggtitle('ENSG00000085491') + ylim(c(-10,-2)) + geom_errorbar(aes(ymin=EffectSize-1.96*se, ymax=EffectSize+1.96*se), width=0.4, show.legend = FALSE) + 
  theme_bw(base_size=12) + theme(axis.text.y = element_text(size = 6))
stronggeneSimple = data.frame(m_simple$result$PosteriorMean[739,])
colnames(stronggeneSimple) = 'EffectSize'
stronggeneSimple$Group = 0:15
stronggeneSimple$se = m_simple$result$PosteriorSD[739,]
p2 = ggplot(stronggeneSimple, aes(y = EffectSize, x = Group)) + 
  geom_point(show.legend = FALSE) + coord_flip() + ggtitle('ENSG00000085491 Simple') + ylim(c(-10,-2)) + 
  geom_errorbar(aes(ymin=EffectSize-1.96*se, ymax=EffectSize+1.96*se), width=0.4, show.legend = FALSE) + 
  theme_bw(base_size=12) + theme(axis.text.y = element_text(size = 6))
stronggeneCurrent = data.frame(m_current$result$PosteriorMean[739,])
colnames(stronggeneCurrent) = 'EffectSize'
stronggeneCurrent$Group = 0:15
stronggeneCurrent$se = m_current$result$PosteriorSD[739,]
p3 = ggplot(stronggeneCurrent, aes(y = EffectSize, x = Group)) + 
  geom_point(show.legend = FALSE) + ylim(c(-10,-2)) + coord_flip() + ggtitle('ENSG00000085491 Current') + 
  geom_errorbar(aes(ymin=EffectSize-1.96*se, ymax=EffectSize+1.96*se), width=0.4, show.legend = FALSE) + 
  theme_bw(base_size=12) + theme(axis.text.y = element_text(size = 6))
grid.arrange(p1, p2, p3, nrow = 1)

| Version | Author | Date | 
|---|---|---|
| 71d959e | zouyuxin | 2019-01-06 | 
The gene significant in simple, not in current
stronggene = data.frame(dat$strong.z[5111,])
colnames(stronggene) = 'EffectSize'
stronggene$Group = 0:15
stronggene$se = dat$strong.s[5111,]
p1 = ggplot(stronggene, aes(y = EffectSize, x = Group)) + 
  geom_point(show.legend = FALSE) + coord_flip() + ggtitle('ENSG00000173473') + ylim(c(-6,3)) + geom_errorbar(aes(ymin=EffectSize-1.96*se, ymax=EffectSize+1.96*se), width=0.4, show.legend = FALSE) + 
  theme_bw(base_size=12) + theme(axis.text.y = element_text(size = 6))
stronggeneSimple = data.frame(m_simple$result$PosteriorMean[5111,])
colnames(stronggeneSimple) = 'EffectSize'
stronggeneSimple$Group = 0:15
stronggeneSimple$se = m_simple$result$PosteriorSD[5111,]
p2 = ggplot(stronggeneSimple, aes(y = EffectSize, x = Group)) + 
  geom_point(show.legend = FALSE) + coord_flip() + ggtitle('ENSG00000173473 Simple') + ylim(c(-6,3)) + 
  geom_errorbar(aes(ymin=EffectSize-1.96*se, ymax=EffectSize+1.96*se), width=0.4, show.legend = FALSE) + 
  theme_bw(base_size=12) + theme(axis.text.y = element_text(size = 6))
stronggeneCurrent = data.frame(m_current$result$PosteriorMean[5111,])
colnames(stronggeneCurrent) = 'EffectSize'
stronggeneCurrent$Group = 0:15
stronggeneCurrent$se = m_current$result$PosteriorSD[5111,]
p3 = ggplot(stronggeneCurrent, aes(y = EffectSize, x = Group)) + 
  geom_point(show.legend = FALSE) + ylim(c(-6,3)) + coord_flip() + ggtitle('ENSG00000173473 Current') + 
  geom_errorbar(aes(ymin=EffectSize-1.96*se, ymax=EffectSize+1.96*se), width=0.4, show.legend = FALSE) + 
  theme_bw(base_size=12) + theme(axis.text.y = element_text(size = 6))
grid.arrange(p1, p2, p3, nrow = 1)

| Version | Author | Date | 
|---|---|---|
| 71d959e | zouyuxin | 2019-01-06 | 
The pairwise sharing by magnitude
x           <- get_pairwise_sharing(m_ignore)
colnames(x) <- 0:15
rownames(x) <- 0:15
clrs=colorRampPalette(rev(c('darkred', 'red','orange','yellow','cadetblue1', 'cyan', 'dodgerblue4', 'blue','darkorchid1','lightgreen','green', 'forestgreen','darkolivegreen')))(200)
corrplot::corrplot(x, method='color', type='upper', tl.col="black", tl.srt=45, tl.cex = 0.7, diag = FALSE, col=clrs, cl.lim = c(0,1), title = 'Ignore', mar=c(0,0,5,0))

| Version | Author | Date | 
|---|---|---|
| 71d959e | zouyuxin | 2019-01-06 | 
x           <- get_pairwise_sharing(m_simple)
colnames(x) <- 0:15
rownames(x) <- 0:15
corrplot::corrplot(x, method='color', type='upper', tl.col="black", tl.srt=45, tl.cex = 0.7, diag = FALSE, col=clrs, cl.lim = c(0,1), title = 'Simple', mar=c(0,0,5,0))

| Version | Author | Date | 
|---|---|---|
| 71d959e | zouyuxin | 2019-01-06 | 
x           <- get_pairwise_sharing(m_current)
colnames(x) <- 0:15
rownames(x) <- 0:15
corrplot::corrplot(x, method='color', type='upper', tl.col="black", tl.srt=45, tl.cex = 0.7, diag = FALSE, col=clrs, cl.lim = c(0,1), title = 'Current', mar=c(0,0,5,0))

| Version | Author | Date | 
|---|---|---|
| 71d959e | zouyuxin | 2019-01-06 | 
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.2
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] gridExtra_2.3     ggplot2_3.1.0     kableExtra_0.9.0  knitr_1.20       
[5] mashr_0.2.19.0555 ashr_2.2-26       mixsqp_0.1-93     flashr_0.6-3     
loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0        mvtnorm_1.0-8     lattice_0.20-35  
 [4] assertthat_0.2.0  rprojroot_1.3-2   digest_0.6.18    
 [7] foreach_1.4.4     truncnorm_1.0-8   R6_2.3.0         
[10] plyr_1.8.4        backports_1.1.2   evaluate_0.12    
[13] httr_1.3.1        highr_0.7         pillar_1.3.1     
[16] rlang_0.3.0.1     lazyeval_0.2.1    pscl_1.5.2       
[19] rstudioapi_0.8    whisker_0.3-2     R.utils_2.7.0    
[22] R.oo_1.22.0       Matrix_1.2-14     rmarkdown_1.10   
[25] labeling_0.3      readr_1.1.1       stringr_1.3.1    
[28] munsell_0.5.0     compiler_3.5.1    pkgconfig_2.0.2  
[31] SQUAREM_2017.10-1 htmltools_0.3.6   tidyselect_0.2.5 
[34] tibble_1.4.2      workflowr_1.1.1   codetools_0.2-15 
[37] viridisLite_0.3.0 crayon_1.3.4      dplyr_0.7.6      
[40] withr_2.1.2       MASS_7.3-50       R.methodsS3_1.7.1
[43] grid_3.5.1        gtable_0.2.0      git2r_0.23.0     
[46] magrittr_1.5      scales_1.0.0      stringi_1.2.4    
[49] reshape2_1.4.3    doParallel_1.0.14 bindrcpp_0.2.2   
[52] xml2_1.2.0        rmeta_3.0         iterators_1.0.10 
[55] tools_3.5.1       glue_1.3.0        softImpute_1.4   
[58] purrr_0.2.5       hms_0.4.2         abind_1.4-5      
[61] parallel_3.5.1    yaml_2.2.0        colorspace_1.3-2 
[64] rvest_0.3.2       corrplot_0.84     bindr_0.1.1      
This reproducible R Markdown analysis was created with workflowr 1.1.1