Last updated: 2018-05-12
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed:
set.seed(20180510)
The command set.seed(20180510)
was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: 4c2def7
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Untracked files:
Untracked: .DS_Store
Untracked: code/ComplieData.R
Untracked: code/Microarray_All_GLM_IBD.R
Untracked: code/Microarray_Sep_GLM.R
Untracked: data/.DS_Store
Untracked: data/raw_data/
Untracked: data/results/.DS_Store
Untracked: data/results/Microarray_AAD_metaanalysis.rds
Untracked: data/results/Microarray_ASD_metaanalysis.rds
Untracked: data/results/Microarray_BD_metaanalysis.rds
Untracked: data/results/Microarray_MDD_metaanalysis.rds
Untracked: data/results/Microarray_SCZ_metaanalysis.rds
Untracked: data/results/Microarray_compiledGLM_IBD.rds
Untracked: data/results/control/
Untracked: data/results/tables/
Untracked: data/working_data/
Untracked: output/MashCB_EE_Cov_IBD.rds
Untracked: output/MashCB_model_EE_IBD.rds
Untracked: output/MashControl_EE_Cov.rds
Untracked: output/MashControl_model_EE.rds
Unstaged changes:
Modified: _workflowr.yml
Modified: code/Microarray_All_GLM.R
Modified: data/results/Microarray_compiledGLM.rds
Modified: output/MashCB_EE_Cov.rds
Modified: output/MashCB_model_EE.rds
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
library(limma); library(mashr); library(mclust); library(plyr);
Loading required package: ashr
Package 'mclust' version 5.4
Type 'citation("mclust")' for citing this R package in publications.
Attaching package: 'mclust'
The following object is masked from 'package:ashr':
dens
library(flashr); library(colorRamps); library(corrplot); library(ggplot2)
corrplot 0.84 loaded
data = readRDS('../data/results/control/CompiledData.rds')
mash.data = mash_set_data(Bhat = data$beta, pval = data$p)
Top genes:
# find strong genes
m.1by1 = mash_1by1(mash.data, alpha=0)
strong = get_significant_results(m.1by1)
Z = mash.data$Bhat/mash.data$Shat
Z.strong = Z[strong,]
# center
Z.center = apply(Z.strong, 2, function(x) x - mean(x))
Data Driven:
Flash:
flash.data = flash_set_data(Z.center)
fmodel = flash(flash.data, greedy = TRUE, backfit = TRUE)
fitting factor/loading 1
fitting factor/loading 2
fitting factor/loading 3
fitting factor/loading 4
factors = flash_get_ldf(fmodel)$f
row.names(factors) = colnames(data$beta)
pve.order = order(flash_get_pve(fmodel), decreasing = TRUE)
par(mfrow=c(1,3))
for(i in pve.order){
barplot(factors[,i], main=paste0('Factor ',i, ' pve= ', round(flash_get_pve(fmodel)[i],3)), las=2, cex.names = 0.7)
}
Version | Author | Date |
---|---|---|
c74944f | zouyuxin | 2018-05-11 |
par(mfrow=c(1,1))
flash
on the loading:
loading = fmodel$EL[,1:3]
colnames(loading) = paste0('Factor',seq(1,3))
flash.loading = flash_set_data(loading)
flmodel = flash(flash.loading, greedy = TRUE, backfit = TRUE)
fitting factor/loading 1
The flash prefers the rank 0 model. There is no hidden structure in the loading matrix.
Cluster loadings:
mod = Mclust(loading)
summary(mod$BIC)
Best BIC values:
VVE,6 VVE,7 EVE,6
BIC -34778.07 -34815.86606 -34816.67977
BIC diff 0.00 -37.79266 -38.60637
Using clustering result to fit mash
:
\[l_{i}\sim \sum_{j=1}^{m}N(\mu_{j}, \Sigma_{j})\] We estimate the covariance as \(F(\Sigma_j + \mu_{j}\mu_{j}')F'\).
U_list = alply(mod$parameters$variance$sigma,3)
mu_list = alply(mod$parameters$mean,2)
ll = list()
for (i in 1:length(U_list)){
ll[[i]] = U_list[[i]] + mu_list[[i]] %*% t(mu_list[[i]])
}
Factors = fmodel$EF[,1:3]
U.loading = lapply(ll, function(U){Factors %*% (U %*% t(Factors))})
names(U.loading) = paste0('Load', "_", (1:length(U.loading)))
# rank 1
Flash_res = flash_get_lf(fmodel)
U.Flash = c(mashr::cov_from_factors(t(as.matrix(factors)), "Flash"),
list("tFlash" = t(Flash_res) %*% Flash_res / nrow(Z.center)))
PCA:
U.pca = cov_pca(mash_set_data(Z.center), 3)
Canonical
U.c = cov_canonical(mash_set_data(Z.center))
Extreme Deconvolution
U.dd = c(U.pca, U.loading, U.Flash, list('XX' = t(Z.center) %*% Z.center / nrow(Z.center)))
U.ed = cov_ed(mash.data, U.dd, strong)
mash.model = mash(mash.data, c(U.c, U.ed))
- Computing 11200 x 651 likelihood matrix.
- Likelihood calculations took 7.98 seconds.
- Fitting model with 651 mixture components.
Warning in REBayes::KWDual(A, rep(1, k), normalize(w), control = control): estimated mixing distribution has some negative values:
consider reducing rtol
Warning in mixIP(matrix_lik = structure(c(0.000549917574123474,
0.107339052191479, : Optimization step yields mixture weights that are
either too small, or negative; weights have been corrected and renormalized
after the optimization.
- Model fitting took 19.68 seconds.
- Computing posterior matrices.
- Computation allocated took 0.69 seconds.
The log-likelihood of fit is
get_loglik(mash.model)
[1] 74461.66
Here is a plot of weights learned:
options(repr.plot.width=12, repr.plot.height=4)
barplot(get_estimated_pi(mash.model), las = 2, cex.names = 0.7)
Version | Author | Date |
---|---|---|
c74944f | zouyuxin | 2018-05-11 |
Check XX covariance matrix:
x <- mash.model$fitted_g$Ulist[["ED_XX"]]
colnames(x) <- colnames(data$beta)
rownames(x) <- colnames(x)
corrplot(x, method='color', cl.lim=c(-0.2,1), type='upper', addCoef.col = "black", tl.col="black", tl.srt=45, col=colorRampPalette(c("blue","white","red"))(200))
Version | Author | Date |
---|---|---|
c74944f | zouyuxin | 2018-05-11 |
layout(matrix(c(1,2,3,4), 2, 2, byrow=TRUE))
svd.out = svd(mash.model$fitted_g$Ulist[["ED_XX"]])
v = svd.out$v
colnames(v) = colnames(data$beta)
rownames(v) = colnames(v)
options(repr.plot.width=10, repr.plot.height=5)
for (j in 1:4)
barplot(v[,j]/v[,j][which.max(abs(v[,j]))], cex.names = 0.7,
las = 2, main = paste0("EigenVector ", j, " for XX"))
Version | Author | Date |
---|---|---|
c74944f | zouyuxin | 2018-05-11 |
There are 4504 diferentially expressed genes.
Check pairwise sharing by sign:
x = get_pairwise_sharing(mash.model, factor=0)
x[x > 1] <- 1
x[x < -1] <- -1
colnames(x) <- colnames(data$beta)
rownames(x) <- colnames(x)
corrplot.mixed(x, tl.pos="d",upper='color', cl.lim=c(0,1), upper.col=colorRampPalette(rev(c("#D73027","#FC8D59","#FEE090","#FFFFBF",
"#E0F3F8","#91BFDB","#4575B4")))(40),
tl.cex=1.2)
Version | Author | Date |
---|---|---|
c74944f | zouyuxin | 2018-05-11 |
Check pairwise sharing by magnitude and sign:
x = get_pairwise_sharing(mash.model)
x[x > 1] <- 1
x[x < -1] <- -1
colnames(x) <- colnames(data$beta)
rownames(x) <- colnames(x)
corrplot.mixed(x, tl.pos="d",upper='color', cl.lim=c(0,1), upper.col=colorRampPalette(rev(c("#D73027","#FC8D59","#FEE090","#FFFFBF",
"#E0F3F8","#91BFDB","#4575B4")))(40),
tl.cex=1.2)
Version | Author | Date |
---|---|---|
c74944f | zouyuxin | 2018-05-11 |
CompareSCZ = diag(5)
CompareSCZ[,2] = -1
CompareSCZ = CompareSCZ[-2,]
row.names(CompareSCZ) = colnames(data$beta)[-2]
mash.model.SCZ = mash.model
mash.model.SCZ$result = mash_compute_posterior_matrices(mash.model, mash.data, A=CompareSCZ, algorithm.version = 'R')
Check pairwise sharing by sign:
x = get_pairwise_sharing(mash.model.SCZ, factor=0)
x[x > 1] <- 1
x[x < -1] <- -1
colnames(x) <- row.names(CompareSCZ)
rownames(x) <- colnames(x)
corrplot.mixed(x, tl.pos="d",upper='color', cl.lim=c(0,1), upper.col=colorRampPalette(rev(c("#D73027","#FC8D59","#FEE090","#FFFFBF",
"#E0F3F8","#91BFDB","#4575B4")))(40),
tl.cex=1.2)
Check pairwise sharing by magnitude and sign:
x = get_pairwise_sharing(mash.model.SCZ)
x[x > 1] <- 1
x[x < -1] <- -1
colnames(x) <- row.names(CompareSCZ)
rownames(x) <- colnames(x)
corrplot.mixed(x, tl.pos="d",upper='color', cl.lim=c(0,1), upper.col=colorRampPalette(rev(c("#D73027","#FC8D59","#FEE090","#FFFFBF",
"#E0F3F8","#91BFDB","#4575B4")))(40),
tl.cex=1.2)
sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_2.2.1 corrplot_0.84 colorRamps_2.3 flashr_0.5-6
[5] plyr_1.8.4 mclust_5.4 mashr_0.2-6 ashr_2.2-7
[9] limma_3.34.9
loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 pillar_1.2.2
[3] compiler_3.4.4 git2r_0.21.0
[5] workflowr_1.0.1 R.methodsS3_1.7.1
[7] R.utils_2.6.0 iterators_1.0.9
[9] tools_3.4.4 digest_0.6.15
[11] tibble_1.4.2 gtable_0.2.0
[13] evaluate_0.10.1 lattice_0.20-35
[15] rlang_0.2.0 Matrix_1.2-14
[17] foreach_1.4.4 yaml_2.1.19
[19] parallel_3.4.4 mvtnorm_1.0-7
[21] ebnm_0.1-11 stringr_1.3.0
[23] knitr_1.20 REBayes_1.3
[25] rprojroot_1.3-2 grid_3.4.4
[27] rmarkdown_1.9 rmeta_3.0
[29] magrittr_1.5 whisker_0.3-2
[31] scales_0.5.0 backports_1.1.2
[33] codetools_0.2-15 htmltools_0.3.6
[35] MASS_7.3-50 assertthat_0.2.0
[37] softImpute_1.4 colorspace_1.3-2
[39] stringi_1.2.2 Rmosek_8.0.69
[41] lazyeval_0.2.1 munsell_0.4.3
[43] doParallel_1.0.11 pscl_1.5.2
[45] truncnorm_1.0-8 SQUAREM_2017.10-1
[47] ExtremeDeconvolution_1.3 R.oo_1.22.0
This reproducible R Markdown analysis was created with workflowr 1.0.1