Last updated: 2018-10-20
workflowr checks: (Click a bullet for more information) ✔ R Markdown file: up-to-date
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Repository version: e6c9242
wflow_publish
or wflow_git_commit
). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.Rhistory
Ignored: docs/.DS_Store
Ignored: docs/figure/Test.Rmd/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | e6c9242 | zouyuxin | 2018-10-20 | wflow_publish(“analysis/mouthwash.Rmd”) |
html | b91f3b0 | zouyuxin | 2018-10-19 | Build site. |
Rmd | 6d740f4 | zouyuxin | 2018-10-19 | wflow_publish(“analysis/mouthwash.Rmd”) |
html | ee0595d | zouyuxin | 2018-10-19 | Build site. |
Rmd | bd82922 | zouyuxin | 2018-10-19 | wflow_publish(“analysis/mouthwash.Rmd”) |
html | 6e9ef9c | zouyuxin | 2018-10-19 | Build site. |
Rmd | fcddf6f | zouyuxin | 2018-10-19 | wflow_publish(“analysis/mouthwash.Rmd”) |
html | 8d6944f | zouyuxin | 2018-10-19 | Build site. |
Rmd | a3ea46e | zouyuxin | 2018-10-19 | wflow_publish(“analysis/mouthwash.Rmd”) |
The factor augmented regression model is:
\[ \mathbf{Y}_{N\times J} = \mathbf{X}_{N\times k}\boldsymbol{\beta}_{k\times J} + \mathbf{Z}_{N\times q}\boldsymbol{\alpha}_{q\times J} + \mathbf{E}_{N\times J} \]
where \(y_{ij}\) is the normalized expression level of gene j in sample i; \(\mathbf{X}\) is a matrix containing observed covariates, with \(\boldsymbol{\beta}\) a matrix of corresponding effects; \(\mathbf{Z}\) is a matrix of unobserved factors causing unwanted variation, with \(\boldsymbol{\alpha}\) a matrix of corresponding effects; and \(\mathbf{E}\) has independent (Gaussian) errors with means 0 and column-specific variances var(\(e_{ij}\)) = \(\sigma_{j}^{2}\). Only \(\mathbf{Y}\) and \(\mathbf{X}\) are known; other quantities are to be estimated.
Let’s assume k = 1 for easy understanding \[ \mathbf{Y}_{N\times J} = \mathbf{x}_{N\times 1}\boldsymbol{\beta}_{1\times J} + \mathbf{Z}_{N\times q}\boldsymbol{\alpha}_{q\times J} + \mathbf{E}_{N\times J}\\ \mathbf{y}_{\cdot j} = \mathbf{x}\beta_{j} + \mathbf{Z}\boldsymbol{\alpha}_{j} + \mathbf{e}_{j} \]
We can decompose \(\mathbf{x}\) using QR decomposition, \(\mathbf{x} = \mathbf{Q}_{N\times N}R_{N\times 1}\), Q is an orthogonal matrix, R = \((r, 0, \cdots, 0)^{T}\) \[ \mathbf{Q}^{T}\mathbf{Y}_{N\times J} = \mathbf{R}_{N\times 1}\boldsymbol{\beta}_{1\times J} + \mathbf{Q}^{T}\mathbf{Z}_{N\times q}\boldsymbol{\alpha}_{q\times J} + \mathbf{Q}^{T}\mathbf{E}_{N\times J} \\ \tilde{\mathbf{Y}}_{N\times J} = \mathbf{R}_{N\times 1}\boldsymbol{\beta}_{1\times J} + \tilde{\mathbf{Z}}_{N\times q}\boldsymbol{\alpha}_{q\times J} + \tilde{\mathbf{E}}_{N\times J} \]
\[ \mathbf{y}_{1}^{T} = r\boldsymbol{\beta}_{1\times J} + \tilde{\mathbf{z}}_{1\times q}\boldsymbol{\alpha}_{q\times J} + \tilde{\mathbf{e}}_{1\times J} \\ \mathbf{Y}_{-1}^{T} = \tilde{\mathbf{Z}}_{N-1\times q}\boldsymbol{\alpha}_{q\times J} + \tilde{\mathbf{E}}_{N-1\times J} \\ \]
If we estimate \(\boldsymbol{\beta}_{1\times J}\) by \(\mathbf{y}_{1}^{T}/r\), it is same as the OLS estimates of \(\beta_{j}\) obtained by regressing column \(\mathbf{y}_{j}\) on \(\mathbf{x}\).
For each gene j (\(j = 1, \cdots, J\)) obtain an initial estimate \(\hat{\beta}_{j}\) for \(\beta_{j}\) ignoring unwanted variation by using ordinary least squares (OLS) regression of the jth column of Y on X. \[ \hat{\beta}_{j} = \frac{\mathbf{x}^{T}\mathbf{y}_{\cdot j} }{\mathbf{x}^{T}\mathbf{x}} \] We have \(\hat{\boldsymbol{\beta}} \in \mathbb{R}^{J}\).
Perform FA on the residuals \(\tilde{\mathbf{Y}} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}\). The FA model assumes column specific variance \(\sigma_{j}^{2}\) \[ \tilde{\mathbf{Y}} = \tilde{\mathbf{Z}}\tilde{\boldsymbol{\alpha}} + \tilde{\mathbf{E}} \] We obtain \(\hat{\boldsymbol{\alpha}} \in \mathbb{R}^{q\times J}\) and \(\hat{\sigma}_{j}^{2}\).
A simplified model: \[ \hat{\beta}_{j} \sim N(\beta_{j} + \hat{\boldsymbol{\alpha}}_{j}^{T}\mathbf{z}, \xi\hat{s}_{j}^{2}) \] \(\mathbf{z} \in \mathbb{R}^{q}\), \(\mathbf{S} = diag(\hat{s}_{1}^{2}, \cdots, \hat{s}_{J}^{2})\) \[ \hat{s}_{j}^{2} = \frac{\hat{\sigma}_{j}^{2}}{\mathbf{x}^{T}\mathbf{x}} \] The interpretation for the model above: the OLS estimates \(\hat{\boldsymbol{\beta}}\) are equal to the true coefficients (\(\boldsymbol{\beta}\)) plus a bias term due to unwanted variation (\(\hat{\boldsymbol{\alpha}}\), \(\mathbf{z}\)) plus some noise (\(N_J(0, \mathbf{S})\)). That is \(\mathbf{z}\) can be interpreted as capturing the effect of the unwanted variation on the OLS estimates. \ Estimate the unimodel effects distribution g and the unwanted variation effects \(\mathbf{z}\), multiplicative parameter \(\xi\) by maximum (marginal) likelihood \[ \begin{align*} (\hat{g}, \hat{\mathbf{z}}, \hat{\xi}) &= \arg \max_{(g,\mathbf{z}, \xi)\in \mathcal{U}\times \mathbb{R}^{q} \times \mathbb{R}^{+}} p(\hat{\boldsymbol{\beta}}|g,\mathbf{z},\hat{\boldsymbol{\alpha}}, \hat{\mathbf{s}})\\ &= \arg \max_{(g,\mathbf{z}, \xi)\in \mathcal{U}\times \mathbb{R}^{q} \times \mathbb{R}^{+}} \prod_{j=1}^{J}\int_{\beta_{j}} N(\hat{\beta}_{j}|\beta_{j}+\hat{\boldsymbol{\alpha}_{j}}^{T}\mathbf{z}, \xi\hat{s}_{j}^{2}) g(d\beta_{j}) \end{align*} \] Suppose \[ g(\beta_{j}) \sim \pi_{0}\delta_{0} + \sum_{m=1}^{M} \pi_{m}N(\beta_{j}; 0, \sigma_{m}^{2}) \] \[ (\hat{g}, \hat{\mathbf{z}}, \hat{\xi}) = \arg \max_{(g,\mathbf{z}, \xi)\in \mathcal{U}\times \mathbb{R}^{q}\times \mathbb{R}^{+}} \prod_{j=1}^{J} \sum_{m=0}^{M} \pi_{m} N(\hat{\beta}_{j}|\hat{\boldsymbol{\alpha}_{j}}^{T}\mathbf{z}, \xi\hat{s}_{j}^{2} +\sigma_{m}^{2}) \]
Compute posterior distributions \(p(\beta_{j}|\hat{g}, \hat{\mathbf{z}}, \hat{\boldsymbol{\beta}}, \hat{\mathbf{s}}), \hat{\xi}\), and return posterior summaries.
This reproducible R Markdown analysis was created with workflowr 1.1.1