1. Define context(s)
Reveal new biological insights
Primary goal of the model/tool/database
Single-cell technologies provide an unprecedented opportunity to explore the heterogeneity in a biological process at the level of single cells. One major challenge in analyzing single-cell data is to identify cell subpopulations, stable cell states, and cells in transition between states. To elucidate the transition mechanisms in cell fate dynamics, it is highly desirable to quantitatively characterize cellular states and intermediate states. Here, we present scRCMF, an unsupervised method that identifies stable cell states and transition cells by adopting a nonlinear optimization model that infers the latent substructures from a gene-cell matrix. We incorporate a random coefficient matrix-based regularization into the standard nonnegative matrix decomposition model to improve the reliability and stability of estimating latent substructures. To quantify the transition capability of each cell, we propose two new measures: single-cell transition entropy (scEntropy) and transition probability (scTP). When applied to two simulated and three published scRNA-seq datasets, scRCMF not only successfully captures multiple subpopulations and transition processes in large-scale data, but also identifies transition states and some known marker genes associated with cell state transitions and subpopulations. Furthermore, scEntropy is found to be significantly higher for transition cells than for other cellular states during global differentiation, and scTP predicts the “fate decisions” of transition cells within the transition. The present study provides new insights into transition events during differentiation and development.
Biological domain of the model
scRNA-seq data of various tissues
Structure(s) of interest in the model
Cell (sub)populations, transition states
Spatial scales included in the model
cellular to tissue
Time scales included in the model
seconds to weeks
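The model description above rests on a nonnegative decomposition of a gene-cell matrix and on an entropy score per cell. The following is a minimal Python sketch of that general idea only, not the scRCMF implementation (which is distributed as a MATLAB package). It assumes plain nonnegative matrix factorization with multiplicative updates and omits the random coefficient matrix-based regularization. It also assumes that a cell's transition entropy can be illustrated by the Shannon entropy of its normalized loadings, which may differ from the actual scEntropy definition. The function names `nmf` and `cell_entropy` are hypothetical.

```python
import numpy as np


def nmf(X, k, n_iter=200, seed=0, eps=1e-10):
    """Plain NMF (assumed stand-in): X (genes x cells) ~= W (genes x k) @ H (k x cells)."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_cells)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius-norm objective;
        # they keep W and H nonnegative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cell_entropy(H, eps=1e-12):
    """Shannon entropy of each cell's loadings after normalizing each column to sum to 1."""
    P = H / (H.sum(axis=0, keepdims=True) + eps)
    return -(P * np.log(P + eps)).sum(axis=0)


# Toy gene-cell matrix: two expression programs, with the middle cells mixing both.
rng = np.random.default_rng(0)
programs = rng.random((50, 2))                      # genes x 2 latent programs
mixing = np.array([[1.0, 1.0, 1.0, 0.5, 0.5, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 0.5, 0.5, 1.0, 1.0, 1.0]])
X = programs @ mixing                               # genes x cells
W, H = nmf(X, k=2)
print(np.round(cell_entropy(H), 2))
```

In this toy setup one would hope to see higher entropy for the mixed cells than for the pure ones, but NMF solutions are not unique, so the printed values depend on the initialization and should be read as illustrative only.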
2. Data for building and validating the model
Data for building the model |
Published? |
Private? |
How is credibility checked? |
Current Conformance Level / Target Conformance Level |
in vitro (primary cells, cell lines, etc.) |
|
|
|
|
ex vivo (excised tissues) |
|
|
|
|
in vivo pre-clinical (lower-level organism or small animal) |
|
|
|
|
in vivo pre-clinical (large animal) |
Yes |
No |
The model was built in an unsupervised way on unbiased single-cell RNA sequencing data. |
Extensive |
Human subjects/clinical |
Yes |
No |
The model was built in an unsupervised way on unbiased single-cell RNA sequencing data. |
Extensive |
Other: ________________________ |
|
|
|
|
Data for validating the model |
Published? |
Private? |
How is credibility checked? |
Current Conformance Level / Target Conformance Level |
in vitro (primary cells, cell lines, etc.) |
|
|
|
|
ex vivo (excised tissues) |
|
|
|
|
in vivo pre-clinical (lower-level organism or small animal) |
|
|
|
|
in vivo pre-clinical (large animal) |
Yes |
No |
By comparing the model-determined developmental trajectory and transition states to prior knowledge. |
Adequate |
Human subjects/clinical |
Yes |
No |
By comparing the model-determined developmental trajectory and transition states to prior knowledge. |
Adequate |
Other: ________________________ |
|
|
|
|
3. Validate within context(s)
|
Who does it? |
When does it happen? |
How is it done? |
Current Conformance Level / Target Conformance Level |
Verification |
Students/postdocs/investigators |
Throughout the project |
1) Convergence of the solution is guaranteed by formal theoretical analysis. 2) The clustering result agrees with prior knowledge of the number of cell types. 3) The identified transition states are visually verified in low-dimensional plots. |
Extensive |
Validation |
Students/postdocs/investigators |
As the unsupervised model was established |
1) The clustering results and identified transition states are validated using reference cell type assignments. 2) The method is validated using simulated datasets with known ground truth. 3) The method was compared to several other popular methods on several clustering benchmarks and achieved top performance. |
Extensive |
Uncertainty quantification |
|
|
|
|
Sensitivity analysis |
Students/postdocs/investigators |
As the unsupervised model was established |
By tuning key parameters and comparing to annotated data. |
Adequate |
Other:__________ |
|
|
|
|
Additional Comments |
|
|
|
|
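The validation entries above compare clustering results against reference cell type assignments and simulated ground truth. One common way to score such agreement is the adjusted Rand index (ARI). The form does not state which metric was actually used, so the following is only a hedged Python sketch of how such a check could look, with the hypothetical function name `adjusted_rand_index`.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: treat as trivially identical

    # Pairs of cells grouped together in both labelings, and in each one alone.
    joint = Counter(zip(labels_true, labels_pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs   # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_joint - expected) / (max_index - expected)


# Cluster names do not matter, only which cells are grouped together.
reference = ["A", "A", "B", "B"]
predicted = [1, 1, 0, 0]
print(adjusted_rand_index(reference, predicted))
```

A value of 1 means the two partitions are identical up to relabeling, while values near 0 indicate agreement no better than chance.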
4. Limitations
Disclaimer statement (explain key limitations) |
Who needs to know about this disclaimer? |
How is this disclaimer shared with that audience? |
Current Conformance Level / Target Conformance Level |
Technical noise in single-cell RNA sequencing data may cause inaccurate results. |
Members of the scientific community who intend to apply this method to raw scRNA-seq data. |
|
Adequate |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
5. Version control
Current Conformance Level / Target Conformance Level |
Extensive |
|
Naming Conventions? |
Repository? |
Code Review? |
individual modeler |
Yes |
Yes |
Peer |
within the lab |
Yes |
Yes |
Peer |
collaborators |
Yes |
Yes |
Peer |
6. Documentation
|
Current Conformance Level / Target Conformance Level |
Code commented? |
Extensive |
Scope and intended use described? |
Extensive |
User’s guide? |
Extensive |
Developer’s guide? |
Partial |
7. Dissemination
Current Conformance Level / Target Conformance Level |
Extensive |
Target Audience(s): |
“Inner circle” |
Scientific community |
Public |
Simulations |
|
|
|
Models |
|
|
|
Software |
MATLAB package: https://github.com/XiaoyingZheng121/scRCMF |
MATLAB package: https://github.com/XiaoyingZheng121/scRCMF |
|
Results |
Shared folders |
Paper and tutorials |
|
Implications of results |
|
|
|
8. Independent reviews
Current Conformance Level / Target Conformance Level |
Insufficient |
Reviewer(s) name & affiliation: |
|
When was review performed? |
|
How was review performed and outcomes of the review? |
|
9. Test competing implementations
Current Conformance Level / Target Conformance Level |
Adequate |
|
Yes or No (briefly summarize) |
Were competing implementations tested? |
Yes. The method has been compared to several other commonly used methods on benchmark datasets. |
Did this lead to model refinement or improvement? |
Yes |