scMC learns biological variation through the alignment of multiple single-cell genomics datasets

Investigators

Qing Nie

Contact info (email)

qnie@uci.edu

1. Define context(s)

Revealing new biological insights from integrated single-cell genomics datasets

Current Conformance Level / Target Conformance Level

Extensive

Primary goal of the model/tool/database

Distinguishing biological from technical variation is crucial when integrating and comparing single-cell genomics datasets across different experiments. Existing methods lack the capability to explicitly distinguish these two types of variation and often remove both. Here, we present scMC, an integration method that removes technical variation while preserving intrinsic biological variation. scMC learns biological variation via variance analysis, subtracting technical variation that is inferred in an unsupervised manner. Application of scMC to both simulated and real datasets from single-cell RNA-seq and ATAC-seq experiments demonstrates its ability to detect context-shared and context-specific biological signals via accurate alignment.
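To make the idea of subtracting technical variation while keeping biological variation concrete, here is a deliberately simplified toy in Python. It is not scMC's algorithm (scMC is an R package). The sketch assumes two batches whose clusters are already matched, whereas scMC infers this in an unsupervised manner. It estimates a technical direction from the differences between matched cluster centroids by power iteration (a basic variance analysis) and removes the average shift along that direction from the second batch. All function names are hypothetical.

```python
import math

# Toy illustration only: not scMC's algorithm. All names are hypothetical.

def centroid(rows):
    """Per-feature mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def leading_direction(D, iters=200):
    """Unit vector along which the rows of D have the largest uncentered
    second moment, found by power iteration on D^T D."""
    v = centroid(D)  # start from the mean difference vector
    for _ in range(iters):
        Dv = [sum(d * x for d, x in zip(row, v)) for row in D]
        w = [sum(Dv[i] * D[i][j] for i in range(len(D))) for j in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # D v vanished: no consistent technical shift to remove
            break
        v = [x / norm for x in w]
    return v

def remove_technical_shift(X, batch, cluster):
    """Two-batch toy correction. X: list of cell vectors; batch: 0/1 per cell;
    cluster: matched cluster label per cell (assumed known in this sketch)."""
    # Differences between matched cluster centroids carry the technical signal.
    D = []
    for c in sorted(set(cluster)):
        m0 = centroid([x for x, bt, k in zip(X, batch, cluster) if bt == 0 and k == c])
        m1 = centroid([x for x, bt, k in zip(X, batch, cluster) if bt == 1 and k == c])
        D.append([p - q for p, q in zip(m1, m0)])
    t_hat = leading_direction(D)
    # Average technical shift along the estimated direction.
    shift = sum(p * q for p, q in zip(centroid(D), t_hat))
    # Subtract it from batch-1 cells only; batch-0 cells serve as the reference.
    return [
        [xi - shift * ti for xi, ti in zip(x, t_hat)] if bt == 1 else list(x)
        for x, bt in zip(X, batch)
    ]
```

Because the correction is estimated only from differences between matched clusters, differences between cell types within a batch are left untouched. Real batch effects are usually more complex than a single shift, so this sketch only illustrates the principle.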

Biological domain of the model

Single-cell genomics datasets from various tissues

Structure(s) of interest in the model

Integration of multiple datasets

Spatial scales included in the model

N/A

Time scales included in the model

seconds to days

2. Data for building and validating the model

Data for building the model	Published?	Private?	How is credibility checked?	Current Conformance Level / Target Conformance Level
in vitro (primary cells, cell lines, etc.)
ex vivo (excised tissues)
in vivo pre-clinical (lower-level organism or small animal)	Yes	No	The model was built in an unsupervised way on unbiased single-cell genomics data.	Extensive
in vivo pre-clinical (large animal)
Human subjects/clinical	Yes	No	The model was built in an unsupervised way on unbiased single-cell genomics data.	Extensive
Other: ________________________

Data for validating the model	Published?	Private?	How is credibility checked?	Current Conformance Level / Target Conformance Level
in vitro (primary cells, cell lines, etc.)
ex vivo (excised tissues)
in vivo pre-clinical (lower-level organism or small animal)	Yes	No	By comparing to existing knowledge and ground truth.	Extensive
in vivo pre-clinical (large animal)
Human subjects/clinical	Yes	No	By comparing to existing knowledge and ground truth.	Extensive
Other: ________________________

3. Validate within context(s)

	Who does it?	When does it happen?	How is it done?	Current Conformance Level / Target Conformance Level
Verification	Postdocs/investigators	Throughout the project	The convergence of the core algorithms is verified.	Extensive
Validation	Postdocs/investigators	After the unsupervised model was established	1) The integrated data are validated against ground truth. 2) The data imputed through integration are evaluated by downstream analysis.	Extensive
Uncertainty quantification
Sensitivity analysis
Other:__________
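The form does not name the metric used to validate integrated data against ground truth. One common choice for scoring cluster assignments against known cell-type labels is the adjusted Rand index (ARI), available in scikit-learn as `adjusted_rand_score`. The following is a minimal, dependency-free Python sketch of that standard pair-counting formula; whether scMC's benchmarks used ARI is an assumption here, and the function name is hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells
    (1.0 = identical partitions up to renaming, about 0 = chance agreement)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    pairs = comb(len(labels_true), 2)
    if pairs == 0:
        return 1.0  # fewer than two cells: nothing to compare
    # Pairs of cells grouped together in both labelings (contingency table cells).
    sum_ij = sum(comb(n, 2) for n in Counter(zip(labels_true, labels_pred)).values())
    # Pairs grouped together in each labeling separately (row and column sums).
    sum_a = sum(comb(n, 2) for n in Counter(labels_true).values())
    sum_b = sum(comb(n, 2) for n in Counter(labels_pred).values())
    expected = sum_a * sum_b / pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all cells in one group
    return (sum_ij - expected) / (max_index - expected)
```

For example, `adjusted_rand_index(cell_type_labels, integrated_cluster_labels)` would score how well clusters found after integration recover annotated cell types; both argument names are placeholders.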
Additional Comments

4. Limitations

Disclaimer statement (explain key limitations)	Who needs to know about this disclaimer?	How is this disclaimer shared with that audience?	Current Conformance Level / Target Conformance Level
When integrating time-course datasets, scMC does not use the temporal information.	Members of the scientific community who intend to apply this method to time-course scRNA-seq data with similar cell types across time points.	In the discussion section of the paper.	Adequate

5. Version control

Current Conformance Level / Target Conformance Level
Extensive

	Naming Conventions?	Repository?	Code Review?
individual modeler	Yes	Yes	Peer
within the lab	Yes	Yes	Peer
collaborators

6. Documentation

	Current Conformance Level / Target Conformance Level
Code commented?	Adequate
Scope and intended use described?	Extensive
User’s guide?	Extensive
Developer’s guide?	Adequate

7. Dissemination

Current Conformance Level / Target Conformance Level
Extensive

Target Audience(s):	“Inner circle”	Scientific community	Public
Simulations
Models
Software	R package and tutorials: https://github.com/amsszlh/scMC	R package and tutorials: https://github.com/amsszlh/scMC	R package and tutorials: https://github.com/amsszlh/scMC
Results	Paper and GitHub repository: https://github.com/amsszlh/scMC	Paper and GitHub repository: https://github.com/amsszlh/scMC	Paper and GitHub repository: https://github.com/amsszlh/scMC
Implications of results

8. Independent reviews

Current Conformance Level / Target Conformance Level
Insufficient

Reviewer(s) name & affiliation:
When was review performed?
How was review performed and outcomes of the review?

9. Test competing implementations

Current Conformance Level / Target Conformance Level
Extensive

	Yes or No (briefly summarize)
Were competing implementations tested?	Yes. The method has been compared to several other commonly used methods on benchmark datasets.
Did this lead to model refinement or improvement?	No

10. Conform to standards

Current Conformance Level / Target Conformance Level
Extensive

	Yes or No (briefly summarize)
Are there operating procedures, guidelines, or standards for this type of multiscale modeling?	Yes. There are several standard procedures for preprocessing scRNA-seq data.
How do your modeling efforts conform?	Common data preprocessing procedures are followed.
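The form states only that common preprocessing procedures are followed. A widely used step for scRNA-seq count data is library-size normalization followed by a log transform, as in Seurat's "LogNormalize" method, which by default scales each cell to 10,000 total counts before applying log(1 + x). The Python sketch below shows that step under the assumption that it is representative; it is not taken from the scMC tutorials, and the function name is hypothetical.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Library-size normalization followed by log(1 + x).
    counts: list of cells, each a list of raw UMI/read counts per gene."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # Empty cell: leave all genes at zero rather than divide by zero.
            normalized.append([0.0] * len(cell))
            continue
        # Rescale so every cell sums to scale_factor, then natural-log transform.
        normalized.append([math.log1p(c / total * scale_factor) for c in cell])
    return normalized
```

Normalizing by total counts removes differences in sequencing depth between cells, so the profiles `[1, 3]` and `[2, 6]` map to the same normalized values.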