1. Define context(s)
Reveal new biological insights
Primary goal of the model/tool/database
Characterizing genome-wide binding profiles of transcription factors (TFs) is essential for understanding biological processes. Although techniques have been developed to assess binding profiles within a population of cells, determining them at a single-cell level remains elusive. Here, we report scFAN (single-cell factor analysis network), a deep learning model that predicts genome-wide TF binding profiles in individual cells. scFAN is pretrained on genome-wide bulk assay for transposase-accessible chromatin sequencing (ATAC-seq), DNA sequence, and chromatin immunoprecipitation sequencing (ChIP-seq) data and uses single-cell ATAC-seq to predict TF binding in individual cells. We demonstrate the efficacy of scFAN by both studying sequence motifs enriched within predicted binding peaks and using predicted TFs for discovering cell types. We develop a new metric “TF activity score” to characterize each cell and show that activity scores can reliably capture cell identities. scFAN allows us to discover and study cellular identities and heterogeneity based on chromatin accessibility profiles.
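The abstract above mentions a per-cell "TF activity score" but does not define it. The sketch below shows one plausible way to turn per-peak TF binding probabilities for a single cell into a per-TF score. The aggregation rule (the fraction of peaks where a TF ranks among the `top_k` predictions), the function name `tf_activity_scores`, and the toy values are all assumptions made for illustration, not scFAN's published definition or API.

```python
from collections import Counter


def tf_activity_scores(peak_probs, tf_names, top_k=2):
    """Aggregate per-peak TF binding probabilities for ONE cell.

    peak_probs: list of rows, one per accessible peak; each row holds one
                predicted probability per TF, in the same order as tf_names.
    Returns a dict mapping TF name -> fraction of peaks in which that TF is
    among the top_k highest-probability predictions.
    NOTE: this aggregation rule is a hypothetical illustration.
    """
    if not peak_probs:
        return {name: 0.0 for name in tf_names}
    counts = Counter()
    for row in peak_probs:
        if len(row) != len(tf_names):
            raise ValueError("each peak needs one probability per TF")
        # Indices of TFs for this peak, highest probability first.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        for i in ranked[:top_k]:
            counts[tf_names[i]] += 1
    n_peaks = len(peak_probs)
    return {name: counts[name] / n_peaks for name in tf_names}


# Toy example: 3 peaks, 3 TFs (probabilities are made up).
tfs = ["CTCF", "GATA1", "SPI1"]
probs = [
    [0.9, 0.2, 0.1],
    [0.8, 0.1, 0.7],
    [0.7, 0.6, 0.5],
]
scores = tf_activity_scores(probs, tfs, top_k=2)
```

Stacking such score vectors across cells would give a cell-by-TF matrix that could be clustered to look for cell types, which is the use the abstract describes.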
Biological domain of the model
ATAC-seq, DNA sequencing, and ChIP-seq of various tissues
Structure(s) of interest in the model
transcription factor binding
Spatial scales included in the model
N/A
Time scales included in the model
seconds to days
2. Data for building and validating the model
Data for building the model | Published? | Private? | How is credibility checked? | Current Conformance Level / Target Conformance Level |
in vitro (primary cells, cell lines, etc.) | | | | |
ex vivo (excised tissues) | | | | |
in vivo pre-clinical (lower-level organism or small animal) | Yes | No | The model was built in an unsupervised way on unbiased input data. | Extensive |
in vivo pre-clinical (large animal) | | | | |
Human subjects/clinical | Yes | No | The model was built in an unsupervised way on unbiased input data. | Extensive |
Other: ________________________ | | | | |
Data for validating the model | Published? | Private? | How is credibility checked? | Current Conformance Level / Target Conformance Level |
in vitro (primary cells, cell lines, etc.) | | | | |
ex vivo (excised tissues) | | | | |
in vivo pre-clinical (lower-level organism or small animal) | Yes | No | By comparing results to existing knowledge, assessing the quality of the resulting clusters, and evaluating the ability to alleviate batch effects. | Adequate |
in vivo pre-clinical (large animal) | | | | |
Human subjects/clinical | Yes | No | By comparing results to existing knowledge, assessing the quality of the resulting clusters, and evaluating the ability to alleviate batch effects. | Adequate |
Other: ________________________ | | | | |
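The validation rows above cite cluster quality as a credibility check without naming a metric. One common choice when reference cell-type labels exist is the adjusted Rand index (ARI). The sketch below computes it with the standard library only. Using ARI here is an assumption for illustration; the form does not say which metric was actually used.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells.

    1.0 means identical partitions (up to label renaming); values near 0
    mean agreement no better than chance.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # fewer than two cells: partitions trivially agree

    # Pairs of cells that share a cluster in both, in truth, and in prediction.
    both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = same_true * same_pred / comb(n, 2)  # agreement expected by chance
    maximum = (same_true + same_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (both - expected) / (maximum - expected)


# Toy example: predicted clusters match the reference up to renaming.
reference = ["T", "T", "B", "B"]
predicted = [1, 1, 0, 0]
ari_perfect = adjusted_rand_index(reference, predicted)
ari_poor = adjusted_rand_index(reference, [0, 1, 0, 1])
```

Because ARI is invariant to label names, it can compare clusterings produced from different feature sets against the same reference annotation.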
3. Validate within context(s)
| Who does it? | When does it happen? | How is it done? | Current Conformance Level / Target Conformance Level |
Verification | Students/postdocs/investigators | Throughout the project | Monitoring convergence of machine learning models. | Extensive |
Validation | Students/postdocs/investigators | Once the unsupervised model has been trained on the given data. | 1) Accuracy was validated on bulk data with ground truth. 2) Single-cell TF predictions are consistent with enrichment analysis. 3) Clustering of cells improves with scFAN predictions. | Extensive |
Uncertainty quantification | | | | |
Sensitivity analysis | | | | |
Other:__________ | | | | |
Additional Comments | | | | |
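The verification row above lists "monitoring convergence" without further detail. A minimal sketch of one common convergence check is given below: training is treated as converged once the validation loss has stopped improving for a fixed number of epochs. The function name `has_converged` and the `patience` and `min_delta` parameters are hypothetical, and nothing here is taken from the scFAN code base.

```python
def has_converged(val_losses, patience=3, min_delta=1e-4):
    """Return True once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.

    val_losses: per-epoch validation losses, in training order.
    NOTE: patience and min_delta are illustrative defaults, not values
    reported for scFAN.
    """
    best = float("inf")
    stale_epochs = 0
    for loss in val_losses:
        if loss < best - min_delta:
            best = loss          # meaningful improvement: reset the counter
            stale_epochs = 0
        else:
            stale_epochs += 1    # no meaningful improvement this epoch
            if stale_epochs >= patience:
                return True
    return False


# Toy loss curves (made-up numbers).
plateaued = has_converged([1.0, 0.8, 0.7, 0.7, 0.7, 0.7])
still_improving = has_converged([1.0, 0.9, 0.8, 0.7])
```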
4. Limitations
Disclaimer statement (explain key limitations) | Who needs to know about this disclaimer? | How is this disclaimer shared with that audience? | Current Conformance Level / Target Conformance Level |
There is limited availability of ChIP-seq TF binding data. | Members of the scientific community who need full coverage of all human TFs. | In the discussion section of the paper. | Adequate |
| | | |
| | | |
| | | |
| | | |
5. Version control
Current Conformance Level / Target Conformance Level | Extensive |
| Naming Conventions? | Repository? | Code Review? |
Individual modeler | Yes | Yes | Peer |
Within the lab | Yes | Yes | Peer |
Collaborators | Yes | Yes | Peer |
6. Documentation
| Current Conformance Level / Target Conformance Level |
Code commented? | Partial |
Scope and intended use described? | Extensive |
User’s guide? | Extensive |
Developer’s guide? | Partial |
7. Dissemination
Current Conformance Level / Target Conformance Level | Extensive |
Target Audience(s): | “Inner circle” | Scientific community | Public |
Simulations | | | |
Models | | | |
Software | Python package: https://github.com/sperfu/scFAN/ | Python package: https://github.com/sperfu/scFAN/ | Python package: https://github.com/sperfu/scFAN/ |
Results | Paper and GitHub repo: https://github.com/sperfu/scFAN/ | Paper and GitHub repo: https://github.com/sperfu/scFAN/ | Paper and GitHub repo: https://github.com/sperfu/scFAN/ |
Implications of results | | | |
8. Independent reviews
Current Conformance Level / Target Conformance Level | Insufficient |
Reviewer(s) name & affiliation: | |
When was the review performed? | |
How was the review performed, and what were its outcomes? | |
9. Test competing implementations
Current Conformance Level / Target Conformance Level | Extensive |
| Yes or No (briefly summarize) |
Were competing implementations tested? | Yes. The method has been compared to several other commonly used methods on benchmark datasets. |
Did this lead to model refinement or improvement? | No |
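The comparison summarized above does not state which metric or methods were used. As a sketch of how competing binding predictors could be scored on one benchmark with known labels, the block below computes the area under the ROC curve (auROC) from ranks and picks the best method. The choice of auROC, the method names `method_a` and `method_b`, and all scores are assumptions made for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formula.

    labels: 1 for bound sites, 0 for unbound sites.
    scores: predicted binding scores, higher = more likely bound.
    Tied scores receive their average rank.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy benchmark: six sites, two hypothetical methods (made-up scores).
truth = [1, 1, 0, 0, 1, 0]
predictions = {
    "method_a": [0.9, 0.8, 0.3, 0.2, 0.7, 0.4],
    "method_b": [0.6, 0.4, 0.5, 0.1, 0.7, 0.3],
}
results = {name: auroc(truth, s) for name, s in predictions.items()}
best_method = max(results, key=results.get)
```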