Plenary Session 1.2 – The Path to Model Credibility

Session Description: The session will present a brief history of model sharing, reproducibility, reuse, and the importance of credible practices for model development. The IMAG/MSM Consortium has played an important role in this evolution.

(10 mins) Intro, history, evolution - Herbert M Sauro

  • Past: Exchange languages, standards, open science, model repositories for sharing, and successes
  • COMBINE - consortium driving standards development

PowerPoint Slides

Background documents: https://co.mbine.org/

(30 mins) 10 Simple Rules - Jerry Myers, Lealem Mulugeta, and Ahmet Erdemir     

       Development and Future of the Ten Simple Rules for Credible Practice of Models and Simulations in Health Care. (https://docs.google.com/presentation/d/1b-jzfLmtNQhjw-Yr9kPbm6jveDf3Vh2…)  

Background documents: https://www.imagwiki.nibib.nih.gov/content/10-simple-rules-conformance-…

(25 mins) CRBM, Credible Models - Herbert M Sauro

  • Center for Reproducible Biomedical Modeling - moving toward model credibility

PowerPoint Slides

Background documents: https://arxiv.org/abs/2301.06007

(20 min Q&A) Panel members: Ahmet Erdemir, Jerry Myers, Lealem Mulugeta, Herbert Sauro

Session Moderators (from left): Lealem Mulugeta, Herbert Sauro

Speaker Bios & Presentation Materials:

Herbert M Sauro: Herbert Sauro is a Professor at the University of Washington in the Department of Bioengineering in Seattle. He holds degrees from the UK in Biochemistry/Microbiology and Computational Biology, and a PhD in Systems Biology. His early career focused on developing some of the first simulation tools in systems biology, as well as significant work in metabolic control analysis. He moved to Caltech, Pasadena, in 2000, where he was a founding member of the SBML development team. Since then, he has been involved in kickstarting or helping to develop other widely used modeling standards such as SBOL, SBGN, and SED-ML. He was also one of the original contributors to the BioModels repository. He teaches graduate and undergraduate classes in biological control theory and computational systems biology at UW and has published textbooks on enzyme kinetics, pathway systems biology, and metabolic control analysis, as well as several math books and hands-on computer science books. He is currently the Director of the NIBIB Center for Reproducible Biomedical Modeling.

Ahmet Erdemir: Trained as a mechanical engineer at the Middle East Technical University, Ankara, Turkey, Ahmet Erdemir later pursued graduate work in biomechanics, first in the Department of Mechanical Engineering, Middle East Technical University, then in the Department of Kinesiology, Pennsylvania State University. After completing his graduate studies, in 2002 he was recruited to the Department of Biomedical Engineering, Cleveland Clinic, where he established a strong research program in computational biomechanics. Currently, Dr. Erdemir is an Associate Staff in the Department of Biomedical Engineering, Chief Scientist of the Cleveland Clinic – IBM Discovery Accelerator, and the Interim Director of the Office of Research Development at the Cleveland Clinic. His scholarly contributions are predominantly in the area of musculoskeletal biomechanics, specifically the exploration of multiscale deformations and movements of the knee. His translational focus has been on establishing computational modeling as a routine, reliable, and efficient tool for healthcare delivery and biomedical science; his team has conducted many simulation studies to accelerate the development and evaluation of medical devices in the musculoskeletal, cardiovascular, and neurological domains. Dr. Erdemir’s science-of-science research focuses on developing and demonstrating good practices to enhance credibility, reproducibility, and reusability in modeling and simulation. These activities range from the promotion and practice of sharing modeling assets to their incorporation into scientific review. Dr. Erdemir has led many community initiatives to establish guidance on reporting simulation studies and to codify credible practices in modeling and simulation in healthcare. His activities have been continuously supported by federal funding agencies, including the National Institutes of Health and the Department of Defense.

Lealem Mulugeta: Lealem is a Co-founder of the Committee on Credible Practice of Modeling & Simulation in Healthcare (CPMS).

Jerry Myers: Dr. Myers currently serves as project scientist for the NASA Human Research Program’s Crew Health and Performance Probabilistic Risk Analysis project, which seeks to improve the utilization of computational modeling to enhance risk quantification and interrogate physiological effects of space flight. Dr. Myers received his doctoral degree in Mechanical Engineering, with specialization in biofluid dynamics, from the University of Alabama in 1996. With twenty-five years of experience in computational modeling and credibility assessment of models and simulations, Dr. Myers also serves as a founding member and current co-chair of the NIH/NIBIB-sponsored Interagency Modeling and Analysis Group’s Committee on Credible Practice of Modeling & Simulation in Healthcare.

Topics for Discussion:

  • How to improve the credibility of biomedical models


Comment

I'd love to have a discussion about how model credibility and model/parameter uncertainty can be handled together. I would think uncertainty quantification should be included in the model analysis (although maybe it doesn't necessarily fit into "model credibility" per se).

Submitted by mjcolebank on Mon, 06/26/2023 - 19:44

Comment

Yes, let's. We consider uncertainty quantification an incredibly (pun intended) important part of credible practice for all M&S, in many cases as important as V&V, so in our view the two should always be handled together. There is a challenge in that model, parameter, and local uncertainty are often conflated, so any discussion that helps elucidate the important distinctions and how to bring them into the workflow is welcome.
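
To make the parameter-uncertainty side of this concrete, here is a minimal sketch of propagating parameter uncertainty through a toy model by Monte Carlo sampling. The model, parameter ranges, and sample size are illustrative assumptions, not anything presented in the session.

```python
# Minimal, illustrative sketch: propagating parameter uncertainty through a
# toy logistic-growth model via Monte Carlo sampling. The model, parameter
# ranges, and sample size are assumptions chosen for illustration only.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)

def logistic(t, x, r, K):
    # dx/dt = r * x * (1 - x/K)
    return r * x * (1.0 - x / K)

n_samples = 500
# Assumed parameter uncertainty: growth rate r and capacity K drawn from
# uniform ranges; in practice these would come from calibration or priors.
r_samples = rng.uniform(0.8, 1.2, n_samples)
K_samples = rng.uniform(9.0, 11.0, n_samples)

t_eval = np.linspace(0.0, 10.0, 50)
outputs = np.empty((n_samples, t_eval.size))
for i, (r, K) in enumerate(zip(r_samples, K_samples)):
    sol = solve_ivp(logistic, (0.0, 10.0), [0.1], args=(r, K), t_eval=t_eval)
    outputs[i] = sol.y[0]

# Summarize the predictive uncertainty induced by the parameter uncertainty.
median = np.percentile(outputs, 50, axis=0)
lo, hi = np.percentile(outputs, [2.5, 97.5], axis=0)
print(f"final value: {median[-1]:.2f} (95% band {lo[-1]:.2f} to {hi[-1]:.2f})")
```

This keeps parameter uncertainty (the sampled ranges) visibly separate from the model-output uncertainty it induces, which is one way to avoid the conflation mentioned above.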

Comment

The credibility practices provided are fantastic for developing and using models for the forward problem (e.g., simulation). How do we develop or incorporate credibility practices into the inverse problem (e.g., model calibration, accounting for error correlation or model discrepancy, etc.)?

Submitted by mjcolebank on Wed, 06/28/2023 - 10:56

Comment

My first blush at this is to suggest that it's similar to the forward problem: it is the outcome of the model's use (i.e., the context of use) that guides how the credible practice approaches are tailored. For instance, model calibration is often considered part of testing and validation (at least in the engineering application domain). An inverse problem will still require verification, version control, and review by the community, as well as pedigree of the information that is applied. In that sense, it may not be the same process for representing credible practice as in a forward problem; however, the categories of credible practice information are still very applicable, even though the approach may need to be tailored to the environment.

Submitted by JerryMyers on Wed, 06/28/2023 - 11:04
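
As an illustration of the kind of inverse-problem workflow being discussed, below is a minimal sketch of calibrating a toy exponential-decay model against synthetic noisy data. The model, the "data", and the noise level are assumptions for illustration only, not material from the session; a credible calibration would additionally document data pedigree and report residual diagnostics alongside the parameter uncertainty.

```python
# Minimal, illustrative sketch of an inverse problem: calibrating the rate
# constant of a toy exponential-decay model against noisy synthetic data.
import numpy as np
from scipy.optimize import curve_fit

def decay(t, x0, k):
    # Toy forward model: x(t) = x0 * exp(-k * t)
    return x0 * np.exp(-k * t)

# Synthetic "observations" generated from assumed true parameters plus noise.
rng = np.random.default_rng(1)
t_data = np.linspace(0.0, 10.0, 20)
true_x0, true_k = 10.0, 0.4
y_data = decay(t_data, true_x0, true_k) + rng.normal(0.0, 0.3, t_data.size)

# Fit and report parameter estimates with approximate standard errors,
# which feed directly into the uncertainty discussion above.
popt, pcov = curve_fit(decay, t_data, y_data, p0=[5.0, 0.1])
perr = np.sqrt(np.diag(pcov))
print(f"x0 = {popt[0]:.2f} +/- {perr[0]:.2f}, k = {popt[1]:.2f} +/- {perr[1]:.2f}")
```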

Comment

How do we approach rules 2 and 3 in a significantly data-limited environment? We may have physiologically-based mechanisms within a model, but verification & validation of the model using real life data/evidence, especially for intended use cases, can be challenging. Are there recommendations or guidelines for how to approach verification and validation in these cases?

Submitted by rmbyrne@mitre.org on Wed, 06/28/2023 - 11:09

Comment

Regarding the above comment/question: rules 2 and 3 refer to the 10 Simple Rules for evaluating model credibility. Rule 2 is "Use appropriate data" and Rule 3 is "Evaluate within context".

Comment

The 10 Simple Rules also refer to relevant frameworks specific to a given rule, e.g., ASME V&V 40, which provides deeper elaboration on framing that balance.

Comment

This is a great question because there are novel systems from which it is not easy to acquire data to help us understand their behavior. Models can therefore be powerful tools to help us gain further insight into the behavior of those systems. In other words, the models are built to be the data sources, and therefore it is very difficult to build and/or validate those models using directly measured data.

NASA runs into this a lot, and the ways it has been dealt with include:

  1. Using data from analog settings (e.g., animal, cell, and bed rest experiments)
  2. Making extensive use of independent subject matter experts to review and weigh in on the scientific plausibility of the model predictions

So one possible way to address this challenge is to use as much relevant data as possible to build and validate your models, while making significant efforts in verification, followed by uncertainty quantification and sensitivity analysis, to demonstrate the computational robustness of your models. Following this, make significant use of independent reviews (Rule 8) to have your models tested by subject matter experts to:

  1. substantiate the credibility of the prediction capabilities of your models, and
  2. refine your models

When it comes to using your models for their intended application, it can be very beneficial to apply Rule 8 again to substantiate the credibility of your simulations (i.e., model solutions/results).

The reason I recommend applying Rule 8 for both the model and the simulations is that credible practices should be followed for both. This is especially important for models that represent systems that are not data rich.
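
As a small illustration of the verification, uncertainty quantification, and sensitivity analysis step mentioned above, here is a minimal sketch of a local (one-at-a-time) sensitivity check on a toy decay model. The model, nominal parameters, and perturbation size are assumed for illustration and are not from the talks.

```python
# Minimal, illustrative sketch of a local (one-at-a-time) sensitivity check
# on a toy exponential-decay model. The model, nominal parameter values, and
# perturbation size are assumptions for illustration only.
import numpy as np

def model_output(k, x0, t=5.0):
    # Quantity of interest: state of a simple decay model, x(t) = x0 * exp(-k*t).
    return x0 * np.exp(-k * t)

nominal = {"k": 0.3, "x0": 10.0}
delta = 0.05  # 5% relative perturbation of each parameter

base = model_output(**nominal)
for name, value in nominal.items():
    perturbed = dict(nominal, **{name: value * (1.0 + delta)})
    change = (model_output(**perturbed) - base) / base
    # Normalized sensitivity: relative output change per relative input change.
    print(f"{name}: normalized sensitivity ~ {change / delta:.2f}")
```

Even a simple check like this helps document which parameters dominate the prediction in a data-limited setting, which in turn focuses where validation evidence is most needed.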

 

Comment

A bit tangential to this topic, but I am curious about current standards, infrastructure, and research around data repositories. Are there guidelines or templates that are being developed and used to guide organization, metadata management, and documentation?

Submitted by atkinspr on Wed, 06/28/2023 - 11:17

Comment

The FAIR principles for data stewardship are also applicable to modeling and simulation assets. That said, workflow management and its seamless, deep documentation remain a challenge and have nuances depending on the domain.
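
Purely as an illustration of what FAIR-style metadata for a modeling asset might capture, here is a minimal sketch of a metadata record; the field names and values are hypothetical and do not follow any particular repository's template.

```python
# Minimal, illustrative sketch of a FAIR-style metadata record for a model
# asset. All field names and values are hypothetical placeholders chosen for
# illustration; they do not follow any specific repository standard.
import json

model_metadata = {
    "identifier": "doi:10.xxxx/example-model",   # hypothetical persistent ID
    "title": "Toy cardiovascular compartment model",
    "creators": ["A. Modeler"],
    "format": "SBML Level 3 Version 2",
    "license": "CC-BY-4.0",
    "context_of_use": "Research insight only; not for clinical decisions",
    "provenance": {
        "calibration_data": "synthetic (see data/README)",
        "version": "1.2.0",
        "repository": "https://example.org/models/toy-cv",  # placeholder URL
    },
}

# Serializing the record alongside the model file keeps the asset findable,
# accessible, interoperable, and reusable in the FAIR sense.
print(json.dumps(model_metadata, indent=2))
```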

Comment

Neat study on the compliance with the 10 rules across a lot of MSM credibility plans.

Q: were the two reviewers' scores/evaluations well correlated with each other?

Submitted by feilim on Wed, 06/28/2023 - 11:27

Comment

We have data to explore this further. It has also depended on the nature of the study and how it is communicated, i.e., when information is lacking, what does the reviewer assume?

Comment

https://www.imagwiki.nibib.nih.gov/content/10-simple-rules-conformance-…

Many of my potential grant applicants find this form useful for shaping their grant application so it is more understandable to the grant reviewer, and it helps them better communicate their model development process in general. Good for publication writing too!

One of the biggest challenges for peer review is actually understanding the model from the context of the PI (otherwise reviewers infer and fill in the blanks based on their own experiences!).

Submitted by gpeng on Wed, 06/28/2023 - 11:30

Comment

How do we know when we've done enough to establish model credibility? Are there target conformance levels that modelers should aim to achieve (e.g., is "adequate" enough?), or is this primarily dependent on the end user's wants/needs?

Submitted by rmbyrne@mitre.org on Wed, 06/28/2023 - 11:36

Comment

This is a context-of-use-dependent challenge. We recognize that Rule 1, Define Context of Use, indicates the level of intensity for implementing the other rules depending on what is at stake.

Comment

How do we engage stakeholders in creating and maintaining standards? How do we encourage the use of these standards? 

Submitted by esizikova on Wed, 06/28/2023 - 11:38

Comment

Agreed, credibility is very important not just for reproducibility but also for the utility of repurposing models for uses that were not anticipated.

Submitted by shuhui.chen@nih.gov on Wed, 06/28/2023 - 11:39

Comment

I think that we are NOT supported to make our models reusable; that takes time and effort to put them into a sharing format.

So, we prefer collaboration; I think that is a better way in the case of complex models.

Also, wet-lab experiments are NOT re-performed when those papers are reviewed, so why are we holding models to a different, more difficult review standard?

Submitted by kirschne on Wed, 06/28/2023 - 11:40

Comment

That is a great point. The challenge is that collaboration does not scale when models need to be seamlessly integrated and continuously monitored and modified, e.g., as in digital twins. Thus, we SHOULD be supported to make models reusable, or better yet, to create tools that lower the burden of making them reusable.

Comment

Yes, I agree; we should be supported to do this, or to hand models off to a clearinghouse that does.

Comment

Thank you for your great talks. In addition to the provided example on bone fracture, I would be interested in learning about other examples where the Ten Simple Rules have been used to achieve model credibility. Are there other examples that could be shared broadly with the group?

Submitted by rmbyrne@mitre.org on Wed, 06/28/2023 - 11:40

Comment

Reproducibility can be a difficult bar when long simulation times, large unwieldy datasets, complex software dependencies, etc. are involved.
Can replicating the findings and reaching the same conclusions be more suitable in these edge cases?

Submitted by pnanda@umich.edu on Wed, 06/28/2023 - 11:46

Comment

To clarify: reproducing means getting the exact results, while replicating means getting results that do not match exactly but still reach the same conclusion.

Comment

It is a possibility, as long as such an approach fits the context of use and de-risks the impact of decisions made using the model. I should also note a consideration: if an end user goes to different modelers to build and use a model of the same phenomena for the same context, and they all give different answers, which one should we rely on? In my mind, this is similar to going to different doctors and getting different opinions on a diagnosis.

Comment

Also reproducibility as you define it sets the resolution of the workflow as a source of uncertainty.

Comment

Ah, a problem near and dear to my work. FWIW, astronaut data is VERY sparse, so there are lots of challenges with validation.

There is no simple answer. Credibility is not really a "grade for the model"; it is more a representation of model testing and evaluation rigor with respect to what is achievable and what is needed for the intended use. So let's say the M&S is used to gain insight into physical mechanisms of the physiological system; then the validation is likely at a gross level, i.e., measurable large-scale response. In that case, gaps may be analyzed by evaluating alternative modeling hypotheses. It may be that validation will remain low until more information for a referent can be obtained. At that point, credibility is assessed with respect to the gap between the real world, the referent, and the model results.

Would love to discuss this more and see what ideas bubble up! 

Submitted by JerryMyers on Wed, 06/28/2023 - 11:51

Comment

I would say that one should think about this as model and simulation credibility. In that sense, credibility is part of the entire development life cycle and of the model's use history. In the latter, credibility is a quality of the model simulation within the context of its current use. We recommend it not be treated as a checkbox, as if one's model is credible from some point to infinity.

Submitted by JerryMyers on Wed, 06/28/2023 - 11:57

Comment

Are there any model types that may be difficult to code in SBML?  What alternatives would you recommend for those cases?

Submitted by wangui on Wed, 06/28/2023 - 12:00

Comment

The semantic layer IS THE MODEL. Everything else is just implementation details.

Model credibility really should be assessed on the SEMANTIC LAYER. (Sauro talked ONLY about validation of the code.)

Submitted by Jim Sluka on Wed, 06/28/2023 - 12:06

Comment

Can the approaches used in MEMOTE be applied in areas outside of Systems Biology (and for files other than SBML)? If so, what would be required to do so? And is there documentation available on how MEMOTE was built and/or how it works?

Submitted by rmbyrne@mitre.org on Wed, 06/28/2023 - 12:08

Comment

Just a side note: 50% reproducibility appears to compare well with the percent of reproducible experimental papers, based on a quick search. 

Submitted by flowdoclondon on Wed, 06/28/2023 - 12:11