Standards for Publication and Distribution of Biophysical Models
A. Background:
The purpose of this document is to provide a starting point for a group effort centered on defining standards for modeling. The particular approach outlined below as “Standards for Class 4 biophysical models” is better suited to cell and organ models than to large-scale anatomic and physiological models with complex computational structure. The coordination of the review and development of standards for cellular systems is being taken on by groups 2 and 4, hopefully with input from all other groups. Working version: Media:Standards.list.jul08.xls
The representation of complex physiological systems involves more than making experimental and clinical observations; it requires a means of forming the ideas and observations into a self-consistent whole. While diagrammatic schemas provide a useful, often essential, beginning to understanding, it is now evident that quantitative reasoning is best aggregated in the form of mathematical models, which then serve as precisely stated hypotheses that are testable by specific and demanding experiments. Models might use stochastic equations, partial differential equations (PDEs), ordinary differential equations (ODEs), or a multiscale form.
Sharing models and the tools for developing and maintaining them is part of the multiscale consortium (IMAG) effort. Our Simulation Resource website, www.physiome.org, provides a means to disseminate running, documented models derived from our own and other sources such as the CellML and SBML (Cell and Systems Biology Mark-up Languages) libraries for free public download. Since we have translators of SBML and CellML to MML for use in JSim, part of our goal is to make more of these models from archival sources readily usable. The Biomodels database, www.ebi.ac.uk/biomodels, is another good source of curated models. These are mostly small models, fewer than 100 ODEs, but some of our convection-diffusion-reaction models can be expanded at run time to over 100,000 ODE equivalents.
A software package that we are developing, JSim, provides facilities for model construction, comparisons among multiple models, graphics (still primitive), data analysis using sensitivity analysis and optimization, solutions to ODEs and PDEs, and convenient database capability as project files (.proj) containing parameter sets for multiple models and multiple sets of experimental data. JSim is a freely available software package running on Linux, Macintosh OSX and Windows (at www.physiome.org). It uses a Mathematical Modeling Language (MML) to directly represent physiological processes in mathematical form, and can use a symbolic representation, a biological component language (BCL), which can be expanded to allow icon-based programming. For computation, the MML is parsed and compiled in Java, thereby becoming independent of the particular computational platform. There is no connection between JSim and the establishment of a set of standards. (JSim does allow unit balance checking on all equations, thereby aiding in fulfilling one conservation item.)
Our lab provides a curated website of a growing number of physiological models in the form of downloadable project files, each of which contains source code, parameter sets with values and units, illustrative graphs of model solutions, configuration files for controllers and numerical methods, iterations, solver choices, etc. Only a few of these are truly up to the standards suggested below. In addition, the website allows one to post models for critique and dissemination through the individual Wiki, where comments are encouraged. This is a mode of dissemination.
Almost all seemingly simple models evolve into multiscale questions: the gross descriptions of what is occurring may well be modeled successfully at a descriptive higher level, but the explanations inevitably lie at the cellular and molecular levels, thereby necessitating multiscale models (Bassingthwaighte, Chizeck, Atlas, 2005, 2006). Structuring such models is accomplished most readily and effectively using a modular development, which introduces the problems of linking modules, finding common semantic definitions for variables and parameters, and casting the modules into a program wherein the variables are in common. Adherence to a single ontology for naming components, variables and parameters will necessarily become a part of any standard. Composition of higher level models from well-characterized modules can almost be automated. The modular composition is thus advantageous in construction but is also a barrier to simplification or reduction into higher level, more integrated models; often multiple modules from a lower level are best reconstructed into a single modular component at the higher level, a process that so far cannot be automated. Tougher issues in multiscale modeling are in evolving controls for semi-automated shifting from one level of modularity or simplicity to another depending on the biological conditions constraining the model (Bassingthwaighte, Chizeck, Atlas, IEEE, 2006).
Model preparation: This is a critically important component of the work. Almost no models in the literature are complete or even reproducible. There are two exceptions. One is the Hodgkin-Huxley 1952d model, which is completely reproducible and worthy of a Nobel Prize, even though mass conservation for Na or K was not included. The other is the WRJ cardiac action potential model (Winslow et al 1999), but this took some prepublication collaboration to correct small errors in the paper and to get 100% of the parameter values; the result was joint publication in Circ Res simultaneous with release on our website at 4 PM on a Thursday, a pioneering step for AHA and for us. Both of these and other electrophysiology models, earlier and more recent, are freely available at www.physiome.org/Models. It is the goal of this section to round out sets of models in important target areas so that they can be used as teaching tools that extend right up to the research forefront. Exemplary target areas in cardiovascular topics are integrated models of the cardiopulmonary circulations and respiratory exchange that cover not only gas exchange from inhaled air to tissues but also pH regulation, hemoglobin binding of O2 and CO2 and H ion, and circulatory and respiratory responses to changes in inhaled gas composition, blood loss, exercise and other interventions. We have a good start on this: see www.physiome.org/Models/ and under Integrative Physiology bring up “Highly-integrated human with interventions”, a complex model which can be run on a laptop, and for which there are several reduced level components (Neal and Bassingthwaighte, in review; Kerckhoffs et al, 2007).
Web Wiki modeling for dissemination: Templates for the modeling website are models currently at www.physiome.org/Models. They have names, key words, descriptions, equations, references, and Java applets to run the model, show graphical results, and allow an observer to change parameters, display other variables and run other solutions. One can also download JSim and a model and run it on one’s own machine under Windows, Mac OSX, Linux or most UNIX systems. Each of these models takes substantial time to prepare, especially when the equation list is long and the number of preset graphs is large. Currently we have no tutorials for these applets, but MUST develop them. Only a few meet Class 4 standards.
B. Modeling Standards:
Preamble: To support the goal of establishing sets of modeling resources for the scientific community and making models available for research and education, it is vital to establish idealized or even standardized definitions of the materials to be provided to the user community. Models and the databases that support them are the common basis of understanding of quantitative and qualitative physiology. Models are working hypotheses, summarizing the integrated concept or framework for a body of observations. The best models are those that provide solutions that describe data sets coming from many and varied experiments. In order to maximize the rate of advancement of the science, these models should be available as working hypotheses, aiding in experiment design, and available publicly for commentary, augmentation, correction, and disproof. The understanding of biological systems is inevitably both evolutionary and graded. One begins with sets of observations, and tries to develop insight into the processes giving rise to them. At the gene and protein level one begins with a set of objects and tries to fathom how they may relate to each other, and to discover what their roles might be in the behavior of a cell, tissue, organ or organism. At the physiological level, one begins with measurements of variables (pressures, concentrations, masses, volumes, etc.) and physical and chemical properties (elastance, conductance, stiffness, density, etc.) and tries to define the interrelationships amongst these that best describe the observed behavior.
The classes of models might be described as:
(Class 1) Lists of observed objects, categorized into groups suspected of being related,
(Class 2) Linkage maps of proposed relationships, sometimes with cause-and-effect directionality,
(Class 3) Kinetic descriptions of relationships between variables developed far enough that model solutions fit observed time course behavior, and
(Class 4) Biophysically based models that appear to provide an “explanation”, both fitting model solutions well to data and resting on a thermodynamically solid base.
While one might regard class 4 models as the epitome of scientific success, there may be no biophysical model that does not at some point incorporate a class 3 descriptor of some component.
A fundamental limitation in physiological modeling is that there is no gold standard, no perfect and complete description of the biology. The models are all evolutionary transients in the advancement of the science, yielding or falling to better data and better ideas on how the system works. The experimental data drive the progress, in the sense of being the closest to the truth, and force the next step; as TH Huxley put it, “The great tragedy of science: the slaying of a beautiful hypothesis by an ugly fact.” Ideally, the current “best” model is the one with the greatest utility in aiding in the design of the experiment that leads to its own destruction. Platt’s (1964) admonition was, given that science advances by disproof, to devise a second, realistic yet feasible, alternative hypothesis, and then to devise the experiment that distinguishes between the two hypotheses; if both are honest-to-goodness viable hypotheses, then the experiment must disprove one and so advance knowledge. Adherence to Platt’s principles will end the days of me-too science. As the models of physiological systems become more complex, a contradiction arises: the more complex models explain more and thereby can be used to arm our intuition about the system, yet their very complexity handicaps our ability to understand them and to contradict them. The ability to go beyond the working hypothesis requires knowing it intimately, and understanding its behavior in simplified terms. Therefore it becomes ever more important to provide the working hypotheses to the investigator community not only as functioning models, but also as reduced versions of them that illustrate particular aspects of their behavior. Then the models, both complex and simplified, can be used to design experiments to bring about their downfall.
Achieving the goals of the Physiome Projects, centered on different problems in different species at different levels, will require retrieving archived models, data sets, parameter sets, and full documentation from open databases in order to make real headway. In the following we define goals and standards for models, setting them forth boldly in a form asking for improved definition.
Objective Classification of Models
Class 4. Biophysically based models, most suitable for cell and organ system modeling
Here we begin with a draft version of the expected characteristics of a Class 4, biophysically based model suitable for archiving in a Physiome Database and for publication, with the goal of making the model available as a completely documented, verified and validated model, a working hypothesis fully described so that it may be challenged by investigators around the world. In addition, since there probably exists no globally complete model of any biological system and its environment, there should be a list of the assumptions or conditions under which the model is seemingly correct, and which may be considered as defining constraints on the conditions listed next.
Class 4, biophysically based models should have:
1. Unitary Balance – exact balancing of units in all equations.
2. Mass Balance – total conservation of mass of individual components. (Conservation of volume should follow from this if all partial molar volumes are known.)
3. Charge Balance – accounting for charge transfer across membranes and for membrane potentials and Donnan equilibria.
4. Osmotic Balance – accounts for water and solute fluxes in transient and steady states.
5. Thermodynamic Balance – obeys Haldane constraints for reactions, and has energy balance.
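As a minimal sketch of how items 2 and 5 might be checked in practice, the Python fragment below integrates a hypothetical closed system with one reversible first-order reaction, A <-> B, and then tests two things: that the total A + B stays constant (mass balance) and that the final ratio B/A approaches kf/kr, the equilibrium constant implied by the rate constants. For this simplest mass-action case that ratio stands in for a Haldane constraint. The reaction, the rate constants, the units, and all function names are illustrative assumptions and are not taken from any model discussed here.

```python
# Hypothetical toy model: reversible first-order reaction A <-> B in a closed volume.
# KF, KR and the initial concentrations are illustrative values only.


def derivatives(a, b, kf, kr):
    """Return (dA/dt, dB/dt) for A <-> B with mass-action kinetics."""
    net = kf * a - kr * b  # net forward flux, e.g. mM/s
    return -net, net


def rk4_step(a, b, dt, kf, kr):
    """Advance (a, b) by one classical fourth-order Runge-Kutta step."""
    k1a, k1b = derivatives(a, b, kf, kr)
    k2a, k2b = derivatives(a + 0.5 * dt * k1a, b + 0.5 * dt * k1b, kf, kr)
    k3a, k3b = derivatives(a + 0.5 * dt * k2a, b + 0.5 * dt * k2b, kf, kr)
    k4a, k4b = derivatives(a + dt * k3a, b + dt * k3b, kf, kr)
    a_new = a + dt * (k1a + 2 * k2a + 2 * k3a + k4a) / 6.0
    b_new = b + dt * (k1b + 2 * k2b + 2 * k3b + k4b) / 6.0
    return a_new, b_new


def simulate(a0, b0, kf, kr, dt, n_steps):
    """Integrate the system and return the trajectory as a list of (a, b) pairs."""
    traj = [(a0, b0)]
    a, b = a0, b0
    for _ in range(n_steps):
        a, b = rk4_step(a, b, dt, kf, kr)
        traj.append((a, b))
    return traj


def max_mass_drift(traj):
    """Largest deviation of A + B from its initial value along the trajectory."""
    total0 = traj[0][0] + traj[0][1]
    return max(abs(a + b - total0) for a, b in traj)


KF, KR = 2.0, 0.5  # forward and reverse rate constants, 1/s (assumed)
trajectory = simulate(a0=1.0, b0=0.0, kf=KF, kr=KR, dt=0.01, n_steps=2000)
a_end, b_end = trajectory[-1]

mass_drift = max_mass_drift(trajectory)        # mass balance check
ratio_error = abs(b_end / a_end - KF / KR)     # equilibrium (thermodynamic) check
print("max mass drift:", mass_drift)
print("equilibrium ratio error:", ratio_error)
```

A real model would need the same kind of check for every conserved species and for every reaction whose kinetic parameters are constrained by an independently known equilibrium constant.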
Demonstration of validity in terms of describing physiological data:
6. Initial conditions: consistent with a physiological steady state (constant or oscillatory)
7. Provision, for public download, of data that are to be fitted by the model, and from which sets of parameters can then be defined through good fits of model to data. Provision also of data sets which cannot be fitted, and therefore serve as challenges to the model.
8. The results of fitting variegated data sets, showing applicability of the model to high quality experimental data from different sources and of different sorts.
9. Parameter justification and evaluation: Since not all of a model’s parameters are necessarily determined via the fitting of the selected data sets, the others should be justified through citations, solutions, etc., choosing references to prior publications that provide best characterization. The parameters determined via model analysis should be described by estimated means and confidence ranges.
Verification: Demonstration that the mathematical expressions defining the model are complete and that the computational method used provides correct solutions to the mathematics:
10. Equations must be mathematically correct and complete, with unitary balance, with initial and boundary conditions defined, and with explicit definitions, units, and unambiguous descriptions of each parameter.
11. Running code is supplied in some reasonably commonly used form. The code should:
produce numerical solutions matching appropriate reduced cases having analytical solutions, etc.
run correctly with no, or at least little, dependence on step size
run from varied initial conditions to appropriate steady states
run on more than one platform
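The first two code requirements under item 11 can be illustrated with a small sketch. It assumes a reduced case with a known analytical solution, first-order washout dC/dt = -k*C, and uses forward Euler only because its first-order convergence is easy to recognize; the model, the parameter values, and the function names are hypothetical and do not represent any particular solver in JSim or elsewhere.

```python
import math


def euler_decay(c0, k, t_end, n_steps):
    """Integrate dC/dt = -k*C by forward Euler and return C(t_end)."""
    dt = t_end / n_steps
    c = c0
    for _ in range(n_steps):
        c += dt * (-k * c)
    return c


def analytic_decay(c0, k, t):
    """Closed-form solution C(t) = C0 * exp(-k*t) of the same equation."""
    return c0 * math.exp(-k * t)


C0, K, T_END = 1.0, 0.8, 5.0  # assumed initial value, rate constant (1/s), end time (s)
exact = analytic_decay(C0, K, T_END)

# Halve the step size repeatedly and record the error against the analytical solution.
errors = []
for n in (100, 200, 400, 800):
    errors.append(abs(euler_decay(C0, K, T_END, n) - exact))

# For a first-order method the error should shrink by about 2x per halving of the step.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print("errors:", errors)
print("error ratios on step halving:", ratios)
```

If the error ratios depart from the order expected for the chosen method, or the error does not fall as the step shrinks, either the code or the step-size range deserves a closer look before the model is published.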
Documentation: Each model should be accompanied by:
12. A peer reviewed publication, or equivalent, providing a full description, with the validation.
13. A description of its phylogenetic heritage and historical and contemporary setting.
14. Documentation with references for parameter values, appropriate to the species, age, sex, etc.
15. A description of model components or submodels and their sources, if applicable.
16. A description of models incorporating this model into a larger more integrated system, illustrating the position of this model in the hierarchy.
Obeisance to Good Modeling Practices: (Needs editing and amplification.) List the following:
1. Fundamental assumptions
2. Limitations and shortcomings
3. Alternative models to be considered
4. Level of detail used in the model and anticipated position in a hierarchy of models
Provision for Critique, Commentary and Discussion:
This would presumably be supported on the website providing the model and would include:
Commentary by authors, by reviewers, and the responses by authors.
Commentary as in letters to the editor.
Critiques published subsequently by other authors or the same authors.
Listings of and references to competing or alternative models.
Model classes 1 to 3, earlier phases of models:
Given that even as classically beautiful and thoroughly presented and reproducible a model as that of Hodgkin and Huxley (1952d) cannot fulfill all the requirements for a Class 4 model (it does not provide for ionic balance, for example), most model presentations will likewise probably fall short, but will nevertheless be important to document and to provide in reproducible form. Recognizing that models are started at levels below that at which they can be put into numbers, a database of models for the research community should include the design and development phases of models.
Class 3 Models: (Kinetic descriptions of relationships between variables and model solutions fitting observed time course behavior.)
This is where the bulk of published models lies, illustrating that even when one attempts to define a model as completely as possible there will be identifiable shortcomings precluding its classification as Class 4. There is obviously a fuzzy edge to these classifications, and it would make sense that when a Class 3 model is provided with fully defined assumptions and limitations it might be “elevated” to Class 4 status. Regarding Class 3 models as incomplete Class 4 models usefully defines Class 3 models: they lack one or more specific attributes of a Class 4 model. The identification of the missing attribute then defines what is needed next: more data regarding a component, the replacement of an empirical or assumed relationship with a biophysically or biochemically defined relationship, or the replacement of a hypothesized feedback loop required for realism with an identified and characterized feedback mechanism. Thus a list of the questionable assumptions is a critical component of a Class 3 model.
Class 2 Models: (Linkage maps of proposed relationships, sometimes with cause-and-effect directionality.)
An example of a class 2 model is a bacterial metabolic model wherein the components are mostly identified and the relationships more or less known, even if the reaction kinetic parameters are not yet known. If the stoichiometries of the reactions are known, then the system can be “characterized” using flux balance analysis (FBA), a technique for estimating the steady-state balance in a network of reactions from the stoichiometries when the reactions have not been characterized either mechanistically, kinetically or thermodynamically. Imposing thermodynamic constraints on a network model improves the likelihood of the steady state fluxes being close to experimentally observed values, but still does not provide evidence on the shapes of transients or even on network stability.
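For reference, flux balance analysis is commonly posed as a linear program over the steady-state fluxes. The statement below is a generic sketch of that formulation, not one taken from this document: S is assumed to be the stoichiometric matrix (rows are metabolites, columns are reactions), v the vector of reaction fluxes, c a chosen vector of objective weights (for example, weight on a growth or ATP-production flux), and v_min and v_max assumed lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}
\end{aligned}
```

The constraint S v = 0 is the steady-state mass balance for every internal metabolite. Thermodynamic constraints of the kind mentioned above would enter as additional restrictions on the signs or ranges of the individual fluxes.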
Class 1 Models: (Lists of observed measures, categorized together in groups suspected of being related.)
This class might be regarded suspiciously, as a collection of objects masquerading as a model, and yet this is where almost all models and systems concepts begin. Examples can be taken from various fields. One would be the historical development of an understanding of the tricarboxylic acid (TCA) cycle from an incomplete set of observations of a set of solutes in the first half of the twentieth century, the organization of these by the husband and wife team, the Coris (the Cori cycle), and the later incorporation by Krebs into the TCA cycle, leading to later refinements. Another illustration is the current status of genetic regulatory networks: while the gene or set of genes providing a given protein can be readily identified, the succession of proteins involved in promoting expression and in regulating expression to stable levels is mostly unknown. These regulatory proteins achieve only minuscule concentrations, a few copies per cell, and evade identification, but nevertheless need to be recognized and their kinetics characterized via modeling, even though stochastic modeling techniques will probably be required. This early phase model type can be characterized using lists, Venn diagrams of potentially related elements, and conceptual network diagrams. (Other examples could relate to glucose, hypertension, alveolar gas exchange, etc.)
Reference List
Bassingthwaighte JB, Raymond GM, Ploger JD, Schwartz LM, Bukowski TR. GENTEX, a general multiscale model for in vivo tissue exchanges and intraorgan metabolism. Phil Trans Roy Soc A: Math Phys Eng Sci 364: 1423-1442, 2006.
Bassingthwaighte JB, Chizeck HJ, Atlas LE, Qian H. Multiscale modeling of cardiac cellular energetics. Ann NY Acad Sci 1047: 395-426, 2005.
Bassingthwaighte JB, Chizeck HJ, Atlas LE. Strategies and tactics in multiscale modeling of cell-to-organ systems. Proc IEEE 94: 819-831, 2006.
Bassingthwaighte JB, Vinnakota KC. The computational integrated myocyte: A view into the virtual heart. Ann NY Acad Sci 1015: 391-404, 2004.
Hodgkin AL, Huxley AF. A quantitative description of membrane current and its application to conduction and excitation in nerve. J Physiol 117: 500-544, 1952.
Platt JR. Strong inference. Science 146: 347-353, 1964.
Qian H, Beard DA. Thermodynamic-based computational profiling of cellular regulatory control in hepatocyte metabolism. Am J Physiol Endocrinol Metab 288: E633-E644, 2005.
Smith NA, Crampin EJ, Niederer SA, Bassingthwaighte JB, Beard DA. Computational biology of the cardiac myocyte: Proposed standards for the physiome. J Exper Biol 210: 1576-1583, 2007.
Vinnakota K, Kemp ML, Kushmerick MJ. Dynamics of muscle glycogenolysis modeled with pH time-course computation and pH dependent reaction equilibria and enzyme kinetics. Biophys J 91: 1264-1287, 2006.
Winslow RL, Rice J, Jafri S, Marbán E, O’Rourke B. Mechanisms of altered excitation-contraction coupling in canine tachycardia-induced heart failure, II: Model studies. Circ Res 84: 571-586, 1999.