AI/ML-Readiness
For an initiative, domain, or organization to be “AI/ML Ready”, it should first satisfy the following three criteria, each of which is composed of three specific sub-criteria:
1. Data Criteria -- Does the proposed initiative have a means to:
- Obtain data that is representative of the phenomenon it seeks to model?
- Organize and store data according to FAIR Principles?
- Annotate missing and/or erroneous data?
2. Technology Criteria -- Does the proposed initiative have a means to:
- Access necessary computational resources to develop models?
- Obtain personnel with expertise in relevant methodologies?
- Continuously maintain technologies after development?
3. Cultural Criteria -- Are key stakeholders in the proposed initiative:
- Able to obtain better outcomes using AI?
- Receptive to the use of AI agents to replace or augment human actors?
- Willing to democratize data and models?
Semantic Interoperability
Standard codes and definitions for variables are the key to interoperability in research and to the merging of data into large data sets. These shared semantics turn free-text information about diagnoses, prognoses, therapies, interventions, and other clinical concepts into computable knowledge about human health, and they make explicit the hierarchies that exist between biomedical concepts. The set of semantic types that can plausibly be linked by a relationship helps with interpreting the meaning assigned to each Metathesaurus concept, providing clarity and promoting interoperability.
Although the need for data models with richer semantics is widely recognized, no single approach has won general acceptance. In addition, dynamic concepts, such as the social determinants of health (SDOH), need to be included in all models. The information associated with each semantic type and relationship needs to be defined, and the relationships between them linked.
A plan must be established to expand or build upon current resources to develop an AI semantic model, metathesaurus, common data elements, and ontologies for research.
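As a rough illustration of how shared semantics can make free-text terms computable, the sketch below maps normalized clinical terms to standard concept codes and semantic types. The lookup table, term names, and concept codes are hypothetical placeholders, not entries from any real metathesaurus; a production system would query curated terminology services instead.

```python
# Minimal sketch: mapping free-text clinical terms to standard concept codes
# and semantic types. All codes below are placeholders for illustration only.

# Hypothetical lookup table: normalized term -> (concept code, semantic type)
CONCEPT_MAP = {
    "type 2 diabetes": ("CODE-0001", "Disease or Syndrome"),
    "metformin": ("CODE-0002", "Pharmacologic Substance"),
    "hemoglobin a1c measurement": ("CODE-0003", "Laboratory Procedure"),
}

def map_term(free_text: str):
    """Normalize a free-text term and look up its concept code and semantic type."""
    key = free_text.strip().lower()
    return CONCEPT_MAP.get(key)  # None when the term is not yet covered

for term in ["Type 2 Diabetes", "Metformin", "insulin pump"]:
    mapped = map_term(term)
    if mapped:
        code, semantic_type = mapped
        print(f"{term!r} -> {code} ({semantic_type})")
    else:
        print(f"{term!r} -> no standard code found; flag for curation")
```

Terms that cannot be mapped are flagged for curation, which is where shared semantic resources and expert review would extend the vocabulary.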
Standard: Inclusion and Data Sufficiency
A primary goal of the standards is to ensure that data curation, management, and analyses are generalizable to the entire population. It is imperative that the data are representative of the actual population and take into account the multitude of factors that contribute to, and significantly impact, health outcomes for all population groups, all ages, women, minorities, and other underrepresented and marginalized populations. Outputs from machine learning and other artificial intelligence analyses are limited by the accuracy of the available data. Therefore, women and members of minority groups and their subpopulations across the life course must be sufficiently and appropriately included, so that the need for imputation of missing data is minimized and machine learning models can be adequately trained and algorithms tested to mitigate biases before implementation. Data are strongly encouraged to include populations identified as low socioeconomic status, rural, and sexual and gender minorities, in order to test for potential biases before implementation. Foreign awards and domestic awards with a foreign component must abide by this inclusion standard.
The Data Plan (for grant applications) or Proposal (for contract solicitations) must include a description of plans to ensure the inclusion of population data by sex, racial/ethnic group, and relevant subpopulations across the life span, such that relevant data sufficiency is achieved for each population group. It is preferred, when relevant, to have adequate data on populations with low socioeconomic status, from rural areas, and from sexual and gender minorities. In addition, any secondary data included in the plan must yield these data fields as well. Broad utility for all populations must be demonstrated.
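One practical way to assess whether data sufficiency has been achieved for each population group is to count records per subgroup and flag any group that falls below a minimum threshold. The sketch below assumes a tabular dataset with 'sex', 'race_ethnicity', and 'age_group' columns; the column names and the threshold of 100 records are illustrative assumptions, not prescribed values.

```python
# Minimal sketch of a data-sufficiency check over population subgroups.
# Column names and the threshold are illustrative assumptions.
import pandas as pd

MIN_RECORDS_PER_GROUP = 100  # illustrative threshold, not a prescribed standard

def sufficiency_report(df: pd.DataFrame,
                       group_cols=("sex", "race_ethnicity", "age_group")) -> pd.DataFrame:
    """Count records per subgroup and flag groups below the minimum threshold."""
    counts = df.groupby(list(group_cols)).size().rename("n_records").reset_index()
    counts["sufficient"] = counts["n_records"] >= MIN_RECORDS_PER_GROUP
    return counts

# Example with a toy dataset of 240 records across four subgroups:
toy = pd.DataFrame({
    "sex": ["F", "F", "M", "M"] * 60,
    "race_ethnicity": ["A", "B", "A", "B"] * 60,
    "age_group": ["<18", "18-64", "65+", "18-64"] * 60,
})
print(sufficiency_report(toy))
```

Groups flagged as insufficient would signal where additional data collection, rather than imputation, is needed before model training.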
Definition: A minority group is a readily identifiable subset of the U.S. population that is distinguished by racial, ethnic, and/or cultural heritage.
The Office of Management and Budget (OMB) Directive No. 15 (http://www.whitehouse.gov/omb/fedreg/ombdir15.html) defines minimum standards for maintaining, collecting, and presenting data on race and ethnicity in all Federal reporting. NIH is required to use these definitions to allow comparisons with other federal databases, especially the census and national health databases.
Definition: Sex refers to biological differences between females and males, including chromosomes, sex organs, and endogenous hormonal profiles.
Definition: All ages across the life span must be included, including children, defined as individuals under the age of 18, and older adults, defined as individuals 65 and older.
Data Bias in AI/ML
Data bias is a type of error in which certain elements of a dataset are more heavily weighted and/or represented than others. Types of data bias include selection bias, exclusion bias, measurement bias, recall bias, and observer bias. A biased dataset not only introduces issues of ethics and fairness but also results in skewed outcomes, lower accuracy, and analytical errors. To ensure that training data for machine learning projects are representative of the real world, bias testing should be built into the development cycle. Google (the What-If Tool), IBM (AI Fairness 360), and Microsoft (a UBE algorithm) have all released tools and guides to help analyze bias across a number of different data types. For example, the Google What-If Tool can be used to detect misclassifications, assess fairness in binary classification models, and investigate model performance across different subgroups.
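The vendor tools above wrap many such checks; as a minimal, library-agnostic sketch, the function below computes a disparate-impact ratio (the favorable-prediction rate for an unprivileged group divided by that of a privileged group) from a table of model predictions. The column names, group labels, and the 0.8 "four-fifths" threshold are illustrative assumptions.

```python
# Minimal sketch of one bias check: disparate impact on model predictions.
# Column names, group labels, and the 0.8 threshold are illustrative assumptions.
import pandas as pd

def disparate_impact(df: pd.DataFrame, group_col: str, prediction_col: str,
                     privileged_value, unprivileged_value) -> float:
    """Ratio of favorable-prediction rates: unprivileged group / privileged group."""
    rate_priv = df.loc[df[group_col] == privileged_value, prediction_col].mean()
    rate_unpriv = df.loc[df[group_col] == unprivileged_value, prediction_col].mean()
    return rate_unpriv / rate_priv

# Toy predictions: group A receives favorable predictions 60% of the time, group B 45%.
preds = pd.DataFrame({
    "group": ["A"] * 100 + ["B"] * 100,
    "favorable_prediction": [1] * 60 + [0] * 40 + [1] * 45 + [0] * 55,
})
ratio = disparate_impact(preds, "group", "favorable_prediction",
                         privileged_value="A", unprivileged_value="B")
print(f"Disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:  # commonly cited four-fifths rule of thumb
    print("Potential bias: the unprivileged group receives favorable predictions less often.")
```

A check like this would be run for each protected attribute and subgroup of interest as part of the development cycle, alongside per-subgroup accuracy and error-rate comparisons.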
Model Cards
Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type) and intersectional groups (e.g., age and race and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards ideally provide a summary of model performance across a variety of relevant factors, including groups, instrumentation, and environments. They also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information. In doing so, model cards provide a step towards the responsible democratization of machine learning and related AI technology, increasing transparency into how well AI technology works.
Each model card must include: model details (date, version, type, licenses, citations, etc.); intended use (and explicitly non-intended uses); population factors (demographics and biomedical or health issues); metrics (statistics, programs, error types, etc.); training data; evaluation data; ethical considerations; and caveats/recommendations and annotation.
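As one way to make these elements concrete, the sketch below records the fields listed above in a structured, machine-readable form. The field names and placeholder values are an illustrative layout, not a mandated schema.

```python
# Minimal sketch of a model card as a structured record built from the fields
# listed above. Values are placeholders, not documentation of a real model.
import json

model_card = {
    "model_details": {
        "date": "YYYY-MM-DD",
        "version": "0.1",
        "type": "binary classifier",
        "licenses": ["<license>"],
        "citations": ["<citation>"],
    },
    "intended_use": {
        "in_scope": "<intended application domain>",
        "out_of_scope": "<explicitly non-intended uses>",
    },
    "population_factors": ["sex", "race/ethnicity", "age group", "relevant health conditions"],
    "metrics": ["accuracy", "false positive rate per subgroup", "false negative rate per subgroup"],
    "training_data": "<source, time range, inclusion criteria>",
    "evaluation_data": "<held-out or external dataset, subgroup coverage>",
    "ethical_considerations": "<known risks, sensitive attributes, mitigation steps>",
    "caveats_recommendations_annotation": "<known limitations, recommended monitoring, annotation notes>",
}

print(json.dumps(model_card, indent=2))
```

Capturing the card in a structured form makes it easier to version alongside the model and to check that every required element has been filled in before release.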