Talk 3: Future of Diverse Data for Multiscale Modeling (IMAG-AND Futures)

 

3:30-3:50 pm  Future of Diverse Data for Multiscale Modeling:

“Inclusive study design at every scale improves genomic analysis and clinical application for everyone”

Chani Hodonsky, Univ. of Virginia

 

BIO: Chani Hodonsky is a postdoctoral fellow in the Miller Lab in the University of Virginia Center for Public Health Genomics. She obtained her PhD in Epidemiology from the University of North Carolina Gillings School of Public Health in 2019, where her dissertation focused on the role of genetic variation in red blood cell traits in ancestrally diverse populations in the United States as a member of the PAGE study. Her current research looks at genetic contributions to atherosclerosis in humans, in particular the role of genetic variants on genes expressed in coronary artery tissue.

ABSTRACT: Over the past decade, genome-wide association studies (GWAS) have identified thousands of associations between regions in the genome and complex diseases, in sample sizes ranging from thousands to over a million people. More recently, improved affordability of and access to RNA sequencing have allowed for identification of associations between genetic variants and the expression of genes expected to play a role in these diseases. Efforts by the Population Architecture using Genomics and Epidemiology (PAGE) study and others have demonstrated that inclusive, ancestrally diverse study design benefits both discovery efforts and identification of independent associations within known genetic loci. Publicly available resources for interpreting results of these methods give researchers an opportunity to interpret their results against the background of larger sample sizes. However, academic research and accompanying public data have been and remain focused on European-ancestry populations, despite the majority of human genetic variation lying outside of Europe. Current polygenic risk scores also rely heavily on genetic associations identified in European-ancestry populations, preventing broad application of a potentially useful clinical tool and threatening to increase health disparities nationally and globally. Increased representation of ancestrally diverse study populations is necessary to improve both the discovery and interpretation of genetic associations. It is also expected to enhance accessibility of clinical applications such as risk prediction and therapeutic development strategies. Chani spent several years working as a molecular genetics technician prior to attending graduate school for epidemiology training. In combination, these experiences led her to seek out a postdoctoral position which would allow for continued study of cardiovascular genomics on both population- and tissue-specific levels.

Presentation:

Comment

Very good presentation. I do want to ask: why are you going towards genetics? Are other biomarkers not good enough? For example, can't the plots in the first few slides you showed as background be explained by parameters such as age?

 

Also where can we find your presentation?

Submitted by jbarhak on Tue, 03/17/2020 - 15:49

Comment

I'm wondering if you have seen that some fields of medical research incorporate diverse data more than others?

Submitted by ssblemker on Tue, 03/17/2020 - 15:54

Comment

This is an interesting question. Short answer: I would have to do more research to decide whether I think there is, and whether it's intentional or not. Long answer: In my experience, "samples of convenience" is often the approach when a particular type of analysis is still becoming established (particularly because IRBs rightly have rules about avoiding coercion and ensuring informed consent), and due to historical malfeasance by researchers within a lot of communities, the group of people that is often most willing to participate in and trust academic research is white or European individuals. Funding within European countries of course will more often reflect their own populations, which may have limited representation, and there is a lot of funding for really interesting types of 'omics in Europe right now (I didn't even touch on epigenomics!). Within the United States I think particularly clinical trials are either required to be inclusive or justify why they aren't, which is a good step, but as far as I know there are no mechanisms within funding agencies that ask about data in grant applications. But I also think as sequencing-based methods that are sufficiently powered with lower sample sizes (single-cell RNA seq as opposed to GWAS, for example) become more common, there is a danger that researchers will think--as they did in the early GWAS era of <5,000 sample sizes--if their population is only going to be 100 people anyway, striving for homogeneity will limit the risk of stratification issues and allow for the detection of rare variants within the restricted population. In the US and Europe that type of approach will almost always result in a majority- or exclusively European-ancestry population, despite work showing there isn't a need for this type of approach and the results can be harmful (particularly re: PRS). 
My hope is that with increased awareness about the limitations of exclusive study design, more researchers will begin to learn about how the methods they employ can (or in some cases cannot, as is currently the case for LD score regression) be successfully and better implemented with more ancestrally representative sampling.

Submitted by chanijo on Tue, 03/17/2020 - 16:34

In reply to by ssblemker

Comment

Congrats! Kudos on your comment that we need to work together! In a different domain of diversity - Historically, it has been pretty common in GWAS studies to EXCLUDE sex chromosomes, preventing adequate study of sex differences. This is a lost opportunity given the info we have on sex differences in CAD phenotypes. From your view, is there progress on this – were you able to include X and Y genes in your study?

Submitted by Holly (not verified) on Tue, 03/17/2020 - 15:54

Comment

I love this question! First, I will distinguish sex & gender. Sex is a biological variable referring very broadly to the number/distribution of sex chromosomes, XX and XY being the most common but far from the only combinations observed in humans. Gender is a social construct that may or may not correspond to an individual's biological sex, and for which there are innumerable possible identities. As far as I am aware there has never been a GWAS of any complex trait that incorporated gender as a covariate in the model, especially because most studies both historically and contemporarily don't even ask about gender identity and therefore have no information about the gender of their participants. On the other hand, biological sex is almost always adjusted for in GWAS because many complex traits are associated with sex although, as you mention, sex chromosomes are often excluded. Not only are sex chromosomes excluded, but individuals with sex chromosome combinations other than XX or XY, or whose recorded sex (often based on outward physical appearance) does not match their genotype data, are also excluded, despite up to 2% of the global population falling outside those categories. In a sample the size of the UK Biobank (500,000 participants), this could lead to the exclusion of up to 10,000 individuals who are presumably also at risk for the complex disease being studied, which far exceeds the total sample sizes of early GWAS. Analyses of sex differences in associated SNPs have been performed for a fair number of phenotypes by stratifying the study population into males and females and then identifying variants within each group, although these analyses still often exclude the sex chromosomes. I would be thrilled to see increased representation of both gender and sex diversity in genomics research.

 

With regard to exclusion of sex chromosomes, this is an excellent point. The Y chromosome has very few genetic variants per megabase compared to all other chromosomes, although of course some might be present and have phenotypic effects in biological males. On the other hand, the X chromosome is larger than 14 of the autosomal chromosomes and has approximately 800 genes. The typical stated reason for excluding the X chromosome is that analyzing it is more complex than simply sex-stratifying a sample and analyzing in men and women separately, then meta-analyzing the summary statistics. While X inactivation (usually) occurs on one X chromosome per cell and is expected to be random, (1) it's not always random, in ways that researchers can't really predict, (2) there are numerous genes that escape X inactivation, so variants in open chromatin around those regions would need to be analyzed separately, but those are not particularly well characterized either, and (3) X inactivation may be tissue-specific, although evidence for this phenomenon is stronger in mice than in humans. I personally did not include the X chromosome in a GWAS I did for my dissertation because the methods I was employing were not really appropriate for this type of complex analysis, but I 100% agree that there are probably hundreds of real genetic associations and maybe thousands of trans-eQTLs for dozens of phenotypes on the X chromosome just waiting to be explored. Over the past several years some new methods development has occurred in this area, and I think once the field reaches a consensus about the best way to evaluate the sex chromosomes, particularly the X, a large number of analyses will be published confirming your suspicion that these associations exist and need to be more closely examined.
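
As an illustration of the sex-stratified approach mentioned above (analyze women and men separately, then meta-analyze the summary statistics), here is a minimal Python sketch of a fixed-effect, inverse-variance-weighted meta-analysis for a single autosomal variant. The function name, its inputs, and the example numbers are hypothetical and not taken from the talk or from any particular GWAS tool. The sketch does not address the X-specific complications listed above.

```python
import math


def ivw_meta(beta_f, se_f, beta_m, se_m):
    """Combine two sex-stratified effect estimates for one variant.

    Fixed-effect inverse-variance weighting: each stratum is weighted
    by 1 / SE^2, so the more precise estimate contributes more.
    Returns (combined beta, combined SE, z statistic, two-sided p-value).
    """
    w_f = 1.0 / se_f ** 2
    w_m = 1.0 / se_m ** 2
    beta = (w_f * beta_f + w_m * beta_m) / (w_f + w_m)
    se = math.sqrt(1.0 / (w_f + w_m))
    z = beta / se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical per-stratum summary statistics (illustrative values only).
beta, se, z, p = ivw_meta(beta_f=0.10, se_f=0.05, beta_m=0.20, se_m=0.05)
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

With equal standard errors, as in this example, the combined estimate is simply the average of the two stratum estimates, and the combined standard error is smaller than either stratum's.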

Submitted by chanijo on Tue, 03/17/2020 - 17:05

In reply to by Holly (not verified)

Comment

Very nice talk! Challenging topic. I'd love to follow your next steps.

Submitted by Rafael Dariolli (not verified) on Tue, 03/17/2020 - 15:54

Comment

Thank you very much! I am grateful to be able to work with other people who are passionate about improved representation in genomics research.

Submitted by chanijo on Tue, 03/17/2020 - 16:07

In reply to by Rafael Dariolli (not verified)