Big Problems, Big Data, Bigger Possibilities in Health Disparities Research

No problem can be solved from the same level of consciousness that created it. – Albert Einstein

By Nancy Breen, Ph.D.
National Institute on Minority Health and Health Disparities

While at NIMHD, I was asked to lead the Methods and Measurement Science pillar, one of four pillars of the NIMHD Visioning Process. The tasks of this pillar were to establish definitions, harmonize outcomes, and present scientific insights. The objectives were to expand and strengthen analytic methods and to offer guidelines for consistent measurement. Results are published in the NIMHD AJPH Supplement, New Perspectives to Advance Minority Health and Health Disparities Research. Health disparity outcome measures are defined in “Overview”1, “Methodological Approaches to Understanding Causes of Health Disparities” are emphasized2, and recommendations are offered for “Harmonizing Health Disparities Measurement”3. Evaluation4, an under-used tool in health disparities research, is encouraged with guidelines provided. This blog post expands on findings from “Translational Health Disparities Research in a Data-Rich World”5.

The role of big data in health disparities research is a burning question. Our interdisciplinary team explored how big data can contribute to reducing health disparities. The collaboration involved years of challenging and productive transdisciplinary teamwork and yielded two articles6,7 and the editorial for NIMHD’s AJPH Supplement, New Perspectives to Advance Minority Health and Health Disparities Research5.

While we agreed that large, structured data sets such as the American Community Survey, the National Health Interview Survey, and other federal surveys are big data, the perplexing questions concerned how to analyze unstructured big data collections, which were accumulating rapidly from a variety of sources without guidance for analysis. Rich new resources, including social media, electronic health records, sensor information from digital devices, and crowd-sourced and citizen-collected data, have the potential to complement more traditional data from health surveys, administrative data, and investigator-initiated registries or cohorts. We concluded that combining types and sources of data and using mixed methods analysis has the potential to provide more timely and detailed analysis that should speed reduction of health disparities.

However, a potential source of harm from big data involves incorporation of implicit bias into analyses or tools using complex data streams. Even though combined data will advance health disparity science, it is important to heed warnings about potential algorithmic bias in machine learning and artificial intelligence. For example, face recognition tools have much higher error rates for women, especially women with darker skin, than for lighter-skinned men because the training sets are overwhelmingly composed of lighter-skinned male subjects8. Biased feeder data will lead to similarly biased outcomes and, already, harmful bias has been shown in multiple venues9, 10. Coupling many different types of data increases the risk of harm for individuals. In addition, entire communities may be stigmatized by research findings that emphasize or overstate negative features. Therefore, health disparity researchers must be mindful of both social and individual ramifications of data and results.
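One simple way to look for this kind of bias is to compare a tool's error rate across population groups rather than reporting a single overall figure. The sketch below is illustrative only: the function name `error_rates_by_group`, the group labels, and the records are hypothetical and made up for this example, not data or methods from the studies cited above.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: error rate} for (group, true_label, predicted_label) records."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if truth != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical, made-up records for illustration only (not real data).
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 0, 1),
]

rates = error_rates_by_group(records)
# Gap between the worst-served and best-served groups.
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

A large gap between groups does not by itself identify the cause, but it flags a tool that should be examined, for example by checking how well each group is represented in the training data.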

Interventions that focus on policies and structures are more effective because they reach broader segments of society and require less individual effort. Knowing the barriers that lead to health disparities faced by community members is key to intervening on the right policies and structures. Big data can supplement understanding of the context of people’s lives. We know that fundamental causes and upstream factors generate the social determinants of health that contribute to health disparities. During the pandemic, factors underlying disparities have become especially clear. Greater likelihood of living paycheck to paycheck, less access to health insurance and unemployment insurance, and lack of wealth to finance emergencies have disadvantaged most racial/ethnic population groups more than Whites. Policy responses that would address the structural factors described above could include increasing the minimum wage, extending health and unemployment insurance, and creating a more accommodating bankruptcy process for consumers.

We argue that the most promising intervention approach for reducing health disparities is community-based systematic learning. Our team developed a heuristic, the Iterative Cyclical Approach for Reducing Health Disparities, graphed in Figure 2 of the Health Equity article and reproduced below7. If big data can add context and provide more granular data, it will facilitate use of the Iterative Cyclical Approach at all levels of government, including the local level, where change is most likely to occur.  For researchers to understand the structures and policies that lead to health disparities, community members will need to participate in all intervention activities from articulating barriers and setting goals through assessment and redesign. For community members to have a sustained voice, decision makers at the affected levels of government need to be involved in the intervention. While not all interventions succeed, all result in learning, and this learning is incorporated into the Iterative Cyclical Approach.

Translation from bench science to real-world practice averages 17 years11. To accelerate translational health disparities research, we argued for an iterative approach using analysis of big data that involves all stakeholders. Research teams need to include data providers, data analysts, social scientists, and decision makers at all levels of government.  However, a big data-driven cyclical approach will be challenging.  Throughout, research teams will need to avoid the pitfalls of bias and stigma. Policies and structures that maintain the health disparities we see today were fabricated over many years, so long-term investments are needed to mitigate them.

Today, unprecedented opportunities exist to broaden the field of health disparities inquiry using a continuously growing spectrum of diverse and novel data sources which, with the right workforce and tools, will lead to greater knowledge about causes of health disparities and more effective methods for addressing disparities than previously imagined.


  1. Duran D and Pérez-Stable EJ. Novel Approaches to Advance Minority Health and Health Disparities Research. AJPH Supplement 1, 2019, Vol 109, No. S1: S8-S10. Also see HDPulse at
  2. Jeffries N, Zaslavsky AM, Diez Roux AV, et al., Methodological Approaches to Understanding Causes of Health Disparities, AJPH Supplement 1, 2019, Vol 109, No. S1: S28-S33.
  3. Duran D, Asada Y, Millum J, Gezmu M, Harmonizing Health Disparities Measurement, AJPH Supplement 1, 2019, Vol 109, No. S1:S25-S27.
  4. Dye BA, Duran CG, Murray DM, Cresswell JW, Patrick R, Farhat F, Breen N, Engelgau MM, The Importance of Evaluating Health Disparities Research, AJPH Supplement 1, 2019, Vol 109, No. S1:S34-S40.
  5. Breen N, Zhang X, Jackson JS, Wood F, Wong DWS. Translational Health Disparities Research in a Data-Rich World. AJPH Supplement 1, 2019, Vol 109, No. S1: S41-S42.
  6. Zhang X, Pérez-Stable EJ, Bourne PE, Peprah E, Duru OK, Breen N, Berrigan D, Wood F, Jackson JS, Wong DWS, Denny J. Big Data Science: Opportunities and Challenges to Address Minority Health and Health Disparities in the 21st Century. Ethn Dis. 2017 Apr 20;27(2):95-106.
  7. Breen N, Berrigan D, Jackson JS, Wong DWS, Wood F, Denny J, Zhang X, Bourne P, Translational Health Disparities Research in a Data-Rich World, Health Equity, 2019, Vol 3, No.1 (
  8. Buolamwini J, Gebru T. Gender shades: Intersectional accuracy disparities in commercial gender classification. Paper presented at: Conference on Fairness, Accountability and Transparency, 2018; 81:1–15 (available at pdf (
  9. Hooker SE, Jr., Woods-Burnham L, Bathina M, et al. Genetic ancestry analysis reveals misclassification of commonly used cancer cell lines. Cancer Epidemiol Biomarkers Prev. 2019; 28:1003–1009.
  10. Fuster A, Goldsmith-Pinkham P, Ramadorai T, et al. Predictably unequal? The effects of machine learning on credit markets. March 2018. Last accessed 11/5/19
  11. Morris ZS, Wooding S, Grant J. The answer is 17 years, what is the question: understanding time lags in translational research. J R Soc Med. 2011; 104:510–520.