Welcome! I am a postdoctoral research scientist in the Division of Infectious Diseases at Columbia University Irving Medical Center with expertise in artificial intelligence (AI) in medicine and public health. I am also a DAAD AInet Fellow for Safety and Security in AI through the German Academic Exchange Service, Deutscher Akademischer Austauschdienst (DAAD). I completed my PhD in the Department of Biomedical Informatics at Columbia while concurrently a Visiting Postgraduate Research Fellow in the Department of Medicine at Harvard Medical School. Prior to training in AI in healthcare, I was a member of the Strategic Information division of the U.S. President’s Emergency Plan for AIDS Relief (PEPFAR) at Harvard University, which aimed to rapidly expand antiretroviral therapy for people living with HIV/AIDS in sub-Saharan Africa.
Building on a decade and a half of domestic and international experience in clinical and public health informatics, my research focuses on human-centered AI and the development of systematic, scalable, data-driven approaches to improve human health and well-being for everyone. My work typically examines and applies methods such as machine learning, natural language processing, and spatio-temporal analysis in addition to traditional biostatistics and epidemiology. I am particularly interested in using and interrogating multimodal data sources and the vast toolbox that computational learning offers to better understand, improve, and facilitate the study of health in populations and communities. Generally, my research can be grouped into four primary domains:
My PhD was funded by training fellowships from the National Library of Medicine and National Institute of Allergy and Infectious Diseases. I am also the recipient of a Computational and Data Science Fellowship from the Association for Computing Machinery (ACM) Special Interest Group in High Performance Computing (SIGHPC). In addition to my degrees in biomedical informatics, I also have a Master of Applied Science in Spatial Analysis from the Johns Hopkins Bloomberg School of Public Health and a Bachelor of Arts in Sociology and History from Yale University.
harry [dot] reyes [at] columbia [dot] edu
Division of Infectious Diseases
Department of Medicine
Columbia University
622 West 168th Street, PH20
New York, NY 10032
Citations: 1236 h-index: 14
Mining the Health Disparities and Minority Health Bibliome: A Computational Scoping Review and Gap Analysis of 200,000+ Articles.
Science Advances.
2024.
10(4):eadf9033. PMID: 38266089.
Surveillance of HIV and Other Sexually Transmitted Infections in a Learning Public Health System.
JAMA Netw Open.
2025;8(6):e2514308. In Press. medRxiv [Preprint] available.
doi: 10.1001/jamanetworkopen.2025.14308.
Development of Machine Learning-Based Mpox Surveillance Models in a Learning Health System.
Sexually Transmitted Infections. In Press. medRxiv [Preprint] available.
2025.
doi: 10.1136/sextrans-2024-056382.
The Impact of Evolving Endometriosis Guidelines on Diagnosis and Observational Health Studies.
medRxiv [Preprint].
2024 December.
doi: 10.1101/2024.12.13.24319010.
Professional-Patient Boundaries: a National Survey of Primary Care Physicians’ Attitudes and Practices.
J Gen Intern Med.
2020.
35(2):457–464. PMID: 31755012.
A Scoping Review of Ethics Considerations in Clinical Natural Language Processing.
JAMIA Open.
2022.
5(2):ooac039. PMID: 35663112.
What Works in Medication Reconciliation: An On-Treatment and Site Analysis of the MARQUIS2 Study.
BMJ Qual Saf.
2023.
32(8):457-469. PMID: 36948542.
Ranked #1 among the top research articles of 2023 by BMJ Quality and Safety
Auditing Learned Associations in Deep Learning Approaches to Extract Race and Ethnicity from Clinical Text. AMIA Annu Symp Proc. 2024 January. 289-298.
Bear Don’t Walk IV OJ, Pichon A, Reyes Nieva H, Sun T, Altosaar J, Natarajan K, Perotte A, Tarczy-Hornoch P, Demner-Fushman D, Elhadad N. Exploring Gender Disparities in Time to Diagnosis. Machine Learning for Health (ML4H) Workshop at the Conference on Neural Information Processing Systems (NeurIPS). 2020 December. 1-6.
Sun TY, Bear Don’t Walk OJ IV, Chen JL, Reyes Nieva H, Elhadad N.
Time of day and the decision to prescribe antibiotics.
JAMA Intern Med.
2014.
174(12):2029-31. PMID: 25286067.
Scale-up of networked HIV treatment in Nigeria: creation of an integrated electronic medical records system.
Int J Med Inform.
2015.
84(1):58-68. PMID: 25301692.
Characteristics of Disease-Specific and Generic Diagnostic Pitfalls: A Qualitative Study.
JAMA Netw Open.
2022.
5(1):e2144531. PMID: 35061037.
The studies presented in this dissertation seek to advance health equity science by drawing from informatics-based methods and subfields of artificial intelligence such as machine learning, natural language processing, and symbolic reasoning. This thesis employs robust methods for big data collection, integration, and analysis to leverage existing and emerging data sources including a large corpus of biomedical literature, electronic health records from the largest public health information exchange in the United States, open government datasets, proprietary national insurance claims datasets, and public health reporting data.
Health Disparities and Minority Health (HDMH) Monitor / Principal developer
The Health Disparities and Minority Health (HDMH) Monitor is an online article repository and interactive dashboard that leverages natural language processing and machine learning methods to support scientific discovery via automated archive, search, information synthesis, and data visualization of articles concerning HDMH in MEDLINE. It is based on a large-scale computational scoping review aimed at characterizing major topics found among nearly a quarter million scientific articles in the HDMH literature, examining change in topic mention over time, identifying notable gaps in coverage, and deriving actionable insights for further inquiry.
C-REACT: Contextualized Race and Ethnicity Annotations for Clinical Text / Co-developer
The C-REACT dataset is a large, publicly available corpus of sentences from clinical notes manually annotated for information related to race and ethnicity (RE). The corpus contains two sets of gold-standard annotations for RE data. The first set contains granular RE information such as patient country of origin and spoken language. The second set contains RE labels manually assigned by clinicians. The corpus is intended to improve understanding of the granular RE-related information contained within the clinical note and how this information might be used to infer RE.
HERA: Health Equity Research Assessment / Co-developer
The Health Equity Research Assessment (HERA) is a large-scale characterization conducted across Observational Health Data Sciences and Informatics (OHDSI) sites with heterogeneous populations and insurance coverage types, allowing for identification of persistent and generalizable trends in diagnosis differences. The HERA dashboard and visualizations can be used to download study data, further investigate health differences, and generate novel hypotheses.
Covidwatcher is an app and online portal that surveyed users about their exposure to COVID-19, symptoms, access to medical care, and impact on daily life. The data collected was used to track the spread of coronavirus, giving citizens real-time information about hot spots, and enabling health care officials to deploy resources where needed most.
Harvard PEPFAR Nigeria Adult quality improvement tool / Co-developer
Co-developed and evaluated a utility to extract information from an EHR data warehouse and generate measures based on 15 adult quality of care indicators at 33 Harvard PEPFAR sites in Nigeria. The module reviews continuity of care, drug therapy initiation, loss to follow-up, laboratory monitoring, disease screening based on clinical symptoms assessment, treatment failure, and treatment response.
Reyes Nieva H, Zucker J, Elhadad N. Elucidating Health Inequities and Research Gaps in HIV and Other Sexually Transmitted Infections Using Data Mining and a Large Language Model: A Computational Scoping Review
September 2024
STI Prevention; Atlanta, GA
Reyes Nieva H. Challenges, Opportunities, and Considerations: Promoting Inclusive Research in the Era of Big Data
November 2023
American Medical Informatics Association Annual Symposium; New Orleans, LA
Invited presenter and panelist for session on “Advancing Diversity, Equity, and Inclusion in Biomedical Informatics Research: Strategies and Best Practices for Using Inclusive Language Across the Research Lifecycle”
Reyes Nieva H, Tucker EG, Castor D, Yin MT, Gordon P, Elhadad N, Zucker J. Health Information Exchange Enables Enhanced STI Surveillance Using Electronic Health Record Data
July 2023
HIV and STI 2023 World Congress; Chicago, IL
Reyes Nieva H, Zucker J, Tucker EG, McLean J, DeLaurentis C, Gunaratne S, Elhadad N. Development and Validation of Machine and Deep Learning Classifiers for Monkeypox
May 2023
Symposium on Artificial Intelligence for Learning Health Systems (SAIL); Río Grande, Puerto Rico
Reyes Nieva H, Elhadad N. Mining the Health Disparities and Minority Health Bibliome: A Computational Scoping Review
November 2022
American Medical Informatics Association Annual Symposium; Washington, DC
Spotlighted and invited for special presentation by AMIA DEI Committee for “demonstrating best practices in promoting diversity, equity, and inclusion through scholarly communications in biomedical informatics”
Sun T, Hardin J, Reyes Nieva H, Natarajan K, Cheng RF, Ryan P, Elhadad N. Large-scale Characterization of Gender Differences in Age at Diagnosis and Time to Diagnosis in Longitudinal Observational Health Data
October 2022
National Institutes of Health (NIH) Office of Research on Women’s Health (ORWH) Workshop on Gender and Health: Impacts of Structural Sexism, Gender Norms, Relational Power Dynamics, and Gender Inequities; Virtual Event due to COVID-19
Reyes Nieva H, Sun TY, Gorman SR, Mao G, Elhadad N. Differential Presentation and Delays in Treatment for Acute Myocardial Infarction Associated with Sex and Race/Ethnicity
November 2021
American Medical Informatics Association Annual Symposium; San Diego, CA
Pang C, Chen R, Reyes Nieva H, Kalluri KS, Sun TY, Jiang X, Rodriguez VA, Natarajan K. Characterization and Comparison of Embedding Algorithms for Phenotyping across a Network of Observational Databases
November 2020
American Medical Informatics Association Annual Symposium; Virtual event due to COVID-19
Boskey E, Tabaac A, Wigell R, Wolf K, Lage I, Landrum S, Reyes Nieva H, Bearnot B, Streed C. Using patterns of missing EHR data to identify care disparities in gender diverse patients
October 2020
APHA Annual Meeting and Expo; Virtual event due to COVID-19
Reyes Nieva H, Blackley S, Streed C, Fiskio J, Zhou L. High physician and clinic-level variation in documentation of sexual orientation and gender identity in the electronic health record
April 2018
New England Science Symposium; Boston, MA
Received the Ruth and William Silen, MD Oral Presentation Award
Mlaver E, Dalal AK, Reyes Nieva H, Chang F, Hanna J, Ravindran S, McNally K, Stade D, Morrison C, Bates D, Dykes P. An Analysis of Patient Portal Use in the Acute Care Setting
November 2015
American Medical Informatics Association Annual Meeting; San Francisco, CA
Reyes Nieva H, Palm K, Zucconi T. Advocacy and Implementation: Gathering Sexual Orientation and Gender Identity Demographics in the Clinical Setting
September 2015
GLMA 33rd Annual Conference; Portland, OR
Reyes Nieva H, Doctor JN, Friedberg MW, Birks C, Fiskio JM, Volk LA, Linder JA. Comparing Clinicians’ Perception of Their Own and Their Peers’ Antibiotic Prescribing to Actual Antibiotic Prescribing for Acute Respiratory Infections in Primary Care
April 2014
Society of General Internal Medicine Annual Meeting; San Diego, CA
Received the Outstanding Quality and Patient Safety Oral Presentation Award