- Review
Ethical Considerations for Machine Learning Research Using Free-Text Electronic Medical Records: Challenges, Evidence, and Best Practices
- Guosong Wu
- Fengjuan Yang
The increasing availability of free-text components in electronic medical records (EMRs) offers unprecedented opportunities for machine learning research, enabling improved disease phenotyping, risk prediction, and patient stratification. However, the use of narrative clinical data raises distinct ethical challenges that are not fully addressed by conventional frameworks for structured data. We conducted a narrative review synthesizing conceptual and empirical literature on ethical issues in free-text EMR research, focusing on privacy, fairness, autonomy, interpretability, and governance. We examined technical methods, including de-identification, differential privacy, bias mitigation, and explainable AI, alongside normative approaches, such as participatory design, dynamic consent models, and multi-stakeholder governance. Our analysis highlights persistent risks, including re-identification, algorithmic bias, and inequitable access, as well as limitations in current regulatory guidance across jurisdictions. We propose ethics-by-design principles that integrate ethical reflection into all stages of machine learning research, emphasize relational accountability to patients and stakeholders, and support global harmonization in governance and stewardship. Implementing these principles can enhance transparency, trust, and social value while maintaining scientific rigor. Ethical integration is therefore not optional but essential to ensure that machine learning research using free-text EMRs aligns with both clinical relevance and societal expectations.
6 December 2025



