Biomedical Informatics: State of the Art, Challenges, and Opportunities

Abstract: Biomedical informatics can be considered a multidisciplinary research and educational field situated at the intersection of computational sciences (including computer science, data science, mathematics, and statistics), biology, and medicine. In recent years, there have been advances in the field of biomedical informatics. The current article highlights some interesting state-of-the-art research outcomes in this field, including outcomes in areas such as (i) computational biology and medicine, (ii) explainable artificial intelligence (XAI) in biomedical research and clinical practice, (iii) machine learning (including deep learning) methods and applications for bioinformatics and healthcare, (iv) imaging informatics, and (v) medical statistics and data science. Moreover, the current article also discusses some existing challenges and potential future directions for these research areas to advance the field of biomedical informatics.


Introduction
Biomedical informatics [1][2][3][4][5][6] can be considered a multidisciplinary research and educational field situated at the intersection of computational sciences (including computer science, data science, mathematics, and statistics), biology, and medicine. It comprises sub-fields such as bioinformatics [7], clinical informatics [8], imaging informatics, nursing informatics [9], pharmacy informatics, and public health informatics. By incorporating data-driven scientific approaches, it aims to extract useful information from biomedical data and transform that information into knowledge. Here, biomedical data can come from various application areas of biomedical informatics, computational biology, and medical care, such as anatomy, biomodelling, cancer biology, evolutionary biology, genomics, neuroscience, neuropsychiatry, pharmacy and pharmacology, pharmacometrics, and physiological medicine. Examples of these biomedical data include:
• Biological data, ranging from deoxyribonucleic acid (DNA) sequences and protein structures to complex cellular processes, for bioinformatics;
• Clinical trial data for clinical informatics;
• X-ray images for imaging informatics;
• Healthcare data, such as electronic health records (EHRs) or electronic medical records (EMRs), for public health informatics.
In recent years, there have been advances in biomedical informatics. These include advances in bioinformatics, clinical informatics, imaging informatics, public health informatics, and pharmacological data science, as well as in data science methodologies for presenting and utilizing biomedical datasets. Given the high volume of research outcomes in biomedical informatics, it would be impractical to provide an exhaustive list here. Hence, in this article, we highlight several state-of-the-art research outcomes in the field of biomedical informatics and discuss some existing challenges as well as future opportunities.

State of the Art in Biomedical Informatics
In this section, we highlight some interesting state-of-the-art research in the field of biomedical informatics.

Computational Biology and Medicine
Applying computational techniques in biology, biotechnology, biomedical research, healthcare, and medical practice involves data analysis, mathematical modelling, and simulation. These methods enable us to gain insights into intricate biological systems and to decipher the molecular foundations of diseases. The integration of biology, bioinformatics, and computational approaches aims to advance our understanding of life processes and to improve the precision of medical decision making. In recent years, the rapid evolution of experimental methods for unravelling the complexity of the human genome and proteome has resulted in an exponential surge of digital information. As a burgeoning field, bioinformatics combines computer science, biology, and chemistry. It incorporates artificial intelligence (AI), including machine learning (ML) and artificial neural networks (ANNs), which have catalyzed transformative breakthroughs in both the biological and medical sciences. The integration of AI into biomedical research has not only modernized traditional medicine, but also heralded a new era in systems biology, promising advances in drug discovery strategies and the streamlining of clinical practice.
For instance, Athanasopoulou et al. [10] conducted a comprehensive review that delineates the primary categories of AI and explores in depth the fundamental principles underpinning widely employed ML, ANN, and deep learning (DL) approaches. The review also underscores the pivotal role of AI-based methods across various biological research domains, with a specific focus on their applications in proteomics and drug design. Beyond the laboratory, the review examines the implications of AI for everyday clinical practice and healthcare systems, illuminating its potential to transform patient care.
In another instance, Carreras et al. [11] predicted the outcomes of 184 untreated follicular lymphoma patients using gene expression data and AI, in particular ANNs. Using 120 independent multilayer perceptron (MLP) solutions generated through random number generation, they ranked 22,215 gene probes by their importance for forecasting overall survival. The final ANN architecture incorporated newly identified predictor genes related to cell processes, together with the international prognostic index (IPI) and immune markers.
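The idea of ranking gene probes across many independently trained networks can be illustrated with a small rank-aggregation sketch. It assumes that each trained MLP yields a non-negative importance score per input (how Carreras et al. [11] computed and combined importances is not detailed here), normalizes each run so that its scores sum to one, and averages across runs. The function name `rank_by_mean_importance` and the probe names are hypothetical.

```python
from statistics import mean


def rank_by_mean_importance(runs):
    """Rank features by mean normalized importance across independent model runs.

    runs: list of dicts, one per trained network, mapping a feature name to a
    non-negative importance score. Features missing from a run count as 0.
    """
    features = sorted({name for run in runs for name in run})
    normalized = []
    for run in runs:
        # Scale each run so its scores sum to 1, making runs with different
        # score magnitudes comparable before averaging.
        total = sum(run.values()) or 1.0
        normalized.append({name: run.get(name, 0.0) / total for name in features})
    scores = {name: mean(run[name] for run in normalized) for name in features}
    # Highest mean importance first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical importance scores from three MLP solutions.
    runs = [
        {"PROBE_A": 0.6, "PROBE_B": 0.3, "PROBE_C": 0.1},
        {"PROBE_A": 0.5, "PROBE_B": 0.4, "PROBE_C": 0.1},
        {"PROBE_A": 2.0, "PROBE_B": 1.0, "PROBE_C": 1.0},
    ]
    for name, score in rank_by_mean_importance(runs):
        print(f"{name}: {score:.3f}")
```

Normalizing per run is one design choice among several; averaging ranks instead of scores would make the aggregation less sensitive to a single run with extreme importances.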

Explainable Artificial Intelligence in Biomedical Research and Clinical Practice
The integration of AI systems into biomedical and clinical contexts has the potential to disrupt the traditional doctor-patient relationship, which has historically been rooted in trust and in transparency regarding medical advice and therapeutic choices. As responsibility for diagnoses and treatment decisions shifts from human physicians to machine algorithms, the decision-making process becomes less transparent. ML algorithms such as ANNs and other classifiers, particularly those used to learn skills for clinical decision making, are tuned on training examples so that they generalize to new cases. Consequently, obtaining an explanation for a given decision is difficult, because these algorithms lack an inherent capacity for detailed justification. While experts in statistics or computer science might comprehend the intricate mathematical aspects of AI algorithms, such technical explanations are insufficient when human lives are at stake. Recognizing this challenge, both the scientific and regulatory communities have paid increasing attention to explainable AI (XAI). XAI aims to provide (i) trustworthiness, (ii) causality, (iii) transferability, (iv) informativeness, (v) trust, (vi) fairness, (vii) accessibility, (viii) interactivity, and (ix) privacy awareness. The emphasis is on ensuring that XAI systems can offer comprehensive explanations for their decisions that domain experts can understand. The need for transparency is paramount in healthcare, where decisions profoundly affect individuals' well-being.
For instance, Lötsch et al. [12] emphasized the critical requirement that XAI not only make decisions, but also provide detailed and comprehensible explanations to experts within the field. This dual functionality is imperative for fostering trust, maintaining ethical standards, and addressing concerns about the ethical implications of AI in medical decision making. As the scientific community delves into the intricacies of AI applications in healthcare, developing robust explanations for AI decisions emerges as a pivotal aspect of navigating the evolving landscape of technology-assisted medical practice.
Gashi et al. [13] provided a comprehensive reflection on a curated list of libraries designed to support the explanation of AI model decisions, with a specific emphasis on visual explainability and interpretability. The primary objective is to help practitioners and researchers identify suitable libraries that facilitate a clear understanding of the decision-making process, particularly in sensitive domains such as medicine, where transparency is paramount for safe and reliable application. They used the reasoning of a glioma classification model as a case study, recognizing the critical importance of visual interpretability in medical applications. The comparison covers 11 Python libraries, notably Shapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), which are well known for visualizing the explainability of AI models. The evaluation encompasses four libraries for global interpretations and three libraries for local interpretations. The showcased model not only validates known variations, but also helps to unveil lesser-known variations that could serve as potential biomarkers. Their work underscores the significance of visual explainability tools in enhancing the transparency and reliability of AI models, especially in critical applications such as medical decision making.
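SHAP is built on Shapley values from cooperative game theory. As a minimal sketch (not the SHAP library's API), the exact Shapley value of each feature can be computed by enumerating all feature subsets, assuming that a single baseline input stands in for "absent" features; SHAP's explainers approximate this quantity and typically average over a background dataset instead. The function `shapley_values` and the toy model are hypothetical, and exhaustive enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each input feature for a single prediction.

    model: callable taking a list of feature values and returning a number.
    x: the input to explain; baseline: reference values used for features
    that are "absent" from a coalition (an assumption; SHAP usually averages
    over a background dataset instead of a single baseline).
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their observed values; others use the baseline.
        return model([x[j] if j in coalition else baseline[j] for j in range(n)])

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        phis.append(phi)
    return phis


if __name__ == "__main__":
    # Hypothetical model with an interaction between features 0 and 1.
    model = lambda z: 2 * z[0] - z[1] + 0.5 * z[2] + z[0] * z[1]
    phi = shapley_values(model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
    print([round(p, 6) for p in phi])  # [3.0, -1.0, 1.5]
```

The values sum to the model output at x minus the output at the baseline (3.5 here), and the interaction term's contribution of 2 is split equally between features 0 and 1.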

Machine Learning Methods and Application for Bioinformatics and Healthcare
As a crucial branch of AI, ML (including DL) has garnered significant interest among researchers in science and engineering. Its progress is propelled by cutting-edge hardware and by innovative approaches to real-world challenges. Among its application domains, bioinformatics and healthcare stand out as areas where ML and DL can truly unleash their potential, because these fields frequently involve extensive data, tackle mission-critical tasks, and hold substantial socioeconomic significance. For example, ML and DL methods, including those based on ANNs, are used to analyze and interpret intricate biological data, in particular for diseases such as cancer, dengue, and coronavirus disease 2019 (COVID-19). In bioinformatics, these methods excel in tasks such as gene expression analysis, protein structure prediction, and genomic sequence interpretation, providing heightened accuracy and efficiency. In healthcare, ML and DL are instrumental in medical image analysis, disease diagnosis, drug discovery, and personalized treatment planning. The capacity of ML and DL models to autonomously learn complex patterns from large datasets has transformed these fields, offering innovative solutions and advancing our understanding of biological systems for improved biomedical research and patient care.
To elaborate, breast cancer is a widespread and serious health concern, underscoring the importance of early detection for timely treatment.In the realm of bioinformatics and healthcare, recent strides involve leveraging ML techniques to combat breast cancer.
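Traditionally, extracting information from data to support clinical diagnosis has been a laborious task. To address this challenge, Egwom et al. [14] adopted ML and feature extraction methods to improve the breast cancer diagnostic process. Their proposed ML model for breast cancer classification employs a support vector machine (SVM) for classification and linear discriminant analysis (LDA) for feature extraction. For comparison, they also assessed principal component analysis (PCA) for feature extraction and random forest for classification. Their comparative analyses, which evaluated ways of computing missing values in terms of classifier accuracy, precision, and recall, demonstrated the efficacy of the proposed model; in particular, the study evaluated computing missing values with the median. Notably, combining LDA with SVM and median-based missing value computation yielded superior results.
Besides breast cancer, dengue has become a persistent global health concern, with a notable increase in both cases and fatalities over the years. The absence of direct medications or vaccines necessitates a focus on monitoring and controlling the primary carriers (Aedes mosquitoes) to curb its endemicity. The current methodology involves collecting larval samples from breeding sites and having expert entomologists examine them manually under microscopes, which is an arduous, time-consuming, and impractical process. Prior attempts at automated Aedes larvae detection systems lacked the required accuracy and reliability. To address this challenge, Hossain et al. [15] proposed an automated system employing ensemble learning, achieving an accuracy of over 99%, even on low-magnification images. This ensemble learning system surpasses previous methods in accuracy and demonstrates practical usability, offering a promising advance in efficient Aedes larvae detection.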
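The ensemble of Hossain et al. [15] is described here only as "ensemble learning"; the combination rule is not stated. As a hedged illustration of what combining base classifiers can look like, the sketch below shows two common rules: hard (majority) voting over predicted labels and soft voting over averaged class probabilities. The function names, class names, and probabilities are hypothetical.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Majority vote over the labels predicted by each base model."""
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(probabilities, labels):
    """Average the class-probability vectors of several models (soft voting).

    probabilities: one vector per model, each aligned with `labels`.
    Returns the label with the highest mean probability and the mean vector.
    """
    n_models = len(probabilities)
    mean_probs = [sum(p[k] for p in probabilities) / n_models for k in range(len(labels))]
    best = max(range(len(labels)), key=lambda k: mean_probs[k])
    return labels[best], mean_probs


if __name__ == "__main__":
    labels = ["aedes", "non_aedes"]  # hypothetical class names
    # Hypothetical outputs of three base classifiers for one larva image.
    probs = [[0.90, 0.10], [0.45, 0.55], [0.40, 0.60]]
    votes = [labels[max(range(len(p)), key=p.__getitem__)] for p in probs]
    print("hard vote:", hard_vote(votes))              # non_aedes
    print("soft vote:", soft_vote(probs, labels)[0])   # aedes
```

In this example the two rules disagree: under soft voting, one confident model outweighs two marginal ones, which is why soft voting is usually recommended only when the base models produce reasonably calibrated probabilities.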

Imaging Informatics
Imaging informatics has helped transform medical imaging by facilitating the handling, storage, retrieval, mining, and analysis of extensive imaging datasets. Incorporating AI and ML algorithms into healthcare imaging provides a potent mechanism for identifying abnormalities and helping healthcare professionals render more precise diagnoses.
For instance, precise and early identification of the causes of pneumonia is crucial for implementing prompt treatment and preventive measures, reducing the burden of infections, and improving intervention strategies. The outbreak of COVID-19 has led to a surge in new cases of pneumonia and related conditions such as acute respiratory distress syndrome. Chest radiography (commonly known as CXR or X-ray) has emerged as a crucial diagnostic tool for COVID-19 pneumonia in designated healthcare institutions, where swift and reliable diagnosis is essential. To address this challenge, Ibrokhimov and Kang [16] proposed a DL-based computer-aided diagnosis (CAD) system for the rapid detection of pneumonia from X-ray images. To enhance classification accuracy and speed up model convergence, they also leveraged transfer learning and parallel computing techniques with established DL models such as VGG19 (a convolutional neural network (CNN) with 19 layers) and ResNet50 (a deep CNN with 50 layers). Experimental results underscore the effectiveness of DL models in swiftly and accurately diagnosing pneumonia from X-ray images. This is another application of DL methods for bioinformatics and healthcare, in addition to those mentioned in Section 2.3.
Besides X-ray, magnetic resonance imaging (MRI) also aids medical professionals in decision making. ML algorithms are commonly employed to analyze MRI scans, but they often lack transparency in their internal decision processes, which makes validation and interpretation challenging. To address these challenges, Eder et al. [17] applied XAI methods to interpret the decisions of an ML algorithm that predicts the survival rates of patients with brain tumors from MRI scans. They employed a CNN structure and enhanced its explainability through Shapley overlays. Some of the network structures overfitted, and these cases served as a use case for their interpretation method. The study demonstrates that experts can validate network structures through visualizations, rendering the decision-making process interpretable. The implementation, available on GitLab as "XAIforBrainImgSurv" (https://gitlab.com/matte3000/xai-for-brain-img-surv, accessed on 28 December 2023), underscores the feasibility of combining explainers with three-dimensional voxel data and emphasizes the role of interpretation in supporting result evaluation. This is another application of XAI in biomedical research and clinical practice, in addition to those mentioned in Section 2.2.

Medical Statistics and Data Science
Medical statistics and data science play a vital role in medical research, clinical trials, and healthcare decision making.Given the rapid expansion of biomedical informatics data, the incorporation of sophisticated statistical methods and data science tools has become essential for extracting meaningful insights from intricate datasets and enhancing medical practices.
For instance, the inconsistent presentation of results in studies examining the relationship between a quantitative explanatory variable and a quantitative dependent variable has long complicated the evaluation of reported findings. To address this challenge, Nieminen [18] provided a review that elucidates procedures for summarizing and synthesizing research outcomes from multivariable models with a quantitative outcome variable. Specifically, the review outlines the use of the standardized regression coefficient as an effect size index in meta-analysis, detailing how it can be estimated and converted from data presented in original research articles. An illustrative synthesis example focuses on research articles investigating the link between childhood body mass index (BMI) and carotid intima-media thickness (cIMT) in adult life.
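The conversion described above rests on a simple rescaling: an unstandardized slope b (outcome units per unit of the explanatory variable) becomes a standardized coefficient by multiplying it by the ratio of the explanatory variable's standard deviation to the outcome's, beta = b × SD_x / SD_y. The sketch below applies that identity and, as an approximation that treats the SDs as fixed, rescales the standard error by the same factor; it also recovers a missing standard error from a reported t statistic. The helper names and the BMI and cIMT numbers are hypothetical, not taken from Nieminen [18].

```python
def standardize_coefficient(b, sd_x, sd_y, se_b=None):
    """Convert an unstandardized regression slope into a standardized one.

    beta = b * sd_x / sd_y. If se_b is supplied it is rescaled by the same
    factor, which treats the sample SDs as known constants (an approximation).
    """
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    factor = sd_x / sd_y
    se_beta = None if se_b is None else se_b * factor
    return b * factor, se_beta


def se_from_t(b, t):
    """Recover a slope's standard error from a reported t statistic (SE = b / t)."""
    if t == 0:
        raise ValueError("t statistic must be non-zero")
    return abs(b / t)


if __name__ == "__main__":
    # Hypothetical study: slope of adult cIMT (mm) on childhood BMI (kg/m^2).
    b, t = 0.005, 2.5
    se_b = se_from_t(b, t)  # 0.002 mm per kg/m^2
    beta, se_beta = standardize_coefficient(b, sd_x=3.0, sd_y=0.1, se_b=se_b)
    print(f"beta = {beta:.3f}, SE = {se_beta:.3f}")  # beta = 0.150, SE = 0.060
```

The standardized coefficient is unit-free, which is what allows studies that measured BMI or cIMT on different scales to be combined in one meta-analysis.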
Moreover, medical statistics and data science also play a role in the production of innovative implants. Additive manufacturing of implants represents a significant domain within the medical field, particularly in the context of individualized serial production. Meeting the demands of personalized healthcare requires expedited delivery of implants to healthcare facilities. The complete manufacturing process, encompassing activities such as 3D drawing data generation, imaging, 3D printing, and post-processing, typically spans a week. This duration applies in particular to high-risk Class III implants (e.g., dental implants), which require specialized equipment and a validated premarket approval (PMA) process (cf. low- or moderate-risk Class I biomedical devices, which require general controls, and moderate- or high-risk Class II biomedical devices, which require special controls). For instance, Andreucci et al. [19] outlined the development of a biomechanical model for dental implants, from conceptualization and patenting to the creation of a final product ready for additive manufacturing. They also discussed the advantages and constraints of using titanium metal printing for dental implant prototypes.

Challenges and Opportunities in Biomedical Informatics
In this section, we discuss some challenges and potential future directions in the field of biomedical informatics.

Computational Biology and Medicine
Despite the widespread excitement surrounding the integration of AI into biomedical informatics, several significant challenges persist. For example, the precision and effectiveness of ML algorithms hinge on the quality and quantity of training data, and issues such as overfitting arise when algorithms memorize noise from limited datasets.
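A toy example makes the overfitting point concrete. Under the assumptions below (a true linear relationship y = 2x, eight training points carrying a deterministic ±0.5 pattern that stands in for noise, and two held-out points), a degree-7 polynomial that passes exactly through every training point memorizes the noise and has zero training error, yet it predicts the held-out points far worse than a simple least-squares line. The data and helper names are synthetic and purely illustrative.

```python
def interpolate(xs, ys, x):
    """Evaluate the unique polynomial through all (xs, ys) points (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def overfitting_demo():
    train_x = [float(k) for k in range(8)]
    # True signal y = 2x plus a deterministic +/-0.5 pattern standing in for noise.
    train_y = [2.0 * x + (0.5 if k % 2 == 0 else -0.5) for k, x in enumerate(train_x)]
    test_x = [0.5, 6.5]
    test_y = [2.0 * x for x in test_x]  # noise-free targets for held-out points

    a, b = fit_line(train_x, train_y)
    line = lambda x: a + b * x
    poly = lambda x: interpolate(train_x, train_y, x)
    return {
        "line_train": mse([line(x) for x in train_x], train_y),
        "line_test": mse([line(x) for x in test_x], test_y),
        "poly_train": mse([poly(x) for x in train_x], train_y),
        "poly_test": mse([poly(x) for x in test_x], test_y),
    }


if __name__ == "__main__":
    for name, value in overfitting_demo().items():
        print(f"{name}: {value:.4f}")
```

The interpolating polynomial has as many parameters as there are training points, so it can fit the noise exactly; between the points, especially near the edges of the data, it oscillates and its held-out error becomes hundreds of times larger than that of the two-parameter line.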
Additionally, the "opaque-box problem" (also known as the black-box problem) in DL is a major concern, as it obstructs understanding of the decision-making process within algorithms and limits their interpretability. Efforts are being made to develop more XAI models. However, challenges persist, particularly in computer vision, robotics, and natural language processing (NLP), owing to data shortages, technical requirements, and linguistic complexities. Beyond technical challenges, social and legal challenges (such as ethical concerns, privacy protection, biases, and accountability) underscore the need for rigorous validation and regulatory approval. Despite the promising convergence of AI and biomedical informatics, the journey ahead is complex and marked by numerous obstacles that require careful consideration and resolution.

Explainable Artificial Intelligence in Biomedical Research and Clinical Practice
It has been observed that expecting interpretability exclusively from statisticians involved in medical decision making may be insufficient. Just as AI-specific terms may be perplexing to biomedical experts, biomedical terms and methods can be equally incomprehensible to non-biomedical experts. Although the medical environment is regarded as the professional home of medical experts, the growing integration of AI calls for reciprocal understanding, urging biomedical researchers, medical professionals, and clinical practitioners to familiarize themselves with each other's disciplines. This creates a shared opportunity for biomedical informatics experts to establish a common language of terms and concepts, enabling each expert to explain their field to the others and to collaboratively convey this understanding to the patient. For example, just as CD19 (Cluster of Differentiation 19, a molecule found on a type of white blood cell called B lymphocytes, or B cells) may be unfamiliar to a computer scientist, SVM (support vector machine) might be obscure to a biomedical researcher or clinical practitioner, which emphasizes the joint effort required for effective communication and knowledge exchange between disciplines.
For some tasks in biomedical research and clinical practice (e.g., the classification of biomarkers), future opportunities might involve considering specific types of mutation by incorporating diverse mutations as distinct features. Furthermore, integrating data from diverse sources could add further subclasses or clinical features, and survival prediction or clustering approaches could be included to gain insights into signaling pathways. Future opportunities may also include performance experiments that yield more detailed insights into requirements and recommendations. Ultimately, leveraging open data, providing open-source implementations, prioritizing user friendliness, and demonstrating the application of XAI to real scientific problems could contribute significantly to biomedical research, clinical practice, and beyond.
In situations where AI decisions significantly impact lives, it is imperative to employ knowledge-based AI. In fields such as medicine, where decisions must be comprehensible to professionals, AI methods should be confined to systems that are understandable. Such systems should offer a causal and logical derivation of decisions from multivariate data, using the terminology and methods of medical decision making, and thereby incorporate a formal, human-comprehensible representation of knowledge. This perspective aligns with XAI, emphasizing transparency in computational decisions that can be effectively communicated to medical staff and patients. Future opportunities include further enhancing the (i) trustworthiness, (ii) causality, (iii) transferability, (iv) informativeness, (v) trust, (vi) fairness, (vii) accessibility, (viii) interactivity, and (ix) privacy awareness of XAI.

Machine Learning Methods and Application for Bioinformatics and Healthcare
Recall from Section 2.3 that Egwom et al. [14] used LDA for feature extraction.Future opportunities include an exploration of other feature extraction techniques (e.g., independent component analysis).
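For readers less familiar with LDA as a feature extractor, the two-class case reduces to projecting each sample onto Fisher's direction w = S_w^-1 (m1 - m0), where m0 and m1 are the class means and S_w is the within-class scatter matrix; with two classes, LDA yields a single extracted feature. The sketch below is a generic textbook version in plain Python, preceded by median imputation of missing values, and is not Egwom et al.'s [14] implementation. In such a pipeline, an alternative extractor such as independent component analysis would take the place of the hypothetical `fisher_direction` helper. The toy data are hypothetical.

```python
from statistics import median


def impute_median(rows):
    """Replace None entries with the median of the observed values in that column."""
    n_cols = len(rows[0])
    medians = [median(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_direction(X, y):
    """Two-class Fisher LDA direction w = S_w^-1 (mean_1 - mean_0); labels must be 0/1."""
    d = len(X[0])
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    # Within-class scatter: sum of outer products of deviations from each class mean.
    s_w = [[0.0] * d for _ in range(d)]
    for x, t in zip(X, y):
        dev = [x[j] - means[t][j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                s_w[i][j] += dev[i] * dev[j]
    return solve(s_w, [means[1][j] - means[0][j] for j in range(d)])


def project(w, X):
    """Map each sample to its single LDA feature (the score along w)."""
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in X]


if __name__ == "__main__":
    # Hypothetical toy data: two features, one value missing in two samples.
    X = [[1.0, 2.0], [2.0, None], [1.5, 3.0], [4.0, 2.5], [5.0, 3.5], [4.5, None]]
    y = [0, 0, 0, 1, 1, 1]
    X_filled = impute_median(X)
    w = fisher_direction(X_filled, y)
    print([round(z, 2) for z in project(w, X_filled)])
```

The projected scores of the two classes do not overlap in this toy example, so a downstream classifier such as an SVM would only need to learn a threshold on a single feature.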
Recall from the same section that Hossain et al. [15] demonstrated the use of DL methods in bioinformatics and healthcare applications for detecting the larvae of Aedes mosquitoes, the carriers of dengue. A future opportunity is to reduce the implementation cost further (e.g., by using smartphones instead of computers and digital cameras with a microlens). This reduction would increase practicality, cost effectiveness, accessibility, and adaptability. Another future opportunity is to apply or adapt DL methods to other applications, such as detecting Zika and chikungunya.

Imaging Informatics
Besides applying ML models to X-ray images, it is also important to explore their diagnostic performance on magnetic resonance imaging (MRI) and computed tomography (CT) scans, and to conduct relevant experiments to assess their efficacy. Future opportunities also include employing parallel computing to distribute data among child nodes to expedite training, and combining data-distributed and model-distributed computing mechanisms for further training acceleration. Network architectures that are tailored to the analysis of a specific image type (X-ray, MRI, or CT) are also desirable. The resulting architectures can then be used to investigate SHAP interpretations across image modalities. Furthermore, there are opportunities to refine ML and DL models by integrating heterogeneous data from multiple sources.
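The data-distributed idea can be sketched without a cluster. In synchronous data-parallel training, each worker (child node) computes a gradient on its own shard of the data, the gradients are averaged with weights proportional to shard size, and every worker applies the same update. The sketch below simulates the workers in a single process on a toy linear-regression model rather than a CNN; the helper names, shard layout, learning rate, and step count are illustrative assumptions, and this is not the setup of Ibrokhimov and Kang [16].

```python
def shard_gradient(params, shard):
    """Gradient of the mean squared error of y ~ w * x + b on one worker's shard."""
    w, b = params
    n = len(shard)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in shard) / n
    return grad_w, grad_b


def split_into_shards(data, n_workers):
    """Round-robin partition of the data across workers (assumes n_workers <= len(data))."""
    return [data[i::n_workers] for i in range(n_workers)]


def data_parallel_step(params, shards, lr):
    """One synchronous step: local gradients, size-weighted average, shared update."""
    # In a real system each shard_gradient call would run on a separate node.
    local = [(len(s), shard_gradient(params, s)) for s in shards]
    total = sum(n for n, _ in local)
    grad_w = sum(n * g[0] for n, g in local) / total
    grad_b = sum(n * g[1] for n, g in local) / total
    w, b = params
    return w - lr * grad_w, b - lr * grad_b


def train(data, n_workers=3, lr=0.1, steps=2000):
    shards = split_into_shards(data, n_workers)
    params = (0.0, 0.0)
    for _ in range(steps):
        params = data_parallel_step(params, shards, lr)
    return params


if __name__ == "__main__":
    # Toy data from y = 3x + 1 (no noise), x in [0, 1.9].
    data = [(k / 10, 3 * (k / 10) + 1) for k in range(20)]
    w, b = train(data)
    print(f"w = {w:.4f}, b = {b:.4f}")
```

Because the size-weighted average of the shard gradients equals the full-batch gradient, each step matches ordinary full-batch gradient descent; the speed-up in practice comes from computing the shard gradients concurrently, at the cost of communicating gradients every step.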

Medical Statistics and Data Science
In the area of medical statistics and data science, Riley et al. [20] identified several challenges in the meta-analysis of multivariable findings. Future opportunities include solutions to address these challenges:
• Diverse types of effect measures (e.g., correlation coefficients, regression coefficients, risk ratios, odds ratios, and mean differences) that may not be directly comparable;
• Estimates lacking standard errors, which pose a problem because meta-analysis methods typically weight studies according to their standard errors.
Moreover, to address the challenge that covariates in multiple regression models can differ across studies, a suggested solution is to meta-analyze only those estimates that are adjusted for at least a predefined core set of established covariates, defined in consultation with experts. Separate meta-analyses can then be conducted for unadjusted and adjusted prognostic effect estimates.
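The reliance on standard errors can be made explicit with the standard inverse-variance approach: each study is weighted by 1/SE², which gives a fixed-effect pooled estimate, and the DerSimonian-Laird method adds an estimated between-study variance τ² to obtain a random-effects estimate. In the sketch below, studies without a standard error are simply excluded and counted, which illustrates why missing standard errors are a problem. The function name `pool_estimates` and the study values are hypothetical, and this is not necessarily the procedure recommended by Riley et al. [20].

```python
from math import sqrt


def pool_estimates(studies):
    """Inverse-variance meta-analysis of (estimate, standard_error) pairs.

    Studies whose standard error is None cannot be weighted and are skipped.
    Returns fixed-effect and DerSimonian-Laird random-effects results.
    """
    usable = [(y, se) for y, se in studies if se is not None and se > 0]
    if len(usable) < 2:
        raise ValueError("need at least two studies with standard errors")
    weights = [1 / se ** 2 for _, se in usable]
    sum_w = sum(weights)
    fixed = sum(w * y for w, (y, _) in zip(weights, usable)) / sum_w

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(w * (y - fixed) ** 2 for w, (y, _) in zip(weights, usable))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (len(usable) - 1)) / c)

    re_weights = [1 / (se ** 2 + tau2) for _, se in usable]
    sum_re = sum(re_weights)
    random_effect = sum(w * y for w, (y, _) in zip(re_weights, usable)) / sum_re
    return {
        "n_used": len(usable),
        "n_excluded": len(studies) - len(usable),
        "fixed": fixed,
        "fixed_se": sqrt(1 / sum_w),
        "tau2": tau2,
        "random": random_effect,
        "random_se": sqrt(1 / sum_re),
    }


if __name__ == "__main__":
    # Hypothetical standardized coefficients; the third study reports no SE.
    studies = [(0.10, 0.10), (0.30, 0.10), (0.25, None)]
    for key, value in pool_estimates(studies).items():
        print(key, value)
```

Here the two usable studies disagree more than their standard errors would suggest, so τ² is positive and the random-effects standard error (0.1) is wider than the fixed-effect one (about 0.071); the third study contributes nothing because it reports no standard error.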
Furthermore, when conducting a meta-analysis, a challenge arises from insufficiently reported data in the evaluated articles, which hinders the computation of effect size estimates. Articles often lack detailed descriptive statistics and may not provide standard errors for regression coefficients, limiting their use in systematic reviews. The validity and practical utility of observational research depend on good study design, appropriate analysis methods, and high-quality reporting. To address these challenges, future opportunities include developing guidance documents for data presentation and promoting a more structured framework for scientific reporting. For example, articles could include tables and figures with descriptive statistics for the response and explanatory variables, helping researchers to summarize and meta-analyze effects. Because these challenges may intensify with the application of ML methods (which typically lack effect sizes that clinicians can interpret), there are future opportunities, as mentioned in Section 3.2, to enhance XAI in biomedical research and clinical practice.

Conclusions
In this article, we highlighted some interesting state-of-the-art research in the field of biomedical informatics, covering (i) computational biology and medicine, (ii) explainable artificial intelligence (XAI) in biomedical research and clinical practice, (iii) machine learning (including deep learning) methods and applications for bioinformatics and healthcare, (iv) imaging informatics, and (v) medical statistics and data science. We also discussed some existing challenges and potential future directions for these research areas in the field of biomedical informatics.