Harnessing Artificial Intelligence for Automated Diagnosis

Abstract: The evolving role of artificial intelligence (AI) in healthcare can shift the route of automated, supervised and computer-aided diagnostic radiology. An extensive literature review was conducted to consider the potential of designing a fully automated, complete diagnostic platform capable of integrating the current medical imaging technologies. Adjuvant, targeted, non-systematic research was regarded as necessary, especially to the end-user medical expert, for the completeness, understanding and terminological clarity of this discussion article, which focuses on giving a representative and inclusive idea of the evolutionary strides that have taken place, not including a technical evaluation of AI architectures. Recent developments in AI applications for assessing various organ systems, as well as enhancing oncology and histopathology, show significant impact on medical practice. Published research outcomes of AI image segmentation and classification algorithms exhibit promising accuracy, sensitivity and specificity. Progress in this field has led to the introduction of the concept of explainable AI, which ensures transparency of deep learning architectures, enabling human involvement in clinical decision making, especially in critical healthcare scenarios. Structure and language standardization of medical reports, along with interdisciplinary collaboration between medical and technical experts, are crucial for research coordination. Patient personal data should always be handled with confidentiality and dignity, while ensuring legality in the attribution of responsibility, particularly in view of machines lacking empathy and self-awareness. The results of our literature research demonstrate the strong potential of utilizing AI architectures, mainly convolutional neural networks, in medical imaging diagnostics, even though a complete automated diagnostic platform, enabling full-body scanning, has not yet been presented.


Introduction
In recent years, the intersection of AI and medical imaging has ushered healthcare into a transformative era, offering unprecedented possibilities regarding automated diagnostic procedures. Currently, robust research is taking place in the relevant programming domain, with various approaches towards clinical decision support systems (CDSSs), from decision trees (DTs) to artificial neural networks (ANNs) and from support vector machines (SVMs) and naïve Bayes to random forests (RFs) [1]. Different levels of supervision during "training" procedures, as well as various layers of transparency and degrees of interpretability, are utilized by AI architectures [2,3]. It seems that, although the direction is common, the paths followed by different scientific teams are manifold.
Since the beginning of the previous century, medical imaging technology has pioneered in revealing living human anatomy and pathology by providing a non-invasive look into bodily structures. The discovery of X-rays by Wilhelm Röntgen in 1895 has had a remarkable influence on daily clinical practice, affecting multiple diagnostic procedures ever since. The establishment of digital technology allowed for three-dimensional image reconstruction from a series of illustrative slices, enabled by the invention of computerized axial tomography (CT) by Godfrey Hounsfield in 1972. A few years later, its combination with nuclear magnetic resonance scanners led to another milestone, magnetic resonance imaging (MRI), which renders detailed anatomical images whilst avoiding the use of ionizing radiation [4].
As an academic discipline, AI was founded in 1956 and is often attributed to John McCarthy of Dartmouth College [4][5][6]. Defined as "the summation of capabilities and operations of machines that mimic or emulate human intelligence", computerized medical imaging procedures that involve recognizing, synthesizing and inferring relevant information could be considered a primitive form of AI. However, automation and technological progress of traditional radiological examination systems do not yet exhibit the aspects of self-education and decision making in everyday clinical practice [1,4].
Machine learning (ML) is a subtype of AI that allows for processing of new information without the need for additional programming and involves creating algorithms that can learn from and make predictions on incoming data [1]. Deep learning (DL) is a branch of ML which, in a process called "training", employs large neural networks with functional specification to process data layer by layer and then execute actions to optimize their parameters. In the field of medical imaging tests, this procedure has already been applied to improve diagnostic speed and accuracy [1,5]. In this review, we mainly refer to DL procedures or DL algorithms that involve convolutional neural networks (CNNs) to implement automated diagnostics.
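As a concrete illustration of the building block that gives CNNs their name, the following minimal Python sketch (using NumPy; the image and kernel are invented toy values, not data from any cited study) applies a single hand-fixed convolution kernel to a synthetic image. In a trained CNN, many such kernels are learned automatically during "training" rather than specified by hand.

```python
import numpy as np

# Toy illustration of the core CNN operation: a 2-D convolution
# (valid padding, stride 1). Real DL frameworks learn the kernel
# weights from data; here the kernel is fixed by hand.
def conv2d(image, kernel):
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel applied to a synthetic "image" whose right
# half is bright responds strongly along the brightness boundary.
image = np.zeros((5, 5))
image[:, 3:] = 1.0
edge_kernel = np.array([[-1.0, 1.0]])
response = conv2d(image, edge_kernel)
```

The strong responses along the boundary hint at how stacks of learned kernels let a network build detectors for progressively more complex image features.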
The next level of innovation that is about to shift the route of progress appears to be the application of AI technology to the available and continuously generated medical imaging data, targeting a fully automated diagnostic platform with progressively increasing accuracy and sensitivity. Computer-aided diagnosis (CAD) is already attainable, enabling the idea of an "AI-augmented" radiologist [7], promising a reduction in required working time and aiming at a decrease in possible adverse events due to misdiagnosis. Digitalization and computation of imaging data allow for storage in electronic medical records (EMRs) [3], typically registered and saved in digital imaging and communication in medicine (DICOM) format [6,8]. Functional interaction with picture archive and communication systems (PACSs), combined with technological advancements in medical image analysis, such as radiomics [1], expands the potential of the concept.
The foundation of this survey was formed by an extensive literature search in our attempt to present a thorough exploration of the role of AI in medical imaging and to encompass a diverse array of platforms utilized in diagnostic procedures of various organ systems. Our search led us to devote a separate section to the detection and treatment of neoplastic lesions, since this incorporates specific features concerning several medical specialties. Histopathology has also been included in this paper, from initial research findings, as a paradigm of automated imaging data capturing, segmentation and classification, leading to valid clinical decision making.
From the end-user's point of view, we emphasize the need for transparency and interpretability of assisted diagnoses, describing the role of explainable AI (XAI) in fostering collaboration between AI models and healthcare professionals. We also point out the importance of establishing a formally accepted, standardized language for medical reporting compatible with machine learning platforms. This may exponentially increase the generated usable information, improve interdisciplinary communication and ensure legal compliance.
The aim of this paper is not to evaluate different AI methods of computer-assisted diagnosis from a technical point of view, nor to compare their efficiency, but to navigate the reader through already tested diagnostic platforms and introduce their reported results. In the dynamic landscape of medical imaging, the symbiotic relationship between AI and healthcare professionals emerges as a powerful force, promising improved accuracy, efficiency and pace of action. The next step seems to be the implementation of an integrated diagnostic platform which will enable full body scanning; thus, we make a few suggestions towards facilitating this transitional state.

Material and Methods
A literature search of all review articles published in English from January 2020 to March 2023 was undertaken using the Google Scholar and PubMed-Medline databases by two independent researchers. Search criteria included the terms "artificial intelligence" AND "medical imaging" AND "automated diagnosis" AND "platform" mentioned anywhere in the title, abstract or full text of the articles. We chose the term "medical imaging" over the term "radiology", since we aimed at including applications of AI in evaluated platforms based on digitalized diagnostic imaging data. The initial search yielded 234 unique articles; after excluding the term "COVID", the number of articles fell to 122.
Subsequently, full-text articles were reviewed to confirm inclusion/exclusion criteria and were selected according to whether or not they used AI-enhanced medical imaging diagnostic procedures. Papers referring to any kind of application of DL in supervised, assisted or fully automated processing of medical imaging data were chosen, leading to a total of 38 unique articles. An adjuvant search using related terms was considered necessary to include vital information and important terminology, regardless of publication date, increasing the number of articles to 54. Our research methodology is depicted in Figure 1.
All relevant articles were classified according to organ system. Additionally, we included two separate sections of evaluation that came up from our initial search, one regarding tumor detection and another referring to histopathology, as illustrative examples of AI-assisted medical imaging applications. We also incorporated findings about explainability of outcomes, as well as standardization of medical report language, because of their significance in creative human-machine interaction.

Cardiovascular System
The leading causes of mortality on a global scale are disorders related to the cardiovascular system, the commonest risk factor being coronary atherosclerosis, that is, the formation and deposition of plaques consisting of adipose tissue and calcium on the endothelium of the coronary arteries, causing gradual narrowing of their lumens. Certain algorithms based on recurrent CNNs have been applied to image and component analysis of coronary atherosclerotic plaques in order to facilitate clinical assessment and prognostic evaluation of artery stenosis [9]. The rapidly growing amount of relevant imaging data has triggered an interest in automated diagnostic tools [10].

The structural and functional gold standard for the diagnosis of obstructive coronary artery disease is angiography. Nevertheless, since this is an invasive and demanding procedure that involves surgical risks, coronary (or cardiac) computed tomography angiography (CCTA) has emerged as an alternative option for non-invasive assessment of the lesion, even though there is a probability of overestimating the degree of narrowing of the artery lumen. CCTA can be significantly improved by leveraging automated image processing and federated learning (FL), resulting in a more accurate and rapid evaluation of coronary stenosis, plaque characterization and the probability of ensuing myocardial ischemia [11]. In a study by Kang et al. [12], an SVM algorithm was applied to the detection of significant reductions in the diameter of the coronary artery (≥25%), with outcomes indicating high sensitivity (93%), specificity (95%) and accuracy (94%) (Table 1).
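For clarity regarding the metrics quoted throughout this review, the following short Python sketch computes sensitivity, specificity and accuracy from confusion-matrix counts; the counts below are illustrative and are not data from Kang et al.

```python
# Sensitivity, specificity and accuracy, the metrics reported for
# classifiers such as the SVM stenosis detector, computed from a
# confusion matrix: true/false positives (tp/fp), true/false
# negatives (tn/fn). Counts are invented for illustration.
def classification_metrics(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)              # true-positive rate
    specificity = tn / (tn + fp)              # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

sens, spec, acc = classification_metrics(tp=93, fn=7, tn=95, fp=5)
# -> sensitivity 0.93, specificity 0.95, accuracy 0.94
```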
The presence and size of plaque observed in a CT scan of the coronary arteries is quantified by the calcium or Agatston score; coronary artery calcium scoring is a screening tool for patients at low to intermediate risk of coronary artery disease. The data emerging from these diagnostic and prognostic tools are multiplying at a rapid rate, often rendering preventive medical action insufficient or inaccurate. Automated segmentation and quantification of enhanced postprocessed medical images using DL, combined with algorithms that take into consideration risk factors such as age, gender, class of obesity and smoking habits, are about to become a strong predictor of cardiovascular events [10].
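As a sketch of why calcium scoring lends itself to automation, the Python function below follows the commonly cited Agatston convention (per-lesion area in mm² multiplied by a density weight derived from the lesion's peak attenuation); the lesion values are invented for illustration.

```python
# Sketch of Agatston-style calcium scoring for one CT slice: each
# calcified lesion contributes (area in mm^2) x a density weight
# derived from its peak attenuation in Hounsfield units (HU).
# Thresholds follow the commonly cited Agatston convention.
def density_weight(peak_hu):
    if peak_hu < 130:
        return 0          # below the calcium detection threshold
    if peak_hu < 200:
        return 1
    if peak_hu < 300:
        return 2
    if peak_hu < 400:
        return 3
    return 4

def agatston_score(lesions):
    # lesions: list of (area_mm2, peak_hu) tuples for one slice
    return sum(area * density_weight(hu) for area, hu in lesions)

score = agatston_score([(12.0, 250), (4.0, 450)])  # 12*2 + 4*4 = 40
```

An automated pipeline would supply the lesion areas and peak attenuations from a DL segmentation step rather than manual delineation.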
Another measurable parameter considered to be related to the risk of myocardial infarction through metabolic and endocrine pathways is epicardial adipose tissue, that is, the amount of fat localized between the heart and the pericardium and on the accompanying coronary arteries. Transthoracic echocardiography is routinely preferred for the quantification and assessment of epicardial adipose tissue, but cardiac CT is another useful means of evaluation, presently gaining in popularity. The latter is not a dynamic, human-dependent procedure; thus, it can be assisted by AI algorithms in achieving automated measurements of epicardial adipose tissue so as to increase accuracy and reliability of outcomes whilst reducing the time of clinical examination [13].
An interesting paper by Linardos et al. [14] evaluated different applications of FL in the diagnosis of hypertrophic cardiomyopathy based on datasets of T1-weighted sequences from cardiac MRI (Table 1). A three-dimensional (3D) CNN model (ResNet18) is adapted to incorporate and utilize prior information regarding myocardial shape, while various data augmentation set-ups are evaluated on the grounds of their impact on the different DL procedures. Each simulation used datasets from four different medical centers, focusing specifically on the diagnosis of hypertrophic cardiomyopathy. The highest scores were achieved by the combination of FL and chosen augmentations, reaching an accuracy of 89% [14].

Pulmonary System
The detection of pulmonary lesions with the use of automated diagnostic tools appears to be a promising field, hosting remarkable research activity. Apart from lung tumor detection and classification by computer-aided technology, two specific respiratory conditions are taken into consideration: pneumothorax, an abnormal collection of air in the pleural space between the lung and the chest wall that can be spontaneous or traumatic; and pulmonary embolism, the occurrence of a blood clot that travels through the bloodstream from elsewhere in the body to a lung artery, causing its blockage. Both pathologies are considered life-threatening medical emergencies and may be indicative of underlying malignancies, with high mortality and morbidity rates. Both are subject to various experimental models for automated diagnosis of medical imaging processes.
According to a systematic review by Iqbal et al. [15], the employment of deep learning techniques in automated chest radiograph diagnostic processes for different types of pneumothorax can be considered successful. The analysis concluded that two DL-based models (CheXNet and LinkNet), among those evaluated, were the most efficient in the detection of pathologic areas. More specifically, in one platform the reported outcome showed an area-under-curve (AUC) score of 88.87% (Rajpurkar et al., 2017) [16]; in the second platform, the value of the Dice similarity coefficient (DSC) was estimated at 88.21% (Groza et al., 2020) [17]. Both studies presented encouraging results regarding detection and localization of pneumothorax in chest radiographs (Table 1).
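Since the DSC is the standard overlap metric for segmentation models such as those above, a minimal Python sketch of its computation on toy binary masks may be helpful; the masks here are invented, not study data.

```python
# Dice similarity coefficient (DSC): 2|A∩B| / (|A| + |B|) for a
# predicted binary mask A and a ground-truth mask B. A value of 1
# means perfect overlap, 0 means no overlap.
def dice(pred, truth):
    # pred, truth: equal-length flat lists of 0/1 mask values
    intersection = sum(p * t for p, t in zip(pred, truth))
    return 2.0 * intersection / (sum(pred) + sum(truth))

pred  = [1, 1, 0, 0, 1, 0]   # toy predicted mask
truth = [1, 1, 0, 0, 0, 1]   # toy ground-truth mask
overlap = dice(pred, truth)  # 2*2 / (3+3) = 0.666...
```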
In another article [18], a 2D segmentation model of a U-Net-based architecture classification algorithm was developed and tested targeting the diagnosis of pulmonary embolism. The researchers used a clinical dataset containing 251 computerized tomography pulmonary angiograms (CTPAs). Setting a calibration that aimed at minimizing false negative rates, they calculated that blood clots in the main pulmonary and segmental arteries could be detected with 93% sensitivity and 89% specificity (Ajmera et al., 2022) [18] (Table 1). Another study that utilized CAD for the detection of pulmonary embolism by means of multiaxial segmentation [19] indicated an obvious superiority of the 2.5D training method over the 2D network architectures but no clear difference between the 2.5D and 3D methods.

Brain Pathology
Another medical field where AI implementation exhibits a growing impact is the automation of diagnostic brain pathology. In this context, multiple sclerosis is an autoimmune chronic demyelinating disease of the central nervous system characterized by inflammation and neurodegeneration. Early detection of the lesions is crucial for prompt therapeutic interventions, which will mitigate the complications that negatively affect the patient's quality of life. Following clinical examination and identification of the symptoms, the diagnosis is usually confirmed with MRI scans. In one study, using brain MRI slices, a CNN based on 3D convolutional layers was trained to distinguish between multiple sclerosis and its imitators with 98.8% accuracy [20]. In another study, DL algorithms (trained with the Adam, SGD and RMSProp optimizers) identified multiple sclerosis with 99.05% specificity, 95.45% precision and 99.14% sensitivity [21] (Table 1). In most cases of potential multiple sclerosis, the diagnostic procedure requires acquisition of medical history, blood tests and spinal fluid analysis. Using extra trees (ET) models, the relevant clinical data can be effectively processed to allow for disease identification with 94.74% accuracy [22].
Intracranial tumors are detected in MRI scans following suspicious symptoms, such as headache, dizziness or vertigo. The initial findings may shed some light on the type and prognosis of the disease, but biopsy is often deemed essential for the final classification, while medical imaging examinations are routinely performed for subsequent staging and follow-up. The rapid development of segmentation and radiomics as parts of image-processing and interpretation algorithms has shown potential advantages in the recognition of complex patterns. Additionally, radiogenomics has improved our understanding of the molecular biology and behavior of brain lesions in response to conventional treatments. In a diagnostic study of 37,871 patients from four hospitals, the mean metrics of a DL-assisted evaluation of 18 types of brain tumors were 72.7% to 88.9% for sensitivity and 84.9% to 96.8% for specificity, which proved significantly higher than the performance of experienced neuroradiologists [23] (Table 1). Furthermore, pretrained models and fully convolutional neural networks that function as automated classifiers may turn out to be useful for reducing the need to resize and crop an image or normalize its pixel values [24].
Another domain where AI is already in use is ischemic stroke imaging. Middle and posterior cerebral artery infarcts are often evaluated with the use of a 10-point quantitative topographic CT scale, the Alberta Stroke Program Early CT (ASPECT) score. Notably, AI algorithms (CNN, SVM, VGG-16, GoogleNet and ResNet-50) have displayed acceptable results in automated ischemia evaluation, infarct segmentation, prognosis prediction, middle cerebral artery density detection, patient selection for intervention and prognostication, with 45-98% reported sensitivity and 57-95% specificity in commercially available DL platforms [25] (Table 1).
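The ASPECT score's simple structure, starting at 10 and subtracting one point per affected territory, is part of what makes it amenable to automation. A hypothetical Python sketch using the conventional MCA-territory labels is shown below; an automated pipeline would derive the affected set from segmentation output rather than manual reading.

```python
# ASPECT scoring sketch: the scale starts at 10 and subtracts one
# point for each of ten predefined territories showing early
# ischemic change. Region labels follow the conventional
# MCA-territory naming; the example input is invented.
ASPECTS_REGIONS = [
    "caudate", "lentiform", "internal capsule", "insular ribbon",
    "M1", "M2", "M3", "M4", "M5", "M6",
]

def aspect_score(affected):
    # affected: set of region names flagged as ischemic
    unknown = set(affected) - set(ASPECTS_REGIONS)
    if unknown:
        raise ValueError(f"unrecognized regions: {unknown}")
    return 10 - len(set(affected))

score = aspect_score({"insular ribbon", "M2"})  # -> 8
```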

Musculoskeletal
Hip disorders constitute a common health problem, usually involving osteoarthritis (OA), a degenerative disease of the articular surface, or hip fractures, usually caused by falls in the elderly. The gold standard for diagnosing both medical conditions is the plain anteroposterior hip X-ray [26,27]. A deep CNN (VGG-16 layer) was trained and tested on 420 hip X-ray images to automatically diagnose hip OA. This CNN model achieved a balance between high sensitivity (95.0%) and specificity (90.7%), as well as an accuracy of 92.8%, a performance equivalent to that of physicians with 10 years of experience [26]. In other research, a DL network (Faster-RCNN) was applied to detect intertrochanteric hip fractures on X-rays. Compared to orthopedic attending physicians, the algorithm performed better in accuracy (88% vs. 84% ± 4%), specificity (87% vs. 71% ± 8%) and time consumption (5 min vs. 18.20 ± 1.92 min), while there was no statistically significant difference in sensitivity (89% vs. 87% ± 3%) or missed diagnosis rate (11% vs. 13% ± 3%) [27].
Similarly, fracture identification and classification models were developed [28] to evaluate the performance of a deep CNN (ResNet18 DCNN) in detecting and classifying distal radius fractures in 15,775 labeled radiographs. In two independent tests, the results were impressive for fracture detection, reaching 97.5% and 98.1% accuracy, but not sufficient for classification tasks: fragment displacement, 58.9% and 73.6%; joint involvement, 61.8% and 65.4%; and multiple fragments, 84.2% and 85.1%, respectively. This outcome suggests that the models can be utilized as secondary reading tools to detect distal radius fractures; however, they cannot yet be trusted for fracture classification in clinical practice.
Table 1 shows indicative reported data concerning the accuracy of AI algorithms for medical imaging automated diagnosis in the aforementioned organ systems. It is easily observed that the outcomes indicate an impressive performance regarding sensitivity and specificity of DL architectures, which are mostly represented by CNNs.

AI Algorithms for Tumor Detection in Oncology
In our exploration of the available literature on proposed or established AI diagnostic platforms, oncology emerged as a separate entity due to the particular nature of tumor detection and prognostication. The application of AI for enhancing medical image tumor analysis and visualization can benefit several medical specialties involved in the management of neoplastic diseases, such as prostate and bladder cancer, osteosarcoma, skin cancer, lung cancer or breast cancer. Consequently, different imaging modalities like mammography, CT, MRI and ultrasound are instrumental in localization and staging, evaluation and treatment planning [29]. The comprehensive outcomes of our search in this domain are demonstrated in Table 2.
Despite high radiation doses, a CT scan is regarded as the test of choice for detailed assessment of bone tissue because it provides considerable soft-tissue-signal contrast. Accurate AI-based bone segmentation tasks have been successfully utilized for the detection and classification of osteosarcomas in CT slices [29]. M. Santin et al. assessed the capacity of a DL algorithm to recognize abnormal laryngeal cartilage, achieving a maximum sensitivity rate of 83% and specificity rate of 64% [30]. Automatic segmentation procedures appear to be promising for the quantification of skeletal tumor burden during cancer staging [31].
Low back pain is one of the most common symptoms in medical practice. A thoracic or lumbar spine MRI plays a pivotal role in its investigation because of the image clarity it ensures, compared to CT, as well as the lack of harmful radiation. A combination of DL algorithms with radiomics in dynamic contrast-enhanced MRI of the spine has been used for differentiating between lung cancer and non-lung-cancer bone metastases, reporting 75% sensitivity and 83% specificity for the most accurate of the tested networks (CLSTM) [31]. Another research team has studied the assisted segmentation of routine MRI scans for automatic classification of primary benign and malignant bone tumors, with an accuracy similar to that of radiology experts (automated, 76% vs. human, 73%) [32] (Table 2).
When treating a known tumor with skeletal metastatic potential, it is often part of the staging protocol to carry out whole-body bone scintigraphy to detect new lesions and monitor the activity of known ones. The bone scan index is a scoring interpretation system that estimates the burden of bone metastasis by visually assessing the affected percentage of each bone. DL techniques have reached satisfactory success rates when applied to nuclear medicine imaging [33]. A CAD system, based on image processing and CNNs, has been studied for the interpretation of bone scintigraphy to determine the presence or absence of bone metastases and has demonstrated 90% sensitivity and 89% specificity [34].
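A bone scan index of the kind described above reduces to a weighted sum over bones. The Python sketch below uses invented skeletal-mass weights and bone groupings purely for illustration; they are not the published reference values.

```python
# Bone scan index (BSI) sketch: each bone contributes its fraction
# of total skeletal mass multiplied by the (visually or
# automatically) estimated fraction of that bone involved by
# metastasis. Weights below are illustrative assumptions only.
SKELETAL_MASS_FRACTION = {
    "spine": 0.30, "pelvis": 0.20, "ribs": 0.15,
    "skull": 0.10, "limbs": 0.25,
}

def bone_scan_index(involvement):
    # involvement: bone name -> fraction of that bone affected (0..1)
    return 100 * sum(SKELETAL_MASS_FRACTION[bone] * frac
                     for bone, frac in involvement.items())

bsi = bone_scan_index({"spine": 0.10, "pelvis": 0.05})
# 100 * (0.30*0.10 + 0.20*0.05) = 4.0 (percent of skeleton involved)
```

In an automated pipeline, the per-bone involvement fractions would come from CNN segmentation of the scintigram rather than visual estimation.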
In the field of neurosurgery, the collaboration of an interdisciplinary team of computer scientists with medical experts led to the creation of a trainable CNN that used only endomicroscopy tissue images as data. The applied network displayed the ability to detect and pinpoint gliomas (benign or malignant intracranial or spinal tumors) [35] (Table 2). The classification dataset included 6287 microscope images from 20 patients. Applying a visual relevance localization method to a multihead network resulted in the formation of diagnostic maps that were merged with prototypes inspired by biology models. The outcome was impressive, since not only were the established diagnostic criteria correctly identified, exhibiting a mean accuracy of 87.5%, but novel findings were also revealed to radiologists [35].
The application of DL in diagnosing colorectal cancer through laboratory imaging seems promising [36]. Intelligent diagnosis systems play a crucial role in providing visual cues associated with potential pathology, determining disease location and distinguishing benign from malignant findings. Using the regression NN-augmented Lagrangian genetic algorithm (RNN-ALGA) for image segmentation of computerized tomography colonography (CTC) slices, an accuracy of 97% in identifying colonic diseases is feasible [37]. In another paper [38], the use of three-dimensional (3D) fully convolutional neural networks combined with a 3D level set showed a higher sensitivity than 3D fully convolutional neural networks alone in the segmentation of colorectal cancer on MRI [38] (Table 2).
DL strategies have been used for the distinction between benign and malignant primary bone lesions, with accuracy similar to that of subspecialist radiologists. In a study by Yu He et al., the AI model achieved 77.7% sensitivity and 89.6% specificity in distinguishing between malignant and non-malignant bone tumors, as well as 82.7% sensitivity and 81.8% specificity in distinguishing between benign and non-benign lesions, in conventional radiographs. The accuracy in cross-validation from radiologists was slightly lower [39]. Similarly, cystic and lytic mandibular lucent lesions have been comparatively analyzed in DL-based studies of panoramic radiographs, with approximately 88% sensitivity [40]. Table 2 indicates the previously presented outcomes concerning AI-augmented tumor detection.

Computer-Aided Image Recognition in Histopathology
Although histologic investigation of tissues removed from the body for diagnostic purposes is not directly related to radiology, it may function as the archetype of a process fundamentally based on recognition and evaluation of medical imaging data. In this context, all relevant information can be digitalized and presented to pattern recognition algorithms for automated segmentation and classification, thus representing the "third revolution" in this specific medical field. The products of such proceedings could improve the pathologist's diagnostic speed, reliability and accuracy.
Among the various AI strategies that could be employed in that direction, CNNs emerge as the preferred method for analyzing pathology images due to their end-to-end learning potential, flexibility and ability to fit a variety of functions [36]. The development of CAD systems capable of quantitatively defining important histopathological features is a key driver in advancing digital pathology. Segmentation of tumor regions in hematoxylin-eosin-stained images comprises a critical first step, followed by detecting and classifying nuclei in cancer tissue, which is challenging due to the complexity and heterogeneity of malignant cells [41]. The introduction of persistent homology maps (PHMs) seems to be effective in pinpointing abnormal areas. For example, topological data analysis of glands and their nuclei is instrumental in qualitative studies of adenocarcinomas [42].
There are two types of computer vision applications that seem to be employed in automated biopsy: image recognition, which involves identifying objects, patterns or features within an image; and image generation, which creates new images, often with specific attributes or characteristics, based on training data. A DL algorithm for image recognition, AlexNet, introduced in 2012, could predict types of brain tumor histology images with 99.04% accuracy in classification tasks [29]. On the other hand, generative adversarial networks (GANs) and variational autoencoders (VAEs), two pivotal architectures in the field of generative modeling, achieved 99% accuracy in multiclass (normal/primary/metastatic) tumor classification procedures [29].
Automated analyses of histopathology slides involving material from bone marrow aspirates collected from patients with hematologic disorders (e.g., myelodysplastic syndrome, acute lymphoblastic leukemia) have been the object of numerous studies assessing AI detection and classification protocols [33]. In osteosarcomas, the distinction between viable and necrotic lesion tissue has been systematically examined [43,44]. A DL model of spectroscopy applied to bone biopsy samples was able to predict treatment outcomes in Ewing sarcoma [45].
The recurrence rate of prostate cancer after initial therapy could be accurately predicted in 13,188 digitized whole-mount histological slides by use of an architecture model consisting of two DL autoencoders [46]. The prediction accuracy using algorithm-generated features was compared with that of expert pathologists who used established diagnostic criteria (Gleason score). The algorithm was able to calculate an impact score by identifying not only the established lesion sites but also specific regions or features that had been overlooked by the human experts. The combination of the AI model and human-determined criteria led to a more accurate recurrence rate prediction than the use of either method alone.
Another branch of pathology is cytology, which focuses on the microscopic examination of cells instead of tissues for differential diagnostic purposes. AI algorithms, including neural network-based whole-slide image (WSI) classification models, have been utilized to achieve high specificity and sensitivity rates in slide interpretation, while reducing the required microscopy time [47,48]. In a study by Hui Du et al., continuous array scanning technology (slide scanner RQ1000, Ruiqian Co. Ltd., Suzhou, China) was applied to cytology samples to generate multidepth images for cervical cancer screening. After two consecutive cell identification pipelines, implemented with YOLOv5 and ResNet architectures, respectively, slides negative for suspicious lesions were excluded, decreasing diagnostic duration from 3 min to 30 s per slide, without compromising sensitivity of detection [48].
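The two-stage triage logic described above — a fast detector that excludes clearly negative slides so that only a subset reaches the slower classification stage — can be outlined as follows. The `detect_suspicious_cells` and `classify_cells` functions are hypothetical stand-ins for the YOLOv5 and ResNet stages, not their actual APIs, and the thresholds are illustrative:

```python
def detect_suspicious_cells(slide):
    """Stage 1 (stand-in for a YOLOv5-style detector): keep candidate
    regions whose detection score exceeds an illustrative threshold."""
    return [region for region in slide["regions"] if region["score"] >= 0.5]

def classify_cells(regions):
    """Stage 2 (stand-in for a ResNet-style classifier): label each
    surviving candidate region."""
    return ["suspicious" if r["score"] >= 0.8 else "benign" for r in regions]

def triage(slides):
    """Exclude slides with no candidate regions after stage 1, mirroring
    the workflow in which negative slides skip the second pipeline."""
    results = {}
    for slide in slides:
        candidates = detect_suspicious_cells(slide)
        if not candidates:
            results[slide["id"]] = "negative"  # excluded after stage 1
        else:
            results[slide["id"]] = classify_cells(candidates)
    return results

slides = [
    {"id": "A", "regions": [{"score": 0.9}, {"score": 0.3}]},
    {"id": "B", "regions": [{"score": 0.2}]},
]
report = triage(slides)
```

The speed-up reported in the study comes precisely from this structure: the cheap first stage filters out the bulk of negative slides before the expensive second stage runs.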

Explainable AI (XAI)
The healthcare professional, as an end-user of automated diagnostic tools, should be familiar with the paths followed in automatic medical image interpretation. Since predictions made by AI architectures can influence treatment recommendations, there are safety implications, and human supervision and contribution to decision making ought to remain unhindered. This purpose is served by a branch of AI known as "explainable" or "interpretable" AI (XAI), which aims at ensuring a high level of awareness of the reasoning and rationale behind outcomes. Explainability refers mainly to the function of AI architectures, whereas interpretability concerns the ability of the utilized parameters to justify the results [49]. In medical practice, the human-in-the-loop can act as a "safety valve" that enhances the reliability of the AI image classification or segmentation process and, by consequence, the trust in its suggestions or predictions.
The use of CAD may be expanded through a more collaborative interaction between technology and human expertise. Explanatory AI models are designed to provide insights into the decision-making process of AI algorithms, allowing clinicians to interpret and evaluate their function [50]. Opaque, "black box" models discourage communication with the end-user and yield results that are difficult to assess, while XAI helps in comprehending and interpreting the judgments made, supporting clinical decisions more effectively. Currently, effort is being made to enrich DL techniques with applications that make their operation more visible and reduce insecurity and uncertainty [22].
As AI tends to become a part of daily medical practice, special training of medical professionals or continuous cooperation with specialized technicians seems to be required. Radiologists cannot rely solely on the transparency of a relevant CDSS to be made aware of how and why a DL algorithm reached a specific outcome. They need to be able to follow the procedure and evaluate or even intervene, to confirm or challenge each step whenever required, without sacrificing the speed and accuracy of AI performance [51]. The information fidelity criterion is another essential metric that allows for evaluating the reliability of the interpretable model by measuring the degree to which the AI image-processing algorithm approximates the "black box" model predictions [49].
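One simple way to picture such a fidelity measure is the fraction of cases in which an interpretable surrogate reproduces the black-box model's prediction; this is a deliberately simplified variant for illustration, and the information fidelity criterion cited in [49] may be defined differently:

```python
def prediction_fidelity(black_box_preds, surrogate_preds):
    """Fraction of cases in which the interpretable surrogate model
    agrees with the black-box model's prediction."""
    if len(black_box_preds) != len(surrogate_preds):
        raise ValueError("prediction lists must have equal length")
    matches = sum(b == s for b, s in zip(black_box_preds, surrogate_preds))
    return matches / len(black_box_preds)

# Hypothetical per-image labels from an opaque DL model and its surrogate
black_box = ["fracture", "normal", "fracture", "normal", "fracture"]
surrogate = ["fracture", "normal", "normal",  "normal", "fracture"]
fidelity = prediction_fidelity(black_box, surrogate)  # agreement on 4 of 5 cases
```

A surrogate with low fidelity explains something other than the model actually deployed, which is why such metrics matter before trusting an explanation clinically.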
As investigation expands, different architectures of XAI are utilized, continuously improving the performances in this domain. Although the analysis of these techniques falls outside the scope of this paper, typical examples would be the explainable expert system (EES) framework, class activation mapping (CAM), contextual importance and utility (CIU), layer-wise relevance propagation (LRP), uniform manifold approximation and projection (UMAP), local interpretable model-agnostic explanation (LIME), SHapley Additive exPlanations (SHAP) or ANCHOR [49].
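Several of these methods are perturbation-based: they occlude or alter parts of the input and measure how the model's output changes, so that regions causing the largest score drops are highlighted as important. A toy occlusion-sensitivity sketch, with a hypothetical scoring function standing in for a real DL model, is:

```python
def occlusion_map(image, score_fn):
    """For each pixel, zero it out and record the drop in the model's
    score; large drops mark regions the model relies on."""
    base = score_fn(image)
    h, w = len(image), len(image[0])
    drops = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            occluded = [row[:] for row in image]
            occluded[i][j] = 0  # occlude a single pixel
            drops[i][j] = base - score_fn(occluded)
    return drops

# Hypothetical "model": score is simply the mean intensity of the image
def score_fn(img):
    flat = [v for row in img for v in row]
    return sum(flat) / len(flat)

img = [[0, 8],
       [0, 0]]
heatmap = occlusion_map(img, score_fn)  # largest drop at the bright pixel
```

Real implementations (e.g., the CAM family) operate on internal network activations rather than brute-force pixel occlusion, but the interpretive idea — mapping which regions drive the prediction — is the same.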
XAI could make an impact on the division and assignment of responsibility, as well as its legal and ethical extensions. It is obvious that hardware and software companies would not accept accusations of malpractice in sensitive, consequential decision-making cases involving public health. "Black box" architectures, in which the models are opaque, create a feeling of uncertainty. On the contrary, "glass box" models reveal the parameters of diagnostic procedures and explain the predictions, encouraging trust between patients and health systems.
The philosophy of XAI is schematically represented in Figure 2. The initial processing of input medical imaging data (e.g., anteroposterior view of a standard hip X-ray) by an "opaque" DL algorithm implements a classification procedure to predict a diagnosis (e.g., "right hip intertrochanteric fracture"). The end-user clinician may experience doubt about the unjustified prediction; this is where the integration of XAI plays a significant role. Simultaneously, another AI architecture points out the relevant imaging findings (e.g., established criteria for femoral intertrochanteric fracture classification, depicted as green curves and red dots) to restore confidence in informed decision making (e.g., operative treatment).

Standardization of Multimodal (Image and Text) Medical Reports
Ideally, a fully automated diagnostic procedure would be completed with a comprehensive, accurate and logically structured medical report. Existing natural language processing (NLP) models have proved capable of creating meaningful sentences by extracting information from given datasets. Text and image/text analysis was discussed in some of the articles as a means of enabling a two-way interaction between the human-in-the-loop radiologist and the deep learning neural network [2,3,50].
Among file formats for recording medical text and images, DICOM constitutes the international standard, covering all medical imaging modalities and applications, being supported by most vendors of radiological systems and exhibiting remarkable compatibility with standard, established EMRs as well as interventional radiology [6,7]. This format allows individual files to contain, apart from images, textual information, often referred to as "metadata", such as patient and clinician identity, hospital, applied treatment and imaging technique [6]. The adoption of DICOM for WSI in digital and computational pathology has demonstrated its feasibility concerning storage, security and processing continuity, although progress appears rather slow due to the vast amount of processed data [8].
In the fields of radiology and histopathology, it is medical experts who compose detailed documents describing notable findings, supported by textual descriptions of images that render the reports inherently explanatory. AI-based systems can be designed to process these multimodal data and generate similar accounts. Generating long and coherent medical reports is a significant challenge in textual explainability. This is partly due to the difficulty of handling long temporal dependencies among words, which is a limitation of models like long short-term memory (LSTM). Transformer models, on the other hand, are more effective in capturing relationships between words in longer sentences [2]. To ensure that the generated text will be coherent and clinically meaningful, the size of the medical reports as well as the terminology used must be compatible with computerized models.
It is necessary to retain confidentiality of personal health data in a reliable, legally compliant and, above all, dignified manner. Image de-identification tools are utilized to remove or substitute all patient identifiers, such as name, address and hospital identification number, in a process described as "anonymization" [6]. The use of identity codes, such as a personal health number instead of a name, is already a common practice that assists health professionals in maintaining medical confidentiality. Electronic health records are a valuable and convenient source of data, suitable for machine learning, as they contain numerous images and their corresponding standardized reports [3].
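The de-identification step can be pictured as a metadata scrub. The sketch below operates on a plain dictionary standing in for DICOM-style metadata; the field names are illustrative, and real de-identification tools handle many more tags as well as identifiers burned into the pixel data:

```python
# Tags that directly identify the patient (illustrative subset, not the
# full DICOM de-identification profile)
IDENTIFYING_TAGS = {"PatientName", "PatientAddress", "HospitalID"}

def anonymize(metadata, pseudonym):
    """Return a copy of the metadata with identifying fields replaced
    by a pseudonymous code, leaving clinical fields intact."""
    cleaned = {}
    for tag, value in metadata.items():
        cleaned[tag] = pseudonym if tag in IDENTIFYING_TAGS else value
    return cleaned

record = {
    "PatientName": "John Doe",
    "PatientAddress": "12 Example St",
    "HospitalID": "GH-4421",
    "Modality": "MR",
    "StudyDescription": "Hip MRI",
}
safe = anonymize(record, pseudonym="PT-0001")  # original record is untouched
```

Replacing identifiers with a stable pseudonym, rather than deleting them, preserves the linkage needed to pair each image with its report during model training.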
Natural language processing of free-text bone scintigraphy radiology reports has already been used for automated quantification of bone metastases, an example of established use of a standardized diagnostic presentation [33,52]. Moreover, musculoskeletal MRI protocols have been automatically classified by decision tree algorithms [52]. In addition, the DICOM Image ANalysis and Archive (DIANA) interface has been used to improve access to imaging repository data, as it interacts with existing hospital infrastructures, such as PACS. In a comparative retrospective and prospective study by Thomas Yi et al., the functionality of this workflow for data retrieval was evaluated in AI-driven bone age estimation and intracranial hemorrhage detection [53].

Discussion
Current literature confirms that AI offers the potential to further enhance medical imaging diagnostic procedures by utilizing DL algorithms with increasing sensitivity and specificity. Remarkable strides in the detection and staging of various diseases render the goal of establishing an automated diagnostic device capable of non-invasive, full-body scanning and identifying underlying pathologies more realistic than ever. The results in Tables 1 and 2, presenting the reviewed platforms and algorithms, in most cases show accuracy comparable or superior to that of inexperienced physicians.
Among the various AI architectures, CNNs gather favorable characteristics for image processing, as they are designed to process information through multiple layers and exploit natural image properties, such as local correlations, using big datasets for supervised or unsupervised training. Supervised models are accurate in medical classification, as they are fed with labeled medical data, including independent and dependent variables, and are trained to predict new dependent ones. Purely unsupervised models are less complex than supervised ones, as they do not need prelabeled data, but exhibit lower discriminative skills [25]. Most medical studies in our search utilized pretrained algorithms (e.g., ResNet, U-Net, GoogLeNet, AlexNet, CheXNet, VGG16, VGG19) for supervised learning procedures.
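The supervised setting described above — train on labeled examples, then predict labels for new cases — can be illustrated with a deliberately simple classifier. The sketch below uses a nearest-centroid rule on toy, hypothetical "features"; it is a didactic stand-in, not one of the pretrained DL architectures listed:

```python
def train_nearest_centroid(samples, labels):
    """Supervised training: compute one centroid (mean feature vector) per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        sums.setdefault(y, [0.0] * len(x))
        counts[y] = counts.get(y, 0) + 1
        sums[y] = [s + v for s, v in zip(sums[y], x)]
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: dist(centroids[y]))

# Toy labeled features (hypothetical, e.g., lesion size and mean intensity)
X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
y = ["benign", "benign", "malignant", "malignant"]
model = train_nearest_centroid(X, y)   # "training" on labeled data
label = predict(model, [4.9, 5.1])     # prediction for a new, unseen case
```

A CNN replaces the hand-picked feature vectors with features learned from raw pixels, but the supervised contract — labeled inputs in, predicted labels out — is the same.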
Regarding the choice of ideal imaging method for the considered concept, the increased dose-related radiation in CT and PET scans renders them unsuitable for routine screening examinations. Ultrasound is a dynamic test, influenced by the skills and subjectivity of the radiologist, that may be supported by AI procedures only if the process takes place in real time, "in vivo" [4]. MRI technology, on the other hand, is safe, exhibits remarkable sensitivity and efficiency and produces a storable data format, compatible with automated diagnosis [4,5]. Some drawbacks are that it is time consuming, not particularly cost-effective and poorly tolerated by claustrophobic patients.
Before unconditionally accepting the dynamic entrance of AI into healthcare, certain procedural, legal and ethical issues need to be addressed. Transparency and reliability are of paramount importance in CDSSs, reinforced by a certain degree of explainability in results, to inspire trust in and accountability of the automated processes [49]. Since current ML and DL models have not yet fully assimilated the aspects of self-education and decision making in clinical practice, the quantity of information does not guarantee the quality of the outcome. Therefore, XAI seems to be essential for effective interplay among medical staff and patients as end-users of AI technologies.

Limitations of our Study
The perspective of this paper on automated diagnostics has been to impart an aggregate description of the current developments in medical practice, encouraging interdisciplinary communication and cooperation between computer scientists and medical doctors. A limitation of our study is that we do not technically evaluate or compare different diagnostic algorithms, but present their reported results. A comparative evaluation would require homogeneity in the protocols used across studies, concerning the amount, quality and proportion of data used for training and testing CNNs, to achieve coherence at the level of evidence.
Our presentation did not attempt to include all existing automated diagnostic algorithms or to sum up applications across all organ systems, but a representative subset of them. The protocol used for our search was not very restrictive methodologically, attempting to widen the borders of our overview, and allowed for additional data, according to our findings. We did not come across similar reviews exploring the potential of a complete diagnostic platform with which to make comparisons.

Limitations Encountered in Overviewed Studies
A common weakness of the result evaluation in many studies is that the clinical assessments of experienced radiologists were considered the "gold standard" in the classification of patients in the training group, according to the presence or absence of a specific pathologic finding. The same experienced radiologists evaluated the patients of the testing group, and the comparison was usually made between AI algorithms and inexperienced radiologists. Only in studies with histologic confirmation of the diagnosis, as well as in retrospective studies, could a specificity of 100% in training groups be achieved.
We also observed an absence of scientific justification concerning the criteria used for the selection of image-processing DL algorithms (Tables 1 and 2). The chosen diagnostic algorithms included commercially available software and were trained to cover a limited diagnostic range in each study. Diverse imaging findings in different medical conditions exhibit different degrees of rarity in prevalence and difficulty in detection. Therefore, any attempt to compare accuracy and efficacy among studies would be unsubstantiated.
Various cross-validation techniques, such as K-fold cross-validation (KFCV), are utilized for the evaluation of DL models so that the resulting estimate is more reliable. In general, 70% of the available data are used for training, 15% for validation and the remaining 15% for testing the model, although the percentage allocations can vary [54]. In the presented studies, the proportions of the data used ranged significantly, from 75:25 and 80:20 to 90:10 or less for learning and testing, respectively [14,16,17,21,23]. The main reason for this was accessibility of already evaluated medical records and availability of new cases for testing. This could lead to statistical bias during algorithm performance comparison.
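Both procedures mentioned here — a fixed 70/15/15 partition and K-fold index generation — can be sketched in a few lines of plain Python; real studies would typically also shuffle the indices and stratify by class, which this minimal sketch omits:

```python
def train_val_test_split(n, train=0.70, val=0.15):
    """Partition sample indices 0..n-1 into train/validation/test blocks."""
    n_train = int(n * train)
    n_val = int(n * val)
    idx = list(range(n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for K-fold cross-validation:
    each fold serves once as the test set while the rest train."""
    fold_size = n // k
    idx = list(range(n))
    for f in range(k):
        start = f * fold_size
        stop = (f + 1) * fold_size if f < k - 1 else n
        test = idx[start:stop]
        train = idx[:start] + idx[stop:]
        yield train, test

tr, va, te = train_val_test_split(100)   # 70 / 15 / 15 samples
folds = list(k_fold_indices(10, k=5))    # 5 folds, each sample tested once
```

Averaging a metric over all K folds uses every sample for both training and testing, which is why KFCV gives a more reliable estimate than a single fixed split, especially on small medical datasets.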

Suggestions for Further Investigation
The establishment of certain platforms and AI techniques that have piqued the interest of researchers requires extended clinical comparative trials, similar to those considered necessary for the acceptance of pioneering or experimental therapies. Most novel treatments, apart from scientific relevance, require specific testing phases before application to humans. It would be a safety measure, in our opinion, to enact defined standards for testing AI diagnostic algorithms before applying them to everyday medical practice.
Training data that come from multicenter and multiethnicity sources and are age- and gender-randomized may be associated with more generalizable, robust and reliable outcomes. The evaluation of sensitivity and specificity of outcomes must be objective and free of bias. Additional statistical documentation of performance can define the level of accuracy. This could be an interesting research direction.
Although technical evaluation of the presented automated diagnostic procedures is not among the targets of this review, there is a justified concern about the quality of the images used for training DL algorithms. Segmentation of the images, as well as density, signal intensity, matrix size in pixels and other quality factors, is discussed in some papers [25-28]. The proposed systems are usually trained only with axial MRI slices. Therefore, the utilization of different MRI views in training and prognostic procedures may be associated with improved discriminative efficiency of the system.
While challenges remain on the path toward safer and more precise diagnostic imaging methods, cooperation among scientists appears questionable, indicating the need to establish a generally accepted coordinating institution. It would be useful to adopt a standardized, generally accepted medical report "language", with a globally acceptable terminology as well as a formal structural format. In this way the results would be meaningful, comprehensive, "educational" and beneficial for both healthcare professionals and AI algorithms.
Lastly, it seems necessary to define the legal framework of automated diagnostics in order to clarify the borders of human and AI "responsibility" in decision making. Retaining medical confidentiality, as well as ensuring the patient's dignity, is fundamental, without compromising the availability of input data for DL algorithms' "training".

Conclusions
There is no doubt that AI is about to change the status of medical diagnostic practice in the direction of augmented speed and accuracy. The alteration is already in progress at a gradually increasing pace. Current literature concerning sophisticated applications of DL in healthcare is very extensive, and numerous platforms, protocols, techniques, algorithms and networks are available. The amount of ongoing research promises a near future where automated diagnostics will be an integral part of the medical landscape.
It is strongly recommended to encourage efficient, productive interdisciplinary communication and cooperation, since the undeniable effort that is taking place seems, for the time being, fragmentary. It is also important to strictly define the borderline between machine and human provenance. The sensitive nature of healthcare requires prioritization, empathy and legality, as well as overcoming the existing technical limitations. Further research may lead to integrated diagnostic AI algorithms, covering every known radiologically detectable pathology, which, when applied to full-body medical imaging tests, would fulfil the concept of a complete, automated, non-invasive diagnostic platform.

Figure 1. Methodology of article research.

Figure 2. Schematic representation of Explainable AI in automated diagnosis.


Table 1. Indicative applications of DL architectures in medical imaging diagnosis.
* Sensitivity (true positive rate) is the probability of a positive test result in a truly positive sample. ** Specificity (true negative rate) is the probability of a negative test result in a truly negative sample.

Table 2. Indicative applications of DL image recognition algorithms in oncology.