Document Difﬁculty Aspects for Medical Practitioners: Enhancing Information Retrieval in Personalized Search Engines

: Timely and relevant information enables clinicians to make informed decisions about patient care outcomes. However, discovering related and understandable information from the vast medical literature is challenging. To address this problem, we aim to enable the development of search engines that meet the needs of medical practitioners by incorporating text difﬁculty features. We collected a dataset of 209 scientiﬁc research abstracts from different medical ﬁelds, available in both English and German. To determine the difﬁculty aspects of readability and technical level of each abstract, 216 medical experts annotated the dataset. We used a pre-trained BERT model, ﬁne-tuned to our dataset, to develop a regression model predicting those difﬁculty features of abstracts. To highlight the strength of this approach, the model was compared to readability formulas currently in use. Analysis of the dataset revealed that German abstracts are more technically complex and less readable than their English counterparts. Our baseline model showed greater efﬁcacy than current readability formulas in predicting domain-speciﬁc readability aspects. Conclusion: Incorporating these text difﬁculty aspects into the search engine will provide healthcare professionals with reliable and efﬁcient information retrieval tools. Additionally, the dataset can serve as a starting point for future research.


Introduction
Search engines play a crucial role in facilitating information retrieval (IR) and supporting users' information-seeking tasks across various domains.However, the field of healthcare poses unique challenges for medical practitioners when it comes to accessing timely and comprehensible search results as the information they acquire can significantly impact their decision making processes and ultimately influence patient care outcomes.Despite the widespread availability of search engines, medical practitioners often face difficulties in finding the precise and contextually relevant information they need within the constraints of their demanding schedules.These challenges arise from the specialized nature of medical knowledge, the vast amount of scientific literature available, and the requirement for accurate and easily understandable information.
Research conducted by Entin and Klare [1] has revealed the significant influence of factors such as a reader's level of interest, prior knowledge, and the readability of a text on their comprehension of the material.It is essential to clarify that ease of reading primarily pertains to text understandability, while technicality is more focused on the concepts and domain-specific knowledge within the text [2][3][4].This understanding underscores the need to address the readability and technicality aspects in the development of search engines tailored specifically to the needs of medical practitioners.To improve IR for healthcare professionals, it is necessary to incorporate these aspects into search engines.By retrieving comprehensible information based on their language proficiency and domain knowledge, these search engines can enhance the efficiency and effectiveness of the information provided, leading to informed decisions and optimal care.Similarly, Ref. [5] proposed to use the readability aspect for accepting or revising health-related documents.
While previous studies [6][7][8] have focused primarily on developing personalized search engines for health information consumers, such as laypeople and patients, there is a clear gap in adapting to the specific requirements of medical practitioners.Unlike laypeople, medical practitioners possess specialized expertise and language proficiency in their respective fields.Therefore, our research project aims to contribute to filling this gap by developing a model capable of extracting and classifying the ease of reading and technicality levels of medical research articles.
It is important to note that ease of reading and technicality are not mutually exclusive aspects [3].Texts can exhibit varying degrees of both characteristics, leading to different combinations that may arise between these aspects.For instance, a text can be easy to read while still containing high technicality, or it can be difficult to read with low technicality, as shown in the following examples.(All examples have been reviewed and approved by a senior medical practitioner.Examples of easy-to-read content are from [9], while examples of harder-to-read content are from [10]).

Easy to read and high technicality
Autologous hematopoietic stem cell transplantation has emerged as a promising therapeutic intervention for individuals with refractory multiple myeloma.This treatment approach has shown remarkable advancements in terms of progression-free survival and overall response rates, signifying its potential in improving patient outcomes.

• Hard to read and high technicality
The pathophysiological mechanisms underlying idiopathic pulmonary fibrosis involve aberrant activation of transforming growth factor-beta signaling pathways, leading to excessive deposition of extracellular matrix components and subsequent progressive scarring of lung tissue.

• Easy to read and low technicality
Regular physical exercise has been widely recognized as a key lifestyle intervention for the prevention of cardiovascular diseases, with numerous studies demonstrating its positive impact on reducing the risk of heart attacks, stroke, and hypertension.

• Hard to read and low technicality
Carcinogenesis is a multifactorial process characterized by the dysregulation of cellular homeostasis, involving intricate interactions between oncogenes and tumor suppressor genes that disrupt normal cell growth control mechanisms, resulting in uncontrolled proliferation and the formation of malignant tumors.
Integrating difficulty aspects into search engines empowers medical practitioners with tailored and relevant search results that are aligned with their expertise and language proficiency.Without these aspects, search engines may fail to effectively address the specific needs of medical professionals, leading to limitations and challenges.For example, search results may include highly technical papers and articles with varying levels of readability, requiring manual sifting and wasting valuable time.The lack of customization based on technicality and ease of reading hinders precision, relevance, and quick access to necessary information.Furthermore, complex language and dense scientific jargon can impede comprehension for practitioners without specialized expertise, hindering decision making [11].By considering difficulty aspects, search engines not only improve information accessibility and ensure comprehensibility but also optimize decision making and patient care.Similarly, studies [12,13] presented other features that could potentially enhance personalization in medical search engines.This enhancement in IR empowers medical professionals by making it easier for them to identify their target audience, thus facilitating the efficient utilization of their valuable time and expertise [14].
Readability formulas have already been developed for measuring the readability of a given text.However, the most commonly used readability formulas were not developed for technical materials [15].Moreover, traditional readability formulas are oversimplified to deal with technical materials [16].Therefore, to accomplish our objective, we leverage the power of pre-trained language models and fine-tuning techniques for predicting the difficulty aspects of a given document.
An intriguing application of our research lies in the integration of these assessed aspects into the IR process.This can be achieved by either filtering out search results that exceed a certain threshold of technicality or ease of reading, ensuring that the retrieved documents align with the user's preferred level of comprehension.Additionally, these aspects can contribute to the calculation of relevance scores [17], allowing documents that match the desired technicality and readability criteria to be ranked higher in search results.This integration has the potential to enhance the efficiency and precision of IR for medical professionals, aiding them in accessing documents that align with their specific requirements and facilitating informed decision making.
However, we encountered a challenge in finding datasets specifically designed for medical practitioners, as most existing datasets target laypeople.Therefore, we compiled a dataset of medical abstracts from PubMed (PubMed is an extensive online database that grants users access to a diverse range of scientific literature in the field of biomedical and life sciences.It serves as a valuable resource for researchers, medical practitioners, and individuals seeking in-depth scholarly articles, abstracts, and citations pertaining to diverse medical disciplines, https://pubmed.ncbi.nlm.nih.gov/,accessed on 1 June 2023) and sought the expertise of medical doctors and medical students to annotate each article with ease of reading and technicality scores.
Through our research, we aim to demonstrate the ability of language models to capture and classify the ease of reading and technicality levels of medical documents.Furthermore, we investigate the differences in language complexity and technicality between English and German abstracts, finding German abstracts tend to be harder to read and exhibit higher levels of technicality when compared to their English counterparts.
To sum up, our research article makes contributions in the following areas: (a) we address a problem concerning the trade-off between comprehensibility and relevance within the field of IR for medical practitioners; (b) we are presenting a new dataset containing medical research articles annotated with ease of reading and technicality scores; (c) we are developing models capturing these aspects using pre-trained language models and comparing them with known readability formulas.
By addressing the specific needs of medical practitioners and integrating the difficulty aspects into search engines, we strive to enhance the accessibility and relevance of search results, ultimately empowering medical professionals with efficient and reliable IR tools.

Literature Review 2.1. Comprehensibility Aspects in Information Retrieval
The consideration of comprehension aspects in IR, particularly in domain-specific contexts, has received significant attention.Researchers have recognized the challenges faced by both domain experts and average users when searching for domain-specific information, such as medical and health-related content, from online resources [18].
A common issue encountered by users in IR systems is the presence of search results that encompass documents with varying levels of readability [16,19,20].This poses a challenge, particularly for users with limited domain knowledge or lower education levels, as well as those facing physical, psychological, or emotional stress [16].Consequently, there is a need for IR systems that not only retrieve relevant documents but also prioritize those with higher readability, adapting to the diverse needs of users [21].
To address this challenge, various approaches have been explored.Some studies [22,23] have investigated computational models of readability, aiming to develop efficient methods for assessing the readability of technical materials encountered in domain-specific IR.Traditional readability formulas, although widely used, are often insufficient for handling technical texts.On the other hand, more advanced algorithms, such as textual coherence models, may offer improved accuracy but suffer from computational complexity when applied to large-scale document re-ranking scenarios.
The importance of domain-specific readability computation in IR has been emphasized [24].Technical terms and the need for efficient computations for large document collections are among the challenges identified.By integrating concept-based readability and domain-specific knowledge into the search process, researchers aim to enhance the accessibility and relevance of search results.These efforts contribute to empowering users, including both domain experts and average users, with efficient and reliable IR tools [16].

Readability Formulas
Since the early 20th century, researchers in this field have developed a variety of readability formulas aimed at laypeople.Many of these formulas are still widely used today [25].Among the most commonly used formulas are Simple Measure of Gobbledygook (SMOG), the Dale-Chall Readability formula, the Flesch Reading Ease formula, the Fog Index, and the Fry Readability Graph.More details about readability methods can be found in [26].
These formulas typically analyze syntactic complexity and semantic difficulty.Syntactic complexity is often evaluated by examining sentence length, while semantic difficulty is measured using factors such as syllable count or word frequency lists.Other factors that have been found to influence readability include the presence of prepositional phrases, the use of personal pronouns, and the number of indeterminate clauses.However, it is important to note that readability formulas specifically targeting medical practitioners are currently lacking.Further research is needed to develop readability formulas tailored to the unique needs and expertise of medical professionals.

Simple Measure of Gobbledygook (SMOG)
The SMOG Index was introduced by clinical psychologist G. Harry McLaughlin in 1969 [27].It is designed to estimate the years of education required to comprehend a piece of written text accurately by counting the words of three or more syllables in three ten-sentence samples.The formula calculates the reading grade level based on a simple mathematical equation that incorporates the count of polysyllabic words within a sample text.
The SMOG formula has been widely used in various fields, including education, healthcare, and IR [28].Its simplicity and ease of application make it a popular choice for estimating readability levels.However, it is important to note that the SMOG formula may have limitations when applied to specific domains, such as technical or scientific texts, as it does not consider the domain-specific terminology and nuances that might impact comprehension [29].
Nonetheless, SMOG stands out as a well-suited formula for healthcare applications.It consistently aligns results with expected comprehension levels, employs validation criteria, and maintains simplicity in its application [28].These factors make it a reliable choice for assessing the readability of healthcare-related documents.Researchers frequently use the SMOG formula as a reference point when evaluating alternative readability models or proposing new formulas tailored to specific domains.

Dale-Chall Readability Formula
The Dale-Chall Readability Formula is a widely used readability measure that provides an estimate of the comprehension difficulty of a given text.Developed by Edgar Dale and Jeanne Chall in 1948 [30], this formula takes into account both the length of sentences and the familiarity of words to determine the readability level.
The Dale-Chall readability formula calculates its final score by examining the proportion of words in a given text that do not belong to a predefined list of commonly known words.This list comprises 3000 words that are generally understood by fourth-grade stu-dents.The formula calculates the readability score by incorporating the average sentence length and the percentage of unfamiliar words.
In summary, the Dale-Chall formula's notable advantage lies in its focus on word familiarity, enhancing its ability to assess readability, especially for less experienced readers or those with limited vocabulary.This practical and accessible approach considers both word familiarity and sentence length, offering insights into comprehension difficulty for readers of different proficiency levels.However, it is crucial to acknowledge the formula's limitations and its suitability for specific contexts.

Readability for Health Consumers
Health literacy and effective communication of health information are crucial in ensuring that patients can access and understand important medical content [31].With the increasing reliance on online resources for health-related information, it is essential to assess the readability of online materials, particularly those aimed at health consumers.
Several studies have examined the readability and quality of medical content targeting health consumers [32][33][34].These studies highlight the challenges associated with readability in health-related content, indicating that a significant portion of medical content is difficult for the average layperson to understand.This issue extends to other languages as well, such as German [35].
The findings from these studies underscore the need for greater attention to readability and clear communication in online health resources.Collaborative efforts among healthcare professionals, researchers, and organizations are crucial for enhancing the readability of health materials, including Wikipedia pages and patient education resources.By improving the readability of these resources, we can enhance health literacy and empower patients to make informed decisions about their health [33].
In conclusion, the related work in this subsection underscores the significance of comprehensibility in domain-specific IR.The exploration of computational models, readability formulas, and approaches tailored for specific domains, such as health and medical IR, reflects a growing recognition of the importance of addressing comprehension aspects.The results of experiments conducted in this domain showcase the potential benefits of integrating readability considerations into IR systems.However, further research is needed to generalize these findings to other domains and explore additional factors influencing word-level relatedness, document cohesion, and sentence-level readability computation.

Materials and Methods
Our process started with the creation of a dataset.This required extracting articles from PubMed and then having medical experts annotate them.Next, we analyzed the dataset for its general characteristics, level of difficulty, and language variations.Finally, we utilized these data to refine our pretrained BERT model.

Data Collection
The dataset creation process started with the extraction of 10,000 articles from PubMed, specifically targeting those with available abstracts in both German and English.The data were then stored in a MongoDB and afterwards extended with information about the readability and technicality of those abstracts.This was completed in an annotation process by 216 medical students and practitioners using an annotation tool developed only for this purpose.In this process, a total number of 209 annotated articles could be gathered.An overview of the data acquisition is shown in Figure 1.

Documents
We have chosen to focus on research articles' abstracts as they serve as concise summaries of the main research findings and are widely utilized for initial screening and IR purposes.In light of this, the PubMed database proves to be an ideal resource for our specific use case.
PubMed consists of an extensive collection of scientific literature spanning various medical disciplines, making it a comprehensive repository of valuable medical information.To ensure a manageable dataset, we opted to download a subset of articles from the vast PubMed database.As selection criteria, we chose 10,000 random articles that had abstracts available and were written by the respective authors in both the German and English languages.

Participants
A total of 216 participants were recruited from university hospitals for the annotation process.Those can be divided into four categories, as shown in Table 1: 70 medical students up to the 6th semester (junior students), 59 medical students in the 7th semester or higher (senior students), 59 medical doctors with up to 2 years of experience (junior doctor), and 28 doctors with more than 2 years of experience (senior doctor).All participants either studied at a German university or worked in a German hospital, ensuring they possessed the necessary domain knowledge.Additionally, all participants were asked to assess their German and English language proficiency based on the Common European Framework of Reference (CEFR) (CEFR is an international standard for describing language ability.It describes language ability on a six-point scale, from A1 for beginners, up to C2 for those who have mastered a language), ranging from B1 (Intermediate level) to C2 (Native level).Furthermore, in accordance with institutional regulations and to maintain complete anonymity, no further questions were asked.

Annotation
In the initial phase of the study, we developed a web-based application using Python and MongoDB to facilitate the evaluation process.This application allowed participants to log in anonymously, access clear guidelines, and review the criteria for rating abstracts' readability and technicality.Participants had no time constraints, providing flexibility in completing the evaluation.Each participant assessed 2-3 different abstracts, rating them for ease of reading and technicality on a scale from 0 to 100 in 5-point increments (0, 5, 10, . . ., 100).They also identified relevant medical disciplines.To ensure reliability and minimize bias, three independent participants evaluated each abstract, and their scores were averaged to represent text complexity and technicality fairly.Annotations were conducted independently, enhancing the reliability and consistency of ratings.

Data Analysis
The analysis subsection provides insights into the composition of the dataset and sheds light on the ease of reading and technicality aspects of the abstracts.

Dataset Overview
The annotation process proved to be a resource-intensive task, primarily due to the challenges associated with securing time from busy medical professionals, including both practicing physicians and medical students.With their commitments ranging from extended working hours and on-call duties to direct patient care responsibilities and rigorous exam preparation, allocating time for annotation was constrained.As a result, only 209 abstracts were annotated for technicality and readability aspects.Each annotated abstract included both English and German versions, ensuring comprehensive coverage of the research literature across languages.The dataset encompassed various medical disciplines, creating a representative subset of medical research publications.

Descriptive Statistics
To gain a better understanding of the dataset, we conducted descriptive statistics on the abstracts.The average length of the abstracts was found to be 215 (SD = 90) words for English and 190 (SD = 77) words for German abstracts.It is noteworthy that, while English abstracts have a higher word count, German abstracts tend to be longer in terms of character count.This observation is due to the nature of the German language, where words often contain more characters compared to English [36].Specifically, the average character count for English abstracts was 1480 (SD = 585), while, for German abstracts, it was 1587 (SD = 616) characters.Additionally, the annotations were accompanied by significant statistical insights.The average annotation time for each document was 176 (SD = 73) seconds, highlighting variability in annotation durations.The annotation process was also conducted at an average rate of 113 (SD = 12) words per minute (WPM), emphasizing diverse annotation speeds, which is consistent with Klatt et al.'s study [37].Furthermore, the mean intraclass correlation coefficient (ICC) for annotations was found to be 0.81 (SD = 0.08) (according to Koo et al. guideline [38], ICC values below 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and exceeding 0.90 indicate poor, moderate, good, and excellent reliability, respectively), indicating substantial consistency and agreement among the annotations provided by medical professionals.

Ease of Reading Analysis
The ease of reading aspect was assessed by medical professionals, who assigned ease of reading scores to each abstract, where 0 means hard to read and 100 means easy to read.The average ease of reading scores across all abstracts were found to be 64.21(SD = 21.54) and 61.50 (SD = 21.67) for English and German readability scores, respectively.Figure 2a shows a visual representation of the distribution of ease of reading scores and identifies any potential outliers or patterns.

Technicality Analysis
The technicality level of the abstracts was evaluated based on the ratings provided by the medical professionals, where 0 means high technicality and 100 low technicality found in the abstract.The average technicality scores for the dataset were 30.55 (SD = 10.02) and 26.72 (SD = 12.81) for English and German readability scores, respectively.Figure 2b shows a visual representation of the distribution of technicality scores and identifies any potential outliers or patterns.

Comparison of German and English Abstracts
For gaining deeper insights into the dataset, we conducted a comparative analysis of the technicality and ease of reading levels between German and English medical abstracts using paired t-tests.The goal was to investigate potential differences in the linguistic characteristics of the two languages and their impact on the ease of reading and technicality of the abstracts.
For the technicality level, our analysis revealed that German medical abstracts tended to exhibit higher levels of technicality compared to their English counterparts.This finding aligns with previous studies [39,40] highlighting the inherent complexity of the German language, particularly in the medical domain.The higher technicality level of German abstracts can be attributed to the frequent usage of specialized medical terminology and the structural intricacies of the German language itself.Figure 3a shows the difference between English and German technicality, where the positive side of the graph shows articles with higher technicality in German, and articles with higher technicality in English on the negative side.
In terms of readability, we observed that English medical abstracts were slightly easier to read compared to German abstracts.This disparity can be attributed to several factors.First, the English language generally exhibits a more straightforward and concise writing style, which may enhance readability for a wider audience.Second, English has a larger presence in the global scientific community, leading to greater standardization and familiarity among medical practitioners.Consequently, English abstracts may be tailored to a broader readership, including non-native English speakers.Figure 3b shows the difference between English and German ease of reading, where the positive side of the graph shows articles easier to read in English, and articles are harder to read in German on the negative side.

Limitations
It is important to acknowledge the limitations of the dataset analysis.Out of the 10,000 downloaded abstracts, the sample size of 209 may be inadequate to represent the entire breadth of medical research literature.Moreover, the annotations provided by medical professionals might introduce some level of subjectivity or bias.Nevertheless, we took steps to minimize such limitations by involving multiple annotators per abstract.These limitations must be considered when interpreting the results and generalizing findings from the dataset.Furthermore, we should acknowledge that our analysis focused only on technicality and readability and did not explore other factors that might impact the document's understandability.

Model
The model subsection provides insights into the developed models and evaluation matrix.

Our Model
To establish a baseline for our dataset, we employed pretrained BERT models designed for medical text, "PubMedBERT" for English abstracts [41] and "German-MedBERT" for German abstracts [42].BERT, known for its impressive performance in various Natural Language Processing (NLP) tasks, was a fitting choice for our project.
The fine-tuning process for the pretrained BERT models, specifically "BERT-Readability" for assessing ease of reading and "BERT-Technicality" for evaluating technicality, involved using our dataset, which includes annotated medical abstracts, each assigned scores on a scale from 0 to 100 for ease of reading and technicality.Our main objective was to train these models to predict these scores based on the textual content within the abstracts.
To assess the performance of "BERT-Readability" and "BERT-Technicality," we employed the root mean square error (RMSE) metric.RMSE measures the average difference between predicted and actual scores, with lower RMSE values indicating better alignment between predictions and actual scores.
By leveraging pretrained BERT models, we established a foundational framework for predicting ease of reading and technicality in medical abstracts."BERT-Readability" and "BERT-Technicality" serve as baseline models and serve as reference points for future analyses.This enables us to assess the effectiveness of any forthcoming advancements or novel techniques introduced into our work.

Common Readability Formulas
To judge the performance of our models, we specifically selected readability formulas that have been utilized in the domain of medical literature [43,44].These commonly used readability formulas were tested on the same test set as our models.
Prior to evaluation, we took the necessary steps to ensure compatibility between the outputs of the readability formulas and the ground truth values of our dataset.To achieve this, we employed rescaling/normalization techniques to align the results of each formula with the range and distribution of the dataset's ground truth scores.This approach allowed us to establish a fair and consistent basis for comparison.Subsequently, we evaluated the of each readability formula using the same evaluation metrics employed for our models ("BERT-Readability" and "BERT-Technicality").
By incorporating a comparison with these readability formulas, we are able to gain a broader perspective on the strengths and limitations of our models.This comparative analysis allows us to assess whether our models outperform or align with the established readability formulas in the specific context of medical documents.The objective is to position the performance of our models within the broader landscape of readability assessment methods utilized in the medical domain.

Results
In this study, we can divide our contributions into two parts, which are a dataset and regression models for predicting the ease of reading and technicality of a scientific research abstract.

Dataset
For this study, we curated a comprehensive dataset of scientific research abstracts written in English and German from various medical disciplines (such as Immunology, Dermatology, Radiology, Emergency medicine, Internal medicine, Neurology, etc.).The dataset comprises 209 abstracts collected from the PubMed database.Each abstract was presented to three medical practitioners (student or doctor).This dataset can be used to improve the readability aspect of any NLP or IR system targeting the medical domain.

BERT-Readability and BERT-Technicality
We developed BERT-Readability and BERT-Technicality regression models to predict ease of reading and technicality levels in scientific research abstracts on a prediction scale between 0 and 100.These models were trained using the curated dataset, which included annotations for ease of reading and technicality scores for each abstract.We evaluated the performance of BERT-Readability and BERT-Technicality using the RMSE metric.The results demonstrated the models' effectiveness in predicting ease of reading and technicality scores.
As shown in Table 2, the RMSE for ease of reading is 10.61 for English abstracts and 11.80 for German abstracts.For technicality, the RMSE is 9.42 for English abstracts and 10.07 for German abstracts.
Furthermore, we conducted a comparative assessment, pitting our models against widely adopted and well-established readability formulas commonly utilized in healthcare literature [45,46], as shown in Table 2.The results consistently demonstrated that our models outperformed these traditional formulas in terms of prediction accuracy.This underscores the critical advantage of employing specialized models tailored explicitly for medical experts.
The existing readability formulas assessed in our comparison rely on a set of general features, such as sentence length, syllable count, and word complexity, which are designed for assessing text readability across various domains.In contrast, our models have been meticulously fine-tuned to account for the specific needs and nuances of medical research abstracts, offering a more precise and effective solution for this specialized context.Overall, the dataset and models presented in this study offer valuable resources for assessing the ease of reading and technicality of scientific research abstracts in both English and German.These models serve as valuable tools for personalized search engines, whether by enabling the use of filtering-based techniques or contributing to the ranking algorithm used in the retrieval process.This enables medical practitioners to access relevant research findings that align with their language proficiency and expertise.

Discussion
The primary objective of this research paper was to tackle the challenges faced by medical practitioners when seeking timely and comprehensible search results within the healthcare domain.We specifically delved into the realms of ease of reading and technicality in medical research articles, with the ultimate aim of enhancing IR for medical professionals, thereby augmenting their decision making capabilities and improving patient care outcomes.
The dataset compiled for this study consists of 209 scientific research abstracts from diverse medical disciplines, available in both English and German.Each abstract underwent annotation by medical practitioners, who assigned ease of reading and technicality scores.The dataset analysis provided valuable insights into the linguistic characteristics of medical abstracts and revealed differences in technicality and ease of reading between English and German abstracts.Notably, our findings revealed that German abstracts tended to be more technically challenging, while English abstracts were slightly easier to read.
Our study introduced two regression models, namely BERT-Readability and BERT-Technicality, which proved highly effective in predicting ease of reading and technicality scores for the abstracts.These models outperformed existing readability formulas commonly employed in the literature, underscoring their significance in predicting domainspecific readability aspects.This highlights the critical importance of employing domainspecific models tailored to scientific research abstracts for precise readability assessment.
The integration of ease of reading and technicality aspects into search engines has practical implications for medical practitioners.Personalized search results, based on language proficiency and expertise, empower medical professionals with efficient and relevant IR tools.This customization enhances the relevance and accessibility of information, optimizing decision making and patient care.Our research paper underscores the significance of tailoring readability assessment to the specific needs of medical practitioners, leading to improved information utilization and overall usability of search engines in the medical domain.

Future Directions
A promising avenue for future research is to explore more sophisticated models and to utilize advanced transfer learning techniques.These efforts aim to improve the accuracy and applicability of readability assessment in the context of medical research abstracts.
Another compelling area of future study involves investigating the direct influence of readability assessment on the decision making processes and patient care outcomes of medical professionals.Understanding the tangible benefits of improved readability in medical literature can further underscore the importance of our research.
One interesting possibility for future research is the incorporation of text complexity factors into the IR ranking algorithm.Although our study has demonstrated their potential advantages, additional research is required to examine the feasibility of this integration.A future study could focus on implementing text difficulty aspects into IR ranking algorithms to improve the retrieval of relevant medical information for practitioners.The study should address algorithmic refinement, adaptability, and user experience assessment for practical implementation.

Limitations
One notable limitation of this study concerns the dataset's sample size.Although the dataset offered useful insights, its limited size may restrict the generalizability of our findings to a wider medical literature context and various medical disciplines.
Our research focused on English and German abstracts, which may not represent the full linguistic diversity of medical literature.Future studies could expand to include a more extensive range of languages to enhance the scope of applicability.
While our regression models demonstrated superior performance, their complexity may pose challenges in real-world implementation.Future research should address ways to streamline these models for practical use.

Conclusions
In conclusion, this research paper addresses a crucial gap in the field of healthcare IR and readability assessment.By providing a comprehensive dataset and introducing the integration of ease of reading and technicality aspects into personalized search engines, we have taken significant strides toward enhancing the tools available to medical practitioners.Our work not only offers efficient and reliable IR solutions but also contributes to the broader goal of improving patient care and facilitating informed decision making within the healthcare domain.
The dataset compiled for this study serves as a valuable resource for future research and development in the realm of medical literature analysis.Its provision underscores our commitment to advancing the state of the art in IR and readability assessment.

Figure 1 .
Figure 1.The dataset creation process includes extracting articles from PubMed and expertly annotating abstracts for technicality and readability assessment.

Figure 2 .
Figure 2. Aspects of scores' distributions on English and German abstracts.

Figure 3 .
Figure 3. Difference between English and German scores per article.Scores are between 0 and 20.

Table 1 .
Annotation process participants' distribution among four categories.

Table 2 .
Formulas and our models' performance on the dataset: RMSE scores comparison.