Tracing the Evolution of Reviews and Research Articles in the Biomedical Literature: A Multi-Dimensional Analysis of Abstracts

Abstract: We previously examined the diachronic shifts in the narrative structure of research articles (RAs) and review manuscripts using abstract corpora from MEDLINE. This study employs Nini’s Multidimensional Analysis Tagger (MAT) on the same datasets to explore five linguistic dimensions (D1–5) in these two sub-genres of biomedical literature, offering insights into evolving writing practices over 30 years. Analyzing a sample exceeding 1.2 million abstracts, we observe a shared reinforcement of an informational, emotionally detached tone (D1) in both RAs and reviews. Additionally, there is a gradual departure from narrative devices (D2), coupled with an increase in context-independent content (D3). Both RAs and reviews maintain low levels of overt persuasion (D4) while shifting focus from abstract content to emphasize author agency and identity. A comparison of linguistic features underlying these dimensions reveals often independent changes in RAs and reviews, with both tending to converge toward standardized stylistic norms.


Introduction
Biomedical publishing plays a crucial role in the field of biomedicine and has significant importance for the dissemination of scientific knowledge [1]. The release of research articles facilitates the systematic evaluation, synthesis, and analysis of data derived from various studies. The acquisition of robust evidence is imperative for guiding clinical guidelines, shaping treatment protocols, influencing public health policies, and steering healthcare interventions [2]. Moreover, the publication of biomedical literature serves as a conduit through which researchers and scientists globally can share their findings, discoveries, and innovations. This dissemination is indispensable for advancing scientific understanding and fostering collaboration across borders. Beyond its collaborative impact, the act of publishing research findings also champions transparency and accountability within the scientific community. Researchers willingly subject themselves to scrutiny and evaluation by their peers, creating an environment that encourages the conduct of rigorous, well-designed studies. Simultaneously, this system discourages unethical practices and the misrepresentation of data [3].
Biomedical publishing not only serves as a cornerstone for career advancement among researchers, scientists, and healthcare professionals but also plays a pivotal role in shaping their visibility, credibility, and professional reputation. The quantity and impact of publications carry substantial weight in academic promotions, grant applications, and funding decisions [4,5]. This means that there are strong incentives to publish, and the number of papers has steadily grown in recent years, reaching levels that make it very difficult for researchers in any field to keep abreast of the published material [6], a problem made even more acute by the surge in predatory journals [7].
Biber's multidimensional (MD) analysis has been extensively used to investigate corpora of different kinds and origins, such as abstracts published in different countries [15] or by writers of different origins [16], and has proved very useful because it concisely provides an overview of the general linguistic and rhetorical stances of a text in the broader context of the literature production in many fields and genres. Through the Multi-dimensional Analysis Tagger [17], texts can be added to Biber's MD analysis of English, as it replicates Biber's 67 original features used to compute his dimension scores, on which we have also decided to rely for the present work.
We previously characterized the narrative arcs in research articles and literature reviews in the biomedical field by applying the LIWC 2022 analysis tool [18] to a corpus of abstracts from research articles and reviews obtained from MEDLINE over the course of the last 33 years [19]. Building upon this foundation, we proceeded to apply Biber's analysis to the same corpus. This analytical approach aims to provide deeper insights into the extant similarities and differences in these two genres as well as linguistic changes that might have occurred in the 1989–2022 period, shedding light on the nuances and divergences in these transformations over time.

Materials and Methods
The datasets that we used for the present study were composed of two independent corpora of abstracts of scientific manuscripts obtained from MEDLINE and previously published in [19]. MEDLINE is one of the world's largest and most used repositories for biomedical literature; it is run by the National Library of Medicine at the National Institutes of Health and is freely accessible through a portal called PubMed, which works as its search engine. To retrieve data from MEDLINE in a way that was conducive to our analysis, we accessed it via command line through the PubMed API using Python running on Jupyter notebooks [20], a popular development environment for this programming language. Briefly, we used the Python 3.9 litter-getter library [21] to search and retrieve the abstracts by connecting to PubMed, performing a search, and automatically downloading the data in a format we could use for subsequent analysis. We relied on the following search terms:
PubMed uses a simple syntax for queries, and search terms can be easily joined by Boolean operators; in addition, tags within brackets can be added to limit the keyword search to the desired field in the record. In this case, the [dp] tag refers to the field 'Date of Publication', while the [pt] tag limits the search for the keyword to the 'Publication Type' field. The word 'year' in our search is not really a keyword but a Python variable, which we set to iterate from 1989 through 2022. These queries retrieved two lists of PubMed IDs (PMIDs). Search #1 generated a list of PMIDs for abstracts excluding the 'Review' type, and search #2 generated a list of PMIDs exclusively constituting 'Review' abstracts in the same time interval. The reason for this distinction is that reviews are a genre of scientific article that comprises several peculiar sub-types, including 'Narrative reviews' and 'Systematic reviews', each with its distinctive purposes and structure [22], and we hypothesized that reviews may display different linguistic features from research articles, in agreement with the differences in narrativity highlighted by our previous LIWC analysis [19].
As previously explained [19], to balance our corpora, we randomly sampled 20,000 PMIDs out of the total number of retrieved PMIDs for each year, and proceeded to retrieve the data from the PMID list, thus obtaining two independent corpora: one of non-review abstracts (search #1) and one of review abstracts (search #2).
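The per-year balancing step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_pmids_per_year`, the `seed` parameter, and the toy PMID values are all assumptions introduced here, and the real PMID lists would come from the PubMed searches.

```python
import random


def sample_pmids_per_year(pmids_by_year, n_per_year=20_000, seed=0):
    """Draw up to n_per_year PMIDs per year, at random and without replacement.

    pmids_by_year is a hypothetical mapping {year: list of PMIDs}.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    sampled = {}
    for year, pmids in sorted(pmids_by_year.items()):
        k = min(n_per_year, len(pmids))  # a year may have fewer PMIDs than requested
        sampled[year] = rng.sample(list(pmids), k)
    return sampled


# Toy data: 50 made-up identifiers per year, sampled down to 10 per year.
toy_ids = {year: list(range(year * 100, year * 100 + 50)) for year in (1989, 1990)}
picked = sample_pmids_per_year(toy_ids, n_per_year=10, seed=42)
```

Sampling a fixed number per year gives every year the same weight in the corpus, regardless of how many records were published that year.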
To obtain the abstract texts, litter-getter downloaded an XML file for each publication containing all the data in the record, based on its PMID; we then created a Pandas DataFrame [23], which is a special tabular form, not dissimilar from an Excel sheet, and populated it using the BeautifulSoup library [24]. BeautifulSoup is a Python library specifically designed to extract and clean XML data, i.e., recognize XML tags and isolate the desired information. The table was created in such a way that each row contained a publication, and it had columns for the authors, title of the publication, journal, abstract, etc. We then took the abstract column; we lowercased all the abstracts and saved them as a separate text (.txt) file.
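The extraction step described above can be illustrated with a short sketch. To keep it dependency-free, it swaps in the standard-library `xml.etree.ElementTree` parser for BeautifulSoup and a list of dictionaries for the Pandas DataFrame; the XML snippet is a simplified, made-up record that only mimics the element names of a PubMed export, and `records_from_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified, invented record that mimics the layout of a PubMed XML export.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000001</PMID>
      <Article>
        <Journal><Title>Hypothetical Journal</Title></Journal>
        <ArticleTitle>A Made-Up Title</ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">Pain Is Common.</AbstractText>
          <AbstractText Label="RESULTS">Scores Improved.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def records_from_xml(xml_text):
    """Return one dict per publication, with the abstract lowercased."""
    root = ET.fromstring(xml_text)
    rows = []
    for article in root.iter("PubmedArticle"):
        # Structured abstracts are split over several AbstractText elements.
        parts = ["".join(node.itertext()).strip() for node in article.iter("AbstractText")]
        rows.append({
            "pmid": article.findtext(".//PMID", default=""),
            "title": article.findtext(".//ArticleTitle", default=""),
            "journal": article.findtext(".//Journal/Title", default=""),
            "abstract": " ".join(p for p in parts if p).lower(),
        })
    return rows


rows = records_from_xml(SAMPLE_XML)
```

A list of dictionaries shaped like `rows` can be passed directly to `pandas.DataFrame(rows)` to obtain the one-row-per-publication table described in the text.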
This file was passed into the Multidimensional Analysis Tagger v 1.3.3 [17], freely available at https://sites.google.com/site/multidimensionaltagger (accessed on 20 June 2023). This tagger is grounded in Biber's (1988) Variation across Speech and Writing tagger for the multidimensional functional analysis of English texts. Unlike Biber's framework, this program is based on the Stanford Tagger [17] and generates both a grammatically annotated version of the text as well as statistics following Biber's method [10]. This tool is very user-friendly, thanks to its graphical interface, and its output is a series of scores for the 5 dimensions outlined by Biber, plus scores for each of the underlying 67 linguistic features (Appendix A), all expressed as Z scores, in separate comma-separated value files (.csv). A Z score is, simply put, a measure of the distance of a score for a given sample from the mean of that score for a whole population based on Biber's corpus [25], expressed as number of standard deviations from the mean. So, in our case, the MAT software contains the means for each score for a vast corpus of texts from various genres, including conversations, speeches, personal letters, broadcasts, and academic writing [17]. As an example, a Z score of 2 for any linguistic feature means that this score is 2 standard deviations above the mean of that mixed corpus, which is representative of a general literature production. All the analyses were conducted on Jupyter notebooks [20] by importing the .csv files back into Pandas tables in Python. Matplotlib [26] and Seaborn [27] libraries were then used to plot the data [20].
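As a worked illustration of the Z score definition above, the sketch below standardizes a single feature value against a reference mean and standard deviation, then averages Z scores per publication year, as one might do before plotting. The function names and all numbers are hypothetical; the actual reference means and standard deviations are those built into MAT, which are not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def z_score(value, reference_mean, reference_sd):
    """Distance of a value from the reference mean, in standard deviations."""
    if reference_sd <= 0:
        raise ValueError("reference_sd must be positive")
    return (value - reference_mean) / reference_sd


def yearly_mean(year_score_pairs):
    """Average the scores of all abstracts published in the same year."""
    by_year = defaultdict(list)
    for year, score in year_score_pairs:
        by_year[year].append(score)
    return {year: mean(scores) for year, scores in sorted(by_year.items())}


# Hypothetical feature: 220 occurrences per 1000 words, against a
# made-up reference mean of 180 and standard deviation of 20.
z = z_score(value=220.0, reference_mean=180.0, reference_sd=20.0)

# Hypothetical (year, Z score) pairs for three abstracts.
trend = yearly_mean([(1989, -1.0), (1989, -3.0), (1990, -2.5)])
```

Here `z` equals 2.0, i.e., the value lies two standard deviations above the reference mean, matching the example given in the text.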

Results and Discussion
The two corpora comprised 680,000 abstracts each, without overlap, because of the way they were selected. The selection criteria for corpus #1, however, had a consequence, i.e., that this corpus contained not only RAs, but also a small number of different genres. A post hoc analysis on the corpus showed that 611,450 abstracts out of the total 680,000 in corpus #1 belonged to research articles, and 43,567 abstracts (7.1%) belonged to the comment, letter, and editorial categories, which do not fall within our area of interest, while the remaining 24,983 could be classified as less frequent manuscript types, e.g., news or historical articles [19].

Dimension 1
Our analysis commenced with Dimension 1 (D1) in MAT, a dimension that, in this context, signifies the level of informational versus involved discourse. The positive pole of D1 is typically linked to dialogues characterized by language rich in interaction and expressive affective content, as can be found in personal correspondence [11]. Conversely, the negative pole of D1 is associated with information-rich and highly edited text, aligning with the expectations from a textbook or an academic article [28]. Consistently, both research articles (RAs) and reviews in our corpus exhibit negative scores (Figure 1A). While the D1 scores for RAs have remained relatively constant over time, those for reviews have become slightly more negative, i.e., they have become even more aligned to the informational pole of discourse. To delve deeper into how the dimensions evolved within our two document groups, we employed scatterplots depicting the values of different features for RAs and reviews over time. Figure 1B illustrates that RAs with varying D1 scores (and thus with more or less pronounced informational natures) are present across all publication dates. In contrast, however, newer reviews tend to cluster around comparatively more negative values than older ones, suggesting that newer reviews exhibit stronger informational traits. To enhance the transparency of the dimension score and gain further insight into the evolving phenomena in the literature, we turned our attention to analyzing the underlying linguistic features associated with D1. Notably, a frequent use of nouns and long words is a characteristic feature of information-rich texts, corresponding to negative scores for D1, as these features require careful planning in production and are less frequent in impromptu speeches and dialogues. Consistently with this observation, our analysis reveals that both RAs and reviews have increased their Z scores (i.e., frequency) for these features over the years in a parallel manner (Figure 2A,B). However, no clear trend is discernible for two additional features typical of texts in the negative pole of D1 score: the type/token ratio and the frequency of attributive adjectives (Figure S1). The former parameter reflects the ratio between different words and the total number of words (tokens), indicating the diversity of language used. The latter parameter is associated with a language rich in adjectives, which, again, is a common feature of planned discourses.
A more strictly informational writing style is also evidenced by a decline in other typical features of involved texts. Analytic negation (Figure 2D); the use of demonstrative pronouns (e.g., This, That, etc.), often employed with a deictic function in spoken language and interaction (Figure 2E); private verbs expressing internal cognitive processes (e.g., think, feel, perceive, etc.); and the use of be as the main verb (Figure 2F) have all experienced a decrease in frequency. Intriguingly, the use of prepositions, typically high in texts with strongly negative D1 scores, decreased in both corpora over time (Figure 3A), while non-phrasal coordination, associated with involved writing, has increased in both corpora (Figure 3B).
However, research articles (RAs) and reviews diverged in at least three features. The frequency of first-person pronouns, characteristic of dialogue (and involved writing), is generally low in both corpora, aligning with the expected academic style. Yet, there are some notable exceptions, such as the following striking example:

It was my second clinical placement and I was working on a surgical ward when I was asked to accompany a patient to theatre. [29]

Admittedly, this may be an unusual style for academic prose, yet it is found in our corpus.
Notably, the frequency of first-person pronouns increased during the 1990s in RAs but remained relatively constant in reviews. Only in the early 2000s did it start to rise in both text types (Figure 3C). The most likely explanation for this behavior is that, although passive verbs have been used abundantly in academic writing as a rhetorical device to highlight the detachment of the narrator from the events contained in the text and as a sign of objective observation [30], the use of active verbs and first-person pronouns has been advocated in more recent times for the sake of clarity [31] and has been observed to be on the rise in academic writing in biology or life sciences [32]. It may be assumed that RAs were more prone to the use of first-person pronouns, as they often reported on the experimental activity of a research group, as opposed to reviews, which typically summarize the findings of other research groups, and thus this increase occurred earlier.
The use of present tense verbs is strongly associated with involved discourse, too (as it is very frequent in interactions between speakers), and, though generally low in both corpora, our data indicate that their frequency Z score is higher in review papers (Figure 3D). This discrepancy may stem from the nature of reviews, which often encapsulate the current knowledge in a specific area and draw conclusions that are presented as general rules, as in the following:

Primary care clinicians treat patients with cancer and cancer pain. It is essential that physicians know how to effectively manage pain including assessment and pharmacologic and nonpharmacologic treatment modalities. [33]

In such cases, the present tense is aptly employed to convey a sense of lasting value to the conclusions drawn from the literature. Conversely, the purpose of RAs is typically to report on one or more experiments, situated in time and place, often described using past tenses, as in the following:

During 8 observation days (with time delay of 10-14 days between each observation day), all adult patients hospitalized at an internal medicine ward of 4 Belgian participating hospitals were screened for AB use. Patients receiving AB on the observation day were included in the study and screened for signs and symptoms of AAD using a period prevalence methodology. [34]

The use of present tense slightly increased in RAs in the 1990s and remained stable thereafter, while it started to decline in reviews around the same time. The exact explanation is speculative at this point, possibly related to the rise in systematic reviews or a stylistic shift. Notably, the Z score for RAs remains significantly lower than for reviews (Figure 3D). The use of possibility modals (i.e., verbs such as may or might) is associated with an involved style, too, as these are often utilized to express subjectivity or a guess, which is a common situation in a dialogue context. However, they can also be found quite regularly in academic writing [35], usually to express a hypothesis, as in the following:

Administration of thioredoxin may have a good potential for anti-aging and anti-stress effects. [36]

Admittedly, the room for hypotheses, although a common and actually quite essential practice in the scientific method [37], is quite limited in academic literature, given the need for evidence-grounded reasoning, hence their low frequency. Interestingly, the use of possibility modals has moved in opposite directions in the two corpora analyzed. The Z score for this feature was slightly positive in reviews, aligning with a text genre prone to drawing conclusions based on reviewed data. However, in RAs, it was negative, suggesting that assumptions and hypotheses were likely confined to few sentences in such texts. Over time, this index steadily decreased in the review group, reaching negative values in the last decade, possibly in association with the increase in systematic reviews where the extensive use of statistical tools may reduce speculation. Conversely, it increased by almost 30% in the RA corpus, possibly linked to a bolder or more personal style, as observed previously (Figure 3E) [32].
When evaluating the outcomes generated by Nini's MAT software, it is imperative to bear in mind that, while it draws inspiration from Biber's work, it might not entirely capture the nuance of Biber's original analysis. A more in-depth exploration would involve conducting a comprehensive factor analysis. This analytical process aims to delve into the intricate associations between features, weighing the contribution of each feature in each dimension. Moreover, there arises the possibility of redefining these dimensions to better align with the specific characteristics of the corpus under examination. The adoption of a fixed solution, as exemplified by the MAT software, undoubtedly streamlines the presentation of final results and increases the comparability across studies. Yet, it introduces inherent constraints concerning the generalizability of the identified dimensions and their fidelity in reflecting the distinctive attributes of the analyzed documents. However, in the face of these methodological considerations, we maintain the belief that the insights garnered from the MAT software can offer valuable perspectives on the evolutionary trends within academic articles. This assertion gains particular significance when individual Z scores for the linguistic features in question are examined. By assessing them, it becomes possible to extract more granular insights into how specific linguistic elements contribute to the overarching trends and transformations observed in academic writing.

Dimension 2
We then proceeded to analyze the second dimension evaluated by MAT, Dimension 2 (D2), associated with narrative discourse. A positive score for D2 indicates a narrative, active, event-oriented nature, while a negative score suggests a more descriptive or static quality [11].
Our corpora of RAs and reviews have negative Z scores for D2, with reviews having a lower score than RAs (Figure 4A). This is not unexpected, as RAs more likely report, by definition, on the execution of one or more experimental procedures, which are usually associated with some sort of activity, as in the following:

We investigated expression of the five ssts in various adrenal tumors and in normal adrenal gland. Tissue was obtained from ten pheochromocytomas (PHEOs)… [38]

This passage emphasizes action, doing, selecting, analyzing, and other similar activities that require narration to navigate through them.
Interestingly, the Z score for RAs progressively decreased and became more negative over time, while the D2 score for reviews remained constant and even increased, becoming less negative in the last 5 years (Figure 4A). This trend is reflected in Figure 4B, showing a drop in RAs' D2 score in the 1990s and early 2000s, while reviews' score started to increase independently from RAs in the first decade of the 2000s. This shift might be explained by the change in the Z score for the use of past tense verbs [10]. This score, which was and has remained negative in both corpora for the whole timeframe (Figure 4C), decreased in RAs until the first decade of the 21st century, and this was followed by an increase in the score for review articles in the last two decades. This means that an abstract from a review article in 1989 could more easily contain a passage like the following:

Several lines of evidence indicate that platelet-activating factor (PAF-acether) is implicated in hypersensitivity reactions. Indeed, PAF-acether reproduces the features of asthma in vivo and in vitro, since it induces bronchoconstriction, hypotension, and hemoconcentration and activates platelets and leukocytes. [39]

This passage, rich in present tense verbs, conveys general principles about a phenomenon, such as a disease or a condition. In contrast, a more recent review text might incorporate more past tenses, as in the following:

Mammalian neonates have been simultaneously described as having particularly poor memory, as evidenced by infantile amnesia, and as being particularly excellent learners. [40]

This change could suggest that since the early 2000s, review articles have tended to circumscribe their conclusions to the research papers they use as sources, contextualizing them and possibly being more cautious with generalizations.
Other important linguistic features associated with D2 underwent similar changes in both corpora: the frequency of third-person pronouns (i.e., he, she, they) increased for both text types (Figure 4D), as did the use of present participial clauses (Figure 4F), while the frequency of perfect aspect verbs decreased in both RAs and reviews, although the scores for this feature remained significantly lower in RAs than in reviews (Figure 4E).
Notably, these findings are also apparently in contrast with what we reported on the same corpora using LIWC 2022 [19]. In particular, we reported a higher Narrativity Overall score for reviews. That score was calculated based on the adherence to particular metrics, i.e., the three fundamental narrative curves that were measured in each abstract, namely Staging, Plot Progression, and Cognitive Tension [41]. The theory behind these measures is that a narrative trajectory can be traced in a text which follows Freytag's dramatic arc: first the stage for the action is set, characters and referents are introduced and presented; the action then begins, and as the text progresses, it intensifies as the narrator describes events and activities; and cognitive tension refers to the struggles and conflicts that ensue in the story and that reach a culmination point with the resolution of the crisis that leads to the end of the narration [42]. To obtain an automated measure of these features, Pennebaker et al. decided to rely on grammatical words, which admittedly form a small set of words in English (and any language) [43]. In particular, Boyd et al. proposed measuring the frequency of articles and prepositions as proxies for the staging score, because they can be assumed to be more abundant when new referents are introduced in the text (via articles) and their relations are explained (possibly also through the use of prepositions), while auxiliary verbs and anaphoric pronouns are taken as proxy measures of plot progression, because they can be expected to be used when describing an action. Cognitive tension is measured based on the abundance of verbs in a special dictionary created ad hoc and that includes such words as 'think' or 'believe' (which would be classified as 'private verbs' in Biber's multidimensional analysis). Boyd et al. recommend splitting texts in at least five segments to monitor how these scores vary as the text progresses. It is evident that LIWC 2022 and MAT scores rely on distinct features. Readers should focus on understanding the specific characteristics of the text that these tools measure, rather than becoming fixated on the 'narrativity' label.
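To make the segment-based idea above more concrete, the sketch below splits a text into five segments and computes, for each one, the share of tokens that are articles or prepositions, the function-word classes described above as proxies for staging. It is only a toy illustration under stated assumptions: the word lists are small illustrative subsets chosen here, the whitespace tokenization is deliberately naive, and the function `staging_proxy` is hypothetical; it does not reproduce the LIWC 2022 dictionaries or scoring.

```python
ARTICLES = {"a", "an", "the"}
# Illustrative subset only; not the dictionary used by LIWC.
PREPOSITIONS = {"in", "on", "of", "to", "with", "by", "for", "from", "at"}


def staging_proxy(text, n_segments=5):
    """Share of articles and prepositions in each of n_segments consecutive segments."""
    tokens = text.lower().split()  # naive whitespace tokenization
    if len(tokens) < n_segments:
        raise ValueError("text is too short to be split into the requested segments")
    size, extra = divmod(len(tokens), n_segments)
    shares = []
    start = 0
    for i in range(n_segments):
        # The first `extra` segments receive one additional token each.
        end = start + size + (1 if i < extra else 0)
        segment = tokens[start:end]
        hits = sum(tok in ARTICLES or tok in PREPOSITIONS for tok in segment)
        shares.append(hits / len(segment))
        start = end
    return shares


shares = staging_proxy("the study of mice in a lab showed clear results")
```

In this toy sentence the function words are concentrated in the first three segments, which is the kind of front-loaded profile the staging curve is meant to capture.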

Dimension 3
A positive score for Dimension 3 (D3) is associated with explicit and context-independent references, as opposed to the negative pole of this dimension, i.e., nonspecific, context-dependent content [10]. This means that referents in the text are mentioned and described explicitly, so that there cannot be any doubt about their identity. According to our data, reviews have a higher D3 score than RAs, and both their scores have been progressively increasing over time (Figure 5A,B). Among the features that affect D3, nominalization appears to have followed this trend and may be responsible for the visible changes in D3 over time. Nominalization [44] indicates the replacement of a verb with a noun that denotes the same action and is a common feature of technical language [45], which is often used to convey a more impersonal tone, because a noun, by describing an action as an entity, detaches it from the agent and confers on it a higher independence [46]. The use of nominalization, albeit often deemed undesirable [47], has been growing in academic writing [48]. An example of nominalization in our corpus could be the following:

Pancreatic cancer (PC) is characterized by high tumor invasiveness, distant metastasis, and insensitivity to traditional chemotherapeutic drugs… [49]

Phrasal coordination is also positively associated with D3, as it may reflect a higher degree of descriptivity and a more thorough explanation of textual referents; like nominalization, it displays a similar upward trend. An example of phrasal coordination from a manuscript with a high score for this feature is the following:

… the specific mechanisms are blurry, especially the involved immunological pathways, and the roles of beneficial flora have usually been ignored. [49]
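To make the notion of a nominalization count more concrete, the following is a minimal sketch of a suffix-based heuristic in the spirit of Biber-style taggers. It assumes that nominalizations can be approximated by words ending in -tion, -ment, -ness, or -ity (and their plurals) and that frequencies are normalized per 100 tokens. The function names and the naive tokenizer are illustrative assumptions; MAT itself relies on part-of-speech tagging, and its actual rules are not reproduced here.

```python
import re

# Assumption: nominalizations approximated by these suffixes (and plurals).
# This over-matches words such as "city" or "moment".
NOMINAL_SUFFIX = re.compile(r"(?:tions?|ments?|ness(?:es)?|ity|ities)$")


def tokenize(text):
    """Split text into lowercase alphabetic tokens (deliberately naive)."""
    return re.findall(r"[a-z]+", text.lower())


def nominalizations(text):
    """Return the tokens that match the suffix heuristic."""
    return [tok for tok in tokenize(text) if NOMINAL_SUFFIX.search(tok)]


def rate_per_100_tokens(text):
    """Frequency of heuristic nominalizations per 100 tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 100.0 * len(nominalizations(text)) / len(tokens)


example = (
    "Pancreatic cancer (PC) is characterized by high tumor invasiveness, "
    "distant metastasis, and insensitivity to traditional chemotherapeutic drugs"
)
print(nominalizations(example))
print(round(rate_per_100_tokens(example), 2))
```

Applied to the corpus example quoted above, the heuristic flags "invasiveness" and "insensitivity". A suffix rule of this kind is only a rough proxy and would need manual checking before being used for any quantitative claim.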

Dimension 4
Dimension 4 (D4) is associated with overt (positive pole) or non-overt (negative pole) expression of persuasion [11], which covers not only the writer's opinion but also the capacity of a text to prompt readers toward a certain course of action. Both our corpora have a negative score (Figure 6A), which indicates that both RAs and reviews tend to be non-persuasive, in line with the declared function of biomedical literature, as previously stated elsewhere [28]. Unsurprisingly, reviews tend to be less negative than RAs in regard to D4 score. This is easily explained by the fact that reviews, by nature, provide readers with an overview of facts and knowledge that can be used to derive recommendations or guidelines. However, the D4 score changed over time: while RAs have been mostly stable, displaying a slight trend for D4 to increase by about 10% over the last 30 years, reviews have further decreased this score by the same amount in the last decade (Figure 6B), signaling a movement toward a more impartial stance in review papers. Among the factors that may have affected these changes, the use of infinitives has been increasing in both corpora in a similar way (Figure 6C), such as in the following:

Understanding the age-dependent neuromuscular mechanisms underlying force reductions … allows researchers to investigate new interventions to mitigate these reductions. [50]

Suasive verbs are, understandably, another hallmark of overt persuasion, as in the following:

… an ad hoc committee of the American Venous Forum, working with an international liaison committee, has recommended a number of practical changes. [51]

Their frequency, quite similar in both manuscript types, has, however, been decreasing steadily over the years (Figure 6E), which is consistent with the more neutral stance mentioned above. Prediction modals, which weigh heavily on this dimension, are highly variable in our corpora; they have changed mostly for RAs (Figure 6D), where a slight increase can be observed. Meanwhile, the use of split auxiliaries has changed for reviews only in the last decade (Figure 6F). Prediction modals include forms like will, should, or must, which indicate the future directions that research or practice should take, as in the following:

The data suggest that treatment of H. pylori infection should be considered in children with concomitant GERD. [52]

Dimension 5
Dimension 5 (D5) refers to the abstract (positive pole) or non-abstract (negative pole) nature of the information contained in the texts [11]. As already reported, academic texts, including those from the biomedical field, tend to have high scores for D5, as they tend to contain technical, abstract concepts.
In our corpora, review papers score higher than RAs regardless of the publication date (Figure 7A). The D5 score decreased for both text types over the years, and the gap between the two groups vanished by the mid-2010s (Figure 7A). In the last 5 years, the D5 score appeared to increase again in reviews only (Figure 7B). The frequent use of passives is a hallmark of abstract style, as it typically mitigates the action of an agent (even more so if the passive is agentless). Both indices, passives with a "by" agent and agentless passives, have been decreasing in both text types (Figure 7D,E), presumably driving the trend of the overall D5 score. The use of conjuncts, however, has increased in both reviews and RAs, and this increase has been quite sudden in the last 5 years for reviews, which might explain the surge in D5 score in that timeframe.
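The way opposing feature trends can pull a dimension score in different directions may be easier to see with a small worked sketch. It assumes the standard Biber-style computation, in which each feature frequency is standardized against a reference mean and standard deviation and the dimension score is the sum of the z-scores of positively loading features minus the sum of the z-scores of negatively loading ones. All numbers below are invented for illustration and are not taken from our corpora, and only the three D5 features discussed in the text are included.

```python
def z_score(freq, ref_mean, ref_sd):
    """Standardize a feature frequency against reference statistics."""
    return (freq - ref_mean) / ref_sd


def dimension_score(freqs, reference, positive, negative=()):
    """Sum of z-scores of positive features minus those of negative features."""
    z = {
        name: z_score(freqs[name], *reference[name])
        for name in (*positive, *negative)
    }
    return sum(z[n] for n in positive) - sum(z[n] for n in negative)


# Hypothetical reference (mean, sd) values, NOT Biber's published statistics.
reference = {
    "agentless_passives": (10.0, 5.0),
    "by_passives": (1.0, 1.0),
    "conjuncts": (1.0, 1.5),
}
d5_features = ("agentless_passives", "by_passives", "conjuncts")

# Hypothetical per-100-token frequencies for two time points.
older = {"agentless_passives": 20.0, "by_passives": 3.0, "conjuncts": 2.5}
recent = {"agentless_passives": 15.0, "by_passives": 2.0, "conjuncts": 4.0}

print(dimension_score(older, reference, d5_features))
print(dimension_score(recent, reference, d5_features))
```

In this toy case, the decline in both passive indices outweighs the rise in conjuncts, so the overall score falls. A sufficiently sharp rise in conjuncts alone could reverse that balance, which is the kind of mechanism suggested above for reviews in the last 5 years.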

Conclusions
In conclusion, the analysis of over 1.2 million biomedical literature abstracts published in MEDLINE over the last 30 years reveals several noteworthy trends. The consolidation of an informational tone (D1) is observed in both research articles (RAs) and reviews. This is accompanied by a decrease in the use of narrative devices (D2), which is more pronounced in the RA corpus. At the same time, context-independent content (D3) has increased in parallel in both RAs and reviews. The lack of overt persuasion (D4) in the examined academic texts has remained relatively stable over the years. Additionally, there is a decrease in the degree of abstractness (D5), coinciding with a decline in the use of passive voice constructions. When comparing RAs to reviews, it becomes apparent that RAs used to rely more heavily on narration than reviews. However, RAs have toned down the use of this stylistic device to a level similar to that of reviews. On the other hand, reviews, as a manuscript type, historically exhibited a higher degree of context-independence, overt persuasion, and abstractness. These characteristics have been maintained over the years. This multidimensional analysis provides insights into the evolving linguistic and rhetorical characteristics of biomedical literature abstracts, showing how the individual dimensions have changed over time and which patterns distinguish RAs from reviews.

Figure 1. (A) Line plot of Dimension 1 (D1) score over the years for the research article (RA) corpus and the review corpus, in blue and orange, respectively; (B) scatter plot of D1 score for RAs and reviews.

Figure 2. Scatter plots of linguistic features of Dimension 1 in RA and review corpora by publication years. These linguistic features change similarly in the 2 corpora.

Publications 2024, 16

Figure 3. Scatter plots of the linguistic features of Dimension 1 in RA and review corpora by publication years. These features change differently in the 2 corpora.

Figure 4. (A) Line plot of Dimension 2 (D2) score over the years for the research article (RA) corpus and the review corpus, in blue and orange, respectively; (B) scatter plot of D2 score for RAs and reviews; (C-F) scatter plots of the linguistic features of D2 in RA and review corpora by publication years.

Figure 5. (A) Line plot of Dimension 3 (D3) score over the years for the research article (RA) corpus and the review corpus, in blue and orange, respectively; (B) scatter plot of D3 score for RAs and reviews; (C,D) scatter plots of the linguistic features of D3 in RA and review corpora by publication years.

Figure 6. (A) Line plot of Dimension 4 (D4) score over the years for the research article (RA) corpus and the review corpus, in blue and orange, respectively; (B) scatter plot of D4 score for RAs and reviews; (C-F) scatter plots of the linguistic features of D4 in RA and review corpora by publication years.

Figure 7. (A) Line plot of Dimension 5 (D5) score over the years for the research article (RA) corpus and the review corpus, in blue and orange, respectively; (B) scatter plot of D5 score for RAs and reviews; (C-E) scatter plots of the linguistic features of D5 in RA and review corpora by publication years.

Table 1. Outline of the 5 dimensions of Biber's multidimensional analysis that were used in the present paper [11].