A Machine-Learning-Based Bibliometric Analysis of the Scientific Literature on Anal Cancer

Simple Summary
Squamous-cell carcinoma of the anus, being a rare cancer, requires national and international collaborations, networking, organizational proficiency and leadership to overcome barriers to the implementation of clinical trials that establish improved standard-of-care treatment strategies and to the conduct of translational research projects that shed light on its biology and molecular characterization. The purpose of the present study is to obtain a global picture of the scientific literature related to anal cancer through a bibliometric analysis of the articles published during the last 20 years (2000–2020), exploring trends and common patterns in research and tracking collaboration and networks to foresee future directions in basic and clinical research.

Abstract
Squamous-cell carcinoma of the anus (ASCC) is a rare disease. Barriers have been encountered in conducting clinical and translational research in this setting. Despite this, ASCC has been a prime example of collaboration amongst researchers. We performed a bibliometric analysis of ASCC-related literature of the last 20 years, exploring common patterns in research, tracking collaboration and identifying gaps. The electronic Scopus database was searched using the keywords “anal cancer”, to include manuscripts published in English between 2000 and 2020. Data analysis was performed using R-Studio 0.98.1091 software. A machine-learning bibliometric method was applied. The bibliometrix R package was used. A total of 2322 scientific documents were found. The average annual growth rate in publication was around 40% during 2000–2020. The five most productive countries were the United States of America (USA), United Kingdom (UK), France, Italy and Australia. The USA and UK had the greatest link strength of international collaboration (22.6% and 19.0%). Two main clusters of keywords for published research were identified: (a) prevention and screening and (b) overall management. Emerging topics included imaging, biomarkers and patient-reported outcomes. Further efforts are required to increase collaboration and funding to sustain future research in the setting of ASCC.


Introduction
Squamous-cell carcinoma of the anus (ASCC) is considered a rare cancer, with an annual incidence of 0.5-2 new cases per 100,000 individuals [1]. Nevertheless, the incidence rate is steadily increasing, particularly in Europe, the United States of America (USA) and Australia, mostly due to the human papillomavirus (HPV) epidemics in these regions [2]. Therefore, this clinical-pathological entity is regarded as a matter for global public health attention, with a focus on primary and secondary prevention, treatment and survivorship [3]. Given the rarity of the disease, barriers have been encountered when carrying out clinical trials to establish standard treatment strategies and when conducting translational research projects to shed light on its biology and molecular characterization [4,5]. However, despite these barriers, ASCC has been a prime example of national and international collaboration, networking ability, organizational proficiency and leadership. We performed a bibliometric analysis of ASCC-related literature of the last 20 years, exploring trends and common patterns in research, tracking collaboration and networks and foreseeing future directions in basic and clinical research.

Materials and Methods
The electronic Scopus database was searched using the keywords "anal cancer". The literature search was restricted to manuscripts written in English and published between 1 January 2000 and 31 December 2020. The results of the electronic search were exported as a dataset that included citation information (authors, document title, year of publication, source title, volume, issue, pages, citation count, source and document type) and bibliographical information (affiliations, editors, keywords and funding details). Data analysis was performed using R-Studio 0.98.1091 software. A machine-learning bibliometric methodology was applied to evaluate the distribution of each factor. The bibliometrix R package was used [6]. The "summary()" function was employed to summarize the main information regarding the bibliographic data, such as the annual percentage growth rate (in terms of relative percentage change between a later and an earlier time point), the most productive countries (based on the first author's affiliation), the country-specific production (based on authors' appearances by country affiliation), the most relevant affiliations (based on disambiguated affiliation items, applying semantic similarity), the most-cited papers, the most-represented journals and the most frequent keywords. A factorial-analysis methodology was applied to identify joint keywords. The "conceptualStructure" function was used and multiple-correspondence analysis was applied to a Document × Word matrix [6]. The following options were used: method = "multidimensional scaling", field = "Author's keywords", number of terms = "50", number of clusters = "Auto". Authors' keywords were plotted on a two-dimensional map and results were interpreted according to the relative positions of the points and their distribution along the dimensions [6].
Scientific collaboration analysis was performed using the "biblioNetwork()" function, which calculated a country collaboration network [NetMatrix <- biblioNetwork(M, analysis = "collaboration", network = "countries", sep = ";")] [6]. The international collaboration intensity of a specific country was based on the number of documents in which at least one coauthor worked in a different country from the corresponding author [6]. The "biblioshiny()" function was used to plot the global collaboration maps.
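The collaboration measures described above can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not reproduce the bibliometrix implementation. The record layout (a "corresponding" country plus a semicolon-separated "countries" field), the function names and the sample records are all assumptions made for illustration, not the study's data.

```python
from collections import Counter
from itertools import combinations


def split_countries(field, sep=";"):
    """Return the distinct, normalized country names in a delimited field."""
    return sorted({c.strip().upper() for c in field.split(sep) if c.strip()})


def collaboration_links(records, sep=";"):
    """Count, for each pair of countries, the documents they co-authored."""
    links = Counter()
    for record in records:
        countries = split_countries(record["countries"], sep)
        for pair in combinations(countries, 2):
            links[pair] += 1
    return links


def international_share(records, country, sep=";"):
    """Share of a country's documents (by corresponding author) in which at
    least one co-author is affiliated with a different country."""
    country = country.strip().upper()
    own = [r for r in records if r["corresponding"].strip().upper() == country]
    if not own:
        return 0.0
    multi = sum(
        1
        for r in own
        if any(c != country for c in split_countries(r["countries"], sep))
    )
    return multi / len(own)


# Hypothetical records for illustration only.
records = [
    {"corresponding": "USA", "countries": "USA;UK"},
    {"corresponding": "USA", "countries": "USA"},
    {"corresponding": "UK", "countries": "UK;USA;France"},
    {"corresponding": "Italy", "countries": "Italy"},
]

links = collaboration_links(records)
print(links[("UK", "USA")])                 # 2
print(international_share(records, "USA"))  # 0.5
```

In this sketch, link strength is a raw count of shared documents, whereas the percentages reported in the Results are shares of each country's output, which is what the second function approximates.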

Publication Numbers
A total of 2322 scientific documents were found, comprising articles (n = 1863), reviews (n = 281), letters (n = 83), conference papers (n = 58), editorials (n = 29) and short surveys (n = 8) (as per Scopus: short or mini-reviews of original research). There were fewer than 100 publications per year until 2010, then a significant increase in the subsequent decade, reaching a total of 223 manuscripts published in 2020. The average annual growth rate was around 40% over the period of analysis.
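The Methods describe the annual growth rate only as a relative percentage change between a later and an earlier time point, so the exact formula is not stated. As a hedged illustration, the sketch below contrasts two common definitions: a compound annual growth rate and a mean of year-over-year changes. The function names and the yearly counts are hypothetical and are not the study's data.

```python
def compound_annual_growth_rate(first_count, last_count, n_years):
    """Constant yearly growth (in %) that turns first_count into last_count
    over n_years yearly steps."""
    return ((last_count / first_count) ** (1 / n_years) - 1) * 100


def mean_year_over_year_change(counts):
    """Arithmetic mean (in %) of the relative change between consecutive years."""
    changes = [
        (later - earlier) / earlier * 100
        for earlier, later in zip(counts, counts[1:])
    ]
    return sum(changes) / len(changes)


# Hypothetical publication counts for four consecutive years.
counts = [50, 100, 50, 200]

cagr = compound_annual_growth_rate(counts[0], counts[-1], len(counts) - 1)
yoy = mean_year_over_year_change(counts)
print(round(cagr, 2))  # 58.74
print(round(yoy, 2))   # 116.67
```

The two definitions can diverge markedly when yearly counts fluctuate, which is why the definition used should accompany any reported growth rate.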

In total, there were 47 countries represented by authors participating in the global anal-cancer literature (Figure 2). The USA and UK had the greatest link strength of international collaboration (22.6% and 19.0%, respectively). France, Australia and Italy had a high number of publications, but their international collaboration strength was relatively lower (13.6%, 16.9% and 8.0%, respectively).

Keywords
As depicted in Figure 3, the factorial analysis identified two main clusters of keywords: (i) those mainly addressing screening and anal dysplasia; (ii) keywords linked to the overall management of anal cancer, from epidemiology and risk factors to treatment-related topics and survival.
After detailed analysis of the trend topics, we observed new items emerging in the last two years, including functional imaging [mostly, positron-emission tomography (PET)], biomarkers and patient-reported outcomes. To present a better visualization of the data records, a focus of a time span over ten years (2010-2020) is shown in Figure 4.
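The factorial analysis starts from a Document × Word matrix built on authors' keywords. The sketch below illustrates only that input step, together with simple keyword co-occurrence counts. It does not perform the multiple-correspondence analysis or clustering carried out by the bibliometrix package. The function names, the semicolon separator and the sample keyword fields are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def document_keyword_matrix(keyword_fields, n_terms=50, sep=";"):
    """Build a binary Document x Word matrix over the n_terms most frequent keywords."""
    docs = [
        {k.strip().lower() for k in field.split(sep) if k.strip()}
        for field in keyword_fields
    ]
    frequency = Counter(k for doc in docs for k in doc)
    # Most frequent first; ties are broken alphabetically so the result is stable.
    ranked = sorted(frequency.items(), key=lambda item: (-item[1], item[0]))
    terms = [term for term, _ in ranked[:n_terms]]
    matrix = [[1 if term in doc else 0 for term in terms] for doc in docs]
    return terms, matrix


def cooccurrence(terms, matrix):
    """Count how many documents contain each pair of retained keywords."""
    counts = Counter()
    for row in matrix:
        present = [term for term, flag in zip(terms, row) if flag]
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Hypothetical author-keyword fields for three documents.
fields = [
    "anal cancer; HPV; screening",
    "anal cancer; chemoradiation",
    "HPV; screening; anal dysplasia",
]

terms, matrix = document_keyword_matrix(fields, n_terms=3)
print(terms)   # ['anal cancer', 'hpv', 'screening']
print(matrix)  # [[1, 1, 1], [1, 0, 0], [0, 1, 1]]
print(cooccurrence(terms, matrix)[("hpv", "screening")])  # 2
```

Keywords that frequently share documents, such as the hypothetical "hpv" and "screening" pair here, are the ones a factorial method would be expected to place close together on the two-dimensional map.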

Journals
In the 20 years studied, 758 journals published articles on ASCC. Overall, 24% of journals (n = 182) have published more than two documents. The most represented journals were "Radiotherapy and Oncology", "International Journal of Radiation Oncology Biology Physics" and "Diseases of the Colon and Rectum" with a total number of publications of 48, 47 and 44, respectively (see Figure 5).

The three-field plot in Figure 6 reveals the relations between the most prolific countries and the relevant keywords and sources. The keyword parameter was used as a surrogate of research topic. The analysis of the top ten countries demonstrated that researchers around the globe had strong relations with the main topics of the literature [anal cancer, HPV, human immunodeficiency virus (HIV)] and had most frequently published within high-ranked international journals.

Affiliations and Citations
The institution with the highest number of contributions was the University of California, USA with a total of 129 documents, followed by the University of Melbourne, Melbourne, VIC, Australia (n = 65) and Mayo Clinic, Rochester, MN, USA (n = 52). It must be noted that despite the affiliation disambiguation, there are different locations indicating the same affiliation and the method may not precisely measure the similarities between affiliations. For instance, the contribution activity of the University of California includes different institutes' resources, such as those located in San Francisco (UCSF), San Diego (UCSD), Los Angeles (UCLA), Irvine (UC Irvine) and Davis (UC Davis).

Paper | Title | Total Citations * | Citations per Year **
Kachnic LA, 2013, Int J Radiat Oncol Biol Phys [14] | RTOG 0529: a phase 2 evaluation of dose-painted intensity modulated radiation therapy in combination with 5-fluorouracil and mitomycin-C for the reduction of acute morbidity in carcinoma of the anal canal | 351 | 39
Shiels MS, 2009, J Acquired Immune Defic Syndr [15] | A meta-analysis of the incidence of non-AIDS cancers in HIV-infected individuals | 350 | 27
Giuliano AR, 2011, Lancet [16] | Incidence and clearance of genital human papillomavirus infection in men (HIM): a cohort study | 335 | 31
* Number of times each document has been cited; ** yearly average number of times each document has been cited.

Discussion
The annual scientific production on ASCC is increasing, with an average annual growth rate of around 40% and a steeper increase between 2010 and 2020 than in the previous decade. Interestingly, of the more than 2000 documents published between 2000 and 2020, around 80% are original articles. A good proportion of them are published in high-ranked journals covering the fields of clinical, radiation and gastrointestinal oncology. The geographical distribution covers all continents, all of which contributed clinical and translational research. The predominant hubs for scientific production are the USA, Europe and Australia. An active role is also played by Asian countries, despite the lower incidence of the disease in those areas, and by South America, particularly Brazil. Nations such as the USA and UK have both a high quantitative scientific throughput and an efficient collaborative international networking capacity, while others such as France, Italy and Australia sustain high scientific productivity with a predominantly 'standalone' approach, focused more on national projects and collaboration.
The geographical distribution of the scientific production is mirrored by the most productive academic institutions, which are located in the USA (University of California and Mayo Clinic) and Australia (University of Melbourne). One limitation of the current analysis is that collaboration was measured using the simple surrogate of coauthorship, which does not necessarily reflect a well-functioning and active scientific network.
Nevertheless, collaboration and networking are crucial to promote clinical and translational research in a rare disease such as ASCC, where at present, appropriately funded national and international projects are scarce. However, in recent years, several initiatives have been set up and implemented to promote innovation in this setting. As an example, the International Rare Cancer Initiative (IRCI) for the relapsed/metastatic anal-cancer group completed the first multicentric randomized prospective controlled trial on advanced anal cancer, the InterAACT trial, which succeeded in patient recruitment across different collaborative groups in Europe [United Kingdom, Nordic Anal Cancer Group (NOAC), European Organization for Research and Treatment of Cancer (EORTC)], the US (NCI-endorsed initiative) and Australia (The Australasian Gastrointestinal Trials Group, AGITG), leading to a new standard of care for systemic therapy in metastatic ASCC [17]. The same cooperative group has been expanded and is now evaluating the role of an anti-programmed cell death protein 1 (PD-1) inhibitor in inoperable, locally recurrent or metastatic ASCC not previously treated with systemic chemotherapy within a phase III, global, multicenter, double-blind randomized study (POD1UM-303/InterAACT2 trial) [18]. Other examples of cooperation in this clinical setting comprise the EORTC Gastrointestinal Tract Cancer Group (GITCG) Rectum, Anal Task Force and the National Cancer Institute's National Clinical Trials Network (NCTN) Rectal-Anal Task Force. In the Nordic European countries, collaborations within NOAC have led to joint studies and the possibility to discuss advances in research and therapy [19].
In 2020 and 2021, clinicians and researchers organized two International Multidisciplinary Anal Cancer Conference (IMACC) Webinars. There was also a physical meeting held in Aarhus, Denmark, in 2021 [4,20]. These meetings presented an opportunity to discuss several potential topics and open issues for future collaborations, including the implementation of global prospective registries to record outcomes of specific patient subgroups, the harmonization of clinical-trial designs and outcomes to facilitate data pooling and analysis, and the need and rationale for developing biological and translational studies [4].
In our bibliometric evaluation, two main clusters of keywords were identified with the factorial analysis: the first being primary and secondary prevention, including screening and early detection; the second being overall therapeutic management. The last two decades have shed light on the causal effect of HPV infection as a prominent risk factor for precancerous lesions and invasive ASCC, together with information on the molecular pattern, biological characteristics and prognosis of HPV-related ASCC. The role of the HPV vaccine against anal HPV infection and anal intraepithelial neoplasia and its potential effect on the prevention of invasive cancer was also established [10]. Counterintuitively, the high number of publications on the role of HPV in ASCC and the high quality of these scientific contributions do not necessarily match the interest of the general community in this topic or the evidence of HPV-related cancer epidemics. A stronger effort should probably be made to raise awareness in the population about risk factors and sexual behaviors, to enhance prevention, screening and early detection and to advocate amongst policy makers for further dissemination of vaccination programs in young males and females [21]. With respect to the therapeutic management of ASCC, given that concurrent chemoradiation based on 5-fluorouracil/mitomycin-C (MMC) is the standard of care as demonstrated by first-generation trials [22-24], the last two decades saw the publication of three prospective randomized phase III trials studying potential new standard options. These trials demonstrated no benefit from replacing MMC with cisplatin, nor from induction chemotherapy, a higher radiation-boost dose or maintenance chemotherapy [11,12,25]. Two of these trials are amongst the 10 most-cited publications in the study period [11,12].
Another highly cited document, reporting the results of the RTOG 0529 trial, explored the role of intensity-modulated radiotherapy (IMRT) in reducing treatment-related acute toxicity [14]. Still, there have been relatively few large randomized trials to determine the optimal treatment regimens for ASCC. Currently, a large UK trial (PLATO) is investigating the optimal radiation doses for different clinical stages.
It is interesting to note that a frequent focus has been on imaging, particularly magnetic resonance imaging (MRI) and fluorodeoxyglucose (FDG)-PET, highlighting the growing interest in these diagnostic tools in this setting. Pelvic MRI is now considered a mandatory examination, in agreement with the updated European Society for Medical Oncology (ESMO) guidelines, providing vital diagnostic, staging and prognostic information [5]. Moreover, FDG-PET is recommended, being potentially useful to confirm or exclude suspicious features on MRI, as well as to drive target volume selection and delineation for tailored IMRT approaches [5,26-28].
The Core Outcomes for clinical trials of chemoradiation for anal cancer (CORMAC) initiative in the UK and the work carried out within the EORTC Quality of Life (QoL) Group to develop the EORTC QLQ-ANL27 anal-cancer-specific questionnaire are valuable examples of the scientific community's recent interest in patient survivorship, patient-reported outcomes and QoL [37,38].
Of note, the terms related to late gastrointestinal function, sexual dysfunction and bone fracture do not appear as specific areas of previous research during the analyzed time period. Such data would, however, fuel the rationale for the needed clinical research aiming at decreasing sequelae, with treatment de-intensification and/or improved radiotherapy technologies [39,40].
Despite the relevant research achievements in ASCC during the last two decades, important gaps and unmet needs still exist, and there is a need to improve and increase international cooperation, under a shared vision and a global perspective. Some of the regions with a high incidence of ASCC (such as sub-Saharan Africa and South America) are underrepresented in the collaborative networks, and very few studies on ASCC have enrolled patients in these regions. This underrepresentation hampers the generalizability of the clinical results observed in the randomized trials that set the standard of care for ASCC and calls for external validation of these findings in underrepresented populations via a global effort.
Few studies are available on equitable access to diagnosis and treatment for ASCC patients. Racial, gender and socioeconomic status disparities have been documented in this setting, and hence collaborative initiatives should be promoted to dissect causation, mitigate the impact and suggest potential operative solutions [41,42]. Similarly, for the subpopulation of HIV-positive ASCC patients, few clinical data are available, highlighting the need for collaborative initiatives to fill the gap, such as ACTION HIV, led by researchers in Brazil, which aims to establish a global registry to collect clinical data and report on treatment outcomes in HIV-positive ASCC patients [43].

Conclusions
We presented descriptive data from a bibliometric analysis of publications on ASCC over the past 20 years. Prevention and screening, diagnostics and biomarker landscapes, together with treatment strategies, are the most explored topics in ASCC research. New topics are emerging, however. There is enthusiasm, precedent and growing output from international cooperation in the ASCC research field to promote advances and to identify and fulfill the unmet needs in this setting. Finally, it is important to highlight that there is an urgent need to increase collaboration and funding, to design proper clinical trials and to implement timely translational-research projects to optimize treatment and improve clinical outcomes in patients with ASCC.