A Critical Appraisal of Evidence- and Consensus-Based Guidelines for Actinic Keratosis

Actinic keratoses (AK) are common lesions of the skin that can be effectively treated with several lesion- and field-directed treatments. Clinical practice guidelines assist physicians in choosing the appropriate treatment options for their patients. Here, we aimed to systematically identify and evaluate the methodological quality of currently available guidelines for AK. Guidelines published within the last 5 years were identified in a systematic search of guideline databases, Medline and Embase. Then, six independent reviewers evaluated the methodological quality using the tools “Appraisal of Guidelines for Research and Evaluation” (AGREE II) and “Recommendation EXcellence” (AGREE-REX). The Kruskal–Wallis (H) test was used to explore differences among subgroups and Spearman’s correlation to examine the relationship between individual domains. Three guidelines developed by consortia from Canada, Germany and the United Kingdom were eligible for the evaluation. The German guideline achieved the highest scores, fulfilling 65 to 92% of the criteria in AGREE II and 67 to 84% in AGREE-REX, whereas the Canadian guideline scored 31 to 71% of the criteria in AGREE II and 33 to 46% in AGREE-REX. The domains “stakeholder involvement” and “values and preferences” were identified as methodological weaknesses requiring particular attention and improvement in future guideline efforts.


Introduction
Actinic keratoses (AK) are common lesions of chronically sun-damaged keratinocytes in the epidermis, the upper layer of the skin [1]. They are most commonly found in areas that have been chronically exposed to ultraviolet (UV) radiation such as the head, face and the dorsal hands [2]. Affected areas usually present as red, scaly plaques with a rough surface [2]. AK can further progress to invasive squamous cell carcinoma of the skin [3]. However, it is currently not possible to predict which AK will progress and which will not. Consistent treatment of AK is therefore recommended by several international practice guidelines, especially in high-risk patients [4][5][6].
Over the last decades, a broad variety of options have been licensed for AK treatment. They include lesion-directed options such as cryotherapy and fractional laser therapy, which are suitable for treating single AK lesions. Another approach comprises field-directed treatments, which are usually deployed to treat larger areas of sun-damaged skin. Examples are photodynamic therapy (PDT), microneedling and the application of various topical agents; PDT may also be effective in other skin diseases such as acne vulgaris, as recently shown by Del Duca et al. [7]. Additionally, monotherapies can be combined to achieve better results, especially in difficult-to-treat or therapy-resistant AK patients [8][9][10][11][12]. The vast number of available and approved therapies may be both a blessing and a curse. Thus, up-to-date clinical practice guidelines are valuable tools that help to select the most suitable and evidence-based approach for the individual patient with additional consideration of their personal preferences [13]. However, the provided recommendations should be developed in a structured and methodologically sound process to ensure reliability and engagement.
In this study, we aimed to assess the methodological strengths and weaknesses of all currently available international guidelines on AK treatment using the two assessment instruments "Appraisal of Guidelines for Research and Evaluation (AGREE) II" and "AGREE-REX: Recommendation EXcellence" [14,15]. The widely used AGREE II appraisal tool from 2009 is an updated version of AGREE that was originally released in 2001. AGREE II consists of 23 items that are grouped in the six quality domains "scope and purpose", "stakeholder involvement", "rigor of development", "clarity of presentation", "applicability" and "editorial independence" as well as two items assessing the overall quality of the respective guideline. Recently, the AGREE-REX instrument was launched to complement guideline evaluation with AGREE II. AGREE-REX covers the topics clinical credibility and implementability and also assesses how values and preferences of target users, patients, policy and the guideline developers themselves have influenced the development of the recommendations.

Eligibility Criteria
In our appraisal, we only included guidelines developed by national or international consortia focusing on more than one option for the management of AK. They had to be published within the last 5 years, i.e., 2015-2019, as we only wanted to evaluate the most up-to-date guidelines. Furthermore, only English or German publications were included. Guidelines that had already expired or were not developed based on a systematic literature search followed by a structured consensus process, e.g., expert consensus-based guidelines, were excluded.

Search Strategy and Selection of Guidelines
In order to identify potential guidelines for the evaluation, we systematically searched several guideline databases as well as Medline and Embase (both via Ovid) until 23 October 2019. The search included the terms "actinic keratosis/keratoses", "solar keratoses", "field cancerization", "senile keratoses" and "precancerous lesions". In addition, cross-references of included guidelines were screened. The detailed search strategies are shown in Supplementary Tables S1 and S2. The search results were checked for duplicates. After their removal, the remaining titles, abstracts or editorials were screened by two authors (M.V.H., T.S.) to determine whether they met the predefined eligibility criteria. Full-text guidelines of potentially relevant records were obtained and checked for eligibility again.

Data Extraction and Rating of the Guidelines
Background information including title, consortia and/or authors, country of origin, publication date, methodological approach and scope of all eligible guidelines was collected. Then, six independent reviewers (A.W., F.H., A.H., M.K., E.K., F.T.) evaluated their methodological quality using AGREE II and AGREE-REX as described previously [16]. Using AGREE II, the quality of each of the 23 items was assessed on a 7-point scale ranging from 1 ("strongly disagree") to 7 ("strongly agree") and similarly, the guideline's overall quality was evaluated on a 7-point scale ranging from lowest to highest possible quality. Furthermore, the question "I would recommend this guideline for use" was answered by each reviewer with "yes", "yes, with modifications" or "no". The 9 items supplied by AGREE-REX were also assessed on a 7-point scale ranging from 1 ("lowest quality") to 7 ("highest quality") as described previously [16]. All evaluations using AGREE II and AGREE-REX were blinded towards the other evaluators' assessments and performed independently. The platform "My AGREE Plus" provided by the AGREE consortium on https://www.agreetrust.org/ (last access: 31 May 2020) was used for evaluating the guidelines with the AGREE II instrument, whereas internally piloted data extraction spreadsheets (Microsoft Excel 2010) were used for the evaluation with AGREE-REX.

Analysis
Scores were calculated for each domain according to the instructions provided in the AGREE II and AGREE-REX instrument user manuals [15,17]. Total scores were expressed as percentages ranging from 0% as the worst to 100% as the best possible evaluation for each domain. Mean ± standard deviation (SD) was calculated for descriptive analyses. The Kruskal-Wallis (H) test was used to explore differences among subgroups and Spearman's correlation to examine the relationship between individual domains and items of the instruments. p-values < 0.05 were considered statistically significant. Ratings were grouped in the three categories "strongly agree" (6 and 7 points), "partly agree" (3 to 5 points) and "strongly disagree" (1 and 2 points) and Fleiss' Kappa was calculated in order to assess the interrater agreement of the six reviewers [18]. SPSS Statistics (version 24, IBM Corporation, Armonk, NY, USA) was used for all statistical analyses.
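The percentage scores used throughout the results can be illustrated with the scaled domain score described in the AGREE II user manual: the sum of all reviewers' item ratings in a domain is rescaled between the minimum and maximum possible sums. The following is a minimal sketch under that assumption; the function name and the example ratings are hypothetical, and it is assumed here that AGREE-REX domains are scaled analogously.

```python
from typing import Sequence


def scaled_domain_score(
    ratings: Sequence[Sequence[int]], low: int = 1, high: int = 7
) -> float:
    """Return the scaled domain score in percent.

    ratings[r][i] is the rating given by reviewer r to item i of one domain.
    The obtained sum is rescaled so that the minimum possible sum maps to 0%
    and the maximum possible sum maps to 100%.
    """
    if not ratings or not ratings[0]:
        raise ValueError("at least one reviewer and one item are required")
    n_reviewers = len(ratings)
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every reviewer must rate the same number of items")
    if any(not low <= value <= high for row in ratings for value in row):
        raise ValueError("ratings must lie on the 7-point scale")

    obtained = sum(sum(row) for row in ratings)
    min_possible = low * n_items * n_reviewers
    max_possible = high * n_items * n_reviewers
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


# Hypothetical ratings of six reviewers for a three-item domain
# (illustrative values only, not data from this study).
example_ratings = [
    [7, 6, 6],
    [6, 6, 7],
    [7, 7, 6],
    [5, 6, 6],
    [6, 7, 7],
    [7, 6, 5],
]
example_score = scaled_domain_score(example_ratings)
```

With this scaling, a domain in which every reviewer gives every item the midpoint rating of 4 yields 50%, which is why a mean rating and the corresponding percentage are not directly interchangeable.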

Guideline Identification
We initially identified 2612 records when searching the databases (Figure 1). After the removal of duplicates (n = 126) and title and abstract screening, nine records remained for full-text review. Six of these records were excluded: five because they had already expired (n = 2) [19,20], only dealt with cutaneous squamous cell carcinoma (n = 1) [21], were not evidence-based (n = 1) [22] or did not meet our language eligibility (n = 1) [23], and one further record because it was a review summary and not a guideline [24]. Finally, the following three guidelines met our eligibility criteria and, therefore, were included in our assessment: the guideline of the Canadian Non-Melanoma Skin Cancer Guidelines Committee [4], the guideline of the British Association of Dermatologists from the United Kingdom (UK) [6] and the guideline developed by the Association of the Scientific Medical Societies in Germany (AWMF) and the German Cancer Society (DKG) [5,10,25]. The full-length German guideline is also available in English in the "Supporting Information" section of the short version [10].

Evaluation of the Guidelines
The interrater agreement of the six reviewers regarding AGREE II and AGREE-REX was rated as fair with a Fleiss' Kappa of 0.299 (95% CI 0.263-0.336).
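As an illustration of how such an agreement coefficient is obtained, the following minimal sketch groups 7-point ratings into the three categories named in the Analysis section and computes Fleiss' Kappa from the resulting count table. The study itself used SPSS; the function names and the example ratings below are hypothetical and serve only to show the calculation.

```python
from typing import List, Sequence

# Category order used for the count table (grouping as in the Analysis section).
CATEGORIES = ("strongly disagree", "partly agree", "strongly agree")


def categorize(score: int) -> int:
    """Map a 7-point rating to the index of its category in CATEGORIES."""
    if not 1 <= score <= 7:
        raise ValueError("ratings must lie between 1 and 7")
    if score <= 2:
        return 0  # 1 and 2 points
    if score <= 5:
        return 1  # 3 to 5 points
    return 2      # 6 and 7 points


def count_table(ratings: Sequence[Sequence[int]]) -> List[List[int]]:
    """Build the table of category counts.

    ratings[i] holds the scores that all reviewers gave to one rated subject
    (for example one item of one guideline).
    """
    table = []
    for scores in ratings:
        counts = [0] * len(CATEGORIES)
        for score in scores:
            counts[categorize(score)] += 1
        table.append(counts)
    return table


def fleiss_kappa(table: Sequence[Sequence[int]]) -> float:
    """Fleiss' Kappa for a table where table[i][j] counts raters who put subject i in category j."""
    n_subjects = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("every subject needs the same number (>= 2) of raters")
    n_categories = len(table[0])

    # Share of all assignments that fall into each category.
    p_j = [
        sum(row[j] for row in table) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per subject, then averaged over subjects.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_bar = sum(p_i) / n_subjects
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical scores of six reviewers for three rated subjects.
example_table = count_table([
    [7, 6, 6, 7, 5, 7],
    [2, 1, 3, 2, 2, 1],
    [4, 3, 5, 6, 4, 3],
])
example_kappa = fleiss_kappa(example_table)
```

Collapsing the 7-point scale into three categories before computing Kappa means that, for example, ratings of 3 and 5 count as agreement, whereas ratings of 5 and 6 count as disagreement.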

Scope and Purpose
This domain evaluates whether the main objectives of the guideline and the population for whom it was developed are clearly described. The average score was 5.17 (±1.49, Figure 2). The German guideline achieved the highest score fulfilling 89% of the criteria of this domain. The Canadian guideline was rated lowest with 48% and the UK guideline was rated in between achieving 71%. The German and the Canadian guideline significantly differed from each other (p = 0.01).

Stakeholder Involvement
This domain covers the involvement of appropriate stakeholders and whether the views of the intended users of the guideline are represented. The average score was 4.39 (±1.86). The German guideline achieved a very high value of 92%, while the UK and Canadian guidelines achieved only 37 and 41%, respectively. The German guideline significantly differed from both the Canadian (p = 0.012) and the UK guideline (p = 0.009).

Rigor of Development
The methodological approaches including a systematic and transparent identification of evidence are covered by the items of this domain. The mean score was 4.99 (±1.44). Again, the German guideline was rated as the one with the best methodological quality achieving 89% whereas the UK and the Canadian guidelines were rated worse with 63 and 48%, respectively. The German guideline also significantly differed from the Canadian guideline in this domain (p = 0.006).

Clarity of Presentation
This domain evaluates the presentation of the provided recommendations, including their clarity and whether key recommendations can be easily found in the guideline text at a glance. The mean score was 6.02 (±0.87). Both the German and the UK guideline fulfilled almost all criteria (89 and 91%, respectively), and the Canadian guideline also fulfilled more than two-thirds of the criteria (71%).

Applicability
Processes concerning guideline implementation are evaluated in this domain. The mean score was 4.08 (±1.20). The German and UK guidelines achieved similar results (65 and 58%), while the Canadian guideline achieved the lowest rates fulfilling only 31% of the criteria. The German and Canadian guidelines also significantly differed in this domain (p = 0.004).

Editorial Independence
The role of funding and competing interests of the experts who were involved in the development process is evaluated in this domain. The mean score was 5.64 (±1.23). Again, the German guideline was rated as the best (92%) followed by the UK (78%) and Canadian guideline (63%). Similar to the other domains, the Canadian and the German guidelines differed significantly from each other (p = 0.035).

Overall Assessment
This domain evaluates the overall quality and whether the reviewer would recommend using the guideline in practice. The mean score was 4.79 (±1.12), and the German guideline was rated as the one with the best overall quality (83%). The UK and Canadian guidelines were rated lower with 63 and 47%, respectively. All reviewers recommended using the German guideline without any modifications. The use of the UK guideline was also recommended, but half of the reviewers recommended using it with modifications. In contrast, the ratings regarding the Canadian guideline were ambiguous: two reviewers recommended using it, two recommended using it with modifications and two recommended not using it.

Clinical Applicability
This domain assesses whether the recommendations were developed based on a thorough review of the existing literature and whether they are applicable for the intended users (e.g., physicians, patients). The mean score was 4.89 (±1.20). The fulfilled criteria in this domain ranged from 45% (Canada) and 65% (UK) to 84% (Germany). In this domain, the German and the Canadian guidelines significantly differed from each other (p = 0.002).

Values and Preferences
This domain evaluates whether the preferences of the intended users, patients, policy/decision-makers and guideline developers have been taken into consideration during the guideline development process. The mean score was 3.92 (±1.16). The German guideline achieved 67%, whereas the UK and the Canadian guidelines were rated similarly (46 and 33%, respectively). Here, the German and Canadian guidelines significantly differed again (p = 0.004).

Implementability
This domain asks how suitable the recommendations are for the patients and/or the health care system in which they should be implemented. The mean score of this domain was 4.83 (±1.11). The German guideline achieved the highest rates with 78%. The UK guideline was rated lower with 68%, and the Canadian guideline was rated with the lowest scores (46%). The German and Canadian guidelines also significantly differed in this domain (p = 0.004).
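The between-guideline comparisons reported for the individual domains rest on the Kruskal-Wallis (H) test named in the Analysis section. The following minimal sketch shows only the omnibus test for three groups, with tie correction and the chi-square approximation for two degrees of freedom. It does not reproduce the pairwise p-values given above, because the post hoc procedure applied in SPSS is not specified in the text; all names and the example scores are hypothetical, and with only a few observations per group the approximation is rough.

```python
import math
from collections import Counter
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("at least two non-empty groups are required")
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of squared rank sums, each divided by its group size.
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

    # Correction for tied observations.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0.0:
        raise ValueError("H is undefined when all observations are identical")
    return h / correction


def p_value_three_groups(h: float) -> float:
    """Chi-square approximation, valid only for three groups (2 degrees of freedom)."""
    return math.exp(-h / 2.0)


# Hypothetical domain scores of six reviewers for three guidelines.
example_groups = [
    [6.3, 6.7, 5.7, 6.0, 6.3, 7.0],
    [4.7, 5.3, 5.0, 4.3, 5.7, 5.0],
    [3.3, 4.0, 3.7, 4.3, 3.0, 4.7],
]
example_h = kruskal_wallis_h(example_groups)
example_p = p_value_three_groups(example_h)
```

A rank-based test is a natural choice here because the 7-point ratings are ordinal and each group contains only six reviewers.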

Correlations of the AGREE II and AGREE-REX Domains
Most of the AGREE II and the AGREE-REX domains were significantly positively correlated with each other (Figure 3). The domain "scope and purpose" was highly positively correlated with the domains "rigor of development" (r = 0.84) and "clinical applicability" of the AGREE-REX tool (r = 0.86). Additionally, the domain "stakeholder involvement" was highly correlated with the domains "rigor of development" (r = 0.81) and "values and preferences" (r = 0.83). Furthermore, the domains "clinical applicability" and "implementability" showed a high positive correlation (r = 0.84).
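Spearman's correlation, as used for these domain comparisons, is the Pearson correlation of the rank-transformed scores. The following minimal sketch illustrates the calculation with average ranks for ties; the function names and the paired example scores are hypothetical, and the sketch does not compute the accompanying significance test.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical paired scores for two domains, one pair per reviewer rating.
domain_a = [6.3, 4.7, 3.3, 6.7, 5.3, 4.0]
domain_b = [6.0, 5.0, 3.5, 6.5, 4.5, 4.0]
example_rho = spearman_rho(domain_a, domain_b)
```

Because only ranks enter the calculation, the coefficient captures monotonic rather than strictly linear relationships between domain scores.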

Discussion
AK are one of the most commonly diagnosed conditions in dermatology [26]. Due to the overwhelming number of available treatment options, choosing the most appropriate intervention for each patient can be challenging. In this study, we evaluated currently available guidelines on AK using the appraisal instruments AGREE II and AGREE-REX. The AGREE II tool and its previous version AGREE have already been successfully used in other evaluations in the field of dermatology and guideline development [27,28]. In an evaluation of published guidelines by the European Dermatology Forum (EDF), the assessment with AGREE highlighted that evidence- and consensus-based guidelines ("S3 level") generally received the highest score in comparison to guidelines derived through either a structured consensus process, a systematic literature assessment or informal consensus only [27]. Thus, identifying evidence- and consensus-based guidelines and their methodological strengths and weaknesses is essential for improving the overall quality of national as well as international guidelines that can be used as a template for country-specific adaptations.
Surprisingly, although AK are a very common health problem, especially in fair-skinned patients, and account for a large disease burden from a public health care perspective, we only identified three currently valid evidence- and consensus-based guidelines dealing with this topic that were published within the last 5 years and matched our pre-defined eligibility criteria. The detailed international guideline on AK treatment developed by the International League of Dermatological Societies (ILDS) and published in 2015 was not included in this evaluation, as it had already expired in July 2018 [19].
Developing guidelines and keeping them up-to-date is labor- and cost-intensive. In the field of AK, developers are confronted with a vast number of treatment options including several monotherapies as well as combinations of them. Scanning all the evidence available is time-consuming and difficult. Besides, the quality of the body of evidence varies across interventions, making it difficult to compare the efficacy of different approaches and derive recommendations. However, regular updating is crucial for maintaining the quality of the provided guidance, as seen in the case of ingenol mebutate (IMB), which was approved by the Food and Drug Administration (FDA) and European Medicines Agency (EMA) as a topical intervention for AK in 2012 [29]. In January 2020, the EMA suspended its use, as a post-marketing surveillance study had shown a higher incidence of skin cancer in patients treated with IMB than in those treated with imiquimod. All evaluated guidelines recommended the use of IMB, but only the German guideline was amended in March 2020 with a footnote stating that the use of IMB is no longer recommended [5]. This example underlines that continuous updating of guidelines is indispensable for state-of-the-art patient care.
The German guideline achieved the highest scores in all domains of AGREE II and AGREE-REX ranging from 65% (applicability) to 92% (stakeholder involvement and editorial independence). On the other hand, the Canadian guideline was rated as the guideline with the poorest methodological quality in our appraisal with scores ranging from 31% (applicability) to 71% (clarity of presentation). Overall, the domain "clarity of presentation" was rated best among all evaluated domains ranging from 71% (Canadian guideline) to 91% (German guideline), indicating that the recommendations provided by the guidelines are unambiguous and clear and can be easily found in the guideline texts. In contrast, the domains "applicability" and "values and preferences" achieved the lowest scores ranging from 31 to 65% and 33 to 67%, respectively. Interestingly, these two domains also achieved only low scores in a recent appraisal of currently available evidence- and consensus-based melanoma guidelines [30], indicating that guideline developers may not pay sufficient attention to these domains in general and tend to neglect them. As the German evidence- and consensus-based guideline is being updated at the moment [10,25], it is of utmost importance to address this weakness in the update.
Major differences between the guidelines were observed in the evaluation of the domain "stakeholder involvement". Here, the German guideline achieved a good result of 92%, whereas both the UK and the Canadian guideline were rated worse, achieving only 37 and 41%, respectively. The lack of participation of important target groups may severely hamper the implementation of the recommendations in the real-world setting. Thus, when updating these guidelines, developers should particularly focus on this part to improve the overall quality in the future. Patient representatives in particular should be involved in the development of guidelines and might be actively approached through patient support groups.
Overall, the German guideline achieved the highest scores in all domains in both instruments. This might be due to the fact that the AWMF and DKG, which guide the process of oncological guideline development in Germany, provide not only support but also a solid methodological framework of rules to which the guideline authors have to adhere. Besides, the German guideline provided by far the most detailed Supplementary Materials, which facilitated the identification of relevant content for the appraisal. According to the UK guideline [6], the AGREE II instrument also served as a guide for its development. This might explain why the UK guideline achieved better results compared to the Canadian guideline, although both guideline texts are similarly short.
This study has several limitations. We only evaluated the methodological quality of the guidelines, but not the medical content of the recommendations themselves. This might be problematic as seen in the abovementioned case of IMB. Furthermore, we cannot fully exclude that the six reviewers may have been biased as all of them are from Germany. Additionally, three members of this research team (T.S., C.B., M.V.H.) were at least partly involved in the development process of the evaluated German guideline. However, these three were not part of the appraisal team and did not evaluate the quality of any guideline. Finally, the language restrictions to English and German may have led to the exclusion of relevant guidelines and may have introduced a risk of selection bias.

Conclusions
Taken together, we identified three currently available guidelines on AK treatment that were published within the last five years. Two of them showed substantial methodological weaknesses. Only the German guideline, which was rated as the best in this evaluation, fulfilled most of the evaluated criteria and, therefore, may be used as a role model for developing or updating future guidelines. Special attention should be paid to the domains "applicability" and "values and preferences", which achieved low scores in all three guidelines.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.
Conflicts of Interest: C.B. reports personal fees from Almirall-Hermal, personal fees from Galderma, grants and personal fees from Leo Pharma and grants from Biofrontera outside the submitted work. M.V.H. reports grants from German Cancer Aid (Deutsche Krebshilfe) outside the submitted work. T.S., C.B. and M.V.H. were involved in the development of the German S3 guideline "Actinic keratosis and cutaneous squamous cell carcinoma" but did not participate in the AGREE assessments. The remaining authors have no conflicts of interest to declare.