Structured Reporting of Lung Cancer Staging: A Consensus Proposal

Background: Structured reporting (SR) in radiology is becoming necessary and has recently been recognized by major scientific societies. This study aimed to build CT-based structured reports for lung cancer during the staging phase, in order to improve communication between radiologists, members of the multidisciplinary team and patients. Materials and Methods: A panel of expert radiologists, members of the Italian Society of Medical and Interventional Radiology, was established. A modified Delphi exercise was used to build the structured report and to assess the level of agreement for all the report sections. The Cronbach’s alpha (Cα) correlation coefficient was used to assess internal consistency for each section and to perform a quality analysis according to the average inter-item correlation. Results: The final SR version was built by including 16 items in the “Patient Clinical Data” section, 4 items in the “Clinical Evaluation” section, 8 items in the “Exam Technique” section, 22 items in the “Report” section, and 5 items in the “Conclusion” section. Overall, 55 items were included in the final version of the SR. The overall mean of the scores of the experts and the sum of scores for the structured report were 4.5 (range 1–5) and 631 (mean value 67.54, STD 7.53), respectively, in the first round. The items of the structured report with the highest agreement in the first round were primary lesion features, lymph nodes, metastasis and conclusions. The overall mean of the scores of the experts and the sum of scores for staging in the structured report were 4.7 (range 4–5) and 807 (mean value 70.11, STD 4.81), respectively, in the second round. The Cronbach’s alpha (Cα) correlation coefficient was 0.89 in the first round and 0.92 in the second round for staging in the structured report.
Conclusions: The wide implementation of SR is critical for providing referring physicians and patients with the best quality of service, and for providing researchers with the best quality of data in the context of the big data exploitation of the available clinical data. Implementation is complex, requiring mature technology to successfully address pending user-friendliness, organizational and interoperability challenges.


Introduction
The American Recovery and Reinvestment Act and the Health Information Technology for Economic and Clinical Health Act have indicated that structuring data in health records will lead to an important improvement in patient outcomes [1,2]. Since the radiology report is part of the health record, the current format of free-text reporting (FTR) should be organized and shifted toward structured reporting (SR). The issue of whether all radiological examinations should contain a structured report, and if so, what the actual report structure should be, remains open [1][2][3]. According to the European Society of Radiology's (ESR) paper on SR in radiology [1], the three main reasons for moving from FTR to SR are quality, data quantification and accessibility. A critical quality improvement dimension resulting from the use of SR is standardization. The use of templates in SR provides a checklist as to whether all relevant items for a particular examination have been addressed. Thanks to this "structure", the radiology report will also allow the association of radiological data and other key clinical features, leading to a precise diagnosis and personalized medicine. With regard to accessibility, radiology reports are a rich source of data for research. Structuring them allows automated data mining, which may help to validate the relevance of imaging biomarkers by highlighting the clinical contexts in which they are most appropriate, and to devise potential new application domains. For this reason, radiology reports should be structured in their content, based on standard terminology, and should be accessible via standard access mechanisms and protocols.
Weiss et al. have described three levels of SR [4]:
1. The first level is a structured format with paragraphs and subheadings. Currently, almost all radiology reports display this structure, with sections for clinical information, examination protocol and radiological findings, and a conclusion to highlight the most important findings.
2. The second level refers to consistent organization. For example, a rectal cancer magnetic resonance imaging (MRI) report describes all relevant features, such as tumor (T) stage, node (N) stage, anal sphincter complex involvement, tumor deposits in the mesorectal space, extramural vascular invasion, etc.
3. The third level directly addresses the consistent use of dedicated terminology, namely, standard language.
Several proposals have been made by major international radiology societies to support the use of SR [5][6][7][8][9][10]. The Italian Society of Medical and Interventional Radiology (SIRM) has created an Italian warehouse of SR templates, which can be freely accessed by all SIRM members, for routine use in a clinical setting [11].
Despite these promising developments, SR has not yet been established in clinical routine. A survey of Italian radiologists found that the majority of those surveyed had heard of SR, but only a minority of them regularly used it in their clinical work [10]. Reasons for this include the current lack of usable templates and the minimal availability of software solutions for SR [10].
Lung cancer is the leading cause of cancer morbidity and mortality in men, whereas in women, it ranks third for incidence after breast and colorectal cancer, and second for mortality after breast cancer [12]. The incidence and mortality rates are roughly twice as high in men as in women, although the male-to-female ratio varies widely across regions. Lung cancer incidence and mortality rates are 3 to 4 times higher in transitioned countries than in transitioning countries; this pattern may well change as the tobacco epidemic evolves, given that 80% of smokers aged 15 years or older resided in low-income and middle-income countries in 2016 [12,13]. In the absence of symptoms to identify early lung cancer, screening high-risk individuals has the potential to shift the diagnosis to earlier stages [14][15][16][17][18][19][20]. After more than 30 years of research, a large randomized controlled trial established that low-dose computed tomography (CT) screening reduced mortality in patients at high risk for lung cancer. Subsequently, the majority of professional societies have emphasized the importance of lung cancer screening. Although lung cancer screening is not unanimously recommended, the value of identifying early-stage lung cancer cannot be overstated. The majority of new cases of lung cancer present in advanced stages (III-IV), when a cure is unlikely or unattainable [21].
In this context, a disease-specific SR could be an effective tool for conveying all diagnostic imaging information needed for a correct lung cancer diagnosis and staging, while including clinical information required for personalized patient management.
The aim of the present study was to propose an SR template that can guide radiologists in the systematic reporting of CT examinations for lung cancer staging, in order to improve communication between radiologists and clinicians, particularly in non-referral centers.

Expert Panel
As a result of critical discussion between expert radiologists, a multi-round consensus-building Delphi exercise was carried out to develop a comprehensive and focused SR template for CT staging of patients with lung cancer.
A SIRM radiologist expert in thoracic imaging created the first draft of the SR template for lung cancer staging CT examinations.
A working team of 13 experts from the Italian College of Thoracic Radiologists and of Diagnostic Imaging in Oncology Radiologists from SIRM was established to iteratively revise the initial draft, with the aim of reaching a final consensus on SR.

Selection of the Delphi Domains and Items
All the experts reviewed the literature in the main scientific databases (including PubMed, Scopus and Google Scholar) from December 2000 to December 2020, in order to identify papers on lung cancer CT and radiological SR. The full text of the selected studies was reviewed by all members of the expert panel, and each of them developed and shared a list of Delphi items via email and/or teleconference.
The SR was divided into five sections: (1) Patient Clinical Data, (2) Clinical Evaluation, (3) Exam Technique, (4) Report and (5) Conclusion. A dedicated section for key images was added as part of the report.

1. The "Patient Clinical Data" section included patient clinical data and previous or family history of malignancies, including previous lung cancer, risk factors or predisposing pathologies. In this section, the item of "Allergies" to drugs and contrast medium was included.
2. The "Clinical Evaluation" section included previous examination results, a genetic panel and clinical symptoms.
3. The "Exam Technique" section included data regarding the CT equipment used (including the number of detector rows and whether single or dual energy scans were performed) and information concerning reconstruction algorithm(s) and slice thickness. Data on the contrast protocol were also collected (including information regarding post-contrast acquisitions), as well as data concerning the contrast medium (such as contrast active principle, commercial name, volume, flow rate, iodine concentration, and ongoing adverse events).
4. The "Report" section included data regarding lung cancer location, morphology, margin sharpness, texture (e.g., solid, ground glass), contrast enhancement pattern, size, local invasion, tumor stage, node stage and metastatic stage, according to the Italian Association of Medical Oncology (AIOM) guidelines [22]. In this section, a dedicated subsection for other types of primary lung cancers was included.
5. The "Conclusion" section included diagnosis, TNM stage according to the 8th Edition of AJCC-UICC 2017 [23], annotations and comments.
Two Delphi rounds were carried out [24]. During the first round, each panelist independently contributed to refining the SR draft by means of online meetings or email exchanges. The level of panelist agreement for each SR model was tested in the second Delphi round, using a Google Form questionnaire shared by email. Each expert individually rated each specific template part (i.e., patient clinical data, clinical evaluation, exam technique, report and conclusion, images) on a five-point Likert scale (1 = strongly disagree, 2 = slightly disagree, 3 = slightly agree, 4 = modestly agree, 5 = strongly agree).
After the second Delphi round, the final version of the SR was generated on the dedicated Radiological Society of North America (RSNA) website (radreport.org), using a T-Rex template format in line with the IHE (Integrating the Healthcare Enterprise) and MRRT (Management of Radiology Report Templates) profiles, accessible as open-source software, with the technical support of Exprivia™. These profiles determine both the format of the radiology report templates (version 5 of the Hypertext Markup Language (HTML5)) and the transport mechanism used to query, retrieve and store these templates [25]. The radiology report was structured using a series of "codified queries" integrated into the T-Rex editor's preselected sections [25].
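The idea of an HTML5 template built from coded fields with a fixed set of admitted values can be illustrated with a short Python sketch. This is only a generic HTML5 skeleton, not a conformant MRRT or T-Rex template: the function names, the page title and the choice of fields are assumptions made for illustration, with the option values taken from the examples mentioned in the text (single or dual energy; solid or ground glass).

```python
from html import escape


def render_select(name, options):
    """Render one coded field as an HTML5 <select> with its admitted values."""
    opts = "".join(
        f'<option value="{escape(o)}">{escape(o)}</option>' for o in options
    )
    return f'<label>{escape(name)} <select name="{escape(name)}">{opts}</select></label>'


def render_template(title, sections):
    """Render a dict {section name: {field name: [admitted values]}} as HTML5."""
    parts = [
        "<!DOCTYPE html>",
        '<html lang="en">',
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>',
        "<body>",
    ]
    for section_name, fields in sections.items():
        parts.append(f"<section><h2>{escape(section_name)}</h2>")
        for field_name, options in fields.items():
            parts.append(render_select(field_name, options))
        parts.append("</section>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)


# Hypothetical excerpt of two sections; the real template has 55 items.
sections = {
    "Exam Technique": {"CT modality": ["single energy", "dual energy"]},
    "Report": {"Texture": ["solid", "ground glass"]},
}
html_doc = render_template("Lung cancer staging CT", sections)
```

Restricting each field to a closed list of values is what makes the resulting report machine-readable; a real MRRT template additionally carries coded metadata that this sketch omits.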

Statistical Analysis
All ratings of the panelists for each section were analyzed using descriptive statistics measuring the mean score, the standard deviation value (STD) and the sum of scores. A mean score of 3 was considered good and a score ≥4 excellent.
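As an illustration of these descriptive statistics, the following Python sketch computes the mean, standard deviation and sum of a set of Likert ratings and applies the rating labels given above. The ratings are hypothetical, the function name is an assumption, and the sketch assumes the sample standard deviation (n − 1 denominator) and that a mean between 3 and 4 counts as "good"; the study itself does not state these details.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return mean, sample STD, sum and a rating label for Likert scores (1-5)."""
    m = mean(scores)
    if m >= 4:
        label = "excellent"
    elif m >= 3:
        label = "good"  # assumption: any mean from 3 up to, but excluding, 4
    else:
        label = "below good"
    return {"mean": m, "std": stdev(scores), "sum": sum(scores), "label": label}


# Hypothetical ratings from nine panelists for one section of the template.
example_scores = [5, 4, 5, 4, 5, 3, 5, 4, 5]
summary = summarize(example_scores)
```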
To measure the internal consistency of the panelist ratings for each section of the report, a quality analysis based on the average inter-item correlation was carried out using the Cronbach's alpha (Cα) correlation coefficient [26,27]. The Cα test provides a measure of the internal consistency of a test or scale; it is expressed as a number between 0 and 1. Internal consistency describes the extent to which all the items in a test measure the same concept. The Cα correlation coefficient was determined after each round.
The closer to 1.0 the Cα coefficient, the greater the internal consistency of the items in the scale. An alpha coefficient (α) > 0.9 was considered excellent, α > 0.8 good, α > 0.7 acceptable, α > 0.6 questionable, α > 0.5 poor, and α < 0.5 unacceptable. However, in the iterations, an α of 0.8 was considered to be a reasonable goal for internal reliability.
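For illustration, Cronbach's alpha for k items can be computed as α = k/(k − 1) · (1 − Σ s_i² / s_T²), where s_i² is the variance of item i and s_T² is the variance of the per-respondent total score. The Python sketch below implements this standard formula and the interpretation bands listed above. It is not the authors' MATLAB analysis: the function names and the example ratings are hypothetical, and sample variances are assumed.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table of ratings.

    ratings: list of rows, one per respondent; each row has one score per item.
    """
    n_items = len(ratings[0])
    if n_items < 2 or len(ratings) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in ratings]) for j in range(n_items)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def interpret_alpha(alpha):
    """Map alpha to the qualitative bands used in the text."""
    bands = [(0.9, "excellent"), (0.8, "good"), (0.7, "acceptable"),
             (0.6, "questionable"), (0.5, "poor")]
    for threshold, label in bands:
        if alpha > threshold:
            return label
    return "unacceptable"  # the text leaves alpha == 0.5 unspecified


# Hypothetical ratings: three respondents scoring two items.
example = [[4, 5], [5, 4], [3, 3]]
alpha = cronbach_alpha(example)
label = interpret_alpha(alpha)
```

Applied to the coefficients reported in this study, `interpret_alpha(0.89)` falls in the "good" band and `interpret_alpha(0.92)` in the "excellent" band.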
The data analysis was carried out using the Statistics Toolbox of MATLAB (The MathWorks, Inc., Natick, MA, USA).

Structured Report
The final SR (Appendix A) version was built by including 16 items in the "Patient Clinical Data" section, 4 items in the "Clinical Evaluation" section, 8 items in the "Exam Technique" section, 22 items in the "Report" section, and 5 items in the "Conclusion" section. Overall, 55 items were included in the final version of the SR. In Appendix B, the first draft of the SR is illustrated.
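The composition of the final template can be written down as a simple data structure, which also makes the item count easy to check. The Python sketch below uses the section names and item counts reported above; the variable names are assumptions made for illustration.

```python
# Number of items per section in the final SR, as reported in the text.
SR_SECTIONS = {
    "Patient Clinical Data": 16,
    "Clinical Evaluation": 4,
    "Exam Technique": 8,
    "Report": 22,
    "Conclusion": 5,
}

# Total number of items across all sections.
total_items = sum(SR_SECTIONS.values())

# Share of the template taken up by each section, as a fraction of all items.
section_share = {name: count / total_items for name, count in SR_SECTIONS.items()}
```

The total evaluates to 55, matching the overall item count stated for the final version.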
The results obtained during the first Delphi round are reported in Table 1, and those obtained after the second Delphi round in Table 2.
In the final version of the SR, the following parameters were included:
1. In the "Exam Technique" section, the equipment used, the number of detector rows and CT modality (i.e., single or dual energy), the reconstruction algorithm(s) used and contrast protocol;
2. In the "Report" section, the sites and the features of extrathoracic metastases were defined, identifying the target lesions in accordance with the Response Evaluation Criteria in Solid Tumors (RECIST) 1.1 [28].

Consensus Agreement
Tables 1 and 2 show the single scores and the sums of scores of the panelists for staging with the SR in the first and second rounds, respectively.
In both the first and the second rounds, as reported in Tables 1 and 2, all sections received more than a good rating.
The overall mean score of the experts (13 experts) and the sum of scores for staging with the SR were 4.5 (range 1-5) and 631 (mean value 67.54, STD 7.53), respectively, in the first round (Table 1). The items of the SR with the highest agreement in the first round were primary lesion features, lymph nodes, metastases and conclusions (Table 1).
The overall mean score of the experts (nine experts) and the sum of scores for staging with the SR were 4.7 (range 4-5) and 807 (mean value 70.11, STD 4.81), respectively, in the second round (Table 2).
The overall mean score of the experts in the second round was higher than that in the first round, with a lower standard deviation, demonstrating the higher agreement on the SR reached among the experts in this round. The items of the SR with the highest "each reader" agreement in the second round were exam data and pulmonary involvement in multiple sites (Table 2). The Cronbach's alpha (Cα) correlation coefficient was 0.89 in the first round and 0.92 in the second round for staging with the SR.

Discussion
In the present study, the panel of experts demonstrated a high degree of agreement in defining the different items of the SR. After the second Delphi round, the panelists' mean score and the sum of scores related to the SR models were 4.7 (range 4-5) and 807 (mean value 70.11, STD 4.81), respectively. All sections received more than a good rating in the second Delphi round; however, the weakest sections were "Patient Clinical Data" and "Clinical Evaluation". Moreover, the Cα correlation coefficient reached 0.92 in the second round.
The present SR is based on a multi-round consensus-building Delphi exercise performed to develop a comprehensive and focused SR template for CT-based lung cancer staging, as a result of critical discussion between expert radiologists in thoracic and oncological imaging. This SR is based on standardized terminology and structure, which are required for adherence to diagnostic-therapeutic recommendations and for enrolment in clinical trials, thus reducing the ambiguity that may arise from non-conventional language, and enabling better communication between radiologists and clinicians [29][30][31][32][33]. Therefore, according to the classification of Weiss et al. [4], the present report is a third-level SR.
Several sections are included in the present template: "Patient Clinical Data", "Clinical Evaluation", "Exam Technique", "Report" and "Conclusion". Some points should be evaluated for each of these sections.
Regarding "Patient Clinical Data", this section included data regarding personal or family history of cancer, exposure to different risk factors and any genetic mutations. Regarding predisposing diseases, collecting data on Chronic Obstructive Pulmonary Disease (COPD) can help in planning treatment. COPD is generally defined as a chronic, minimally reversible airflow obstruction based on spirometry (post-bronchodilator forced expiratory volume in 1 second (FEV1)/forced vital capacity (FVC) less than 70%).
COPD and lung cancer share common features, including their high mortality and common risk factors (such as smoking), some genetic background, environmental exposures, and underlying common inflammatory processes. A ratio of FEV1 to FVC less than 0.7 is generally used to define airflow obstruction; however, other indices (such as FEV1/FVC under the lower limit of normal criteria, and a predicted reduction of FEV1%) have also been considered indicative of airway obstruction. In addition to these three main factors, the timing of COPD diagnosis, the degree of airflow obstruction, and the severity of emphysema have also been reported to exert a remarkable effect on the significance of the impact of COPD and/or emphysema on lung cancer risk. Although, at present, no solid evidence is available to clearly distinguish the roles of airflow obstruction and emphysema in lung cancer development, it is certain that the highest lung cancer risk occurs when airflow obstruction and emphysema coexist [12][13][14].
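The fixed-ratio spirometric criterion mentioned above (FEV1/FVC below 0.7) is simple enough to express in a few lines. The Python sketch below is an illustration only: the function name and parameter names are hypothetical, both volumes are assumed to be post-bronchodilator values in liters, and the alternative indices cited in the text (such as the lower limit of normal) are not implemented.

```python
def has_airflow_obstruction(fev1_l, fvc_l, threshold=0.70):
    """Return True when the FEV1/FVC ratio is below the fixed threshold.

    fev1_l: forced expiratory volume in 1 second, in liters.
    fvc_l: forced vital capacity, in liters.
    """
    if fev1_l <= 0 or fvc_l <= 0:
        raise ValueError("FEV1 and FVC must be positive")
    return fev1_l / fvc_l < threshold


# Hypothetical spirometry values.
obstructed = has_airflow_obstruction(2.0, 4.0)      # ratio 0.50
not_obstructed = has_airflow_obstruction(3.2, 4.0)  # ratio 0.80
```

In a structured report, a derived flag of this kind could accompany the recorded FEV1 and FVC values so that the COPD item is filled in consistently across centers.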
Such a painstaking process of data collection was subject to some disagreement among the panelists, some of whom felt that it could slow down the normal workflow and would not be easy to use. However, all SR sections are independent of each other, so the "Patient Clinical Data" and "Clinical Evaluation" sections are optional and may be filled in or not at the user's discretion, although they were conceived with the aim of creating databases. In fact, collecting all these data could allow the creation of a large database, not only for epidemiological studies but also, in a more ambitious view of radiology, to lay the foundations for radiomics studies [34][35][36][37]. Radiology reports should be rich in data that could potentially be pooled, analyzed and correlated with patient outcomes, thereby assisting future clinical and imaging guidelines. However, the use of non-standardized terminology limits the capacity for data collection across multiple institutions. In addition, the lack of consistent data extractable from SR could hinder the development of computerized applications to assist in reporting. Natural language processing applications can help extract the data from reports with variable terminology, allowing the compilation of standardized data, which could then be used to develop multi-institutional data registries, as well as in clinical and research analyses. Moreover, the possibility of combining genomic data and radiological features allows the development of radiogenomics models, which today represent the highest level of advanced precision medicine [38][39][40][41]. A further added value is that the present SR can be included in the picture archiving and communication system (PACS), so these data need to be entered only once, upon first entry into the radiology department.
With regard to the "Exam Technique" section, sharing the examination technique not only within one's own department, but also with the radiology departments of other centers, fulfills a dual purpose. On the one hand, it enables the standardization of CT protocols; on the other hand, it allows carrying out diagnostic accuracy studies among different centers in order to optimize CT protocols. For example, during follow-up, differences in CT acquisition parameters and segmentation algorithms are important factors that can lead to variability in volumetric measurements. Therefore, slice thickness and other protocol-related factors (such as the reconstruction kernel and field of view) should be kept constant for reliable measurements to be carried out. Although some software packages allow the customization of options (which changes density thresholds for segmentation), standardized parameters should exist between practices in order to keep these parameters homogenous and comparable. In the CT protocol optimization step, enhanced communication among different centers could theoretically lead to quality improvement by means of enhanced patient safety (e.g., by radiation dose reduction), contrast optimization, and image quality. With improved communication comes the sharing of knowledge and experience, along with the potential of reducing medical errors and improving clinical outcomes [42].
Some authors have reported that the use of a checklist could improve diagnostic accuracy [43][44][45]. In 2014, based on the results of several screening trials, the American College of Radiology (ACR) released version 1.0 of the Lung CT Screening Reporting and Data System (Lung-RADS) [46]. This is a standardized method of reporting with recommendations for the management of pulmonary nodules detected on CT for lung cancer screening. When utilized, it can reduce the false positive rate in lung cancer screening without increasing the rate of false negatives [47,48]. Lung-RADS is now deeply embedded as a quality metric on which regulation and reimbursement are determined by the Centers for Medicare and Medicaid Services [49,50]. During the first 5 years of nationwide lung cancer screening, there was a significant accumulation of data and experience, with many opportunities for continued learning [49,50].
The present "Report" section was designed to report all the structural characteristics of the lesions, such as margins and density, as well as relationships with locoregional structures (e.g., the pleura), which allow correct staging, but could also impact the choice of a more suitable therapeutic treatment based on the individual patient. The advantages of SR over FTR include its standardized terminology and structure, aspects required for adherence to diagnostic-therapeutic recommendations and for enrolment in clinical trials. SR reduces the ambiguity that may arise from non-conventional language. However, it should be noted that SR templates usually include a free text box for reporting any additional data that cannot be embedded in the default template fields.
The wide implementation of SR is critical for providing referring physicians and patients with the best quality of service, and for providing researchers with the best quality of data in the context of the big data exploitation of available clinical information [51][52][53][54]. Implementation is complex, requiring mature technology to successfully address pending user-friendliness, organizational and interoperability challenges (with particular regard to the adequate storage of data, and easy and adequate connections with PACS and post-processing software). Consequently, the introduction of SR should be seen as a comprehensive effort, affecting all domains of radiology [55][56][57][58].
Despite the promising results obtained, this study has some limitations. First, the panelists were all radiologists; therefore, a multidisciplinary approach is lacking. A multidisciplinary validation of SR would have been more appropriate. Second, the panelists were of the same nationality; contributions from experts from multiple countries would allow for broader sharing, and would increase the consistency of the SR. Finally, this study was not aimed at assessing the impact of SR on the management of patients with lung cancer. This issue will be discussed in forthcoming studies.

Conclusions
The wide implementation of SR is a critical point for providing referring physicians and patients with the best quality of service, and for providing researchers with the best quality of data in the context of the big data exploitation of the available clinical information. Implementation is complex, requiring mature technology to successfully address pending user-friendliness, organizational and interoperability challenges (specifically, the adequate storage of data, and the easy and adequate connection with PACS and post-processing software). Consequently, the introduction of SR should be seen as a comprehensive effort, affecting all domains of radiology.
The authors have no conflict of interest to disclose. The authors confirm that the article is not under consideration for publication elsewhere. Each author participated sufficiently to take public responsibility for the content of the manuscript.

FIELD | DETAIL | ADMITTED VALUES
Significant key images | | [Images]