The Strategies to Homogenize PET/CT Metrics: The Case of Onco-Haematological Clinical Trials

Positron emission tomography (PET) has long been widely used in oncology for staging lymphomas. Recently, several large clinical trials demonstrated its utility for guiding therapy during treatment, paving the way to personalized medicine. In doing so, the traditional way of reporting PET, based on the extent of disease, has been complemented by a discrete scale that takes tumour metabolism into account. However, because of several technical, physical and biological limitations in the use of PET uptake as a biomarker, stringent rules have been applied in clinical trials to reduce errors in its evaluation. In this manuscript we briefly describe the evolution of PET reporting, examine the main sources of error in uptake measurement, and analyse the strategies that clinical trials have applied to reduce them.


Introduction
Cancer is now the second most common cause of death, immediately after acute cardiovascular events. For many common cancers, treatment of disseminated disease is often non-curative, toxic, and costly. Because tumour response to antineoplastic treatment varies widely across cancers, static and functional imaging plays an essential role in assessing treatment efficacy.
PET/CT has long been a fundamental tool for staging and restaging lymphoma [3,4], and its use is now standardized in the so-called Lugano criteria [5]. Recently, several clinical trials demonstrated that PET/CT can also guide treatment while it is under way, with the ongoing therapy changed on the basis of its results [6,7]. Research in this field is still ongoing and requires considerable effort, but PET/CT became a true tool for personalized medicine when its results started to be reported on a discrete scale [8,9]. Until then, PET/CT had been interpreted in a binary way, by identifying areas of high uptake and comparing them with the surrounding background [10]. However, if PET/CT is used for therapy assessment by identifying changes in tumour uptake, a discrete scale in which the tumour lesion is compared with reference tissues and organs is needed. Furthermore, quantitative assessment of uptake in PET/CT scans has been introduced in the literature [11], upgrading the discrete scale to a fully continuous scale: a PET/CT biomarker able to detect any variation in uptake. Standardized uptake value (SUV), metabolic tumour volume (MTV) and total glycolytic volume (TGV) have been proposed as gauges of tumour burden and aggressiveness [12] and are valid candidates for this scale.

Continuous Scale
The areas of uptake, once identified, are measured by assigning a continuous variable, which gives the magnitude of the uptake, together with a unit describing the physical quantity being measured. The PET-derived metrics SUV, MTV and TGV are described in the following subsections, along with their strengths and limitations.

Standardized Uptake Value
SUV is the semi-quantitative PET metric most frequently used for measuring tumour glucose metabolism. It is defined as the ratio of the decay-corrected activity concentration in the tissue ($[A_{\mathrm{tissue}}]$) to the injected activity ($A$) normalized to the patient's body weight (BW):
$$\mathrm{SUV} = \frac{[A_{\mathrm{tissue}}]}{A / \mathrm{BW}}$$
To account for the different bio-distribution of FDG in tissue and fat, SUV corrected for lean body mass (SUL) has been introduced recently [9]; it is obtained by replacing BW in the definition above with the patient's lean body mass. Although SUL has not yet been used extensively in clinical practice, the general advice given by some guidelines [21,22] is to collect SUL along with SUV data to further understand its relevance both in clinical practice and in experimental settings.
Semi-quantitative metrics are calculated inside a volume of interest (VOI) that is drawn on the PET images and is representative of the tissue being analysed. Depending on the VOI delineation, a few simple SUV-based metrics are defined:
• SUVmax: the maximum value within the VOI. This index is simple to measure and provides information about the most active tumour foci. Its drawback is a strong dependence on image noise, because it corresponds to a single-voxel measurement.
• SUVmean: the average of all values within the VOI. This metric evaluates the mean metabolic activity of the tumour. It is much less vulnerable to image noise but depends heavily on the delineation method used for drawing the VOI [23].
• SUVpeak: the average activity within a 1 cm³ VOI placed in the hottest part of the tumour volume [22,23]. The rationale is to have an index associated with the hottest part of the tumour but measured in a standard volume, and hence less dependent on noise.
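As an illustration of the SUV definition and of the VOI-based metrics above, the following minimal sketch computes a body-weight SUV and the SUVmax and SUVmean of a VOI. The function names, the units (kBq/mL, MBq, kg) and the 1 g/mL tissue density are assumptions made for this example; decay correction is applied here by decaying the injected activity to the scan time using the physical half-life of 18F. SUVpeak is omitted because it needs the three-dimensional voxel geometry.

```python
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def decayed_activity(activity_mbq, elapsed_min, half_life_min=F18_HALF_LIFE_MIN):
    """Activity left after `elapsed_min` minutes of physical decay."""
    return activity_mbq * 0.5 ** (elapsed_min / half_life_min)


def suv_bw(tissue_kbq_per_ml, injected_mbq, weight_kg, elapsed_min=0.0):
    """Body-weight SUV, assuming a tissue density of 1 g/mL (unitless result)."""
    injected_kbq = decayed_activity(injected_mbq, elapsed_min) * 1000.0
    weight_g = weight_kg * 1000.0
    return tissue_kbq_per_ml / (injected_kbq / weight_g)


def suv_max(voi_values):
    """Hottest single voxel in the VOI (sensitive to image noise)."""
    return max(voi_values)


def suv_mean(voi_values):
    """Average over every voxel in the VOI (sensitive to VOI delineation)."""
    return sum(voi_values) / len(voi_values)
```

Under these assumptions, a tissue concentration of 5 kBq/mL with 350 MBq injected into a 70 kg patient gives an SUV of 1 when no decay is applied; the same measurement taken one half-life after injection gives an SUV of 2.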

Metabolic Volumes
Recently, two other metrics have been defined to characterize tumour burden: metabolic tumour volume (MTV) and total glycolytic volume (TGV), also known as total lesion glycolysis (TLG).
MTV measures the total volume of metabolically active tumour included within a VOI and is expressed in cm³ or mL. This index evaluates the extent of disease; to define the tumour volume accurately, it relies on the assumption that the metabolic activity of the tumour is higher than that of the surrounding healthy tissue. MTV is less affected by noise since it includes hundreds or thousands of voxels.
TGV is defined as the product of MTV and the mean uptake within it:
$$\mathrm{TGV} = \mathrm{MTV} \times \mathrm{SUV}_{\mathrm{mean}}$$
The rationale is to combine tumour burden (volume) with its metabolic activity (uptake). This metric evaluates tumour aggressiveness.
Metabolic tumour volume measurements are of crucial importance in pursuing a quantitative approach to PET; at present, however, MTV and TGV lack the three qualities of precision, accuracy and repeatability that are required of a good prognostic index.
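The two volumetric metrics can be sketched in a few lines, assuming the usual definition of TGV as MTV multiplied by the mean SUV inside it. The segmentation rule used here, a fixed fraction of SUVmax, is only one of several possible delineation methods, and the default of 41% is an illustrative assumption rather than a prescription of this review; the function name and arguments are likewise hypothetical.

```python
def metabolic_volumes(voxel_suvs, voxel_volume_ml, threshold_fraction=0.41):
    """Return (MTV in mL, TGV) for the voxels at or above a fraction of SUVmax.

    The fixed-fraction threshold is an assumed, illustrative delineation rule.
    """
    cutoff = threshold_fraction * max(voxel_suvs)
    active = [v for v in voxel_suvs if v >= cutoff]
    mtv_ml = len(active) * voxel_volume_ml   # volume of metabolically active tumour
    suv_mean = sum(active) / len(active)     # mean uptake inside that volume
    return mtv_ml, mtv_ml * suv_mean         # TGV = MTV x SUVmean
```

Changing `threshold_fraction` changes both outputs, which is one way to see why these metrics depend so strongly on the delineation method.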

Kinetic Modelling
Kinetic modelling, which describes the delivery, retention and utilization of glucose, is the most accurate metric but is difficult to perform, since it requires long dynamic PET scans, personnel with specific expertise and dedicated analysis software. It is usually used in single-centre trials for pharmacodynamic studies, in phase I-II studies or with novel non-FDG tracers.

Sources of Errors in Uptake Evaluation
The sources of error in uptake evaluation can be divided into three groups, according to whether they depend on the PET/CT scanner, on the site procedure for patient preparation and acquisition, or on the patient. Recommendations have been released by the US and European nuclear medicine associations [21,22] in order to provide minimal standards. The general prescription is to reach a high level of standardization: this is essential to guarantee a reliable comparison of PET/CT metrics acquired at different time points (intra-patient) and between different patients (inter-patient), either at a single site or across multiple sites [24,25]. In the next paragraphs we briefly describe the major sources of error in uptake evaluation that should be taken into account when planning PET/CT-oriented clinical trials. We refer the reader to the aforementioned guidelines and to specific reviews [26,27] for a detailed description of all the factors affecting uptake evaluation.

Scanner-Related Factors
These factors depend on the equipment used by the site: the PET/CT scanner used to acquire the images of the patient and the dose calibrator used to measure the FDG activity. Within a clinical trial, the scanner-related factors are usually addressed by a central imaging core laboratory.

Cross Calibration of PET/CT Scanners
Cross-calibration of PET scanners and dose calibrators is mandatory to minimize uptake variability. Cross-calibration consists of acquiring, as detailed in the PET/CT manufacturer's manual, a cylindrical phantom filled with an activity of ¹⁸F that has been measured with the dose calibrator. A calibration factor is then set on the PET/CT scanner so that the activity measured by the scanner equals that measured by the dose calibrator. Since dose calibrators are usually calibrated to a precision of 3%-10% and errors can occur when filling the phantom, the difference accepted in clinical practice between the activities measured by the scanner and by the dose calibrator is 10%. It is advisable to always use the same dose calibrator to measure the activity employed for calibrating the scanner. If more than one calibrator is used, they should all be cross-calibrated with a traceable radioactive source. Indeed, the activity injected into the patient must always be measured with a calibrator that has been cross-calibrated with the PET/CT scanner used for imaging. PET/CT sites not equipped with dose calibrators cannot obtain reliable uptake measurements.
Variability among different scanners, even in the controlled environment of a multi-centre clinical trial, has proved to be as high as 25% [28]. Cross-calibration efforts based on measurements of different phantoms make it possible to achieve a variability below 10% in multi-centre clinical trials [23-30], while 5% would be a good requirement for using PET/CT quantitatively [31,32].
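The acceptance check described above can be written as a short sketch. It assumes that the phantom volume is known and that both readings have already been decay-corrected to the same reference time; the function names and units are hypothetical.

```python
def cross_calibration_deviation(scanner_kbq_per_ml, calibrator_mbq, phantom_volume_ml):
    """Relative deviation of the scanner reading from the calibrator-derived value.

    Both inputs are assumed to be decay-corrected to the same reference time.
    """
    expected_kbq_per_ml = calibrator_mbq * 1000.0 / phantom_volume_ml
    return (scanner_kbq_per_ml - expected_kbq_per_ml) / expected_kbq_per_ml


def is_cross_calibrated(deviation, tolerance=0.10):
    """True when the absolute deviation is within the accepted tolerance."""
    return abs(deviation) <= tolerance
```

The default tolerance of 0.10 mirrors the 10% accepted in clinical practice; passing `tolerance=0.05` applies the stricter 5% requirement suggested for quantitative use.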

Verification of Image Reconstruction Algorithm
A plethora of reconstruction algorithms exists, and large differences arise when measurements are carried out in small volumes, where counts are lost through spill-out to colder areas, the so-called partial volume effect [33]. The recovery coefficient curve (Figure 2) is a figure of merit describing the ratio of measured to true activity as a function of the size of the volume. Ideally the ratio should be one, with a 10% variation due to the scanner calibration as seen above. In practice, the ratio is one for larger volumes and decreases slowly when the volumes are smaller than 5-8 mL [21]. A National Electrical Manufacturers Association (NEMA) phantom with hollow spheres, which can be filled with a known activity, is used to tune the reconstruction algorithm. Several parameters of the reconstruction algorithm, such as the number of iterations, the number of subsets, and the width of the Gaussian smoothing filter, can then be adjusted to achieve the best recovery.
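A recovery coefficient curve of this kind can be tabulated with a minimal sketch such as the one below, which takes the recovery coefficient as the measured activity concentration divided by the known concentration in the spheres. The function name and the input layout (a mapping from sphere volume in mL to measured concentration) are assumptions made for illustration.

```python
def recovery_coefficients(measured_by_volume_ml, true_kbq_per_ml):
    """Return {sphere volume in mL: measured / true concentration}, smallest first."""
    return {
        volume_ml: measured / true_kbq_per_ml
        for volume_ml, measured in sorted(measured_by_volume_ml.items())
    }
```

Repeating the calculation for each candidate set of reconstruction parameters gives one curve per setting, so the settings can be compared sphere by sphere.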

Site-Related Factors
These factors are related to the procedure used by each PET/CT site for the patient's preparation and for acquisition of the images. These are influenced by the organization of the PET/CT unit and its equipment.

Imaging Parameters
Scan duration per bed position and the amount of FDG activity directly affect image quality and quantitative results [22]. Short scan durations and low activity reduce image counts, resulting in noisier images. Counterintuitively, a high injected activity can also degrade image quality, since it increases the proportion of random events, which do not contribute to image formation. Avoiding very short scan durations per bed position (i.e., below 1 min), scaling the administered activity to the patient's weight [22] and increasing scan time rather than activity for larger patients [34] all help to ensure good homogenization among sites.

Patient's Weight
Measuring the patient's weight with a calibrated scale is easy to implement even in a busy department and ensures the lowest variability, even though, as demonstrated recently [35], patients know their weight very well and a deviation greater than 10% occurs in only 2.5% of cases.

Administered Activity
The administered activity is the difference between the activity measured in the syringe before administration and the activity measured in the syringe after the injection, the latter often called residual activity. If the injection is carried out with a three-way valve system and the syringe is flushed with physiological saline after the injection, the residual activity is less than 1% of the injected activity. As an example, Figure 3 shows the Bland-Altman plot of the activity before and after administration. There is a small bias of about 3 MBq, with only 5% of the differences being larger than 6 MBq.
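The subtraction described above is sketched below. As an added assumption not stated in the text, both syringe readings are first decay-corrected to the injection time using the physical half-life of 18F, which only makes sense if the clocks used for the two measurements agree; all names and units are hypothetical.

```python
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def activity_at_reference(activity_mbq, minutes_until_reference):
    """Decay-correct a reading to a reference time.

    Use a positive value if the reading was taken before the reference time,
    and a negative value if it was taken after it.
    """
    return activity_mbq * 0.5 ** (minutes_until_reference / F18_HALF_LIFE_MIN)


def administered_activity(pre_mbq, pre_min_before_injection,
                          residual_mbq, residual_min_after_injection):
    """Net administered activity (MBq), referred to the injection time."""
    pre_at_injection = activity_at_reference(pre_mbq, pre_min_before_injection)
    residual_at_injection = activity_at_reference(residual_mbq, -residual_min_after_injection)
    return pre_at_injection - residual_at_injection
```

When both readings are taken at the moment of injection, the function reduces to the plain difference between the two syringe measurements.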

Other Factors
Clock synchronization should be carried out on all the clocks of the department against the scanner and dose calibrator clocks, to avoid biases in time and, consequently, in uptake assessment. Iodinated contrast media can modify the measured uptake of a lesion and are therefore not recommended for PET/CT studies if SUV is used for referral [22]. If a diagnostic CT with contrast media is performed as part of the PET/CT, the general recommendation is to perform a low-dose CT scan for attenuation correction before the PET scan and the full-dose diagnostic CT after the PET exam [22].

Host Factors
These factors are related to the accumulation of FDG in patients, which depends on uptake time, plasma glucose levels and patient motion or breathing artefacts.

Uptake Time
It is standard practice for most centres around the world to use an uptake time of 60 min, where the uptake time is defined as the interval between the intravenous administration of FDG and the start of PET/CT image acquisition. Variations in the uptake period are known to substantially influence measured SUV. It is generally accepted that uptake decreases in physiological organs [36] but increases in tumours. It has been postulated that imaging at later times would improve the contrast between tumour and normal tissue and, therefore, the ability to detect malignant lesions at either primary or metastatic sites. However, for now, there is a relative paucity of data to support this supposition.
To reduce the effect of SUV variation with uptake time, PET/CT sites set an uptake time within a 20 min range (50-70 min), as advised in several guidelines for PET/CT procedures. Unfortunately, in busy clinics it is difficult to keep it constant, and deviations occur. The variability of uptake time, as seen in the example of multi-centre clinical trials shown in Figure 4, is relatively high, and only 30% of the PET/CT scans were acquired within this range.
The effect of uptake time on the SUV of the liver and of the blood pool, which are often used as internal references, is shown in Figures 5 and 6. There is a general decrease in SUV (about 8% in one hour), but it is lost in the overall variability of SUV (about 25%).
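Compliance with the 50-70 min window can be monitored with a minimal sketch like the following, which derives the uptake time from the injection and scan-start timestamps and reports the fraction of scans inside the window. The function names are hypothetical, and the timestamps are assumed to come from synchronized clocks.

```python
from datetime import datetime


def uptake_minutes(injection_time, scan_start_time):
    """Interval between FDG injection and the start of acquisition, in minutes."""
    return (scan_start_time - injection_time).total_seconds() / 60.0


def fraction_within_window(uptake_times_min, low=50.0, high=70.0):
    """Fraction of scans whose uptake time lies inside [low, high] minutes."""
    inside = [t for t in uptake_times_min if low <= t <= high]
    return len(inside) / len(uptake_times_min)
```

The window limits are parameters, so the same check can be reused for protocols that specify a different range.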

Glucose Level
Elevated glycaemia decreases FDG uptake by the tumour because unlabelled glucose competes with FDG, leading to erroneously low SUV values [21,32]. Keeping plasma glucose constant, within the range of 4-7 mmol/L, in an individual patient across all longitudinal studies and keeping a record of the measured values are goals achievable with a concerted team effort [22]. No correction to SUV based on blood glucose level has proven to be reliable.

Extravasation
Although extravasation is known to affect SUV, its effect is poorly reported in the literature. Some authors [37] reported extravasation in 18% of cases, with a small fraction of patients presenting considerable extravasation, up to 22% of the injected dose. Generally, a three-way valve system is preferable to a syringe for tracer injection and can avoid most extravasation. The presence of extravasation can be established if the injection site (i.e., the arm) is included in the axial field of view of the PET scan, though no correction has yet been validated.

Other Factors
Hydration, rest and comfort on the PET scanner table are minimal requirements to avoid muscle uptake and patient motion during the scans. Breathing protocols can be used to reduce the probability of artefacts at the dome of the liver [38]. Fused PET and CT images should be visually analysed to identify possible patient motion near a lesion; should this occur, SUV and related metrics must not be trusted.

Strategies of Error Reduction in Onco-Haematological Clinical Trials
Many clinical trials use PET in decision-making. We searched www.clinicaltrials.gov on 16 August 2016 with the following conditions on clinical trials: prospective, interventional, phase 2 and 3, and with the keywords PET and lymphoma. A total of 58 clinical trials were recruiting, and 19 had completed enrolment in the last five years. We searched www.pubmed.gov for the already completed studies and, after excluding those that were single-site and those whose data had not yet been published, we found 14 studies. In four of them, no information was given on the use of PET/CT, and they were not considered in this review. We hence analysed the ten remaining studies and highlighted, in a numbered list, which of three issues was addressed and briefly how: (1) the use of a central review for PET/CT assessment; (2) the equalization of PET/CT scanners; and (3) the harmonization of the PET/CT procedure.

RAPID
In the RAPID trial [39] from the National Cancer Research Institute (NCRI), 602 patients with newly diagnosed stage IA and IIA Hodgkin lymphoma (HL) received three cycles of ABVD (doxorubicin, bleomycin, vinblastine, and dacarbazine) and then underwent PET/CT scanning. Patients with negative PET/CT were randomly assigned to receive involved-field radiotherapy or no further treatment; patients with positive PET/CT received a fourth cycle of ABVD and radiotherapy.
(1) The images were transmitted to the core laboratory at St. Thomas' Hospital, King's College, London, for central review. Two experienced reporters independently scored the scans using the 5-point Deauville scale to evaluate the degree of FDG uptake, if present, as well as the likelihood of residual disease. Any differences of opinion were resolved by consensus.
(2) PET scanning was performed on full-ring PET or PET/CT cameras at sites within the United Kingdom NCRI PET Research Network. Sites complied with commonly agreed methods for quality control to ensure that the performance of imaging equipment, data transfer, and image quality were within an acceptable range pre-specified by the core laboratory [29]. Physicists from the core laboratory visited each PET site and scanned two phantoms to check image quality and quantitative accuracy before the start of the study. A PET/CT scanner qualified for the trial if the difference between expected and measured activity in the cylindrical phantom used for cross-calibration was below 10% and if the recovery coefficients of the different scanners were within a ±0.25 SUV variation.
(3) Before undergoing scanning, patients fasted for 6 h, after which 350-400 MBq of FDG was administered intravenously. Scans were acquired 60 min later from the skull vertex or base of the brain to the upper thighs.

RATHL
In the RATHL trial from NCRI, the Lymphoma Study Association (LYSA) and the Italian Foundation for Lymphoma (FIL), 1214 patients with newly diagnosed advanced classic HL underwent a baseline PET/CT scan, received two cycles of ABVD, and then underwent an interim PET/CT scan. Patients with negative PET/CT were randomly assigned to continue ABVD or omit bleomycin in cycles three through six. Those with positive PET/CT received BEACOPP (bleomycin, etoposide, doxorubicin, cyclophosphamide, vincristine, procarbazine, and prednisone). Radiotherapy was not recommended for patients with negative findings on interim scans.
(1) Scans were centrally reported by a network of national core laboratories in the United Kingdom, Italy, Sweden, Denmark, and Australia. Images were centrally reviewed using the 5-point Deauville scale. Two readers at each core laboratory, who were unaware of the patient's clinical status, scored the scans. Differences were resolved by consensus between two doctors at the same core laboratory or, when agreement could not be reached, by a third doctor at another core laboratory.
(2) No information was given about the equalization of PET/CT scanners. Patients from the UK nevertheless relied on the NCRI network of PET sites described for the RAPID trial.
(3) Baseline PET/CT was performed within 28 days before enrolment. Interim PET/CT scanning was performed 9 to 13 days after the preceding dose of chemotherapy. PET/CT scans were acquired with low-dose unenhanced CT at 60 ± 10 min after the intravenous injection of 350-550 MBq of FDG. Subsequent PET/CT scanning was performed under the same conditions and on the same scanner as baseline scanning.

S0816
In the S0816 trial [40] from the Southwest Oncology Group (SWOG), Cancer and Leukemia Group B/Alliance (CALGB), Eastern Cooperative Oncology Group (ECOG), and the AIDS Malignancy Consortium, 358 previously untreated patients with stage III and IV HL underwent a PET/CT scan at baseline and after two initial cycles of ABVD. PET-negative patients received an additional four cycles of ABVD, whereas PET-positive patients were switched to escalated BEACOPP for six cycles.
(1) Of the 358 PET/CT scans, 331 were submitted for centralized review to the CALGB imaging core lab. The CALGB Imaging Core Lab enables internet-based visual and virtual conferences that allow the simultaneous display of and mutual communication between participating sites and the core lab in a secure manner. The central PET/CT review was completed in less than two days in 78% and in less than four days in 95% of the patients. The five reviewers scored the scans using the 5-point Deauville scale. There was one adjudicator in the CALGB Core Lab for cases where major discrepancies existed between the local site and the central PET/CT interpretation. Scans given Deauville scores 1 to 3 were considered PET2-negative, and scans given Deauville scores 4 to 5 were considered PET-positive. (2) Only full-ring dedicated PET/CT scanners were acceptable; older "stand-alone" PET scanners were not adequate for this study. A documented daily quality control procedure had to be in place at each imaging facility. The proposed data acquisition/reconstruction protocol (including details of all the parameters above) had to be discussed with the core lab prior to the start of the study. (3) The clinical trial protocol gave very detailed instructions for PET/CT scanning, including patient preparation, FDG administration and an uptake time in the range 60-80 min; these are provided in the supplemental material of the published article [40]. The CT of the PET/CT was used for attenuation correction of PET data and anatomic localization. CT settings followed institutional guidelines (usually 120-140 kV, at least 60 mA).

HD15
The HD15 study [41] from German Hodgkin Study Group (GHSG) enrolled 2126 patients with advanced-stage HL from hospitals and practices in Germany, Switzerland, Austria, The Netherlands, and the Czech Republic. Six to eight cycles of BEACOPP-based chemotherapy were administered. Patients achieving partial remission after chemotherapy with at least one involved nodal site of 2.5 cm in the maximal long axis diameter underwent PET/CT scanning. PET/CT scanning by using FDG was performed two to six weeks after the end of chemotherapy to allow a prompt start of radiotherapy no later than six weeks after chemotherapy in PET-positive patients.
(1) A multidisciplinary panel consisting of a medical oncologist, a radiologist, a radiation oncologist, and a nuclear medicine physician, accompanied by a statistician, reviewed all PET/CT and CT scans as well as any available X-rays. The therapeutic decision for or against radiotherapy was based on the interpretation of the PET/CT scan and was recommended by the central review panel in consensus. Partial remission was defined by protocol as tumour volume shrinkage by 50% but continuing presence of residual tissue. A positive PET/CT scan was defined visually as a focal or diffuse uptake above the activity of mediastinal blood pool structures within residual tissue.
(2) and (3) No information was given for PET/CT scanner equalization or procedure harmonization.

H10
The H10 [42] trial from the European Organisation for Research and Treatment of Cancer (EORTC), LYSA and FIL enrolled 1137 patients with untreated clinical stage I/II HL. Early PET/CT scan was performed after two cycles of ABVD. Patients with a negative PET scan in the experimental arm continued ABVD treatment, omitting involved node radiotherapy. Patients with PET/CT-positive scans had their treatment intensified to escalated BEACOPP.
(1) Prospective central reading of the early PET scan was planned in the protocol using a real-time on-line blinded independent central review [43]. For technical reasons, centralized reviews for the LYSA group started from the initiation of the trial, and for EORTC/FIL groups, it occurred from 2008 onward. In the absence of a timely (72 h) centralized reading, the local result of the early PET/CT scan was decisive for further treatment in the experimental arm. A blinded second central PET/CT review was performed retrospectively after the recommendations of the independent data monitoring committee by four experts on the scans of 52 patients with events (including patients with early PET-negative and early PET-positive scans) and 52 randomly selected patients. Out of the 104 scans, 20 could not be used for second central review for logistic or technical reasons; 84 were compared with the results of the first review. Two experts from LYSA reviewed EORTC/FIL scans and vice versa. PET/CT images were scored according to the International Harmonization Project (IHP) criteria [10].
(2) and (3) No information was given for PET/CT scanner equalization or procedure harmonization.

BV-ABVD
The BV-ABVD [44] pilot phase II trial performed by FIL enrolled 12 untreated HL patients. Patients received two cycles of brentuximab vedotin at a dose of 1.8 mg/kg followed by three or six cycles of ABVD, depending on risk group, with or without RT. The response rate after brentuximab vedotin, but before starting ABVD, was assessed by PET/CT scan. PET/CT scan was performed at baseline, after two cycles of brentuximab vedotin, and at the end of treatment.
(1) A panel consisting of three independent reviewers centrally assessed the PET/CT scans acquired at baseline and after two cycles of brentuximab vedotin. The review process was coordinated through the Widen website platform [17]. Widen is a system for real-time central review which collects PET/CT images from participant sites, automatically verifies protocol violations, distributes images to reviewers, permits the online submission of the PET/CT evaluations and automatically merges them, transmitting the final results of the PET/CT reviews to the local investigators and to the clinical trial data centre. The response was assessed by adopting the 5-point scale Deauville criteria as a qualitative index and the SUV max as a quantitative index. A response was defined as a reduction in the Deauville score or, if there was no change in it, a reduction in SUV compared to baseline. (2) Scanners underwent clinical trial qualification for semi-quantitative analysis from the FIL core lab [30]. That is, before patients' accrual, all PET/CT sites acquire two phantoms and send them to the core lab. If the difference between expected and measured activity in the cylindrical phantom used for cross-calibration is below 10% and if the recovery coefficient curves are smooth and within the limits given by the European Association for Nuclear Medicine (EANM) guidelines [35], the PET/CT scanner is qualified for the trial. (3) All FIL PET/CT-oriented protocols starting after 2011 use a shared PET/CT procedure that follows 2010 EANM guidelines [35] for patient preparation, FDG and other contrast agent administration, and PET/CT acquisition protocol.
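The response rule in point (1) can be written as a short decision function. The following is a minimal Python sketch, not the trial's actual software: the function name `is_response` and its arguments are hypothetical, and it assumes that a Deauville score and an SUV max are available for both the baseline and the follow-up scan.

```python
def is_response(deauville_baseline: int, deauville_followup: int,
                suv_max_baseline: float, suv_max_followup: float) -> bool:
    """Illustrative response rule: a lower Deauville score, or an
    unchanged score together with a lower SUV max than at baseline."""
    for score in (deauville_baseline, deauville_followup):
        if score not in (1, 2, 3, 4, 5):
            raise ValueError("Deauville scores range from 1 to 5")
    if deauville_followup < deauville_baseline:
        return True  # qualitative index improved
    if deauville_followup == deauville_baseline:
        # Qualitative index unchanged: fall back on the quantitative index.
        return suv_max_followup < suv_max_baseline
    return False  # Deauville score increased
```

In this sketch the quantitative index is consulted only when the qualitative index is unchanged, which mirrors the order of the two conditions in the text.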

HD0607
An interim analysis of the HD0607 [44] clinical trial performed by FIL with 512 untreated stage IIB-IV HL patients reported the effect of PET review. Patients were assessed by interim PET/CT scan after two cycles of ABVD. Patients with positive PET scans were switched to escalated BEACOPP (e-BEACOPP), while those with PET-negative scans continued with another four cycles of ABVD.
(1) A panel consisting of six independent readers assessed the PET/CT scans acquired at baseline and after two cycles of ABVD. The response was assessed by adopting the 5-point scale Deauville criteria, considering scores 4-5 positive. The readers independently reviewed the interim PET/CT scans and entered their reviews in the Widen website platform [17], which automatically calculated the majority result and forwarded it to the participating site. Real-time independent review was carried out: the average and median times for diagnosis exchange were 48 h and 38 h, respectively.
(2) Scanners underwent clinical trial qualification for visual analysis from the FIL core lab [30]. That is, before patients' accrual, all PET/CT sites acquire a cylindrical phantom and send it to the core lab. If the difference between expected and measured activity in the cylindrical phantom used for cross-calibration is below 10%, the PET/CT scanner is qualified for the trial. (3) The participating sites used a shared PET/CT procedure that followed 2003 EANM guidelines [45] for patient preparation, FDG and other contrast agent administration, and PET/CT acquisition protocol.

E3404
In the E3404 trial [46] from ECOG-American College of Radiology Imaging Network (ACRIN), 100 previously untreated patients with diffuse large B-cell lymphoma (DLBCL) stage III, IV, or bulky II, were enrolled. PET/CT scan was performed after three cycles of R-CHOP (rituximab, cyclophosphamide, doxorubicin, vincristine, prednisone). PET-positive patients received four cycles of R-ICE (rituximab, ifosfamide, carboplatin, etoposide), PET-negative patients received two more cycles of R-CHOP.
(1) PET/CT scans after three cycles of R-CHOP were all submitted for central review. The interpretation took place during the fourth cycle of R-CHOP chemotherapy. Institutional results were used for six of 70 PET/CT scans because images could not be acquired for central review. The review was a binary visual interpretation, which the central reviewer based on modifications of the IHP [10], customized for this trial and termed the "ECOG criteria": (1) only sites of abnormality at baseline were evaluated; (2) abnormal activity required both a focal appearance and intensity greater than average liver; (3) all positive nodal sites had to have an anatomic correlate; (4) activity in bone marrow and spleen was considered abnormal only if focal and clearly discernible; (5) symmetric abnormal foci in the mediastinum and hilum were considered abnormal only if the remainder of the scan was positive; and (6) new foci were considered positive only if the remainder of the scan was positive or a new lesion was focal, very intense and associated with a lesion on CT. A dedicated publication [47] presented the results of the central review of this clinical trial.
(2) and (3) No information was given for PET/CT scanner equalization or procedure harmonization. ACRIN was the first to develop a network for clinical trials [31], in which its core lab verified that the difference between expected and measured activity in the cylindrical phantom used for cross-calibration was below 10%, and it is reasonable to assume that the network was active for this trial.

SAKK38/07
The SAKK38/07 clinical trial [48] from the Swiss Oncology Society (SAKK) retrospectively evaluated 138 previously untreated patients with DLBCL who had a positive baseline PET scan and a measurable lesion of at least 15 mm. PET/CT scan was performed after two cycles of R-CHOP and at the end of treatment.
(1) The central review was retrospectively performed at the Nuclear Medicine Department of the University Hospital of Zürich (Zürich, Switzerland). PET scans were analysed using the Deauville 5-point scale and semi-quantitative analysis; PET was considered negative if the reduction in SUV max between baseline and interim scan was >66% [49]. (2) No information was given for PET/CT scanner equalization.
(3) All patients were instructed to fast for at least 4 h before injection of 370 MBq of FDG. Blood glucose level had to be measured before injection of the radiotracer. Whole-body PET scans were performed after a standardized uptake time of 60 min. Interim PET scans were performed between day 11 and day 14 of the R-CHOP-14 cycle and between day 14 and day 28 after the last rituximab infusion (after treatment ended).
(1) Baseline and interim PET/CT scans were retrospectively evaluated by central review in 51/71 patients by a single nuclear medicine expert. Interim PET/CT was evaluated alongside the baseline scan with the 5-point scale Deauville criteria (considering scores of 1, 2 or 3 as negative and scores of 4 or 5 as positive) and with semi-quantitative analysis, considering PET as negative if the reduction in SUV max between baseline and interim scan was >66% [49]. (2) Each patient was scanned on the same PET/CT machine for baseline and subsequent assessments.
(3) No information was given for PET/CT procedure harmonization.
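The semi-quantitative criterion used in these reviews, a reduction in SUV max of more than 66% between baseline and interim scan, reduces to a one-line calculation. The Python sketch below illustrates it; the function names are hypothetical, and the 66% cut-off is the value quoted in the text.

```python
def delta_suv_max_percent(suv_max_baseline: float, suv_max_interim: float) -> float:
    """Percentage reduction in SUV max from baseline to interim scan."""
    if suv_max_baseline <= 0:
        raise ValueError("baseline SUV max must be positive")
    return 100.0 * (suv_max_baseline - suv_max_interim) / suv_max_baseline


def is_negative_by_delta_suv(suv_max_baseline: float, suv_max_interim: float,
                             cutoff_percent: float = 66.0) -> bool:
    """Interim PET is read as negative when the reduction exceeds the cut-off."""
    return delta_suv_max_percent(suv_max_baseline, suv_max_interim) > cutoff_percent
```

For example, a lesion falling from an SUV max of 15 to 6 shows a 60% reduction and would still be read as positive under this criterion, whereas a fall from 15 to 4 (about 73%) would be read as negative.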

IELSG26
The IELSG26 [51] clinical trial from International Extra-Nodal Study Group, NCRI and FIL enrolled 125 patients with histopathologically proven primary mediastinal B-cell lymphoma (PMBCL) of any stage, previously untreated and eligible for intensive chemo-immunotherapy with curative intent at 21 institutions from five countries. Baseline PET/CT scans were carried out within 14 days before starting treatment. Prior treatment with corticosteroids alone for up to 1 week for the relief of local compressive symptoms was allowed. For 20 patients who required urgent treatment and for whom the PET/CT scans could not be performed before therapy started, the baseline scan was omitted after discussion with the clinical coordinators. PET/CT scans were then performed at 3 to 4 weeks after the end of the chemo-immunotherapy.
(1) CD-ROMs together with essential information on the PET/CT acquisition were sent to the core laboratory for central review. A single physician with expertise in nuclear medicine performed the review after the end of treatment. Uncertain interpretations were resolved with the agreement of a second expert. The review was blinded to the clinical information. The achievement of a metabolic complete response was defined according to the IHP criteria [10], equating to a score of 1 or 2 on the Deauville scale. The post-chemotherapy and post-radiotherapy scans were assessed according to the Deauville scale. Diffuse uptake in the spleen or marrow on the post-chemoimmunotherapy scan was considered to be a result of chemotherapy and was not scored as active disease. (2) PET/CT imaging was performed on full-ring integrated PET/CT systems. Baseline and response PET/CT examinations for a patient were performed in the same centre by using the same PET/CT system. Each centre was required to follow active quality control and quality assessment programs. (3) PET and CT images were acquired in the same session. Intravenous CT contrast media were not administered before the PET study. If a diagnostic CT scan using contrast was routinely performed as part of the PET/CT examination, it was performed after the PET scan. All patients fasted for at least 6 h before the injection of 4.5 MBq/kg of FDG. Serum glucose level measured before injection of the radiotracer was less than 160 mg/dL in all patients. After a standardized uptake time of 55-65 min, PET/CT data were acquired from the mid-thigh toward the base of the skull in two-dimensional or three-dimensional mode. The PET/CT acquisition time was at least 3 min per cradle position.
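The acquisition parameters in point (3) lend themselves to an automated check of each scan. The Python sketch below is illustrative only: the `ScanRecord` fields and function names are hypothetical, and treating the reported glucose level (below 160 mg/dL) as a protocol limit is an assumption, since the text reports it as an observation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScanRecord:
    """Hypothetical per-scan record; field names are illustrative."""
    weight_kg: float
    fasting_h: float
    glucose_mg_dl: float
    uptake_min: float


def planned_activity_mbq(weight_kg: float, mbq_per_kg: float = 4.5) -> float:
    """Weight-based FDG activity, using the 4.5 MBq/kg quoted in the text."""
    return mbq_per_kg * weight_kg


def protocol_deviations(scan: ScanRecord) -> List[str]:
    """List the ways a scan departs from the limits quoted in the text."""
    deviations = []
    if scan.fasting_h < 6:
        deviations.append("fasting shorter than 6 h")
    if scan.glucose_mg_dl >= 160:  # assumed limit, see lead-in
        deviations.append("serum glucose not below 160 mg/dL")
    if not 55 <= scan.uptake_min <= 65:
        deviations.append("uptake time outside 55-65 min")
    return deviations
```

A scan with an empty deviation list would be accepted as-is; how non-empty lists are handled (rejection, query to the site, or flagging for the reviewer) is a design choice left to each trial.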

PRIMA
PET/CT after induction therapy was performed on 160 patients with previously untreated high-tumour burden FL enrolled in the PRIMA [52] clinical trial from Groupe d'Etudes des Lymphomes de l'Adulte (GELA) and GOELAMS. Patients were treated with six cycles of R-CHOP or eight cycles of rituximab plus cyclophosphamide, vincristine and prednisone (R-CVP).
(1) For the retrospective central review, all participating investigators having performed PET/CT scans in the initial analysis were asked to submit on CD-ROM the PET/CT data at baseline and/or post-induction. The scans were read independently by two experienced nuclear medicine physicians. In the event of a discrepant interpretation, a third reader provided adjudication. Post-induction PET response was assessed using the IHP criteria and the Deauville 5-point scale.
SUV max was measured for each involved nodal and extranodal site.

PET-Folliculaire
In the PET-Folliculaire [53] clinical trial from Groupe d'Etudes des Lymphomes de l'Adulte (GELA) and GOELAMS, 121 patients with previously untreated high-tumour burden FL were treated with six cycles of R-CHOP plus two cycles of rituximab, without rituximab maintenance. PET/CT was performed before treatment, after four cycles of R-CHOP (interim PET), and at the end of treatment.
(1) PET scans were centrally reviewed by three experienced nuclear medicine physicians on a dedicated network of workstations [43]. Differences between observers were resolved by majority view. PET/CT results were reported using the Deauville 5-point scale. Two different thresholds were compared to define positivity and negativity: residual activity greater than the liver activity (scores 4 and 5), and residual activity greater than the mediastinal blood pool (scores 3, 4, and 5). (2) Regular testing of image quality performed by a qualified physicist as recommended by the SFPM (French Society of Medical Physics) was required from each centre. (3) PET/CT was performed in each centre on a dedicated PET/CT scanner according to standardized modalities, taking into account the technical characteristics of each camera. Patients fasted for at least 6 h before each scan and had to have a blood glucose concentration <10 mmol/L. They were administered intravenous injections of 3.5 to 8 MBq/kg (minimal activity, 185 MBq) FDG and were asked to lie in supine position for 1 h to avoid muscular uptake. Imaging was performed to cover a volume starting from the upper thigh to the skull base. Images were reconstructed iteratively with and without attenuation correction.
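The two positivity thresholds compared in point (1) differ only in the Deauville score at which a scan turns positive. A minimal Python sketch follows; the function name `is_pet_positive` and the `reference` labels are hypothetical.

```python
def is_pet_positive(deauville_score: int, reference: str = "liver") -> bool:
    """Dichotomize a Deauville score against a reference tissue.

    reference="liver": positive for scores 4 and 5.
    reference="mediastinum": positive for scores 3, 4 and 5
    (residual activity above the mediastinal blood pool).
    """
    if deauville_score not in (1, 2, 3, 4, 5):
        raise ValueError("Deauville scores range from 1 to 5")
    first_positive_score = {"liver": 4, "mediastinum": 3}
    if reference not in first_positive_score:
        raise ValueError("reference must be 'liver' or 'mediastinum'")
    return deauville_score >= first_positive_score[reference]
```

Only scans with a score of 3 are classified differently by the two thresholds, so any difference in prognostic value between them rests on that group of patients.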

Evaluation of Error Reduction Strategies in Onco-Haematological Clinical Trials
All the clinical trials analysed in the previous section recognize the need for standardizing and harmonizing PET/CT in clinical trials and address, with different strategies, the following three issues: (1) the use of a central review for PET/CT assessment; (2) the equalization of PET/CT scanners; and (3) the harmonization of the PET/CT procedure.

Central Review
The use of a central review is fundamental to ensure the highest reproducibility of the results and is considered binding when the primary endpoint of a clinical trial is based on tumour measurement (e.g., progression-free survival or objective response rate) [54]. Central reviews can be classified along several axes: independent (I) vs. consensus (C) vs. adjudicator (A), multi (M) vs. single readers (S), concentrated (C) vs. distributed (D), stand-alone (S) vs. mixed (M), and real-time (RT) vs. retrospective (R). Table 1 categorizes the central review of the analysed trials based on these categories. The readers can interpret the PET/CT scans independently of one another or together in consensus. With independent readers, the overall result of the review is given by the majority of concordant results; in this case, there should be an odd number of readers, and at least three. With consensus, there should be at least two readers, and the overall result of the review is obtained by discussion among them. A third kind of review is when two reviewers analyse the scan independently and a third reviewer, unblinded to their results, adjudicates the final result.
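The independent and adjudicator models can be expressed as simple combination rules. The Python sketch below is an illustration under stated assumptions, not the logic of any specific review platform: reads are reduced to a binary positive/negative result, the function names are hypothetical, and the adjudicator is assumed to be consulted only when the two primary readers disagree.

```python
from typing import List


def independent_review(positive_reads: List[bool]) -> bool:
    """Majority result of an odd number (at least three) of independent reads."""
    n = len(positive_reads)
    if n < 3 or n % 2 == 0:
        raise ValueError("an odd number of at least three readers is required")
    # With an odd number of binary reads a strict majority always exists.
    return sum(positive_reads) > n // 2


def adjudicated_review(read_a: bool, read_b: bool, adjudicator_read: bool) -> bool:
    """Two independent reads; a third reader decides only on disagreement."""
    return read_a if read_a == read_b else adjudicator_read
```

The consensus model has no such rule, since its result comes from discussion among the readers rather than from combining separate reads.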

Multi versus Single Readers
Single-reader review is a particular type of central review since it is carried out by a single reader, usually an internationally renowned expert in the field. It is used to homogenize the readings, but compared to multi-reader review, it is affected by the bias of having a single reader.

Concentrated versus Distributed
This category refers to the way in which images are exchanged for the review. In the concentrated review, all the images are collected at a single site, usually the core lab of the clinical trial, where the readers sit. Investigators from participating sites upload the images to the core lab and the readers analyse them there. In the distributed review, the readers are not gathered together but can be anywhere. Hence, images need to be distributed through the network to give access to remote readers. In this case, results need to be circulated back through the network and combined in the final review.

Stand-Alone versus Mixed
Sometimes the evaluation of the local investigator at the participating site is mixed with that of the central review to form the final evaluation. This is usually done when not enough readers can be gathered or when the review is urgent.

Real-Time versus Retrospective
A real-time review requires considerably more effort than a retrospective one. If PET/CT results are used for changing therapy, it is mandatory that the review results come out as soon as possible, usually within 3-5 days. In this case, all the technical issues due to PET/CT should be expeditiously resolved. An automatic system for image exchange is mandatory.
In prospective clinical trials in which PET/CT is used for patient management or influences the primary endpoint, a multi-reader, stand-alone, independent review is mandatory. Several trials demonstrated that it can be done in real time if the result of the PET/CT immediately affects the patient's management. Distributed review is the best option for reaching a higher number of independent reviewers, and it does not rely on a single site's experience.

PET/CT Scanner Equalization
It must be noted that in multi-centre clinical trials, an SUV measurement variation across PET/CT scanners in the range of 10%-25% due to intrinsic variability of the instruments is common [55]. Hence, cross-calibration of PET/CT scanners and ancillary instrumentation is the first condition for achieving an accuracy in tracer uptake measurement of 5%-10% [28]. Several programs for the cross-calibration of PET/CT scanners have been carried out in recent years by imaging and oncology societies: the EANM accreditation program for sites of excellence carried out by EARL Ltd. (Wien, Austria) [23,56], the UK NCRI PET Clinical Trial Network [29], the ACRIN program [31], the Clinical Trial Network (CTN) of the Society of Nuclear Medicine and Molecular Imaging (SNMMI) [32,57], the JSCT NHL10 trial [58] in Japan and the FIL Core lab in Italy [30]. While these programs are common nowadays, at the time the clinical trials discussed in this manuscript started, only the RAPID, the S0816 and the BV-ABVD clinical trials addressed the problem of standardization of scanners through a thorough clinical trial qualification process consisting of the verification that the requirements for PET/CT scanners were fulfilled. Others, such as the RATHL, the PET-Folliculaire and the IELSG26, only required documentation from the local centre attesting that they were fulfilling the national or international standards for quality assurance.
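The cross-calibration criterion reported above for several core labs (measured versus expected activity in a cylindrical phantom differing by less than 10%) can be sketched as follows. This is an illustrative Python example: the function names and the choice of kBq/mL as the unit are assumptions, and the recovery coefficient checks required by some programs are deliberately left out.

```python
def cross_calibration_deviation(measured_kbq_ml: float, expected_kbq_ml: float) -> float:
    """Signed deviation (%) of the measured from the expected activity concentration."""
    if expected_kbq_ml <= 0:
        raise ValueError("expected activity concentration must be positive")
    return 100.0 * (measured_kbq_ml - expected_kbq_ml) / expected_kbq_ml


def passes_cross_calibration(measured_kbq_ml: float, expected_kbq_ml: float,
                             tolerance_percent: float = 10.0) -> bool:
    """Scanner passes when the absolute deviation is below the tolerance."""
    deviation = cross_calibration_deviation(measured_kbq_ml, expected_kbq_ml)
    return abs(deviation) < tolerance_percent
```

Keeping the tolerance as a parameter makes it easy to model stricter qualification programs by passing a smaller value, for example `tolerance_percent=5.0`.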

Harmonization of PET Procedure
Despite the publication of guidelines for scanning patients [35,59], the lack of standardization [60,61] has in the past hampered the use of SUV as a biomarker in clinical trials. Now, thanks to better knowledge of the factors affecting SUV measurements [26], guidelines for patient scanning and PET/CT image acquisition [62] are recommended to improve data quality and reproducibility [62,63]. All the studies prescribed correct timing for PET/CT scanning with respect to concurrent therapy in order to avoid the effects of medications (e.g., steroids) on PET/CT studies. RAPID, S0816, IELSG26 and PET-Folliculaire adopted international guidelines and reported the parameters used in the trials.

Conclusions
Major clinical trials that use positron emission tomography/computed tomography (PET/CT) as a management tool use a thorough and careful program to achieve reproducibility of the PET/CT data. As we saw, most of the trials used PET in a qualitative way, and therefore some of the requirements for clinical trial qualification are more relaxed than those of a quantitative approach. Several new clinical trials now running foresee the use of PET/CT-derived standardized uptake value (SUV) metrics and are hence using a stricter program for trial qualification that limits variability among scanners to below 5% and applies an acceptance/rejection schema to poorly standardized PET/CT scans, in which some factors, such as uptake time, are outside predefined limits.