Comparison of Different Methods for Evaluating Quantitative X-ray Fluorescence Data in Copper-Based Artefacts

Handheld X-ray fluorescence (HH-XRF) devices have given archaeologists and conservators the opportunity to study a wide range of materials with great accessibility and flexibility. The investigation of copper-based artefacts is a frequent application of these instruments in cultural heritage, because they give direct and rapid quantitative results that can provide important information about the artefacts, such as their fabrication technology. This paper compares quantitative results obtained with a commercial handheld XRF device, the "Bruker Tracer 5g", on certified standards that are compositionally significant for copper-based alloys of interest in cultural heritage. The elemental concentrations were derived using three different calibrations, which were examined for their accuracy. Two of them were based on the empirical coefficients approach and performed with the built-in calibration software (the copper alloy calibration provided by the manufacturer, Bruker, and the Bruker EasyCal software), while the third was performed off-line by processing the spectra with independent fundamental parameters (FP) software (PyMca version 5.9.2, an X-ray fluorescence analysis program developed at the European Synchrotron Radiation Facility). The results highlight that, although HH-XRF devices simplify data collection, choosing the correct analysis conditions and calibration method for optimal quantitative results still requires a detailed understanding of the principles of X-ray spectrometry.


Introduction
X-ray fluorescence (XRF) spectrometry is one of many analytical techniques used in twenty-first-century archaeology to explain the human past [1]. The non-destructive capabilities of XRF are particularly suited to research in cultural heritage, where the sample is unique or its integrity has significant technical or esthetic value [2]. In recent decades, technological developments in X-ray generation and detection have led to the production and commercialization of different types of portable instruments [3], the most widely used being the so-called hand-held XRF (HH-XRF) devices [4]. These instruments combine highly miniaturized hardware with powerful software and are capable of qualitative and quantitative in situ analysis, which makes them suitable for characterizing a wide range of cultural heritage objects inside museums and conservation laboratories.
XRF analysis with portable spectrometers has long been applied to the study of ancient metals to identify and characterize different alloys [5,6]. Copper-based artefacts, ranging from small tools to large-scale statues, form a large category of metal objects found both in museums and in the field. Many studies have analyzed ancient and historic copper alloys with portable XRF spectrometers, including HH-XRF devices [7][8][9][10][11][12]. Quantitative XRF results on a copper-based artefact are very important because the alloy's composition can give cultural heritage researchers information about the artefact's authenticity and fabrication technology.
Quantitative XRF analysis consists of converting the measured intensities of the characteristic radiation into the concentrations of the analyzed elements [13]. Nowadays, two calibration methods are commonly used: empirical calibration and the fundamental parameters (FP) approach [14][15][16].
Empirical calibration requires several standards whose matrix characteristics are as close as possible to those of the unknown samples. It relies on algebraic equations that relate the analyte concentration in a sample to its measured X-ray intensity; the coefficients of these equations are determined by least-squares regression [4]. The most common methods within the class of empirical calibration are Compton normalization [17] and the multiple linear regression of Lucas-Tooth and Price [18,19].
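The Lucas-Tooth and Price model expresses the concentration of an analyte as a linear function of its own intensity, corrected by products with the intensities of the other elements: C_i = a0 + I_i (a1 + sum over j != i of b_ij I_j). A minimal least-squares sketch in Python with NumPy follows, assuming net line intensities have already been extracted for every standard; the function names and the synthetic data are hypothetical and do not reproduce the Bruker software.

```python
import numpy as np

def _ltp_design(intensities, analyte):
    """Design matrix for C = a0 + I_a * (a1 + sum_j b_j * I_j), j != analyte."""
    I = np.atleast_2d(np.asarray(intensities, dtype=float))
    ia = I[:, analyte]
    others = [j for j in range(I.shape[1]) if j != analyte]
    # Columns: intercept, analyte intensity, analyte x interferent intensity products
    columns = [np.ones_like(ia), ia] + [ia * I[:, j] for j in others]
    return np.column_stack(columns)

def fit_lucas_tooth_price(intensities, concentrations, analyte):
    """Least-squares coefficients [a0, a1, b_j...] for one analyte."""
    X = _ltp_design(intensities, analyte)
    y = np.asarray(concentrations, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_lucas_tooth_price(coef, intensities, analyte):
    """Apply fitted coefficients to new intensity measurements."""
    return _ltp_design(intensities, analyte) @ coef

# Synthetic check with hypothetical numbers: 8 "standards", 3 elements, analyte = column 0
rng = np.random.default_rng(0)
demo_intensities = rng.uniform(1.0, 10.0, size=(8, 3))
true_coef = np.array([0.5, 2.0, -0.05, 0.03])
demo_conc = _ltp_design(demo_intensities, 0) @ true_coef
fitted_coef = fit_lucas_tooth_price(demo_intensities, demo_conc, 0)
```

With real data, the standards should outnumber the coefficients by a comfortable margin. Compton normalization, the other empirical method cited above, would instead divide each analyte intensity by the Compton-scatter intensity before regression; it is not shown here.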
The fundamental parameters approach, by contrast, is based on the Sherman equation [20], later improved by Shiraiwa and Fujino [21]. A contemporary FP-based calibration relies on a system of exact equations that relate the measured intensities of the analytes to their concentrations through the fundamental laws, principles, and physical constants governing the interaction of X-rays with matter [22]. Its accuracy depends on the uncertainty with which these parameters describe the sample-spectrometer system [23][24][25].
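For orientation, the core of the FP approach can be written in a simplified textbook form of the Sherman equation. This form assumes monochromatic excitation at energy E0, a homogeneous and infinitely thick sample, and primary fluorescence only; secondary fluorescence and the polychromatic spectrum of a real X-ray tube, which full FP implementations typically account for, are neglected here.

```latex
I_i = G \, C_i \,
  \frac{\tau_i(E_0)\,\omega_i\,p_i\left(1 - \dfrac{1}{J_i}\right)}
       {\dfrac{\mu_s(E_0)}{\sin\psi_1} + \dfrac{\mu_s(E_i)}{\sin\psi_2}},
\qquad
\mu_s(E) = \sum_j C_j\,\mu_j(E)
```

Here I_i is the measured intensity of the analyte line at energy E_i, C_i the analyte mass fraction, tau_i(E0) the photoelectric mass absorption coefficient of the analyte, J_i the absorption-edge jump ratio, omega_i the fluorescence yield, p_i the relative emission probability of the line within its series, mu_j the mass attenuation coefficients of the sample constituents, psi_1 and psi_2 the incidence and take-off angles, and G a geometry and detector-efficiency factor. Because mu_s depends on the unknown concentrations, the system is solved iteratively.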
Handheld XRF devices offer different approaches to calibration [26]. The quickest and most common is a built-in, general-purpose calibration for modern alloys, which gives results immediately. However, this raises the question of how accurate such quantitative results are when the objects studied are archaeological copper-based artefacts rather than modern alloys. The alternative is additional calibration software based on empirical coefficients or on the FP approach. These calibrations require several standards whose composition is as close as possible to that of the unknown samples.
Obtaining appropriate standards for XRF analysis of cultural heritage material can be difficult, particularly for ancient archaeological copper material, because of the wide range of concentrations found in heritage copper alloys [27]. In 2010, however, a new set of certified reference materials (CRMs) was developed to assist scientists and conservators in cultural heritage with quantitative XRF analysis of historical and prehistoric copper alloys [27]. This set of CRMs is known as the Copper CHARM Set (Cultural Heritage Alloy Reference Material Set).
The user-friendly design of HH-XRF devices can lead archaeologists and conservators to treat them as "black boxes", characterizing materials without considering the more complex aspects of calibration [26]. This perception can be problematic when interpreting quantitative XRF data.
To address this, the present research compares various calibration methods used with HH-XRF devices in order to achieve more accurate quantitative XRF analysis. The final goal is to provide guidance to operators on proper instrument set-up and calibration before analyzing historical copper alloys. By highlighting the limitations of built-in calibrations and the importance of selecting an appropriate tube voltage to excite the elements of interest, this study can help improve the accuracy of quantitative analysis in archaeometry.
Specifically, this paper evaluates the quantitative results obtained with a commercial HH-XRF device, the Bruker Tracer 5g, on 26 copper-based standards using three different calibration procedures. Two of them are based on the empirical coefficients approach and run on the instrument's built-in software: one was provided by Bruker with the instrument, whereas the other is a customized calibration that we built following the Bruker procedure. The third calibration was performed off-line by processing the spectra with the fundamental parameters software PyMca (version 5.9.2) [28].

Results
The XRF spectrum of the certified reference material B21 is shown in Figure 1 as an example of the spectra acquired on the 26 standards. The spectrum shows the characteristic lines of the alloy's constituent elements. The R squared and Root Mean Squared Error (RMSE) values obtained from Linear Regression Analysis for the elements present in copper-based cultural heritage alloys are summarized in Table 1. The PyMca and Customized Bruker calibrations are more accurate than the Bruker Built-in calibration, with R squared values closer to 1 and lower RMSE values. Specifically, for elements such as Mn, Fe, Ni, and As, the PyMca approach is the most accurate, while for elements such as Ag, Cd, Pb, and Bi, the Customized Bruker method gives the best results. Finally, for elements such as Co, Zn, Ag, Sn, and Sb, both methods yield an accurate calibration.
Figures 2 and 3 show the quantitative results obtained with the three calibrations for tin (Sn) and antimony (Sb), high-Z elements that are representative of, and relevant to, the study of copper materials. The graphs indicate that the linear regression line of the Bruker Built-in Calibration deviates from the bisector (the line of perfect agreement between calibrated and nominal values). This reflects systematically positive deviations from the nominal concentrations, that is, poor calibration performance. In contrast, the calibrated Sn and Sb concentrations obtained with both the Off-Line PyMca and the Customized Bruker calibrations show minimal deviations. Figure 4 compares the three calibration procedures through the Linear Regression Analysis of iron (Fe), a low-Z element. Here, iron is considered a low-Z element because it is among the lightest elements detectable in copper materials with our XRF configuration. Again, the Built-in Calibration performs poorly, while the other two calibrations give more reliable quantitative results. The same trend appears in Figure 5, which compares the RMSE values of the three calibration procedures for all elements present in the copper-based alloys: the Built-in Calibration has the highest RMSE for every element, while the other two calibrations have lower RMSE values that are comparable to each other.
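The deviation of a regression line from the bisector, used above to diagnose systematic bias, can be quantified directly. A minimal sketch in Python with NumPy, assuming paired arrays of nominal and calibrated concentrations for one element; the function name and the example values are hypothetical.

```python
import numpy as np

def bisector_diagnostics(nominal, calibrated):
    """Compare calibrated with nominal concentrations against the ideal line y = x."""
    x = np.asarray(nominal, dtype=float)
    y = np.asarray(calibrated, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)    # least-squares line y = slope * x + intercept
    deviation = y - x                         # signed deviation from the bisector
    return {
        "slope": float(slope),                # 1.0 for a perfect calibration
        "intercept": float(intercept),        # 0.0 for a perfect calibration
        "mean_bias": float(deviation.mean()), # > 0 means systematic overestimation
        "mean_relative_bias": float((deviation / x).mean()),  # requires nominal > 0
    }

# Hypothetical values (wt%): a calibration that reads 10% high plus an offset of 0.2
demo = bisector_diagnostics([1.0, 2.0, 4.0, 8.0], [1.3, 2.4, 4.6, 9.0])
```

A positive mean bias together with a slope above 1 corresponds to the systematically positive deviations described for the Built-in Calibration.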

Discussion
As Figures 2 and 3 show, the poor fit of the built-in calibration for the high-Z elements tin (Sn) and antimony (Sb) on the Tracer 5g can be attributed to the fixed "copper alloy" set-up, which uses a low tube voltage of 15 kV (see Table 3). At this voltage, only the L-lines of Sn and Sb were used for the quantitative analysis. However, the L-line families of these two elements lie in the same energy range, and their overlap makes accurate quantification difficult. For the other two calibration approaches, Off-Line PyMca and Customized Bruker, we instead used a higher voltage to excite the K-lines of Ag, Sn, and Sb. As the results show, this optimization improved calibration accuracy.
Regardless of the voltage, some overlaps among low-Z elements cannot be avoided. A typical example is iron (Fe) (Figure 4). The smaller deviations in the calibrated iron concentrations obtained with both Off-Line PyMca and Customized Bruker, relative to the Built-in Calibration, arise because we could optimize the results for that element separately during the calibration procedure. Both the external and the internal software (PyMca version 5.9.2 and Bruker EasyCal) let the user isolate the contribution of a particular element and improve its quantitative result through inter-element corrections. For example, because Pb Lα1 overlaps As Kα1, we used Pb Lβ1 to calibrate lead and As Kβ1 to calibrate arsenic. The fixed Bruker Built-in set-up, by contrast, does not allow individual elements to be optimized. Figure 4 and Table 1 clearly show the resulting difference in accuracy: the R squared value for iron (Fe) is 0.86 with the Bruker Built-in Calibration, compared with 0.99 (PyMca) and 0.98 (Customized Bruker). The lower accuracy observed with the Built-in Calibration for all elements present in copper-based alloys (Figure 5) can be explained by its use of general-purpose standards in the calibration procedure, rather than specific standards with compositions similar to cultural heritage copper alloys.
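The two arguments above, that a 15 kV tube voltage cannot excite the K-lines of Sn and Sb and that certain line pairs overlap, can be checked with a small lookup. A minimal Python sketch, assuming rounded tabulated line and absorption-edge energies (verify them against a reference X-ray database before use) and a hypothetical detector resolution of 150 eV FWHM. Treating two lines as overlapping when they are closer than one FWHM is a simplifying assumption, not the criterion used by PyMca or EasyCal.

```python
# Approximate emission energy (keV) and the absorption edge (keV) that must be
# exceeded to excite each line. Rounded tabulated values, for illustration only.
LINES = {
    "As Ka1": (10.544, 11.867),  # K edge
    "As Kb1": (11.726, 11.867),  # K edge
    "Pb La1": (10.551, 13.035),  # L3 edge
    "Pb Lb1": (12.614, 15.200),  # L2 edge
    "Sn La1": (3.444, 3.929),    # L3 edge
    "Sn Lb1": (3.663, 4.156),    # L2 edge
    "Sn Ka1": (25.271, 29.200),  # K edge
    "Sb La1": (3.605, 4.132),    # L3 edge
    "Sb Ka1": (26.359, 30.491),  # K edge
}

def excited_lines(tube_kv):
    """Lines whose absorption edge lies below the maximum photon energy (tube kV)."""
    return sorted(name for name, (_, edge) in LINES.items() if edge < tube_kv)

def interferences(line, fwhm_kev=0.15, tube_kv=50.0):
    """Excited lines of other elements closer to `line` than one (hypothetical) FWHM."""
    energy = LINES[line][0]
    element = line.split()[0]
    return sorted(
        other for other in excited_lines(tube_kv)
        if other.split()[0] != element          # ignore lines of the same element
        and abs(LINES[other][0] - energy) < fwhm_kev
    )
```

Under these assumptions, excited_lines(15) contains no K-line of Sn or Sb, interferences("Sb La1", tube_kv=15) flags Sn Lβ1, interferences("As Ka1") flags Pb Lα1, and As Kβ1 and Pb Lβ1 are clear of each other, consistent with the line choice described above.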
Non-specific, modern reference materials do not fully represent the minor elements commonly present as impurities in the copper alloys of historical artefacts. These elements are either absent or present at lower levels than those typically observed in archaeological copper artefacts, especially Fe, As, Bi, Ag, and Sb. This highlights the importance of using appropriate standards for the XRF analysis of historical copper alloys [27]. For this reason, the other two calibrations used a set of certified standards whose compositions are representative of heritage copper-based artefacts.

Reference Materials
A group of 26 certified reference materials (CRMs), compositionally significant for heritage copper-based artefacts, was used for the experiments (see Table 2). Some of the references belong to the CHARM Set [27], while the others were added to extend the composition range towards that of ancient copper alloys. For each standard, the uncertainty values correspond to the 95% confidence interval, and our calibrations used the certified values of the following elements: Mn, Fe, Co, Ni, Zn, As, Ag, Cd, Sn, Sb, Pb, and Bi. Most standards were disks with diameters of 28 to 55 mm and thicknesses of 5 to 17 mm. Moreover, the standards were not layered but compositionally uniform, and the individual suppliers guaranteed their homogeneity through statistical assessment and testing. According to Porcinai et al. [29], the subsurface layer that provides 99% of the maximum fluorescence intensity of the Sb Kα line (the most energetic fluorescence line present in the copper material) is about 210 microns thick. Since the references are several millimeters thick, the measurements can be assumed to be made under infinite-thickness conditions. For each calibration procedure, 10 series of measurements were performed on the whole group of standards, so the data set for each method consisted of 260 records.
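The infinite-thickness argument can be made explicit. For a homogeneous sample and monochromatic excitation, the fluorescence intensity from a layer of thickness t grows as I(t) = I_inf [1 - exp(-chi rho t)], with chi = mu(E0)/sin(psi1) + mu(Ei)/sin(psi2), so the depth that supplies a fraction f of the saturation intensity is t_f = -ln(1 - f)/(chi rho). A minimal Python sketch; the chi and density values in the example are hypothetical placeholders, not the values used by Porcinai et al. [29].

```python
import math

def depth_for_fraction_um(chi_cm2_per_g, density_g_cm3, fraction=0.99):
    """Thickness (micrometres) that yields `fraction` of the infinite-thickness intensity.

    chi_cm2_per_g: mu(E0)/sin(psi1) + mu(Ei)/sin(psi2), mass attenuation in cm^2/g.
    """
    linear_attenuation = chi_cm2_per_g * density_g_cm3   # 1/cm
    depth_cm = -math.log(1.0 - fraction) / linear_attenuation
    return depth_cm * 1e4                                 # cm -> micrometres

def is_infinitely_thick(sample_thickness_mm, saturation_depth_um):
    """True if the sample is at least as thick as the saturation depth."""
    return sample_thickness_mm * 1000.0 >= saturation_depth_um

# Hypothetical inputs: chi = 10 cm^2/g, density = 8.9 g/cm^3 (roughly that of copper)
demo_depth = depth_for_fraction_um(10.0, 8.9)
# With the ~210 micron estimate from [29], even the thinnest (5 mm) standard qualifies
thinnest_ok = is_infinitely_thick(5.0, 210.0)
```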

Off-Line PyMca Calibration
For the first calibration procedure, we used the fundamental parameters software PyMca [28], a user-friendly X-ray fluorescence analysis program developed at the European Synchrotron Radiation Facility and therefore independent of the instrument manufacturer. Before starting the experiments, we carried out a study to find the optimal voltage, current, and primary filtration for the Tracer 5g. We then performed 10 series of measurements on the reference materials and collected the spectra for off-line processing and quantification with PyMca. After quantification, the results were calibrated by weighted least squares (WLS) linear regression. Heginbotham et al. [30] have described a protocol for quantifying heritage copper alloys by ED-XRF with PyMca. We, however, followed a modified method described in a previous study by Konstantakopoulou et al. [31], which reports the optimal conditions for the Tracer 5g and the detailed steps of the calibration procedure.
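The weighted linear regression step can be sketched as follows. The sketch assumes that the fit maps PyMca-quantified concentrations onto certified values and that each standard is weighted by 1/sigma^2, where sigma is an uncertainty such as the certified one or the spread of the 10 replicate series; both choices are assumptions, since the exact weighting is described in [31] rather than here. Python with NumPy; names and data are hypothetical.

```python
import numpy as np

def weighted_line_fit(measured, certified, sigma):
    """Weighted least squares for certified = a + b * measured, weights 1 / sigma**2."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(certified, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    X = np.column_stack([np.ones_like(x), x])   # columns: intercept, slope
    XtW = X.T * w                               # scales each standard's column by its weight
    a, b = np.linalg.solve(XtW @ X, XtW @ y)    # weighted normal equations
    return float(a), float(b)

def apply_calibration(measured, a, b):
    """Convert new PyMca concentrations with the fitted line."""
    return a + b * np.asarray(measured, dtype=float)

# Hypothetical example for one element (wt%)
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([2.6, 4.4, 6.7, 8.3, 10.6])
s_demo = np.array([0.1, 0.2, 0.1, 0.3, 0.2])
a_demo, b_demo = weighted_line_fit(x_demo, y_demo, s_demo)
```

The same fit can be cross-checked with np.polyfit(x, y, 1, w=1/sigma); NumPy's w multiplies the residuals, so it takes 1/sigma rather than 1/sigma^2.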

Customized Bruker Calibration
Bruker provides the EasyCal software and a detailed procedure for creating customized calibrations on the Tracer 5g based on the empirical coefficients method. Following the manufacturer's procedure, before starting the experiments we had to choose the standards and define the optimal excitation parameters. For this purpose, we used the same working conditions as for the Off-Line PyMca Calibration. With the EasyCal toolbox, we could not only set up the calibration curves but also refine them element by element with respect to several parameters, such as line overlaps and standard deviation.
After completing all the required steps, we installed the Customized Bruker calibration on the Tracer 5g, where it was ready for use. We performed 10 series of measurements on the reference materials, and the quantitative results of the customized calibration were displayed in real time on the device screen.

Bruker Built-In Calibration
The Tracer 5g includes several built-in calibrations for a variety of materials. These calibrations are based on empirical coefficients, are pre-installed by Bruker, and are made available through the instrument's remote control. For this study, we chose the one named "Copper alloys".
Unlike the previous two methods, the Bruker Built-in Calibration uses predetermined working conditions on the Tracer 5g that cannot be changed, and the operator has no access to the details of the calibration procedure performed by the manufacturer. Given these limitations, the only step we performed in this case was 10 series of measurements on the reference materials. The elemental quantitative results were shown directly on the display without further processing.

Statistical Techniques
To compare the quantitative results obtained with the three calibration procedures, we performed Linear Regression Analysis, using the nominal elemental concentration as the independent variable and the calibrated concentration as the dependent variable. The R squared (R²) value of each calibration is a direct and reliable way to identify the most appropriate model for each element. As a measure of accuracy, we used the Root Mean Squared Error (RMSE), which is the standard deviation of the linear regression residuals.
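The two figures of merit can be computed as follows. A minimal sketch in Python with NumPy; the RMSE divides the sum of squared residuals by n, which is an assumption, since the text does not state whether n or n - 2 degrees of freedom were used. The example values are hypothetical.

```python
import numpy as np

def regression_metrics(nominal, calibrated):
    """R^2 and RMSE of the least-squares line calibrated = a + b * nominal."""
    x = np.asarray(nominal, dtype=float)
    y = np.asarray(calibrated, dtype=float)
    b, a = np.polyfit(x, y, 1)                       # polyfit returns [slope, intercept]
    residuals = y - (a + b * x)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    rmse = float(np.sqrt(np.mean(residuals ** 2)))   # divides by n (assumption)
    return r2, rmse

# Hypothetical nominal vs calibrated values for one element
r2_demo, rmse_demo = regression_metrics([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 1.0, 3.0])
```

R² measures how closely the points follow a straight line, not whether that line is the bisector, so it is best read together with the deviation of the fitted line from y = x.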

Instrumentation
This work uses the commercial hand-held XRF device "Bruker Tracer 5g". It has a rhodium-target X-ray tube with a maximum voltage of 50 kV and a maximum power of 4 W, and the tube current is automatically limited depending on the voltage. The Tracer 5g also has a silicon drift detector (SDD) and is equipped with 2 collimators (3 and 8 mm) and 4 filters.
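The voltage-dependent current limit follows, at least approximately, from the fixed power budget P = V × I. A minimal Python sketch, assuming the limit is set only by the 4 W maximum power plus an optional hardware ceiling; the ceiling value is hypothetical, and the instrument's actual firmware limits may differ.

```python
def max_tube_current_ua(voltage_kv, max_power_w=4.0, current_cap_ua=None):
    """Largest tube current (microamperes) compatible with a power limit P = V * I.

    current_cap_ua: optional hypothetical hardware ceiling; None disables it.
    """
    current_ua = max_power_w / (voltage_kv * 1e3) * 1e6   # amperes -> microamperes
    if current_cap_ua is not None:
        current_ua = min(current_ua, current_cap_ua)
    return current_ua

# 50 kV with 4 W allows about 80 uA; at 15 kV the power budget alone would allow
# about 267 uA, so a (hypothetical) 200 uA ceiling becomes the binding limit.
i_50 = max_tube_current_ua(50.0)
i_15 = max_tube_current_ua(15.0, current_cap_ua=200.0)
```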
A collimator with a 3 mm beam spot was selected for all calibrations, which we considered necessary because copper-based artefacts are often inhomogeneous owing to their patina. The working conditions of each calibration procedure are summarized in Table 3.

Conclusions
We have observed that the quantitative data obtained with the Bruker Built-in Calibration for "Copper alloys" deviate significantly from the nominal values, indicating poor performance. We have also shown that the Customized Bruker and PyMca calibrations provide more accurate results for both high- and low-Z elements. These calibrations can be combined to quantify copper-based artefacts: the Customized Bruker Calibration is useful during field acquisition because it gives quantitative results immediately, while the PyMca Calibration can be used for more detailed spectrum processing. Therefore, selecting reference materials similar to archaeological copper artefacts, choosing the correct measurement set-up, and adopting a quality-assured calibration can help obtain reliable quantitative XRF data. By using the approach described here to evaluate quantitative XRF data on copper-based artefacts with hand-held XRF devices, operators can avoid a simplistic, automatic use of the device as a "black box" and prevent inconsistencies in the quantitative data.

Figure 1. XRF spectrum of the certified reference material B21 acquired with the Bruker Tracer 5g. The characteristic emission lines are shown on a logarithmic scale.

Figure 2. Linear Regression Analysis on Sn and comparison of the three calibration procedures.

Figure 3. Linear Regression Analysis on Sb and comparison of the three calibration procedures.

Figure 4. Linear Regression Analysis on Fe and comparison of the three calibration procedures.

Figure 5. RMSE values obtained for all elements present in copper-based alloys.

Table 1. R² and RMSE values obtained from Linear Regression Analysis.

Table 3. Bruker Tracer 5g working conditions for each calibration procedure.