How to Assess the Measurement Performance of Mobile/Wearable Point-of-Care Testing Devices? A Systematic Review Addressing Sweat Analysis

Recent advances in technologies for biosensor integration in mobile or wearable devices have highlighted the need to define proper validation procedures and technical standards that enable testing, verification, and validation of the overall performance of these solutions. Reliable assessment, in terms of limits of detection/quantitation, linearity, range, analytical and diagnostic sensitivity/specificity, accuracy, repeatability, reproducibility, cross-reactivity, diagnostic efficiency, and positive/negative prediction, therefore still represents the most critical and challenging step required to progress beyond the status of feasibility studies. In this context, this work reviews and discusses the literature on the methods and criteria reported for assessing the performance of point-of-care testing (PoCT) devices within their specific applications. In particular, without loss of generality, we focused on mobile or wearable systems able to analyze human sweat. The review focuses on the main challenges and trends underlined in the literature, in order to provide specific hints that can be used to set shared procedures and improve the overall reliability of the identified solutions, addressing the importance of sample management, the sensing components, and the electronics. This review can contribute to supporting an effective validation of mobile or wearable PoCT devices and thus to spreading the use of reliable approaches outside hospitals and clinical laboratories.


Introduction
Recent advances in nanotechnologies, microfluidics, printed electronics, additive manufacturing, and biomaterials have significantly improved the integration of sensing elements within wearable devices for point-of-care testing (PoCT) applications [1]. These technological developments have enabled the design of complete integrated systems that include not only the biosensing components but also the sample treatment and distribution solutions and the overall electronics [2]. Such efficient integration allows these devices to work in standalone mode and to perform all the steps required for reliable detection of specific analytes, including sample collection and result interpretation [3]. This novel and continuously evolving approach appears particularly promising for on-site, (almost) real-time feedback on various biomarkers (i.e., any measurable indicator related to the health status of a subject) via non-invasive approaches. One of its main advantages is the possibility of performing all these analyses without accessing health facilities, thus promoting home-based and decentralized modalities [4]. Of course, several requirements, related to both functions and applications, must be satisfied for these tools to become a solid and reliable method accepted by both the medical-scientific community and end users [5]. In particular:
• Optimal levels of specificity and sensitivity are needed for the early detection of specific disease-related markers [6].
• Repeatability and reproducibility of the methods ensure reliable results, which must not depend on the users/testers or on the environmental conditions.
• Stability over time is required to ensure reliable outcomes even when multiple tests must be performed over many years, as happens, for example, in the monitoring of chronic pathologies.
• Usability aspects (e.g., invasiveness and complexity of the procedures for sample extraction and preparation, user interface, etc.) must be addressed to ensure wide acceptance of the technology, together with the overall direct and indirect costs of the proposed solutions, which can be assessed in terms of general cost-effectiveness.
The literature search identified 255 papers published from 2010 to 2021 using the search string reported above. After screening, conference papers and review articles (51) were excluded from the analysis, as were 84 further manuscripts that focused on the characterization of novel materials or reported basic research applications without metrological information in the abstract. This first selection yielded a panel of 120 manuscripts that were well characterized from a metrological point of view, but only a few of them also reported on-body testing. Thus, 15 manuscripts were finally identified as the most representative for discussing validation procedures and technical standards for practical on-body testing in the near future. Based on an in-depth analysis of these papers, the following sections address the most relevant standard parameters for validation, the validation procedures, and the electronic design considerations. Finally, an overall discussion is provided on the main opportunities and challenges highlighted in the comparative analysis.

Standard Parameters for Validation
The review of the scientific literature made it possible to identify the main metrics used to validate mobile or wearable PoCT devices.

Sensitivity
The sensitivity of electrochemical biosensors often refers to the slope of the calibration curve obtained by measuring the sensor response at different concentrations of the target analyte. It is usually expressed as the ratio between an electrical quantity and the analyte concentration, and it is one of the most relevant indicators of biosensor performance. As the sensitivity increases, so does the ability to discriminate between very similar concentrations. Since non-specific binding is one of the main issues in electrochemistry, guaranteeing optimal performance and the detection of low analyte concentrations requires high specificity combined with a low background signal from the blank or matrix sample and low intrinsic variability of the sensing device in the blank.
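As a minimal sketch of how the sensitivity can be extracted from calibration data, assuming an ordinary least-squares straight-line fit of response versus concentration, the function below returns the slope (sensitivity) and intercept. The name `calibration_slope` and the concentration/current values are hypothetical and chosen only for illustration.

```python
def calibration_slope(concentrations, responses):
    """Least-squares fit of response = slope * concentration + intercept.

    The slope is the sensitivity (e.g., uA/mM for an amperometric sensor).
    """
    n = len(concentrations)
    if n != len(responses) or n < 2:
        raise ValueError("need at least two paired calibration points")
    mean_c = sum(concentrations) / n
    mean_r = sum(responses) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    if sxx == 0:
        raise ValueError("concentrations must not all be equal")
    sxy = sum((c - mean_c) * (r - mean_r) for c, r in zip(concentrations, responses))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_c
    return slope, intercept


# Hypothetical amperometric calibration: concentration (mM) vs. current (uA)
conc_mM = [0.0, 0.1, 0.2, 0.4, 0.8]
current_uA = [0.05, 0.55, 1.05, 2.05, 4.05]
sensitivity, offset = calibration_slope(conc_mM, current_uA)
print(f"sensitivity = {sensitivity:.2f} uA/mM, intercept = {offset:.2f} uA")
```

For potentiometric ion-selective sensors, the same fit is usually applied to the base-10 logarithm of the concentration, so that the sensitivity is expressed in mV per decade.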

Limit of Detection and Limit of Quantification
The limit of detection, often referred to as LOD or LLOD (lower limit of detection), is one of the most widely adopted quantities for characterizing the performance of a biosensor and comparing different solutions. The LOD expresses the lowest quantity of target analyte that can be distinguished from the absence of that substance (a blank value) with a stated confidence level (generally 99%). It is estimated from the mean of the blank, the standard deviation (SD) of the blank, the slope (analytical sensitivity) of the calibration plot, and a defined confidence factor (usually 3SD) [27,28]. It can also be considered a statistically derived indicator of the resolution of the system, since it takes into account the contributions of both uncertainty and resolution [27]. Alongside the LOD or LLOD, which concerns the smallest detectable quantity, the upper limit of detection (ULOD) is often considered, i.e., the greatest quantity that can be quantified. This is particularly useful for biosensors involving selective membranes or enzymes that can become saturated above a certain analyte concentration [29]. Another metrological parameter associated with the LOD is the limit of quantification (LOQ), which is useful whenever the testing environment may reduce reliability through the significant influence of surrounding variables. The LOQ is usually computed following the same protocol but with a simple "rule of thumb", such as ten times the SD [30] or three times the LOD [27].
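The procedure above can be sketched as follows, assuming the widely used concentration-domain formulas LOD = 3·SD_blank/slope and LOQ = 10·SD_blank/slope, with SD computed as the sample standard deviation of replicate blank measurements. The function name `detection_limits` and the blank values are hypothetical.

```python
import statistics


def detection_limits(blank_signals, slope, k_lod=3.0, k_loq=10.0):
    """Return (LOD, LOQ, signal threshold) from replicate blank measurements.

    LOD = k_lod * SD_blank / |slope| and LOQ = k_loq * SD_blank / |slope|,
    both in concentration units; the threshold mean_blank + k_lod * SD_blank
    expresses the same detection criterion in signal units.
    """
    if len(blank_signals) < 2:
        raise ValueError("need replicate blank measurements")
    mean_blank = statistics.mean(blank_signals)
    sd_blank = statistics.stdev(blank_signals)  # sample SD (n - 1 denominator)
    lod = k_lod * sd_blank / abs(slope)
    loq = k_loq * sd_blank / abs(slope)
    threshold = mean_blank + k_lod * sd_blank
    return lod, loq, threshold


# Hypothetical blank replicates (uA) and a sensitivity of 5 uA/mM
blanks_uA = [0.04, 0.06, 0.05, 0.05, 0.06, 0.04]
lod, loq, threshold = detection_limits(blanks_uA, slope=5.0)
print(f"LOD = {lod:.4f} mM, LOQ = {loq:.4f} mM, signal threshold = {threshold:.3f} uA")
```

Passing `k_loq=9.0` reproduces the alternative "three times the LOD" rule of thumb [27].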

Error Analysis and Accuracy
Accuracy in a biosensor is computed as the maximum divergence from the most reliable "gold standard" in terms of assay output. It is usually expressed as a percentage of error of the output from the validated biosensor with respect to the output of traditionally accepted laboratory equipment. For example, pH measurement accuracy is often provided according to pH-meter output, flow rate measurement accuracy is provided according to optical measurement [31], while the gold standards for ion quantification are typically atomic absorption spectrophotometer (AAS) and ion chromatography (IC) [29]. Further, results from biosensors for metabolites such as glucose or lactate are usually compared with the outputs from commercially available strips [32], and the results of alcohol quantification are compared with a commercial FDA-approved breath analyzer [32]. This comparison is a fundamental step for validating and confirming the reliability of a new method, taking as reference already commercialized technologies.
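A minimal sketch of the percentage-error comparison described above, assuming paired readings of the same samples by the device under test and by the reference method (reference values must be non-zero). The function names and the sodium values are hypothetical.

```python
def percent_errors(device, reference):
    """Relative error (%) of each device reading against the paired reference reading."""
    if len(device) != len(reference):
        raise ValueError("readings must be paired one-to-one")
    return [100.0 * (d - r) / r for d, r in zip(device, reference)]


def max_abs_percent_error(device, reference):
    """Largest absolute relative deviation (%) from the reference method."""
    return max(abs(e) for e in percent_errors(device, reference))


# Hypothetical sweat sodium readings (mM): wearable sensor vs. reference assay (e.g., AAS or IC)
sensor_mM = [42.0, 61.5, 88.0]
reference_mM = [40.0, 60.0, 90.0]
print(percent_errors(sensor_mM, reference_mM))
print(max_abs_percent_error(sensor_mM, reference_mM))
```

Reporting the maximum absolute error matches the "maximum divergence" definition used above; the signed errors are also kept because they reveal systematic over- or underestimation.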

Selectivity
Selectivity represents the ability of the biosensor to respond specifically to changes in a given analyte while minimizing cross-sensitivity, thus detecting that analyte in a sample containing a mixture of other analytes and contaminants. An ideal biosensor should show an output variation only in response to a variation in target analyte concentration, with no output variation for a change in any interfering substance. Therefore, the selectivity of a specific biosensor can be quantitatively expressed as a simple dimensionless number, usually a percentage referring to the variation in the output when an interferent is added to a target analyte solution at equilibrium. Its quantification is useful for comparing different sensors and different interfering effects, as in [33], where the authors computed the percentage variation in sensor response caused by potential interferents, or in terms of cross-sensitivity, i.e., the slope of the calibration curve obtained when the biosensor is exposed to different concentrations of an interferent [34]. Selectivity is often considered qualitatively good when no interfering peaks or additional contributions to the output are recorded in the presence of interfering analytes [35].
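The cross-sensitivity approach can be sketched as below. The text defines cross-sensitivity as the slope obtained with an interferent; normalizing that slope by the target-analyte slope, to obtain a dimensionless percentage, is an assumption made here for illustration. Function names and data are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y versus x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def relative_cross_sensitivity(target_conc, target_resp, interf_conc, interf_resp):
    """Interferent calibration slope as a percentage of the target calibration slope."""
    return 100.0 * ols_slope(interf_conc, interf_resp) / ols_slope(target_conc, target_resp)


# Hypothetical calibrations of the same sensor (concentrations in mM, responses in nA)
target_conc = [1.0, 2.0, 4.0, 8.0]
target_resp = [10.0, 20.0, 40.0, 80.0]
interf_conc = [1.0, 2.0, 4.0, 8.0]
interf_resp = [3.2, 3.4, 3.8, 4.6]
xs = relative_cross_sensitivity(target_conc, target_resp, interf_conc, interf_resp)
print(f"relative cross-sensitivity = {xs:.1f} %")
```

A value close to 0% indicates that the interferent barely changes the output compared with the target analyte.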

Repeatability and Reproducibility
Repeatability and reproducibility are relevant parameters for characterizing biosensor performance, since they account for the variability associated with the measurements, which can be intrinsic (due to fabrication variability) or extrinsic (due to variability in the measurement protocol). Both are usually expressed as the percentage relative standard deviation (RSD), i.e., the standard deviation of the measurements divided by their mean.
Repeatability is the ability to respond with limited variability when identical input stimuli are applied under the same working conditions and measurement setup. Protocols for evaluating repeatability must account for the changes in transducer properties introduced by each measurement, applying a suitable procedure to restore the same condition before each measurement used in the calculation [36,37].
Reproducibility refers to the ability to respond with limited variability when the same input stimuli are applied under different working conditions (different operators, different instrumentations, different setups).
During the biosensor validation phase, repeatability is the most frequently evaluated parameter since repeated measurements with the same instrumentation are performed on the same sensors or sensors from the same batch treated with the same functionalization protocol. When replicated measurements require a duplicated experimental setup then it is more correct to refer to reproducibility. This is the case, for example, when sensors are integrated into a paper-based or microfluidic setup. When two measurements are performed using the same sensors integrated into different setups, the variability among them, indicated in terms of standard deviation, is referred to as reproducibility. In other cases, repeatability refers to measurements made with a single biodevice, using a solution with the same analyte concentration, while reproducibility refers to the measurements provided by different sensors, such as in [38].
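Following the second convention described above (repeatability from one device measured repeatedly, reproducibility across different sensors), both quantities reduce to the same RSD computation on different groupings of data. A minimal sketch, with hypothetical current values at one fixed analyte concentration:

```python
import statistics


def rsd_percent(values):
    """Percentage relative standard deviation: sample SD divided by mean, times 100."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical currents (uA) at one fixed analyte concentration
same_sensor_runs = [10.2, 10.0, 9.9, 10.1, 9.8]   # one sensor, repeated runs -> repeatability
one_run_per_sensor = [10.4, 9.6, 10.1, 9.9]        # different sensors, same batch -> reproducibility
print(f"repeatability RSD = {rsd_percent(same_sensor_runs):.2f} %")
print(f"reproducibility RSD = {rsd_percent(one_run_per_sensor):.2f} %")
```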
The main disadvantage of poorly reproducible sensors is that they need frequent recalibration. Calibration is a necessary procedure that compensates both for sensor-to-sensor fabrication variations, thus improving sensor accuracy, and for sensor drift, which is mainly due to sensor instability. Calibration represents one of the main problems limiting the use of biosensors; therefore, the calibration information for a given wearable biosensor should be specified in its datasheet.

Stability
Stability represents the degree of susceptibility to environmental disturbances and other factors that could act in and/or around the biosensing system. The output of an unstable biosensor typically presents a drift that degrades the quality of the measurement information. Stability is therefore an important parameter in applications involving continuous monitoring of the measurand. Two different approaches can be used to evaluate stability in biosensors for wearable devices. The first is to evaluate the biosensor while it undergoes a continuous, long, and constant stimulus (e.g., a fixed analyte concentration). The second is to perform discrete measurements of the same sample concentration at specific time points. In both cases, stability can be quantified by computing the maximum difference between the starting and ending response of the sensor (in absolute terms or as a percentage of the initial measurement) [32], or by computing the rate of decrease or increase, expressed as the variation of the electrical quantity per unit of time [29].
In more detail, depending on the considered time period, "stability" can refer to (1) operational stability or (2) shelf/storage stability. In general, operational stability is evaluated over short periods of time (typically hours) to estimate the changes that might take place during a single phase of continuous measurement. Shelf/storage stability is evaluated over longer periods (typically months), during which periodically repeated measurements are performed to check for any change in the quantitative outcomes [39].
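The two stability metrics described in this section (start-to-end change and drift rate) can be computed from either a continuous recording or discrete time points. A minimal sketch, assuming the drift rate is taken as the least-squares slope of response versus time; the function name and the data are hypothetical.

```python
def stability_metrics(times_h, responses):
    """Return (absolute change, percent change, drift rate) for a stability test.

    The changes compare the last response with the first one; the drift rate is the
    least-squares slope of response versus time (response units per hour).
    """
    n = len(times_h)
    if n != len(responses) or n < 2:
        raise ValueError("need at least two paired time points")
    start, end = responses[0], responses[-1]
    abs_change = end - start
    pct_change = 100.0 * abs_change / start
    mt, mr = sum(times_h) / n, sum(responses) / n
    stt = sum((t - mt) ** 2 for t in times_h)
    str_cov = sum((t - mt) * (r - mr) for t, r in zip(times_h, responses))
    return abs_change, pct_change, str_cov / stt


# Hypothetical operational-stability test: potential (mV) at a fixed concentration, read hourly
hours = [0, 1, 2, 3, 4]
potential_mV = [100.0, 99.0, 98.0, 97.0, 96.0]
change, change_pct, drift = stability_metrics(hours, potential_mV)
print(f"change = {change:.1f} mV ({change_pct:.1f} %), drift = {drift:.2f} mV/h")
```

For shelf/storage stability the same function applies with time expressed in days or months.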

Linear Working Range and Linearity
Together with LOD and sensitivity, further relevant parameters that are extracted from the calibration curve are the limits of the linear working range and the linearity of the curve within those limits.
Defining the linear working range consists of identifying the lower and upper analyte concentrations between which the calibration curve can be considered linear. Linearity is a quantitative indicator of how much the real curve deviates from the ideal regression line; it is usually expressed as the coefficient of determination, R² [29]. The linear range might be strongly affected by the medium in which the target analyte is dissolved; therefore, calibrations performed in standard solutions may give different results from those performed in artificial or real biofluids containing interfering molecules.
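A minimal sketch of how R² and the upper limit of the linear range could be computed together, assuming a simple criterion: starting from the lowest concentration, the range is extended point by point as long as R² stays above a chosen threshold. The 0.99 threshold, the minimum of three points, the function names, and the data (which saturate at the top) are all assumptions for illustration.

```python
def r_squared(x, y):
    """Coefficient of determination of the least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def upper_linear_limit(conc, resp, min_r2=0.99):
    """Highest concentration such that all points from the lowest one up to it
    give R^2 >= min_r2 (points must be sorted by concentration)."""
    best = None
    for k in range(3, len(conc) + 1):
        if r_squared(conc[:k], resp[:k]) >= min_r2:
            best = conc[k - 1]
        else:
            break
    return best


# Hypothetical calibration that saturates at the highest concentration
conc = [0, 1, 2, 3, 4, 5]
resp = [0.0, 2.0, 4.0, 6.0, 8.0, 8.5]
print(f"R^2 (all points) = {r_squared(conc, resp):.4f}")
print(f"upper limit of linear range = {upper_linear_limit(conc, resp)}")
```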

Response Time
The response time is a quantitative indicator of the time the biosensor needs to provide an output that can be reliably correlated with the concentration of the tested analyte. It is usually computed from a dynamic calibration, as the time needed for the output to reach a fixed percentage of the steady-state value. For traditional sensors, the response time is usually the time needed to go from 10% to 90% of the steady-state value, whereas in biosensor validation the meaning of response time varies considerably. In some cases, as in [29], the response time of potentiometric electrolyte sensors is calculated from the in vitro calibration as the time needed to go from the steady-state baseline at the lowest concentration to the steady-state value reached after a standard addition. In other cases, as in [31], the response time, calculated from both in vitro and in vivo measurements, is taken as the time between the injection of the target analyte and the moment the signal starts to rise or fall from its previous steady state. In [38], since the biosensor was characterized by its amperometric response, three different times were evaluated: (1) the time for baseline stabilization, (2) the time for the signal to appear after the injection of the analyte, and (3) the time to reach the steady-state condition. This analysis made it possible to determine the minimum time yielding the best correlation between the analyte concentration and the biodevice output.
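The traditional 10-90% definition mentioned above can be sketched as follows, assuming a sampled step response with known baseline and steady-state levels and taking the first sample that crosses each level (no interpolation). The function name and trace are hypothetical; the alternative definitions of [29], [31], and [38] would need different reference points.

```python
def rise_time_10_90(times, signal, baseline, steady_state):
    """Time taken by a step response to go from 10 % to 90 % of the
    baseline-to-steady-state change (first sample crossing each level,
    no interpolation). Works for both rising and falling steps."""
    span = steady_state - baseline
    if span == 0:
        raise ValueError("baseline and steady state must differ")
    t10 = t90 = None
    for t, s in zip(times, signal):
        fraction = (s - baseline) / span
        if t10 is None and fraction >= 0.1:
            t10 = t
        if t90 is None and fraction >= 0.9:
            t90 = t
            break
    if t10 is None or t90 is None:
        raise ValueError("signal never reaches the 10 % and 90 % levels")
    return t90 - t10


# Hypothetical step response sampled every 1 s after an analyte injection at t = 0
t_s = list(range(11))
signal_mV = [0.0, 0.0, 0.5, 2.0, 4.0, 6.0, 8.0, 9.2, 9.8, 10.0, 10.0]
print(rise_time_10_90(t_s, signal_mV, baseline=0.0, steady_state=10.0))
```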

Recovery Values
Considering the in vitro validation of electrochemical sensors, another important metrological parameter is the recovery rate. This metric is usually obtained by spiking a known amount of the target analyte into a sample matrix, measuring it with the novel technology, and calculating the deviation between the measured concentration and the known value [40]. A poor recovery suggests that the measurement is affected by the matrix effect, which occurs when differences between the sample matrix and the calibrator diluent affect the signal response [41].
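The recovery calculation can be sketched with the common spike-recovery formula, recovery = (spiked result - unspiked result) / added amount x 100; using this specific formula is an assumption, since the text only refers to the deviation between measured and known values. The function name and numbers are hypothetical. Values close to 100% indicate a negligible matrix effect.

```python
def spike_recovery_percent(measured_spiked, measured_unspiked, added):
    """Spike recovery (%) = (spiked result - unspiked result) / added amount * 100.

    Use measured_unspiked = 0 when the matrix contains no endogenous analyte.
    """
    if added <= 0:
        raise ValueError("added amount must be positive")
    return 100.0 * (measured_spiked - measured_unspiked) / added


# Hypothetical artificial-sweat sample (mM): endogenous level 0.10, spike of 0.50
recovery = spike_recovery_percent(measured_spiked=0.57, measured_unspiked=0.10, added=0.50)
print(f"recovery = {recovery:.1f} %")
```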

Validation Procedures
A further important step in the definition of the correct assessment standards is to highlight the possible validation procedures used in the context of PoCT devices, visually summarized in Figure 1.

Figure 1. … [42], copyright 2021 American Chemical Society; (c) example of a standard selectivity test for a K+ potentiometric sensor, reproduced with permission from [43], copyright 2020 Elsevier; (d) example of an operational stability test, reproduced with permission from [29], copyright 2020 Elsevier; (e) example of a typical setup for an exercise-induced sweat test, reproduced from [44] (open access); (f) example of a typical setup for an iontophoresis-induced test, reproduced with permission from [33], copyright 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

Laboratory Validation
The first stage of validation of any PoCT device is the analysis of the results obtained for the biosensor under controlled laboratory conditions. Since the aim of this phase is not to characterize the complete PoCT device during its real use, portability is not required, and certified commercial instrumentation serving as the "gold standard" is therefore often used to guarantee optimal accuracy, stability, resolution, and reproducibility of the measurement system. Moreover, these results are usually used as a reference to validate any customized mobile or wearable electronic solution, and thus to ensure the reliability of in vivo [38] measurements. This phase is fundamental for optimizing the detection protocol without the further variability introduced by on-body or home-based use of the complete PoCT device. All the information on these approaches acquired from the analyzed papers is summarized in Table 3. Furthermore, in the laboratory evaluation, all issues related to interferents need to be fully characterized in order to provide a controlled analysis of the effect of non-target analytes on the final output. The two main classes of laboratory validation are based on standard solutions and on artificial biofluids.

Validation Tests with Standard Solutions
The first stage in validating any electrochemical sensor included in a wearable device is the evaluation of its performance through an in vitro calibration with standard solutions prepared in a controlled environment and containing only the target analyte, without any interfering agents.
The concentration ranges used for sensor calibration depend on the physiological concentrations of the specific target analyte expected during real operation. For sensors intended for sweat analysis, standard solutions are usually obtained by diluting electrolytes in deionized water [29,31] or in a buffer containing a supporting electrolyte, such as phosphate-buffered saline (PBS) [32], potassium chloride [35], or acetate buffer [33]. The choice of the most suitable solvent depends mainly on three factors: the solubility of the analyte, the potential window of interest for detecting the analyte, and the material of the working and counter electrodes. In particular, a suitable solvent should produce no response in the tested potential window, to avoid interfering signals in the baseline. The potential range in which this holds is called the "stability window", and defining it is one of the most crucial operations for establishing the most suitable supporting electrolyte solution and for ensuring the reliability of any signal recorded in the presence of the analyte.
Two different protocols are usually employed to evaluate and calibrate the sensor response to changes in analyte concentration. The first is a standard calibration, often known as "single-point" or "static" calibration, in which as many sensors as there are tested concentrations are employed to build a calibration curve. Each sensor is exposed to one specific concentration, and its electrical response, in terms of potential or current, is associated with that concentration [33]. If several sensors are tested at the same concentration, the standard deviation of each calibration point can be calculated, giving a quantitative indication of the reproducibility of the measurement. The resulting calibration plot correlates the analyte concentration with the electrical response of the sensor and provides quantitative information in terms of sensitivity, LOD, LOQ, and RSD.
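For the static protocol described above, the per-concentration statistics can be sketched as follows, assuming each disposable sensor contributes one (concentration, response) pair. The function name and readings are hypothetical; the resulting means can then be fed to a straight-line fit to obtain the sensitivity.

```python
import statistics
from collections import defaultdict


def static_calibration_table(readings):
    """Group (concentration, response) pairs, one pair per sensor, by concentration.

    Returns a list of (concentration, mean, SD, RSD %) tuples sorted by concentration;
    SD and RSD are NaN when only one sensor was tested at a concentration.
    """
    groups = defaultdict(list)
    for conc, resp in readings:
        groups[conc].append(resp)
    table = []
    for conc in sorted(groups):
        values = groups[conc]
        mean = statistics.mean(values)
        sd = statistics.stdev(values) if len(values) > 1 else float("nan")
        table.append((conc, mean, sd, 100.0 * sd / mean))
    return table


# Hypothetical static calibration: each tuple is one disposable sensor (mM, uA)
readings = [(10, 0.98), (10, 1.02), (20, 2.0), (20, 2.1), (20, 1.9), (40, 4.0), (40, 4.2)]
table = static_calibration_table(readings)
for conc, mean, sd, rsd in table:
    print(f"{conc} mM: mean = {mean:.3f} uA, SD = {sd:.3f} uA, RSD = {rsd:.1f} %")
```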
The second protocol, also known as "continuous" or "dynamic" calibration, exposes the same sensor to successive steps of rising and falling concentration of the target analyte while its electrical output is measured. Since the times at which the concentration is changed are known exactly, the measured output signal provides useful information on the response time and on the duration of the transient and of the steady state. The value reached by the electrical output during the steady state before each successive injection is sampled and associated with the corresponding concentration. A calibration curve can thus be extracted from these data, similarly to a single-point calibration. Repeatability can be evaluated by applying the very same pattern of increasing and decreasing concentrations to multiple sensors. This protocol is essential for validating sensor performance over time.
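The sampling of steady-state values in a dynamic calibration can be sketched as below, assuming the injection times are known and the steady state is taken as the mean of the last few samples before the next step. The window length, function name, and trace are hypothetical choices for illustration.

```python
def steady_state_levels(times, signal, boundaries, window):
    """Average the signal over the last `window` time units of each concentration step.

    boundaries[i] and boundaries[i + 1] delimit the i-th step (injection times, with
    the end of the recording appended), so len(boundaries) = number of steps + 1.
    """
    levels = []
    for start, stop in zip(boundaries[:-1], boundaries[1:]):
        tail = [s for t, s in zip(times, signal) if max(start, stop - window) <= t < stop]
        if not tail:
            raise ValueError(f"no samples in the steady-state window ending at {stop}")
        levels.append(sum(tail) / len(tail))
    return levels


# Hypothetical trace sampled every 1 s, with concentration steps injected at t = 0, 4 and 8 s
times_s = list(range(12))
signal_nA = [0.0, 3.0, 5.0, 5.0, 7.0, 9.5, 10.0, 10.0, 12.0, 14.5, 15.0, 15.0]
levels = steady_state_levels(times_s, signal_nA, boundaries=[0, 4, 8, 12], window=2)
print(levels)
```

Pairing these levels with the known step concentrations gives the points of the dynamic calibration curve.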
This continuous validation can be performed with two different setups. The first, more traditionally employed in electrochemistry, is based on immersing the sensor in a beaker that initially contains only the supporting electrolyte solution, continuously stirred to ensure homogeneity. Before the additions begin, the sensor needs to stabilize in the supporting electrolyte solution for 2 to 10 min, depending on the geometry of the sensor and the amount of liquid. Each increasing concentration step is then obtained through standard additions of a highly concentrated solution, whereas each decreasing step, once the maximum concentration is reached, is obtained by adding further supporting electrolyte solution to dilute the sample. The amount of liquid and the stirring speed should be optimized, since they can affect the sensitivity of the resulting calibration curve.
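Because every standard addition or dilution changes the total volume in the beaker, the concentration of each step is best tracked with a simple mass balance. A minimal sketch, assuming ideal mixing and additive volumes; the function name `mix` and the protocol values are hypothetical.

```python
def mix(c_current, v_current, c_added, v_added):
    """Concentration and volume after mixing (ideal mixing, additive volumes).

    Use c_added > 0 for a standard addition of stock solution and c_added = 0
    for a dilution step with plain supporting electrolyte.
    """
    v_new = v_current + v_added
    return (c_current * v_current + c_added * v_added) / v_new, v_new


# Hypothetical protocol: 10 mL supporting electrolyte, 1000 mM stock (mM and mL throughout)
c, v = 0.0, 10.0
steps = [(1000.0, 0.1), (1000.0, 0.1), (1000.0, 0.2), (0.0, 10.0)]  # three additions, one dilution
for c_added, v_added in steps:
    c, v = mix(c, v, c_added, v_added)
    print(f"after adding {v_added} mL at {c_added} mM: {c:.3f} mM in {v:.1f} mL")
```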
The second setup, which more realistically resembles the real operating conditions of the sensors, is based on integrating the sensor within a dedicated microfluidic circuit. The essential elements of this circuit, which can be realized with various technologies, are: (1) a sample pad or wick to collect the sample; (2) a thin channel to continuously deliver the sample onto the electrodes; and (3) a waste reservoir to discard the already measured sample. The starting level is usually obtained by providing a continuous flow of supporting electrolyte solution over the sensing electrodes until a steady state is reached. Flows with increasing and decreasing concentrations are then delivered to the electrodes, and their response is measured. In this second setup, the flow velocity is the main variable affecting the response time and the sensitivity of the measurements. Thus, in dynamic calibrations performed with the sensor integrated into microfluidic circuits, particular attention is paid to optimizing the materials and geometries of the wick, channels, and waste reservoir in order to provide a continuous flow [31]. Furthermore, additional tests are often performed with the sensors integrated into standardized setups with controlled flows, in order to determine the influence of flow rate on sensor sensitivity [35].
The temporal stability of both setups can then be evaluated through long-term tests of the sensor response to a fixed concentration and flow rate, either continuously [29,31] or at discrete time points [32].

Validation Tests with Interfering Analytes
The second stage in validating electrochemical sensors included in mobile or wearable devices is the in vitro evaluation of their performance in the presence of controlled concentrations of interfering agents in addition to the target analyte. This approach makes it possible to estimate the selectivity of the biosensor under near-realistic working conditions. This analysis is usually referred to as an "interferent test" or "selectivity evaluation", since it evaluates the selectivity of the designed sensor toward the target analyte in the presence of other analytes (e.g., electrolytes, metabolites, or proteins) that are physiologically present in human sweat.
The performance of sensors in non-ideal conditions can be evaluated either by adding standard solutions with controlled concentrations of interfering analytes [32,33] or by using more complex multi-analyte solutions, known in this case as "artificial sweat" [29]. In the first case, a solution containing a fixed concentration of the target analyte is measured until a steady state is reached. Then, during the same measurement (e.g., potentiometric or amperometric), interferents of known volume and concentration are added. Since the time point at which the interferents were added is known, their effect on the measured steady-state current can be quantified. In these tests, the change in sensor response (i.e., potential level, current peak amplitude, or steady-state current) should be evaluated and quantified in absolute or percentage terms [33]. For this application, the interfering agents are usually chosen according to the physiological composition of human sweat and are mainly uric acid, ascorbic acid, glucose, lactate, and anions or cations other than the target ones.
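The quantification step of the interferent test can be sketched as follows, assuming the interferent addition time is known and the steady state before and after is estimated as a window mean, skipping a short settling period after the addition. Window lengths, the function name, and the trace are hypothetical.

```python
def interference_percent(times, signal, t_add, settle, window):
    """Percent change of the steady-state signal after an interferent is added at t_add.

    Before: mean of samples in [t_add - window, t_add).
    After:  mean of samples in [t_add + settle, t_add + settle + window),
            i.e., after skipping a settling period of length `settle`.
    """
    before = [s for t, s in zip(times, signal) if t_add - window <= t < t_add]
    after = [s for t, s in zip(times, signal)
             if t_add + settle <= t < t_add + settle + window]
    if not before or not after:
        raise ValueError("windows contain no samples")
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return 100.0 * (mean_after - mean_before) / mean_before


# Hypothetical amperometric trace (uA, 1 s sampling); interferent added at t = 10 s
times_s = list(range(20))
current_uA = [2.0] * 10 + [2.3] + [2.05] * 9
effect = interference_percent(times_s, current_uA, t_add=10, settle=2, window=5)
print(f"interference = {effect:.1f} %")
```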
Artificial sweat and sweat from control patients are some of the most useful fluids used directly as diluting buffer for the target analyte. The influence of interfering agents on the output of the sensor is usually evaluated considering different proportions between the target analyte and the interfering elements, to understand if there is a specific threshold above which they start to interfere. This can be obtained by employing different dilutions of the artificial sweat to change the concentration of interferents in the final solution [45].
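Once the percentage effect has been measured at several dilutions, the threshold above which interferents start to matter can be located with a simple scan. A minimal sketch, assuming a tolerance of 5% (an arbitrary illustrative value, not one given in the text); the function name and data are hypothetical.

```python
def interference_onset(levels, interference_pct, tolerance_pct=5.0):
    """Lowest interferent level whose absolute effect exceeds tolerance_pct, or None."""
    for level, pct in sorted(zip(levels, interference_pct)):
        if abs(pct) > tolerance_pct:
            return level
    return None


# Hypothetical: interferent strength as a fraction of full-strength artificial sweat,
# with the percent change in sensor response measured at each dilution
strength = [0.25, 0.5, 0.75, 1.0]
effect_pct = [0.8, 1.9, 5.6, 7.6]
print(interference_onset(strength, effect_pct))
```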

In Vivo/Clinical Validation
Once the laboratory validation of the biosensor has been completed, the next step is the validation of the overall mobile or wearable PoCT device in non-controlled conditions, during realistic operation. Since in this phase it is essential to test the device as a whole, stand-alone platform, highly portable customized electronics properly interfaced with the biosensor are used. This stage is essential to account for all the surrounding variables, including the influences of interferents, movement artifacts, time, and humidity, as well as the overall reliability of the mobile or wearable customized electronics. Comprehensive information on these issues from the analyzed papers is summarized in Table 4.
For the identified application, validation with real sweat samples can be divided mainly into ex situ analysis, performed on samples previously collected from human subjects, and in situ analysis, evaluating the wearable device during its real on-body operation.
Both ex situ and in situ sweat analysis can be performed on sweat produced spontaneously while the subject performs physical exercise or induced through standard iontophoresis. An indoor cycling bike [12] or a treadmill [11] was typically used to control the physical activity and the workload; for example, an increase in lactate concentration can be obtained by increasing the treadmill speed. Sweat induction through standard iontophoresis is a fairly widespread technique [38], especially in experiments focused not on monitoring sweat during exercise but on evaluating particular physio-pathological conditions (e.g., stress or cardiac issues) [32,46]. In other cases, sweat analysis was used as an alternative to invasive blood analysis; for instance, in [38], the ethanol concentration measured in sweat by the proposed biosensors was shown to be a non-invasive means of determining the blood ethanol concentration and a valuable alternative to the breathalyzer, especially for preventing users from altering the measurement. The typical sweat induction method is referred to as active iontophoresis (IZASA). Briefly, two recessed stainless-steel electrodes covered with a pilocarpine-containing gel are strapped onto the target site of the subject's body, and a small electrical current (usually around 1.5 mA) is passed through the electrodes for 5 min using a battery-powered device. The electrodes are then usually removed, and two paths can be followed: the induced sweat is directly collected and analyzed ex situ [33], or the wearable biodevice is positioned on the same spot and the measurement is performed directly in situ [32,38]. Depending on the characteristics of the biosensor (i.e., disposable or reusable, repeatability, reproducibility, stability, etc.), the device can be tested in continuous or single-measurement mode [38].
Although the definition of the sample size is fundamental from a clinical perspective, among the considered papers the number of volunteers (used as a control group) ranged from 1 to 40 [38].

Validation against "Gold Standard" Assays
Proper validation of biosensor performance cannot be considered complete without comparing the results with the outputs of "gold standard" methods, which are well accepted and certified in laboratories. The choice of the specific method depends on the category to which the target analyte belongs (e.g., protein, metabolite, or electrolyte). These "gold standard" references include techniques routinely adopted in wet laboratories (e.g., spectrophotometry, atomic force microscopy, optical measurements, chromatography, and standard protein quantification assays) as well as commercially available certified portable devices (e.g., glucometers, lactometers, and breath analyzers).
For pH electrodes, the most widely adopted reference technique is a standard pH meter with a solid electrode in a bulb, recalibrated before each measurement to ensure high accuracy [34,42].
For sweat rate measurement, the commonly adopted methods rely on an optical system (e.g., the Macroduct system), which is a standard sweat collection system used in cystic fibrosis diagnosis. This solution is particularly useful for optical sweat rate measurement because it can be worn on a small region of the body to measure the local sweat content and sweat rate [31].
Furthermore, the performance of sensors targeting metabolites (e.g., glucose or lactate) is usually validated using commercial strips combined with a glucometer/lactometer [32,45].
In contrast, biosensors for alcohol quantification have been validated with different methods. A first approach relies on a commercial FDA-approved breath analyzer that quantifies the analyte in breath [32], whereas a second approach uses gas chromatography-based blood analysis of samples from prewarmed, free-flowing fingertip punctures collected into heparinized capillary tubes [38]. A third method relies on the measurement of salivary lactate using a commercial immunoassay kit [22].
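The reviewed papers mostly summarize agreement with reference methods through correlation (R²), which does not reveal systematic bias. A Bland-Altman analysis, a standard method-comparison technique that is not described in the reviewed works themselves, can complement it. The sketch below assumes paired readings of the same samples by the wearable device and by the reference method; the data are invented for illustration.

```python
from statistics import mean, stdev


def bland_altman(device, reference):
    """Bland-Altman agreement between paired device and reference readings.

    Returns (bias, lower_limit, upper_limit), where bias is the mean of the
    differences (device - reference) and the 95% limits of agreement are
    bias +/- 1.96 * sample standard deviation of the differences.
    """
    if len(device) != len(reference):
        raise ValueError("device and reference must be paired")
    if len(device) < 2:
        raise ValueError("at least two pairs are needed")
    diffs = [d - r for d, r in zip(device, reference)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


if __name__ == "__main__":
    # Hypothetical paired readings (e.g., mM) from a wearable sensor and a reference meter.
    sensor = [1.1, 2.0, 2.9, 4.2]
    meter = [1.0, 2.0, 3.0, 4.0]
    bias, low, high = bland_altman(sensor, meter)
    print(f"bias = {bias:.3f}, 95% LoA = [{low:.3f}, {high:.3f}]")
```

Unlike R², the limits of agreement are expressed in the analyte's own units, so they can be compared directly with a clinically acceptable error.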

Electronics Design Considerations
Across the conditioning and transmission electronics proposed in the analyzed papers, a common scheme could be recognized despite the evident use of different designs and specific components; this common scheme allowed us to derive useful indications for researchers in the field. It is also worth noting that the electronic components play a fundamental role in the overall reliability of any complete PoCT device. All the information on electronics design and choices acquired from analyzing the papers is summarized in Table 5 and visually in Figure 2.
First of all, the design of suitable electronics must balance, on the one hand, requirements related to portability, ease, and duration of use and, on the other hand, requirements related to accuracy, resolution, and precision.
Regarding the first set of requirements, one of the most relevant aspects is the material of the substrate on which the electronic components are soldered, as well as its enclosure and integration with the microfluidic part of the sensor. Although most of the reported examples still use standard miniaturized circuits on rigid printed circuit boards (PCBs) [31,33], an interesting evolution toward flexible electronics can be observed. A first approach, presented in [35], ensures flexibility of the electronic part by enclosing a rigid PCB in a properly designed flexible material, to avoid displacement during exercise. A second approach, presented in [29,32], uses flexible materials such as polydimethylsiloxane (PDMS) [45] and polyimide [29,48] as substrates on which electronic components can be directly fixed with suitable conductive epoxy glues.
Another relevant aspect is battery life and overall power consumption. To enable stand-alone wireless operation of mobile or wearable PoCT devices, rechargeable lithium-ion batteries with a nominal voltage of 3.0 V [32,48] or 3.7 V [31,33] are usually adopted [33], ensuring a use duration of a few hours [32,35,49]. Furthermore, the conditioning system is typically single-supply, and the electronic components are selected to reduce power consumption and thus extend battery life. An interesting alternative is the design of battery-free biosensors, in which power is harvested from a nearby NFC-enabled smartphone and then regulated to a stable voltage output (~2.76 V) for the MCU and the analog front ends (AFEs) [29].
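A first-order estimate of use duration divides the usable battery capacity by the average current drawn. The sketch below assumes a simple two-state (active/sleep) current profile and an 80% usable-capacity derating; the function name and current values are hypothetical and not taken from the reviewed devices, apart from the 105 mAh capacity reported for [52].

```python
def battery_life_hours(capacity_mAh: float, active_mA: float, sleep_mA: float,
                       duty_cycle: float, derating: float = 0.8) -> float:
    """Rough battery-life estimate for a duty-cycled wearable.

    The average current is a weighted mix of the active and sleep currents;
    `derating` is an assumed fraction of nominal capacity that is actually usable.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    avg_mA = duty_cycle * active_mA + (1.0 - duty_cycle) * sleep_mA
    if avg_mA <= 0:
        raise ValueError("average current must be positive")
    return derating * capacity_mAh / avg_mA


if __name__ == "__main__":
    # 105 mAh cell (capacity reported for [52]); currents below are hypothetical.
    print(f"continuous: {battery_life_hours(105, 30, 0.5, 1.0):.1f} h")
    print(f"10% duty:   {battery_life_hours(105, 30, 0.5, 0.1):.1f} h")
```

The comparison between the two calls shows why duty cycling the radio and the analog front end is the main lever for extending use beyond a few hours.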
The first stage of the electronics, which interfaces with the biosensor electrodes, depends on the category of electrochemical biosensor, i.e., amperometric, potentiometric, or impedimetric. In amperometric sensors, a three-electrode layout is usually adopted, with working (WE), reference (RE), and auxiliary (AE) electrodes. The transducer circuit is composed of two elements: a control circuit, which keeps the potential of the electrochemical cell stable, and a current-to-voltage (I/V) converter based on a transimpedance amplifier, which measures very low currents while reducing the effect of changes in the output impedance [33]. In potentiometric and impedimetric sensors, a two-electrode layout is usually adopted, with only WE and RE [31]. For potentiometric measurements, the conditioning circuit is composed of a control circuit and an instrumentation amplifier, which amplifies the input voltage between WE and RE while guaranteeing a very high input impedance. For impedimetric measurements, a combination of a variable-frequency generator and a frequency analyzer is generally used [35]; this allows the electrochemical cell to be stimulated at different frequencies and the resulting alternating current to be analyzed to compute the cell impedance. Operational amplifiers are also used in active filter stages to remove noise and in gain stages. For example, in [20], the signal conditioning circuit was designed to amplify the current measured by the biosensor by a factor of 1000 to 2831 to obtain a measurable voltage output. Rail-to-rail inputs and outputs, output stability over time, and high gain at low supply voltages are fundamental features when selecting operational amplifiers.
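For potentiometric ion-selective electrodes (ISEs), the amplified WE-RE voltage is converted into concentration through the Nernst equation, E = E0 + (2.303RT/zF)·log10(a). The sketch below combines this with a one-point offset calibration like the one described for [52] (potential zeroed in a standard solution). It assumes an ideal Nernstian slope and uses concentration in place of activity; real ISEs are often sub-Nernstian, so in practice the slope should come from a multipoint calibration. Function names and the example reading are hypothetical.

```python
import math

R = 8.314462618      # gas constant, J/(mol*K)
F = 96485.33212      # Faraday constant, C/mol


def nernst_slope_V(z: int, temp_C: float = 25.0) -> float:
    """Ideal Nernstian slope in volts per decade of activity: 2.303*R*T/(z*F)."""
    T = temp_C + 273.15
    return math.log(10) * R * T / (z * F)


def concentration_from_potential(delta_E_V: float, c_ref_mM: float,
                                 z: int = 1, temp_C: float = 25.0) -> float:
    """Concentration from a potential measured relative to a one-point reference.

    Assumes the ISE output was zeroed in a standard of concentration c_ref_mM,
    an ideal Nernstian slope, and concentration used in place of activity.
    """
    return c_ref_mM * 10 ** (delta_E_V / nernst_slope_V(z, temp_C))


if __name__ == "__main__":
    print(f"Na+ slope at 25 C: {nernst_slope_V(1) * 1000:.2f} mV/decade")
    # Zeroed in a 10 mM Na+ standard; hypothetical +30 mV relative reading:
    print(f"{concentration_from_potential(0.030, 10.0):.1f} mM")
```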
The remaining blocks of the conditioning electronics are quite common across the proposed designs and, in general, include:
• Customized high-impedance analog front ends that process the voltage signals of the two or three electrodes to output stable voltages;
• Built-in analog-to-digital converters (ADCs) that convert the analog signal into digital signals;
• A micro-controller unit (MCU), usually programmable, that manages all the operations on the board;
• A display [20];
• A transceiver (typically a Bluetooth module) that wirelessly transmits the data provided by the MCU to a user interface displaying the measurements on a laptop or mobile phone. Although integrated Bluetooth low energy (BLE) is the most frequently employed method for wireless transmission from the wearable device to laptops [22,31] or smartphones [31], interesting examples also adopt NFC-based transmission [29,34]. This more recent method can improve miniaturization, since power can be harvested from mobile phones via NFC, removing the need to integrate a battery in the wearable device. Unlike BLE-based PoCT devices, NFC-based solutions require a short distance between the device and the reader and allow single-point measurements;
• A DC-DC converter (or low-dropout voltage regulator, etc.) that provides regulated, stable voltages from a single power supply for all the electronic sections (the analog front end, the digital components, and the transceiver), which may need different supply voltages [33].
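For amperometric channels, the chain listed above (transimpedance amplifier, built-in ADC, MCU) must be inverted in firmware to recover the electrode current. The sketch assumes a single-supply TIA biased at mid-rail with the feedback resistor setting the gain, and an ADC mapping 0 to Vref onto 2^N codes; both conventions vary between designs, and all function names and component values are hypothetical. It also shows how ADC resolution and feedback resistance together set the smallest resolvable current, which is relevant, for instance, to the 14-bit converter used in [43].

```python
def adc_code_to_current_nA(code: int, n_bits: int, vref_V: float,
                           v_bias_V: float, r_feedback_ohm: float) -> float:
    """Convert a raw ADC code from a single-supply transimpedance chain to WE current.

    Assumed chain (hypothetical): the TIA output sits at v_bias_V for zero
    current and moves by I * R_f; the ADC maps 0..vref_V onto 0..2**n_bits codes.
    Positive results mean the output rose above the bias.
    """
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code outside ADC range")
    v_out = code * vref_V / 2 ** n_bits
    return (v_out - v_bias_V) / r_feedback_ohm * 1e9


def current_resolution_nA(n_bits: int, vref_V: float, r_feedback_ohm: float) -> float:
    """Current corresponding to one ADC step (1 LSB) through the feedback resistor."""
    return vref_V / 2 ** n_bits / r_feedback_ohm * 1e9


if __name__ == "__main__":
    # Hypothetical 12-bit ADC, 3.0 V reference, mid-rail bias, 1 Mohm feedback.
    print(current_resolution_nA(12, 3.0, 1e6))          # nA per LSB
    print(adc_code_to_current_nA(3048, 12, 3.0, 1.5, 1e6))
```

A larger feedback resistor improves current resolution but shrinks the measurable range before the output saturates at the supply rails, which is one reason rail-to-rail amplifiers are preferred.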
If the platform includes multiple sensors, multiplexers or high-voltage switches are required to read multiple sensors through the same measurement chain [32,34,44]. In this way, the size and power consumption of the platform remain largely independent of the number of sensors and can be kept to a minimum.
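Sharing one measurement chain keeps the hardware small, but the sensors are then read one at a time, so each sensor's effective sampling rate drops with the number of channels and with the settling time needed after each switch. The sketch below quantifies this trade-off for a simple round-robin scan; the function names, timing values, and sensor list are hypothetical.

```python
def per_channel_rate_hz(n_channels: int, settle_s: float, conversion_s: float) -> float:
    """Sampling rate seen by each sensor when one chain is shared round-robin.

    Each switch to a new channel costs a settling time before the conversion.
    """
    if n_channels < 1:
        raise ValueError("need at least one channel")
    return 1.0 / (n_channels * (settle_s + conversion_s))


def scan_order(channels, n_cycles: int):
    """Round-robin order in which the multiplexer visits the sensors."""
    return [ch for _ in range(n_cycles) for ch in channels]


if __name__ == "__main__":
    sensors = ["glucose", "lactate", "Na+", "K+"]
    # Hypothetical timings: 50 ms settling after each switch, 10 ms conversion.
    print(f"{per_channel_rate_hz(len(sensors), 0.050, 0.010):.2f} Hz per sensor")
    print(scan_order(sensors, 2))
```

For slowly varying sweat biomarkers a few hertz per channel is usually ample, but the settling time of high-impedance electrodes can dominate the budget.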

Discussion
The analysis of the works included in this review allowed us to identify not only recurring trends and useful guidelines but also limitations and gaps in the procedures most commonly adopted to validate complete, stand-alone PoCT devices.
Although similar protocols and validation phases can be recognized, the analyzed papers often omit several steps and stages, which prevents the identification of a shared, uniform, standardized workflow that can (and should) be used for characterization.
Efforts to standardize methods for characterizing chemical and biological sensors can be found in the literature of the last two decades [50,51]. However, this review reveals a lack of procedural standardization and the absence of clear scientific references explaining why a validation protocol was organized in a specific way. This is the main gap between the validation of biochemical sensors and that of widespread physical sensors (e.g., accelerometers, pressure sensors), and it makes biochemical sensors less robust and not yet ready for commercialization. Despite this delay, the analysis shows an improvement over the last 5 years, both in procedural standardization and in the number of metrological parameters reported. Increased attention to, and a more accurate description of, the protocols used to characterize the biosensors can be seen, at least for the in vitro phase [45].
A weakness that can be noticed even in papers with a relatively complete and standardized characterization procedure is the lack of information on the number of samples or subjects evaluated and on how averages and standard deviations were calculated. This is a significant limitation, since only knowledge of the sample size can discriminate between a proof of concept and a properly working biosensor that can proceed to clinical trials or commercialization.
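Reporting n together with how the spread was computed removes this ambiguity. The sketch below, with hypothetical values and a hypothetical function name, uses the sample standard deviation (n − 1 denominator, as in Python's `statistics.stdev`) and the relative standard deviation (RSD), a common way to express repeatability or reproducibility depending on whether the replicates come from the same sensor or from different sensors.

```python
from statistics import mean, stdev


def summarize_replicates(values):
    """Report n, mean, sample SD (n - 1 denominator) and RSD% for replicate results."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed for a sample SD")
    m = mean(values)
    s = stdev(values)
    return {"n": n, "mean": m, "sd": s, "rsd_percent": 100.0 * s / m}


if __name__ == "__main__":
    # Hypothetical sensitivities (nA/mM) from three sensors of the same batch.
    print(summarize_replicates([10.2, 9.8, 10.0]))
```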
Another crucial point not always considered in the analyzed papers is a proper comparison of the output of the wearable PoCT device under assessment with that of commercially available certified systems. This comparison should include two stages. The first, unavoidable step is to compare the performance of the customized electronics with results obtained using benchtop or portable commercial potentiostats while performing the same electrochemical measurement (e.g., amperometric, potentiometric, or impedimetric); this preliminary step of the validation procedure is essential to ensure the reliability of any customized electronics integrated into mobile or wearable platforms. Often this comparison is missing from the publication itself [33,38], whereas in a few papers the authors only state that the customized electronics provide measurements similar to those of the benchtop equipment [47]. In other studies, the performance of the mobile or wearable electronics during in vivo measurements is validated by comparing the analyte quantification with an ex situ analysis performed with benchtop instrumentation [43,52]. Once this reliability is guaranteed, it is possible to proceed with the second comparison: validating the outputs against those obtained from "gold standard" approaches routinely adopted in wet labs. This validation is essential to ensure that the transducing principle of the biosensors included in the wearable devices is coherent with "gold standard" techniques, both in terms of absolute results and, especially, in terms of reproducibility and stability over time. Even when present, this type of comparison often lacks proper quantitative parameters that could better characterize it; accuracy and recovery values, in particular, are often missing from the reported parameters.
Indeed, as reported quantitatively [29,33,34,43] or only qualitatively [42,45] by some of the analyzed papers, these parameters are significant figures of merit for verifying the reliability of novel methods in comparison with existing ones.
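Both figures of merit are simple to compute once reference data exist. The sketch below uses the common spike-recovery definition (measured increase divided by the known added amount) and a relative error against a reference method; these definitions are standard conventions rather than formulas taken from the reviewed papers, and the function names and numbers are hypothetical.

```python
def recovery_percent(unspiked: float, spiked_measured: float, added: float) -> float:
    """Spike recovery: fraction of a known added amount that the sensor detects."""
    if added <= 0:
        raise ValueError("added amount must be positive")
    return 100.0 * (spiked_measured - unspiked) / added


def relative_error_percent(measured: float, reference: float) -> float:
    """Accuracy expressed as the relative deviation from a reference method."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return 100.0 * (measured - reference) / reference


if __name__ == "__main__":
    # Hypothetical sweat sample: 20.0 mM before spiking, 10.0 mM added, 29.5 mM measured.
    print(f"recovery = {recovery_percent(20.0, 29.5, 10.0):.1f} %")
    # Hypothetical paired reading against a reference analyzer.
    print(f"relative error = {relative_error_percent(10.4, 10.0):.1f} %")
```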
Finally, comfort and ease of use are crucial aspects in the development of wearable solutions; however, information on these points was not always provided by the authors [38], even when the design optimization was studied extensively [35]. In more recent publications, videos of the real-time ex situ analyses performed by the proposed PoCT devices are provided [47], or extensive and detailed information about the practical use of the developed custom application is reported [29,33,43,48]. Regarding comfort, most of the analyzed works simply state that, thanks to soft encapsulating materials (e.g., PDMS), the device can be comfortably worn by subjects [43,52]. An interesting analysis was performed in [29], where a specific combination of PDMS and silver nanowires was tested to achieve highly stretchable electrodes with an all-printing technique, accommodating skin deformations without compromising the distinct conductivity change of the wires. Another interesting analysis ensuring compatibility between comfort and performance was reported in [52], where an extensive mechanical characterization of the device was performed, simulating the stress due to continuous use.
In general, the analyzed papers focus on sensor design and performance, while the measurement system is only briefly described, or its description is relegated to the supplementary material. In some cases, the power source and its nominal value are not reported in the text, as in [38,44,47], nor is the power consumption [31,33]. However, the circuit contributes significantly to the final results (for example, to the accuracy and LOD). For instance, the operational amplifiers and passive components can increase electrical noise when the input and output of the impedance converter are not properly biased, because the metal electrodes can become polarized and alter the measurements [31]. A proper filtering circuit is therefore needed for reliable measurements: an RC low-pass filter is often chosen [48,49], while active filters provide the best performance [31,33]. The sensor-electronics interface should also be properly designed; for example, in [35], the electronics were rigidified to reduce the voltage offset caused by the pressure of the patch on the case.
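As noted above, an RC low-pass stage is a common choice for noise reduction. The sketch below shows the cutoff formula and a digital first-order equivalent that could be applied in firmware or post-processing; function names and component values are hypothetical. Higher-order designs, such as the four-pole 1 Hz filters reported for [52], roll off more steeply and are not reproduced by this first-order example.

```python
import math


def rc_cutoff_hz(r_ohm: float, c_farad: float) -> float:
    """-3 dB cutoff of a first-order RC low-pass filter: f_c = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def rc_lowpass(samples, fs_hz: float, fc_hz: float):
    """Digital first-order low-pass mimicking an RC stage (exponential smoothing).

    alpha = dt / (RC + dt) follows from a backward-Euler discretization of the
    RC differential equation, with RC = 1 / (2*pi*fc).
    """
    rc = 1.0 / (2.0 * math.pi * fc_hz)
    dt = 1.0 / fs_hz
    alpha = dt / (rc + dt)
    out = []
    y = samples[0] if samples else 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


if __name__ == "__main__":
    # Hypothetical component values: 100 kohm and 1.6 uF give roughly a 1 Hz cutoff.
    print(f"f_c = {rc_cutoff_hz(100e3, 1.6e-6):.3f} Hz")
    step = [0.0] * 5 + [1.0] * 500   # unit step sampled at 100 Hz
    print(f"final value = {rc_lowpass(step, 100.0, 1.0)[-1]:.4f}")
```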
The non-uniformity of metrological parameters is another key element highlighted by the present analysis. Although at least one metric among sensitivity, slope, or LOD is given, these metrics are often presented in non-standardized forms, reported only graphically, or left to be deduced from calibration curves. Similar considerations apply to linearity, which is often reported with non-standardized indicators or just stated qualitatively, and to repeatability and reproducibility, which are often confused or mistakenly overlapped. Further, selectivity and stability are often discussed only qualitatively, without proper quantitative indicators (as reported in [35,43], but not provided in [31]). Defining these parameters quantitatively is crucial for comparing repeated measurements, even when considering replicated setups. Regarding stability, accurate computation of both operational and storage stability is fundamental to improve sensor design and to better understand sensor behavior, as highlighted in [34,45]. Interesting examples can be found in [52], where the results of the stability evaluation are exploited to define a proper protocol for periodic calibrations; in [49], where this information is used to improve the fabrication process of the microfluidic system in which the biosensors are integrated; and in [53], where the authors discuss how stability results can be used to design algorithms that compensate for the degradation of biosensor performance during both operation and shelf life.
Combined single-point and continuous calibration of the Na+ sensor and the flow rate sensor: (1) with the flow rate fixed, the sensing electrodes were exposed to three different Na+ concentrations;
(2) with the Na+ concentration fixed, the spiral electrode was exposed to three different flow rate values. Continuous calibration of a Na+ ISE integrated in a patch with various salt solutions (0.1, 1, 10, 100 mM) at a flow rate of 5-10 µL/min. All measurements were performed with commercial instrumentation, using an in vitro fluidic system for artificial sweat delivery to the patch. Change in sensor response measured when K+ is added to a Na+ environment. n.a.

Kim 2018 [32] 3 PBS
Single-point calibration: response of the glucose and alcohol biosensor to increasing ethanol concentrations, tested with separate 60 s chronoamperometric measurements.
Change in sensor response measured when relevant electroactive species (glucose, lactate, creatine, ascorbic acid, and uric acid) were added to 10 mM ethanol.
Repetitive measurements of ethanol (left) and glucose (right) were performed. Additional tests evaluated the influence of mechanical bending and of an artificial sweat diluting solution.
Change in sensor response measured when relevant electroactive species are added. Shared solid-state Ag/AgCl and PVB reference electrodes were used for metabolites and ions, respectively. Data recording was paused for 30 s for the addition of each analyte.
Repetitive measurements performed over a period of 4 weeks.
Task: 30 min of stationary biking; the load is initially increased, then kept constant for up to ∼23 min, and then decreased.
Sweat secretion and the subsequent data analysis (presented in real time) started after 8-10 min of exercise. Data obtained from the in vivo calibrated device were compared with ex situ measurements.
Comparison with results from high-performance liquid chromatography (HPLC), flame atomic absorption spectrometry (FAAS), and a pH meter.
Task: subjects wear the device for two consecutive days and, on each day, perform a cycling exercise (15 to 20 min) in the morning in a fasting state, followed by another cycling bout 20 min after consuming 150 g of a sweetened drink, and again 2 h after lunch in the evening. Data acquisition: real-time data acquisition during each trial occurs through either a compact, short-range reader or an extended, long-range reader positioned near the device.
During human trials, the subjects paused to take an image of the device with a smartphone camera for colorimetric analysis. Mean values extracted from three random points (n = 3) of the colorimetric assays yielded the chloride concentration and pH. Mean values calculated from the data generated by the electrochemical lactate and glucose sensors during the last 1 min (n = 60) of cycling yielded the lactate and glucose concentrations.
Analyses with conventional techniques, i.e., commercial blood lactate (Lactate Plus; Nova Biomedical, MA) and blood glucose (Accu-Chek Nano blood glucose meter, Roche Diabetes Care Inc.) meters, capture blood lactate and glucose levels before and after each cycling bout as points of comparison.
Task: high-intensity exercise on a bicycle or a treadmill with the patch on the subject's back, adjacent to the spine, on the latissimus dorsi muscle and/or thoracolumbar fascia in the region of the upper lumbar vertebra. Data acquisition and transmission: the sweat patch and the associated electronics module were monitored remotely via Bluetooth for the duration of the exercise sessions. Sample rate: 100 Hz, then a moving average over 15 s. All work was carried out under a scope of work determined not to be human subject testing (E&I Institutional Review Board). As a result, the data collected during on-body testing were intentionally not calibrated, to avoid generating physiologic data on the subject; thus, all on-body data are presented in raw form (mV).
n.a. Discrete acquired values were compared among the different conditions tested.
The correlation between blood level and sweat current signal for both glucose and alcohol was quantitatively described using the R² parameter. Immediately before starting the experiment, blood glucose and alcohol levels were measured using commercial glucose strips (Accu-Chek Aviva Plus) and a commercial FDA-approved breath analyzer (Alcovisor Mars Breathalyzer, Hong Kong) to validate the sensor performance.
Gao 2016 [52] 26 Ex situ sensor performance was assessed by testing sweat samples collected from the subjects' foreheads.
Sweat samples were collected every 2-4 min by scratching cleaned foreheads with microtubes.
Task: three trials: constant-workload cycle ergometry (14 subj.), graded-workload cycle ergometry (7 subj.), and outdoor running (12 subj.). An ergometer providing real-time monitoring of heart rate, oxygen consumption, pulmonary ventilation, and power output was used. The FISAs were packaged inside traditional sweatbands during the indoor and outdoor trials. The sensor arrays were calibrated and worn on cleaned foreheads and wrists.
Because ISEs show different absolute potential values in the same solution, a one-point calibration in a standard solution containing 1 mM KCl and 10 mM NaCl was performed for the Na+ and K+ sensors before each use. The measured potential of the ISEs in the standard solution was then set to zero by the microcontroller.
The accuracy of on-body measurements was verified by comparing on-body sensor readings from the forehead with ex situ (off-body) measurements of collected sweat samples; no additional method was used.
Sweat was induced by an iontophoresis (IP) system (Macroduct) for 5 min.
Task: intake of an alcoholic beverage within 5-10 min (gin, rum, or whisky mixed with a cola or juice soft drink), then alcohol monitoring following two protocols: continuous mode → biodevice on the skin after sweat generation, continuous measurement from 30 min to 2 h; single-measurement mode → biodevice placed in contact with the skin for 5 min every 15 min.
Data were analyzed by computing the correlation between the ethanol content in sweat and in blood (measured by gas chromatography), the correlation between BAC and the current measured in sweat with the biodevice, and the time response and maximum ethanol concentration with respect to the blood results.
Reference blood analysis by gas chromatography was performed 30 min after alcohol intake and then every 10 min, or after each single measurement.

Bluetooth Low Energy
The output of the voltage follower circuit is followed by an RC low-pass filter to minimize noise and interference in the measurements. The potential of the reference electrode, set by a voltage divider network, is constant and is then digitized and recorded by the ADC.
Zhang 2021 [43] Copper and polyimide flexible film + PDMS cover. Battery-free; power harvested from the smartphone through a Near Field Communication interface chip. The SD14 module of the NFC chip is a multichannel sigma-delta analog-to-digital converter with up to 14 bits of resolution, consisting of a programmable gain amplifier (PGA) and a sigma-delta ADC. The output of the [K+] sensor was read through this 14-bit ADC.
Near Field Communication (NFC). The core of the integrated circuit communicates with other modules via a memory data bus and a memory address bus. Instructions and data are sent from the smartphone to the RFID chip through the ISO15693 analog front end and decoded by the ISO15693 decoding module.
The smartphone application, developed in Android Studio, displays the [K+] concentration by reading and calibrating the ADC output voltage.
Vinoth 2021 [42] Standard rigid PCB. 200 mAh Li-ion rechargeable battery; output regulated by a low-dropout regulator to obtain precisely 3.0 V and stable power for every circuit component.
For chronoamperometry, the potential is generated by a 16-bit DAC, a feedback loop compares its output with the reference, a driver circuit controls the potential of the CE, and a transimpedance amplifier (TIA) converts the WE current into a voltage. For potentiometry, three differential amplifiers are built with rail-to-rail operational amplifiers with extremely low input bias current (typically 0.5 pA), enabling precise, wide-range measurement. Both outputs are sampled and digitized by an ADC integrated in the MCU, which controls the whole PCB via a serial peripheral interface (SPI).

Bluetooth low energy (BLE)
The real-time data are displayed in a custom-made graphic interface on the host platform. The regional dependence of sweat secretion and biomarker composition is investigated by performing the sweat analysis at the underarm and upper back locations.
A set of high-voltage switches is used to switch between glucose and alcohol sensing, as well as to enter a high-impedance state during iontophoretic processing.
Gao 2016 [52] Flexible PCB. Rechargeable lithium-ion polymer battery (3.7 V, 105 mAh). Microcontroller programmable through an in-circuit serial programming interface. The conditioning path for each sensor was implemented according to the corresponding sensing mode. For the amperometric glucose and lactate sensors, the originally generated signal was an electrical current.

Bluetooth
All the analog signal conditioning paths ended with a unity-gain four-pole low-pass filter, each with a −3 dB cutoff frequency of 1 Hz, to minimize noise and interference in the measurements.
A mobile application (the Perspiration Analysis App) was designed to accompany the FISA and provide a user-friendly interface for data display and aggregation.
(1) The signal observed in static calibration showed a highly linear response; (2) linear correlation with R² = 0.998 between blood and sweat glucose and R² = 0.999 between blood-alcohol and sweat-alcohol levels, using the same subject.
No response time evaluated: at least 5 min were allowed after analyte intake to ensure that steady-state values were recorded. n.a.
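Much of the non-uniformity discussed in this review concerns how sensitivity, linearity, and LOD are derived from calibration data. One widely used convention (for example, in the ICH Q2 guideline on analytical validation) takes the slope S of a least-squares line as the sensitivity, reports R² for linearity, and estimates LOD and LOQ as 3.3σ/S and 10σ/S. The sketch below follows that convention with σ taken as the residual standard deviation of the regression; other valid choices, such as the standard deviation of blank replicates, give different values, so the choice should be stated. The function name and calibration data are hypothetical.

```python
from statistics import mean


def calibrate(conc, signal):
    """Least-squares calibration line with R^2, LOD and LOQ.

    LOD = 3.3*s/S and LOQ = 10*s/S, where S is the slope (sensitivity) and s is
    taken here as the residual standard deviation of the regression.
    """
    n = len(conc)
    if n != len(signal) or n < 3:
        raise ValueError("need at least three paired calibration points")
    mx, my = mean(conc), mean(signal)
    sxx = sum((x - mx) ** 2 for x in conc)
    ss_tot = sum((y - my) ** 2 for y in signal)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("concentrations and signals must vary")
    slope = sum((x - mx) * (y - my) for x, y in zip(conc, signal)) / sxx
    if slope == 0:
        raise ValueError("zero slope: no sensitivity to the analyte")
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(conc, signal))
    s_res = (ss_res / (n - 2)) ** 0.5
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": 1.0 - ss_res / ss_tot,
        "lod": 3.3 * s_res / abs(slope),
        "loq": 10.0 * s_res / abs(slope),
    }


if __name__ == "__main__":
    # Hypothetical calibration: concentrations in mM, currents in nA.
    result = calibrate([0.0, 1.0, 2.0, 3.0, 4.0], [1.1, 2.9, 5.2, 6.8, 9.0])
    print({k: round(v, 3) for k, v in result.items()})
```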

Conclusions
The present work reviewed and discussed the main validation procedures and standards for the assessment of mobile or wearable PoCT devices, focusing specifically on sweat analysis. The metrological aspects highlighted show that considerable attention has been devoted in recent years to making these devices more reliable. However, further effort is still needed to standardize the procedures used to validate stand-alone electronic platforms based on electrochemical biosensors; only with common, shared guidelines will it be possible to enable the reliable use and commercialization of such systems for multiple-biomarker detection in wearable devices.