In-Country Method Validation of a Paper-Based, Smartphone-Assisted Iron Sensor for Corn Flour Fortification Programs

Food fortification in low-income settings is limited due to the lack of simple quality control sensing tools. In this study, we field validated a paper-based, smartphone-assisted colorimetric assay (Nu3Px) for the determination of iron in fortified flours against the gold standard method, atomic emission spectrometry (AES). Samples from commercial brands (n = 6) were collected from supermarkets, convenience stores, and directly from companies in Mexico and characterized using both Nu3Px and AES. Nu3Px’s final error parameters were quantified (n = 45) via method validation final experiments (replication and comparison of methods experiment). Qualitative pilot testing was conducted, assessing Nu3Px’s accept/reject batch decision making (accept ≥ 40 μg Fe/g flour; reject < 40 μg Fe/g flour) against Mexico’s fortification policy. A modified user-centered design process was followed to develop and evaluate an alternative sampling procedure using affordable tools. Variation of iron content in Mexican corn flours ranged from 23% to 39%. Nu3Px’s random error was 12%, and its bias was 1.79 ± 9.99 μg Fe/g flour. Nu3Px had a true mean difference from AES equal to 0 and similar variances. AES and Nu3Px made similar classifications based on Mexico’s policy. Using simple, affordable tools for sampling resulted in similar output to the traditional sampling preparation (r = 0.952, p = 0.01). The affordable sample preparation kit has similar precision to using analytical tools. The sample preparation kit coupled with the smartphone app and paper-based assay measure iron within the performance parameters required for the application to corn flour fortification programs, such as in the case of Mexico.


Introduction
Micronutrient deficiencies continue to afflict populations living in low- and middle-income countries [1]. Though other strategies to address them exist, food fortification has been heralded as the most cost-effective strategy to improve micronutrient status among vulnerable populations [2]. Despite its effectiveness, the food industry and local governments lack tools to monitor fortified food entering the different markets [3]. This is partially due to the limited availability of affordable and valid analytical and sensing tools to support these efforts [4,5].
Sensors are tools frequently used in the food and agriculture industry in recent decades for monitoring several important factors such as environmental changes (e.g., temperature and moisture), quality (e.g., texture and nutrient content), and safety (e.g., microbial and chemical hazards) of food products. The advantages to using sensors are that they are often low cost and provide fast, actionable data [6,7]. Sensor development includes the establishment of several performance parameters such as accuracy, reliability, specificity, dating the Nu3Px sensor, in which we included and validated a sample collection tool kit that is aligned with WHO's ASSURED criteria and used commercial corn fortified samples from Mexico. Ultimately, this technology can support monitoring and evaluation efforts of food fortification programs worldwide.

Collection and Characterization of Mexican Corn Flour Samples
Samples (n = 25) of corn flour from six commercial brands were collected from supermarkets, convenience stores, and directly from the companies in the Querétaro region, Saltillo, and Cuetzalan, Mexico. Convenience sampling was used to collect samples. Using atomic emission spectroscopy (AES, AOAC Official Method 984.27 [25]), the following mineral concentrations were measured: nitrogen, phosphorus, magnesium, potassium, calcium, sulfur, boron, manganese, copper, zinc, aluminum, sodium, and iron. Briefly, samples (0.3 g) were wet-ashed using 5 mL trace mineral-grade nitric acid and 2 mL 30% hydrogen peroxide. After closed vessel ashing, samples were diluted with ultrapure water to fit external standard curves for selected elements (Sigma-Aldrich, St. Louis, MO, USA). Samples were injected into an atomic emission spectrometer 4100 MP-AES (Agilent Technologies, Santa Clara, CA, USA) equipped with a plasma torch, a standard glass concentric nebulizer, and a cyclonic spray chamber. The MP-AES was calibrated using a diluted ICP-OES wavelength calibration solution (1:10 v/v) as an internal standard. Multielement standards were purchased from Sigma-Aldrich (St. Louis, MO, USA) to create external calibration curves. Nitrogen was measured in a CE440 Elemental Analyzer (Exeter Analytical, Inc., Chelmsford, MA, USA) based on thermal conductivity detection after combustion and reduction. All analyses used double deionized water. All glassware was either washed with acid solution prior to use or exchanged for plastic counterparts.

Replication Experiment (Determination of RE)
Using characterized samples, the Nu3Px sensor's measurements were evaluated against AES by running tests to determine between-day variability (coefficient of variation, CVb) based on 3 replicates of the same sample collected at different times over 2 weeks [24]. While the within-day CV% (CVw%) served as a preliminary measure of precision [17], CVb% serves as the final indicator of random error, as is customary in validation experiments [24].
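The between-day calculation described above can be sketched as follows. This is a minimal illustration only: the readings and sample names are hypothetical, and averaging the per-sample CVb% values is an assumption about how a single summary figure might be obtained.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (%): sample standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical Nu3Px readings (ug Fe/g flour); each list holds three
# replicates of one sample measured on different days over two weeks.
between_day_readings = {
    "sample_A": [48.0, 55.0, 51.0],
    "sample_B": [62.0, 70.0, 58.0],
}

# Between-day CV% for each sample, then the average across samples.
cv_b_per_sample = {name: cv_percent(r) for name, r in between_day_readings.items()}
average_cv_b = mean(list(cv_b_per_sample.values()))
print({k: round(v, 1) for k, v in cv_b_per_sample.items()}, round(average_cv_b, 1))
```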

Comparison of Methods Experiment (Determination of SyE)
The systematic error was determined by a comparison of methods, as described by Westgard [24]. The constant and proportional error for iron determination using the paper-based assay was assessed by using in-country corn samples from various companies and collection time points. A plot was constructed with the AES measurement on the x-axis and the paper-based measurement on the y-axis. A linear regression line was fit, where the slope indicates proportional error (deviation from 1) and the y-intercept indicates constant error (deviation from 0); proportional error and constant error together make up SyE. However, if the coefficient of determination (R²) of the regression is less than 0.99, bias is better estimated from the mean and standard deviation of the differences between measurements from both methods. The total analytical error (TE) was then determined by totaling CVb (RE, replication experiment) and bias (SyE, comparison of methods experiment) [24].
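A minimal sketch of the comparison-of-methods calculations follows, assuming ordinary least-squares regression of the paper-based reading on the AES value. The function name and the paired values are hypothetical and serve only to show how slope, intercept, R², and bias are obtained.

```python
from statistics import mean, stdev


def compare_methods(reference, candidate):
    """Least-squares fit of candidate on reference, plus bias of the differences."""
    mx, my = mean(reference), mean(candidate)
    sxx = sum((x - mx) ** 2 for x in reference)
    syy = sum((y - my) ** 2 for y in candidate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(reference, candidate))
    slope = sxy / sxx                     # proportional error = deviation from 1
    intercept = my - slope * mx           # constant error = deviation from 0
    r_squared = sxy ** 2 / (sxx * syy)    # coefficient of determination
    diffs = [y - x for x, y in zip(reference, candidate)]
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "bias_mean": mean(diffs),         # mean of (candidate - reference)
        "bias_sd": stdev(diffs),
    }


# Hypothetical paired measurements (ug Fe/g flour).
aes = [30.0, 45.0, 60.0, 80.0, 100.0]
nu3px = [33.0, 44.0, 63.0, 78.0, 104.0]
result = compare_methods(aes, nu3px)
print({k: round(v, 3) for k, v in result.items()})
```

When R² falls below 0.99, the `bias_mean` and `bias_sd` entries are the quantities the text recommends reporting instead of the regression coefficients.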
Results from the comparison of methods experiment were used to decide whether a food processor (i.e., the technician who fortifies the flour) would hypothetically reject or accept a batch of corn flour based on Mexico's current policy, in which batches fortified under 40 µg Fe/g flour would be rejected [26]. Hypothetical decisions were made using both AES and Nu3Px analyses. For each sample, results from AES analysis were considered the "true" result. If the Nu3Px analysis agreed with AES, it was considered a "true" accept or reject. If the Nu3Px analysis resulted in a different decision from AES, it was considered a "false" accept or reject. From these pass/reject matches or non-matches between tools, several qualitative performance parameters were assessed, including false-positive rate, false-negative rate, sensitivity, and specificity [27].
False-positive rate, or α (a type I error) [28], refers to the probability that a batch is rejected by Nu3Px but has been accepted by AES (Equation (1)), in which fp refers to the number of false-positive test samples and tn refers to the true known number of negative test samples. False-negative rate, also known as β (a type II error) [28], refers to the probability that a batch is accepted by Nu3Px but has been rejected by AES (Equation (2)), in which fn refers to the number of false-negative samples and tp refers to the true known number of positive samples. Sensitivity, or an assay's power [28], refers to the probability that a batch is accepted by both Nu3Px and AES (Equation (3)). Finally, specificity refers to the probability that a batch is rejected by both Nu3Px and AES (Equation (4)). Ideally, an assay will have high sensitivity and specificity, with low false-negative and false-positive rates, meaning that decisions made with the assay are close to those made using the gold reference method. If the assay were perfect, the sensitivity and specificity would be 100%, and the false-negative and false-positive rates would be 0%. In practice, however, an increase in sensitivity usually implies a decrease in specificity, and vice versa [29].
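The hypothetical accept/reject comparison can be sketched as a simple tally of paired decisions. The measurement pairs below are invented for illustration, and the outcomes are keyed by the two decisions themselves rather than by "positive" or "negative" labels, so that no labelling convention is assumed.

```python
from collections import Counter

THRESHOLD = 40.0  # ug Fe/g flour; minimum under Mexico's policy [26]


def decision(iron_ug_per_g):
    """Accept a batch at or above the policy minimum, otherwise reject."""
    return "accept" if iron_ug_per_g >= THRESHOLD else "reject"


def tally(pairs):
    """Count (AES decision, Nu3Px decision) combinations for paired results."""
    return Counter((decision(aes), decision(nu3px)) for aes, nu3px in pairs)


# Hypothetical (AES, Nu3Px) pairs in ug Fe/g flour.
hypothetical_pairs = [
    (52.0, 49.5),
    (38.0, 41.2),
    (61.3, 58.0),
    (35.1, 33.9),
    (41.0, 37.5),
]
counts = tally(hypothetical_pairs)
for (aes_decision, nu3px_decision), n in sorted(counts.items()):
    print(f"AES {aes_decision:6s} / Nu3Px {nu3px_decision:6s}: {n}")
```

The disagreements in this toy data sit close to the 40 µg Fe/g cut-off, which is where a method with appreciable random error would be expected to diverge from the reference.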

Development of an ASSURED-Designed Sampling Preparation Kit
To overcome the challenges outlined in the introduction, a new sample preparation kit was developed and tested. A user-centered design was employed to ensure a simple and intuitive kit that can be used by untrained personnel. The purpose of user-centered design is to create a product that will be used in practice as it is intended to be used, requiring minimum effort by the end user [30]. Applying a user-centered design framework ensures that the sample preparation kit also complies with the ASSURED framework, particularly ensuring the kit is user friendly.
(1) Technological research. The initial technological research was conducted by Waller et al. [17], in which it was identified that a user-friendly, affordable sample preparation kit was needed to comply with the ASSURED criteria of the assay.
(2) Initial feature requirements. Initial feature requirements were identified by the three key action components of the sample preparation: sample deposition, sample weight, and sample dilution.
(3) Prototype testing. In this case, a simple and inexpensive eyedropper, glass Pasteur pipette, or plastic Pasteur pipette was tested to replace the micropipette (i.e., to deposit the sample on the paper); a 1/2 tablespoon scoop was tested to replace the analytical balance (i.e., to weigh the food sample); and a conical tube marked with a line that indicates a specific volume was tested to replace the volumetric pipette (i.e., for sample dilution) (Figure 1). Vortexing the sample was replaced by vigorously shaking for 10 s. Each tool in the sample preparation kit was tested for its internal analytical error by weighing water (eyedropper, Pasteur pipettes, and conical tube) and corn flour (sample scoop) on an analytical balance with five replications to estimate the expected increase in random error using the sample preparation kit compared to the analytical laboratory tools.
(4) Pilot implementation. The kit's total CVw% variability was compared to the original method's total CVw% (15.9%) by measuring one Mexican corn flour sample × 16 replicates to assess variability in precision (i.e., the closeness between multiple measurements).
(5) Pilot field test. Commercial samples collected in Mexico were analyzed using the sample preparation kit and the paper-based assay, and the results were compared to their reference values (obtained by AES) and to the values obtained using the more precise laboratory tools. For iron determination, 1 scoop (2.5 g) of sample was placed into the volumetric test tube marked with a line at 40 mL. Then, acidified solution (0.25 M HCl) was added until reaching the 40 mL mark. Samples were shaken for 10 s and allowed to settle for 30 min. An aliquot of the supernatant was taken with a Pasteur pipette, and a drop was deposited on the paper-based sensor. The color was allowed to develop for 5 min and then measured using a smartphone with the Nu3Px app as described before [17].

Statistical Analysis
Data were statistically analyzed using IBM SPSS 24 [32] and Microsoft Excel, including means, standard deviations, confidence intervals, % coefficient of variation (CV%), Pearson coefficient (r), determination coefficient (R²), bivariate correlations, total analytical error, paired sample t-test, Levene's test for homogeneity of variances (F-test), McNemar test, and chi-square test (χ²). The degree of agreement was analyzed using bivariate correlations (p < 0.05) on method comparison plots. Bland-Altman plots were constructed, in which the reference method (AES) was plotted on the x-axis, and the difference between the novel method (paper-based) and the AES was plotted on the y-axis. The majority of data points (68%) should be within 1σ, with acceptable methods having almost all data points (95%) within 1.96σ. Here σ is the standard deviation of the differences; the upper and lower limits of the Bland-Altman plot are drawn at the bias (mean of the differences) plus and minus the chosen multiple of σ [33]. Data points outside 1.96σ are not ideal but may be considered outliers if proven to be the case.
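The Bland-Altman quantities described above can be computed as in the sketch below. The function name and the example values are hypothetical; only the definitions (bias as the mean of the differences, limits at bias ± z·σ) come from the text.

```python
from statistics import mean, stdev


def bland_altman(reference, candidate, z=1.96):
    """Bias, SD of differences, limits at bias +/- z*SD, and share of points inside."""
    diffs = [c - r for r, c in zip(reference, candidate)]
    bias = mean(diffs)
    sd = stdev(diffs)
    lower, upper = bias - z * sd, bias + z * sd
    inside = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return {"bias": bias, "sd": sd, "lower": lower, "upper": upper,
            "fraction_within": inside}


# Hypothetical paired values (ug Fe/g flour): AES reference vs. paper-based.
aes = [10.0, 20.0, 30.0, 40.0]
paper = [11.0, 19.0, 31.0, 39.0]
summary = bland_altman(aes, paper)
print({k: round(v, 3) for k, v in summary.items()})
```

Calling the function with `z=1.0` gives the narrower band within which roughly 68% of points are expected to fall.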

Results
A schematic of the sample preparation and readout is presented in Figure 2. This figure shows the sample collection and steps before adding a drop of sample on the paper-based sensor. The sample is then read and measured using the smartphone and the app as described before [17].

Figure 2. Schematic indicating the sample collection, preparation, and deposition on the paper-based sensor using simple tools, as well as its detection and readout as shown previously [17].

Table 1 characterizes the mineral contents of nixtamalized corn flours collected at various markets in Mexico. For completeness, a total of 13 minerals were analyzed. All samples were collected in Querétaro, except for sample #5 (Cuetzalan, Mexico) and samples #7 and #18 (Saltillo, Mexico). Samples #19-29 were produced locally in Querétaro. Samples #1-18 were produced in other parts of the country.

Characterization of Mexican Corn Flours
The nixtamalization process used in the processing of these corn flours is very well understood and described in detail in other studies [34][35][36].
Samples #19-21 were collected over a 3 h period from the same batch of fortified corn flour. The CV% within batch in the iron content was 23%. Samples #22-27 were collected over a 6.5 h period from the same batch, and the CV% of iron content was 38.8%. These CV%s are higher than the previously reported variability of iron-fortified corn flour of 15% [37] and highlight the variability of the industrial fortification mixing process within batch.
The company with the largest number of samples collected was company B (n = 14 samples). Across these 14 samples, the CV% of iron content was 31.0%. This high variability emphasizes the randomness of the fortification process within a single company.
Under the Norma Oficial Mexicana (Mexico's food standards) NOM-247-SSA1-2008, all corn flours marketed and distributed for consumption should be fortified with a minimum of 40 µg Fe/g flour (as ferrous sulfate or fumarate) [26]. Under this standard, there is no established maximum or upper limit for iron content in fortified flours. Based on the AES and paper-based sensor results, all but 2 (#17 and #22) of the 29 flours met this minimum requirement.

Replication Experiment
A between-day replication experiment was conducted over 2 weeks as suggested by Westgard [24]. Results can be found in Table 2. The average estimated random error was calculated to be 12% variation.

Comparison of Methods Experiment
A minimum of 40 samples is necessary to conduct the comparison of methods experiment. The samples used here were collected from local markets (n = 25) or fortified in the laboratory (n = 20) and were analyzed using both the reference and new methods of measurement. The comparison plot (Figure 3) shows the error and variation associated with both measurements. If the two methods were identical, the linear regression line would have a slope of 1 and a y-intercept of 0. When sources of systematic error are present, the linear regression line is used to determine systematic error via the slope's deviation from 1 (proportional error) and the y-intercept's deviation from 0 (constant error).
The current sampling procedure uses 2.5 g of flour and 10 mL of acid. However, this sampling procedure was designed to not exceed a maximum iron concentration of 115 µg Fe/g flour. Seven of the samples have iron concentrations higher than 115 µg Fe/g flour. Following the recommended sampling preparation procedure as is, the methods comparison plot demonstrated large systematic error (y = 0.8085x + 11.835, R² = 0.83). Because the R² value of 0.83 is under 0.99, a better determination of systematic error is via the determination of bias (mean of differences). The bias (mean ± SD) was calculated to be −0.12 ± 14.1 µg Fe/g flour.
Due to the low R² value, it was apparent that overestimation at higher concentrations (115 µg Fe/g flour) due to the color saturation of the assay was skewing the linear regression line. Thus, in response, the samples over 115 µg Fe/g flour were diluted using a larger volume (i.e., 25 mL dilute acid instead of 10 mL) and reanalyzed using this dilution factor to modify the output. By doing so, the overall linear regression equation (y = 0.97x + 3.84; R² = 0.92) and the average bias (1.79 ± 9.99 µg Fe/g flour) indicated less systematic error. This demonstrates that dilution for higher concentration samples is a feasible modification to maintain a lower systematic error (Figure 3A).
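One way the dilution factor might be applied to the output is sketched below. This is an assumption, not the documented Nu3Px calculation: it supposes the app reports a value calibrated for the standard 10 mL extraction, that the flour mass is unchanged, and that the response is linear in the diluted range, so the reading scales with the ratio of acid volumes. The function name and the example reading are hypothetical.

```python
DEFAULT_ACID_ML = 10.0  # acid volume in the standard procedure (2.5 g flour)


def rescale_for_dilution(reading, acid_volume_ml, default_volume_ml=DEFAULT_ACID_ML):
    """Scale a reading taken at a larger extraction volume back to flour content.

    Assumes the same flour mass and a reading calibrated for default_volume_ml.
    """
    return reading * acid_volume_ml / default_volume_ml


# Hypothetical reading of 52.0 ug Fe/g obtained after extracting in 25 mL.
corrected = rescale_for_dilution(52.0, acid_volume_ml=25.0)
print(corrected)  # 130.0
```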
The comparison of methods data can be transformed to fit a Bland-Altman plot (Figure 3B), which displays the variability of the method by plotting the reference method on the x-axis and the difference between the two methods on the y-axis. Also plotted are ±1.96σ and ±1σ of the differences (σ = 9.99 µg Fe/g flour). The majority of the data points (68%) should fit within ±1σ, and almost all of the data points (95%) should lie within ±1.96σ [33].
A paired sample t-test was conducted to understand the similarity between the two methods' true mean values. The null hypothesis was that the true mean difference between the methods is equal to 0. Based on the data, we failed to reject the null hypothesis (p > 0.05). An F-test using Levene's test was used to evaluate the homogeneity of variance. The null hypothesis was that the variances were similar. We failed to reject the null hypothesis (p > 0.05).
A contingency table (Table 3) was constructed comparing the method comparison data to Mexico's fortification policy (<40 µg Fe/g flour, reject; ≥40 µg Fe/g flour, pass). From this table, the qualitative performance parameters false-positive rate (21.4%), false-negative rate (16.1%), sensitivity (83.9%, 95% CI: 70.9-96.8%), and specificity (78.6%, 95% CI: 57.1-100.0%) were calculated. When validating quantitative methods that determine qualitative decisions, the AOAC International recommends chi-square and McNemar tests to assess differences between methods [38]. A chi-square test was performed to compare the classifications of the two methods. The null hypothesis was that Nu3Px and AES made classifications independent from each other. We rejected the null hypothesis that the two classifications are independent of one another (Pearson χ² = 16.411, p < 0.01). For the McNemar test, the null hypothesis was that the acceptance and rejection percentages are equal between the two methods. We failed to reject the null hypothesis and conclude that the two proportions were not statistically different, p = 0.727 (two-sided).
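The reported statistics can be checked with the sketch below. The four cell counts are an assumption: they were back-calculated from the percentages above rather than copied from Table 3, with "accept" treated as the positive class. Under that assumption, a Wald (normal-approximation) interval, an uncorrected Pearson chi-square, and an exact binomial McNemar test give values that round to the reported figures; whether SPSS used exactly these variants is not stated in the text.

```python
from math import comb, sqrt

# Assumed 2x2 counts (n = 45), inferred from the reported percentages.
tp, fn = 26, 5   # AES accept: Nu3Px accept / Nu3Px reject
fp, tn = 3, 11   # AES reject: Nu3Px accept / Nu3Px reject


def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% interval for a proportion, clipped to [0, 1]."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = b + c
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
false_negative_rate = fn / (tp + fn)
false_positive_rate = fp / (tn + fp)
sens_ci = wald_ci(tp, tp + fn)
spec_ci = wald_ci(tn, tn + fp)
chi2 = chi_square_2x2(tp, fn, fp, tn)
mcnemar_p = mcnemar_exact_p(fn, fp)

print(f"sensitivity {sensitivity:.1%}, CI {sens_ci[0]:.1%} to {sens_ci[1]:.1%}")
print(f"specificity {specificity:.1%}, CI {spec_ci[0]:.1%} to {spec_ci[1]:.1%}")
print(f"FPR {false_positive_rate:.1%}, FNR {false_negative_rate:.1%}")
print(f"chi-square {chi2:.3f}, McNemar p {mcnemar_p:.3f}")
```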

Development of a Sample Preparation Kit
Prototype testing. The sample preparation kit consisted of a simple tool to deposit the sample (eyedropper, plastic Pasteur pipette, or glass Pasteur pipette), a tube for sample dilution, and a 1/2 tablespoon scoop for sample weight. The variability of the sample preparation kit tools was determined by measuring water or flour, weighing the amount measured, and calculating the CV% of several replicates (n = 5). The eyedropper was found to have a CV% of 7.24%, the plastic Pasteur pipette was found to have a CV% of 2.75%, and the glass Pasteur pipette was found to have a CV% of 6.48%, indicating a plastic Pasteur pipette as the most reliable tool for sample deposition. For sample dilution, the conical tube was found to have a CV% of 1.03%. For sample weight, the scoop was found to deliver a mean weight of 2.55 g and CV% of 3.02% (Table 4). In total, the use of the ASSURED sample preparation kit is expected to increase random error as compared to using the laboratory analytical tools, which inherently possess less random error due to their design (microliter pipette 0.0% CV, volumetric pipette 0.550% CV, and analytical balance 0.006% CV).
Pilot implementation. The initial precision of the sample prep kit was determined by measuring one Mexican corn flour sample (n = 16 replicates) and calculating its mean, % difference from the true mean, standard deviation, and CV%. Using the sample prep kit (with eyedropper) and the smartphone app, the mean of the sample was found to be 52.52 µg Fe/g corn flour. The AES true value was 50.4 µg Fe/g corn flour, for a % mean difference of 4.12%. Between the 16 replicates, the standard deviation was 7.44 µg Fe/g corn flour, with a CVw% of 14.17%. These findings warranted further exploration of the precision of the sample prep kit.
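The pilot figures can be reproduced from the reported mean, reference value, and standard deviation, as in the short check below. One assumption is flagged: the reported 4.12% matches a percent difference taken relative to the average of the two values, whereas dividing by the AES value alone gives about 4.21%, so the symmetric form is used here as the likelier reading rather than a confirmed formula.

```python
# Values reported in the text (ug Fe/g corn flour).
kit_mean = 52.52    # mean of 16 replicates with the sample prep kit
aes_value = 50.4    # AES reference value for the same sample
kit_sd = 7.44       # standard deviation across the 16 replicates

# Within-day coefficient of variation (%).
cv_w_percent = kit_sd / kit_mean * 100

# Percent difference relative to the average of both values (assumed form).
percent_difference = abs(kit_mean - aes_value) / ((kit_mean + aes_value) / 2) * 100

# Alternative: percent difference relative to the AES value only.
percent_error_vs_aes = abs(kit_mean - aes_value) / aes_value * 100

print(round(cv_w_percent, 2), round(percent_difference, 2), round(percent_error_vs_aes, 2))
```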
Pilot field testing. On average, the plastic Pasteur pipette deposited 33.34 ± 0.92 µL of supernatant (n = 5 replicates), or approximately 6.7 times the amount of supernatant that is deposited using a 5 µL conventional pipette. Because more iron is being deposited onto the detection zone, a stronger response was detected using the dilution procedure as is. Additional diluting volumes (i.e., 20, 40, and 70 mL) were tested; the final diluting volume of 40 mL showed the most accurate results. Therefore, the dilution procedure was modified to 2.5 g flour in 40 mL of 0.25 M HCl, and this dilution factor was applied to the output. Twenty-five commercial samples were tested using the sample prep kit (plastic Pasteur pipette, conical tube with a line at 40 mL, and 1/2 tablespoon scoop) and the dilution modification indicated above (Figure 1). The results using the sample kit significantly correlated with the output using the laboratory precise tools (bivariate correlation r = 0.914, p < 0.01) as well as the AES reference output (bivariate correlation r = 0.952, p < 0.01). After a paired t-test, we failed to reject the null hypothesis that the true mean difference between the laboratory precise tools and the sample prep kit is equal to 0 (p > 0.05).

Discussion
The purpose of this study was to validate (i.e., quantify total error) a paper-based, smartphone-assisted assay for the determination of iron in fortified flours, also known as Nu3Px [17], using commercial fortified nixtamalized corn flour samples collected from several companies and collection points in Mexico. Additionally, these samples' mineral profiles were characterized and can be compared to the Mexican food standards (NOM) compliance [26]. A sample prep kit that aligns with the WHO's ASSURED guidelines was developed and pilot tested by comparing its error performance parameters to that of Nu3Px using conventional laboratory tools. It was found that Nu3Px performed within acceptable error parameters: 12% random error and 1.79 ± 9.99 µg Fe/g flour systematic error. Using both the gold standard method of analysis and Nu3Px results in a similar classification of samples under Mexican regulations. Though most of the corn flours collected complied with current regulations, these samples failed to comply with theoretical fortification parameters recommended by experts for upper limits.
While new methods for clinical diagnostics have published acceptable error ranges (e.g., blood iron, TE < 20% [24]), conventional food matrices (i.e., not food formulated for specific medical needs such as infant formula) often do not have well-documented acceptable error ranges. This is largely due to the different consequences of misdiagnosis. For example, while false-positive or false-negative results from an assay responsible for determining a clinical diagnosis (e.g., HIV or pregnancy) directly affect people's lives, product adulteration and misbranding, in the case that the food has less nutrient addition than specified by the law, will result in economic impacts on the food company, which may include product seizures, imprisonment, or fines [39]. The economic impacts of food piracy have been estimated to account for USD $200 billion in the industry [40].
Allen et al. argue that fortification policy enforcement (quality control) relies on accurate, precise, and reproducible methodologies [37]. Their recommendations state that monitoring technologies should be able to measure the micronutrient content such that it is known whether a sample meets the target fortification level (TFL) and is within the minimum fortification level (minFL) and the maximum fortification level (maxFL). In the case of iron, minFL and maxFL are equivalent to the legal minimum level (LminL) and the maximum tolerable level (maxTL), respectively. The LminL and maxTL are used for policy enforcement and any applied retribution (i.e., fines if the product is found to have iron content outside of the limits) for not meeting compliance, specific to each country's policies. LminL and maxTL (µg Fe/g flour) are estimated using Equations (5) and (6).
For iron fortification in nixtamalized corn flour, the Fe CV% is 15% [37]. Based on the Mexican policy guidelines [26], the TFL is 40 mg Fe/kg (equivalent to 40 µg Fe/g flour). The calculated LminL and maxTL based on Mexico's known TFL are then 28 and 52 µg Fe/g flour, respectively. Thus, the validated paper assay needs a limit of detection below the LminL (28 µg Fe/g flour) and a maximum in the working range above the maxTL (52 µg Fe/g flour) for it to be effectively used to assess compliance. Furthermore, bias (systematic error; mean of the differences between reference and novel methods) should be kept under 6 µg Fe/g flour (i.e., 25% of the range from LminL to maxTL) for each parameter to be distinguishable from the others. The bias (systematic error, mean ± standard deviation) of the paper-based sensor is 1.79 ± 9.99 µg Fe/g flour, which complies with the allowable mean systematic error.
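The limits quoted above can be reproduced numerically. Equations (5) and (6) are not reproduced in this text, so the form used below, TFL × (1 ± 2·CV), is an assumption inferred from the reported numbers (TFL = 40, CV = 15%, giving 28 and 52); the function name and the multiplier `k` are hypothetical.

```python
def fortification_limits(tfl, cv_fraction, k=2.0):
    """Assumed form: limits at TFL minus/plus k * CV * TFL (k = 2 matches 28 and 52)."""
    spread = k * cv_fraction * tfl
    return tfl - spread, tfl + spread


# Mexico: TFL = 40 ug Fe/g flour; CV of iron in fortified corn flour = 15% [37].
lmin_l, max_tl = fortification_limits(40.0, 0.15)

# Allowable bias stated in the text: 25% of the LminL-to-maxTL range.
allowable_bias = 0.25 * (max_tl - lmin_l)

reported_bias = 1.79  # ug Fe/g flour, mean of differences
print(lmin_l, max_tl, allowable_bias, reported_bias < allowable_bias)
```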
Random error. Regarding allowable random error for CVb%, performance targets can be determined by comparing the CV% performance of similar paper-based assays in the literature. Mentele et al. measured iron in aerosols on a paper-based assay and a computer scanner with a CV% of 26.1% [21]. Martinez et al. measured glucose and protein in urine on a paper-based assay with a camera phone with CV% values of 15.5-21.7% and 16.6-29.2%, respectively [20]. Thus, a reasonable performance target to meet is CVb ≤ 25%, which is an improvement over other paper-based assays, such as that of Mentele et al. [21]. Consequently, the CVb% of 12.0% complies with the allowable random error.
Total error. The final performance parameters are shown in Table 5: a random error of 12% and a bias (systematic error; difference of means) of 1.79 ± 9.99 µg Fe/g flour.
Compliance with current and theoretical regulations. Table 6 shows the paired AES and Nu3Px results for each sample and classifies each sample as within or outside the theoretical policy's allowable range according to Allen et al. (28 to 52 µg Fe/g flour) and as within or outside the allowable range under Mexico's policy (≥40 µg Fe/g flour) [37]. Under Mexico's current policy [26], Nu3Px's classification of the Mexican samples (n = 25) agreed with the AES-based classification 100% of the time. However, under Allen et al.'s theoretical policy limits, using the calculated LminL and maxTL, Nu3Px would have produced false positives 24% of the time [37]. As Mexico's policy currently stands, Nu3Px is a ready and applicable monitoring and evaluation tool for compliance, because it agreed with the gold standard reference method on whether to reject a batch 100% of the time. However, if Mexico modifies its policy in the future to align with the recommendations of fortification policy experts, further research will be needed to reduce the prevalence of false positives and false negatives in the paper-based assay.
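The two decision rules compared here can be sketched in a few lines. The example below applies a minimum-only rule (accept at or above 40 µg Fe/g flour) and a two-sided rule (28 to 52 µg Fe/g flour) to paired readings and reports how often the two methods reach the same decision. The function names and the paired values are hypothetical, chosen only to show that agreement can be complete under one rule and incomplete under the other; they are not values from Table 6.

```python
def accept_minimum_only(fe, minimum=40.0):
    """Minimum-only rule: accept a batch at or above the minimum level."""
    return fe >= minimum


def within_two_sided_limits(fe, lmin=28.0, max_tl=52.0):
    """Two-sided rule: accept only between the lower and upper limits."""
    return lmin <= fe <= max_tl


def agreement(pairs, rule):
    """Fraction of (reference, test) pairs that receive the same decision."""
    same = sum(rule(ref) == rule(test) for ref, test in pairs)
    return same / len(pairs)


# Hypothetical (AES, Nu3Px) readings in µg Fe/g flour; illustrative only.
pairs = [(35.0, 37.5), (45.0, 43.0), (50.0, 54.0), (60.0, 66.0)]

print("Agreement, minimum-only rule:", agreement(pairs, accept_minimum_only))
print("Agreement, two-sided rule:", agreement(pairs, within_two_sided_limits))
```

In this toy data set the third pair straddles the upper limit, so the methods disagree only under the two-sided rule, which mirrors why adding an upper limit can expose disagreements that a minimum-only policy does not.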
Other Central American countries with corn flour fortification policies have adopted policies similar to Mexico's, setting only minimum fortification levels and no upper limits. Costa Rica's policy requires a minimum of 22 mg Fe/kg flour [41], El Salvador's policy requires a minimum of 40 mg Fe/kg flour [42], and Guatemala's minimum requirement is 17 mg Fe/kg flour [41]. Thus, Nu3Px is expected to perform well under other Central American regulatory policies as well.
Qualitative performance parameters of Nu3Px. When quantitative assays are used to make qualitative decisions (i.e., binary decisions such as yes/no or reject/pass), it is recommended to calculate several performance parameters, such as the false-positive rate, false-negative rate, and sensitivity [27,38]. When all 45 samples were compared against Mexico's current fortification policy as written in the NOM, Nu3Px showed a false-positive rate of 21.4%, a false-negative rate of 16.1%, a sensitivity of 83.9%, and a specificity of 78.6%. Samples closer to the cut-off point (40 µg Fe/g flour) were more likely to disagree with AES because of the paper-based method's inherent random error.
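The relationship between these four parameters can be made explicit with a small sketch that derives them from a 2 × 2 confusion table. The counts used below (26, 5, 3, 11) are an assumption: they are not reported in this section and were chosen only because they sum to 45 and reproduce the reported percentages. Which decision is treated as the "positive" class is likewise an assumption of the sketch.

```python
def binary_metrics(tp, fn, fp, tn):
    """Rates for a binary decision, taking the reference method as truth."""
    return {
        "sensitivity": tp / (tp + fn),          # true-positive rate
        "specificity": tn / (tn + fp),          # true-negative rate
        "false_negative_rate": fn / (tp + fn),  # 1 - sensitivity
        "false_positive_rate": fp / (fp + tn),  # 1 - specificity
    }


# Assumed counts (not taken from the study's tables); they total 45 samples.
counts = {"tp": 26, "fn": 5, "fp": 3, "tn": 11}
metrics = binary_metrics(**counts)

for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

Note that sensitivity and the false-negative rate always sum to 100%, as do specificity and the false-positive rate, which is why the reported 83.9%/16.1% and 78.6%/21.4% pairs are complementary.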
While there are no published allowances for the specificity and sensitivity of new paper-based assays for micronutrients, we can compare Nu3Px's performance to that of other published rapid, qualitative decision-making tools. iCheck Iodine, a rapid field test for checking iodine content in salt, reported a sensitivity of 92.4% and a specificity of 100% [43]. A similar detection method, the iCheck Iron, uses the bathophenanthroline reagent, which reacts with reduced iron to produce a deep magenta color. The absorbance is then quantified with a photometer. This method has been validated using iron-fortified fish and soy sauces; however, evaluations of sensitivity and specificity were not included [44]. A paper-based assay for screening sickle cell anemia (yes/no presence) reported a sensitivity and specificity of 93% and 94%, respectively [45]. A microchip assay coupled with a smartphone to detect sperm counts above and below the WHO threshold (100,000 sperm/mL) was found to have a sensitivity of 92.86% and a specificity of 100% [46]. Though Nu3Px's performance parameters are lower than these examples, it is important to note that none of the above-mentioned examples combines messy food samples, a smartphone, simple tools for sample collection, and a paper-based assay for detection, all of which pose substantial challenges. Though there is still work to be done in improving the performance of Nu3Px, Mexico's fortification program can still benefit from using this rapid monitoring tool, as it is more precise than the iron spot test (i.e., iron and potassium thiocyanate reaction) [47], which is the current internal monitoring method.
Sample characterization. Based on AES, two of the samples (#17 and 22) were below the minimum level set by Mexico's regulation (40 µg Fe/g flour), and 22 of the 25 samples contained amounts of iron greater than the theoretical maxTL of 52 µg Fe/g flour [37]. Low iron concentrations threaten the program's ability to have a positive nutritional impact. On the other hand, high iron concentrations can have negative consequences for human health. It is known that when consumed in toxic quantities, iron accumulates in organs such as the liver, spleen, or kidneys [48]. For these reasons, Allen et al. recommend the addition of upper levels to fortification programs [37]. As demonstrated, Nu3Px can serve as an internal quality control checkpoint for food processors to monitor the levels of iron in the fortified flour, with a particular focus on ensuring that toxic upper levels of iron are not reached, as was the case in 72% of the samples collected, with iron levels reaching 3 times the target amount.
Sample preparation kit. The sample prep kit was piloted with measuring tools that are commonly available and affordable around the world. Plastic pipettes or eye droppers can be found at most pharmacies; a 1/2 tablespoon scoop of the kind used in home cooking can be found in most kitchenware stores or simply manufactured; and a conical tube with a line drawn at 10 or 40 mL, depending on the diluting volume to be used, is an inexpensive alternative that can easily be manufactured.
In the case of the sample prep kit, the bias (systematic error, mean of the differences) was +2.12 µg Fe/g flour, or a mean difference of 4.12%. The results from measuring commercial samples in Mexico with the kit were comparable to the Nu3Px output obtained using precise laboratory tools. This meets the performance requirements for use in the Mexican corn flour fortification program.
There are several limitations associated with the method validation experiment and the development of the sample preparation kit. First, the method validation experiment conducted herein meets the minimum sample requirement (n = 40); however, a larger sample size is always preferred. The method has not been validated across individuals or laboratories, i.e., through an inter-laboratory validation. This is the preferred form of method validation, though it is an expensive process. In cases where resources are limited, intralaboratory method validation is suggested [49]. Nu3Px showed a 24% false-positive rate when compared against the theoretical policy limits. If countries such as Mexico modify their policies to adopt the recommended limits, further research is warranted to reduce the prevalence of false positives before Nu3Px is implemented.
Due to the method's logarithmic calibration curve, results are variable at high concentrations (i.e., above 115 µg Fe/g flour). This can pose an issue for an end-user food company whose flours tend to fall in the higher concentration range, as is the case in Mexico. Sample dilution is the best strategy, though the dilution has to be corrected for in the final calculation, as was demonstrated.
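The dilution correction amounts to multiplying the reading by the dilution factor. The sketch below assumes, purely for illustration, that the assay is calibrated for a 10 mL diluting volume and that a high-iron sample is instead prepared in 40 mL; the function names, the choice of reference volume, and the example reading are hypothetical rather than the study's protocol.

```python
def dilution_factor(final_volume_ml, reference_volume_ml):
    """Ratio of the volume actually used to the volume the calibration assumes."""
    if final_volume_ml <= 0 or reference_volume_ml <= 0:
        raise ValueError("volumes must be positive")
    return final_volume_ml / reference_volume_ml


def correct_for_dilution(measured_ug_per_g, factor):
    """Scale a diluted reading back to the undiluted sample concentration."""
    return measured_ug_per_g * factor


# Hypothetical case: sample prepared in 40 mL instead of an assumed 10 mL reference.
factor = dilution_factor(final_volume_ml=40.0, reference_volume_ml=10.0)
reading = 35.0  # hypothetical diluted reading, µg Fe/g flour
print(f"Dilution factor: {factor}")
print(f"Corrected concentration: {correct_for_dilution(reading, factor)} µg Fe/g flour")
```

Diluting in this way brings a high-iron sample back into the lower part of the calibration curve, where readings are less variable, at the cost of one extra multiplication in the final calculation.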
The sample preparation kit's design assumes that the end user has access to an eyedropper, a conical tube with a volumetric marking, and a tablespoon scoop. In addition, the error quantification is specific to the tools used and is presented as an example case study. These experiments would need to be repeated once the design and manufacturing of the sample preparation kit are finalized.

Conclusions
A sample preparation kit was developed that aligns with the WHO's ASSURED criteria (specifically Affordable, User-friendly, and Equipment-free) and has precision similar to that of analytical methods, but at a lower cost and with greater accessibility. The sample preparation kit, coupled with the smartphone app and paper-based assay, measures iron within the performance parameters required for application to corn flour fortification programs, such as in the case of Mexico. A validated ASSURED-designed technology can be useful for monitoring fortified staple foods, specifically for ensuring that flours meet government specifications.