Domain Adaptation for In-Line Allergen Classification of Agri-Food Powders Using Near-Infrared Spectroscopy

The addition of incorrect agri-food powders to a production line due to human error is a major safety concern in food and drink manufacturing, because it can introduce allergens into the final product. This work combines near-infrared spectroscopy with machine-learning models for early detection of this problem. Specifically, domain adaptation is used to transfer models from spectra acquired under stationary conditions to moving samples, thereby minimizing the volume of labelled data that must be collected on a production line. Two deep-learning domain-adaptation methodologies are used: domain-adversarial neural networks and semisupervised generative adversarial neural networks. Overall, accuracies of up to 96.0% were achieved using no labelled data from the target domain (spectra of moving samples), and up to 99.68% when a single labelled data instance for each material was incorporated into model training. Using both domain-adaptation methodologies together achieved the highest prediction accuracies on average, as did combining measurements from two near-infrared spectroscopy sensors with different wavelength ranges. Ensemble methods were used to further increase model accuracy and to quantify model uncertainty, and a feature-permutation method was used for global interpretability of the models.


Introduction
Powdered agri-food materials are widely used in food and drink manufacturing due to their long shelf life, small volume, and ease of addition during processing [1]. Examples of agri-food powders include flour, coffee, dairy powders, nutritional supplements, and flavorings [1][2][3]. One problem encountered during food production is human error causing the wrong material to be added to a conveyor line. A 2019 annual report from the UK Food Standards Agency (FSA) stated that production errors, including formulation and labelling errors, accounted for 18% of product allergen incidents during the preceding year [4]. It is estimated that up to 10% of people in Western countries have a food allergy, and this number is increasing [5]. Such errors can therefore lead to product rework or waste if final composition analysis identifies that the food contains the wrong ingredients, or to a product recall if the product has already left the factory. Early detection of incorrect materials in the production line would reduce food waste and improve the productivity, economics, and sustainability of agri-food systems.
Near-infrared (NIR) spectroscopy measures the absorbance of near-infrared light, which depends on the composition of a material. NIR is now a leading analytical tool for evaluating food safety, authenticity, and material properties, owing to its advantages of requiring no sample preparation, being nondestructive and noninvasive, and producing real-time measurements [6]. It is also a suitable sensing technique for in-line applications with flowing solid and liquid samples derived from animal or plant sources [7,8]. NIR has been widely used to monitor properties of food powders, such as detecting adulteration [9].

Table 1. Food powder materials measured using near-infrared spectroscopy. In total, 19 food powder materials were measured.

Allergen | Materials
Gluten | Spelt, rye, buckwheat, oat, barley, brown, wheat (three brands), and wheat gluten flours
Gluten-free | Gluten-free white (three brands), coconut, tapioca, corn, and rice flours
Peanut | Peanut and peanut butter powders
Tree nut | Almond flour (three brands)
Egg | Whole egg, egg yolk, and egg white powders
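As a cross-check of the spectrum counts reported in the measurement section, Table 1 can be encoded as a small data structure. This is a minimal sketch: the material names are paraphrased from Table 1, the brand counts are read from its "three brands" entries, and the five-samples and ten-spectra figures come from the measurement description. The names `TABLE_1` and `dataset_counts` are illustrative only.

```python
# Table 1 encoded as allergen category -> list of (material, number of brands).
TABLE_1 = {
    "Gluten": [("spelt flour", 1), ("rye flour", 1), ("buckwheat flour", 1),
               ("oat flour", 1), ("barley flour", 1), ("brown flour", 1),
               ("wheat flour", 3), ("wheat gluten flour", 1)],
    "Gluten-free": [("gluten-free white flour", 3), ("coconut flour", 1),
                    ("tapioca flour", 1), ("corn flour", 1), ("rice flour", 1)],
    "Peanut": [("peanut powder", 1), ("peanut butter powder", 1)],
    "Tree nut": [("almond flour", 3)],
    "Egg": [("whole egg powder", 1), ("egg yolk powder", 1), ("egg white powder", 1)],
}

SAMPLES_PER_BRAND = 5    # five samples per material (or material brand)
SPECTRA_PER_SAMPLE = 10  # ten spectra per sample


def dataset_counts(table):
    """Return (materials, material brands, spectra per sensor per condition)."""
    n_materials = sum(len(items) for items in table.values())
    n_brands = sum(brands for items in table.values() for _, brands in items)
    n_spectra = n_brands * SAMPLES_PER_BRAND * SPECTRA_PER_SAMPLE
    return n_materials, n_brands, n_spectra


print(dataset_counts(TABLE_1))  # (19, 25, 1250)
```

The 25 material brands explain why the train and test subsets described later contain 25 spectra per material for a total of 625.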

Near-Infrared Spectroscopy Measurement
All materials were measured using two NIR sensors: NIRONE S2.0 (wavelength range: 1550 to 1950 nm, 1 nm resolution) and S2.5 (wavelength range: 2000 to 2450 nm, 1 nm resolution) (Spectral Engines, Oulu, Finland). Before acquisition of each spectrum, a white reference spectrum at 90% light intensity and a dark background spectrum at zero light intensity were collected. The distance between the NIR sensors and the sample surface was maintained at 2 cm for all measurements. Samples weighing approximately 100 g were placed in a 12 cm-diameter glass petri dish and compressed to obtain a flat surface (Figure 1). The samples were measured under static conditions and under motion at three speeds: 0.017, 0.036, and 0.068 m s−1 (designated as slow, medium, and fast, respectively, throughout). For the static measurements, the petri dish was rotated between spectrum acquisitions so that spectra were collected from all areas of the sample. For the measurements under motion, the petri dishes were placed on a rotational sample holder with variable speed settings. Five samples were measured from each material and ten spectra were acquired from each sample, producing a total of 50 spectra for each material and each sensor. For the materials sourced from three brands (see Table 1), 3 × 50 spectra were collected for each material and each sensor. Therefore, in total, 1250 spectra were collected with each sensor at each speed (stationary, slow, medium, and fast).
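The text does not state how the white and dark reference spectra were applied to each raw spectrum. A common convention, assumed here and not confirmed by the source, is dark-corrected reflectance R = (S - D) / (W - D), optionally followed by pseudo-absorbance A = log10(1 / R). The function names below are hypothetical.

```python
import math


def reflectance(sample, white, dark, eps=1e-12):
    """Dark-corrected reflectance R = (S - D) / (W - D) at each wavelength.

    eps guards against division by zero where white and dark coincide.
    """
    if not (len(sample) == len(white) == len(dark)):
        raise ValueError("spectra must share the same wavelength grid")
    return [(s - d) / max(w - d, eps) for s, w, d in zip(sample, white, dark)]


def absorbance(sample, white, dark, eps=1e-12):
    """Pseudo-absorbance A = log10(1 / R); R is clipped to eps before the log."""
    return [-math.log10(max(r, eps)) for r in reflectance(sample, white, dark, eps)]


# Toy two-wavelength example (raw detector counts are invented for illustration).
raw, white_ref, dark_ref = [550.0, 1050.0], [1000.0, 2000.0], [100.0, 100.0]
print(reflectance(raw, white_ref, dark_ref))
print(absorbance(raw, white_ref, dark_ref))
```

Whether the 90% white reference was further rescaled to 100% is not specified, so no scaling is applied in this sketch.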

Figure 1. The distance between the NIR sensors and the sample surface was 2 cm. Samples weighing 100 g were measured in a 12 cm-diameter petri dish. The samples were measured under stationary conditions and at three speeds (slow, 0.017 m s−1; medium, 0.036 m s−1; fast, 0.068 m s−1) using the rotating stage.


Domain Adaptation
ML models learn to map inputs to outputs (labels) from labelled training data. When the distribution of the input data changes, predictions can become erroneous. Domain adaptation is a category of ML techniques that alter how a model is trained, or how it predicts, so that it can predict accurately on data outside its training-set distribution. In this work, the aim is to transfer ML models trained on NIR spectra acquired under stationary conditions so that they predict accurately under moving conditions with no or few labelled data. This would enable the combination of NIR and ML to be deployed in industrial environments while removing or reducing the need to collect data under moving conditions, thereby minimizing disruption to a manufacturing process. Two domain-adaptation techniques were investigated: domain-adversarial neural networks (DANNs) and semisupervised generative adversarial neural networks (SGANs). Both methods were investigated individually and in combination, and were compared with transfer learning using no domain adaptation. The methods were investigated using either no labelled data from the target domain or a single labelled instance from each material category in the target domain.


Domain-Adversarial Neural Networks
DANNs are trained using labelled data from the source domain (and, optionally, from the target domain) and unlabelled data from the target domain (Figure 2). The aim is to extract features that discriminate between allergen classes but do not discriminate between the domains from which the data are taken. This is achieved by training the classifier module while simultaneously confusing the discriminator module as to whether the input data are from the source or target domain [33]. The network was trained on three losses sequentially. Firstly, the feature extractor and classifier were trained to classify the labelled data accurately. Secondly, the discriminator module was trained to classify whether the labelled data were from the source or target domain. Because the feature extractor should learn features that do not discriminate between the domains, it was trained on the negative of this loss, i.e., by gradient ascent on the discriminator loss: if the discriminator is highly accurate, a large step is taken to move the feature-extractor weights away from local optima, whereas if the discriminator has low accuracy, a smaller gradient-ascent step is taken. Thirdly, in the same way, the discriminator was trained to determine whether the unlabelled data were from the target domain, and the feature extractor was trained on the negative of this loss.
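The sign flip in steps two and three (the discriminator descends the domain loss while the feature extractor ascends it) can be illustrated with a deliberately tiny model. This is a sketch under strong simplifying assumptions, not the networks used in this work: the "feature extractor" is a single weight `w`, the "discriminator" is a logistic unit with parameters `v` and `c`, gradients are written out by hand, and only the domain-adversarial update is shown (the classification step is omitted). All names and numbers are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss(w, v, c, xs, domains):
    """Mean binary cross-entropy of a toy domain discriminator, plus gradients.

    Toy model: feature f = w * x, and p = sigmoid(v * f + c) is the predicted
    probability that x comes from the target domain (domain label 1).
    Returns (loss, dL/dw, dL/dv, dL/dc).
    """
    loss = grad_w = grad_v = grad_c = 0.0
    for x, d in zip(xs, domains):
        f = w * x
        p = sigmoid(v * f + c)
        loss -= d * math.log(p) + (1 - d) * math.log(1 - p)
        dz = p - d              # derivative of the loss w.r.t. the logit v*f + c
        grad_v += dz * f
        grad_c += dz
        grad_w += dz * v * x    # chain rule back into the feature extractor
    n = len(xs)
    return loss / n, grad_w / n, grad_v / n, grad_c / n


def adversarial_step(w, v, c, xs, domains, lr=0.1, lam=1.0):
    """One domain-adversarial update.

    The discriminator (v, c) takes a gradient-descent step on the domain loss;
    the feature extractor (w) follows the negated gradient (gradient ascent),
    scaled by the reversal factor lam.
    """
    _, grad_w, grad_v, grad_c = domain_loss(w, v, c, xs, domains)
    return w + lr * lam * grad_w, v - lr * grad_v, c - lr * grad_c


# Toy 1-D inputs: three source samples (domain 0) and three target samples (domain 1).
xs = [1.0, 1.2, 0.9, 2.0, 2.2, 1.8]
domains = [0, 0, 0, 1, 1, 1]
w, v, c = 1.0, 1.0, -1.5
new_w, new_v, new_c = adversarial_step(w, v, c, xs, domains)
print(domain_loss(w, v, c, xs, domains)[0], domain_loss(new_w, v, c, xs, domains)[0])
```

Applied alone, the feature-extractor update raises the domain loss (the domains become harder to tell apart), while the discriminator update lowers it. In deep-learning frameworks this behaviour is usually implemented with a gradient reversal layer rather than by hand.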

Figure 2.
A diagram of the domain-adversarial neural network (DANN) structure and training procedure. The network was trained in three steps, iteratively: (1) the feature extractor and classifier were trained to classify the labelled data, (2) the discriminator module was trained to classify the domain of the labelled data and the feature extractor was trained to confuse the discriminator, (3) the discriminator module was trained to classify the domain of the unlabelled data and the feature extractor was trained to confuse the discriminator.


Semisupervised Generative Adversarial Neural Networks
SGANs, in addition to using labelled data for the main learning task, train a generator to generate fake data samples and a discriminator to classify whether each input sample is real or fake [34]. In this work, the generator was trained to produce spectra similar to the target-domain data. This encourages the feature extractor to learn features that discriminate between allergen types while also discriminating real target-domain data from generated samples. The SGAN was trained on three losses sequentially (Figure 3). Firstly, the feature extractor and discriminator were trained to determine that the labelled data were real. Secondly, the feature extractor and discriminator were trained to determine that the generated samples were fake; the generator was trained on the negative of this loss, i.e., by gradient ascent, to move it away from local minima. Lastly, the feature extractor and discriminator were trained to predict that the unlabelled target-domain data were real.
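The three sequential SGAN losses differ only in which data are fed to the discriminator and which target ("real" = 1, "fake" = 0) they are scored against. The following minimal sketch computes them from discriminator output probabilities, assuming binary cross-entropy for each step (the loss function is not named in the text) and omitting the allergen classification loss on the labelled data. Function names are hypothetical.

```python
import math


def bce(p, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability p that a sample is real."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def sgan_step_losses(p_labelled, p_generated, p_unlabelled):
    """Losses for the three sequential SGAN steps.

    Each argument is a list of discriminator probabilities that the inputs are real.
    """
    step1 = mean(bce(p, 1) for p in p_labelled)    # labelled data scored as real
    step2 = mean(bce(p, 0) for p in p_generated)   # generated spectra scored as fake
    generator = -step2                             # generator trained on the negated step-2 loss
    step3 = mean(bce(p, 1) for p in p_unlabelled)  # unlabelled target data scored as real
    return {"step1": step1, "step2": step2, "generator": generator, "step3": step3}


print(sgan_step_losses([0.9, 0.9], [0.2], [0.5]))
```

Many GAN implementations replace the negated loss with the "non-saturating" generator loss for better gradients; the negated form is used here only because it matches the description above.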

Figure 3.
A diagram of the semisupervised generative adversarial network (SGAN) structure and training procedure. The network was trained in three steps, iteratively: (1) the feature extractor and discriminator were trained to classify that the labelled data were real, (2) the feature extractor and discriminator were trained to classify that the generated samples were fake and the generator was trained to confuse the discriminator, (3) the feature extractor and discriminator were trained to classify that the unlabelled target domain data were real.

Model Training
The models were trained for 10,000 epochs using the Adam optimization algorithm with a batch size of 32 and a learning rate of 0.001. The feature-extractor module consisted of four fully connected layers with 256, 128, 64, and 32 neurons, respectively. The classifier module consisted of a single logistic regression layer connecting the outputs of the last feature-extractor layer to the five allergen classes. The generator module consisted of five fully connected layers with 32, 64, 128, and 256 neurons and, finally, as many neurons as there are points in the generated spectra. The discriminator modules of both the DANN and SGAN networks consisted of a single logistic regression layer connecting the last layer of the feature extractor to the predicted classes: source or target domain for the DANN discriminator, and real or generated for the SGAN discriminator. The hyperparameters were chosen such that the models achieved 100% accuracy on the training set.
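The layer widths above fix the size of every module once the input length and the generator's noise dimension are known. Neither is stated in the text, so this sketch assumes an input of 401 + 451 = 852 points (both sensor ranges sampled every 1 nm) and a hypothetical 16-dimensional generator input, and it assumes one sigmoid output for each binary discriminator. It simply counts weights and biases per module.

```python
# Layer widths are from the text; N_INPUTS and LATENT_DIM are assumptions.
N_INPUTS = 401 + 451   # assumed: 1550-1950 nm and 2000-2450 nm sampled every 1 nm
LATENT_DIM = 16        # hypothetical generator noise size (not stated in the text)
N_CLASSES = 5          # gluten, gluten-free, peanut, tree nut, egg


def dense_params(widths):
    """Weights plus biases for a chain of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


def module_param_counts(n_inputs=N_INPUTS, latent_dim=LATENT_DIM, n_classes=N_CLASSES):
    return {
        "feature_extractor": dense_params([n_inputs, 256, 128, 64, 32]),
        "classifier": dense_params([32, n_classes]),
        # one sigmoid output is assumed for each binary discriminator
        "dann_discriminator": dense_params([32, 1]),
        "sgan_discriminator": dense_params([32, 1]),
        "generator": dense_params([latent_dim, 32, 64, 128, 256, n_inputs]),
    }


for name, count in module_param_counts().items():
    print(f"{name}: {count:,}")
```

Under these assumptions, most parameters sit in the first feature-extractor layer and the last generator layer, both of which scale with the spectrum length.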
The models were trained using labelled data from all spectra acquired under static conditions, i.e., 50 stationary spectra for each material (or material brand), totalling 1250 spectra for each sensor. Either no labelled instance or a single labelled instance of each material category collected under moving conditions was also added to the training set. Half of the spectra collected under moving conditions for each material or material brand (625 spectra per sensor, 25 per material) were used as an unlabelled dataset during training. The remaining moving-condition spectra (625 spectra per sensor, 25 per material) were used as the test set.
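The text does not say how the moving-condition spectra of each material were divided into the unlabelled and test halves. This minimal sketch assumes a random split per material (or brand) with a fixed seed; `split_moving_spectra` and the placeholder spectrum identifiers are hypothetical.

```python
import random


def split_moving_spectra(spectra_by_material, seed=0):
    """Randomly split each material's moving-condition spectra into two halves.

    Returns (unlabelled, test) lists of (material, spectrum) pairs: the first half
    of each shuffled list is unlabelled training data, the second half is test data.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    unlabelled, test = [], []
    for material in sorted(spectra_by_material):
        spectra = spectra_by_material[material]
        order = list(range(len(spectra)))
        rng.shuffle(order)
        half = len(order) // 2
        unlabelled += [(material, spectra[i]) for i in order[:half]]
        test += [(material, spectra[i]) for i in order[half:]]
    return unlabelled, test


# 25 materials/brands x 50 moving spectra each, with placeholder spectrum ids.
demo = {f"material_{m:02d}": [f"spectrum_{m:02d}_{k:02d}" for k in range(50)]
        for m in range(25)}
unlabelled, test = split_moving_spectra(demo)
print(len(unlabelled), len(test))  # 625 625
```

Because ten spectra come from each physical sample, splitting by sample rather than by spectrum would be a stricter alternative that keeps spectra of the same sample out of both halves; the text does not indicate which was used.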



Trust in Machine Learning
Trust in model predictions is required for ML models to be accepted in manufacturing environments and for operators to make decisions based on their outputs. Three key components of trust are accuracy, uncertainty quantification, and interpretability. Accuracy is commonly reported and refers to the proportion of correct predictions an ML model makes. Uncertainty quantification provides an estimate of how confident the model is in its prediction; such estimates can indicate whether a prediction should be trusted or whether further investigation is necessary. Interpretability requires information about how the model arrives at its predictions.
In this work, the prediction accuracy is reported as the percentage of test spectra, acquired under moving conditions, whose allergen category was correctly classified. For uncertainty quantification, an ensemble of five deep neural networks was trained, exploiting random initialization to produce models with different final weights. Ensemble learning is one of the most widely used uncertainty-quantification techniques [35]. Ensemble methods are usually used to combine predictions from multiple ML models to increase prediction accuracy, but they can also provide an uncertainty estimate by reporting the level of consensus among the individual models. To interpret the trained models, a permutation-based method was used to determine the wavelengths most important to the predictions made by the ensemble model. This global feature-importance method randomly shuffles each feature (in this case, the intensity values at each wavelength) and measures the resulting reduction in accuracy [36]. The wavelengths whose permutation causes the largest reduction in accuracy are the most important to the model's predictions.
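The permutation-based importance described above can be sketched in a few lines. This is a generic implementation of the technique, not the authors' code: `predict` stands in for the trained ensemble, spectra are plain lists of intensities, and averaging over several shuffles (`n_repeats`) is an assumption made to reduce noise.

```python
import random


def accuracy(predict, X, y):
    """Fraction of spectra in X whose predicted label matches y."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when one feature column (one wavelength) is shuffled.

    predict: callable mapping a list of intensities to a class label.
    X: list of spectra (lists of equal length); y: list of true labels.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between wavelength j and the labels
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy example: only feature 0 carries class information; feature 1 is constant.
X = [[float(i % 2), 0.5] for i in range(200)]
y = [i % 2 for i in range(200)]


def predict(row):
    return int(row[0] > 0.5)


print(permutation_importance(predict, X, y))
```

Shuffling the informative feature drops accuracy to roughly chance level, while shuffling the uninformative one changes nothing. scikit-learn offers a comparable `sklearn.inspection.permutation_importance` for estimators that follow its API.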

Results
The model prediction-accuracy results are presented in Table 2. Overall, high accuracies were obtained using no labelled data (up to 96%) or a single labelled instance (up to 99.68%) from each material under moving conditions. This demonstrates that the domain-adaptation methodology is suitable for transfer between stationary and moving samples, and therefore has the potential to minimize disruption to a food manufacturing process by reducing the burden of collecting labelled data under moving conditions. These results were more accurate than those obtained using transfer learning alone (i.e., without the DANN or SGAN domain-adaptation methodology), which achieved up to 92.96% with no labelled data and 96.8% using a single labelled instance from each material under moving conditions. This shows that the domain-adaptation methodologies improved the extraction of features applicable to the spectra acquired under moving conditions.

Table 2. The accuracies (%) for each domain-adaptation methodology on the test set acquired under moving conditions for each speed. A comparison between using both sensors and each sensor individually is provided. A comparison to using transfer learning (TL) with no domain-adaptation methodology is included. None or single in the labelled data instances row indicates whether no labelled data from the spectra acquired under moving conditions were used in the training set or whether a single labelled instance from each material was included. The bolded results show the highest accuracies achieved for each task (i.e., combination of speed and number of labelled target domain data instances).

In isolation, the DANN approach achieved higher accuracy than the SGAN training for four out of six tasks and higher accuracy than both methods combined for two out of six tasks.
However, the SGAN approach still achieved higher accuracy than the transfer-learning approach for five out of six tasks, indicating that it was useful for extracting discriminative target-domain features. Using the DANN and SGAN approaches together achieved the highest prediction accuracy for four out of six tasks compared with using either method alone. This suggests that the domain-nondiscriminative features learned by the DANN are enhanced by the target-domain-discriminative features extracted through SGAN training.
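Because each test set contains 625 moving-condition spectra, every reported accuracy corresponds to a whole number of correctly classified spectra, and one spectrum is worth 100 / 625 = 0.16 percentage points. The sketch below converts the headline accuracies into counts, assuming the same 625-spectrum test set applies to every sensor configuration.

```python
TEST_SPECTRA = 625  # moving-condition test spectra per speed (25 per material or brand)


def correct_predictions(accuracy_percent, n_test=TEST_SPECTRA):
    """Number of correctly classified test spectra implied by an accuracy in %."""
    return round(accuracy_percent * n_test / 100)


# One test spectrum corresponds to 100 / 625 = 0.16 percentage points.
for label, acc in [("DA, no labels", 96.0), ("DA, one label", 99.68),
                   ("TL, no labels", 92.96), ("TL, one label", 96.8)]:
    print(f"{label}: {acc}% -> {correct_predictions(acc)} / {TEST_SPECTRA}")
```

Read this way, the best domain-adaptation result with no labelled data (600 correct) exceeds the best transfer-learning result (581 correct) by 19 spectra, and differences of a few tenths of a percent correspond to only one or two spectra.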

Interestingly, for models using no labelled data from the target domain, accuracy increased with increasing speed. Furthermore, for models using a single labelled instance from each material, the highest accuracy was achieved at the medium speed. This is counterintuitive, as higher accuracy was expected at the lowest speeds because slower-moving materials produce a narrower spread of intensity values (Figure 4) [15]. However, it is possible that at higher speeds, the wider intensity distribution enables the networks to adapt better to samples at the edges of the distribution and encourages learning of features based on spectral shape rather than on average intensity values.

Using both sensors achieved the highest prediction accuracy, with the S2.0 sensor alone producing more accurate predictions than the S2.5 sensor. This indicates that the wavelength ranges most important for discriminating between materials lie within the S2.0 range, but that installing both sensors is optimal for accurate monitoring of a conveyor line.
Furthermore, using a single labelled instance for each material category under moving conditions improved model accuracy over using no labelled data from the target domain (e.g., from 95.04% to 99.36% using the SGAN and DANN methods combined for the slow speed). Therefore, provided the level of process disruption is acceptable, a small number of labelled samples under moving conditions should be collected on the conveyor system to improve model accuracy.

Uncertainty
To illustrate how these models could include uncertainty as a trust measure in an industrial setting, an ensemble of five neural networks was trained, providing a distribution of predictions owing to the random initialization of the network weights. This was undertaken using the combined DANN and SGAN model with no labelled samples from the target domain, predicting for samples under the fast-moving condition and using data from both sensors. This model and sensor configuration (SGAN and DANN, both sensors) was selected because it produced the highest overall model accuracy, and this task (fast speed, no labelled moving-condition data) was chosen because it represents the most difficult task. Overall, the ensemble achieved an accuracy of 96.96% (Table 3) on the moving-condition samples, compared with 95.04% for a single model, as reported in Table 2. This illustrates the improved accuracy achievable with ensemble methods, which also provide a quantification of model confidence. The confidence scores and corresponding model performance are presented in Table 3. A confidence score of 0.6 indicates that three out of five models predicted that class, whereas a confidence score of 1.0 indicates that all models predicted the same class.

Table 3. A summary of the accuracy and confidence of the ensemble model predictions on the test set data. The ensemble model consisted of five networks. As a case study, the networks were trained using both SGAN and DANN methods and both sensor measurements to predict for the fast speed with no labelled data from the target domain. A confidence score of 0.6 indicates that three out of five models predicted that class, whereas a confidence score of 1.0 indicates that all models predicted the same class.

Notably, as the confidence score increases, the percentage of incorrectly classified samples with that confidence score decreases (from 31.8% to 1.4%, Table 3).
This shows that the ensemble is less confident in its incorrect predictions, supporting the method's efficacy in quantifying model uncertainty and thereby enhancing model trust. In an industrial environment, a threshold could be set on the model's confidence in its prediction: if, in addition to predicting the material on the conveyor, the model reports its confidence, production would only be stopped if the prediction confidence score were above this threshold. Importantly, there were eight samples for which all the models incorrectly predicted the material, which suggests that a greater number of networks in the ensemble may be required. In an industrial environment, if confident incorrect predictions were found to occur during production, additional networks could be trained to form part of the ensemble.
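The confidence score and the stop rule described above can be sketched as a majority vote over the five networks, where the confidence is the fraction of networks agreeing with the majority class. The threshold value of 0.6 and the tie-breaking behaviour are assumptions for illustration; the text does not fix either.

```python
from collections import Counter


def ensemble_vote(predictions):
    """Majority class and confidence score (fraction of models that agree).

    Ties are broken by the order in which classes first appear, a simplification;
    the text does not say how ties were handled.
    """
    label, votes = Counter(predictions).most_common(1)[0]
    return label, votes / len(predictions)


def stop_line(expected_class, predictions, threshold=0.6):
    """Hypothetical rule: stop production only if the ensemble predicts a class
    other than the expected one with a confidence score above the threshold."""
    label, confidence = ensemble_vote(predictions)
    return label != expected_class and confidence > threshold


five_models = ["Gluten", "Gluten", "Gluten", "Gluten", "Peanut"]
print(ensemble_vote(five_models))        # ('Gluten', 0.8)
print(stop_line("Peanut", five_models))  # True
```

With five networks the confidence can only take the values 0.2 to 1.0 in steps of 0.2, so in practice the threshold amounts to choosing how many networks must agree before the line is stopped.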

A detailed breakdown of the incorrectly identified samples is presented in Table 4. The gluten-free and peanut allergen categories produced the most false negatives, with seven instances each. In particular, coconut flour produced five false-negative predictions, and oat flour and peanut flour produced four each. All of these instances were predicted as gluten-containing materials. The gluten allergen category produced the most false-positive predictions, with a total of 15 instances. However, the most frequent high-confidence false positives were peanut-containing products being classified as gluten-containing. This suggests that, rather than adding networks uniformly for all material classes, additional networks could be targeted at the observed misclassifications. For example, a larger set of networks could be trained and used only when a production line that should contain peanut-based products is predicted to contain gluten. Another solution may be to apply higher confidence thresholds in these scenarios.

Table 4. A breakdown of the incorrectly predicted allergen classes for the ensemble model. The ensemble model consisted of five networks. As a case study, the networks were trained using both SGAN and DANN methods and both sensor measurements to predict for the fast speed with no labelled data from the target domain.

Misclassified material | Real allergen category | Predicted allergen category | Frequency (confidence scores)
Coconut flour | Gluten-free
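The suggestion of higher confidence limits for specific misclassification scenarios can be expressed as a threshold lookup keyed by the expected and predicted allergen classes. This sketch is hypothetical: the default of 0.6 and the stricter 0.8 for a peanut line predicted as gluten are illustrative values, not settings from this work.

```python
DEFAULT_THRESHOLD = 0.6
# Hypothetical stricter threshold for a peanut line predicted to contain gluten.
SCENARIO_THRESHOLDS = {("Peanut", "Gluten"): 0.8}


def raise_alarm(expected_class, predicted_class, confidence):
    """Alarm only if the predicted class differs from the expected one and the
    ensemble confidence exceeds the threshold for that (expected, predicted) pair."""
    if predicted_class == expected_class:
        return False
    threshold = SCENARIO_THRESHOLDS.get((expected_class, predicted_class), DEFAULT_THRESHOLD)
    return confidence > threshold


print(raise_alarm("Peanut", "Gluten", 0.8))  # False
print(raise_alarm("Egg", "Gluten", 0.8))     # True
```

Keeping the scenario table separate from the ensemble means thresholds could be adjusted as misclassification patterns are observed on the line, without retraining any network.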

Interpretability
Important wavelength ranges were determined by permuting individual wavelength values and monitoring the resulting reduction in prediction accuracy. From this, five wavelength ranges were identified (Table 5). Interestingly, the most important wavelength range (1785-1870 nm) did not correspond to the absorption of any specific powder component. However, the second (1686-1744 nm) and third (1931-1951 nm) most important wavelength ranges corresponded to long-chain fatty acid and water absorption, respectively. This may indicate that the most important range was identified because it carries information related to both of these other ranges, which were deemed less important when used on their own. The fourth most important wavelength range (2111-2132 nm) corresponded to a protein-absorption band, and the fifth (1569-1604 nm) to a carbohydrate-absorption band. Most of the important wavelength ranges were found within the S2.0 sensor wavelength range (1550-1950 nm), explaining why the S2.0 sensor achieved higher prediction accuracy than the S2.5 sensor. Nevertheless, better accuracy was observed using both sensors together, suggesting that incorporating the S2.5 sensor still contributes useful information to the prediction task.
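The text does not describe how per-wavelength importances were turned into the five reported ranges. One simple possibility, assumed here for illustration only, is to threshold the importance scores and merge consecutive wavelengths on the 1 nm grid into (start, end) ranges, so that the gap between the two sensors (1950 to 2000 nm) always closes a range. The function name and threshold are hypothetical.

```python
def important_ranges(wavelengths, importances, threshold, step=1):
    """Group consecutive wavelengths whose importance exceeds threshold into ranges.

    wavelengths must be sorted; a jump larger than `step` (e.g. between the two
    sensors) always ends the current range. Returns a list of (start, end) tuples.
    """
    ranges, start, prev = [], None, None
    for wl, imp in zip(wavelengths, importances):
        contiguous = prev is not None and wl - prev == step
        if imp > threshold:
            if start is None or not contiguous:
                if start is not None:
                    ranges.append((start, prev))  # close the range at the gap
                start = wl
        elif start is not None:
            ranges.append((start, prev))  # importance dropped below the threshold
            start = None
        prev = wl
    if start is not None:
        ranges.append((start, prev))
    return ranges


wavelengths = list(range(1550, 1561)) + list(range(2000, 2006))
scores = [0.9 if w in (1552, 1553, 1554, 2000, 2001) else 0.0 for w in wavelengths]
print(important_ranges(wavelengths, scores, threshold=0.5))  # [(1552, 1554), (2000, 2001)]
```

In practice the resulting ranges depend strongly on the chosen threshold, so ranking ranges by their summed or mean importance is a natural next step.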

Comparison to Previous Work
The results in this work are in close agreement with those reported in the literature comparing domain-adaptation methodologies with transfer learning alone using NIR data. In this work, prediction accuracy increased by up to 7.84% when the DANN and SGAN methodologies were combined, compared with transfer learning alone (Table 2, medium speed, no labelled data instances). Similarly, Zhang et al. (2022) [29] used domain adaptation so that visible-light image datasets could be used to train networks for infrared pedestrian detection. Domain adaptation was used to align the features of the infrared and visible-light domains. A domain classifier and a gradient reversal layer were used to achieve this, a method similar to the DANNs used in this work. The feature-extraction module was updated in the direction that increases the domain-classification loss, aligning the feature spaces of both domains. Compared with transfer learning alone, using domain adaptation with the EfficientDet network increased the average precision by 2.0% on the XDU-NIR2020 dataset and 2.2% on the CVC-09 dataset. Mishra and Nikzad-Langerodi (2020) [42] compared partial least squares regression (PLS) with domain-invariant partial least squares regression (di-PLS), and the authors of [43] compared dynamic orthogonal projections with transfer component analysis (TCA). Di-PLS uses a regularization term to minimize the variability between the two domains whilst maximizing the covariance between the source domain and the response variables. This is followed by ordinary PLS, in which the latent variables that explain most of the variability in the data are extracted. TCA aims to minimize the distance between the source and target domains whilst maximizing the variance in the data. A disadvantage of these methods compared with the deep-learning approaches used in this work is their limited feature-extraction capability.
When using deep neural networks, not only can features from the original spectra be amplified throughout the network layers, but relationships between wavelength intensities can also be considered [12]. Both [42] and [43] applied their methods to the analysis of fresh fruit samples, using domain adaptation to overcome spectral changes caused by different instruments, different operating temperatures, or seasonal variation in the fruit. Mishra and Nikzad-Langerodi (2020) [42] achieved increases in R² of up to 67%, together with decreases in prediction bias and root mean squared error (RMSE). Similarly, the authors of [43] reported increases in R² of up to 31%, and reductions in prediction bias and RMSE of 98% and 66%, respectively. This agreement with other works suggests that the methods presented here are suitable for improving in-line allergen detection and minimizing the data-collection burden in industrial environments.
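The gradient reversal layer described above for Zhang et al. (2022) [29], and used in DANNs, can be illustrated with a minimal NumPy sketch. It assumes, purely for illustration, a linear feature extractor f = Wx and a logistic domain classifier, and it isolates the domain-loss term: in a full DANN the feature extractor also receives the unreversed gradient of the label-classification loss, and the domain classifier itself is trained normally to minimize its own loss. The function names, the reversal weight `lam`, the learning rate, and the source/target encoding of `d` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class GradientReversal:
    """Identity in the forward pass; scales the incoming gradient by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        return features

    def backward(self, grad_output):
        return -self.lam * grad_output


def domain_loss(W, v, b, x, d):
    """Binary cross-entropy of a logistic domain classifier on features W @ x (d=1 source, d=0 target)."""
    p = sigmoid(v @ (W @ x) + b)
    return -(d * np.log(p) + (1 - d) * np.log(1 - p))


def feature_extractor_step(W, v, b, x, d, grl, lr=0.01):
    """One SGD step on the linear feature extractor W driven only by the domain loss passed through the GRL."""
    f = grl.forward(W @ x)
    p = sigmoid(v @ f + b)
    grad_f = (p - d) * v           # dL_domain/df for a logistic classifier with BCE loss
    grad_f = grl.backward(grad_f)  # reversal: -lam * dL_domain/df
    grad_W = np.outer(grad_f, x)   # chain rule through f = W @ x
    return W - lr * grad_W         # descending the reversed gradient ascends the domain loss


rng = np.random.default_rng(0)
W, v, x = rng.normal(size=(3, 4)), rng.normal(size=3), rng.normal(size=4)
before = domain_loss(W, v, 0.0, x, d=1)
W_new = feature_extractor_step(W, v, 0.0, x, d=1, grl=GradientReversal(lam=1.0))
print(domain_loss(W_new, v, 0.0, x, d=1) > before)  # True: features become harder to classify by domain
```

Deep-learning frameworks typically express the same idea as a custom autograd operation, and `lam` is often ramped up from zero during training; neither detail is specified in this section.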

Conclusions
A common problem in food manufacturing is human error causing the wrong powder material to be loaded onto a production line. In-line NIR spectroscopy combined with ML offers a way to detect this problem early. However, a method is needed to minimize the data-collection burden, and hence process disruption, when deploying this solution in manufacturing environments. This work investigated two deep-learning domain-adaptation methodologies (DANNs and SGANs) for transferring ML models trained on spectra acquired under stationary conditions, akin to collecting spectra in a laboratory, so that they predict accurately on spectra acquired whilst moving, i.e., when the sensor is installed above a conveyor. Combining both methods gave the highest accuracy on average, as did combining spectra from two NIR sensors with different wavelength ranges. Overall, accuracy of up to 96.0% was achieved using no labelled instances from the moving target-domain data, and up to 99.68% when incorporating a single labelled instance for each material category. Ensemble methods were shown to increase accuracy and provide a measure of model prediction confidence, and a feature-permutation method was used for global interpretability of the models. The most frequent high-confidence false positives occurred when peanut-containing materials were incorrectly classified as containing gluten. This suggests that more neural networks could be added to the ensemble models for these cases, or that a higher confidence threshold could be applied. Implementing this screening method in production lines could help reduce food waste and improve the productivity, economics, and sustainability of agri-food systems.