# Spatial Fingerprinting: Horizontal Fusion of Multi-Dimensional Bio-Tracers as Solution to Global Food Provenance Problems


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Data

The 17 bio-tracers comprised three stable isotopes ($\delta^{15}$N, $\delta^{13}$C and $\delta^{34}$S) and 14 fatty acids (C16:0, C16:1, C18:0, C18:1, C18:2n-6, C18:3n-3, C18:4n-3, C20:1, C20:4n-3, C20:5n-3, C22:1, C22:5n-3, C22:6n-3 and C24:1). One muscle sample from each fish was delivered frozen to the Lipid Analytical Services at the University of Guelph for fatty acid analysis using a combination of the Bligh and Dyer and the Morrison and Smith methods [32,33]. Individual FA weights (μg/g) were converted to percent FA composition, and fatty acids present at >1% were retained as bio-tracers. The second muscle sample from each fish was dried at 70 °C for 2 days and ground into a fine powder in preparation for stable isotope analysis. Tissue samples were sent to the University of Windsor GLIER Chemical Tracers Lab (Windsor, ON, Canada) for isotopic analysis of $\delta^{15}$N, $\delta^{13}$C and $\delta^{34}$S. Importantly, all variables were centered and scaled before any statistical inference.
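The preprocessing chain described above (weights to percent composition, the >1% retention rule, then centering and scaling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and toy values are hypothetical, the >1% rule is assumed to apply to the mean composition across samples, percentages are not renormalized after filtering (the text does not say), and scaling uses the sample standard deviation.

```python
from statistics import mean, stdev

def to_percent_composition(weights):
    """Convert one sample's fatty acid weights (ug/g) to % of total fatty acids."""
    total = sum(weights.values())
    return {fa: 100.0 * w / total for fa, w in weights.items()}

def retain_major_fatty_acids(samples, threshold=1.0):
    """Keep fatty acids whose MEAN % composition across samples exceeds threshold.

    Applying the >1% rule to the mean is an assumption made for this sketch.
    """
    names = list(samples[0].keys())
    return [fa for fa in names if mean(s[fa] for s in samples) > threshold]

def center_and_scale(values):
    """Z-score: subtract the mean and divide by the sample standard deviation."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical weights (ug/g) for three fish and three fatty acids.
raw = [
    {"C16:0": 30.0, "C18:1": 60.0, "C24:1": 0.5},
    {"C16:0": 25.0, "C18:1": 70.0, "C24:1": 0.4},
    {"C16:0": 35.0, "C18:1": 50.0, "C24:1": 0.6},
]
composition = [to_percent_composition(s) for s in raw]
kept = retain_major_fatty_acids(composition)
scaled = {fa: center_and_scale([s[fa] for s in composition]) for fa in kept}
print(kept)  # ['C16:0', 'C18:1']
```

After scaling, every retained bio-tracer has mean 0 and unit standard deviation, which is what makes a noise level of 1 directly comparable across bio-tracers.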

#### 2.2. Numerical Simulations

#### 2.2.1. Statistical Models

#### 2.2.2. Simulation Design

#### 2.3. Mathematical Proof

#### Numerical Implementation

## 3. Results

#### 3.1. The More Bio-Tracers the Better

#### 3.2. An Examination of the Performances

$\delta^{15}$N and oleic acid (C18:1) alone perform well (0.547 and 0.548, respectively), whereas $\delta^{13}$C and linoleic acid (C18:2n-6) perform poorly (0.343 and 0.333, respectively). It is worth noting that the top three bio-tracers, based on individual performance, include two fatty acids (oleic acid and docosapentaenoic acid, i.e., C22:5n-3) and one stable isotope ($\delta^{15}$N; see Figure 5), and thus cover both classes of bio-tracers. Although we only show this for LDA (Figure 5), the same holds for NBC (see Figure A8 in Appendix B) and MLP (see Figure A11).

…($R^{2}$ = 72.0% and 48.1% for LDA, respectively; see Figure 7a,b, and Figures A10a,b and A13a,b for NBC and MLP, respectively). In one dimension, the mean Euclidean distance between region centroids efficiently summarizes a key geometric property of the data space: the further apart the data points of different regions are, the stronger the discriminatory power (Figure 5b). This can be seen as a special case of a more general result: the less overlap among regional hypervolumes (i.e., the hypervolumes generated by the data points of each region), the stronger the discriminatory power of a set of bio-tracers (Figure 7c, Figure A10c and Figure A13c). Notably, increasing the number of dimensions is often an efficient way to reduce overlap among the data points of different regions (see Figure 7c, Appendix A, Figure A10c and Figure A13c).
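The two geometric summaries used in Figure 5 can be computed as sketched below for readers who wish to explore them. The proportion of inter-region variance is taken here to be the one-way ANOVA ratio of the between-region sum of squares to the total sum of squares; that definition, the function names, the region labels AK, BC and KA, and the toy values are all assumptions for illustration. The centroid distance works in any number of dimensions.

```python
import math
from itertools import combinations
from statistics import mean

def inter_region_variance_share(groups):
    """Between-region sum of squares / total sum of squares for one bio-tracer.

    groups maps a region label to the list of (scaled) values observed there.
    """
    all_values = [v for values in groups.values() for v in values]
    grand = mean(all_values)
    ss_total = sum((v - grand) ** 2 for v in all_values)
    ss_between = sum(len(values) * (mean(values) - grand) ** 2 for values in groups.values())
    return ss_between / ss_total

def centroid(points):
    """Coordinate-wise mean of equal-length tuples."""
    return tuple(mean(coords) for coords in zip(*points))

def mean_centroid_distance(groups):
    """Mean Euclidean distance over all pairs of region centroids (any dimension)."""
    cents = [centroid(points) for points in groups.values()]
    return mean(math.dist(a, b) for a, b in combinations(cents, 2))

# Hypothetical scaled values of a single bio-tracer; AK, BC, KA are illustrative labels.
one_d = {"AK": [-1.0, -0.8, -1.2], "BC": [0.0, 0.1, -0.1], "KA": [1.0, 1.2, 0.8]}
one_d_points = {region: [(v,) for v in values] for region, values in one_d.items()}
print(round(inter_region_variance_share(one_d), 3))  # 0.971
print(round(mean_centroid_distance(one_d_points), 3))  # 1.333
```

Passing tuples of several bio-tracer values per sample to `mean_centroid_distance` gives the multi-dimensional version of the same summary.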

## 4. Discussion

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A. Mathematical Proofs

#### Appendix A.1. Objectives

#### Appendix A.2. Notations and Definitions

- n, p and q are three natural numbers other than zero;
- ${\mathbb{N}}_{n}$ is the set of natural numbers ranging from 0 to n, where $n\in {\mathbb{N}}^{*}$;
- $[X]$ denotes the probability of the event X (X being an event, or a realization of a random variable), and $[X \mid Y]$ the probability of X given Y;
- $\mathcal{N}(\mu, \sigma)$ denotes a Gaussian distribution with parameters $(\mu, \sigma)$;
- $\mathbb{E}\left(X\right)$ denotes the expected value of the random variable X.

#### Appendix A.3. Bayesian Approach

#### Appendix A.4. Effect of Dissimilarity between 2 Distributions

#### Appendix A.4.1. General Considerations

- Improving prior knowledge (not considered here);
- Considering a larger quantity and/or higher quality of observations;
- Using more reliable inference techniques.

- ${\sigma}_{1}={\sigma}_{2}=\sigma $ where the dissimilarity will be quantified as $|{\mu}_{1}-{\mu}_{2}|$;
- ${\mu}_{1}={\mu}_{2}=\mu $ where the dissimilarity will be quantified as $\frac{{\sigma}_{1}}{{\sigma}_{2}}$.

#### Appendix A.4.2. Identical Variances

**Figure A1.** The larger the difference between the means, the better the performance. $\mathbb{E}\left([A_1 \mid \mathbf{s}]\right)$ is computed for increasing values of $(\mu_1-\mu_2)^2$. Every point corresponds to the average value from 10,000 simulations and red lines correspond to the analytic solutions. From the darkest to the lightest gray, the triplets $\{\mu_1, \mu_2, \sigma\}$ are as follows: $\{0, 0.1, 1\}$, $\{0, 0.25, 1\}$, $\{0, 0.5, 1\}$, $\{0, 1, 1\}$ and $\{0, 1, 0.5\}$.
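The simulated points in Figure A1 can be approximated with a short Monte Carlo sketch. It assumes two regions with equal prior probabilities, Gaussian distributions $\mathcal{N}(\mu_1,\sigma)$ and $\mathcal{N}(\mu_2,\sigma)$ with σ read as a standard deviation, and n independent observations $\mathbf{s}$ drawn from region 1; under equal priors, $[A_1 \mid \mathbf{s}]$ is the logistic function of the log-likelihood ratio. The function names and defaults are hypothetical and this is not the authors' implementation.

```python
import math
import random

def log_normal_pdf(x, mu, sigma):
    """Log-density of a Gaussian with mean mu and standard deviation sigma."""
    return -math.log(sigma) - 0.5 * math.log(2 * math.pi) - (x - mu) ** 2 / (2 * sigma ** 2)

def logistic(d):
    """Numerically stable 1 / (1 + exp(-d))."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)

def expected_posterior(mu1, mu2, sigma, n=1, n_sim=10_000, seed=1):
    """Monte Carlo estimate of E([A_1 | s]) when s holds n draws from region 1.

    Equal priors are assumed, so [A_1 | s] is the logistic of the log-likelihood ratio.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        sample = [rng.gauss(mu1, sigma) for _ in range(n)]
        d = sum(log_normal_pdf(x, mu1, sigma) - log_normal_pdf(x, mu2, sigma) for x in sample)
        total += logistic(d)
    return total / n_sim

for mu2 in (0.1, 0.25, 0.5, 1.0):
    print(mu2, round(expected_posterior(0.0, mu2, 1.0), 3))
```

When the two means coincide the estimate is exactly 0.5 (the prior), and it grows toward 1 as $|\mu_1-\mu_2|$ increases relative to σ.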

#### Appendix A.4.3. Identical Means

**Figure A2.** Effect of the variance ratio on performance. (**a**) $\mathbb{E}\left([A_1 \mid \mathbf{s}]\right)$ is computed for increasing values of $\sigma_1/\sigma_2$. From the darkest to the lightest gray, n (sample size) increases as follows: 1, 2.5, 5, 10 and 25. (**b**) $\mathbb{E}\left([A_1 \mid \mathbf{s}]\right)$ is plotted against the sample size (n). Every point represents the average value from 10,000 simulations; red lines correspond to the analytic solutions. For all simulations, $\mu_1=\mu_2=0$ and $\sigma_1=1$; from the darkest to the lightest gray, $\sigma_2$ increases as follows: 1, 1.2, 1.5, 2 and 5.
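With identical means, only the spread differs between the two regions, so a single observation carries little information and performance builds up with sample size (Figure A2b). The sketch below applies the same Monte Carlo idea to this case, again assuming equal priors, σ as a standard deviation, and all n observations drawn from region 1; the function name and defaults are hypothetical.

```python
import math
import random

def log_normal_pdf(x, mu, sigma):
    """Log-density of a Gaussian with mean mu and standard deviation sigma."""
    return -math.log(sigma) - 0.5 * math.log(2 * math.pi) - (x - mu) ** 2 / (2 * sigma ** 2)

def expected_posterior_var(mu, sigma1, sigma2, n, n_sim=10_000, seed=1):
    """Monte Carlo E([A_1 | s]) for N(mu, sigma1) (region 1) vs N(mu, sigma2) (region 2).

    s holds n draws from region 1; equal priors are assumed.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        sample = [rng.gauss(mu, sigma1) for _ in range(n)]
        # Log-likelihood ratio of region 1 against region 2.
        d = sum(log_normal_pdf(x, mu, sigma1) - log_normal_pdf(x, mu, sigma2) for x in sample)
        # Numerically stable logistic transform gives the posterior [A_1 | s].
        total += 1.0 / (1.0 + math.exp(-d)) if d >= 0 else math.exp(d) / (1.0 + math.exp(d))
    return total / n_sim

for n in (1, 5, 25):
    print(n, round(expected_posterior_var(0.0, 1.0, 2.0, n), 3))
```

With $\sigma_1=\sigma_2$ the log-likelihood ratio is identically zero and the estimate stays at 0.5 whatever n is; any variance ratio other than 1 lets the estimate rise with n.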

#### Appendix A.5. Effect of Dimensionality

#### Appendix A.5.1. General Considerations

#### Appendix A.5.2. Simplification

**Figure A3.** Effect of the number of bio-tracers combined on performance. $\mathbb{E}\left([A_1 \mid \mathbf{s}]\right)$ is computed for an increasing number of samples (**a**) and for an increasing number of bio-tracers combined (**b**). Every point represents the average value from 10,000 simulations; red lines correspond to the analytic solutions. For all simulations, $\sigma=1$ and, for any bio-tracer l, $\mu_{1,l}=0$ and $\mu_{2,l}=0.2$. (**a**) From the darkest to the lightest gray, the numbers of bio-tracers combined are 2, 3, 5, 10 and 20. (**b**) From the darkest to the lightest gray, the numbers of samples employed are 1, 3, 5, 10 and 20.
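Figure A3 considers independent bio-tracers with σ = 1 whose region means differ by 0.2. Assuming equal priors and independence between bio-tracers, the log-likelihood ratio is simply summed over bio-tracers and over samples, so each extra bio-tracer contributes evidence in the same way an extra sample does. The sketch below illustrates this; the function name and defaults are hypothetical.

```python
import math
import random

def expected_posterior_multi(n_tracers, n_samples, delta=0.2, sigma=1.0, n_sim=10_000, seed=1):
    """Monte Carlo E([A_1 | s]) for independent bio-tracers with region means 0 and delta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        d = 0.0  # log-likelihood ratio, region 1 against region 2
        for _ in range(n_samples):
            for _ in range(n_tracers):
                x = rng.gauss(0.0, sigma)  # observation truly from region 1
                d += ((x - delta) ** 2 - x ** 2) / (2 * sigma ** 2)
        total += 1.0 / (1.0 + math.exp(-d)) if d >= 0 else math.exp(d) / (1.0 + math.exp(d))
    return total / n_sim

for l in (2, 5, 20):
    print(l, round(expected_posterior_multi(l, n_samples=1), 3))
```

Because the bio-tracers in this sketch are independent and identically distributed, only the product of the number of samples and the number of bio-tracers matters: with the same seed, 2 bio-tracers × 5 samples gives exactly the same estimate as 5 bio-tracers × 2 samples.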

#### Appendix A.6. The Role of Correlation

**Figure A4.** Effect of correlation among bio-tracers on the determination of origin. The lighter the line, the less correlated the two bio-tracers. $\rho$ values range from 0 to 0.99 in increments of 0.11; each point represents the average over 100,000 simulations.
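The effect of correlation can be illustrated with two bio-tracers that have unit variances and correlation ρ, a covariance matrix Σ shared by both regions, and region means (0, 0) and (δ, δ). With a shared Σ, the log-likelihood ratio for an observation $\mathbf{x}$ is linear: $(\mathbf{x}-\bar{\boldsymbol{\mu}})^{\top}\Sigma^{-1}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)$, where $\bar{\boldsymbol{\mu}}$ is the midpoint of the two means. The mean offsets, δ = 0.5, equal priors, and the function name are assumptions; the exact settings behind Figure A4 are not restated here.

```python
import math
import random

def expected_posterior_corr(rho, delta=0.5, n_sim=10_000, seed=1):
    """Monte Carlo E([A_1 | x]) for one 2-D observation drawn from region 1.

    Regions: N((0, 0), S) and N((delta, delta), S), where S has unit variances
    and correlation rho (|rho| < 1). Equal priors are assumed.
    """
    rng = random.Random(seed)
    mu1, mu2 = (0.0, 0.0), (delta, delta)
    det = 1.0 - rho ** 2
    inv = ((1.0 / det, -rho / det), (-rho / det, 1.0 / det))  # inverse of S
    diff = (mu1[0] - mu2[0], mu1[1] - mu2[1])
    w = (inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1])  # S^{-1} (mu1 - mu2)
    mid = ((mu1[0] + mu2[0]) / 2, (mu1[1] + mu2[1]) / 2)
    total = 0.0
    for _ in range(n_sim):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Correlated draw from region 1 via the Cholesky factor of S.
        x = (mu1[0] + z1, mu1[1] + rho * z1 + math.sqrt(det) * z2)
        d = (x[0] - mid[0]) * w[0] + (x[1] - mid[1]) * w[1]
        total += 1.0 / (1.0 + math.exp(-d)) if d >= 0 else math.exp(d) / (1.0 + math.exp(d))
    return total / n_sim

for rho in (0.0, 0.44, 0.88, 0.99):
    print(rho, round(expected_posterior_corr(rho), 3))
```

For this configuration the squared Mahalanobis distance between the two means is $2\delta^2/(1+\rho)$, which shrinks as ρ grows: a strongly correlated second bio-tracer adds little new information.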

**Figure A5.**Effect of dimensionality of the set of bio-tracers on the determination of origin. Each point represents the average value from 100,000 simulations.

#### Appendix A.7. Conclusions

- Increase the sample size;
- Use bio-tracers that maximize the dissimilarity between distributions;
- Increase the number of bio-tracers used (preferably as weakly correlated as possible);
- Use a combination of the above.

## Appendix B. Additional Figures and Tables

#### Appendix B.1. Influence of Data Augmentation and Noise Addition for Multi-Layer Perceptron (MLP)

**Figure A6.** Effect of data augmentation and noise addition on the overall performance of the Multi-Layer Perceptron (MLP). Overall performance of the MLP is plotted against the number of times the data set is repeated (augmentation), and lines are colored according to the noise level employed. We did so for the three stable isotopes (**a**), the first three fatty acids (**b**) and all 17 bio-tracers together (**c**). The dotted red vertical line indicates the augmentation level we opted for and the solid red line represents the noise level we chose.
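The augmentation-plus-noise procedure assessed in Figure A6 can be sketched as repeating every training row a number of times and perturbing each copy with Gaussian noise on the scaled values. This is an assumption-laden illustration rather than the authors' procedure: the noise level is taken to be the standard deviation of additive Gaussian noise, whether unperturbed originals are also kept is not stated, and the function name and toy rows are hypothetical.

```python
import random

def augment_with_noise(rows, labels, n_repeats, noise_sd, seed=1):
    """Repeat each scaled training row n_repeats times, adding N(0, noise_sd) noise to every value."""
    rng = random.Random(seed)
    new_rows, new_labels = [], []
    for _ in range(n_repeats):
        for row, label in zip(rows, labels):
            new_rows.append([v + rng.gauss(0.0, noise_sd) for v in row])
            new_labels.append(label)
    return new_rows, new_labels

rows = [[0.1, -1.2, 0.7], [1.3, 0.4, -0.2]]  # hypothetical scaled bio-tracer values
labels = ["Alaska", "Kamchatka"]
aug_rows, aug_labels = augment_with_noise(rows, labels, n_repeats=5, noise_sd=0.1)
print(len(aug_rows), aug_labels[:4])  # 10 ['Alaska', 'Kamchatka', 'Alaska', 'Kamchatka']
```

Since the bio-tracers are scaled beforehand, a single `noise_sd` applies comparably to all of them.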

#### Appendix B.2. Influence of the Training Set Size for the Three Methods

**Figure A7.** Effect of the size of the training set on the overall performance of the three methods employed. Overall performances are plotted for an increasing number of samples used in the training set. Points are colored according to the number of bio-tracers combined. Every point represents the average over up to 100,000 simulations (up to 500 combinations of bio-tracers and 200 replicates per combination). The vertical red dotted line indicates the size of the training set used in the main text. The three panels correspond to the three statistical approaches used: NBC (**a**), LDA (**b**) and MLP (**c**). Note that potential gains in discriminatory power increase with the number of bio-tracers combined, but obtaining such gains requires a larger number of samples.

#### Appendix B.3. Results for Pairs and Triplets

#### Appendix B.3.1. Equivalent of Figures 5–7 for the Naive Bayesian Classifier (NBC)

**Figure A8.** Performance of individual bio-tracers. Overall performances of all 17 bio-tracers (listed on the right) using NBC are plotted against the proportion of inter-region variance (**a**) and the mean distance between all pairs of region centroids (**b**).

**Figure A9.** Including one more bio-tracer increases performance. Overall performances of all pairs of bio-tracers using NBC are plotted against the performance of the best-performing individual bio-tracer of the pair (**a**) and against their average overall performance (**c**). Similarly, overall performances of all triplets of bio-tracers are plotted against the performance of the best-performing pair within the triplet (**b**) and against their average overall performance (**d**). Magenta dashed lines represent the 1:1 line.

**Figure A10.** Efficient combinations of bio-tracers maximize inter-regional variance and minimize overlap of regional data hypervolumes. (**a**,**b**) For all combinations of 2 (**a**) and 3 (**b**) bio-tracers, the overall performances (with NBC) of sets of bio-tracers are plotted against their inter-region variance. (**c**) Relationship between the proportion of data overlap between regions and the overall performance, for all pairs and triplets of bio-tracers. Magenta dashed lines represent the results of non-linear least-squares regression, and the corresponding $R^{2}$ values are given at the bottom of each panel.

#### Appendix B.3.2. Equivalent of Figures 5–7 for the Multi-Layer Perceptron (MLP, a Class of Neural Network)

**Figure A11.** Performance of individual bio-tracers. Overall performances of all 17 bio-tracers (listed on the right) using MLP are plotted against the proportion of inter-region variance (**a**) and the mean distance between all pairs of region centroids (**b**).

**Figure A12.** Including one more bio-tracer increases performance. Overall performances of all pairs of bio-tracers using MLP are plotted against the performance of the best-performing individual bio-tracer of the pair (**a**) and against their average overall performance (**c**). Similarly, overall performances of all triplets of bio-tracers are plotted against the performance of the best-performing pair within the triplet (**b**) and against their average overall performance (**d**). Magenta dashed lines represent the 1:1 line.

**Figure A13.** Efficient combinations of bio-tracers maximize inter-regional variance and minimize overlap of regional data hypervolumes. (**a**,**b**) For all combinations of 2 (**a**) and 3 (**b**) bio-tracers, the overall performances (with MLP) of sets of bio-tracers are plotted against their inter-region variance. (**c**) Relationship between the proportion of data overlap between regions and the overall performance, for all pairs and triplets of bio-tracers. Magenta dashed lines represent the results of non-linear least-squares regression, and the corresponding $R^{2}$ values are given at the bottom of each panel.

#### Appendix B.4. Best Pairs and Triplets of Bio-Tracers

**Table A1.** Top 10 pairs of bio-tracers. Abbreviations are as follows: bt, bio-tracer; op, overall performance.

| Rank | LDA bt 1 | LDA bt 2 | LDA op | NBC bt 1 | NBC bt 2 | NBC op | MLP bt 1 | MLP bt 2 | MLP op |
|---|---|---|---|---|---|---|---|---|---|
| 1 | C20:5n-3 | C22:1 | 0.701 | $\delta^{15}$N | C18:1 | 0.721 | C20:5n-3 | C22:1 | 0.748 |
| 2 | C18:1 | C22:6n-3 | 0.701 | C18:1 | C22:5n-3 | 0.717 | C18:1 | C18:3n-3 | 0.722 |
| 3 | C18:0 | C18:1 | 0.700 | C18:1 | C20:1 | 0.694 | C18:3n-3 | C22:5n-3 | 0.701 |
| 4 | C16:0 | C18:1 | 0.697 | $\delta^{15}$N | C22:5n-3 | 0.693 | C18:1 | C22:5n-3 | 0.691 |
| 5 | $\delta^{15}$N | C18:1 | 0.691 | C18:1 | C20:5n-3 | 0.693 | C18:0 | C18:1 | 0.688 |
| 6 | C20:1 | C22:6n-3 | 0.688 | $\delta^{15}$N | C18:4n-3 | 0.687 | $\delta^{15}$N | C18:1 | 0.683 |
| 7 | $\delta^{15}$N | C16:1 | 0.687 | C18:4n-3 | C22:5n-3 | 0.682 | C16:0 | C18:1 | 0.683 |
| 8 | C18:1 | C20:1 | 0.677 | $\delta^{15}$N | C20:1 | 0.682 | C18:1 | C22:6n-3 | 0.680 |
| 9 | C18:1 | C22:1 | 0.666 | C16:0 | C18:1 | 0.677 | C16:0 | C20:5n-3 | 0.678 |
| 10 | C18:1 | C20:5n-3 | 0.663 | $\delta^{15}$N | C20:4n-3 | 0.676 | C18:1 | C20:5n-3 | 0.670 |

**Table A2.** Top 10 triplets of bio-tracers. Abbreviations are as follows: **bt**, bio-tracer; **op**, overall performance. The leftmost column indicates the rank of the combination.

| Rank | LDA bt 1 | LDA bt 2 | LDA bt 3 | LDA op | NBC bt 1 | NBC bt 2 | NBC bt 3 | NBC op | MLP bt 1 | MLP bt 2 | MLP bt 3 | MLP op |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | C18:1 | C20:5n-3 | C22:1 | 0.833 | $\delta^{15}$N | C18:1 | C18:4n-3 | 0.805 | C18:1 | C20:4n-3 | C22:5n-3 | 0.873 |
| 2 | C18:1 | C20:5n-3 | C22:6n-3 | 0.793 | C18:1 | C18:4n-3 | C22:5n-3 | 0.791 | C18:3n-3 | C20:5n-3 | C22:1 | 0.851 |
| 3 | $\delta^{15}$N | C20:5n-3 | C22:1 | 0.787 | $\delta^{15}$N | C18:1 | C22:5n-3 | 0.779 | C18:1 | C20:5n-3 | C22:1 | 0.833 |
| 4 | C16:0 | C18:1 | C20:5n-3 | 0.771 | C18:1 | C20:4n-3 | C22:5n-3 | 0.779 | C18:1 | C20:4n-3 | C20:5n-3 | 0.832 |
| 5 | C18:0 | C18:1 | C20:4n-3 | 0.769 | C18:1 | C18:4n-3 | C20:5n-3 | 0.775 | $\delta^{15}$N | C20:5n-3 | C22:1 | 0.828 |
| 6 | C18:1 | C20:4n-3 | C22:6n-3 | 0.769 | C18:1 | C18:3n-3 | C22:5n-3 | 0.773 | C18:3n-3 | C20:4n-3 | C22:5n-3 | 0.828 |
| 7 | C20:1 | C20:4n-3 | C22:6n-3 | 0.762 | $\delta^{15}$N | C18:4n-3 | C22:5n-3 | 0.771 | C20:5n-3 | C22:1 | C22:5n-3 | 0.826 |
| 8 | C18:1 | C20:1 | C20:5n-3 | 0.759 | $\delta^{15}$N | C18:1 | C20:5n-3 | 0.765 | C18:4n-3 | C20:5n-3 | C22:1 | 0.814 |
| 9 | C18:0 | C18:1 | C20:5n-3 | 0.758 | $\delta^{15}$N | C18:1 | C20:4n-3 | 0.762 | C18:1 | C18:3n-3 | C20:4n-3 | 0.812 |
| 10 | $\delta^{15}$N | C18:1 | C20:4n-3 | 0.755 | $\delta^{15}$N | C18:1 | C18:3n-3 | 0.761 | C18:3n-3 | C22:5n-3 | C22:6n-3 | 0.810 |

#### Appendix B.5. DNA Barcodes

**Figure A14.** Unrooted neighbour-joining (NJ) tree based on the p-distance of the 650 bp barcode region of the cytochrome c oxidase I gene. The NJ tree was generated using the Barcode of Life Data System V4 (boldsystems.org) [55] with the Kimura 2-parameter distance model [56]; sequences were generated using the LifeScanner DNA sequencing kit (lifescanner.net).
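The caption mentions both the p-distance and the Kimura 2-parameter (K2P) model [56]. For two aligned sequences, with P the proportion of transition differences (A↔G, C↔T) and Q the proportion of transversion differences, the p-distance is P + Q and the K2P distance is $d = -\tfrac{1}{2}\ln(1-2P-Q) - \tfrac{1}{4}\ln(1-2Q)$. The sketch below computes both for illustration only; skipping any site that is not A, C, G or T is a simplifying assumption, and BOLD's own handling of gaps and ambiguous bases may differ.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def p_and_k2p_distance(seq1, seq2):
    """Return (p-distance, K2P distance) for two aligned DNA sequences of equal length.

    Sites where either base is not A, C, G or T are skipped (simplifying assumption).
    """
    valid = PURINES | PYRIMIDINES
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue
        sites += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    P, Q = transitions / sites, transversions / sites
    k2p = -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
    return P + Q, k2p

print(p_and_k2p_distance("ACGTACGTAC", "ACGTGCGTAT"))
```

The K2P distance is never smaller than the p-distance, because it corrects for multiple substitutions at the same site.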

## References

1. Kneen, B. From Land to Mouth: Understanding the Food System; NC Press: Toronto, ON, Canada, 1993.
2. Godfray, H.C.J.; Crute, I.R.; Haddad, L.; Lawrence, D.; Muir, J.F.; Nisbett, N.; Pretty, J.; Robinson, S.; Toulmin, C.; Whiteley, R. The Future of the Global Food System. Philos. Trans. R. Soc. B Biol. Sci. **2010**, 365, 2769–2777.
3. Ingram, J. A Food Systems Approach to Researching Food Security and Its Interactions with Global Environmental Change. Food Secur. **2011**, 3, 417–431.
4. Clapp, J. Distant Agricultural Landscapes. Sustain. Sci. **2015**, 10, 305–316.
5. FAO. The Future of Food and Agriculture—Alternative Pathways to 2050; Food & Agriculture Org: Rome, Italy, 2018.
6. Béné, C.; Oosterveer, P.; Lamotte, L.; Brouwer, I.D.; de Haan, S.; Prager, S.D.; Talsma, E.F.; Khoury, C.K. When Food Systems Meet Sustainability—Current Narratives and Implications for Actions. World Dev. **2019**, 113, 116–130.
7. Weber, C.L.; Matthews, H.S. Food-Miles and the Relative Climate Impacts of Food Choices in the United States. Environ. Sci. Technol. **2008**, 42, 3508–3513.
8. Roebuck, K.; Turlo, C.; Fuller, S.D. Canadians Eating in the Dark: A Report Card of International Seafood Labelling Requirements; SeaChoice: Pompano Beach, FL, USA, 2017; 24p.
9. Aung, M.M.; Chang, Y.S. Traceability in a Food Supply Chain: Safety and Quality Perspectives. Food Control **2014**, 39, 172–184.
10. Galvez, J.F.; Mejuto, J.; Simal-Gandara, J. Future Challenges on the Use of Blockchain for Food Traceability Analysis. TrAC Trends Anal. Chem. **2018**, 107, 222–232.
11. Badia-Melis, R.; Mishra, P.; Ruiz-García, L. Food Traceability: New Trends and Recent Advances. A Review. Food Control **2015**, 57, 393–401.
12. Danezis, G.P.; Tsagkaris, A.S.; Brusic, V.; Georgiou, C.A. Food Authentication: State of the Art and Prospects. Curr. Opin. Food Sci. **2016**, 10, 22–31.
13. Luykx, D.M.; van Ruth, S.M. An Overview of Analytical Methods for Determining the Geographical Origin of Food Products. Food Chem. **2008**, 107, 897–911.
14. Danezis, G.P.; Tsagkaris, A.S.; Camin, F.; Brusic, V.; Georgiou, C.A. Food Authentication: Techniques, Trends & Emerging Approaches. TrAC Trends Anal. Chem. **2016**, 85, 123–132.
15. Wong, E.H.K.; Hanner, R.H. DNA Barcoding Detects Market Substitution in North American Seafood. Food Res. Int. **2008**, 41, 828–837.
16. Baker, C.S. A Truer Measure of the Market: The Molecular Ecology of Fisheries and Wildlife Trade. Mol. Ecol. **2008**, 17, 3985–3998.
17. Galimberti, A.; De Mattia, F.; Losa, A.; Bruni, I.; Federici, S.; Casiraghi, M.; Martellos, S.; Labra, M. DNA Barcoding as a New Tool for Food Traceability. Food Res. Int. **2013**, 50, 55–63.
18. Shehata, H.R.; Bourque, D.; Steinke, D.; Chen, S.; Hanner, R. Survey of Mislabelling across Finfish Supply Chain Reveals Mislabelling Both Outside and within Canada. Food Res. Int. **2018**, 121, 723–729.
19. Shehata, H.R.; Naaum, A.M.; Garduño, R.A.; Hanner, R. DNA Barcoding as a Regulatory Tool for Seafood Authentication in Canada. Food Control **2018**, 92, 147–153.
20. Louppis, A.P.; Karabagias, I.K.; Papastephanou, C.; Badeka, A. Two-Way Characterization of Beekeepers’ Honey According to Botanical Origin on the Basis of Mineral Content Analysis Using ICP-OES Implemented with Multiple Chemometric Tools. Foods **2019**, 8, 210.
21. Camin, F.; Bontempo, L.; Perini, M.; Piasentier, E. Stable Isotope Ratio Analysis for Assessing the Authenticity of Food of Animal Origin: Authenticity of Animal Origin Food. Compr. Rev. Food Sci. Food Saf. **2016**, 15, 868–877.
22. Bontempo, L.; Paolini, M.; Franceschi, P.; Ziller, L.; García-González, D.L.; Camin, F. Characterisation and Attempted Differentiation of European and Extra-European Olive Oils Using Stable Isotope Ratio Analysis. Food Chem. **2019**, 276, 782–789.
23. Shin, W.J.; Choi, S.H.; Ryu, J.S.; Song, B.Y.; Song, J.H.; Park, S.; Min, J.S. Discrimination of the Geographic Origin of Pork Using Multi-Isotopes and Statistical Analysis. Rapid Commun. Mass Spectrom. **2018**, 32, 1843–1850.
24. Chung, I.M.; Kim, J.K.; Lee, K.J.; Park, S.K.; Lee, J.H.; Son, N.Y.; Jin, Y.I.; Kim, S.H. Geographic Authentication of Asian Rice (Oryza sativa L.) Using Multi-Elemental and Stable Isotopic Data Combined with Multivariate Analysis. Food Chem. **2018**, 240, 840–849.
25. Karabagias, I.K. Seeking of Reliable Markers Related to Greek Nectar Honey Geographical and Botanical Origin Identification Based on Sugar Profile by HPLC-RI and Electro-Chemical Parameters Using Multivariate Statistics. Eur. Food Res. Technol. **2019**, 245, 805–816.
26. Zhao, Y.; Tu, T.; Tang, X.; Zhao, S.; Qie, M.; Chen, A.; Yang, S. Authentication of Organic Pork and Identification of Geographical Origins of Pork in Four Regions of China by Combined Analysis of Stable Isotopes and Multi-Elements. Meat Sci. **2020**, 165, 108129.
27. Wu, H.; Tian, L.; Chen, B.; Jin, B.; Tian, B.; Xie, L.; Rogers, K.M.; Lin, G. Verification of Imported Red Wine Origin into China Using Multi Isotope and Elemental Analyses. Food Chem. **2019**, 301, 125137.
28. Fiorillo, J. Canadian Wild Salmon Fisheries Quitting MSC Program. Available online: https://www.intrafish.com/fisheries/canadian-wild-salmon-fisheries-quitting-msc-program/2-1-683519 (accessed on 10 April 2014).
29. The Wild Salmon Center. A Review of IUU Salmon Fishing and Potential Conservation Strategies in the Russian Far East; Technical Report; The Wild Salmon Center: Portland, OR, USA, 2009.
30. Clarke, S. Trading Tails: Russian Salmon Fisheries and East Asian Markets; Technical Report; TRAFFIC East Asia: Hong Kong, China, 2007.
31. Clarke, S.C.; McAllister, M.K.; Kirkpatrick, R.C. Estimating Legal and Illegal Catches of Russian Sockeye Salmon from Trade and Market Data. ICES J. Mar. Sci. **2009**, 66, 532–545.
32. Bligh, E.G.; Dyer, W.J. A Rapid Method of Total Lipid Extraction and Purification. Can. J. Biochem. Physiol. **1959**, 37, 911–917.
33. Morrison, W.R.; Smith, M. Preparation of Fatty Acid Methyl Esters and Dimethylacetals from Lipids with Boron Fluoride-Methanol. J. Lipid Res. **1964**, 5, 600–608.
34. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer Series in Statistics; Springer: New York, NY, USA, 2009.
35. Sun, S.; Guo, B.; Wei, Y. Origin Assignment by Multi-Element Stable Isotopes of Lamb Tissues. Food Chem. **2016**, 213, 675–681.
36. Wunder, M.B. Using Isoscapes to Model Probability Surfaces for Determining Geographic Origins. In Isoscapes; West, J.B., Bowen, G.J., Dawson, T.E., Tu, K.P., Eds.; Springer: Dordrecht, The Netherlands, 2010; pp. 251–270.
37. Bataille, C.P.; Bowen, G.J. Mapping $^{87}$Sr/$^{86}$Sr Variations in Bedrock and Water for Large Scale Provenance Studies. Chem. Geol. **2012**, 304–305, 39–52.
38. Scott, D.W. Multivariate Density Estimation: Theory, Practice, and Visualization, 2nd ed.; Wiley: Hoboken, NJ, USA, 2014.
39. Venables, W.N.; Ripley, B.D. Modern Applied Statistics with S, 4th ed.; Statistics and Computing; Springer: New York, NY, USA, 2002.
40. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020.
41. Innes, M. Flux: Elegant Machine Learning with Julia. J. Open Source Softw. **2018**, 3, 602.
42. Naccarato, A.; Furia, E.; Sindona, G.; Tagarelli, A. Multivariate Class Modeling Techniques Applied to Multielement Analysis for the Verification of the Geographical Origin of Chili Pepper. Food Chem. **2016**, 206, 217–222.
43. Chen, D.; Cao, X.; Wen, F.; Sun, J. Blessing of Dimensionality: High-Dimensional Feature and Its Efficient Compression for Face Verification. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 3025–3032.
44. Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. DeepFace: Closing the Gap to Human-Level Performance in Face Verification. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1701–1708.
45. Bartos, I.; Kowalski, M. Multimessenger Astronomy; IOP Publishing: Bristol, UK, 2017.
46. Pimentel, T.; Marcelino, J.; Ricardo, F.; Soares, A.M.V.M.; Calado, R. Bacterial Communities 16S rDNA Fingerprinting as a Potential Tracing Tool for Cultured Seabass Dicentrarchus Labrax. Sci. Rep. **2017**, 7, 11862.
47. Zhao, H.; Zhang, S.; Zhang, Z. Relationship between Multi-Element Composition in Tea Leaves and in Provenance Soils for Geographical Traceability. Food Control **2017**, 76, 82–87.
48. Liu, G.; Liu, Q.; Li, P. Blessing of Dimensionality: Recovering Mixture Data via Dictionary Pursuit. IEEE Trans. Pattern Anal. Mach. Intell. **2017**, 39, 47–60.
49. Hebert, P.D.N.; Cywinska, A.; Ball, S.L.; de Waard, J.R. Biological Identifications through DNA Barcodes. Proc. R. Soc. Lond. Ser. B Biol. Sci. **2003**, 270, 313–321.
50. Zemlak, T.S.; Ward, R.D.; Connell, A.D.; Holmes, B.H.; Hebert, P.D.N. DNA Barcoding Reveals Overlooked Marine Fishes. Mol. Ecol. Resour. **2009**, 9, 237–242.
51. Ehleringer, J.R.; Bowen, G.J.; Chesson, L.A.; West, A.G.; Podlesak, D.W.; Cerling, T.E. Hydrogen and Oxygen Isotope Ratios in Human Hair Are Related to Geography. Proc. Natl. Acad. Sci. USA **2008**, 105, 2788–2793.
52. Fan, X.; Messenger, C.; Heng, I.S. A Bayesian Approach to Multi-Messenger Astronomy: Identification of Gravitational-Wave Host Galaxies. Astrophys. J. **2014**, 795, 43.
53. Frederic, P.; Lad, F. Two Moments of the Logitnormal Distribution. Commun. Stat. Simul. Comput. **2008**, 37, 1263–1269.
54. Wutzler, T. Logitnorm: Functions for the Logitnormal Distribution; 2018.
55. Ratnasingham, S.; Hebert, P.D.N. BOLD: The Barcode of Life Data System (http://www.barcodinglife.org). Mol. Ecol. Notes **2007**, 7, 355–364.
56. Kimura, M. A Simple Method for Estimating Evolutionary Rates of Base Substitutions through Comparative Studies of Nucleotide Sequences. J. Mol. Evol. **1980**, 16, 111–120.

**Figure 1.** Combining bio-tracers to improve the determination of samples’ provenance. (**a**) Sockeye salmon (*Oncorhynchus nerka*) samples in this study originate from three potential origins, namely Alaska, United States (yellow); British Columbia, Canada (cyan); and Kamchatka Peninsula, Russia (magenta). (**b**) We examine the efficiency of horizontal strategies that combine several classes of bio-tracers, as opposed to vertical strategies that focus on one specific class. (**c**) While using a single bio-tracer to discriminate the true origin of a sample (distributions on the top and right of the chart; dotted lines depict the bio-tracer values of a sample) may prove difficult, combining bio-tracers (colored areas) enhances the performance of the inference process. (**d**) This is also shown with confusion matrices obtained using a classifier that uses only the first bio-tracer (top), only the second one (middle) or the combination of the two (bottom).

**Figure 2.** Increasing the number of bio-tracers considerably improves statistical performance. (**a**–**c**) The probability of assigning a sample to its true origin increases with the number of bio-tracers employed, for the three regions considered, namely Alaska (yellow), British Columbia (cyan) and Kamchatka Peninsula (magenta). (**d**–**f**) The overall performance (i.e., the probability of correctly assigning any sample to its true origin) can also be improved by combining samples, assuming the combined samples originate from the same region (e.g., individuals of the same lot). Points are colored according to the number of samples combined. These results are qualitatively similar for the three statistical approaches considered: Naive Bayesian classifier (NBC; (**a**,**d**)), Linear Discriminant Analysis (LDA; (**b**,**e**)) and a Multi-Layer Perceptron (MLP; (**c**,**f**)). In all panels, points represent performances averaged over up to 100,000 replicates (see Methods for further details).

**Figure 3.** Combining bio-tracers is robust to noise addition. The probability of correctly determining the provenance of samples is evaluated for increasing levels of noise added to the training data set. The lighter the gray, the larger the number of bio-tracers combined. Note that, prior to analysis, all bio-tracer values were scaled; thus, a noise level of 1 represents strong noise addition. The three panels correspond to the three statistical approaches used: NBC (**a**), LDA (**b**) and MLP (**c**).

**Figure 4.** PCA does not necessarily provide the axes with the maximum discriminatory power. Boxplots represent the probability of correctly determining the provenance of one sample for 500 combinations of bio-tracers (or all combinations if the total number of combinations is less than 500; see Methods for details). Black lines and points represent results obtained when the first principal component axes are used. The three panels correspond to the three statistical approaches used: NBC (**a**), LDA (**b**) and MLP (**c**).

**Figure 5.** Performance of individual bio-tracers. Overall performances of all 17 bio-tracers (listed on the right) using LDA are plotted against the proportion of inter-region variance (**a**) and the mean distance between all pairs of region centroids (**b**).

**Figure 6.** Including one more bio-tracer increases performance. Overall performances of all pairs of bio-tracers using LDA are plotted against the performance of the best-performing individual bio-tracer of the pair (**a**) and against their average overall performance (**c**). Similarly, overall performances of all triplets of bio-tracers are plotted against the performance of the best-performing pair within the triplet (**b**) and against their average overall performance (**d**). Magenta dashed lines represent the 1:1 line.

**Figure 7.** Efficient combinations of bio-tracers maximize inter-regional variance and minimize overlap of regional data hypervolumes. (**a**,**b**) For all combinations of 2 (**a**) and 3 (**b**) bio-tracers, the overall performances (with LDA) of sets of bio-tracers are plotted against their inter-region variance. (**c**) Relationship between the proportion of data overlap between regions and the overall performance, for all pairs and triplets of bio-tracers. Magenta dashed lines represent the results of non-linear least-squares regression, and the corresponding $R^{2}$ values are given at the bottom of each panel.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Cazelles, K.; Zemlak, T.S.; Gutgesell, M.; Myles-Gonzalez, E.; Hanner, R.; Shear McCann, K.
Spatial Fingerprinting: Horizontal Fusion of Multi-Dimensional Bio-Tracers as Solution to Global Food Provenance Problems. *Foods* **2021**, *10*, 717.
https://doi.org/10.3390/foods10040717

**AMA Style**

Cazelles K, Zemlak TS, Gutgesell M, Myles-Gonzalez E, Hanner R, Shear McCann K.
Spatial Fingerprinting: Horizontal Fusion of Multi-Dimensional Bio-Tracers as Solution to Global Food Provenance Problems. *Foods*. 2021; 10(4):717.
https://doi.org/10.3390/foods10040717

**Chicago/Turabian Style**

Cazelles, Kevin, Tyler Stephen Zemlak, Marie Gutgesell, Emelia Myles-Gonzalez, Robert Hanner, and Kevin Shear McCann.
2021. "Spatial Fingerprinting: Horizontal Fusion of Multi-Dimensional Bio-Tracers as Solution to Global Food Provenance Problems" *Foods* 10, no. 4: 717.
https://doi.org/10.3390/foods10040717