Overview of Algorithms for Using Particle Morphology in Pre-Detonation Nuclear Forensics

A major goal in pre-detonation nuclear forensics is to infer the processing conditions and/or facility type that produced radiological material. This review paper focuses on analyses of particle size, shape, and texture ("morphology") signatures that could provide information on the provenance of interdicted materials. For example, uranium ore concentrates (UOCs, or yellowcake) include ammonium diuranate (ADU), ammonium uranyl carbonate (AUC), sodium diuranate (SDU), magnesium diuranate (MDU), and others, each prepared using different salts to precipitate U from solution. Once precipitated, UOCs are often dried and calcined to remove adsorbed water. The products can be allowed to react further, forming uranium oxide (UO3, U3O8, or UO2) powders, whose surface morphology can be indicative of the precipitation and/or calcination conditions used in their production. This review paper describes statistical issues and approaches in using quantitative analyses of measurements such as particle size and shape to infer production conditions. Statistical topics include multivariate t tests (Hotelling's T2), design of experiments, and several machine learning (ML) options including decision trees, learning vector quantization neural networks, mixture discriminant analysis, and approximate Bayesian computation (ABC). ABC is emphasized as an attractive option to include the effects of model uncertainty in the selected and fitted forward model used for inferring processing conditions.


Introduction
References [1-7] describe pre-detonation nuclear forensics goals. Pre-detonation nuclear forensics focuses on determining the source and route of nuclear materials and devices prior to their use in an improvised nuclear device or radiological dispersal device. Baseline morphology and discussion of the technical production details for four precipitation conditions (ADU, AUC, SDU, and MDU) are provided in [1,2]. Subsequent studies have investigated the quantitative morphological variations for a selection of these materials due to processing, aging, and/or storage conditions [3,4,7]. One related inference goal is to determine factors that impact particle morphology for a given precipitation condition, such as calcination temperature, impurities, and production pathway (involving UO3, U3O8, or UO2, at 400 °C, 800 °C, and 510 °C (in H2), respectively). Nuclear forensic analysis utilizes a suite of techniques to help elucidate the provenance of a nuclear material of interest. The analysis of surface morphology for particulate samples has been of great interest as a possible indicator of the synthetic conditions used to produce nuclear materials, such as uranium ore concentrates (UOCs). Quantitative morphological analysis as applied to nuclear forensics is this paper's focus.
This review paper illustrates statistical issues using 22 quantitative morphological metrics (Appendix A describes these 22 metrics) derived from the analysis of scanning electron microscopy (SEM) images of each segmented particle [2,4,5]. Figure 1 highlights some example SEM images acquired from nuclear materials. The left image is the raw unprocessed SEM image and the right image is the image following manual particle segmentation using the Morphological Analysis of Materials (MAMA) software [2,8-10]. Morphological features are proving to be useful signatures to infer the chemical processing histories (which relate to forensics goals [1-3]) of uranium oxides. Particle morphology analysis has recently been developed and applied with image analysis software to segment fully visible particles in SEM images and then compute attributes, such as circularity, area, perimeter, and ellipse aspect ratio of the segmented particles. One example of such software is MAMA, which performs segmentation and quantification. SEM images of nuclear materials can be difficult to segment due to overlapping particles, charging effects, and image clarity. Manual segmentation (locating, recognizing, and assigning boundaries to particles) requires significant time-consuming user input to segment only the fully visible particles.
An alternative automated segmentation method using a deep learning model (a U-net based on convolutional neural networks) was shown in [8] to effectively segment fully visible particles (see also Reference [11] for another segmentation option); however, this paper's focus is on inferring processing conditions using currently available segmentation software, such as MAMA, that can also provide the 22 quantitative particle morphological metrics.

Figure 1. The left image contains the raw unprocessed image of the particle surface morphology at 50,00×. The image on the right was manually segmented using MAMA software. The blue regions were the individual particle features that had clearly defined boundaries and were used to quantitatively characterize the morphology of the material.

Figure 2 highlights representative SEM images of ADU, SDU, MDU, and AUC and the materials generated throughout the calcination and reduction pathway of each respective starting material. In [1,2], each of the ADU, AUC, MDU, and SDU conditions has three processing pathways involving UO3, U3O8, or UO2, at 400 °C, 800 °C, and 510 °C (in H2), respectively. The main inference goal is to use particle morphology to distinguish among the ADU, AUC, MDU, and SDU processing routes. One open question is whether machine learning (ML) methods can clearly recognize ADU, AUC, MDU, and SDU when all 3 pathways are represented in the data and also when only one pathway (the baseline condition only, or either of the other two conditions) is represented in the data. Another inference goal is then to determine factors that impact particle morphology, such as calcination temperature, production pathway, and impurities for a given precipitation condition.

Figure 2.
Example SEM images, such as those used in [1,2], at the listed magnifications (right), for MDU, ADU, SDU, and AUC throughout the 400 °C (UO3), 800 °C (U3O8), and 510 °C (UO2) production pathways.

This paper focuses on ML options applied to the 22 MAMA measurements per particle to recognize ADU, AUC, MDU, and SDU precipitation conditions when 1, 2, or 3 pathways are represented in the data. Data from [2] are analyzed to estimate the correct classification rate (CCR) in recognizing ADU, AUC, SDU, and MDU. For completeness, it is noted here that the data from [1] are vector representations of images (using unsupervised vector quantized variational autoencoding), of length either 256 or 1000, derived from each SEM image [1]. These vector representations have been shown to have a CCR of approximately 0.80, depending on the particular data subset used [1]. Because the CCR depends on the particular data subset analyzed, this paper does not directly compare the CCR when using the 256-length or 1000-length vectors from [1] for each image to the CCR when using the 22-length MAMA measurements from each segmented particle. Instead, this paper reviews statistical issues in using the 22-length MAMA measurements per particle, and applies ML options to the 22-length MAMA measurements per segmented particle, with the ADU, AUC, MDU, and SDU precipitation conditions and the processing pathways involving UO3, U3O8, and UO2, at 400 °C, 800 °C, and 510 °C (in H2), all known (supervised learning).
Data from [2] are 22 morphological measurements produced from MAMA software (morphological analysis for material attribution) such as particle area, aspect ratio, and circularity (Appendix A). Data from [2] can be analyzed per particle or per image. Available data include 10s to 100s of MAMA measurements of particles per image, with a total of 16, 19, and 18 images and an average of 46.9, 40.2, and 43.6 particles per image for MDU1, MDU2, and MDU3, respectively. The results presented here are by particle because of the limited number of images for each precipitation condition and pathway [1-5].
This paper is organized as follows. Section 2 presents CCR results for nine ML methods. Section 3 relates the multivariate T2 test to LDA and illustrates the impacts of violating model assumptions due to having multiple components (pathway: U3O8, UO2, or UO3, at 800 °C, 510 °C, and 400 °C, respectively) per group (precipitation condition: ADU, AUC, SDU, or MDU). Section 4 relates LDA to the multivariate T2 test to assess group separation. Section 5 provides more details regarding ABC in the context of experimental design. Section 6 summarizes this paper.

CCR Results
Results based on the 22 MAMA measurements [2] are given in Table 1 for nine ML options: decision trees, flexible discriminant analysis (FDA), mixture DA (MDA), linear DA (LDA), k-nearest neighbor, approximate Bayesian computation (ABC), learning vector quantization (LVQ), multivariate adaptive regression with splines (MARS), and support vector machines (SVM). When all three pathways are in the data, the CCRs (using 70% of the data to train and 30% of the data to test) for decision trees, MDA, LDA, FDA, k-nearest neighbor, LVQ, SVM, ABC, and MARS are 0.83, 0.76, 0.74, 0.74, 0.81, 0.75, 0.76, 0.87, and 0.87, respectively. All analyses are done in R [12]. When only pathway 1 is in the data for each of ADU, AUC, SDU, and MDU, the CCRs (again using 70% of the data to train and 30% to test) for the same nine methods all increase, to 0.93, 0.82, 0.81, 0.81, 0.85, 0.87, 0.83, 0.89, and 0.89, respectively. Precipitation conditions SDU and MDU are the most difficult to distinguish, so if SDU and MDU are merged into one group, reducing the number of groups from 4 to 3, then all methods have higher CCRs, some as large as 0.98. ML options 1-9 are described in Appendix B. In addition, LDA is discussed in Section 4, and ABC is discussed in Section 5 because the simulation framework used in ABC is helpful for including various sources of model uncertainty.
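The analyses above were carried out in R [12]. Purely as an illustration of the 70/30 train/test protocol used to estimate a CCR, the sketch below uses Python with synthetic 22-feature stand-in data (the real MAMA measurements are not reproduced here, so all data and the chosen classifier are hypothetical):

```python
# Minimal sketch of the CCR estimation protocol: a 70/30 train/test split
# and a classifier's correct classification rate on the held-out 30%.
# Synthetic 22-feature data stand in for the real MAMA measurements.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
groups = ["ADU", "AUC", "SDU", "MDU"]
# 200 hypothetical "particles" per group; group i has mean i in every feature
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(200, 22)) for i in range(4)])
y = np.repeat(groups, 200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y)
clf = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
ccr = (clf.predict(X_te) == y_te).mean()  # correct classification rate
print(round(ccr, 3))
```

Repeating the split-fit-score loop over many random splits gives a distribution of CCRs rather than a single point estimate, which is one way the subset-dependence noted above arises.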
It is not surprising that the CCR decreases when all three pathways are represented in the data. Figure 3 plots principal coordinate (PC, [13]) 7 versus PC 4 for particles from ADU, AUC, SDU, and MDU. PCs are coordinates from which the matrix of pairwise distances in the 22-dimensional space can be closely approximated. In Figure 3, there is qualitative evidence of moderately good group separation, particularly considering that only two PCs are used (up to 22 PCs are available). Figure 4 is the same as Figure 3, but for PC1, PC2, and PC4, and illustrates that using three PCs leads to better group separation than using the two PCs in Figure 3.

Methods such as MDA have the potential to recognize clusters within the 4 precipitation conditions and thereby mitigate the decrease in classification accuracy [13,14]. Figure 5 plots the Bayesian information criterion (BIC) used in model-based clustering versus the number of clusters ("components") in the ADU data with all three pathways (U3O8, UO2, UO3) present. There is modest evidence for 3-7 clusters within ADU. Figure 6a plots the case in which mclust is asked to find the best 3-cluster fit in the 22 MAMA features in the ADU data (using MDS coordinates 7 and 3 as one good option to display the data in 2 dimensions). The three clusters are somewhat distinct, but the same types of plots for 4, 5, 6, or 7 clusters look equally plausible, and there are small sub-clusters in 2 of the clusters. Figure 6b shows the "true" clusters as defined by the three pathways. Note in comparing Figure 6a,b that some of the U3O8 cases are clustered in the UO2 group. Figure 7 plots the raw MAMA feature 5 (perimeter of convex hull covering the particle) versus feature 1 (particle vector area) without transformation into PCs. Figure 8 plots the raw MAMA feature 3 (particle pixel area) versus feature 1 (particle vector area) and feature 2 (area of convex hull covering the particle) without transformation into PCs.

Figure 9 plots the ADU, AUC, MDU, SDU data from [1] for UMAP coordinates 1 and 2. The UMAP coordinates are a uniform manifold approximation and projection for dimension reduction, used to reduce the dimension of either the 256-length or the 1000-length vector representing each image (the derived image-based features from [1]) obtained using a vector quantized variational autoencoder. The CCR from the UMAP coordinates is approximately 0% to 10% smaller than those presented here for the MAMA features from [2], depending on the data subsets analyzed.

Figure 5. The BIC versus the number of components for ADU with 3 pathways in the training data (U3O8, UO2, UO3). The legend describes different modeling assumptions regarding the data covariance matrix within each cluster, where "E" denotes "equal", "I" denotes "identity" matrix (independent MAMA components), and "V" denotes "varying" features of the covariance matrix in the order volume, shape, orientation in an eigenvector decomposition. For example, EVI denotes equal volume, varying shape, and identity orientation among the groups, and VVV denotes varying volume, shape, and orientation among the groups.
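The BIC-versus-components analysis above used the mclust package in R. As a rough Python analog, the sketch below fits Gaussian mixtures with sklearn's GaussianMixture (diagonal covariances, loosely corresponding to mclust's "VVI"-type models) on synthetic stand-in data with three planted clusters; note that sklearn signs BIC so that lower is better, the opposite of mclust's convention:

```python
# Sketch of the model-based clustering step: fit Gaussian mixtures with
# 1..9 components and compare BIC, analogous to the mclust analysis in R.
# Synthetic stand-in for 22 MAMA features of ADU particles, 3 "pathways".
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(100, 22))
               for m in (0.0, 1.0, 2.0)])

# covariance_type="diag": each component has its own diagonal covariance
bics = {k: GaussianMixture(n_components=k, covariance_type="diag",
                           random_state=0).fit(X).bic(X)
        for k in range(1, 10)}
best_k = min(bics, key=bics.get)  # sklearn BIC: smaller is better
print(best_k)
```

On this idealized, well-separated example BIC recovers the three planted clusters; on real morphology data the curve is flatter, which is consistent with the "modest evidence for 3-7 clusters" noted above.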

Figure 8. The raw MAMA feature 3 (particle pixel area) versus feature 1 (particle vector area) and feature 2 (area of convex hull covering the particle) without transformation into PCs for ADU.

Multivariate t-Test
If two groups of normally distributed data have the same variance and possibly different means, then the well-known univariate t statistic

t = (x̄1 − x̄2) / (s_pooled √(1/n1 + 1/n2)),   (1)

where x̄1 is the sample mean of group 1, x̄2 is the sample mean of group 2, and s²pooled = {Σi (xi1 − x̄1)² + Σi (xi2 − x̄2)²}/(n1 + n2 − 2), is distributed as t with n1 + n2 − 2 degrees of freedom. Therefore, the value of t as computed in Equation (1) is compared to the quantiles of the t distribution with n1 + n2 − 2 degrees of freedom to test whether the mean µ1 of group 1 is different from the mean µ2 of group 2 [15].
The precipitation condition could define group 1 as ADU and group 2 as AUC, and other production, aging, and/or storage conditions could be varied to evaluate how robustly and reliably the ADU data could be distinguished from the AUC data. If the two groups cannot be assumed to have the same variance, then another version of the t statistic, such as Welch's t [15], can be used. The normality assumption is not crucial, partly because the sample means x̄1 and x̄2 are usually approximately normal in distribution due to the central limit effect [15].
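As a sketch of Equation (1) and of Welch's alternative for a single morphological feature, the snippet below uses synthetic stand-in data (the paper's own analyses are in R [12]; the feature, sample sizes, and mean shift here are illustrative assumptions):

```python
# Pooled-variance two-sample t test (Equation (1)) and Welch's t, which
# drops the equal-variance assumption. Synthetic stand-in for one feature.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x1 = rng.normal(0.0, 1.0, 50)   # hypothetical group 1 (e.g., ADU) feature values
x2 = rng.normal(0.8, 1.0, 60)   # hypothetical group 2 (e.g., AUC), mean shifted

t_pooled, p_pooled = stats.ttest_ind(x1, x2, equal_var=True)   # Equation (1)
t_welch,  p_welch  = stats.ttest_ind(x1, x2, equal_var=False)  # Welch's t

# the pooled statistic is referred to a t distribution with n1 + n2 - 2 df
df = len(x1) + len(x2) - 2
print(df, round(p_pooled, 4), round(p_welch, 4))
```

With equal variances the two p-values nearly coincide; they diverge as the group variances (or sample sizes) become unequal, which is when Welch's version is preferred.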
If the data from the two groups have more than one dimension (a vector rather than a scalar), as for example in Figure 8, then Hotelling's multivariate T2 statistic [16] generalizes Equation (1) to:

T2 = [n1 n2/(n1 + n2)] (x̄1 − x̄2)T S−1pooled (x̄1 − x̄2),   (2)

where x̄1 and x̄2 are the sample mean vectors and Spooled is the pooled sample covariance matrix. Figure 10 plots simulated group 1 and group 2 data for p = 2, with the true and estimated (from 100 observations per group) linear decision boundaries. The correlation is 0.5, the true mean shift is 3 units in the direction of x1 only (a 3-unit mean shift to the east), and the standard deviation in both groups is 1.
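Equation (2) can be computed directly. The sketch below uses the simulated configuration described for Figure 10 (correlation 0.5, 3-unit shift in x1 only, 100 observations per group); the data are synthetic, not the paper's:

```python
# Hotelling's two-sample T^2 (Equation (2)) and its F reference distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
p, n1, n2 = 2, 100, 100
cov = np.array([[1.0, 0.5], [0.5, 1.0]])        # correlation 0.5, sd 1
g1 = rng.multivariate_normal([0.0, 0.0], cov, n1)
g2 = rng.multivariate_normal([3.0, 0.0], cov, n2)  # 3-unit shift in x1 only

d = g1.mean(axis=0) - g2.mean(axis=0)
S_pooled = ((n1 - 1) * np.cov(g1.T) + (n2 - 1) * np.cov(g2.T)) / (n1 + n2 - 2)
T2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S_pooled, d)

# under H0, T2 ~ [(n1+n2-2)p / (n1+n2-p-1)] F with (p, n1+n2-p-1) df
scale = (n1 + n2 - 2) * p / (n1 + n2 - p - 1)
p_value = stats.f.sf(T2 / scale, p, n1 + n2 - p - 1)
print(round(T2, 1), p_value < 0.001)
```

For this large a mean shift the test rejects overwhelmingly; simulating T2 under H0 (no shift) and overlaying the scaled F density reproduces the comparison shown in Figure 11.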
Figure 11 plots the estimated probability density function (pdf) of simulated values of T2 from Equation (2) and the corresponding analytical probability density, [(n1 + n2 − 2)p/(n1 + n2 − p − 1)] F(p, n1 + n2 − p − 1). The R programming language is used for all figures, simulations, and calculations [12].
Figure 11. The estimated pdf of simulated values of T2 from Equation (2) and the corresponding analytical pdf, [(n1 + n2 − 2)p/(n1 + n2 − p − 1)] F(p, n1 + n2 − p − 1).

Not surprisingly, if the mean shift between group 1 and group 2 is due to a mean shift of only a subset of the p variables, then the probability to detect the mean shift decreases as p increases. If µ1 ≠ µ2, then T2 ∼ [(n1 + n2 − 2)p/(n1 + n2 − p − 1)] F(p, n1 + n2 − p − 1; δ), where the non-centrality parameter δ of the F distribution is:

δ = [n1 n2/(n1 + n2)] (µ1 − µ2)T Σ−1 (µ1 − µ2).   (3)

If δ in Equation (3) is fixed, for example at δ = 5, and the number of variables (predictors) p increases, then the non-centrality parameter δ remains fixed but p increases, which means that a mean shift from µ1 to µ2 becomes more difficult to detect. To illustrate, Figure 12 plots the probability to detect µ1 ≠ µ2 as p increases for δ = 5. The MAMA software currently provides 22 particle measurements (such as convex hull area, pixel area, vector perimeter, ellipse perimeter, ellipse aspect ratio, circularity, etc.), some of which might not be useful to distinguish among groups, and Figure 12 indicates that using useless measurements can decrease the CCR. Therefore, Section 5 on ABC investigates whether variable screening based on t statistic values can increase the CCR.

Figure 13 plots one example of the impact of non-normality on the pdf of T2. In Figure 13, p = 2, and the pdf of T2 is plotted for normal data (with correlation 0.60 and n1 = n2 = 20) and for lognormal data, also with correlation 0.60. The analytical result is the same as in Figure 11 for normal data. In this case, there is a noticeable effect of non-normality on the pdf of T2. Figure 14 is the same as Figure 13 but for p = 15. Figure 15 is the same as Figure 14, but for n1 = n2 = 200. The approach to approximate normality is somewhat slower as p increases.

Figure 15. The same as Figure 14, but for n1 = n2 = 200.

LDA
The multivariate T2 statistic uses the statistical distance between two sample means and leads to a simple and reasonably robust test, for example, of whether the ADU group mean is different from the AUC mean. Linear discriminant analysis (LDA) uses the statistical distance from a test observation to each of the ADU and AUC group means to assign the test observation to the ADU or AUC group. LDA is one of the oldest [12,13,15] and simplest pattern recognition methods ("supervised learning" in ML jargon) used to assign a group membership to a test case based on training data that has known group assignments. Figure 16 plots simulated group 1 and group 2 data for p = 2, with the true and estimated (from 100 observations per group) linear decision boundaries. The correlation is 0.5, and the true mean shift is 3 units in the direction of x1 only (a 3-unit mean shift to the east, and the standard deviation in both groups is 1). The mean vector of group 1 is (0,0) and the mean vector of group 2 is (3,0). By appealing to likelihood ratio theory, or Bayes rule, the true linear decision boundary is computed using (µ1 − µ2)^T Σ^−1 X and the estimated linear decision boundary is computed using (x̄1 − x̄2)^T Σ̂^−1 X, where X = (x1, x2) [13,14,16].
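The TPM formula for LDA can be checked numerically for the setup just described (means (0,0) and (3,0), unit standard deviations, correlation 0.5). This is a minimal sketch in Python with NumPy/SciPy; the paper's own computations were presumably done elsewhere, so treat this only as an illustration of the formula TPM = Φ(−δ/2):

```python
import numpy as np
from scipy.stats import norm

# Two-group setup matching the text: p = 2 features,
# group means (0,0) and (3,0), common covariance with correlation 0.5.
mu1 = np.array([0.0, 0.0])
mu2 = np.array([3.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

# Squared Mahalanobis distance: delta^2 = (mu1 - mu2)^T Sigma^{-1} (mu1 - mu2)
diff = mu1 - mu2
delta2 = diff @ np.linalg.solve(Sigma, diff)

# Total probability of misclassification for LDA with equal priors:
# TPM = Phi(-delta / 2), and CCR = 1 - TPM.
tpm = norm.cdf(-np.sqrt(delta2) / 2)
print(round(delta2, 2), round(tpm, 3))  # 12.0 0.042
```

With these means and covariance, δ² = 12 and Φ(−√12/2) ≈ 0.042, which matches the true TPM of 0.042 quoted for Figure 16.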
Similarly to using the non-centrality parameter in Section 3 to quantify the effect of a mean shift from µ1 to µ2, the total probability of misclassification (TPM) by LDA (the probability that a group 1 item is classified as group 2, or a group 2 item is classified as group 1; TPM = 1 − CCR) is given by the cumulative normal probability TPM = Φ(−δ/2), where δ^2 = (µ1 − µ2)^T Σ^−1 (µ1 − µ2) [16]. For the data in Figure 16, the true TPM = 0.042 and the estimated TPM is 0.045. Figure 16 plots the pdf for the estimated TPM based on 10^5 sets of 100 simulated observations per group, both for normal data and lognormal data.
Of course, real data are rarely so close in distribution to multivariate normal (MVN). For example, Figure 17 plots 25 simulated observations from each of 4 clusters, where clusters 1 and 2 belong to group 1 and clusters 3 and 4 belong to group 2. ML methods such as learning vector quantization (LVQ [14]) or mixture discriminant analysis (MDA [14]) are effective for this type of data. However, with more than p = 2 predictors, it is rarely so evident that clusters are present, so LDA, LVQ, and MDA are compared. MDA assigns a test case to the nearest cluster and corresponding group. Although real data rarely closely follow the LDA assumptions that the data are MVN with the same covariance matrix and possibly different means, LDA sometimes has a competitive CCR compared to other ML options such as MDA and LVQ.
Note that due to processing, aging, and/or storage conditions, clustering, such as in Figure 17, could occur, and performance claims regarding how distinct group 1 is from group 2 depend on the relative frequency of each cluster. For example, cluster 4 in group 2 is quite distinct from all of group 1, but cluster 3 from group 2 is not distinct from cluster 2 of group 1. Therefore, performance claims regarding group separability could depend on the relative frequencies of the clusters in the training and testing data. Applying leave-one-out cross validation (CV) to 10^3 simulated observations in each of the four clusters in each group, the estimated TPM using either LDA or MDA is 0. Simply assuming MVN data in each group without clustering and applying TPM = Φ(−δ/2) leads to an estimated TPM = 0.08, with δ^2 = 7.61 in Equation (4) describing the T2-based measure of group separation. Somewhat surprisingly, the estimate 0.08 closely agrees with 1 − CCR = 0.08 (recall from previous paragraphs that CCR = 0.93 if clustering is ignored) in this case because Σ is badly estimated due to ignoring the clustering within each group.
In the case where the probability of 1, 2, 3, 4, 5, 6, or 7 clusters in each group is calculated from the BIC, this leads to an estimated TPM = 0.13 (CCR = 0.87, in good agreement with the CCR of 0.88 from the previous paragraph), with δ^2 = 5.12 in Equation (4) describing the T2-based measure of group separation. Again, somewhat surprisingly, because Σ is badly estimated due to ignoring the clustering within each group, the CCR is well estimated even if clustering is ignored. The true simulated distribution is a mixture of MVN clusters within each of the two groups, with either four clusters or a random number of clusters from 1 to 7, as modeled using the BIC as estimated using mclust applied to the real data with ADU = group 1 and AUC = group 2. Real data can often be modeled effectively using mixtures, including mixtures of MVN distributions [17]; however, ignoring the mixture behavior and assuming one cluster per group can often lead to competitive performance.
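The BIC-based choice of the number of mixture components per group (done in the paper with mclust in R) can be sketched with scikit-learn's GaussianMixture; the two-cluster data below are a hypothetical stand-in for one group, and note that sklearn's bic() uses the opposite sign convention from mclust (lower is better):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Simulated stand-in for one group (e.g., ADU): a mixture of two
# well-separated bivariate normal clusters, 100 points each.
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[4.0, 4.0], scale=0.5, size=(100, 2)),
])

# Fit 1-7 component Gaussian mixtures and compare by BIC.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 8)}
best_k = min(bics, key=bics.get)

# BIC-based relative model probabilities, P(M) proportional to exp(-BIC/2),
# as used later in the text for model-uncertainty weighting.
b = np.array([bics[k] for k in range(1, 8)])
w = np.exp(-(b - b.min()) / 2)
probs = w / w.sum()
print(best_k)  # the 2-cluster model should win for this simulation
```

For this well-separated simulation the BIC clearly favors 2 components; for the real ADU data the text reports only modest evidence for 3-7 clusters, so the BIC-based probabilities would be spread across several cluster counts.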
Recall that LDA assumes one MVN component per group while MDA allows one or more MVN components per group.

ABC
Because ABC is a versatile and robust option for many inverse problems [18][19][20][21][22][23], ABC is described in detail here. ABC can be used to evaluate candidate experimental designs, accommodate model uncertainty such as how many clusters could be present in each precipitation condition (ADU, AUC, SDU, MDU), and to provide a model-based summary of the data.
In any Bayesian approach [24], prior information regarding the magnitudes and/or relative magnitudes of any model parameter(s) can be provided. If the prior is "conjugate" for the likelihood, then the posterior is in the same distributional family as the prior, in which case analytical methods are available to compute posterior prediction intervals for quantities of interest. So that a wide variety of priors and likelihoods can be accommodated, modern Bayesian methods do not rely on conjugate priors, but use numerical methods to obtain samples from approximate posterior distributions [24]. For numerical methods, such as Markov chain Monte Carlo [24], the user specifies a prior distribution (which need not be normal) for the parameters to be inferred, such as processing and aging conditions. The user must also specify a likelihood for the data given the model parameters. In contrast, ABC does not require a likelihood for the data, and, as in any Bayesian approach, ABC accommodates constraints on variances through prior distributions [18][19][20][21][22][23][24].
No matter what type of Bayesian approach is used, a well-calibrated Bayesian approach is defined here as an approach that satisfies several requirements. One calibration requirement is that in repeated applications of ABC for classification, the actual correct classification probability should be closely approximated by the predicted correct classification probability. For example, if the average probability of the selected model is 0.90, then the actual CCR should be approximately 0.90. In applications of ABC for estimating a continuous-valued parameter ("regression"), one requirement is that in repeated applications of ABC, the middle 95% of the posterior distribution for the parameter should contain the true value approximately 95% of the time. That is, the actual coverage should be closely approximated by the nominal coverage. The nominal coverage is obtained by using the quantiles of the estimated posterior. For example, the 0.005 and 0.995 quantiles define the lower and upper limits of a 99% probability interval. A second requirement is that the true mean squared error (MSE) of the ABC-based estimate of the parameter should be closely approximated by the variance of the ABC-based posterior distribution of that parameter [22].
Denote the real data as y. Inference using ABC can be summarized as follows:
(1) Sample θ from the prior, θ ~ f_prior(θ).
(2) Simulate data y′ from the model, y′ ~ P(y|θ).
(3) If the distance d(S(y′), S(y)) ≤ ε, then accept θ as an observation from f_posterior(θ|y).
Repeat steps 1-3 many times, usually 10^5 or more times. In step (2), the model can be analytical or, for example, a computer model. In ABC, the model has input parameters θ and output data y′ ~ P(y|θ), and there is corresponding real data y_obs. For the example below, the forward model has 1 parameter (the group membership, 1, 2, 3, or 4 for ADU, AUC, SDU, and MDU, respectively).
Synthetic data are generated from the model for many trial values of θ, and trial θ values are accepted as contributing to the estimated posterior distribution for θ | y_obs if the distance d(y_obs, y(θ)) between y_obs and y(θ) is reasonably small. Alternatively, for most applications, it is necessary to reduce the dimension of y_obs to a small set of summary statistics S(y_obs) and accept trial values of θ if d(S(y_obs), S(y(θ))) < ε, where ε is a user-chosen threshold.
Here, for example, y_obs is the 22 MAMA morphology measurements or transformations of them. Because trial values of θ are accepted if d(S(y_obs), S(y(θ))) < ε, an approximation error to the posterior distribution arises that several ABC options [20,21] attempt to mitigate. Such options weight the accepted θ values by the actual distance d(S(y_obs), S(y(θ))) (abctools package in R [12]). In practice, the true posterior distribution is not known, so a method such as that in [22], using calibration checks, is needed in order to choose an effective value of the threshold ε.
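Steps 1-3 can be sketched as a rejection sampler for the group-membership parameter. The sketch below is Python for illustration (the paper's tooling is R, e.g., abctools), and the group means, scale, observed summary statistics, and ε are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical forward model: each group (1=ADU, ..., 4=MDU) generates a
# 2-dimensional summary statistic from its own normal distribution.
group_means = {1: [0.0, 0.0], 2: [2.0, 0.0], 3: [0.0, 2.0], 4: [2.0, 2.0]}

def forward_model(theta):
    """Simulate summary statistics S(y') for group label theta."""
    return rng.normal(loc=group_means[theta], scale=0.5)

s_obs = np.array([2.1, -0.1])  # observed summaries, near group 2's mean
eps = 0.5                      # user-chosen acceptance threshold

# Steps 1-3: theta from a uniform prior over the 4 groups, simulate,
# accept theta when d(S(y'), S(y_obs)) <= eps. (Fewer draws than the
# 10^5+ recommended in the text, to keep the sketch fast.)
accepted = []
for _ in range(20_000):
    theta = int(rng.integers(1, 5))          # step 1: theta ~ prior
    s_sim = forward_model(theta)             # step 2: y' ~ P(y | theta)
    if np.linalg.norm(s_sim - s_obs) <= eps: # step 3: accept or reject
        accepted.append(theta)

# Approximate posterior P(theta | y_obs): frequency of accepted labels.
labels, counts = np.unique(accepted, return_counts=True)
posterior = dict(zip(labels.tolist(), (counts / counts.sum()).round(3)))
print(posterior)  # mass concentrates on group 2
```

Distance-weighting of the accepted θ values, as in abctools, would replace the hard accept/reject in step 3 with a kernel weight that decreases with d(S(y′), S(y_obs)).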
ABC is used here for two main reasons: (A) ABC allows for comparison of the performance of candidate summary statistics such as the mean or quantiles of the particle attributes such as size and shape. (B) ABC allows for easy experimentation with the effects of EDA to choose a model and to include uncertainty in the estimated forward model (such as the number of mixture components in each of ADU, AUC, SDU, MDU) used to perform ABC.

ABC for ADU, AUC, SDU, MDU
ABC requires a forward model with source terms (such as group membership being ADU, AUC, SDU, or MDU) as inputs and observables such as any subset of the 22 MAMA features as outputs. For a test set of outputs, the most likely inputs can be inferred using steps 1, 2, and 3. The forward model can be simple or complicated, and can include empirical adjustments based on training data. For example, empirical measurements (training data) can be used to select a model (such as the number of mixture components in MDA) and estimate the chosen model parameters. Uncertainty arising from the model selection step [25] is often ignored; however, the effects of model selection can be easily included via ABC (Section 5.3).
This section uses subsets of the MAMA measurements as the model output and assigns the group membership as 1, 2, 3, or 4, for ADU, AUC, SDU, and MDU, respectively. The CCR of 0.87 in Table 1 corresponds to the choice of MAMA features 1-9, 11, 21, 22 (Vector area (µm 2 ), Convex hull area (µm 2 ), Pixel area (µm 2 ), Vector perimeter (µm), Convex hull perimeter (µm), Ellipse perimeter (µm), ECD (µm), Major ellipse (µm), Minor ellipse (µm), Diameter aspect ratio, Grayscale mean, Gradient mean). Features 1-9, 11, 21, and 22 were chosen because these 12 features had the largest ratio of between-group variance to the within-group variance (as is done in LDA). The average probability assigned by ABC to the inferred group membership is 0.85, which is acceptably close to 0.87. If instead all 22 MAMA features are used, the CCR of ABC is approximately 0.82 instead of 0.87.
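Screening features by the ratio of between-group to within-group variance, as described above, can be sketched as follows. The three-feature data below are hypothetical (only feature 0 separates the two groups), standing in for the 22 MAMA measurements:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical training data: 3 features on 2 groups of 200 particles;
# only feature 0 differs in mean between the groups.
g1 = np.column_stack([rng.normal(0, 1, 200),
                      rng.normal(0, 1, 200),
                      rng.normal(0, 1, 200)])
g2 = np.column_stack([rng.normal(3, 1, 200),
                      rng.normal(0, 1, 200),
                      rng.normal(0, 1, 200)])

def between_within_ratio(groups):
    """Per-feature ratio of between-group to within-group variance,
    the LDA-style screening criterion described in the text."""
    grand = np.vstack(groups).mean(axis=0)
    means = [g.mean(axis=0) for g in groups]
    n_tot = sum(g.shape[0] for g in groups)
    between = sum(g.shape[0] * (m - grand) ** 2 for g, m in zip(groups, means))
    between = between / (len(groups) - 1)
    within = sum(((g - m) ** 2).sum(axis=0) for g, m in zip(groups, means))
    within = within / (n_tot - len(groups))
    return between / within

ratios = between_within_ratio([g1, g2])
keep = np.argsort(ratios)[::-1]  # screen: keep features with largest ratios
print(keep)  # feature 0 ranks first
```

In the paper's application, the 12 features with the largest such ratios ({1-9, 11, 21, 22}) were retained, which raised the ABC CCR from about 0.82 (all 22 features) to 0.87.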
Note that this implementation of ABC for classification uses the actual data rather than an explicit forward model; therefore, this implementation of ABC is similar to a smooth version of k-nn in that it is based on using weighted distances from the test case to the training cases. In ABC applications below, the data is used to fit a model and uncertainty due to model selection [25] and parameter estimation is included.

ABC for the 3 Pathways in ADU, AUC, SDU, MDU
The same procedure as in Section 5.1 can be repeated for training data that includes only one of the three pathways (UO3, U3O8, or UO2, at 400 °C, 800 °C, and 510 °C (in H2), respectively) for each of the four precipitation conditions. Recall from Table 1 that the nine ML methods each have a larger CCR for the one-pathway-per-group training data (for any of the three pathways) than for the three-pathways-per-group training.

Model Uncertainty Accommodated by ABC for Inverse Problems in Forensics
Suppose that a model such as ŷ = β̂0 + β̂1x1 + β̂2x2 + β̂3x1^2 + β̂4x2^2 + β̂5x1x2 is fit to training data for one MAMA feature (or a transformation of several MAMA features), where, for example, x1 is processing temperature and x2 is humidity (not available in the data from [2] used here, but experiments are ongoing that will have other processing conditions). The ABC framework is useful to evaluate the fitted forward model's performance in the inverse problem consisting of using several responses y1, y2, ..., yq to infer the value of x1, x2, ..., xp. Design of experiments has traditionally focused on choosing effective values of x1, x2, ..., xp that lead to low prediction error variance in each of y1, y2, ..., yq, such as I-optimal designs [26][27][28][29][30][31]. A key task in nuclear forensics is the inverse problem, using future measured values of y1, y2, ..., yq to infer the corresponding values of x1, x2, ..., xp. However, it is expected that good values of x1, x2, ..., xp in training/calibration data that lead to competitively small prediction error variance in y1, y2, ..., yq will also be good values for performing the inverse [26][27][28][29][30][31], because uncertainty in the fitted forward model is minimized subject to resource constraints (number of experiments). ABC is a good framework for including model uncertainty in addition to model parameter estimation uncertainty, as illustrated below.
A simulated example below uses q = 5 responses and p = 3 continuous-valued variables, such as temperature, humidity, and concentration of impurities. The ABC output (as in any Bayesian approach) is the estimated posterior for each of x1, x2, ..., xp. Each candidate experimental design [26][27][28][29][30][31], such as central composite and Box-Behnken designs for fitting quadratic models, can be evaluated in the ABC framework. The experimental design having the narrowest (smallest standard deviation) posterior probability density function (pdf) for x1 (or any other processing parameter) has the least uncertainty, provided the ABC-based approximation is sufficiently accurate. Simulations such as those used below indicate that ABC is accurate, so ABC-based posterior widths corresponding to candidate experimental designs can be compared [26][27][28][29][30][31]. Figure 18 plots an example of the ABC-based estimated posterior pdf of x1 for nominal measurement error standard deviations (SDs) in x1, x2, x3 and in y1, y2, y3, y4, y5, and for two times the nominal standard deviations. The SDs include measurement error and modeling error effects. Note from Figure 18 that measurement errors in the factors ("predictors") lead to a mean shift when the SDs are multiplied by a factor of 2. Figure 18 also includes a well-chosen experimental design (central composite design [28,29]) and a randomly chosen experimental design that has non-zero correlations among the 3 predictors. The posterior widths for a few other experimental designs were also compared [28,29].
For example, the posterior standard deviation for x1 is 0.023, 0.053, 0.028, 0.024, 0.023, and 0.021, respectively, for: a full 3-level factorial in x1, x2, x3 plus 5 repeated runs at the central position (27 + 5 = 32 runs); the same full factorial but with two times the nominal standard deviations in the measurements of x1, x2, x3 and in y1, y2, y3, y4, y5; a random reordering of each of the design-matrix columns (a random design); a Box-Behnken design with 16 runs; a central composite design with 20 runs; and a central composite design with 20 runs but with a linear fit only. Because of the quadratic and interaction terms in the model, the full factorial design does not have uncorrelated predictors, which is one reason the Box-Behnken or central composite designs lead to a similar posterior standard deviation for x1 as does the full factorial design with more runs (32 runs compared to 20 runs).
Recall that ABC is based on an approximate method, but the two ABC calibration checks for a continuously-valued parameter such as x1 indicate that ABC is well calibrated with a choice of ε = 0.01. The two checks are that the nominal posterior interval coverages agree with the actual coverages at the 0.90, 0.95, and 0.99 levels, and that the RMSE is well estimated by the posterior standard deviation. Therefore, the posterior standard deviations are good indications of inference quality and are a good basis to compare candidate experimental designs. In this case, the central composite design or the Box-Behnken design are better than the full factorial because they achieve similar posterior width with fewer runs.
As pointed out in [27], ignoring estimation error in model parameters, one could use plug-in estimates β̂0, β̂1, ..., β̂5 of the forward model parameters to compute the estimate X̂. For example, if ŷ = β̂0 + β̂1x1 + β̂2x2 + β̂3x1^2 + β̂4x2^2 + β̂5x1x2, then the estimates β̂0, β̂1, β̂2, β̂3, β̂4, β̂5 are plugged into ŷ, ignoring their uncertainty and ignoring the model selection process. Note the similarity of the plug-in approach to ABC, in that ABC also seeks plausible values of {x1, x2, x3} that lead to a small distance Σ_{j=1}^{q} (y_j − ŷ_j)^2. However, the ABC framework makes it simple to include the effects of estimation error in model parameter estimates and to include the effects of model selection. Regarding model selection, a key assumption in experimental design is that the model form, such as y = β0 + β1x1 + β2x2 + β3x1^2 + β4x2^2 + β5x1x2 + error, is known in advance, and then good values of x1 and x2 can be calculated for minimizing prediction errors in y (such as the I-optimal or the Box-Behnken designs [27][28][29]). In practice, if EDA suggests that some model form (perhaps nonlinear in the parameters or a segmented regression) is more appropriate than a low-order quadratic, then the strategy in [27] of using plug-in estimates β̂0, β̂1, ..., β̂5 of forward model parameters to compute the estimate X̂ = {x̂1, x̂2, x̂3} = argmin_x Σ_{j=1}^{q} (y_j − ŷ_j)^2 in Equation (5) should be modified to account for the model selection step [25], as illustrated in the next two paragraphs in the classification context.
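The plug-in inversion of Equation (5) can be sketched as a grid search. Everything below is hypothetical (three quadratic responses with made-up coefficients, two unknown processing conditions, no measurement noise), so the search recovers the true x up to grid resolution; it is an illustration of the argmin, not of the paper's fitted models:

```python
import numpy as np

# Hypothetical fitted coefficients: q = 3 responses, each a quadratic in
# x = (x1, x2); rows of B are (b0, b1, b2, b11, b22, b12) per response.
B = np.array([[1.0,  2.0, -1.0,  0.5,  0.3, -0.4],
              [0.0, -1.0,  2.0, -0.2,  0.6,  0.1],
              [2.0,  0.5,  0.5,  0.4, -0.3,  0.2]])

def yhat(x1, x2):
    """Stack of quadratic forward models, one row of B per response."""
    terms = np.stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1*x2])
    return np.tensordot(B, terms, axes=1)

# Noise-free observed responses generated at the true x = (0.5, -0.5).
y_obs = yhat(0.5, -0.5)

# Plug-in inverse: grid search for x minimizing sum_j (y_j - yhat_j(x))^2.
grid = np.linspace(-1, 1, 201)
X1, X2 = np.meshgrid(grid, grid)
sse = ((y_obs.reshape(-1, 1, 1) - yhat(X1, X2)) ** 2).sum(axis=0)
i, j = np.unravel_index(np.argmin(sse), sse.shape)
x_hat = (X1[i, j], X2[i, j])
print(x_hat)  # recovers approximately (0.5, -0.5) on this noise-free grid
```

The plug-in search treats B as known; the ABC alternative discussed in the text instead draws the coefficients (and candidate model forms) from their posteriors, so the spread of accepted x values reflects parameter and model uncertainty.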
In practice, if EDA suggests that some model form (perhaps nonlinear in the parameters or a segmented regression) is more appropriate than a low-order quadratic, then the strategy in [27] of using plug-in estimates β̂0, β̂1, ..., β̂9 of forward model parameters could be used, but it ignores model selection effects, as illustrated in the next paragraph in the regression context and in the following two paragraphs in the classification context.
For ABC-based model selection effects in the regression context, first a model form such as y = β0 + β1x1 + β2x2 + β3x3 + β4x1^2 + β5x2^2 + β6x3^2 + β7x1x2 + β8x1x3 + β9x2x3 + e is specified in the case of p = 3 continuous predictors x1, x2, x3, for each of q = 5 measured responses (such as 5 of the 22 MAMA measurements, or transformations of the MAMA measurements). Standard design of experiments assumes that the model form is known in advance, and effective training values of x1, x2, x3 are selected in order to have good predictions of future y values at future x1, x2, x3 values. In practice, EDA can suggest other candidate models and, to keep this example short and simple, the impact of model selection will be illustrated when the true model is either (1) a 4-parameter model having only linear terms (y = β0 + β1x1 + β2x2 + β3x3 + e) without interactions such as β7x1x2, or (2) the full 10-parameter model with parameters {β0, β1, ..., β9}. ABC will then be implemented assuming the 4-parameter model, the 10-parameter model, or using BIC to choose between the two models. ABC is implemented by simulating possible x1, x2, x3 values and corresponding y1, y2, ..., yq values for each set of {β0, β1, ..., β9} values simulated from their respective prior probability distributions. The measurement error variances in x1, x2, x3 and y1, y2, ..., yq are assumed here to have been well estimated by using small repeatability studies. The BIC for comparing regression models (1) and (2) fit to the same data is given by BIC = n log(RSS/n) + p log(n), where RSS in Equation (6) is the residual sum of squares RSS = Σ_{i=1}^{n} (y_i − ŷ_i)^2, n is the number of observations (n = 20 for the central composite design as used to generate Figure 18 and in this example), and p is the number of model parameters (p = 4 for model 1 and p = 10 for model 2).
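The BIC comparison in Equation (6) can be sketched directly: fit the 4-parameter linear model and the 10-parameter full quadratic model to the same n = 20 runs, compute BIC = n log(RSS/n) + p log(n) for each, and convert to relative model probabilities via exp(−BIC/2). The design points, coefficients, and noise level below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical n = 20 runs in 3 coded factors; the true model is linear.
n = 20
X = rng.uniform(-1, 1, size=(n, 3))
x1, x2, x3 = X.T
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + rng.normal(0, 0.1, n)

def fit_bic(design, y):
    """Ordinary least squares fit; BIC = n*log(RSS/n) + p*log(n)
    for Gaussian errors, as in Equation (6)."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = ((y - design @ beta) ** 2).sum()
    p = design.shape[1]
    return len(y) * np.log(rss / len(y)) + p * np.log(len(y))

ones = np.ones(n)
design4 = np.column_stack([ones, x1, x2, x3])                  # model 1
design10 = np.column_stack([ones, x1, x2, x3, x1**2, x2**2,    # model 2
                            x3**2, x1*x2, x1*x3, x2*x3])

bic = np.array([fit_bic(design4, y), fit_bic(design10, y)])
w = np.exp(-(bic - bic.min()) / 2)
probs = w / w.sum()  # relative model probabilities, P(M) ~ exp(-BIC/2)
print(probs.round(2))  # BIC usually favors the simpler true model here
```

As in the text, these model probabilities can then weight which model generates the synthetic data inside each ABC trial, so the accepted posterior sample reflects model-selection uncertainty rather than conditioning on a single chosen model.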
Then, for a randomly-generated test case, the ABC-based posterior standard deviations (SDs, repeatable to ±0.001) for x1, x2, x3 are 0.046, 0.050, 0.064, respectively, if model 1 is the true and assumed model; the SDs are 0.054, 0.058, 0.077, respectively, if model 1 is the true model and model 2 is the chosen model; and the SDs are 0.068, 0.043, 0.059, respectively, if model 1 is the true model and the model selection process is included in the training data for ABC, so model 1 is assumed with relative probability P(M1) ∝ e^(−BIC1/2), which has an average value over 10^5 simulations of approximately 0.65 for each of the 5 responses in this example. The model selection process can be included in ABC by simulating from model 1 with its estimated parameters with the BIC-based model relative probability and by simulating from model 2 with relative probability P(M2) ∝ e^(−BIC2/2). Such a simulation procedure is a representation of our state of knowledge after the n = 20 run central composite design, such as a BIC-based approximate 65% probability that model 1 is more appropriate than model 2 when model 1 is the correct model. For the same simulated test case as above, having true values x1 = 0.71, x2 = −0.99, x3 = −0.10, Figure 19 plots the estimated posterior pdfs when (a) model 1 is the true and assumed model, (b) model 2 is the chosen model, and (c) BIC-based model selection chooses between models 1 and 2 in the ABC training data. It should be noted that the calibration checks indicate that ABC is working: the actual coverages of 0.99, 0.95, and 0.90 are the same as the nominal coverages, and the estimated RMSE is 0.05, in agreement with the observed average RMSE, using threshold ε = 0.01. Although this is a simple model selection example, it illustrates how ABC can accommodate both model selection and parameter estimation effects. Alternatively, [25] used a smoothed bootstrap method to assess model selection effects, which were non-negligible even for the case of selecting among linear, quadratic, or higher order polynomials to fit a response, as done here.
For ABC-based model selection effects in the classification context, recall that Figure 5 plots the BIC used in model-based clustering versus the number of clusters ("components") in the ADU data with all 3 pathways (U3O8, UO2, UO3) present. There is modest evidence for 3-7 clusters within ADU. Recall that Figure 6 is a two-dimensional plot of the identified clusters when the mclust function in R is asked to find 4 clusters. ABC can be implemented with a fixed number of clusters per group, or with the probability of 1, 2, 3, 4, 5, 6, or 7 clusters per group calculated from the BIC (with the relative model probability assumed to be well approximated by e^(−BIC/2) [14], and see below). In the case of distinguishing group 1 (ADU) from group 2 (AUC), the 4-cluster model has an ABC-based CCR estimate of 0.92 and the random-cluster model has an ABC-based CCR estimate of 0.88. The corresponding actual CCRs in simulated data are 0.93 and 0.88, in good agreement with prediction, and with the anticipated feature that the CCR is larger if the assumed forward model has a known number (4) of clusters in each group.
Returning to ABC applied to the 9255 sets of 22 MAMA measurements on particles, recall that the CCR in Table 1 is 0.87, which was for the case of using MAMA measurements {1-9, 11, 21, 22} and using 70% of the data to train and 30% to test. If leave-one-out cross validation (CV, [14]) is used, the CCR is again 0.87. If model-based clustering (mclust in R) is used to model the data, and 3 clusters are selected for each of ADU, AUC, SDU, and MDU, then the ABC-based CCR is 0.93 (an unrealistically high CCR due to the over-simplified forward model in ABC). If instead the BIC-based probability of 1, 2, 3, 4, 5, 6, or 7 clusters is used, so that model uncertainty is accommodated in the forward model used by ABC, then the ABC-based CCR is 0.88, only 0.01 larger than that observed in real data, and lower than with the oversimplified 3-clusters-per-group forward model. The BIC-based probability of a candidate model M, such as one with M = 1-7 clusters, is approximated as P(M) ∝ e^(−BIC/2), with BIC = −2 log(likelihood of model) + p log(n), where the model likelihood in Equation (7) depends on the assumed parametric form and measures how well the model fits the data, p is the number of parameters, and n is the number of observations. The number of parameters is p = 2kM + Mk(k − 1)/2, where k is the dimension of the data and there are k means, k variances, and k(k − 1)/2 covariances per cluster. There are other ways to approximate the probability of a candidate model, such as the integrated complete-data likelihood, but the point of this example is to illustrate how ABC can accommodate the model selection process, no matter what type of model selection [25] is used. One goal for ABC applications is that the data model used in ABC leads to essentially the same CCR as that from using the real data within the ABC framework, in which case the fitted data model is a convenient model-based summary of the data.

Discussion and Summary
This review paper described statistical issues and approaches in using quantitative morphological measurements such as particle size and shape to infer production conditions. Statistical topics included multivariate T2 tests (Hotelling's T2), design of experiments, and several machine learning (ML) options including decision trees, learning vector quantization neural networks, linear and mixture discriminant analysis, and approximate Bayesian computation (ABC). Particular emphasis was given to ABC because its simulation framework is effective for including effects of model selection and other sources of model uncertainty [25]. Any calibrated Bayesian approach is useful for comparing candidate experimental designs (which are in progress for 2022). For example, Figure 18 plotted the posterior probability distribution for inferred processing temperature in simulated data for a well-chosen experimental design (a full factorial design) and for a randomly-chosen design.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Morphological Analysis of Materials (MAMA), Software Quantification Attributes
Appendix A briefly describes the 22 quantitative morphological attributes or features that are produced from the segmentation of particles within an image. See Appendix A in reference [30] for more details.

1
Vector Area The area of the object calculated using the contour representation; it does not remove any holes. Because the contour representation passes through the pixels of the perimeter of the object, the vector area is generally slightly smaller than the pixel area representation.

2
Convex Hull Area This is the area contained within the convex hull (convex polygon outline) of the segmented feature.

3
Pixel Area This is the area of the object, which does not include any holes. This metric is calculated by taking the total number of pixels encapsulated by the object boundary and multiplying by the micrometer/pixel scale.

4

Vector Perimeter The perimeter of the object using the object contour; calculates the length of the perimeter by summing the lengths of the segments between subsequent points around the boundary as determined by the cvFindContours function in the OpenCV software.

5

Convex Hull Perimeter The perimeter of the convex hull of the object.

6

Ellipse Perimeter The calculated perimeter of the best-fitted ellipse around the boundary of the object using the cvFitEllipse function in the OpenCV software. The major and minor ellipse are extracted from these calculations.

7

Equivalent Circular Diameter (ECD) This is the diameter of a perfect circle that has an area equivalent to the pixel area of the segmented feature. ECD = [4 × (Pixel Area)/π]^(1/2)

8

Major Ellipse Major ellipse axis calculated from the fitted ellipse from the ellipse perimeter calculations
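The ECD formula above can be checked with a quick sanity test: a circle of diameter 2 µm has pixel area π µm², so the formula should return exactly 2. A minimal sketch (the function name is ours, not MAMA's):

```python
import math

def ecd(pixel_area_um2: float) -> float:
    """ECD = [4 * (Pixel Area) / pi]^(1/2): the diameter of a perfect
    circle whose area equals the segmented feature's pixel area."""
    return math.sqrt(4.0 * pixel_area_um2 / math.pi)

# A circle of diameter 2 um has area pi * 1^2 = pi um^2.
print(ecd(math.pi))  # 2.0
```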