Factors Affecting Species Identifications of Blow Fly Pupae Based upon Chemical Profiles and Multivariate Statistics

Alternative methods for the identification of species of blow fly pupae have been developed over the years that consist of the analyses of chemical profiles. However, the effect of biotic and abiotic factors that could influence the predictive manner for the tests have not been evaluated. The lipids of blowfly pupae (Cochliomyia macellaria, Lucilia cuprina, Lucilia sericata, and Phormia regina) were extracted in pentane, derivatized, and analyzed by total-vaporization solid phase microextraction gas chromatography-mass spectrometry (TV-SPME GC-MS). Peak areas for 26 compounds were analyzed. Here we evaluated one biotic factor (colonization) on four species of blow flies to determine how well a model produced from lipid profiles of colonized flies predicted the species of flies of offspring of wild-caught flies and found very good species identification following 10 generations of inbreeding. When we evaluated four abiotic factors in our fly rearing protocols (temperature, humidity, pupation substrate, and diet), we found that the ability to assign the chemical profile to the correct species was greatly reduced.


Introduction
Insects make up a large proportion of the environment; therefore, it is not surprising that insects would be present in crime scenes. Insects play an important role in decomposition, therefore they become forensically useful in estimating the postmortem interval [1]. Insects recovered from the body can be used as a "clock" to determine the minimum postmortem interval (PMI MIN ), or the time since the body has been available to the insects, as most species in North America will only lay eggs or larvae on a resource if it is dead [2,3].
The most common fly that will be encountered in casework is the blowfly (Diptera: Calliphoridae). The species of the collected specimens is determined, and some measure of their age-usually the length of the larvae or the life cycle stage of the insects' development-is used to extrapolate PMI MIN , referencing published developmental data (e.g., [4][5][6][7][8][9]). To estimate the PMI MIN , investigators should attempt to collect the oldest insect specimens present at the crime scene, and in some cases, these are pupae. Due to their lack of morphological characters, it is often difficult, if not impossible, to identify As there appear to be qualitative and quantitative differences in the occurrence of these compounds among the various species, the chemical profiles may be useful as a tool for identification and discrimination. Solvent extraction has been the traditional method for the analysis of insect cuticular compounds [24,26,[36][37][38]. The process is straightforward: the insect is placed into a nonpolar solvent, such as methylene chloride, ether, pentane, or hexane, during which time the biological compounds leach into the liquid. Those compounds may then be concentrated and analyzed by gas chromatography-mass spectrometry (GC-MS) or by high performance liquid chromatography (HPLC).
Previous work has been done demonstrating the discriminating power of chemical profiles for blow fly larvae and pupae [26,[39][40][41][42]. Headspace solid phase microextraction (SPME) experiments of a single species (Calliphora vicina) utilized a polydimethylsiloxane/divinyl benzene/carboxen (PDMS/DVB/CAR) SPME fiber to collect hydrocarbons, fatty acids, and sterols from larvae, and a PDMS/CAR SPME fiber to do the same for pupae [39]. The presence and absence of the compounds were noted and the qualitative data was analyzed by agglomerative hierarchical clustering (AHC). The results in this case indicated that the chemical profiles of larvae and pupae were dissimilar, as were the profiles of pupae of differing ages [39].
The purpose of this project was to interrogate the chemical profiles that could be obtained from pupae by solvent extraction followed by total vaporization solid-phase microextraction (TV-SPME) [43]. The advantage of this method over headspace or immersion SPME is the elimination of the need to determine the optimal organic solvent, and that extracts do not need to be filtered reducing buildup and potential contamination of the GC column. Furthermore, TV-SPME can add additional selectivity and sensitivity based on fiber choice, and requires no changes in instrumentation.
Four species of blowfly were investigated: Cochliomyia macellaria, Lucilia cuprina, Lucilia sericata, and Phormia regina. The objectives were to: (i) determine whether the chemical profiles could be used to effectively discriminate between different species; (ii) to determine the extent to which genetic variation affected the chemical profile and the integrity of these determinations; and, (iii) to examine the effects of various abiotic variables on the chemicals extracted from the flies. These scenarios were used to produce a model of blow fly species identifications.

Rearing of Fly Colonies and Collection of Specimens
Colonies of Phormia regina, Cochliomyia macellaria, Lucilia sericata, and Lucilia cuprina were reared through ten successive generations (G). At the G1 and G10 generations, 19 × 100 maggots were transferred into a Dixie cup containing 50 g chicken liver, which were in turn placed inside 19 separate glass mason jars half-filled with a pupation substrate. The mason jars were kept in an incubator maintained at 25 • C temperature and 60% humidity. Seven to ten days were allotted for the maggots to develop through their larval stages. At 1 d following the observation of pupation, all of the pupae were harvested from three randomly-selected jars; these pupae were stored separately in a freezer at −80 • C and classified as Timepoint 1. Samples were collected each additional 24 h following for a total of six time points (18 jars). This data was not used to estimate age, but rather to determine the consistency of the chemical profiles over time to accurately identify the species. The maggots in the 19th jar served as a control, the time of adult emergence was recorded, and the number of adult flies was compared against the number of puparia and unhatched pupae to ensure proper development.
To gauge the chemical effects wrought by changes in temperature, humidity, pupation substrate, and diet, the same experimental design as above was carried out using pupae of G10+ Phormia regina (Table 1). During this set of experiments, pupae were likewise collected over six timepoints.

Instrumental Analysis
A 6890 gas chromatograph coupled to a 5975 mass spectrometer (Agilent, Santa Clara, CA, USA) served as the principal instrument, with autosampler functionality provided by an MPS2 (Gerstel, Mülheim an der Ruhr, Germany). The column was a J&W DB-5ms (30 m × 0.25 mm × 0.25 µm). All GC-MS analyses utilized H 2 carrier gas with a flow rate of 2.5 mL/min operated in splitless mode, with a scan range of m/z 40-550.
For each timepoint, three haphazardly selected pupae from each of the three jars were thawed and extracted in 1 mL pentane. After 4 day, the extracted solution was siylated and submitted to GC-MS analysis via the TV-SPME method described [43]. This method provided excellent sensitivity an order of magnitude greater than what could be achieved by liquid injection, allowing for the analysis of liquid extracts without a reconstitution step.

Statistical Analysis
Following instrumental analysis, a qualitative assessment of the chromatograms was undertaken, retention times converted to retention indices (RI), and 26 compounds of interest were identified across all four species (see Results). Compounds were identified based upon their retention times and mass spectra. Authentic standards of the most common free fatty acids, squalene and cholesterol, were also analyzed alongside unknown samples. For compounds for which we did not have a standard, a computerized search of the National Institute of Standards and Technology (NIST) mass spectral database was conducted. In these cases, only forward match scores greater than 800/1000 were considered probative. The mass spectra for these compounds were compiled into an Automated Mass Spectral Deconvolution and Identification System (AMDIS) library, and automated peak integration proceeded using AMDIS mass spectral deconvolution software. The peak areas were normalized to the square root of the sum of squares for all compounds of interest in the chromatogram.
The normalized dataset was investigated by principal component analysis (PCA) and discriminant analysis (DA) to illuminate the underlying trends among the variables and timepoints. Therefore, a complex system of 26 variables (i.e., 26 compounds of interest) can be efficiently described by a system of three PCs, which can themselves be plotted on a Cartesian coordinate plane, thereby aiding in the visualization of the data.
DA is another technique whose purpose is to visualize groupings in the data, as well as to predict and ascribe group membership for new samples. DA strives to maximize the discrimination among different groups. The group memberships are pre-defined, but the integrity of these memberships can be assessed by a leave-one-out cross-validation, which is reported in a table known as a confusion matrix.

Determination of Species
The relative abundance of the 26 compounds of interest for the G1 generations are presented in a heat map in Table 2. Wherever possible, the identity of the compound has been reported. However, due to similar retention times and indistinguishable mass spectra, the exact identities of some of the hydrocarbons remain elusive. These have been compared against the retention times of an n-alkane mix and differentiated based upon their retention indices. Similarly, the compounds labeled "14:1 FFA" and "18:1 FFA" are mono-unsaturated 14:1 and 18:1 free fatty acids, but the precise location of the double bond is unknown.
There is a high degree of qualitative similarity between the four species, particularly with respect to the expression of fatty acids. For example, palmitoleic acid, palmitic acid, linoleic acid, and oleic acid are observed at relatively high levels in all samples for all species. Substantially more variability is evident in the hydrocarbon compound class, with certain alkanes commonly seen in some species, yet wholly absent in other species. For example: 1.
Less obvious in Table 2, but still just as important to the statistical analysis, are the more nuanced quantitative differences in the detection of these compounds among the different species (i.e., differences in the relative amounts of each of the major compounds). It is often customary to reduce the dimensionality of a dataset first via PCA prior to attempting DA. This is typically done to sidestep one of the inherent limitations of DA, where each grouping must possess more observations (i.e., pupa exemplars) than variables (i.e., compounds of interest, or principal components thereof). A preliminary round of PCA therefore serves as a convenient way to reduce the number of variables whilst still capturing a sizeable amount of the variance in the data. For this data set, PCA of the G1 generation (data not shown) tended to over-emphasize the differences between Lucilia cuprina and Lucilia sericata at the expense of the differences between Cochliomyia macellaria and Phormia regina, and therefore obfuscated more than it aided. Improved discrimination of species was achieved in this study by performing DA on the dataset directly without the intermediate PCA step. DA results for the G1 generation were prepared by this method and an observations plot is shown in Figure 1. This plot illustrates the two species that are readily differentiable (L. cuprina and L. sericata) and two species that are more apt to be confused (C. macellaria and P. regina).
A variables plot is also provided in Figure 2. This type of plot serves as a "roadmap" for interpreting the DA results in Figure 1. Each ray in the variables plot represents one of the 26 compounds of interest. Rays that form an acute angle represent compounds whose abundances are highly correlated. For example, rays I, R, and S, correspond to myristic acid, octadecenoic acid (18:1), and arachidonic acid, which tend to rise and fall together. Similarly, rays T, U, W, and Y, represent long-chain alkanes (waxes) that are also correlated. Rays that form obtuse angles are highly anti-correlated. For example, linoleic acid (ray O) and arachidonic acid (ray S) are negatively correlated. Rays at right angles reflect compounds with little or no correlation. Table 2. Heat map depicting the log of the normalized peak areas for the G1 generation of C. macellaria (n = 53), L. cuprina (n = 54), L. sericata (n = 54), and P. regina (n = 36) across all time points. Twenty-six compounds were selected for statistical analysis. The color scale is logarithmic, varying from black (10 −7 ) to red (10 0 ). "TCAIE" (Compound E) is 2,2,4-trimethyl-3-carboxyisopropyl-pentanoic acid isobutyl ester. FFA: free fatty acid; ND: not determined.
whilst still capturing a sizeable amount of the variance in the data. For this data set, PCA of the G1 generation (data not shown) tended to over-emphasize the differences between Lucilia cuprina and Lucilia sericata at the expense of the differences between Cochliomyia macellaria and Phormia regina, and therefore obfuscated more than it aided. Improved discrimination of species was achieved in this study by performing DA on the dataset directly without the intermediate PCA step. DA results for the G1 generation were prepared by this method and an observations plot is shown in Figure 1. This plot illustrates the two species that are readily differentiable (L. cuprina and L. sericata) and two species that are more apt to be confused (C. macellaria and P. regina). A variables plot is also provided in Figure 2. This type of plot serves as a "roadmap" for interpreting the DA results in Figure 1. Each ray in the variables plot represents one of the 26 compounds of interest. Rays that form an acute angle represent compounds whose abundances are highly correlated. For example, rays I, R, and S, correspond to myristic acid, octadecenoic acid (18:1), and arachidonic acid, which tend to rise and fall together. Similarly, rays T, U, W, and Y, represent long-chain alkanes (waxes) that are also correlated. Rays that form obtuse angles are highly    Figure 1. Identities for the rays projected from the origin correspond to the lettered compounds in Table 2.
The quadrants of the variables plot ( Figure 2) are also directly comparable to the quadrants of the corresponding DA chart (Figure 1). The length of each ray indicates the magnitude of the association. Long rays indicate variables that are important in describing the variation in sample profiles, whereas short rays that fall close to the origin are compounds that describe little to none of the variation. The distribution of the rays in Figure 2 is quite instructive. For example, rays B, K, T, U, W, and Y project into the lower left quadrant and tend to be found at higher levels in L. cuprina. Rays I, R, and S project into the lower right quadrant and tend to be found at higher levels in P. regina and C. macellaria. Lastly, rays A, C, D, O, and P project into the top left quadrant and tend to be found at higher levels in L. sericata.
Thus, the following observations are manifest: • Lucilia cuprina is distinguishable by high concentrations of palmitoleic acid and several long-chain alkanes (alkane B, alkane C, alkane E, and alkane G).  Figure 1. Identities for the rays projected from the origin correspond to the lettered compounds in Table 2.
The quadrants of the variables plot ( Figure 2) are also directly comparable to the quadrants of the corresponding DA chart (Figure 1). The length of each ray indicates the magnitude of the association. Long rays indicate variables that are important in describing the variation in sample profiles, whereas short rays that fall close to the origin are compounds that describe little to none of the variation. The distribution of the rays in Figure 2 is quite instructive. For example, rays B, K, T, U, W, and Y project into the lower left quadrant and tend to be found at higher levels in L. cuprina. Rays I, R, and S project into the lower right quadrant and tend to be found at higher levels in P. regina and C. macellaria. Lastly, rays A, C, D, O, and P project into the top left quadrant and tend to be found at higher levels in L. sericata.
Thus, the following observations are manifest: • Lucilia cuprina is distinguishable by high concentrations of palmitoleic acid and several long-chain alkanes (alkane B, alkane C, alkane E, and alkane G). Lastly, when the DA model for the G1 generation was tested using leave-one-out cross-validation (Table 3), classification accuracies ranged from 89% (P. regina) to 96% (L. sericata) with an overall classification accuracy of 93%. Whereas Figure 1 allows for a qualitative assessment of the discrimination of the four species, classification accuracy is a quantitative way to assess the level of confusion between classes. Therefore, it is worth noting that of the four species, P. regina and C. macellaria had the lowest accuracies. For P. regina, 75% (three out of four) of the misclassifications were due to P. regina observations erroneously classified as belonging to the C. macellaria species. Similarly, 75% (three out of four) of the misclassifications were due to C. macellaria observations erroneously classified as belonging to the P. regina species. Achieving an overall classification accuracy of greater than 90% is an excellent result and this is sound evidence for the theory that taxonomic information can be gleaned from the lipid profiles.

Effect of Genetic Diversity
A key issue this project sought to address was the effect of genetic diversity on a specimen's chemical profile. Specifically, do G1 pupae yield a markedly different suite of compounds than pupae belonging to the G10 generation, whose genetic diversity has been homogenized over successive generations of inbreeding?
When the results from the G10 generations were analyzed by DA (see Figures 3 and 4 and Table 4), the results were like those for the G1 generation. Note that the orientation of the species within the DA plot is rotated~180 • relative to Figure 2. This is not as significant as the relative positions of the classes and the overall separation between species, which is more evident in this data set ( Figure 3). The correlations between variables and the observations are also very similar (Figure 4), although rotated relative to Figure 2 as mentioned above. One exception was compound R (18:1 or octadecenoic acid), which was no longer strongly correlated with any of the other compounds. Lastly, when the DA model for the G10 generation was tested using leave-one-out cross-validation (Table 4), classification accuracies ranged from 94% (P. regina) to 98% (L. cuprina) with an overall classification accuracy of 96%. In similar fashion to the G1 generation, most of the misclassifications for P. regina, 67% (four out of seven) were due to P. regina observations that were erroneously classified as belonging to the C. macellaria species, and vice versa. Similarly, all misclassifications for C. macellaria (two out of two) were due to C. macellaria observations that were erroneously classified as belonging to the P. regina species. The overall performance of the DA model is better for the G10 generation (96%) than it was for the G1 generation (93%). This small increase in accuracy may be explained by reducing the intra-species variation through repeated inbreeding.
DA was also used to attempt to discriminate samples hailing from the G1 and G10 generation of each species (i.e., C. macellaria G1 samples were compared to C. macellaria G10 samples, L. cuprina G1 samples were compared to L. cuprina G10 samples, etc.). When DA is applied to only two classes (as in a series of pair-wise comparisons), a Receiver Operating Characteristic (ROC) curve can be   Figure 3. Identities for the projected rays from the origin correspond to the lettered compounds in Table 2. DA was also performed by treating samples of G1 pupae as unknowns and classifying them per the DA model constructed from G10 pupae. This models a situation where a laboratory would have a database of lipid profiles taken from flies that could reproduce and inbreed for 10 generations. That same laboratory would then take case samples, analyze them immediately, and classify them per this database. If genetic differences are irrelevant, then the G1 "casework" pupae would be expected to be grouped per their species with near 100% accuracy, as was seen for the G10   Figure 3. Identities for the projected rays from the origin correspond to the lettered compounds in Table 2. Lastly, when the DA model for the G10 generation was tested using leave-one-out cross-validation (Table 4), classification accuracies ranged from 94% (P. regina) to 98% (L. cuprina) with an overall classification accuracy of 96%. In similar fashion to the G1 generation, most of the misclassifications for P. regina, 67% (four out of seven) were due to P. regina observations that were erroneously classified as belonging to the C. macellaria species, and vice versa. Similarly, all misclassifications for C. macellaria (two out of two) were due to C. macellaria observations that were erroneously classified as belonging to the P. regina species. The overall performance of the DA model is better for the G10 generation (96%) than it was for the G1 generation (93%). This small increase in accuracy may be explained by reducing the intra-species variation through repeated inbreeding.
DA was also used to attempt to discriminate samples hailing from the G1 and G10 generation of each species (i.e., C. macellaria G1 samples were compared to C. macellaria G10 samples, L. cuprina G1 samples were compared to L. cuprina G10 samples, etc.). When DA is applied to only two classes (as in a series of pair-wise comparisons), a Receiver Operating Characteristic (ROC) curve can be constructed. ROC curves depict the binary classification accuracy of the system, and the area under a ROC curve varies from 0.5 (no discrimination) to 1.0 (perfect discrimination).
The classification accuracy and ROC results for the G1 and G10 generations of all four species are shown in Table 5. The species whose G1 and G10 generations were the most distinguishable was P. regina, with a classification accuracy of 82.2% and a ROC area under the curve (AUC) of 0.88. Therefore, the effects of genetic homogenization were the most significant for this species. The species whose G1 and G10 generations were the least distinguishable was L. sericata, with a classification accuracy of 58.9% and ROC AUC of 0.65. Therefore, the effects of genetic homogenization were the least significant for this species. DA was also performed by treating samples of G1 pupae as unknowns and classifying them per the DA model constructed from G10 pupae. This models a situation where a laboratory would have a database of lipid profiles taken from flies that could reproduce and inbreed for 10 generations. That same laboratory would then take case samples, analyze them immediately, and classify them per this database. If genetic differences are irrelevant, then the G1 "casework" pupae would be expected to be grouped per their species with near 100% accuracy, as was seen for the G10 "database" generation provided in Table 4. Poor classification accuracy, conversely, would indicate that the model is indeed betrayed by genetic differences. The results are shown in Table 6. Table 6. Leave-one-out confusion matrix for the G1 generation chemical profiles. Although the confusion matrix shows lower classification accuracies across the board (when compared to Table 4), the accuracies for L. cuprina and L. sericata are excellent (>90%) and the accuracy for C. macellaria is very good (89%), despite the mismatch between the training set (G10 samples) and the test set (G1 samples). It is also interesting to see that the species whose G1 and G10 generations were reliably differentiated in Table 5 (i.e., P. regina), and therefore, exhibited the most significant chemical changes, had low classification accuracies (75%) in this example. Conversely, a species whose G1 and G10 generations were not differentiated in Table 5 (e.g., L. sericata), and therefore, exhibited the least significant chemical changes, had high classification accuracy (93%). It should be noted that no L. sericata were classified as L. cuprina, and 2 of 54 (7%) of L. cuprina were classified as L. sericata. This is important as both species belong to the same genus and share a close evolutionary history.
Based on these results, it appears that G10 pupae may be a suitable model for G1 pupae, if the effect of genetic homogenization is not significant. For species such as P. regina, genetic homogenization does indeed alter the chemical profile in a statistically-significant way. Future researchers seeking to extrapolate useful information about G1 and wild-type populations based on data gained from advanced generations are cautioned to bear this in mind.

Effects of Abiotic Variables (P. regina)
DA was carried out and confusion matrices were generated for the data from the studies of temperature, humidity, substrate, and diet. Although overall classification accuracies of 60% or more were achieved for all four experiments, in most cases the actual differences observed among the classes were marginal. For example, almost no difference in the lipid profile was manifest among specimens allowed to pupate in sawdust, in sand, or in vermiculite (data not shown). Classification accuracies were therefore poor, with a third of samples misclassified in the cases of pupae reared in sawdust and sand, and half of all samples in the case of vermiculite. In these cases, only the data in the experiment (for example, the pupation substrate) were used, therefore, it appeared that it did not matter what the pupation substrate was, there were no distinct classes formed.
Results for pupae reared under different humidity conditions were almost as poor (data not shown), with discrimination only achieved for the sample set reared under normal 55% humidity conditions. This is perhaps a surprising result, as the cuticular hydrocarbons of insects are known to play a part in resistance to desiccation, and might be expected to vary in response to adverse environmental conditions such as low humidity. It should be noted that although discrimination could not be achieved in these cases, this is, in fact, beneficial. In the context of insect specimens recovered from a crime scene, it is to the advantage of an investigator if the chemical profiles do not vary significantly with temperature, humidity, and other uncontrollable factors. A reliable method for age and species determination that is not undermined or biased by biotic and abiotic variables is the gold standard for any analyst. Therefore, the lack of discrimination is, in this case, the desired result.
Temperature appeared to have more of an impact on the lipid profiles, although its influence was still relatively small, as attested by the poor classification accuracies of 41.7% and 62.3% for pupae reared at 15 • C and 20 • C, respectively (data not shown). More substantial chemical differences were observed at 25 • C and 30 • C. In general, lauric acid, myristic acid, pentadecanoic acid, palmitic acid, and alkane G increased with temperature.
Diet was the single-largest contributing factor to differences in the lipid profile ( Figure 5). Classification accuracies (Table 7) were 100% for pupae reared on chicken liver and 87.5% for pupae reared on feeder rats. The statistical model was unsuccessful at distinguishing between pupae raised on high-fat ground beef and pupae fed on low-fat ground beef. Neglecting that minor distinction, pupae raised on ground beef were still correctly categorized 100.00% of the time. The following trends are clear:

•
Pupae reared on chicken liver have lipid profiles with higher quantities of palimitic acid, palimitoleic acid, unknown alkane B, and cholesterol. • Pupae reared on ground beef have lipid profiles with higher quantities of lauric acid, myristic acid, palmitelaidic acid, linolenic acid, and stearic acid. • All other compounds appear in greatest abundance in feeder rats. Diet was the single-largest contributing factor to differences in the lipid profile ( Figure 5). Classification accuracies (Table 7) were 100% for pupae reared on chicken liver and 87.5% for pupae reared on feeder rats. The statistical model was unsuccessful at distinguishing between pupae raised on high-fat ground beef and pupae fed on low-fat ground beef. Neglecting that minor distinction, pupae raised on ground beef were still correctly categorized 100.00% of the time. The following trends are clear:

•
Pupae reared on chicken liver have lipid profiles with higher quantities of palimitic acid, palimitoleic acid, unknown alkane B, and cholesterol.

•
Pupae reared on ground beef have lipid profiles with higher quantities of lauric acid, myristic acid, palmitelaidic acid, linolenic acid, and stearic acid. • All other compounds appear in greatest abundance in feeder rats.    The most meaningful takeaway from this series of experiments is that, which the exception of diet, changes in any given abiotic factors did result in reliably differentiable classes for P. regina. However, each factor had a larger capacity to obfuscate accurate species determinations when comparing to a population of P. regina reared under normal conditions. When each chemical profile under every single differing abiotic condition was validated using discriminant analysis with the four species (G10) as the training set, most species classifications were less than 50% correct (Table 8). However, when comparing the control conditions (these conditions are identical to the G10 conditions) with expected classification accuracies of >82% (see Table 5), only 37.7% classification accuracy was recovered. This is indicative that conditions changed between the G10 rearing conditions and that of the abiotic factor experiments conducted sometime later. Nevertheless, it should be noted that classification accuracy using discriminant analysis is only as good as the similarity between the training and validation data sets. In this case, the data sets differ, thus obscuring an accurate species identification. Our data indicate that diet has the greatest impact (with classification accuracies less than 2%), whereas temperature and pupation substrate appear to be most like the control profiles (Table 8). This is not surprising; given that pupae produce lipids in their cuticles for the protection against desiccation and predation, temperature and pupation substrate (our substrates were all dry) should not have much of an effect. However, humidity (different lipids/lipid ratios would be needed to protect the pupa), and diet (the lipids would be obtained in different ratios) would significantly influence the profiles. Table 8. Leave-one-out confusion matrix for P. regina larvae reared under different abiotic conditions and to which species they assign. The control abiotic conditions were 25 • C, 55% r.h., chicken liver and sawdust-the same conditions as the G10 generation. Only the abiotic condition in column one changed, other conditions remained as in the control condition.

Conclusions
This study aimed to better understand how biotic (reduced genetic variation) and abiotic factors (temperature, humidity, pupation substrate, and diet) influence the uses of chemical analyses for species identification of forensically important pupae. Considerable chemical differences were documented among Cochliomyia macellaria, Lucilia cuprina, Lucilia sericata, and Phormia regina as a function of species and genetics. Classification accuracies of 89-94% were obtained for the G1 generation, and classification accuracies of 93%-98% were obtained for the G10 generation. The compounds most correlated and anti-correlated with species were identified. Data from the G10 generations were shown to be a suitable model for G1 pupae, although performance on P. regina suffered, with classification accuracy falling to 75%. However, genetic homogenization was found to have an impact on the observed chemical profiles, which should serve as a cautionary note for future researchers hoping to extract data with real world relevance from colonies at advanced generations. Temperature and pupation substrate were found to have minimal impact on the lipid profiles, whereas small chemical changes were observed with changes in humidity, and significant changes were observed with variations in diet.
If comparing the classification accuracies of the different abiotic conditions relative to the control, it is immediately clear that diet plays the most important role in the composition of lipid profiles in developing pupae (Table 8). Therefore, in "real-world" scenarios where pupae are recovered from a scene but their diet is unknown, we suspect that using chemical profiles to determine species is not ideal, and thus more research is necessary. Although we used a single classification method here (discriminant analysis), other methods such as (Soft Independent Modeling of Class Analogy (SIMCA), Neural networks or Bayesian classifiers, etc.) could change the classification accuracies presented here. However, the effect of diet is a biochemical process and not one that could be easily corrected for via statistical methods. It is important that additional data be collected to establish a greater understanding of the changes in chemical profiles over time under different conditions to establish a reliable database for species identification purposes.