Statistical Optimization of Urinary Organic Acids Analysis by a Multi-Factorial Design of Experiment

Abstract: The analysis of urinary organic acids is useful for patients suspected of having inborn errors of metabolism known as organic acidurias. These diseases cause an accumulation of organic acids in body fluids and their abnormal excretion in urine. By means of chemometric tools, such as principal component analysis and multiple linear regression, it was concluded that the conditions used in our laboratory are, among those tested, the most suitable to achieve high yields of analytes.


Introduction
Organic acids and acylglycines determination in urine samples is an extremely important tool for the diagnosis of inborn errors of metabolism (IEM) characterized by abnormal excretion of these substances and known as organic acidurias [1]. Organic acidurias consist of a very heterogeneous group of disorders, both phenotypically and genetically. They are caused by defects in genes responsible for the coding of enzymes or cofactors involved in crucial metabolic pathways [2]. To date, according to Villani et al. [3] more than 65 organic acidurias have been documented. Even if these disorders are individually very rare, the cumulative incidence is probably one out of 3000 live births [4]. They are found in all ethnic groups and can occur at every age, but most frequently their onset is in the first days or months of life. Some of them can be treated, with promising outcomes. For this reason it is really important to diagnose them early, in order to start the treatment before the most severe symptoms take place.
Since isovaleric aciduria was identified by Tanaka and Isselbacher [5] with GC-MS, this analytical technique has been considered the method of choice for the diagnosis of organic acidurias [6]. Other detection methods have been proposed, such as capillary electrophoresis [7,8], electrospray ionization tandem mass spectrometry [9], and NMR [10]. All of them, though, have relevant drawbacks, such as inadequate sensitivity and repeatability, and they cannot provide the broad profile of urinary metabolites or the standardization offered by GC-MS.
The most common analytical strategy of IEM laboratories consists of liquid-liquid extraction (LLE) of metabolites from urine samples, followed by trimethylsilyl derivatization and GC-MS analysis [11]. The obtained chromatographic profiles allow the identification and quantification of more than 250 organic acids and acylglycines, among which almost a hundred metabolites are useful for the diagnosis of organic acidurias [12,13].
Analytica 2020, 1 15
Urinary organic acids can be aliphatic or aromatic, linear, cyclic or heterocyclic, saturated or unsaturated, hydroxyl-, keto-, mono-, di- or tricarboxylic acids. Endogenous organic acids are intermediate metabolites of the catabolism of the most important components of the cells, such as amino acids, lipids, biogenic amines, nucleic acids and steroids [14]. They can also arise from exogenous sources, like xenobiotics and dietary supplements, food additives, drugs and drug metabolism or bacterial metabolism [15]. Organic acids are polar compounds that are rapidly accumulated and excreted in the urine, more than in other biological fluids. Acylglycines are synthesized in the liver by the conjugation of glycine with acyl-coenzyme A (CoA) esters resulting from organic acids. This conjugation is a detoxification system used by the organism to prevent the accumulation of acyl-CoA esters in some IEM [16].
The aim of the present study was the systematic evaluation of the method used in our laboratory to analyze organic acids and acylglycines for the diagnosis of organic acidurias. We performed a multi-factorial experimental design to investigate the dependence of the total analyte yield on experimental factors.
A design of experiment (DoE) is used to obtain as much information as possible with the minimum number of experiments. It is a strategy that makes it possible to maximize the efficiency of the experiments and to minimize waste and costs [17]. First, the factors, i.e., the quantities that affect the response, must be chosen; second, levels must be defined for each factor. The levels are chosen so as to maximize the information about the factors. Then, the multivariate data obtained from the experiments are fitted to an empirical function, usually linear or quadratic with interaction terms. The calculated function can be used to provide further information about the system [17]. The DoE strategy is claimed to be better than the traditional "one-change-at-a-time" approach, since the traditional method is unlikely to discover the optimal conditions, especially when the factors appear to interact, and it tends to require more experiments than necessary [18]. In the present study, since it was impossible to exclude interactions between the factors, we used a full-factorial design with four factors and two levels, to obtain the most robust interpretation of their effects, together with their reciprocal interactions, on the analytical yield of the organic acids and acylglycines.
The final purpose of this DoE was to maximize the areas of the chromatographic peaks and to optimize the method used in our laboratory. The results were expressed in terms of principal components and were interpreted using a multiple linear regression (MLR) model.

Analytical Method
The determination of organic acids and acylglycines in urine samples was performed according to the 2008 guidelines of the "Società Italiana Studio Malattie Metaboliche Ereditarie" [7] as follows:
1. Sample creatinine was determined according to the Jaffè method;
2. Urine samples were diluted with saline solution to obtain a creatinine concentration of 2 mM;
3. A total of 1 mL of the diluted sample was mixed with 100 µL of internal standard (1 M tropic acid), 20 µL of a 35% w/v sodium hydroxide solution and 100 µL of 1 M hydroxylamine. Samples were incubated at 60 °C for 30 min;
4. A total of 50 µL of 30% w/w hydrochloric acid and approx. 0.5 g of sodium chloride were added. Extraction was performed three times with 2 mL of ethyl acetate. The samples were vigorously vortexed for 1 min or shaken at 350 rpm on an automatic shaker for 10 min;
5. The combined extracts were dried at 40 °C in a water bath, under a gentle nitrogen flow;
6. A total of 50 µL of BSTFA with 1% of TMCS was added to the dried extracts, which were heated at 50 °C for 40 min;
7. Samples were diluted with 500 µL of n-heptane.
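The dilution in step 2 follows from the relation C1·V1 = C2·V2. The following is a minimal sketch of that arithmetic; the function name, the 1 mL final volume used as default, and the example creatinine value of 8 mM are illustrative assumptions and not values taken from this study.

```python
def dilution_volumes(creatinine_mM, target_mM=2.0, final_volume_mL=1.0):
    """Return (urine_mL, saline_mL) needed to reach target_mM creatinine.

    Uses C1 * V1 = C2 * V2. Hypothetical helper, not part of the published method.
    """
    if creatinine_mM <= 0:
        raise ValueError("creatinine concentration must be positive")
    if creatinine_mM < target_mM:
        # A sample more dilute than the target cannot be diluted to reach it.
        raise ValueError("sample is below the target concentration")
    urine_mL = target_mM * final_volume_mL / creatinine_mM
    saline_mL = final_volume_mL - urine_mL
    return urine_mL, saline_mL


# Example with an assumed creatinine of 8 mM (not a value from the study).
urine, saline = dilution_volumes(8.0)
print(f"urine: {urine:.3f} mL, saline: {saline:.3f} mL")
```

A sample whose creatinine is already below the target is rejected rather than silently returned undiluted, since the procedure as described only covers dilution.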
The analysis was performed with a GC-MS system (model 6890/5973, Agilent Technologies, Milan, Italy) equipped with an electron ionization (EI) source. A 60 m cross-linked 5MS ® capillary column (0.25 mm × 0.25 µm; CPS Analitica s.r.l., Milan, Italy) was used. The carrier gas was helium at a flow of 0.8 mL/min. The solvent delay was set at 14 min. Injections of 1 µL were performed in splitless mode at 285 °C. The GC oven temperature was initially set at 60 °C and raised at a rate of 5 °C/min up to 155 °C, where it was maintained for 5 min. It was then raised at a rate of 5 °C/min up to 200 °C, and then at a rate of 50 °C/min up to 290 °C, where it was maintained for 9 min, for an overall run time of 49.5 min. Temperatures were set at 250 °C for the ionization source, 280 °C for the transfer line and 180 °C for the quadrupole. The spectra were acquired in scan mode, over a range of 60 to 500 m/z.
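As a quick arithmetic aid, the ramps and holds itemized in the oven program can be summed as sketched below. The tuple layout and the function name are illustrative assumptions. The listed segments alone add up to less than the stated overall run time of 49.5 min; the description does not itemize the remainder, and the sketch makes no assumption about it.

```python
# Oven program as listed in the text:
# (start temperature in °C, end temperature in °C, ramp rate in °C/min, hold at end in min)
SEGMENTS = [
    (60, 155, 5, 5),
    (155, 200, 5, 0),
    (200, 290, 50, 9),
]


def program_time(segments):
    """Sum ramp and hold times (min) over a list of oven segments."""
    total = 0.0
    for start, end, rate, hold in segments:
        total += (end - start) / rate + hold
    return total


listed_time = program_time(SEGMENTS)
print(f"ramps + holds as listed: {listed_time:.1f} min")
```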

Urine Specimens
Urine samples used for this study were ORG ® control materials purchased from MCA (Winterswijk, The Netherlands). They consist of lyophilized human urine containing analytes specifically selected for laboratories active in the field of IEM. ORG ® is intended for internal quality control of analytical systems for the determination of organic acids and acylglycines in urine. The product was reconstituted with 10 mL of distilled water according to the manufacturer's instructions.

Chemometrics
In this study, we used principal component analysis (PCA) and multiple linear regression (MLR) as chemometric tools to elaborate and interpret the results of the DoE.
PCA is a statistical procedure that analyzes a data table in which the results are described by inter-correlated quantitative dependent variables [19].
In this statistical procedure, an orthogonal transformation is used to convert the set of observations of possibly correlated variables (in our case, the peak areas of organic acids and acylglycines) into a set of values of linearly uncorrelated variables, i.e., the principal components. Each of them can be represented as a vector and is a linear combination of the initial variables. The first principal component (PC1) has the largest possible variance, i.e., it accounts for as much of the variability in the data as possible; each subsequent component accounts for as much of the remaining variability as possible while being orthogonal to the preceding ones.
This method is probably the best known and most widely used dimension-reducing technique, and it is very often applied in chemometric analysis, since it eliminates the possible correlation among the original variables [19]. The number of principal components is, at first, the same as the number of original variables, but with a selection procedure it is possible to retain only the first two or three principal components, those that explain most of the variability.
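The procedure described above can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the data are synthetic random numbers standing in for a table of peak areas (16 experiments by 22 analytes), the helper names autoscale and pca are hypothetical, and the software actually used in the study is not specified in the text.

```python
import numpy as np


def autoscale(X):
    """Mean-center each column and scale it to unit variance (sample SD)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca(X, n_components=2):
    """PCA by singular value decomposition of the autoscaled matrix.

    Returns the scores, the loadings and the fraction of variance
    explained by each retained component.
    """
    Z = autoscale(X)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings, explained[:n_components]


# Synthetic stand-in for a 16-experiment x 22-analyte table of peak areas.
rng = np.random.default_rng(0)
areas = rng.lognormal(mean=10.0, sigma=0.5, size=(16, 22))
scores, loadings, explained = pca(areas)
print("explained variance (PC1, PC2):", np.round(explained, 3))
```

The scores give the coordinates of each experiment in the scores plot, while the loadings give the contribution of each original variable to each component.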
After the PCA, we have used an MLR model to interpret the results. MLR is one of the possible statistical models which can be used in regression analysis. In regression analysis, the focus is on the relationship between a dependent variable, also called the response variable, and one or more independent variables, called predictors [20]. In particular, an MLR model describes how a single response variable depends linearly on a number of predictor variables [21].
It shows the relationships between two or more predictor variables (x) and one response variable (y), by fitting a linear equation (Equation (1)) to observed data:
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon \qquad (1) \]
This equation is also called the population regression line for p predictor variables $x_1, x_2, \dots, x_p$. $\beta_0$ is the intercept of the line, while $\beta_1, \beta_2, \dots, \beta_p$ are the regression coefficients and $\varepsilon$ is the residual noise.

Design of Experiment (DoE)
In this study, the aim of the DoE was to evaluate the effect of quantitative factors on the yield of organic acids, in order to optimize the method used in the laboratory. The DoE consisted of a full-factorial design with four factors and two levels for each factor. The chosen factors were: the pH of the oximation reaction (A), the volume of 1 M hydroxylamine solution (B), the GC injector temperature (C) and the volume of BSTFA + 1% of TMCS added for derivatization (D). These four factors were chosen on the basis of a comparison between our analytical method and two other methods: the one in use at Mayo Clinic Laboratories (Rochester, MN, USA) [22], and the one reported in the guidelines of the American College of Medical Genetics and Genomics [23]. Two levels, coded as −1 and +1, were selected for each factor (Table 1). The levels were chosen to encompass the value actually applied in our analytical method and the values used by the other methods.
Table 1. Selected factors and their levels for design of experiment (DoE).

Factors | Levels: Values and Codes
A: pH of oximation
We focused mainly on the oximation step, because it is critical for the analysis of ketoacids, which are very important in the diagnosis of many organic acidurias. We considered a pH range between 7 and 14, because a basic pH should favor the reaction between organic ketoacids and hydroxylamine. The same reasoning applies to the investigation of the volume of BSTFA + 1% of TMCS used in the second derivatization. The injection temperature was considered because, if too high, it could cause degradation of the derivatized organic acids and acylglycines. The response was taken to be the area of the chromatographic peak of each organic acid. In particular, we selected 22 organic acids among the most important for the diagnosis of organic acidurias. The selected organic acids, including the internal standard, are reported in Table 2, together with their retention times.
Table 2. List of organic acids and acylglycines considered for this study, with their respective retention times. For the response areas, please see Table S1 in the supplementary materials.

Analytes | Retention Times (min)
Figure 1 shows an example of a chromatogram of the control material ORG ®, obtained with this analysis. The identification and integration of each peak are done automatically by the software. The most significant peaks are shown in Figure 1.

Design of Experiment
The multifactorial DoE was planned by comparing our analytical method with (i) the procedures used in the most important IEM laboratory worldwide, Mayo Clinic's (Rochester, MN, USA), described by P. Rinaldo in his "Organic acids" [24], and (ii) those described in the American College of Medical Genetics Guidelines [25]. In both reference procedures, the oximation of ketoacids is performed at a neutral pH and the GC-MS injection temperature is below 250 °C, while, in our laboratory, the oximation pH is basic and the derivatized samples are injected at 285 °C. In the Mayo Clinic Laboratory, oximation is performed with a reagent volume of 500 µL, versus 100 µL in our method, and silylation is accomplished with 100 µL of BSTFA + 1% of TMCS, versus 50 µL in our method. We developed a full-factorial design with four factors (k), i.e., pH and volume of the oximation reagent, injection temperature and volume of silylating agent, and two levels (L) for each factor. A total of 16 experiments ($L^k = 2^4$) were performed, as detailed in Table 3.
Table 3. Matrix of the DoE (Design of Experiments).

Experiment nr. | Factor A (pH) | Factor B (V NH2OH) | Factor C (T inj) | Factor D (V BSTFA)
The experiments were performed in random order without replicates. Randomization was necessary to prevent systematic errors from biasing the estimated effects. According to the matrix of the DoE, experiment number 9 corresponds to the method currently used in our laboratory.
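A two-level full-factorial plan of this kind can be enumerated and randomized as sketched below. The sketch works with coded levels only; the row order produced by itertools.product and the fixed random seed are assumptions made for illustration, so the numbering does not correspond to the experiment numbers of Table 3.

```python
import itertools
import random

# Factor labels as used in the text: pH, V(NH2OH), T(inj), V(BSTFA).
FACTORS = ["A", "B", "C", "D"]


def full_factorial(factors, levels=(-1, +1)):
    """Return every combination of coded levels as a list of dicts."""
    return [
        dict(zip(factors, combo))
        for combo in itertools.product(levels, repeat=len(factors))
    ]


design = full_factorial(FACTORS)

# Randomize the execution order; a fixed seed keeps the plan reproducible.
run_order = list(range(len(design)))
random.Random(42).shuffle(run_order)

for position, idx in enumerate(run_order, start=1):
    print(position, design[idx])
```

Each factor appears eight times at each level, which is what allows main effects and interactions to be estimated independently of one another.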
Preliminary analysis of the results shows that there is not a positive trend for every factor: changing a factor from level −1 to level +1 does not always lead to an increase in the response. In fact, increasing factor A, i.e., the pH of the oximation, raised the areas of the chromatographic peaks of some organic acids, while increasing factors B and D, i.e., the volume of hydroxylamine solution and of BSTFA + 1% of TMCS, decreased the response for almost all the organic acids. The experiments that appeared to give the best results were number 9 and number 2. Experiment number 2 differs from number 9 only in the level of factor C.
Afterward, autoscaling was applied to the results, to give all variables the same weight, followed by principal component analysis (PCA). The most significant principal components were considered to be the first and the second (PC1 and PC2), since together they explain 53% of the variance (33% and 19%, respectively). The scores plot and the loadings plot of the PCA are reported in Figure 2.
In the scores plot of the first two PCs, the experiment that maximizes both PCs is number 9, while the second-best experiment is number 2. They are circled in red in Figure 2. With reference to the PCA, the worst experiments are numbers 3, 10 and 7.
The loadings plot shows that the direction in which most of the original variables, i.e., the organic acids, are maximized is the direction where PC1 is maximized.
To interpret the data, it is convenient to apply an MLR model with the following formula:
\[ y = b_0 + b_A A + b_B B + b_C C + b_D D + b_{AB} AB + b_{AC} AC + b_{AD} AD + b_{BC} BC + b_{BD} BD + b_{CD} CD + \varepsilon \qquad (2) \]
where y is the final response, $\varepsilon$ represents the noise and the $b_i$ are the coefficients of each term. The cross-product terms (AB, AC, AD, BC, BD and CD), which represent the two-factor interactions, were also evaluated. With the MLR model, it was possible to obtain the coefficients plot for the factors and their binary interactions (Figure 3).
The coefficients plot in Figure 3 shows that the most significant factor affecting the yield of organic acids is factor B, the volume of hydroxylamine solution, with a p-value < 0.001. In this case, the analytical yield decreases when factor B increases from level −1 to level +1. In all the other cases, the p-value is >0.05, indicating that the other factors and their interactions do not have a significant effect on the yield of organic acids.
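A model with an intercept, four main effects and six two-factor interactions can be fitted to a coded 2^4 design by ordinary least squares, as in the hedged sketch below. The response vector is synthetic (it is not the study's data and simply mimics a strong negative effect of factor B), and the helper name model_matrix is hypothetical; the text does not state which software or which response (for instance, a principal component score) was used for the fit.

```python
import itertools
import numpy as np

FACTORS = ["A", "B", "C", "D"]


def model_matrix(design):
    """Build columns for the intercept, 4 main effects and 6 two-factor interactions."""
    design = np.asarray(design, dtype=float)
    cols = [np.ones(len(design))]
    names = ["b0"]
    for i, f in enumerate(FACTORS):
        cols.append(design[:, i])
        names.append("b" + f)
    for i, j in itertools.combinations(range(len(FACTORS)), 2):
        cols.append(design[:, i] * design[:, j])
        names.append("b" + FACTORS[i] + FACTORS[j])
    return np.column_stack(cols), names


# Coded 2^4 full-factorial design (16 runs).
design = np.array(list(itertools.product([-1, 1], repeat=4)))
X, names = model_matrix(design)

# Synthetic response (NOT the study's data): negative effect of B plus noise.
rng = np.random.default_rng(1)
y = 5.0 + 0.3 * design[:, 0] - 2.0 * design[:, 1] + rng.normal(0, 0.2, 16)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residual_dof = len(y) - X.shape[1]  # 16 runs minus 11 coefficients
for name, b in zip(names, coef):
    print(f"{name:>4s} = {b:+.3f}")
print("residual degrees of freedom:", residual_dof)
```

Because the columns of a two-level full-factorial model matrix are mutually orthogonal, each coefficient is estimated independently of the others. With 16 unreplicated runs and 11 coefficients, only 5 degrees of freedom remain for estimating the noise.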
The maximum response value for the DoE can be observed in the lower-right section of the response surface graphics (Figure 4), at the points encoded as [+1, −1]. We can conclude that the highest yield for organic acids was reached when the oximation pH was 14, the volume of hydroxylamine solution was 100 μL, the GC injection temperature was 285 °C, and the volume of BSTFA + 1% of TMCS was 50 μL. These are exactly the experimental conditions used in our method.

Conclusions
The present study has concluded, with a detailed investigation of the experimental parameters, that the method currently used in the laboratory is the most suitable for this analysis.
To obtain the highest yields in the quantification of urinary organic acids, the following conditions are recommended: a basic pH and a volume of 1 M hydroxylamine solution not exceeding 100-150 µL for the oximation, 50 µL of BSTFA + 1% TMCS for the silylation and a temperature of 285 °C for the injection of the derivatives into the GC.
The DoE has required accurate planning of the experiments and the use of statistical methods, such as PCA and MLR, necessary for the elaboration of the results. The major achievement of the present study is the demonstration that several experimental parameters can influence the results of the analysis of organic acids in urine. The results have shown some significant differences among the experiments.
Achieving optimal conditions for organic acids analysis represents, therefore, an important goal in order to obtain stable and reliable analytical results. In the future, a significant effort could be made to plan another DoE changing the levels of the factors, with the aim to define the optimal values for each factor. This statistical analysis confirmed that the method currently used in our laboratory was the best for the diagnosis of organic acidurias.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2673-4532/1/1/3/s1, Figure S1: Contour plot of the full-factorial DoE: 2-D response surface plots for the first two factors of the DoE, Figure S2: Contour plot of the full-factorial DoE: 3-D response surface plots for the second two factors, Figure S3: Example of a chromatogram obtained in case of a pathological patient, Table S1: Analyte name and response (peak area) for each experiment.