Structural Equation Modeling (SEM) Analysis of Sequence Variation and Green Plant Regeneration via Anther Culture in Barley

The process of anther culture involves numerous abiotic stresses required for cellular reprogramming, microspore developmental switch, and plant regeneration. These stresses affect DNA methylation patterns, sequence variation, and the number of green plants regenerated. Recently, in barley (Hordeum vulgare L.), mediation analysis linked DNA methylation changes, copper (Cu2+) and silver (Ag+) ion concentrations, sequence variation, β-glucans, green plants, and duration of anther culture (Time). Although several models were used to explain particular aspects of the relationships between these factors, a generalized complex model employing all these types of data was not established. In this study, we combined the previously described partial models into a single complex model using the structural equation modeling approach. Based on the evaluated model, we demonstrated that stress conditions (such as starvation and darkness) influence β-glucans employed by cells for glycolysis and the tricarboxylic acid cycle. Additionally, Cu2+ and Ag+ ions affect DNA methylation and induce sequence variation. Moreover, these ions link DNA methylation with green plants. The structural equation model also showed the role of time in relationships between parameters included in the model and influencing plant regeneration via anther culture. Utilization of structural equation modeling may have both scientific and practical implications, as it demonstrates links between biological phenomena (e.g., culture-induced variation, green plant regeneration and biochemical pathways), and provides opportunities for regulating these phenomena for particular biotechnological purposes.


Introduction
Anther culture involves cold treatment of spikes [1], dark incubation of tissue culture [2], addition of chemicals [3], and many other stress-inducing steps [4] that affect plant regeneration. Cold treatment is necessary for the developmental switch of microspores from the gametophytic to embryogenic fate [5], a process that requires DNA demethylation, followed by de novo methylation [6]. DNA demethylation as, well as de novo methylation, may alter the DNA methylation patterns [7][8][9][10], which could lead to DNA mutations [11] and mobile element activation [12,13]. During reprogramming, an increase in cell death and oxidative stress is observed [14]. The equilibrium between reactive oxygen species (ROS)-scavenging and ROS-producing mechanisms governs the level of damage and oxidative stress in the cell [15,16]. Low temperature may also alter endogenous ethylene (ET) We hypothesized that abiotic stresses acting on barley microspores are somehow sensed by the inner callose layer comprising β-glucans. Cells utilize β-glucans as a carbon source under in vitro culture conditions, thus affecting glycolysis and TCA. The presence of Cu 2+ and Ag + ions in the medium affects the mitochondrial respiratory chain and the amount of adenosine triphosphate (ATP) needed for the biosynthesis of SAM via the Yang cycle [57][58][59]. Cu 2+ ions can cause oxidative stress, leading to changes in the methylation patterns of DNA, especially cytosines [60]. Moreover, modified bases are subject to repair. If the repair system does not work correctly, modified cytosines act as a source of DNA mutation. We also suspect that Cu 2+ and Ag + ions, as well as DNA methylation changes, may be responsible for green plant regeneration, and the tissue culture duration controls DNA methylation changes, SV and green plant regeneration. Such relationships could be predicted using the structural equation modeling (SEM) approach [61], which is widely exploited in psychology [62] but relatively rarely in biology [63] and agriculture [64,65]. The primary goal of SEM is to explain the observed variability in the data set, described by the covariance matrix, with smallest number of parameters of the postulated model for the analyzed process or phenomenon. If the tested model is confirmed using empirical data, then it is possible to estimate the strength of the relationship between the variables included in this model. These relationship strengths accurately represent the cause and effect of the process being studied. SEM, unlike other methods of statistical modeling, allows the inclusion of all interactions and interdependencies that accompany a given process or phenomenon [64]. Utilizing SEM for studying relationships between multiple factors putatively involved in the control of tissue culture-induced variation (TCIV), components of biochemical pathways affected by in vitro anther culture and green plant regeneration is a sophisticated combination of methods allowing better understanding of complex relationships in anther culture. Such knowledge may have practical implications, as it may help to gain control over the number of green plants regenerated via in vitro anther culture and the level of DNA SV in these plants.
In this study, we evaluated the relationships between different factors affecting SV and green plant regeneration via anther culture in barley using the SEM approach.

Materials and Methods
Data used for current analysis and model generation were based on plant materials evaluation, DNA isolation and metAFLP and DArTSeqMet analysis, as well as the FTIR spectroscopy conducted and described earlier [48,50,56].
Variables used for SEM, their descriptive statistics are given in Table 1, whereas Pearson's linear correlation coefficients in Table 2 (see Results).  To eliminate constants from the structural equation model, the difference between observation and mean value for variables was used instead of raw data. The structural equation model was implemented in IBM SPSS ® Amos™ 20 [66] computer software. The maximum likelihood (ML) estimation with the Levenberg-Marquardt iteration method was used to optimize the parameters of postulated models [67][68][69].

Results
Molecular and phenotypic data as well as the results of mediation analysis presented in our earlier study [50] were used here to create a generalized model predicting essential relationships between selected genetic and biochemical features and Cu 2+ and Ag + ions concentrations in barley (Hordeum vulgare L.) anther cultures.

Characterization of Input Data
Construction of the structural equation model was based on nine variables evaluated employing metAFLP characteristics, DArTseqMet marker-based methylation changes, ATR-FTIR spectroscopy, Cu 2+ and Ag + ion concentrations and Time. The descriptive statistics of the analyzed variables, including mean, variance, skewness and kurtosis are presented in Table 1.
All of the analyzed variables were quantitative and met the conditions set out and the Lindeberg-Lévy theorem [70]. It can, therefore, be assumed that the distribution of these variables is asymptotically convergent with the theoretical normal distribution. Among these variables, CHG_DMV and SV showed relatively high skewness and kurtosis in the analyzed random sample.
Kenny [71] indicated that variables with non-normal distribution, especially those with high kurtosis, inflate the chi-square and absolute measures of fit values [72]. Moreover, the relatively small sample size (n = 35) was disadvantageous, as it may falsely result in non-significant chi-square statistics [73]. Therefore, the chi-square test was used only as an information criterion [74] and was not used to determine the correctness of the model. Instead, the correctness of the model should be evaluated using numerous model fit measures.
A wide range of fit measures can be used to assess the goodness of fit of a model. To interpret the fitness of the model, all possible limitations resulting from the specificity of the data and the model itself were taken into account. For example, [73] showed that some goodness-to-fit indices are relatively stable with small sample sizes, whereas others such as root mean square error of approximation (RMSEA) and standardized root mean square residual (SRMR), increase with smaller sample sizes. Additionally, [71] suggested that small sample size can be used for simple models and models without latent variables. Parsimonious fit measures such as parsimonious normed fit index (PNFI) and parsimonious comparative fit index (PCFI) include in their construction an element of model complexity. These measures are used when comparing models with different degrees of freedom (df). The higher the value of these indices, the better the model [75,76].
SEM allows creating a statistical description of complex causal relationships. Compared with other commonly used methods (such as regression analysis and Write's path analysis), SEM permits the inclusion of more complex relationships between variables, including those with exogenous variables, in the model. The correlation coefficient analysis allows the evaluation of the occurrence and complexity of the relationship within the studied process or phenomenon. Pearson linear correlation coefficients (Table 2) revealed correlations between the analyzed traits, indicating a complex relationship within the analyzed data.

Model Specification and Estimation
SEM revealed relationships between all variables included in Table 1 residual (SRMR), increase with smaller sample sizes. Additionally, [71] suggested that small sample size can be used for simple models and models without latent variables. Parsimonious fit measures such as parsimonious normed fit index (PNFI) and parsimonious comparative fit index (PCFI) include in their construction an element of model complexity. These measures are used when comparing models with different degrees of freedom (df). The higher the value of these indices, the better the model [75,76].
SEM allows creating a statistical description of complex causal relationships. Compared with other commonly used methods (such as regression analysis and Write's path analysis), SEM permits the inclusion of more complex relationships between variables, including those with exogenous variables, in the model. The correlation coefficient analysis allows the evaluation of the occurrence and complexity of the relationship within the studied process or phenomenon. Pearson linear correlation coefficients (Table 2) revealed correlations between the analyzed traits, indicating a complex relationship within the analyzed data.

Model Specification and Estimation
SEM revealed relationships between all variables included in Table 1  One-way arrows illustrate causal relationships between variables. Path coefficients are marked by lambda parameters. The semicircular bidirectional arrow represents the presence of covariance between exogenous variables in the model. Time represents the duration of anther culture; CHG_DMV and CG_DMV represent the DNA demethylation of the CHG and CG sequence contexts, respectively, determined using metAFLP markers [9,48]; SV represents sequence variation One-way arrows illustrate causal relationships between variables. Path coefficients are marked by lambda parameters. The semicircular bidirectional arrow represents the presence of covariance between exogenous variables in the model. Time represents the duration of anther culture; CHG_DMV and CG_DMV represent the DNA demethylation of the CHG and CG sequence contexts, respectively, determined using metAFLP markers [9,48]; SV represents sequence variation evaluated using the metAFLP approach; DNM-DM is the difference between de novo methylation and DNA demethylation, evaluated using the DArTseqMet approach [54,56]; GP is the number of green plants regenerated per 100 anthers via in vitro anther culture of barley [43]; Cu 2+ and Ag + represent ion concentrations present in the tissue culture medium [9,43]. The F1010.940 is the ATR-FTIR spectral range assigned to β-glucans CHG_DMV. The CHG_DMV directly affects SV and GP. Time of anther culture also plays an essential role in the action of the ion. Silver ions directly influence CHG_DMV, and the DNM-DM is involved in GPs. Moreover, DNM-DM may also influence SV. The action of silver ions is controlled by Time of anther culture. The F1010.940 is to be linked to the CHG_DMV.

Model Matching
The quality of the structural equation model is assessed by analyzing the fit of the relationship system described by the postulated model to the mutual relationship system within the data derived from the real assessment of a given process or phenomenon. Various fit statistics are used for this purpose.
The tested model met all the convergence criteria. The fit statistics (Table 3) were proper. The chi-square statistic indicated that the postulated model satisfactorily confirmed the empirical data. The SRMR value was relatively low but failed to meet the criterion defined by Hu and Bentler [77,78]. In multiple regression analysis, the goodness-of-fit index (GFI) and adjusted goodness-of-fit index (AGFI) can be interpreted analogously to the coefficient of determination [79]. Values of both AGFI and GFI were high, and GFI was close to the lowest limit (0.9), implying that the postulated model describes approximately 90% of the variability observed in the dataset. The normed fit index (NFI) and relative fit index (RFI) did not exceed the limit reported in the literature (0.95) [77,78]. The same was true for the incremental fit index (IFI), non-normed fit index (NNFI) and comparative fit index (CFI), but not for NNFI. However, applying the rule described by MacCallum and colleagues [80], obtained RMSEA results testified that the postulated model showed a good fit.

Estimation of Model Parameters
After confirming that the postulated model is correct (i.e., it describes the system of dependencies within empirical data), the estimated values of the structural equation model parameters were used for a detailed description of the type and nature of causal relationships.
The values of individual path coefficients were estimated. Some of these coefficients were not statistically significant; however, their removal resulted in a substantial reduction in the quality of matching between the postulated model and the empirical data. Therefore, we decided to retain all the paths in the model (Table 4). Absolute values of the standardized path coefficients explaining the relationships between various variables were calculated. The highest value was obtained for the relationship between CHG_DMV and SV, followed by that between Ag + and CHG_DMV, Cu 2+ and DNM_DM, Time and CG_DMV, CG_DMV and GP, Cu 2+ and CHG_DMV and lastly Ag + and CG_DMV.
No significant covariance was found between exogenous variables. The estimated variances for random components (δ1-δ7) and variances of exogenous variables were significantly different from zero. The significance of model parameters expressing direct, indirect and total effects on standardized coefficients is summarized in Table 5. The CHG_DMV variable showed the highest dependence on Ag + (β = −0.695) and Cu 2+ (β = −0.481) ions, and these were direct effects. The SV variable showed the greatest dependence on CHG_DMV (β = −0.985), an immediate (direct) effect, followed by Ag + (β = 0.644) and Cu 2+ (β = 0.356) ions, which represented indirect effects. The GP variable showed the highest dependence on CG_DMV (β = −0.563; direct effect), followed by Ag + ions (β = 0.245; indirect effect). The relationship of the CG_DMV variable with Time (β = 0.591) and Ag + ion concentration (β = −0.402), which are both direct effects, showed the greatest weight. The DNM_DM variable depended the most on Cu 2+ (β = 0.646) and Ag + (β = 0.220) ion concentrations (direct effects).

Discussion
The addition of Cu 2+ and Ag + ions to the induction and regeneration medium influences plant regeneration via anther culture. The process of tissue culture is predisposed to SV [48] because of changes in DNA methylation patterns [56]. Cu 2+ and Ag + ions affect the mitochondrial complex IV [81,82] and the functioning of the Yang cycle [58] (and consequently SAM production). Cu 2+ ions also induce mutations in CG and CHG sequence contexts [50,56] because of oxidative stress-triggered modification of 5 mC [60]. However, tissue culture-induced SV probably starts much earlier. The presence of mannitol in the culture medium causes starvation stress. Under this condition, β-glucans, which are present between the cell wall [50] and cell membrane of embryogenic microspores [22], are probably utilized as a source of glucose [48] for the production of acetyl-coenzyme A, which is required for the Krebs cycle [83]. Additionally, DNA methylation pattern changes induced during the gametophytic to sporophytic switch [5,84], as well as Cu 2+ and Ag + ion concentrations in the culture medium, determine the number of green plants regenerated via anther culture. Finally, our model also assumed that Time influences SV and GP. Although the link between tissue culture-induced SV and green plant regeneration was not yet established, some elements of such a model had been published previously [48,50,56], which were linked using the SEM approach in this study.
According to the structural equation model, β-glucans (F1010.940 FTIR spectral range) and CHG_DNMV negatively influenced CHG_DMV and SV, respectively. We failed to include the F710.690 FTIR spectral range (preliminarily assigned to SAM) into the structural equation model. This could be explained by the relatively small sample size and low cellular SAM concentration (<100 mM), which is technically challenging to measure [85]. Fluctuations in SAM concentration may not be sufficient to be significant in a model including many variables under small sample size conditions. However, under starvation conditions, the concentration of SAM increases because of the universal energy-sensing regulator Snf1, which is the yeast (Saccharomyces cerevisiae) ortholog of AMP-activated protein kinase (AMPK) [86].
Effects of Cu 2+ and Ag + ion concentrations on CHG_DMV and Time were less pronounced than those of β-glucans. Thus, demethylation of the CHG context is not under the robust control of metal ions but is influenced by β-glucans and SV. The model also assumes that Time influences CG_DMV via the action of Cu 2+ ions. However, the effects describing this relation are not very strong. The effect of Ag + ions, (acting as a mediator), on Time and CG_DMV was much more pronounced than that of Cu 2+ ions. This is in agreement with the fact that Ag + ions may replace Cu 2+ ions, for example, in the mitochondrial complex IV, which affects the methionine cycle. Interestingly, in contrast to our results of mediation analysis [50], changes in CG_DMV did not contribute to SV.
All of the relationships that contributed to SV were based on metAFLP characteristics and did not seem to be linked to GP. However, by using DArTseqMet markers, we showed that the difference between de novo methylation and DNA demethylation (DNM-DM), influences SV. Moreover, the DNM-DM variable was affected much more by Cu 2+ ions than by Ag + ions. The model predicts that the number of green plants is under the limited control of the DNM-DM variable. Thus, the model presented in this study can explain the SV and is congruent with the role of DNA methylation in SV. We assumed that CHG_DMV rather than CG_DMV participates in SV. However, further investigation is needed to understand whether SV is mainly caused by point mutations or transposable elements. It should be stressed that CHG_DMV only slightly affected GP. This suggests that green plant regeneration is relatively independent of changes in DNA methylation patterns due to anther culture.
It must be emphasized that the model presented in this study might not be able to identify all the relationships affecting SV and GP, possibly because of the limitations of FTIR spectroscopy, non-normal distribution of some variables and/or small sample size. Additional factors need to be incorporated into the model to link these phenomena.
It is always important to question whether the results of SEM analysis are reliable. Based on the analysis of the obtained goodness-of-fit coefficients (Table 4), we conclude that the postulated model is correct and adequately describes the complex relationships between the analyzed variables. The presented model indicates that by manipulating Cu 2+ and Ag + ion concentrations, we can predict the biochemical factors that induce SV and to some extent increase the GP. We also demonstrated that the callose layer is a crucial participant in the model under varying Cu 2+ and Ag + concentrations. It would be of value to test whether there is a correlation between the amount of callose present in embryogenic microspore culture and the number of regenerated plants. If so, the presence of callose could be used as an indicator of the capacity of microspores to switch from the gametophytic to sporophytic fate. Callose could also be used to identify genotypes suitable for anther-dependent plant regeneration. Thus, understanding the relationships that influence sequence (or total tissue culture-induced) variation and GP is of practical value to plant breeders and scientists working on tissue culture.
Possibly the most important outcome of the structural equation model is the opportunity to predict the level of outcomes (start and end of the path, see Table 5) if the model parameters (independent variables: start of the path) are changed, thus analyzing direct effects. For example, in this study, an increase in F1010.940 by one unit caused a reduction in CHG_DMV by 6.23 units, whereas a reduction in CHG_DMV decreased the SV by 9.8 units. This implies that Cu 2+ ions have a limited influence on CHG_DMV, as increasing the Cu 2+ ion concentration by one unit decreased CHG_DMV by 0.037 units. Thus, the model predicts that when β-glucans act as the source of carbon, the larger the subintinal callose layer (under starvation conditions), the lower the CHG_DMV and SV. Furthermore, the role of Time is also limited, as an increase in its value by one unit increased CHG_DMV only by 0.014 units.
Detailed analysis of estimates presented in Table 5  While SV was evaluated using metAFLP markers [48,50], GP was examined using MSAP markers [56]. The two marker systems are based on distinct marker platforms utilizing different endonucleases. The metAFLP is based on KpnI and Acc65I isoschizomers, whereas the MSAP approach and the variant involving the DArTseqMet approach are based on HpaII and MspI endonucleases. Additionally, the metAFLP and DArTseqMet methods recognize distinct DNA methylation patterns; while metAFLP can distinguish between methylation changes affecting CHH, CHG and CG contexts, DArTseqMet can capture only CHG and CG alterations. Moreover, marker systems recognize different restriction sites. Genetic mapping in cereals demonstrated that many AFLP markers based on the EcoRI endonuclease mapped to genomic regions other than those generated using PstI [87]. Although undocumented, we speculate that KpnI-Acc65I and HpaII-MspI endonuclease pairs will generate markers with non-random distribution along the chromosomes, thus potentially mapping to different genomic regions. Thus, the two marker systems may identify distinct phenomena, similar to the models depicting SV [48,50] and green plant regeneration [56]. Implementation of distinct marker systems in combination with FTIR spectroscopy will be highly valuable for evaluating relationships between multiple factors affecting tissue culture and for predicting the roles of these factors in SV and green plant regeneration.

Conclusions
We evaluated the structural equation model describing complex relationships between different factors including DNA methylation changes, SV, β-glucans, Cu 2+ and Ag + ions, Time and GP. The model was constructed on theoretical bases concerning DNA methylation changes, sources of DNA mutations, the effect of Cu 2+ and Ag + ion concentrations on DNA methylation and GP. The theoretical background was also supported by our moderation and mediation analysis, partly linking the variables. Although the structural equation model was evaluated based on a relatively small sample size (because of experimental limitations), analysis of the model using a wide range of measures of fit suggests that the model is reliable. Nevertheless, the use of this model on larger sample sizes and different cereal species is required to verify its general application in anther culture-derived regenerants. The presented model predicts the outcome of a change in tissue culture conditions.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.