Selection of Soybean Genotypes under Drought and Saline Stress Conditions Using Manhattan Distance and TOPSIS

The search for soybean genotypes more adapted to abiotic stress conditions is essential to boost the development and yield of the crop in Brazil and worldwide. In this research, we propose a new approach using the concept of distance (or similarity) in a vector space that can quantify changes in the morphological traits of soybean seedlings exposed to stressful environments. Thus, this study was conducted to select soybean genotypes exposed to stressful environments (saline or drought) using similarity based on Manhattan distance and the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) method. TOPSIS is a multi-criteria decision method for selecting the best alternative using the concept of distance. The use of TOPSIS is essential because the genotypes are not absolutely similar in both treatments. That is, just the distance measure is not enough to select the best genotype simultaneously in the two stress environments. Drought and saline stresses were induced by exposing seeds of 70 soybean genotypes to −0.20 MPa iso-osmotic solutions with polyethylene glycol–PEG 6000 (119.6 g L−1) or NaCl (2.36 g L−1) for 14 days at 25 °C. The germination rate, seedling length, and seedling dry matter were measured. We showed here how the genotypic stability of soybean plants could be quantified by TOPSIS when comparing drought and salinity conditions to a non-stressful environment (control) and how this method can be employed under different conditions. Based on the TOPSIS method, we can select the best soybean genotypes for environments with multiple abiotic stresses. Among the 70 tested soybean genotypes, RK 6813 RR, ST 777 IPRO, RK 7214 IPRO, TMG 2165 IPRO, 5G 830 RR, 98R35 IPRO, 98R31 IPRO, RK 8317 IPRO, CG 7464 RR, and LG 60177 IPRO are the 10 most stable genotypes under drought and saline stress conditions. 
Owing to the high stability and selection gains verified for these genotypes under salinity and drought conditions, they can be used as parents in breeding programs to obtain offspring with higher resistance to abiotic stresses.


Introduction
Soybean (Glycine max (L.) Merrill.) is among the most important crops in the world, constituting one of the largest sources of vegetable oil and animal protein [1,2]. The main producing countries are Brazil, the USA, Argentina, China, and India [3]. Abiotic stresses such as drought and salinity negatively affect world soybean production, constituting limiting factors for soybean cultivation, especially in tropical and semi-arid regions [4,5]. In many situations, crop sowing is performed under inappropriate soil moisture conditions to support seed germination, or in areas with excess salts in the soil or irrigation water [4]. Currently, one-third of the world's cultivated land, 7% of the total world land, and 50% of the irrigated land are affected by salinity [6]. Therefore, the sustainability of agriculture production in many areas of the world is at risk due to soil salinization and water scarcity during the crop growing season.
Low water content and salt excess in the soil at sowing time cause delayed and reduced seed germination, unequal seedling emergence, and unsatisfactory stand establishment, which results in crop yield reductions [7,8]. Drought and salinity affect seed germination and seedling growth by creating highly negative water potentials, thus preventing water uptake by the seeds and plants [8–11]. Salinity may also cause direct phytotoxic effects of Na+ and Cl− ions [9]. Therefore, drought and salt tolerance testing in the initial stages of plant growth is important because a seed with more rapid germination under water or salt stress conditions may be expected to achieve rapid seedling establishment, resulting in higher yields.
Many factors affect plant responses to drought or salt stress, such as plant genetics, timing, the intensity and duration of applied stress, and environmental factors that determine the genotype versus environment interaction [11–14]. Genetic differences in tolerance to abiotic stresses in soybean genotypes have been reported in other studies [4,15–17], which may be useful in identifying genotypes more adapted to sowing under abiotic stress conditions. In the research by Zuffo et al. [4], the authors proposed a multitrait tool to select the best soybean genotypes exposed to drought and saline stresses. They investigated the stability of 46 soybean genotypes using the stability index. Among the results presented, they mention that this index can be used under different stressful environmental conditions to quantify the genotypic stability of soybean genotypes.
In this research, we discussed how the concept of distance (or similarity) in a vector space could be used to evaluate changes in the characteristics of soybean genotypes when subjected to stressed environments.
Two objects are similar if they have characteristics in common. These objects are represented as vectors in a vector space model (VSM) [18]. Each component is a feature or a characteristic of the object and represents a dimension in the vector space. A real n-dimensional vector x (i.e., $x \in \mathbb{R}^n$) is expressed in terms of its components as $x = (x_1, x_2, \ldots, x_n)$, where the symbol $\mathbb{R}$ represents the set of real numbers. In this text, the features of soybean genotypes are the following variables: germination (GERM), shoot length (SL), root length (RL), total length (TL), shoot dry mass (SDM), root dry mass (RDM) and total dry mass (TDM). Thus, each sample of the dataset is represented as $x \in \mathbb{R}^7$, that is, as a point in a 7-dimensional vector space.
In the VSM, similarity is related to the distance between vectors [19]. In other words, the closer two objects are, the more similar they are. Classic machine learning algorithms, such as k-nearest neighbors, k-means, support vector machines, and others, use distance metrics to measure similarity [20,21]. There are several ways to calculate the distances between vectors. Some of them are Euclidean, Manhattan, Chebyshev, Mahalanobis, Cosine, Hamming, Jaccard, and Spearman [22].
In this research, the soybean genotype samples are represented in the VSM. The Manhattan distance is used to compute similarity, as it is more suitable in higher dimensions. It is calculated for different stressful environments. To combine the distance measures and choose which genotype has the shortest distance (higher similarity with the control sample) in both stressed environments, we propose using the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS). This method is a multi-criteria decision-making approach used in several areas [23]. It is preferable to other decision-making approaches because (i) it is suited to a large number of attributes and alternatives; (ii) it requires little subjectivity in the definition of input values; and (iii) it has consistency in the comparison of the alternative ranking [24].
The main objective of this research is to select soybean genotypes exposed to stressful environments (saline or drought) using similarity based on Manhattan distance and the TOPSIS method.

Results
To illustrate the proposed approach, Figure 1 shows an example considering only two normalized variables, i.e., TDM and TL. In Figure 1a, "97R73 RR" and "HO Paranaiba IPRO" are the genotypes with the shortest and greatest distances in the Control/Saline comparison, respectively. In Figure 1b, "AS 3575 IPRO" and "CG 8166 RR" are the genotypes with the shortest and greatest distances in the Control/Drought comparison, respectively.
Figure 2 shows the values of the Manhattan distances obtained in comparing normalized variables between control and abiotic stress environments and the score obtained by the TOPSIS method. For this experiment, the criteria weights (distances) were equal to 0.5. It is noteworthy that the TOPSIS method was employed for the Manhattan distances and not for the variables.
Table 1 shows the ten best soybean genotypes (cultivars) in ascending order of the Manhattan distances obtained in the Control/Saline and Control/Drought comparisons. The presented results make it clear that no genotype is absolutely better than another. Only the genotype ST 777 IPRO appeared on both lists. This trade-off makes the process of choosing the best genotype a difficult task. Therefore, it is necessary to use the TOPSIS method, which can combine these distances to decide which genotype performs better in both environments of abiotic stress.
Table 2 shows the ten best genotypes selected according to the TOPSIS score in descending order. In addition, the Manhattan distances of each genotype in each stressed environment are shown, as well as the rank position according to the distances shown in Figure 3.
The genotypes respond differently in each environment of abiotic stress (Tables 1 and 2). The results shown in Figure 2 assumed the same weight in the TOPSIS method for both Control/Saline and Control/Drought comparisons. To verify how this weighting affects the selection of the genotypes provided by the TOPSIS, the weights of the distances (criteria) are varied from 0.1 to 1, remembering that the sum of the weights is equal to 1, according to Section 4.3. Some results obtained with this experiment are shown in Table 3.
Finally, Figures 3 and 4 show the original values (without normalization) of the variables GERM, SL, RL, TL, SDM, RDM, and TDM for the four best and worst genotypes, respectively, selected by TOPSIS, considering the same weight for both criteria. This comparison is important to determine if the Manhattan distance and the TOPSIS method are selecting those genotypes that suffer fewer changes in abiotic stress environments.
Table 4 shows the percentage changes (increase or decrease) considering the control and abiotic stress environments. Values are displayed only for the best and worst genotypes according to the TOPSIS selection. The negative sign indicates an increase in the value of the variable.

Discussion
Our experience reveals that objects with close features are more similar. Distance metrics mathematically verify this notion [19,25]. In this work, we investigate whether these metrics can be used to measure the similarity between soybean genotypes in the control environment and abiotic stress environments. For this, it was necessary to model the obtained data in a 7-dimensional vector space. We chose the Manhattan distance to calculate the similarity between the samples because it presents better results in high-dimensional vector spaces [18,20–22,25]. Since the genotypes show different responses in saline and drought-stressed environments, we also included the TOPSIS method to select the one with the greatest similarity in both environments. Yao et al. [26], using the TOPSIS approach, compared seeds of Bupleurum chinense and found that the green ones had a good germination characteristic and were recognized as the superior group, followed by the yellow, brown, and black ones. The TOPSIS method has also been applied successfully to complicated problems in crop priority planning for the soybean crop [27].
As a result of using the Manhattan distance and TOPSIS, Figure 1 illustrates well how distance is related to similarity. The genotype "97R73 RR" (Figure 1a) has the shortest distance in the saline stress environment; therefore, it has greater similarity with the average sample of the control environment. The values of the TDM and TL variables are very close for the samples in the control and saline stress environment. On the other hand, the genotype "HO Paranaiba IPRO" has the greatest distance in the saline stress environment. Therefore, it was observed that, in the stress environment, the values of the TDM and TL variables are very different from those in the control environment. The analogous conclusion can be observed in Figure 1b in the drought-stress environment. In this case, it is important to note that, although the distance was shown only for two variables, the distance metric is generalizable for vector spaces of any dimension [19,25]. The difficulty of selecting genotypes through adaptability studies under abiotic stress conditions such as drought, salinity, and aluminum toxicity has been shown in soybean [3–5,14,16], sorghum [6], wheat [7], and corn [11]. Several methods are available to evaluate groups of genotypes in different environments. However, it is still difficult to select the best genotypes because the responses are very variable, so new approaches, such as those described in our work, are important.
The results presented in Figure 2 and Tables 1 and 2 make it clear that no genotype is absolutely better than another. Some genotypes are more similar to the control samples in the saline stress environment, while others are more similar to the control samples in the drought environment. This is because genotypes respond differently to abiotic stress [3–5,14,16]. The genotypes "97R73 RR" and "AS 3575 IPRO" are the most similar in saline and drought stress environments, respectively. However, the genotype "AS 3575 IPRO" occupies the 65th position when considering distance in the Control/Saline comparison. On the other hand, the genotype "97R73 RR" occupies the 28th position in the Control/Drought comparison. The response to different stresses is always variable, as verified in the present work, hence the difficulty of selection in most crops.
Analyzing the results in Table 2, we noticed that neither of these two genotypes ("97R73 RR" and "AS 3575 IPRO") was selected by the TOPSIS method. Moreover, the first genotype selected, i.e., "RK 6813 RR", occupies the 12th and 4th position concerning the Control/Saline and Control/Drought comparisons, respectively. From the Rank column of Table 2, we conclude that TOPSIS makes a balanced selection of genotypes. If we add the Manhattan distances (4th and 5th columns), we notice that the sum values increase, although the individual values of the distances do not show a different order.
Considering the results in Table 3, which show the selection by TOPSIS when varying the criteria weights (distances), we observed that the genotype "ST 777 IPRO" was selected in all cases. On the other hand, the best genotype, "RK 6813 RR", was not selected in the last case, when the weights were 0.1 and 0.9 for the saline and drought environments, respectively. These genotypes are less dependent on the weights assigned to distances. In other words, these genotypes present more stability relative to abiotic stress environments and reduced distance values. However, the genotype that presents the greatest stability, with close similarities in both stress environments, is "AS 3610 IPRO" (see Figure 2), although its distances in the stress environments are greater than those of the genotypes "RK 6813 RR" and "ST 777 IPRO". The genotype "AS 3610 IPRO" occupies the 24th position, considering the TOPSIS score. It was shown that the TOPSIS method had practical meaning, confirming the applications made by Li et al. [28].
From the original values (without normalization) of the variables GERM, SL, RL, TL, SDM, RDM, and TDM, it can be seen from the results shown in Figures 3 and 4 that the TOPSIS method made an appropriate selection. The values of these variables for the best (RK 6813 RR) and worst (CG 8166 RR) genotypes, respectively, are shown in Figures 3a and 4a. We noticed that the changes in the values of the variables are much more pronounced for the worst genotype than for the best. In other words, the worst genotype, according to the TOPSIS method, is the one that suffered the most modification in abiotic stress environments. Xue et al. [29] demonstrated that the TOPSIS method could be used efficiently to evaluate the total content of bioactive compounds of different grains, thus providing a database for manufacturing companies to optimally select the germination period.
In Table 4, for example, we note that the variable GERM suffered a decrease of 3.03% and 5.55% in the saline and drought environment, respectively, for the best genotype. On the other hand, for the worst genotype, this decrease was 23% and 14%, respectively. Even more drastic was the change concerning the RL variable. For the best genotype, there was an increase of 10.29%, while for the worst, there was a decrease of over 40%. Similar changes are observed for all other variables, showing that the selection provided by the Manhattan distance combined with the TOPSIS method is correct.
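The percentage changes discussed above can be illustrated with a short sketch. The formula below is an assumption inferred from the stated sign convention (a negative value indicates an increase under stress); the function name `percent_change` and the input values are hypothetical and are not taken from Table 4.

```python
def percent_change(control: float, stressed: float) -> float:
    """Relative change from control to stress, in percent.

    Assumed convention (inferred from the text, not stated as a formula):
    positive = decrease under stress, negative = increase under stress.
    """
    return 100.0 * (control - stressed) / control


# Hypothetical values, for illustration only.
decrease = percent_change(control=100.0, stressed=90.0)  # variable dropped
increase = percent_change(control=10.0, stressed=11.0)   # variable rose

print(f"decrease: {decrease:.2f}%  increase: {increase:.2f}%")
```

Under this convention, a genotype whose variables all give values close to zero is one whose seedlings changed little between the control and the stress environment.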
Considering the main strengths and weaknesses of the TOPSIS method, we can say that it gives accurate results when evaluating a large number of alternatives. Expert knowledge is a basic source of information for making effective managerial decisions in the selection process of plants subjected to abiotic stress.
In addition, a wide variety of techniques have been used to introduce effective factors in the selection of superior genotypes. However, the best or most suitable method is not clearly defined, so various methods are used to ensure the correct decision. Soybean breeders constantly seek to combine evaluation in stress environments with selection strategies to promote the selection of genotypes with the best performance. Therefore, the simultaneous use of methods and the presentation of their most important criteria is the only solution to identifying the best genotypes.

Plant Material and Stress Treatments
Seeds from a total of 70 midwestern Brazilian commercial soybean genotypes [Glycine max (L.) Merrill.] listed in Table 5 were produced under field conditions at Cassilândia, MS, Brazil (19°05′16″ S, 51°48′04″ W, and an altitude of 480 m), during the 2019 to 2020 growing season, and used in this study. Minimum and maximum air temperatures during the growing season were 21.7 and 35.3 °C, respectively, and mean air relative humidity ranged from 51 to 83%. The harvest was manually performed at the R8 stage (full maturity), and the plants were air-dried at room temperature for 96 h. The seeds were extracted by hand, sieved through round hole sieves with 6.00 mm diameters, and then stored in sealed paper bags at 13 °C and 35% moisture content until use. Before starting the experiment, the water content, thousand seed weight, and germination rate were determined, as described in the Official Rules for Seed Analysis [30]. The results obtained for the soybean genotypes are shown in Table 5.
The seeds were previously disinfected by immersion for 10 min in 1% sodium hypochlorite solution (v/v), washed in running water, and placed to germinate under stressful (drought and saline stress) and non-stressful (control) conditions. The drought and saline stresses were induced by exposing seeds from each soybean genotype to solutions with an osmotic potential of −0.20 MPa prepared with polyethylene glycol (PEG-6000) and sodium chloride (NaCl), respectively. The amount of PEG-6000 (119.57 g L−1) added to obtain the solution with an osmotic potential of −0.20 MPa was determined by the equation of Michel and Kaufmann [31]: $\Psi_s = [-(1.18 \times 10^{-2})\,C - (1.18 \times 10^{-4})\,C^2 + (2.67 \times 10^{-4})\,C\,T + (8.39 \times 10^{-7})\,C^2\,T]/10$, where $\Psi_s$ is the osmotic potential (MPa), C is the concentration (g L−1 of PEG-6000), and T is the temperature (°C).
The amount of NaCl (2.357 g L−1) added to obtain the osmotic potential of −0.20 MPa was calculated by the van't Hoff equation [32]: $\Psi_s = -R \times T \times C \times i$, where R is the universal gas constant (0.008314 L MPa mol−1 K−1), T is the absolute temperature (273.15 + °C), C is the molar concentration of the solute (mol L−1), and i is the van't Hoff factor, that is, the number of ions released when the solute is dissolved in water (for NaCl this value is 2.0, i.e., Na+ and Cl−). Distilled water with an osmotic potential of 0.00 MPa was used as a control. Using an osmotic solution at −0.20 MPa efficiently discriminates the tolerance differences between soybean genotypes [4].
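As a check on the two concentrations quoted above, the following sketch inverts the Michel and Kaufmann equation and the van't Hoff equation for a target potential of −0.20 MPa at 25 °C. The function names are hypothetical, and the molar mass of NaCl (58.44 g mol−1) is a standard value assumed here because the text does not state it.

```python
import math

R_GAS = 0.008314         # gas constant, L MPa mol^-1 K^-1 (value given in the text)
NACL_MOLAR_MASS = 58.44  # g mol^-1; assumed standard value, not stated in the text


def peg_potential_mpa(conc_g_l: float, temp_c: float) -> float:
    """Michel and Kaufmann equation; the bracket is in bar, /10 gives MPa."""
    c, t = conc_g_l, temp_c
    bar = (-(1.18e-2) * c - (1.18e-4) * c**2
           + (2.67e-4) * c * t + (8.39e-7) * c**2 * t)
    return bar / 10.0


def peg_concentration_g_l(target_mpa: float, temp_c: float) -> float:
    """Solve a*C^2 + b*C - 10*psi = 0 and return the positive root."""
    a = -1.18e-4 + 8.39e-7 * temp_c
    b = -1.18e-2 + 2.67e-4 * temp_c
    c0 = -10.0 * target_mpa
    disc = math.sqrt(b * b - 4.0 * a * c0)
    roots = [(-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)]
    return min(r for r in roots if r > 0)


def nacl_concentration_g_l(target_mpa: float, temp_c: float, i: float = 2.0) -> float:
    """Invert the van't Hoff equation psi = -R*T*C*i, then convert mol/L to g/L."""
    molar = -target_mpa / (R_GAS * (273.15 + temp_c) * i)
    return molar * NACL_MOLAR_MASS


peg = peg_concentration_g_l(-0.20, 25.0)
nacl = nacl_concentration_g_l(-0.20, 25.0)
print(f"PEG-6000: {peg:.2f} g/L, NaCl: {nacl:.3f} g/L")
```

Under these assumptions the sketch should reproduce values close to the 119.57 g L−1 of PEG-6000 and 2.357 g L−1 of NaCl reported in the text.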

Germination Conditions and Measured Variables
Four replicates of 50 seeds from each soybean genotype were placed to germinate on three sheets of germination test paper towels, previously moistened with distilled water (control) or with PEG or NaCl solutions of −0.20 MPa, in the proportion of three times the mass of the dry substrate. The paper towel sheets were then turned into rolls and packaged into plastic bags to prevent evaporation and to maintain the relative humidity close to 100%. Germination was conducted in a growth chamber under 12/12 h photoperiod (light/darkness), with a light intensity of 240 µmol m−2 s−1 and a temperature of 25 °C for 14 days. Seeds were considered germinated when the primary root was longer than 10.0 mm. Germinated seeds were recorded 14 days after the test installation.
After 14 days of exposure to drought and salt stresses, the shoot length (SL), primary root length (RL), and total seedling length (TL) were measured using a meter scale. The shoot dry matter (SDM), root dry matter (RDM), and total seedling dry matter (TDM) were recorded after oven drying at 85 °C for 48 h.
Since the variables can be on different scales, applying a pre-processing step named normalization is necessary. This consists of dividing each variable by its maximum value, i.e., $\tilde{x}_i = x_i / \max_k(x_k)$, where $x_i$ is the value of the variable for the i-th sample and $\tilde{x}_i$ is the respective normalized value. This ensures that $\tilde{x}_i$ is in a range between 0 and 1. In addition, the scale is eliminated, and the variable becomes dimensionless.
Given two vectors $x, y \in \mathbb{R}^n$, the Minkowski distance of order p between them is
$d(x, y, p) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$. (1)
For p = 1 and p = 2, this distance is named Manhattan and Euclidean, respectively. For comparison, Figure 5 shows these distances in the plane, where $d(x, y, 1)$ is the sum of the horizontal and vertical displacements between the two points and $d(x, y, 2)$ is the length of the straight segment joining them. Manhattan distance can be understood as the shortest route taken by a taxi driven in a city whose streets are perpendicular. On the other hand, the Euclidean distance is the intuitive notion of distance used by us. Aggarwal, Hinneburg, and Keim [25] establish that the Manhattan distance is the most adequate to contrast the difference between the nearest and farthest vectors from a fixed vector.
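The normalization step and the Manhattan and Euclidean distances described above can be sketched as follows. This is a minimal illustration, assuming NumPy and non-negative measurements; the function names `normalize_by_max` and `minkowski` and the sample values are hypothetical.

```python
import numpy as np


def normalize_by_max(data):
    """Divide each column (variable) by its maximum over all samples (rows)."""
    data = np.asarray(data, dtype=float)
    return data / data.max(axis=0)


def minkowski(x, y, p):
    """Minkowski distance of order p; p=1 is Manhattan, p=2 is Euclidean."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


# Two points in the plane, 3 units apart horizontally and 4 vertically.
a, b = [0.0, 0.0], [3.0, 4.0]
manhattan = minkowski(a, b, 1)  # taxi route: horizontal leg + vertical leg
euclidean = minkowski(a, b, 2)  # straight segment between the points

# Hypothetical raw values for two variables on different scales.
raw = [[2.0, 10.0],
       [4.0, 5.0]]
scaled = normalize_by_max(raw)  # every column now has maximum 1

print(manhattan, euclidean)
print(scaled)
```

For the two points above, the Manhattan distance (3 + 4) is larger than the Euclidean distance (the hypotenuse of the 3-4-5 triangle), which matches the geometric picture of a taxi route versus a straight line.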

TOPSIS
The TOPSIS method can be employed in six steps, as described below [33]. Let $X = [x_{ij}]_{m \times n}$ be a decision matrix with m alternatives and n criteria, where $x_{ij}$ is the value of alternative i concerning criterion j.
Step 1. Normalize the decision matrix: $r_{ij} = x_{ij} / \sqrt{\sum_{k=1}^{m} x_{kj}^2}$;
Step 2. Given a criteria weight vector $w = [w_1, w_2, \ldots, w_n]$ such that $\sum_{j=1}^{n} w_j = 1$, obtain the weighted normalized decision matrix: $v_{ij} = w_j r_{ij}$;
Step 3. Determine the worst alternative $A_w$ (negative-ideal solution) and the best alternative $A_b$ (positive-ideal solution) as: $A_{wj} = \max_i v_{ij}$ and $A_{bj} = \min_i v_{ij}$ if criterion j is to be minimized, and $A_{wj} = \min_i v_{ij}$ and $A_{bj} = \max_i v_{ij}$ if criterion j is to be maximized;
Step 4. Calculate the Euclidean distance from each alternative i to the worst and best alternatives: $d_{iw} = \sqrt{\sum_{j=1}^{n} (v_{ij} - A_{wj})^2}$ and $d_{ib} = \sqrt{\sum_{j=1}^{n} (v_{ij} - A_{bj})^2}$, respectively;
Step 5. Calculate the relative closeness from each alternative i to the worst alternative: $C_i = d_{iw} / (d_{iw} + d_{ib})$;
Step 6. Rank the alternatives in descending order of $C_i$.
The coefficient $C_i$ provides a TOPSIS score to rank the alternatives. The higher its value, the closer that alternative is to the ideal solution. In the context of this study, the alternatives are the genotypes, and the criteria are the Manhattan distances.
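The steps above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: it assumes the root-sum-of-squares (vector) normalization commonly used in Step 1, the function name `topsis` and its `benefit` argument are hypothetical, and the decision matrix holds made-up distances for three alternatives.

```python
import numpy as np


def topsis(matrix, weights, benefit):
    """Return the TOPSIS score C_i of each alternative (row).

    benefit[j] is True when criterion j should be maximized and
    False when it should be minimized (e.g., a distance).
    """
    x = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)

    r = x / np.sqrt((x ** 2).sum(axis=0))  # Step 1: vector normalization (assumed)
    v = r * w                              # Step 2: weighted normalized matrix
    best = np.where(benefit, v.max(axis=0), v.min(axis=0))   # Step 3: ideal
    worst = np.where(benefit, v.min(axis=0), v.max(axis=0))  # Step 3: anti-ideal
    d_best = np.sqrt(((v - best) ** 2).sum(axis=1))    # Step 4
    d_worst = np.sqrt(((v - worst) ** 2).sum(axis=1))  # Step 4
    return d_worst / (d_worst + d_best)                # Step 5


# Hypothetical distances (saline, drought) for three alternatives.
distances = [[0.10, 0.80],   # A: very stable under salinity, poor under drought
             [0.30, 0.30],   # B: moderately stable in both
             [0.90, 0.90]]   # C: unstable in both
scores = topsis(distances, weights=[0.5, 0.5], benefit=[False, False])
ranking = np.argsort(-scores)  # Step 6: descending order of score

print(scores, ranking)
```

With equal weights, alternative B obtains the highest score even though A has the smallest single distance, which mirrors the balanced selection described in this study. Alternative C coincides with the negative-ideal solution and therefore scores zero.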

Proposed Approach
Our objective is to verify which genotypes are less sensitive to changes in the stressed environment, referring to salinity and drought. As stated in Section 4.3, we can use distance metrics to measure the similarity between objects. The experiments carried out with the soybean genotypes considered three environments: control, saline, and drought. We know that stressed environments alter plant responses. Therefore, there will be changes in the values of the measured variables. We can verify how much these modifications altered the characteristics of the plants. To do so, we can model it as a VSM problem. We calculated the distance between the control samples' mean and the stressed samples' mean for each genotype. Then, we compare the distances. Those genotypes that present the smallest distance to the control samples are the genotypes less sensitive to saline and/or drought stress. Our approach considers the Manhattan distance because, according to Aggarwal, Hinneburg, and Keim [25], it is more suitable to contrast the difference between the nearest and farthest vectors from a fixed vector.
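The distance computation described in this paragraph can be sketched as below. The genotype names, the two variables, and all measurements are hypothetical, and normalizing each variable by its maximum over all samples is an assumption about how the pre-processing is applied.

```python
import numpy as np

# Hypothetical data: genotype -> environment -> replicates x variables.
data = {
    "G1": {"control": [[90, 10.0], [94, 12.0]],
           "saline":  [[88, 9.0], [92, 11.0]],
           "drought": [[70, 6.0], [74, 8.0]]},
    "G2": {"control": [[96, 14.0], [100, 16.0]],
           "saline":  [[60, 8.0], [64, 10.0]],
           "drought": [[92, 13.0], [96, 15.0]]},
}


def stress_distances(data, stresses=("saline", "drought")):
    """Manhattan distance between normalized control and stress means."""
    # Maximum of each variable over every sample, used for normalization.
    all_rows = np.vstack([np.asarray(rows, dtype=float)
                          for envs in data.values()
                          for rows in envs.values()])
    col_max = all_rows.max(axis=0)

    result = {}
    for genotype, envs in data.items():
        control_mean = np.asarray(envs["control"], dtype=float).mean(axis=0) / col_max
        result[genotype] = {}
        for stress in stresses:
            stress_mean = np.asarray(envs[stress], dtype=float).mean(axis=0) / col_max
            result[genotype][stress] = float(np.abs(control_mean - stress_mean).sum())
    return result


dist = stress_distances(data)
for genotype, d in dist.items():
    print(genotype, d)
```

In this made-up example, G1 is closer to its control under salinity while G2 is closer to its control under drought, the same kind of trade-off that motivates combining the two distances with a multi-criteria method.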
A second approach aims to combine the distances obtained in the two comparisons with the stressed environments, that is, Control/Saline and Control/Drought. In this way, we will know which genotypes are more similar to the control samples in both stressed environments. To combine the distances of the saline ($d_s$) and drought ($d_d$) environments from the control, we propose the use of the TOPSIS method, where the first and second columns of the decision matrix are composed of the values $d_s$ and $d_d$, respectively, for each genotype (alternative).
Although TOPSIS can be applied directly to the measured variables, this would not be adequate, because there is no way to determine whether a greater or lesser value of the magnitude of these variables implies a better or worse genotype. Therefore, it is necessary to use the idea of similarity (or distance), since a smaller distance always indicates greater similarity with the control sample.

Conclusions
Our results show that the genotypic stability of 70 soybean genotypes can be quantified by the TOPSIS method and Manhattan distance when comparing drought and salinity conditions in relation to the non-stressful environment (control). Based on the TOPSIS method, we can select the best soybean genotypes for environments with multiple abiotic stresses. Among all soybean genotypes tested, RK 6813 RR, ST 777 IPRO, RK 7214 IPRO, TMG 2165 IPRO, 5G 830 RR, 98R35 IPRO, 98R31 IPRO, RK 8317 IPRO, CG 7464 RR, and LG 60177 IPRO are the 10 most stable under drought and saline stress conditions. Furthermore, from the point of view of plant breeding, these selected genotypes can be used as parents to obtain genotypes resistant to drought and salinity.
Data Availability Statement: All available data can be obtained by contacting the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.