Fuzzy Model Identification Using Monolithic and Structured Approaches in Decision Problems with Partially Incomplete Data

Abstract: A significant challenge in the current trend in decision-making methods is the class of problems in which the decision-maker makes decisions based on partially incomplete data. Classic methods of multicriteria decision analysis are used to analyze alternatives described by numerical values, while fuzzy set modifications are usually used to include uncertain data in the decision-making process. However, data incompleteness is something else. In this paper, we show two approaches to identifying fuzzy models with partially incomplete data. The monolithic approach assumes creating one model, which requires many queries to the expert. In the structured approach, the problem is decomposed into several interrelated models. The main aim of the work is to compare their accuracy empirically and to determine the sensitivity of the obtained model to the criteria used. For this purpose, a study case is presented. In order to compare the proposed approaches and analyze the significance of the decision criteria, we use two ranking similarity coefficients, i.e., the r_w and WS coefficients.


Introduction
Decision support methods are a significant branch of operational research. They are designed to support decision-making processes concerning an extensive range of problems. The main objective, however, is to analyze complex decision-making processes, which very often requires the engagement of domain experts. Another main challenge is the analysis of many, often conflicting, decision criteria [1,2]. In those cases, multicriteria decision analysis (MCDA) methods are used to guide the decision-maker to the right solution. However, choosing the proper MCDA method is a challenge in itself [3,4]: not every method can handle uncertain or partially incomplete data [5]. Another problem is that most MCDA methods require knowledge of the criteria weights [6].
Before further discussion, let us recognize the difference between uncertainty and incompleteness. The cause of uncertainty is noise superimposed on the original value with a particular distribution. Experts apply different methods, e.g., interval numbers [7], fuzzy numbers [8,9], or gray numbers [10], to perform calculations based on uncertain data. Incompleteness, on the other hand, is a lack of data.

Fuzzy Set Theory
The idea of Fuzzy Set Theory was introduced by Lotfi Zadeh in [42]. Fuzzy Set Theory is used in many scientific fields and can be especially useful for solving MCDA problems [43][44][45]. Here, we present some definitions and basic concepts of Fuzzy Set Theory which are necessary to understand the COMET method [46,47].

Definition 1. The fuzzy set A in a certain non-empty space of solutions X is defined by (1), where µ_A(x) is the membership function of the fuzzy set A. This function indicates the degree of membership of the element x in the set A: µ_A(x) = 1 means full membership, 0 < µ_A(x) < 1 means partial membership, and µ_A(x) = 0 means no membership at all.

Definition 2.
The triangular fuzzy number A(a, m, b) is a fuzzy set whose membership function is defined as (3) and visualized in Figure 1, and it fulfills characteristics (4) and (5).

Definition 3. The support of a TFN is the subset of the A set in which all elements have a non-zero membership value in the A set (6).
Definition 4. The core of a TFN is a one-element fuzzy set (a singleton) with a membership value of 1 (7).
Definition 5. A fuzzy rule is based on the modus ponens tautology. The IF-THEN, OR, and AND logical connectives are used in the reasoning process.
Definition 6. The rule base includes the logical rules defining the relations between the input and output sets in the system.
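The definitions above can be illustrated with a short sketch. The triangular membership function below follows the standard TFN definition rather than the paper's exact equation (3); the support is the interval (a, b), and the core is the singleton {m}:

```python
def tfn_membership(x, a, m, b):
    """Membership value of x in the triangular fuzzy number A(a, m, b).

    Zero outside the support (a, b), rising linearly from a to the core
    m, falling linearly from m to b (standard TFN definition).
    """
    if x <= a or x >= b:
        return 0.0          # no membership outside the support
    if x <= m:
        return (x - a) / (m - a)
    return (b - x) / (b - m)
```

For A(0, 2, 4), the core element 2 has full membership, while 1 lies halfway up the left slope.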

The COMET Method
The Characteristic Objects METhod (COMET) is based on fuzzy logic and triangular fuzzy numbers. The accuracy of the COMET method was verified in previous works [48][49][50]. The formal notation of COMET is recalled below based on the work in [36], and Figure 2 presents the flowchart of the COMET method as a summary.
Step 0. Initiation of the process. This is a preparatory stage, which aims to clearly identify the problem to be analyzed. In the beginning, it is necessary to define the purpose of the research and determine the specificity of the MCDA problem. Then, we should select an expert or a group of experts whose task will be to select the decision alternatives and the criteria for their evaluation. After selecting the group of alternatives, a set of criteria to be taken into account in further analysis should also be selected.
Step 1. Definition of the space of the problem. The dimensionality of the problem is determined by the expert, who selects r criteria, C_1, C_2, ..., C_r. For each criterion C_i, a set of fuzzy numbers is carefully selected, e.g., {C_i1, C_i2, ..., C_ic_i} (16), where c_1, c_2, ..., c_r are the numbers of fuzzy numbers for the respective criteria.
Step 2. Generation of the characteristic objects. The characteristic objects (COs) are obtained as the Cartesian product of the cores of the fuzzy numbers of all the criteria (17). As a result, an ordered set of all COs is obtained (18), where t is the count of COs and is equal to (19).

Step 3. Evaluation of the characteristic objects. The Matrix of Expert Judgment (MEJ) is determined by the expert, who compares the COs pairwise. The MEJ matrix is presented as (20), where α_ij is the result of comparing CO_i and CO_j by the expert. The function f_exp expresses the individual judgment function of the expert; it is a representation of the knowledge of the selected expert, whose preferences can be presented as (21). After the MEJ matrix is prepared, a vertical vector of the Summed Judgments (SJ) is obtained as (22). Finally, the values of preference are estimated for each characteristic object, and a vertical vector P is obtained; its i-th row includes the estimated value of preference for CO_i.
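Assuming the common COMET convention that α_ij equals 1.0, 0.5, or 0.0 depending on whether CO_i is preferred over, tied with, or dominated by CO_j, the derivation of the SJ vector and the preference vector P can be sketched as below. Mapping the k distinct SJ levels onto {0, 1/(k−1), ..., 1} is one typical realization, not a quotation of the paper's equations:

```python
import numpy as np

def comet_preferences(mej):
    """Derive the preference vector P for characteristic objects from a
    completed MEJ matrix, where mej[i, j] is 1.0 if CO_i is preferred
    over CO_j, 0.5 if they are tied, and 0.0 otherwise."""
    sj = mej.sum(axis=1)           # Summed Judgments vector (22)
    levels = np.unique(sj)         # k distinct SJ values, sorted ascending
    k = len(levels)
    if k == 1:                     # degenerate case: all COs tied
        return np.full_like(sj, 0.5)
    # each CO receives a preference from {0, 1/(k-1), ..., 1}
    return np.searchsorted(levels, sj) / (k - 1)
```

For three strictly ordered COs (CO_1 preferred over CO_2, CO_2 over CO_3), this yields P = (1, 0.5, 0).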
Step 4. The rule base. Each characteristic object and its value of preference is converted to a fuzzy rule as (23):

IF C_1i AND C_2i AND ... THEN P_i (23)

In this way, a complete fuzzy rule base is obtained.
Step 5. Inference and the final ranking. Each alternative is represented as a set of crisp values, e.g., A_i = {a_i1, a_i2, ..., a_ir}, which correspond to the criteria C_1, C_2, ..., C_r. Mamdani's fuzzy inference technique is used to calculate the preference of the i-th decision variant. The constant rule base guarantees that the determined results are unequivocal and makes COMET completely rank reversal-free.

Correlation Coefficients
Correlation coefficients make it possible to compare the obtained results and determine how similar they are. In this paper, we use the sample Pearson correlation coefficient (24), the weighted Spearman rank correlation coefficient (25), and the rank similarity coefficient (26) to determine how similar the rankings obtained with COMET are to the reference rankings.

The Sample Pearson Correlation Coefficient
The Pearson correlation coefficient is the ratio between the covariance of two variables and the product of their standard deviations. Its value lies in the interval from −1.0 to 1.0. A value of 1.0 means a linear relationship between X and Y with all data points lying on a line for which Y increases with X. A value of −1.0 means that all data points lie on a line for which the value of the Y variable decreases with an increasing value of the X variable. A value of 0.0 means that there is no linear relationship between the variables. Applied to sample data, the Pearson correlation coefficient is commonly represented as (24):

r_xy = Σ_i (x_i − x̄)(y_i − ȳ) / sqrt(Σ_i (x_i − x̄)² · Σ_i (y_i − ȳ)²),

where N is the sample size, x_i and y_i are the individual sample elements indexed with i, and x̄ is the sample mean (analogously for ȳ).
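A minimal implementation written from the standard sample formula (the paper's original equation (24) is not reproduced in the extracted text):

```python
import math

def pearson(x, y):
    """Sample Pearson correlation coefficient between sequences x and y."""
    n = len(x)
    mx = sum(x) / n                      # sample mean of x
    my = sum(y) / n                      # sample mean of y
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)
```

Perfectly aligned data yield 1.0 and perfectly opposed data yield −1.0, matching the interpretation above.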

Weighted Spearman's Rank Correlation Coefficient
For a sample of size N, the rank values x_i and y_i are compared as defined in (25). In this approach, the positions at the top of both rankings are more important: a weight of significance is calculated for each comparison. This is the element that constitutes the main difference from Spearman's rank correlation coefficient, which examines whether differences appeared and not where they appeared.
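A sketch of the weighted Spearman coefficient, using the closed form commonly attributed to Pinto da Costa and Soares (assumed here to correspond to the paper's equation (25)); the weight term favors positions at the top of both rankings:

```python
def rw(x, y):
    """Weighted Spearman rank correlation r_w for rank vectors x and y.

    Ranks start at 1; the weight (N - x_i + 1) + (N - y_i + 1) makes
    disagreements near the top of the rankings count more.
    """
    n = len(x)
    num = 6 * sum((xi - yi) ** 2 * ((n - xi + 1) + (n - yi + 1))
                  for xi, yi in zip(x, y))
    return 1 - num / (n ** 4 + n ** 3 - n ** 2 - n)
```

Identical rankings give 1 and fully reversed rankings give −1, as for the classic Spearman coefficient.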

Rank Similarity Coefficient
For a sample of size N, the rank values x_i and y_i are compared as defined in (26) [51]. It is an asymmetric measure: the weight of a given comparison is determined based on the significance of the position in the first ranking, which is used as the reference ranking during the calculation.
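A sketch of the WS coefficient following the formula from the rank-similarity literature [51] (assumed here to correspond to equation (26)); x is treated as the reference ranking, which makes the measure asymmetric:

```python
def ws(x, y):
    """Rank similarity coefficient WS; x is the reference ranking.

    The weight 2**(-x_i) makes disagreements at the top of the reference
    ranking dominate; the denominator normalizes each term to [0, 1].
    """
    n = len(x)
    return 1 - sum(2.0 ** (-xi) * abs(xi - yi) / max(abs(xi - 1), abs(xi - n))
                   for xi, yi in zip(x, y))
```

Swapping the roles of the two rankings generally changes the value, which is exactly the asymmetry mentioned above.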

Material
This study case is based on the data and initial results published in [41], where Wątróbski et al. provide a hybrid MCDA approach to support choosing electric cargo vans for city logistics. They use PROMETHEE II to rank vans with complete information and the fuzzy TOPSIS method to rank those with gaps in the data. In this paper, we present a comparison of two approaches using the COMET method, i.e., the monolithic and structured approaches. Both techniques can deal with both crisp and uncertain data and are resistant to the rank reversal phenomenon.
In the first step, we show two different models obtained by using the COMET method to calculate preferences for ten electric vans with complete data. The obtained results are compared with the initial results from [41], and in this way the accuracy of both identified models is validated. The selected alternatives are evaluated from the perspective of the criteria described in Table 1, where nine criteria are split into four groups. In order to validate the correctness of the results, we use the group of ten alternatives from the work in [41]. All decision variants have complete values for all attributes and are shown in Table 2.

Based on the selected group of alternatives, the corresponding characteristic values are indicated. Three values are indicated for each criterion, and the complete set of all characteristic values is presented in Table 3. The domain range of the problem has been identified by the minimum and maximum values, while the third characteristic value is determined adaptively as the arithmetic mean. This approach is dictated by the lack of an appropriate field expert. However, based on the data provided in the source paper and stochastic optimization methods, it is possible to fill in the MEJ matrix in both the structured and monolithic approaches.

The next step is to determine the triangular fuzzy numbers (TFNs), which are essential in the COMET method. Based on the characteristic values, 36 TFNs were obtained. They define the three linguistic values for each criterion, i.e., the low, medium, and high values of the attribute, and are presented in Figure 3.

Figure 3. Triangular fuzzy numbers (TFNs) generated for the criteria C_1-C_9 and the preferences P_1-P_3, where colors mean the following: blue for low, orange for medium, and green for high.
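The adaptive selection of characteristic values described above (minimum, arithmetic mean, maximum) and the construction of three TFNs per criterion can be sketched as follows; the data and helper names are illustrative, not taken from the paper:

```python
import numpy as np

def characteristic_values(column):
    """Three characteristic values for one criterion: min, mean, max.

    The arithmetic mean stands in for an expert-chosen middle value
    (the adaptive approach used when no field expert is available).
    """
    return np.min(column), np.mean(column), np.max(column)

def tfns_for_criterion(column):
    """Three TFNs (low, medium, high) spanned on the characteristic
    values, each given as an (a, m, b) triple."""
    lo, mid, hi = characteristic_values(column)
    return [(lo, lo, mid), (lo, mid, hi), (mid, hi, hi)]
```

Applied to the nine criteria and three preference scales of the study case, this construction yields the 36 TFNs mentioned above.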

Methods
The structured approach uses the hierarchy according to the classification from Table 1 and is presented in Figure 4. The monolithic model assumes the use of only one COMET block with nine criteria as inputs. The P_1 model describes the performance of the evaluated vehicles, taking into account three component criteria (C_1-C_3) [52][53][54]. When identifying it, 27 characteristic objects are created, which requires 351 queries to the expert; the full MEJ matrix is shown in Figure 5 (left). When evaluating the engine (model P_2), only its power and torque are taken into account [53,55,56]. We get nine characteristic objects, and the MEJ matrix requires only 36 queries to the decision-maker; it is presented in Figure 5 (center). The battery evaluation module (P_3) has the same complexity as the P_1 model, and its MEJ matrix is presented in Figure 5 (right) [56][57][58]. The last model in the structured approach is the P_final model, which has four inputs: performance, engine, battery, and the C_9 criterion [53,58], i.e., the vehicle price. At the same time, it is the largest of the models, requiring 4095 pairwise comparisons of characteristic objects by the expert. This MEJ matrix is shown in Figure 6. In total, the structured approach requires 4833 pairwise comparisons, which corresponds to a large reduction compared to the monolithic approach.

In the monolithic case, only one model is created, which has nine inputs. This means that the number of characteristic objects is equal to 3^9 because three characteristic values describe each criterion. Here the curse of dimensionality appears, as such a reference model would require more than 193 million pairwise comparisons. This number is practically unattainable for a human because completing the MEJ matrix would take more than six years (assuming one comparison per second).
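The comparison counts quoted above can be reproduced with a few lines; the 4095 comparisons for the P_final model are taken directly from the text:

```python
def pairwise(t):
    """Number of pairwise comparisons needed for t characteristic objects."""
    return t * (t - 1) // 2

# Structured approach: P1 (27 COs), P2 (9 COs), P3 (27 COs), plus the
# 4095 comparisons stated in the text for the P_final model.
structured = pairwise(27) + pairwise(9) + pairwise(27) + 4095

# Monolithic approach: one model, nine criteria, three characteristic
# values each, hence 3**9 characteristic objects.
monolithic = pairwise(3 ** 9)

years = monolithic / (3600 * 24 * 365)  # at one comparison per second
```

This confirms 4833 comparisons for the structured approach against more than 193 million (over six years at one comparison per second) for the monolithic one.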
In order to complete the MEJ matrix, we use the identification of dominated characteristic objects, stochastic optimization methods [39,59,60], and the transitivity of pairwise comparisons (27): if CO_i is preferred over CO_j and CO_j is preferred over CO_k, then CO_i is preferred over CO_k.
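One of these reduction techniques, filling unknown MEJ entries by transitivity, can be sketched as a naive fixed-point loop; this is an illustration only, as the paper combines it with dominance identification and stochastic optimization:

```python
import numpy as np

def fill_by_transitivity(mej):
    """Fill unknown MEJ entries (marked NaN) using transitivity:
    if CO_i beats CO_j and CO_j beats CO_k, then CO_i beats CO_k.
    Iterates until no further entries can be deduced."""
    t = mej.shape[0]
    changed = True
    while changed:
        changed = False
        for i in range(t):
            for j in range(t):
                for k in range(t):
                    if (mej[i, j] == 1.0 and mej[j, k] == 1.0
                            and np.isnan(mej[i, k])):
                        mej[i, k], mej[k, i] = 1.0, 0.0  # deduce both entries
                        changed = True
    return mej
```

Each deduced entry is one expert query saved, which is how the number of required comparisons is reduced in practice.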

Comparison of the Monolithic and Structured Approaches
The decision models obtained, although they concern the same issue, have different numerical scales. The values obtained by using the structured approach, in addition to the final preferences, contain additional information on partial preferences for the evaluation of the performance, engine, and batteries of the considered freight vehicles; the results are presented in Table 4.

An important issue is to know which of the analyzed approaches is more accurate. It may seem that the monolithic model could give more accurate results, but it needs far more pairwise comparisons than the structured approach. Moreover, creating intermediate models provides us with more information about the modeled decision problem. However, a large number of comparisons can influence the accuracy negatively. Table 5 provides a complete list of reference values, obtained preferences, and rankings. The obtained preference results differ, which is natural for two different operational approaches. It should be noted, however, that the final rankings match at a very high level. In the case of the monolithic approach, there were two swaps in order, between the alternatives (A_1, A_2) and (A_6, A_7); in the case of the structured approach, it was only one pair (A_1, A_2). In both of these cases, the differences in the preference values are quite small, and at the same time the preference results of both models are highly correlated with each other, with a coefficient of 0.9756, which indicates an almost linear relationship.

Table 5. Preference values and rankings for the structured and monolithic approaches, where Ref means a place in the reference ranking from [41]; P_str and P_mon are preference values for the structured and monolithic approaches, respectively; and r(·) means ranking.

However, the values of preferences alone are not enough because, for a decision support system, it is more important to map the rankings correctly.
Therefore, the final rankings of both approaches were compared with the reference ranking using the r_w and WS coefficients (see Table 6). The structured approach proved to be slightly more similar to the reference ranking than the monolithic approach. However, from the results obtained, it can be concluded that both models return results very strongly correlated with the reference results.

Table 6. Comparison of the values of the r_w and WS coefficients of the reference ranking with the rankings achieved by using the structured and monolithic approaches.

This example therefore shows that the structured approach does not necessarily have to be worse than the monolithic one. The high matching of the rankings shows that it is important to focus on the right hierarchy of criteria. Next, it is necessary to ensure that the MEJ matrix is correctly completed. In our work, we used the knowledge from the reference article and several approaches to reduce the number of queries, which guaranteed the high quality of the received models. In the next section, we propose a new way of examining the relevance of the decision criteria used.

Significance Analysis of Criteria
For the monolithic approach, the most significant criteria were C_2, C_4, C_5, P_1, and P_3. This is notable because the monolithic approach does not use the intermediate criteria P_1 and P_3, yet they correlate more strongly with its preferences than with those of the structured approach. On the other hand, the least important criteria are C_1, C_6, C_7, C_8, and C_9. However, using the monolithic approach, another interesting study can be done by eliminating individual criteria. In this case, it should be expected that if the eliminated criterion was significant, the ranking should be disturbed more strongly than in the case of an insignificant criterion. The monolithic approach is used for this task because in the case of the structured approach it would involve a change in the hierarchy of criteria, which would be another element that could disturb the final ranking. Figure 7 presents the visualization of the correlation coefficients. First, we build nine rankings using the monolithic COMET approach, one for each excluded criterion, and we calculate the r_w and WS correlations between these rankings and the ranking obtained with no criterion excluded. We also calculate the Euclidean distance between the vectors of preference values, i.e., between the full set of criteria and each set of criteria with exclusions. The results are presented in Table 7.
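The leave-one-criterion-out procedure can be sketched generically as below; the scoring function is a hypothetical stand-in for the identified COMET model (a plain mean of normalized criteria), used only to show the shape of the exclusion loop:

```python
import numpy as np

def score(data, included):
    """Hypothetical stand-in for the identified COMET model: the mean
    over the included criteria columns (the real model uses fuzzy
    rule inference)."""
    return data[:, included].mean(axis=1)

def exclusion_analysis(data):
    """Rank alternatives with the full criteria set, then with each
    criterion excluded in turn; ranks start at 1 (best)."""
    n_crit = data.shape[1]
    full = (-score(data, list(range(n_crit)))).argsort().argsort() + 1
    rankings = {}
    for c in range(n_crit):
        kept = [i for i in range(n_crit) if i != c]
        rankings[c] = (-score(data, kept)).argsort().argsort() + 1
    return full, rankings
```

Each ranking in the result would then be compared against the full ranking with the r_w and WS coefficients, and the preference vectors with the Euclidean distance, as done in Table 7.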
After the elimination of one criterion at a time, it turned out that excluding criteria C_4, C_5, or C_8 does not influence the ranking. This is surprising because criteria C_4 and C_5 were indicated as the criteria with the highest relevance in the Pearson correlation coefficient analysis. However, after removing one of them, the ranking remains unchanged, and the value of r_w and WS is 1. The most important criterion, i.e., the one that has the most significant influence on the change of the ranking, turned out to be criterion C_7, which has the smallest value in the Pearson coefficient analysis.

Table 7. Ranking similarity coefficients determined for one or more excluded criteria.

The second most important decision criterion is C_6, which was also considered to be one of the less important ones in the Pearson coefficient analysis. The criteria C_6 and C_7 refer to the battery charging time, with the more critical parameter being the time it takes for the cells to recharge to 80% and the less critical one being a 100% charge. This is consistent with common sense and the literature on the subject, where the most important challenge for electric vehicles is the charging time; it determines whether the vehicle can move or will stand idle. The third most important criterion is C_9, which is the cost of purchase. When we additionally analyze the distances between the vectors of the obtained preferences before and after the reduction of the number of criteria, it turns out that these criteria have the largest distance values. However, the distance itself seems to be an unreliable predictor, as the distance values vary from 0.1600 to 0.2093 for criteria that have not changed the ranking. It is also interesting that the ranking was not influenced by the criteria related to engine parameters and battery capacity.
The next step of our analysis is to check the exclusion of all combinations of the criteria C_4, C_5, and C_8. Despite their apparent lack of relevance, it turns out that when two of them are excluded together, the model deteriorates significantly and reaches r_w values close to the situation when only the C_7 criterion was excluded. When all three criteria are excluded, we obtain results identical to those for the pair C_4 and C_5. However, further analysis of Table 7 shows that the distance between the preference vectors increases with the number of excluded criteria. Therefore, the excluded triple of criteria yields an equal ranking, but the distance between the preference vectors is almost twice as large.
Excluding even three criteria from the monolithic COMET model proved to be no more significant than removing the C_7 criterion alone. Figure 8 shows the visualization of the changes taking place in the initial ranking when eliminating three criteria from the set of criteria. As it turns out, only the alternatives A_4 and A_10 remained in their places. The exclusions of single criteria are presented in Figure 9. The exclusion of criterion C_7 causes greater changes at the top of the ranking, while the alternatives A_5, A_8, and A_10 remain in place. Figures 8-10 show visualizations of the r_w and WS coefficients, where we can quickly compare the levels of disorder. Figure 10 also shows the difference between WS and r_w. In all three presented cases, the r_w ratio has a similar value. This is completely different in the case of the WS coefficient, where, when excluding the C_5 and C_8 criteria, we get a relatively high value of 0.9229. Figure 10 shows that this is because both rankings have the same top three alternatives (A_10 also does not change position, but its weight is marginal and does not exceed 2^(-10)). In the other two cases, only the alternatives A_4 and A_10 do not change their order with respect to the initial ranking.
In conclusion, the greatest potential for determining the relevance of criteria lies in the approach in which we exclude individual criteria from the model and check the impact of this on the similarity of the obtained rankings. Useful indicators for this purpose are both the r_w and WS coefficients. The distance between the preference vectors may be helpful, but it is certainly not the most critical factor.

Figure 10. Visualization of changes taking place in the initial ranking when eliminating two criteria from the set of criteria.

Incomplete Data
Let us assume that a decision alternative is described with partially incomplete data. There are three examples of alternatives in Table 8, where we do not know the values of criteria C_5, C_7, and C_9 for the alternatives A_1, A_2, and A_3, respectively. Incomplete data make direct calculation impossible because one of the input signals is missing. The only thing we can assume in such a situation is that the value of the attribute lies within the scope of our model. Therefore, instead of the missing data, we insert intervals spanning the extreme characteristic values; these intervals are already entered in Table 8. We then calculate preference values using the monolithic and structured approaches for all possible combinations. As a result, we get the interval between the lowest and highest possible preference.

Table 8. Sample alternatives with incomplete data and interval preferences.

For the purpose of ordering the alternatives evaluated by using interval values, we use Ishibuchi and Tanaka's approach [61]: if A = [a_L; a_R] and B = [b_L; b_R] are two interval profits, then the order relation ≤_LR for maximization problems is defined as (28):

A ≤_LR B iff a_L ≤ b_L and a_R ≤ b_R, A <_LR B iff A ≤_LR B and A ≠ B.

We also use the order relation ≤_CW for maximization problems. Let A = [a_C; a_W] and B = [b_C; b_W] be two intervals in center and radius form; then the order relation ≤_CW for maximization problems is defined as (29):

A ≤_CW B iff a_C ≤ b_C and a_W ≥ b_W, A <_CW B iff A ≤_CW B and A ≠ B.
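Both order relations can be written down directly; the helper names below are illustrative, with intervals given as (left, right) endpoint pairs:

```python
def leq_lr(a, b):
    """Order relation <=_LR of Ishibuchi and Tanaka for two interval
    profits a = [a_L, a_R] and b = [b_L, b_R] (maximization)."""
    return a[0] <= b[0] and a[1] <= b[1]

def leq_cw(a, b):
    """Order relation <=_CW for intervals compared in center-radius
    form: a larger center and a smaller radius are preferred."""
    ac, aw = (a[0] + a[1]) / 2, (a[1] - a[0]) / 2
    bc, bw = (b[0] + b[1]) / 2, (b[1] - b[0]) / 2
    return ac <= bc and aw >= bw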
Figure 11 shows the interval results for the monolithic and structured approaches. It seems that we have achieved two different results, i.e., results with different ranking orders. However, applying the relations (28) and (29), respectively, we get two identical rankings: for the monolithic approach A_3 <_LR A_1 <_LR A_2, and for the structured approach A_3 <_CW A_1 <_CW A_2. Both approaches can thus be used for calculations with partially incomplete data. However, it should be remembered that the resulting intervals only contain the correct result rather than pinpointing it: only one value in each interval is the real final preference.

Conclusions
In this paper, we show two approaches to identifying fuzzy models with partially incomplete data. We have compared the monolithic approach, which assumes creating one model, and the structured approach, in which the problem is decomposed into several interrelated models. The main contribution is the comparison of both approaches in terms of accuracy based on the analyzed study case. Additionally, we proposed an effective way to identify the relevance of the criteria. For this purpose, we applied two ranking similarity coefficients, i.e., the r_w and WS coefficients. This approach proved to be more effective than the Pearson correlation analysis between input signals and final alternative preferences.
In the last stage of the research, we showed how to use the monolithic and structured approaches for decision-making in multicriteria problems with partially incomplete data. Based on an example taken from the literature, we obtained results for the sample alternatives with incomplete data in the form of interval values. These values are not trivial to interpret, but based on Ishibuchi and Tanaka's approach we obtained an identical order of the intervals. However, it should be remembered that the received solution, although correct, is somewhat inaccurate or, to be more precise, quite broad. The research confirms the main contribution that both approaches can solve problems with very similar accuracy; the results obtained differed from each other in a statistically insignificant way.
During the research, some areas for improvement have been identified. Future work should concentrate on (1) research on using other number generalizations instead of interval numbers to solve problems with partially incomplete data, (2) more extensive research on the accuracy of the monolithic and structured approaches using computer simulations, and (3) developing a new method for identifying the significance of criteria.