Ranking Factors of Infant Formula Milk Powder Using Improved Entropy Weight Based on HDT Method and Its Application of Food Safety

Abstract: Food safety concerns everyone's health. Through risk assessment and early warning, food-related safety issues can be identified as early as possible so that timely precautions can be taken. However, food safety detection data are complex and non-linear, so it is necessary to find the relationships among the factors affecting food safety and a hierarchical representation of them. This paper presents an improved entropy weight method based on Hasse diagram technology (HDT) to analyze the influencing factors of food safety. The entropy weight method was used to calculate the weight of each factor index, and the relationship matrix was obtained. Then, the data of infant milk powder in China were analyzed hierarchically by the HDT method, yielding the multi-level structure of the factors that affect food safety. This structure provides an effective basis for early warning of food safety, can help government regulators strengthen management, and can urge enterprises to produce food safely.


Introduction
With the development of the times, food safety has increasingly become a focus of attention in today's society. Recently, food safety incidents have emerged that have not only seriously damaged the interests of consumers but also affected social stability [1,2]. In 2017, there was a massive emergency recall in North America of vegetables contaminated with the deadly bacterium Listeria, and the stores involved were all supermarkets with high credibility [3]. At the end of July of the same year, an outbreak of Salmonella infection occurred in which 47 people in 12 states of the United States were infected after eating contaminated papaya; 12 patients were hospitalized and one died. In 2018, lettuce and carrot salads sold by McDonald's were found to contain the Cyclospora parasite, causing 476 infections in 15 states of the United States, with 21 people hospitalized [4]. There have also been outbreaks of E. coli linked to romaine lettuce in several parts of the United States and Canada that infected at least 58 people in 13 states, with one death each in California and Canada, two further cases of hemolytic uremic syndrome, and one case of renal failure. In recent years, a series of food safety incidents have also occurred in China, such as Sudan Red [5,6], melamine [7], and gutter oil [8]. In 2017, three batches of unqualified samples were found among 519 batches in seven categories of food, such as dairy products, food processed products, candy products, and beverages, in inspections organized by the State Food and Drug Administration [9]. In 2018, the inspection report for a batch of full-fat milk powder purchased by MingYi Dairy Company showed that the "protein/non-fat milk solid" value was 33.77%, which did not meet the requirement (>34%) stipulated in the product standard, yet the conclusion was "qualified" [10]. Therefore, it is very important to identify the main factors that affect food safety.
In recent years, food safety incidents have occurred frequently around the world, making food safety research a hotspot of global research. Nowadays, Bayesian network (BN) and artificial neural network (ANN) are relatively mature research methods in the field of food safety [11,12]. The ANN model is usually applied to the classification of food types and quality and risk warning [13]. However, the neural network algorithm has limited modeling ability. BN, as a probability network, is widely used in the field of risk assessment and food fraud prediction [14,15]. However, due to the high dimension and complexity of food safety data, the analytical capability of BNs is limited.
The Hasse diagram technique (HDT) is an approach based on partially ordered sets that preserves important elements of the evaluation and decision-making process. Bruggemann et al. gave a comprehensive description of HDT, and Voigt et al. compared HDT with multivariate statistical methods [16,17]. Halfon et al. used HDT to show the comparative relationship between the pollution levels of various chemicals according to chemical pollution assessment criteria, and then ranked the hazards of the chemicals [18]. Carlsen et al. used a Hasse diagram to study the financial systems of 81 countries, obtained the evaluation criteria of a financial system design, and then clustered and ranked the countries through a Hasse diagram, which provided a new perspective for the quantitative study of the system [19]. Tsakovski et al. used the Hasse diagram technique for partial ordering to explain some specific relationships between the chemical indicators analyzed and ecotoxicity tests for acute and chronic toxicity [20]. A more complete sediment risk assessment was realized and more reliable information on sediment pollution history was extracted [21]. Voyslavov et al. used advanced multivariate data processing methods such as the Kohonen self-organizing map (SOM) and the Hasse diagram technique (HDT) to evaluate surface water quality [22]. Kudłak et al. discussed a new application of HDT in the ecotoxicity testing of groundwater quality [23]. To determine the applicability of various ecotoxicity tests, HDT was used to rank samples at different monitoring levels according to the test used. Zhou et al. used the cloud model method with entropy weight to predict rock burst classification [24]. Xiao et al. proposed a matter-element evaluation algorithm based on fuzzy entropy weight [25]. The entropy weight method was used to determine the index weight, and the closeness of the matter-element matrix was calculated. This method avoids the shortcomings of the traditional matter-element method in the calculation of artificial eigenvalue and correlation degree. To address the problems in the construction of regional informatization ecological environments, Xu et al. proposed a construction method based on an entropy weight modified AHP hierarchical model [26]. To the best of the authors' knowledge, this is the first time that HDT has been applied in the agro-food sciences.
Entropy is the measure of the disorder in a system. It can be widely used to evaluate the disorder degree and effectiveness of the information for a system [27]. This method is an objective weighting method, which can reflect the utility value of index information entropy. The weight value determined by this method is more accurate and credible than that of the subjective weighting method. Hence, the weights identified by entropy are also the measurement of the disorder degree of the evaluation system. Entropy weight represents useful information of the evaluation index. The bigger the entropy weight, the more useful the information of the index. In this paper, we used the entropy weight method to quantify the food data index and to determine the weights.
Therefore, in light of the high-dimensional and complex characteristics of food safety data, HDT based on the entropy weight method is proposed to analyze the factors affecting food safety.

Calculate the Entropy Weight
(1) Construct the evaluation matrix. The evaluation index factors are determined according to Equation (1):
$$X = (x_{ij})_{n \times m} \quad (1)$$
For n samples and m indicators, $x_{ij}$ is the value of the ith sample corresponding to the jth index.
(2) Calculate the normalization matrix. The normalized matrix is obtained according to Equation (2).
(3) Calculate the entropy of every index (column) j using Equation (5):
$$e_j = -k \sum_{i=1}^{n} \rho_{ij} \ln \rho_{ij} \quad (5)$$
where $\rho_{ij}$ is the proportion of the ith sample value under the jth index, $k = 1/\ln(n)$ is a constant, and $0 \le e_j < 1$.
(4) Calculate the entropy weight $w_j$ of the jth indicator by Equation (6):
$$w_j = \frac{1 - e_j}{\sum_{j=1}^{m} (1 - e_j)} \quad (6)$$
We then obtain $W = (w_1, w_2, w_3, \cdots, w_m)$ with $\sum_{j=1}^{m} w_j = 1$.
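The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `entropy_weights` is hypothetical, and the min-max normalization and the proportion formula $\rho_{ij} = r_{ij} / \sum_i r_{ij}$ are assumptions, since the exact form of Equation (2) is not reproduced in the text.

```python
import numpy as np

def entropy_weights(X):
    """Entropy weights for an (n samples x m indicators) matrix X.

    Assumes min-max normalization per column (an assumption, not
    necessarily the paper's Equation (2)).
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    # Constant columns carry no information: map them to all ones so
    # that their proportions become uniform and their entropy becomes 1.
    R = np.where(span > 0, (X - lo) / np.where(span > 0, span, 1.0), 1.0)
    # Proportion of sample i under indicator j.
    P = R / R.sum(axis=0)
    # Entropy of each column with k = 1 / ln(n); 0 * ln(0) is taken as 0.
    k = 1.0 / np.log(n)
    plogp = np.where(P > 0, P * np.log(np.where(P > 0, P, 1.0)), 0.0)
    e = -k * plogp.sum(axis=0)
    # Degree of divergence, clipped to guard against rounding noise.
    d = np.clip(1.0 - e, 0.0, None)
    return d / d.sum()

# Hypothetical data: the second indicator is constant across samples.
w = entropy_weights([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]])
print(w)
```

In this toy example the constant indicator receives essentially zero weight, which matches the interpretation that an index with no variation carries no useful information. The sketch assumes at least one indicator varies across samples; otherwise the final division is undefined.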

Hasse Diagram Technology (HDT)
Hasse diagrams visualize partial order relations between objects described by a certain number of variables. HDT is well described elsewhere and is only summarized briefly here [28,29].
Using the HDT method, the ranking of objects is done with respect to a set of variables, which is called the information basis. The processed data matrix P(A × B) contains A objects and B variables. The entry $p_r(x)$ is the numerical value of the r-th variable for object x. The $p_r$ are the variables by which the objects are ranked. Two objects x and y are comparable if $p_r(x) \le p_r(y)$ for all r (written $x \le y$), or if $p_r(x) \ge p_r(y)$ for all r. If there is even one $p_r$ for which $p_r(x) > p_r(y)$ while another variable $p_s$ satisfies $p_s(x) < p_s(y)$, then the objects x and y cannot be compared. By using the coverage relation matrix to collect the relations between objects, the partially ordered set can be easily established. The matrix is an (n, n) antisymmetric matrix whose entry $a_{xy}$ records the relation between each pair of elements x and y:
$$A = (a_{xy})_{n \times n} \quad (10)$$
For $x \le y$ with $x \ne y$, if there is no element a in the object set E with $a \ne x, y$ for which $x \le a \le y$, then x is covered by y.
Hasse diagrams are used to represent the order relations in the matrix and are constructed as follows:
(1) If x ≤ y, draw x below y. All relationship lines should be drawn in the same direction, either all up or all down.
(2) If y covers x, a line is drawn between the corresponding objects, and the elements are comparable.
(3) If x ≤ y and y ≤ z, then x ≤ z follows by the transitivity rule. This relation is represented by the lines from x to y and from y to z, so no separate line is drawn between x and z.
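The construction rules above can be illustrated with a short NumPy sketch that derives the order relation and the cover relation (the lines actually drawn in a Hasse diagram) from a data matrix. The function name `cover_relation` and the toy data are hypothetical, and objects with identical rows are simply treated as equivalent; this is one straightforward way to compute the relations, not necessarily the procedure used in the cited HDT literature.

```python
import numpy as np

def cover_relation(P):
    """Order and cover relations for objects stored as rows of P.

    leq[x, y]   is True when p_r(x) <= p_r(y) for every variable r.
    cover[x, y] is True when x < y and no object a satisfies x < a < y,
                i.e. a line is drawn from x up to y.
    """
    P = np.asarray(P, dtype=float)
    # Compare every pair of rows on every variable at once.
    leq = np.all(P[:, None, :] <= P[None, :, :], axis=2)
    # Strict order: x <= y holds but y <= x does not.
    lt = leq & ~leq.T
    # between[x, y] is True when some a exists with x < a < y.
    between = (lt.astype(int) @ lt.astype(int)) > 0
    cover = lt & ~between
    return leq, cover

# Hypothetical objects described by two variables.
data = [[1, 1], [2, 2], [3, 3], [1, 3]]
leq, cover = cover_relation(data)
print(cover.astype(int))
```

Here objects 0, 1, and 2 form a chain, so the line from 0 to 2 is omitted because it follows from transitivity, while objects 1 and 3 are incomparable and receive no line.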
For a decision problem with m schemes and n indexes, Chen gave a simple implicit weighting method in which the weights of the indexes satisfy the rank order w1 > w2 > . . . > wn. The decision problem containing this weight information is expressed in matrix form [30], where X is the evaluation matrix and D is the resulting cumulative transformation matrix. If the ith row of matrix D is greater than or equal to the jth row in every component, then the ith scheme is better than or equal to the jth scheme. In this way, a partial order relation carrying implicit weight information is constructed. The partial order relation can not only rank the schemes but also reveal the structural relations among them. On the basis of the cumulative transformation matrix, the comparison relation matrix is obtained by row-by-row comparison.
In the transformation formula between the relation matrix and the Hasse matrix, A is the original matrix; B is the multiply matrix; I is the unit matrix; R is the relation matrix; and S is the Hasse matrix.
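The implicit-weight procedure can be sketched as follows. Several points are assumptions made for illustration: the cumulative transformation is taken to be a running sum over the columns after sorting them by decreasing weight, and the Hasse matrix is obtained by a standard transitive reduction (removing relations implied by a two-step path), which may differ in form from the transformation formula used by the authors. The function names are hypothetical.

```python
import numpy as np

def implicit_weight_order(X, weights):
    """Cumulative transformation matrix D and comparison matrix R."""
    X = np.asarray(X, dtype=float)
    # Sort columns by decreasing weight (stable for equal weights).
    order = np.argsort(-np.asarray(weights, dtype=float), kind="stable")
    # Assumed cumulative transformation: running sum along each row.
    D = np.cumsum(X[:, order], axis=1)
    # r_ij = 1 when row i is >= row j in every component.
    R = np.all(D[:, None, :] >= D[None, :, :], axis=2).astype(int)
    return order, D, R

def hasse_matrix(R):
    """Transitive reduction of R; assumes no two rows of D are identical."""
    R = np.asarray(R, dtype=int)
    strict = R - np.eye(len(R), dtype=int)
    # Relations that are already implied by a two-step path.
    implied = (strict @ strict > 0).astype(int)
    return strict * (1 - implied)

# Hypothetical evaluation matrix: three schemes, two indexes.
X = [[3, 1], [2, 2], [1, 1]]
order, D, R = implicit_weight_order(X, weights=[0.7, 0.3])
S = hasse_matrix(R)
print(D)
print(R)
print(S)
```

With these toy values the first scheme dominates the second, which dominates the third, and the direct relation between the first and third scheme is dropped from S because it follows from the other two. Only the rank order of the weights matters in this step, not their exact values.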

Entropy Weight-Hasse Diagram Technology (HDT) Method
The structural analysis steps of the entropy weight-HDT method are shown in Figure 1.

Case Study
This paper analyzed the ranking of the factors affecting food safety. The detection data were provided by China's food inspection agency.
First, the weight of each factor was analyzed by the method of entropy weight, and the relationship matrix was obtained by directed-Hasse diagram technology (D-HDT). Then, the influencing factors were stratified and the multi-level structure of influencing food safety factors was established. This article used the test data of infant milk powder and infant formula for analysis.

Data Preprocessing
The original data collected by food inspection institutions were used. This paper selected the test data of infant milk powder for research. According to the National Food Safety Standard for infant milk powder (GB 10765-2010) [31], there are nine items including sensory requirements, raw material requirements, the physical and chemical indicators, pollutant limit, mycotoxin limit, and microbial requirements.
The collected test data contained a lot of information related to products. When processing the data, some properties that are not strongly correlated should be ignored to extract the key data. According to the above National Standards, the table of influencing factors of infant milk powder can be obtained, as shown in Table 1.

Data Integration
Because food data typically contain missing values and outliers, the data must be re-integrated when processing dairy product detection data.
Processing missing value data. These data are mainly used to monitor the chemical contaminants in food. When the contaminant content is lower than the detection limit or quantitative limit of the equipment, the measurement is recorded as "undetected". Records with missing values can be deleted, or the missing values can be completed using mean imputation, maximum likelihood estimation, regression, or other methods.
Handling abnormal data. Abnormal data arise from three sources: different data sources, wrong data collection methods, and wrong data measurement methods. Outliers are typically handled in three ways:

2. Treat the outliers as missing values and use the missing value handling method.
3. Use the average to correct.
Part of the data after preprocessing of the infant milk powder is shown in Table 2.
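The integration step can be sketched for a single indicator column as follows. This is an illustration under stated assumptions, not the preprocessing actually applied to the inspection data: the outlier rule (Tukey's 1.5 × IQR fences) is an assumption because the text does not specify how outliers are detected, mean imputation is only one of the completion methods listed above, and the function name and sample values are hypothetical.

```python
import numpy as np

def clean_column(values, undetected="undetected"):
    """Parse one indicator column, then impute missing values and outliers."""
    # "undetected" entries become missing values (NaN).
    x = np.array([np.nan if v == undetected else float(v) for v in values])
    # Assumed outlier rule: outside the 1.5 * IQR fences.
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    with np.errstate(invalid="ignore"):
        outlier = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)
    # Treat outliers as missing values.
    x[outlier] = np.nan
    # Complete all missing values with the mean of the remaining data.
    x[np.isnan(x)] = np.nanmean(x)
    return x

# Hypothetical measurements for one contaminant.
raw = ["0.10", "0.12", "undetected", "0.11", "0.13", "9.99"]
print(clean_column(raw))
```

In this example both the "undetected" entry and the implausibly large value are replaced by the mean of the four remaining measurements.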

Construct evaluation matrix
In this paper, 60 pre-processed infant milk powder test datasets were used. Each infant milk powder dataset contained many attributes. The normalized matrix was obtained according to Equation (2) and is shown in Table 3.
2. Calculate the entropy weight. The entropy weight $w_j$ of the jth indicator was calculated by Equation (6) and is shown in Table 4.
3. Find the relational matrix. The matrix was reordered by weight from large to small, as shown in Table 5. According to Equation (7), the fuzzy accumulation matrix was obtained, as shown in Table 6.

Compare the row vectors of the cumulative transformation matrix
If the ith row is greater than or equal to the jth row, $r_{ij} = 1$; otherwise, $r_{ij} = 0$. The comparison matrix of the cumulative index data, $R = (r_{ij})_{m \times m}$, is thus obtained, as shown in Table 7. The reachable matrix is shown in Table 8. Table 5 is transformed into a directed HDT of hierarchical relations through topological operation, as shown in Figure 2.
Figure 2 shows the hierarchical structure of the main factors that affect infant milk powder. The first level is chromium; protein, fat, and solids-not-fat are in the second level, and these three factors are related to chromium; from the third level to the seventh level, the influencing factors are aflatoxin M1, acidity, lead, mercury, and arsenic. According to the hierarchy of the loop, the final hierarchy can be obtained, as shown in Figure 3. Lead and acidity in the second level are connecting factors.
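The stratification of factors into levels can be sketched as repeated extraction of the maximal elements of the comparison matrix. This is a generic level-partition routine given under assumptions: it treats $r_{ij} = 1$ as "factor i is at least as high as factor j", and the function name and the four placeholder factors are hypothetical rather than taken from Tables 7 and 8.

```python
import numpy as np

def hierarchy_levels(R, names):
    """Split factors into levels, top level first.

    R[i, j] is truthy when factor i is greater than or equal to factor j.
    """
    R = np.asarray(R, dtype=bool)
    remaining = list(range(len(names)))
    levels = []
    while remaining:
        # A factor is on the current level if no other remaining
        # factor strictly dominates it.
        top = [i for i in remaining
               if not any(R[j, i] and not R[i, j]
                          for j in remaining if j != i)]
        if not top:
            # Guard for a non-transitive input: close with one final level.
            top = list(remaining)
        levels.append([names[i] for i in top])
        remaining = [i for i in remaining if i not in top]
    return levels

# Hypothetical comparison matrix for four placeholder factors.
R = [[1, 1, 1, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 1],
     [0, 0, 0, 1]]
print(hierarchy_levels(R, ["a", "b", "c", "d"]))
```

For this toy matrix the routine places "a" on the first level, the incomparable pair "b" and "c" on the second, and "d" on the third.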
The basic factors affecting the safety of infant milk powder were solids-not-fat, arsenic, mercury, protein, and fat, and the major influencing factors were chromium and aflatoxin M1, which play a key role. By analyzing the hierarchical structure of the infant milk powder data, it was found that the important indicators for evaluating the quality of infant milk powder were chromium, among the heavy metal pollutants, and the mycotoxin limit. The reasons for residual heavy metal pollution in infant milk powder are as follows. First, due to natural or man-made causes, the environment contains heavy metals, which leads to the pollution of animal feed, so the content of heavy metals in raw milk is relatively high. Second, dairy products can be contaminated during procurement, production, and transportation. Chromium pollution was the most serious heavy metal pollution, followed by acidity and lead. Mycotoxin contamination in pasteurized milk originates in the feed of dairy cows: aflatoxin M1 is found in milk and dairy products when cows are fed feed contaminated with aflatoxin B1.
According to the above analysis results, the quality of raw milk should be strictly controlled during the production of sterile milk. By monitoring the planting environment, it can be ensured that the air, soil, water, and feed are not contaminated, thereby ensuring the quality of raw milk.

Conclusions
First, an entropy weight method based on HDT was proposed. The entropy weight method was used to calculate the weight of each factor index, and the relationship matrix was obtained. Then, the data of infant milk powder in China were analyzed hierarchically by the HDT method.
The proposed method focuses on the main factors affecting food safety, which can help the relevant departments strengthen supervision and urge enterprises to produce food safely.
Second, this method can effectively analyze the main factors affecting the safety of infant milk powder. After basic hierarchical analysis and loop optimized hierarchical structure analysis, the evaluation indexes of infant milk powder were divided into three levels. The first level consisted of the main influencing factors. By analyzing the main factors affecting the quality of infant milk powder, we found the main reasons that affected the quality of infant milk powder. At the same time, a comparative analysis of different milk powder data was performed to obtain similar influencing factors, which proved the validity of the research results. This provides a data basis for risk assessment and early warning, and can effectively reduce the occurrence of food safety incidents.
Third, the proposed method has some shortcomings. Artificial neural networks and machine learning could be considered for optimization when determining the fuzzy matrix. In summary, an improved analytic hierarchy process for the influencing factors of food safety was proposed. Taking the test data of infant milk powder as an example, the entropy weight method was used to calculate the weight of each factor. Then, the HDT method was used to analyze the influencing factors, and a multi-level structure of influencing factors was established.
Finally, using aseptic milk data as a sample set, nine key factors that affect the safety of aseptic milk were analyzed. Through the analysis of the main influencing factors, combined with the actual production situation, a plan to improve the quality of infant formula milk and aseptic milk was proposed. At the same time, this method can be used as a basis for risk assessment and early warning.
In future work, we will use the ANN method and machine learning to optimize our proposed method so that it can be applied to other food safety fields.