1. Introduction
In social science research, the attitudes of social groups are often studied using questionnaires [1], for example voters’ attitudes toward presidential candidates, individuals’ attitudes toward a certain policy, customer satisfaction, and public satisfaction with the government. Because a person’s psychology, personality, emotions, and other characteristics all affect their responses, the evaluations of the various indicators often contain some irrational information. For example, suppose a phenomenon is described by three evaluation indicators, A, B, and C. The respondents are asked to score each of the three indicators from one to five, with one representing the worst and five representing the best. The resulting score evaluations fall into two cases. In the first, the gap among the scores given to the indicators is very small, i.e., the scores are all high, all low, or all medium; in the second, there is a large gap among the scores given to the various indicators.
For example, in research on people’s attitudes toward and views of the management of a community, we often divide community management into a number of indicators, including sanitation, afforestation, public security, maintenance services, the shopping environment, and so on. Each item can be rated from one to five, with one representing the worst and five the best. Some residents may be persistently dissatisfied with the community’s sanitation and therefore bring their opinions to the management. However, these people may not give the lowest score for sanitation on the questionnaire. For example, their average score may be 2 or 3 points, whereas those who give an average score of 1–2 points may remain silent in everyday life and never voice their opinions to community management. Why does this phenomenon occur? It cannot be explained by the evaluation of the “degree of sanitation” alone. However, it can be easily explained by considering each group’s evaluations as a whole. The people who voice their opinions about the community’s sanitation problems may not give the lowest score for the “degree of sanitation” among all community members, but their scores for sanitation are lower than the scores they give the other indicators. This group’s evaluation of the security situation may be five points and that of the degree of afforestation four points, but its evaluation of the degree of sanitation is the lowest, perhaps two points. In other words, this group gives scores with a larger gap across the various indicators, which can be interpreted to mean that these people take the evaluation seriously and have clear ideas about what they like and dislike.
Although the group that gives the lowest sanitation scores does not take any action or voice its opinions, its evaluations of all indicators are very low, and the gap among the evaluation scores is not large. In fact, if the community has only a sanitation problem and the other aspects of community life are good, evaluations that consist solely of low scores on all indicators will exhibit a small score gap but may not be entirely rational. This group may be driven by certain irrational factors, such as a pessimistic temperament, general social grievances, and so on. Those who give evaluations with large score gaps actually show their true opinions. Similar situations are common, such as polls on election candidates, polls on government performance, and customer satisfaction with companies’ services.
Almost all research is designed to genuinely understand respondents’ attitudes and preferences, but conventional statistical methods can hardly reflect such attitudes and preferences and, therefore, contradict the study’s purposes. In this paper, we believe that to achieve the purposes of a study—to effectively highlight the attitudes and opinion preferences of respondents—in the process of data analysis, we should pay more attention to those individuals who mark different scores with large gaps in the evaluation of different indicators and take them as more important factors among the results during the calculation process. Similarly, we posit that we should pay less attention to those respondents who evaluate various indicators with a small gap (some of whom even give the same scores for all indicators) and take these respondents as less important factors in the calculation process.
This paper originates from a study on the evaluation of the effectiveness of securities regulations in China. To evaluate the regulations by researching their effect on market participants, this paper classifies market participants into five groups [2,3]: regulators, managers of listed companies, individual investors, managers of fund companies, and managers of securities companies. The respondents are asked to give a score for each aspect. The first-round analysis of the study, which was conducted using a traditional method, did not highlight the essence of the respondents’ evaluations. In the second-round analysis, we propose a calculation method based on relative entropy theory, which solves this problem and is also generally applicable.
2. Literature Review
Preference is a widely used concept. Arrow [4], a Nobel laureate in economics, describes preference in his book as follows. When a selector faces a number of alternatives, these alternatives can be represented by x, y, z, ..., thereby forming the set S. The preference of the selector is expressed not by treating each of the alternatives in S equally but by ranking them after comparison or by preferentially selecting one of them. Arrow first defines the preference or indifference relationship; this relationship is an axiom, which is expressed as follows:
For any two optional objects x and y, there must be xRy or yRx.
xRy means “x is not inferior to y”, and yRx means “y is not inferior to x”; this relationship is called the “weak ordering” relationship. The weak ordering relationship contains a preference or an indifference, and if the indifference is excluded, then only the preference relationship is defined. Arrow’s definition is as follows:
If yRx is false, then it is called xPy.
“xPy” means “x is better than y”, and Arrow calls it a strict preference relation.
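As an illustration only, the two relations can be sketched in Python. The numeric scores used here to induce the ordering are a hypothetical convenience and are not part of Arrow’s definition, which does not require a numerical representation; the function names are likewise assumptions of this sketch.

```python
from itertools import product


def make_weak_order(utility):
    """Return R, where R(x, y) is True when x is not inferior to y.

    `utility` is a hypothetical score table used only to induce an ordering.
    """
    def R(x, y):
        return utility[x] >= utility[y]
    return R


def strict_preference(R):
    """Build P from R: xPy holds exactly when yRx is false."""
    def P(x, y):
        return not R(y, x)
    return P


def is_complete(R, S):
    """Check the axiom: for any x, y in S, xRy or yRx must hold."""
    return all(R(x, y) or R(y, x) for x, y in product(S, repeat=2))


# Hypothetical alternatives: x and y are tied, z is worse than both.
scores = {"x": 3, "y": 3, "z": 1}
S = list(scores)
R = make_weak_order(scores)
P = strict_preference(R)
```

In this sketch x and y are indifferent (both xRy and yRx hold, so neither xPy nor yPx holds), whereas x is strictly preferred to z.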
Thus, comparison is a prerequisite for preference: a preference arises through comparison and through people’s selection among different things. The concept of preference is widely used in customer research in the field of marketing. Since each person has preferences when facing a choice, Arrow notes that obtaining the public attitude toward a problem becomes a matter of aggregating individual preference orderings to produce a single social preference ordering based on these individuals.
There are a number of methods to reflect human preferences by aggregating large amounts of data. The most traditional approach uses conventional statistical methods, which calculate the average of the respondents’ evaluations of each variable or the ratio of selections for each option and examine the significance of the differences in the mean values or ratios between different groups of data. This is the most basic method, but its limitations are obvious. When data are aggregated using mean values and ratios, each row of the data table (i.e., each respondent’s choices for a series of variables) is given equal status, regardless of how much the scores within the row differ, so every row has the same influence on the results, which weakens the impact of preferences on the results to a certain extent. The limitations of mean values and standard deviations in data analysis, and the ability of entropy theory and its approach to compensate for these limitations, have already been noted by scholars [5].
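The equal-status limitation can be seen in a minimal sketch. The three respondents and their scores below are invented purely for illustration, and the helper names are assumptions of this sketch rather than part of any method described in the paper.

```python
def column_means(rows):
    """Mean score of each indicator, with every respondent weighted equally."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def score_range(row):
    """Gap between a respondent's highest and lowest score."""
    return max(row) - min(row)


# Hypothetical scores for three indicators (e.g., security, afforestation, sanitation).
rows = [
    [5, 4, 2],  # differentiated respondent: clear likes and dislikes
    [1, 1, 1],  # uniformly low scores
    [3, 3, 3],  # uniformly medium scores
]

means = column_means(rows)
ranges = [score_range(r) for r in rows]
```

Each of the three rows contributes one third of every column mean, even though only the first respondent distinguishes among the indicators (a score range of 3 versus 0 for the other two).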
Association rules [6] from data mining can find sets of related items, but they do not aggregate a large amount of data into a single parameter and therefore cannot solve this problem.
The variance in statistics reflects only the dispersion of the data and can hardly reflect the preferences of different groups of data. Aggregating a large amount of respondent data using entropy theory and the relative entropy method can effectively reflect respondents’ preferences.
The work published by Shannon in 1948 [7] marks the birth of information theory. Two other important milestones in the development of entropy theory are as follows: (1) the principle of maximum entropy proposed by Jaynes in 1957 [8], which states that, among all probability distributions that satisfy the given constraints, the one closest to the uniform distribution should be chosen, namely, the maximum entropy probability distribution; and (2) relative entropy theory, which was developed based on the early directed divergence proposed by Kullback [9]. Relative entropy can be used to measure the proximity of two probability distributions. In one study [10], eight entropy-based methods, including Shannon’s entropy and relative entropy, are compared using the entropy-based image threshold technique, and the relative entropy method is further developed. Another form of information entropy is mutual information. In their discussion of the purpose of mutual information, Baratpour et al. note that mutual information can be used to measure the degree of dependence of the recorded values of two variables [11].
The development of information entropy, its main theory, and its applications have been discussed in detail by Gray, Qiu, and Zhong [9,12,13]. Since the relationship between entropy and information was established, information entropy has been widely used in economics, managerial studies, and the social sciences [14,15,16,17,18].
More recent research is deeper and broader. Baez and Pollard [19] review various information-theoretic characterizations of the approach to equilibrium in biological systems. The replicator equation, evolutionary game theory, Markov processes, and chemical reaction networks all describe the dynamics of a population and of a probability distribution. Under suitable assumptions, the distribution will approach equilibrium with the passage of time. Relative entropy provides a quantitative measure of how far from equilibrium the system is [19]. The social preferences of different groups studied in this paper can also be regarded as an ecological system in equilibrium; therefore, entropy theory can also be used to measure how far the actually observed state is from the system equilibrium. In terms of specific algorithms and models, Dziurosz-Serafinowicz [20] further affirmed the traditional principle of maximum relative entropy (MRE) and extended its application to the expression of new degrees of belief as a result of learning. The study closest to the present one is “Preference Inconsistence-Based Entropy” by Pan et al. [21]. These authors note the following: “As available information is usually obtained from different evaluation criteria or experts, the derived preference decisions may be inconsistent and uncertain. Shannon entropy is a suitable measurement of uncertainty.” Although their study rests on the same principle as the present study, it differs in application. In their study, the theory of relative entropy was used to establish models that distinguished the preferences of decision-makers in decision making, which involved individual preferences. In this study, a large volume of social survey data was used to distinguish the preferences of different social groups. Makowski et al. [22] studied the issue of transitivity of preferences in an argument between two people. A recent study, “Information-Theoretic-Entropy Based Weight Aggregation Method in Multiple-Attribute Group Decision-Making” by He et al. [23], is also a study of decision makers’ preferences. Thus, studying individual preferences in the decision-making process using relative entropy theory is more common than studying group preferences.
Nonetheless, the study of social group preferences has been a research hotspot in recent years, although most of this scholarship involves consumer preferences [24] and/or establishing models [25]. Although the theory of entropy has many applications in handling social survey data [26,27,28,29,30,31,32,33,34], we have not found a previous application of the theory of entropy that is essentially the same as the one used in this paper; nevertheless, it is entirely possible to extend these applications based on existing results.
According to information entropy theory [9,12,13], if a system has m possible states and the probability of state i is $P_i$ (i = 1, 2, ..., m), then the information entropy of the system can be defined as follows:
$$H = -\sum_{i=1}^{m} P_i \ln P_i$$
When the entropy value of the system is high, the implication is that there is a high degree of chaos in the system; when the data distribution is even and the degree of variation is small, the implication is that the amount of information is small. When system entropy is low, the degree of variation of the data is large, and the amount of information contained is large. According to the principle of maximum entropy, in the state of nature or in the absence of outside intervention, the entropy of a system tends to increase; if entropy is to be reduced, external forces must be applied to the system. For a data system that reflects the state of something, the data should be evenly distributed under random conditions. If the data distribution is uneven and the degree of variation is large, then it is usually influenced by certain systematic factors whose causes are worth investigating. Regarding the particular problem in this paper, when the respondents must choose among a number of indicators, if there are no preferences, then the data distribution is even, and there will be no differences among the options; if the data distribution is uneven and there are differences among the various options, then the proportions of the different options differ, which inevitably reflects the influence of preferences. Therefore, the basic entropy formula reflects individuals’ preferences, and the concept of relative entropy should also be adopted to measure the preferences of different people or different groups.
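The link between even scores and high entropy can be checked with a small sketch. It assumes the standard Shannon form, H = -Σ P_i ln P_i with the natural logarithm, and it assumes that a respondent’s scores are turned into proportions by dividing each score by their sum; this normalization and the example scores are assumptions made for illustration, not a step stated in the text.

```python
import math


def shannon_entropy(probs):
    """H = -sum(p * ln p); terms with p == 0 contribute 0 by convention."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def normalize(scores):
    """Convert raw scores to proportions (an assumption of this sketch)."""
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical respondents scoring three indicators.
flat = normalize([3, 3, 3])    # no preference among indicators
varied = normalize([5, 4, 2])  # clear preference among indicators

h_flat = shannon_entropy(flat)      # the maximum for m = 3, i.e., ln 3
h_varied = shannon_entropy(varied)  # lower, because the proportions are uneven
```

Under these assumptions, the respondent with identical scores attains the maximum entropy ln m, and any differentiation among the indicators lowers the entropy.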
The definition of relative entropy is based on the ratio of two probability distributions. If there are two probability distributions P and Q, then the relative entropy of P to Q can be expressed as D (P:Q) and can be defined as follows:
$$D(P:Q) = \sum_{i=1}^{m} P_i \ln \frac{P_i}{Q_i}$$
Relative entropy D (P:Q) measures the degree of closeness between the probability distributions P and Q. The smaller the relative entropy, the closer the two distributions are. In the extreme case, if $P_i = Q_i$ for all i, then D (P:Q) = 0. According to the principle of relative entropy, to obtain optimal results, the result of aggregation should be, among all the probability distributions satisfying the given constraints, the one closest to the a priori probability distribution of the choices.
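A minimal sketch of this closeness measure follows. It assumes the usual Kullback-Leibler form, D(P:Q) = Σ P_i ln(P_i / Q_i) with the natural logarithm; the distributions are hypothetical, and the uniform distribution is used as the reference Q only for illustration.

```python
import math


def relative_entropy(p, q):
    """D(P:Q) = sum(p_i * ln(p_i / q_i)).

    Terms with p_i == 0 contribute 0; if q_i == 0 while p_i > 0,
    the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


# Hypothetical distributions over three indicators.
uniform = [1 / 3, 1 / 3, 1 / 3]   # reference: no preference
mild = [0.40, 0.33, 0.27]         # weak preference
varied = [5 / 11, 4 / 11, 2 / 11]  # strong preference

d_same = relative_entropy(uniform, uniform)
d_mild = relative_entropy(mild, uniform)
d_varied = relative_entropy(varied, uniform)
```

In this sketch identical distributions give zero, and the more a distribution departs from the reference, the larger the value becomes. Note that the measure is not symmetric in general, so D (P:Q) and D (Q:P) may differ.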
When the maximum entropy principle and the conditions of the relative entropy principle are met simultaneously, the two issues in the data analysis of this study can apparently be solved; in other words, the final parameters will reflect the integrity requirements and preference characteristics of people. In line with the maximum entropy principle, the final parameter can better reflect the integrity requirements, and conforming to the relative entropy principle, it can better reflect the preference characteristics of people.