Real Estate Valuations with Small Dataset: A Novel Method Based on the Maximum Entropy Principle and Lagrange Multipliers

Abstract: Accuracy in property valuations is a fundamental element in the real estate market for making informed decisions and developing effective investment strategies. The complex dynamics of real estate markets, coupled with the high differentiation of properties, scarcity, and opaqueness of real estate data, underscore the importance of adopting advanced approaches to obtain accurate valuations, especially with small property samples. The objective of this study is to explore the applicability of the Maximum Entropy Principle to real estate valuations with the support of Lagrange multipliers, emphasizing how this methodology can significantly enhance valuation precision, particularly with a small real estate sample. The excellent results obtained suggest that the Maximum Entropy Principle with Lagrange multipliers can be successfully employed for real estate valuations. In the case study, the average prediction error for sales prices ranged from 5.12% to 6.91%, indicating a very high potential for its application in real estate valuations. Compared to other established methodologies, the Maximum Entropy Principle with Lagrange multipliers aims to be a valid alternative with superior advantages.


Introduction
In real estate markets, accuracy in property valuations is a fundamental element for making informed decisions and effective investment strategies. The complex dynamics that characterize real estate markets, together with the high differentiation of properties, make the adoption of advanced approaches crucial to obtaining accurate valuations [1,2].
However, this is countered by a frequent scarcity of real estate data and opaqueness of related markets, problems that can be found in various territorial contexts. The causes of these phenomena can be traced back to a series of factors, including resistance to change in the real estate sector, the lack of standardization in registration practices, the absence of regulatory requirements that mandate complete disclosure of information, the limitations or incompleteness of information in public data, and the reticence of private individuals in disclosing transaction prices. Through the information asymmetries they create, all these factors have evident negative impacts on the understanding of real estate markets, on property valuations, and on investment decisions within the real estate sector [2].
In this framework, the Principle of Maximum Entropy emerges as a powerful tool, offering a new paradigm to address the challenges of real estate valuations.
Entropy is a fundamental concept in information theory and is closely associated with the idea of measuring uncertainty or randomness in a system [3]. The Maximum Entropy Principle proposes to select the probability distribution that reflects the maximum uncertainty, given a set of observed constraints. In other words, it involves choosing a distribution that is as neutral as possible with respect to the known information. Applying this principle to the field of real estate valuations entails balancing the complexity and variety of data, allowing statistical models to adapt naturally, guided by the maximum possible entropy [4].
The Maximum Entropy approach moves away from assuming additional information not supported by observed data, providing valuations that are, by definition, the result of an inference process based on maximum uncertainty. In the real estate field, this approach allows different sources of information to be integrated flexibly, reflecting a variety of variables that can influence the value of a property.
When dealing with optimization problems with constraints, as in the case of the general formulation of the Maximum Entropy Principle, Lagrange multipliers are often used to incorporate these constraints into the objective function. The goal is to find the maximum of the objective function subject to the given constraints. Thus, the integration of the Maximum Entropy Principle with Lagrange multipliers enables the handling of constraints in probability appraisal. This approach allows us to find the probability distribution that maximizes the entropy given the constrained knowledge of the system, ensuring consistency with the available information [5,6].
From a logical point of view, the proposed methodological approach, not unlike other procedures, leads to determining the market value or income of a property through a comparison with prices of properties that have similar characteristics to the one being estimated. A prerequisite is that the comparable transactions occurred recently in relation to the time of the valuation. While it is logical to assume that a greater number of comparison data leads to a better estimate result, the experiment conducted here treats a small sample of real estate sales as sufficient for arriving at a reliable estimated value. In this respect, the approach effectively addresses the challenge posed by the scarcity of data that characterizes real estate markets. The method in question estimates the value of a property by comparing its characteristics with those of comparable properties, in accordance with the "similia similibus aestimentur" criterion [7].
One of the operational limits currently associated with the use of the proposed innovative approach, based on the Maximum Entropy Principle and Lagrange multipliers, is the practical impossibility of obtaining the marginal prices of the features that influence the market price of an examined property; because of this limit, it is not suitable for real estate mass appraisal.
In conclusion, the objective of this study is to explore the applicability of the Maximum Entropy Principle to real estate valuations with the support of Lagrange multipliers, emphasizing how this methodology can significantly enhance the precision of valuations.
The research starts with a literature review of the concept of entropy and existing studies on its applications in the real estate field. Subsequently, a mathematical discussion of the Principle of Maximum Entropy integrated with Lagrange multipliers is presented, being a necessary and fundamental part of understanding the logic and functioning of the novel proposed method. After a brief description of the study's territorial context and real estate sample, the new method is applied to forecast the market prices of a small sample of real estate data (eight total observations, including the subject to be estimated). Finally, some critical considerations and perspectives for further future developments in the research are formulated.

Literature Review
The word "entropy" first appeared in 1864 in the context of a thermodynamics treatise by Rudolf Clausius, where it represents a state function that quantifies the unavailability of a system to produce work (in variational form, it is equal to the ratio between the amount of heat absorbed or released reversibly and isothermally by the system at a certain temperature considered). In accordance with its original definition, entropy, therefore, indicates which processes can occur spontaneously: the evolution of a system always proceeds in the direction of increasing entropy [8].
In 1870, with the development of statistical mechanics, J.W. Gibbs gave a new meaning to entropy, linked to the possible molecular arrangements of a system of particles. The Gibbs entropy (S) is defined as [9]:

$$S = -k_B \sum_i p_i \ln p_i \qquad (1)$$

where $k_B$ is the Boltzmann constant and $p_i$ is the probability that the system is in the i-th microstate. Maximizing the entropy function (S), the system reaches its equilibrium state. Equation (1) can be regarded as the fundamental definition of entropy, as all other expressions of S can be derived from (1) but not vice versa.
Subsequently, Boltzmann reworked Gibbs's concept, defining entropy as the measure of the number of possible microstates of a system, given its macroscopic thermodynamic properties [10].
In 1948, Shannon introduced the concept of information entropy, demonstrating how it was possible to quantify the information contained in a message emitted by a source. He completely disregarded the semantic content of the term entropy, considering the quantity of information solely in probabilistic terms. The information is quantified through a function that measures the uncertainty of X, namely entropy, defined as [3]:

$$H(X) = -K \sum_{i=1}^{n} p_i \log p_i \qquad (2)$$

where K is a positive and arbitrary constant that depends on the logarithmic base, and $(p_1, \ldots, p_n)$ are the probabilities of a set of possible events. In this case, entropy measures the amount of uncertainty or information present in a random signal.
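As a small numerical illustration of Shannon's definition, the following sketch computes the entropy of a few discrete distributions. It assumes K = 1 and base-2 logarithms (entropy in bits); the function name `shannon_entropy` and the example distributions are illustrative choices, not taken from the cited works.

```python
import math


def shannon_entropy(probs, base=2.0):
    """Return -sum(p_i * log(p_i)) in the given base (assumes K = 1).

    Zero-probability events contribute nothing to the sum.
    """
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, rel_tol=0.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0)


# Illustrative distributions over four events (hypothetical values).
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])  # maximal uncertainty
skewed = shannon_entropy([0.7, 0.1, 0.1, 0.1])       # some events far more likely
certain = shannon_entropy([1.0, 0.0, 0.0, 0.0])      # outcome known in advance
```

The uniform case attains the largest value for four events, which is the sense in which a maximum-entropy distribution is "as neutral as possible" when nothing beyond normalization is known.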
Starting in 1957, Jaynes dedicated himself to demonstrating the connection between the physical concept of entropy and that of information theory, developing the Principle of Maximum Entropy. Through this principle, Jaynes showed how it was possible to determine probability distributions of a configuration from partial information. The basic idea is to leverage the available information and impose that the sought distribution is the one that maximizes Shannon's entropy, as a measure of uncertainty and information quantity [4].
From these studies, it is inferred that the reduction of entropy can represent a concept of fundamental importance in the economic domain. Indeed, low entropy can govern economic values [22,23] or measure the scarcity and value of economic goods [24]. Similarly, the economic value of a good, incorporating complex, indeterminate, and anthropic features, derives from the law of entropy [25].
International studies specifically focused on the application of the concept of entropy in the real estate field are very limited.
Brown [26] first investigated the effectiveness of entropy in explaining the inefficiency of the real estate market, followed by Chen et al. [27].
The paper by Ge and Du (2007) investigates the main variables that influence residential property values in the Auckland property market (New Zealand) and ranks the variables using the Entropy method [28].
Lam et al. proposed in 2008 a mathematical model for predicting housing prices in Hong Kong based on the integration of entropy and artificial neural networks [29]. Subsequently, in 2009, the same authors implemented artificial neural networks with support vector machines to enhance the accuracy of real estate assessments in Hong Kong and mainland China. The identification of key real estate variables, which could influence property prices, has been addressed through an entropy-based rating and weighting method aimed at providing objective and reasonable weights [30].
In 2009, Zhou et al. dealt with a complex problem of multi-objective decision making in the real estate venture capital sector, where the weight was assigned based on base points and maximum entropy [31].
Salois and Moss developed a dynamic information measure in 2011 to examine the informational content of farmland values and farm income in explaining the distribution of farmland values over time [32].
The primary goal of Gnat's 2019 study was a proposal to modify the classical entropy measure, enhancing its ability to accurately reflect the specificity of assessing the homogeneity of valued areas in the context of property market analysis [33].
In 2020, Kostic and Jevremovic addressed the topic of property attractiveness, where property image features are used to describe specific attributes and examine the influence of visual factors on the price or duration of real estate listings. They considered a set of techniques for extracting visual features for efficient numerical inclusion in modern predictive algorithms, including Shannon's entropy, center of gravity calculation, image segmentation, and the use of Convolutional Neural Networks. They concluded that the employed techniques can effectively describe visible features, thus introducing perceived attractiveness as a quantitative measure in predictive modeling of housing [34].
The study by Basse et al. (2020) utilizes the concept of transfer entropy to examine the relationship between the US National Association of Home Builders Index and the S&P CoreLogic Case-Shiller 20-City Composite Home Price Index. The empirical evidence suggests that the survey data can contribute to predicting US house prices [35].
The most recent work is that of Özdilek (2023), which incorporates entropy measurements into real estate valuation by modifying and integrating triadic estimates of price, cost, and income; the results significantly improved the precision of value measurement [11].
The above studies, where entropy is applied to various aspects or issues of the real estate markets, all highlight a common theme: a significant improvement in the predictive accuracy of the measured values.

Methodology
The proposed model is based on constrained optimization with Lagrange multipliers. The problem involves maximizing an objective function, which is the opposite of the sum of Shannon entropy weighted by real estate prices, subject to normalization constraints and consistency moments.
The model design consists of the following phases:
1. market analysis for the identification of recent sales of similar properties;
2. verification of information, considering whether the observed prices adhere to the definition of the market value estimation criterion, and whether the transactions belong to the specific market segment to which the property being appraised belongs;
3. selection of elements of comparison (property features);
4. definition of the objective function based on Shannon's Entropy, capable of incorporating the sum of products between the optimal weights of property features and the corresponding prices from the estimated sample data;
5. setting variability and normalization constraints, as well as moments of consistency for real estate variables;
6. definition of the Lagrangian function that includes the specified constraints, followed by the redefinition of the objective function that returns the sum of the Lagrangians;
7. processing the solution and optimal value of the objective function, with the definition of weights for each optimal solution.
The foundational core of the proposed methodology lies in integrating the Maximum Entropy Principle with Lagrange multipliers. This integration provides a powerful approach to dealing with complex optimization and inference problems, ensuring efficient handling of constraints, and preserving information with the maximum uncertainty allowed by the available data. In this way, the incorporation of additional information into the optimization problem is allowed, especially when dealing with complex problems featuring multiple constraints or when seeking a probability distribution that adheres to specific characteristics. To execute the model, access to a MATLAB environment or a similar computational platform is required [36].
Let us see in formal terms what applying the Maximum Entropy Principle with Lagrange multipliers involves [5,6,37].
Consider a system described by a set of state variables: $\{x_1, x_2, \ldots, x_N\} \equiv x$, where each possible configuration has a certain probability of being observed. The probability of a state, in this case, cannot be thought of in a frequentist sense but rather should be understood as our knowledge of the system. Since one often deals with systems of very high dimensionality, with N being very large, it is convenient to study the distribution of suitable functions of the states.
Therefore, quantities related to the configuration are defined, $f_1(x), f_2(x), \ldots, f_K(x)$, which summarize some properties of a system, and whose average values $\langle f_\nu(x) \rangle_{\mathrm{exp}}$ can be calculated. The distribution P(x) is then sought such that the average values of the K considered functions, $\langle f_\nu(x) \rangle_{\mathrm{exp}}$ observed experimentally, coincide with their expected values $\langle f_\nu(x) \rangle_P$ with respect to the distribution.
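The following expression for the entropy function is considered:

$$S[P] = -\sum_x P(x) \ln P(x) \qquad (3)$$

and the constrained maximization problem is solved, where the constraints are given by the partial information available, imposing:

$$\langle f_\nu(x) \rangle_P \equiv \sum_x P(x)\, f_\nu(x) = \langle f_\nu(x) \rangle_{\mathrm{exp}} \qquad (4)$$

with $\nu = 0, \ldots, K$. Since the probability distribution must be normalized, the following technique is used: we choose $f_0(x) = 1$ and impose that its average is equal to the experimental value 1. To solve the problem, Lagrange multipliers are used, introducing the K + 1 parameters $\{\lambda_\mu\}$ (one per constraint, normalization included) and the generalized entropy function:

$$S[P; \{\lambda_\mu\}] = S[P] - \sum_{\mu=0}^{K} \lambda_\mu \left( \langle f_\mu(x) \rangle_P - \langle f_\mu(x) \rangle_{\mathrm{exp}} \right) \qquad (5)$$

The optimization is then performed on $S[P; \{\lambda_\mu\}]$ with respect to the probability P(x) and the parameters, imposing two conditions.
A first condition:

$$\frac{\partial S[P; \{\lambda_\mu\}]}{\partial P(x)} = -\ln P(x) - 1 - \sum_{\mu=0}^{K} \lambda_\mu f_\mu(x) = 0 \qquad (6)$$

from which we obtain:

$$P_{me}(x) = \frac{1}{Z_{me}(\{\lambda_\nu\})} \exp\left( -\sum_{\mu=1}^{K} \lambda_\mu f_\mu(x) \right) \qquad (7)$$

with $1/Z_{me}(\{\lambda_\nu\}) = \exp(-\lambda_0 - 1)$. Fixing $\lambda_0$ is equivalent to normalizing the distribution, so the normalization factor can be explicitly written as:

$$Z_{me}(\{\lambda_\nu\}) = \sum_x \exp\left( -\sum_{\mu=1}^{K} \lambda_\mu f_\mu(x) \right) \qquad (8)$$

A second condition:

$$\frac{\partial S[P; \{\lambda_\mu\}]}{\partial \lambda_\mu} = \langle f_\mu(x) \rangle_{\mathrm{exp}} - \langle f_\mu(x) \rangle_P = 0 \qquad (9)$$

From this, it can be seen that maximizing the generalized entropy with respect to the parameters $\{\lambda_\nu\}$ is equivalent to imposing that the averages of the considered functions measured experimentally coincide with the values predicted by the distribution. Writing explicitly $\langle f_\mu(x) \rangle_P$ and substituting in it the expression of P(x) found in (7), we have:

$$\langle f_\mu(x) \rangle_{\mathrm{exp}} = \langle f_\mu(x) \rangle_P = \frac{1}{Z_{me}(\{\lambda_\nu\})} \sum_x f_\mu(x) \exp\left( -\sum_{\nu=1}^{K} \lambda_\nu f_\nu(x) \right) = -\frac{\partial \ln Z_{me}(\{\lambda_\nu\})}{\partial \lambda_\mu} \qquad (10)$$

Substituting the distribution $P_{me}(x)$ into (5), we obtain the following expression for the generalized entropy:

$$S[P_{me}; \{\lambda_\mu\}] = \ln Z_{me}(\{\lambda_\nu\}) + \sum_{\mu=1}^{K} \lambda_\mu \langle f_\mu(x) \rangle_{\mathrm{exp}} = -\langle \ln P_{me}(x) \rangle_{\mathrm{exp}} \qquad (11)$$

This last expression coincides, up to sign, with the average logarithm of the probability that the model generates the observed data, namely the logarithm of the likelihood per observation.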
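The two stationarity conditions can be illustrated on a toy problem. The sketch below is not the authors' MATLAB implementation: it assumes a single constraint function f(x) = x on a six-state system (a die whose average is known to be 4.5), so that the solution has the exponential form of (7) with one multiplier, found here by bisection. The function name `maxent_distribution` and all numerical values are hypothetical.

```python
import math


def maxent_distribution(values, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum-entropy P(x) on `values` with sum(P) = 1 and <x>_P = target_mean.

    Assumes one constraint function f(x) = x, so P(x) = exp(-lam * x) / Z(lam).
    The model mean is strictly decreasing in lam (its derivative is minus the
    variance), which makes bisection on lam a valid root-finder.
    """
    if not min(values) < target_mean < max(values):
        raise ValueError("target_mean must lie strictly inside the value range")

    def model(lam):
        shift = min(lam * v for v in values)  # keeps every exponent <= 0
        weights = [math.exp(-(lam * v - shift)) for v in values]
        z = sum(weights)
        probs = [w / z for w in weights]
        return probs, sum(p * v for p, v in zip(probs, values))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        _, mean = model(mid)
        if mean > target_mean:
            lo = mid  # model mean too high: a larger multiplier lowers it
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    probs, mean = model(lam)
    return lam, probs, mean


faces = [1, 2, 3, 4, 5, 6]
lam, probs, mean = maxent_distribution(faces, target_mean=4.5)
```

With a target above the unconstrained average of 3.5, the multiplier comes out negative and the probabilities increase with the face value; no structure beyond the imposed mean is introduced.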
The Maximum Entropy Principle is based on the concept of distributional divergence of observed data and, therefore, does not guarantee the fulfillment of a result based on minimum variance, a preferential requirement in cases where estimating the value for predictive purposes is necessary. For this reason, Equation (3) may also be supplemented, as needed, with an additional optimization constraint based on the measurement of standard deviation or variance.
In practical terms, concerning the price variable in the context of real estate properties, a higher standard deviation could indicate greater variability in property prices, with some properties having significantly higher or lower prices than the average. On the other hand, a lower standard deviation might suggest that property prices are more homogeneous and closer to the mean. When using standard deviation as a constraint in an optimization problem, it is possible to adjust this parameter to model the desired tolerance regarding data variability. For instance, a constraint based on standard deviation could be employed to impose limits on the variability of weights assigned to variables in the optimized model. In analytical terms, relation (12) expresses this bound, with $i = 1, \ldots, h$, where h represents the number of observed comparable properties, $Price_i$ denotes the sales price of the i-th property, $w_i$ corresponds to the optimal weight for the i-th property, and $V^*$ is the estimated value of the subject determined by the optimal set of weights $w_i$. All this can be interpreted as an additional penalty to the model based on the deviation of the price variable from its mean, fixing a desired value for a quantity associated with the standard deviation of real estate prices.
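To make the role of the weights and of the dispersion bound concrete, the following sketch evaluates an estimate and checks it against an upper limit. It rests on two assumptions that the text does not confirm: that the estimated value is the weighted average of comparable prices, V* = Σ w_i · Price_i, and that the bounded quantity is the weighted standard deviation of prices around V*. The prices, weights, and bounds are hypothetical, and the weights are taken as given rather than produced by the optimization.

```python
import math


def estimated_value(prices, weights):
    """Assumed form of the estimate: V* = sum_i w_i * Price_i, with sum_i w_i = 1."""
    if len(prices) != len(weights):
        raise ValueError("prices and weights must have the same length")
    if not math.isclose(sum(weights), 1.0, rel_tol=0.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    return sum(w * p for w, p in zip(weights, prices))


def weighted_price_std(prices, weights):
    """Assumed dispersion measure: sqrt(sum_i w_i * (Price_i - V*)^2)."""
    v_star = estimated_value(prices, weights)
    return math.sqrt(sum(w * (p - v_star) ** 2 for w, p in zip(weights, prices)))


def within_bound(prices, weights, sigma_max):
    """True when the weighted price dispersion does not exceed the upper limit."""
    return weighted_price_std(prices, weights) <= sigma_max


# Hypothetical comparables (euro) and a hypothetical weight vector.
prices = [200000.0, 240000.0, 275000.0, 330000.0]
weights = [0.1, 0.4, 0.4, 0.1]

v_star = estimated_value(prices, weights)
sigma = weighted_price_std(prices, weights)
tight_ok = within_bound(prices, weights, sigma_max=20000.0)
loose_ok = within_bound(prices, weights, sigma_max=40000.0)
```

In an optimization setting, a weight vector violating the tighter bound would be rejected, pushing the solution toward weights concentrated on comparables whose prices lie closer to the estimate.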

Context of the Dataset
Naples is an Italian city of approximately 910,000 inhabitants, the third largest in Italy by population, capital of the Campania Region and of the metropolitan city of the same name, and the center of one of the most populous and densely populated metropolitan areas in Europe [38]. Considered one of the great Italian and global tourist destinations, Naples is divided into 10 administrative municipalities of approximately one hundred thousand inhabitants each [39].
The average prices of residential properties in the city of Naples, updated to November 2023, are equal to €/sqm 2802, recording an increase of 3.47% compared to November 2022 (€/sqm 2708). In the last 2 years, the average price within the city of Naples reached its maximum in the month of October 2023, with a value of €2816 per square meter. The month in which the lowest price was requested was January 2022: an average of €2682 per square meter was requested for a property for sale [40]. The sample of comparables was drawn from the "Centre" area, where, in November 2023, residential properties for sale recorded an average offer price of €2304 per square meter, with an increase of 7.61% compared to November 2022 (€/sqm 2141) [40]. A general overview of the average real estate values for the residential segment of Naples is shown in Figure 1.

Data Specification
In accordance with phases 1 and 2 of the proposed model, the analysis of the reference real estate market segment has led to the identification of only eight real estate data points in the "Centre" area of Naples. Specifically, the sample of real estate data consists of eight properties located in multi-story condominiums and sold in 2023 (last six months). These properties are situated in a homogeneous urban area in terms of qualification and distribution of key services. Only nonhomogeneous real estate characteristics were detected for each sampled unit and, in particular (see Tables 1 and 2):
• real estate sale price expressed in euro (PRICE);
• commercial area of housing unit expressed in square meters, i.e., the sum of the internal area plus eventual other secondary areas virtualized through specific coefficients used in the respective real estate market (SUR);
• number of floor levels of housing units (LEV);
• maintenance status (MAIN) expressed with a score scale: two if the housing unit is in optimal condition, one if maintenance status is good, and zero otherwise (mediocre status);
• number of rooms constituting the housing unit (ROOMS);
• number of bathrooms in the housing unit (BATH).
Other real estate characteristics, manifesting with the same modalities in all the sampled units, were excluded from the dataset.

Results
Having defined the real estate sample, the methodology based on the Maximum Entropy Principle integrated with Lagrange multipliers was applied, assuming that, alternatively and individually, each comparable in the real estate sample constitutes the subject for which the value is to be determined. This allowed for effective testing of the methodology and avoiding the randomness of the result obtained on a single subject. All the computations were performed using MATLAB software, version 9.0.0 [36].
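The rotation of each comparable into the role of subject is a leave-one-out scheme, which can be sketched independently of the estimator used. In the code below, `unit_price_estimator` is only a hypothetical stand-in (mean price per square meter of the remaining properties times the subject's area) and not the Maximum Entropy model; the three records are invented for illustration and reuse the PRICE and SUR variable names.

```python
def leave_one_out(sample, estimator):
    """Treat each record in turn as the subject and estimate its price
    from the remaining records only. Returns (observed, estimated) pairs."""
    results = []
    for i, subject in enumerate(sample):
        comparables = sample[:i] + sample[i + 1:]
        results.append((subject["PRICE"], estimator(subject, comparables)))
    return results


def unit_price_estimator(subject, comparables):
    """Placeholder estimator (NOT the Maximum Entropy model): mean price per
    square meter of the comparables multiplied by the subject's area."""
    unit_prices = [c["PRICE"] / c["SUR"] for c in comparables]
    return subject["SUR"] * sum(unit_prices) / len(unit_prices)


# Hypothetical records; the real sample has eight properties and five features.
sample = [
    {"PRICE": 200000.0, "SUR": 80.0},
    {"PRICE": 260000.0, "SUR": 100.0},
    {"PRICE": 300000.0, "SUR": 125.0},
]
pairs = leave_one_out(sample, unit_price_estimator)
```

Any estimator with the same `(subject, comparables)` signature can be passed in, so the harness stays unchanged when the placeholder is replaced by a constrained-entropy solver.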
For each subject, the solution and optimal value of the objective function were then processed, along with the definition of weights for each optimal solution. In general, to obtain the predicted probabilities for a specific observation, one can use the optimal weights obtained from the optimization process and then apply them to the prediction function. In the case of interest, optimal weights for the variables ROOMS, BATH, SUR, LEV, and MAIN were obtained during the optimization process (see Tables 3-10). In more operational terms, this means that to obtain a specific appraisal for a subject, the initialization vector of values needs to be adjusted to provide a starting point closer to the desired solution for optimization. Alternatively, it is possible to add tighter constraints for the moments of consistency, forcing the variables to take on the desired values. Both options should lead to similar results if configured correctly.
After estimating the market value for each individual subject in correspondence to the optimal solution, the model was reapplied for each of them, considering the application of an exogenous nonlinear limit (upper limit of the standard deviation for the PRICE variable), in accordance with relation (12), significantly increasing the degree of estimation complexity. A constraint so imposed could have relevant meaning, for example, when there is nonsample information available, in terms of subjective upper or lower bounds on the final value estimate. With this aim, the constraint can be incorporated into the model to meet this requirement.
Table 11 presents the optimal solution of the objective function imposing a specific upper limit on the standard deviation of the PRICE variable, set, for example, at €5000, €10,000, and €20,000. In Tables 3-10, varying the sample estimate also changes the incidence of weights, expressing a "ranking" of real estate characteristics for the respective sample estimate considered each time. The sum of the products between the optimal feature weights and the corresponding prices was also incorporated into the objective function. However, the formulation of the objective function and constraints can be further customized to one's specific needs.
The main results obtained from Tables 3-11 are summarized in Table 12, where observed real estate prices and corresponding values estimated by the model are reported, along with the percentage divergences between them. These divergences are categorized into two clusters: the Maximum Entropy "basic" model and the "best fit" model. In the "basic" model, the results pertain to the definition of the objective function based on Shannon's Entropy, as well as the Lagrangian function that includes constraints related to variability, normalization, and coherence moments for real estate variables. In the "best fit" column, improved results compared to the basic model are presented when an additional constraint is incorporated, namely an upper limit on the standard deviation concerning the "PRICE" variable.
In the basic model, the absolute percentage error ranges from a minimum of 2.43% to a maximum of 15.61%, with an average absolute error percentage of 6.91%. Regarding the "best fit" solution, the absolute percentage error varies from a minimum of 2.43% to a maximum of 12.27%, with an average absolute error percentage of 5.12%. In the latter case, the results show significant improvements, especially in cases where errors were higher in the basic model: for comparable three, the error changes from −10.11% to −8.64%; for comparable five, from 3.43% to −2.05%; and for comparable six, from −15.61% to −4.15%.
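The percentage divergences and their average can be computed as in the following sketch. The sign convention (estimated minus observed, relative to observed) is an assumption, since the text does not state it, and the two price pairs are hypothetical rather than taken from Table 12.

```python
def percentage_errors(observed, estimated):
    """Signed percentage divergence of each estimate from the observed price.

    Assumed convention: 100 * (estimated - observed) / observed, so a negative
    value means the model underestimates the observed price.
    """
    if len(observed) != len(estimated):
        raise ValueError("observed and estimated must have the same length")
    return [100.0 * (e - o) / o for o, e in zip(observed, estimated)]


def mean_absolute_percentage_error(observed, estimated):
    """Average of the absolute percentage divergences."""
    errors = percentage_errors(observed, estimated)
    return sum(abs(err) for err in errors) / len(errors)


# Hypothetical observed and estimated prices (euro).
observed = [200000.0, 250000.0]
estimated = [190000.0, 260000.0]

errors = percentage_errors(observed, estimated)
mape = mean_absolute_percentage_error(observed, estimated)
```

Taking absolute values before averaging prevents over- and underestimates from cancelling, which is why the reported averages are stated in terms of absolute percentage error.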

Discussion
The results obtained through the application of the proposed method are excellent and suggest that the Maximum Entropy Principle with Lagrange multipliers can be successfully used for real estate valuations. In the case study, the average prediction error for sales prices ranged from 5.12% to 6.91%, indicating a very high potential for its application in real estate valuations.
The proposed methodology has demonstrated particularly strong adaptability in cases where small real estate samples, consisting of a limited number of transaction data points, must be used. Such cases arise from spatial and temporal limitations in data collection, the atypicality and complexity of real estate, as well as challenges in certain national contexts (e.g., Italy) arising from difficulties in obtaining and verifying market prices. The most common purposes to which the model can be applied include real estate sale or purchase, mortgage or financing, inheritance or judicial divisions, legal disputes, regeneration and real estate development operations, leasing or rent, asset valuations, insurance, and regulatory compliance.
In practical circumstances, the real estate samples may be so small as to preclude the use of statistical tools designed for samples categorized as small or very small in the statistical sense [41,42].
In these latter circumstances, the so-called "Appraisal System" (or "General Appraisal System", SGA) and the "Market Comparison Approach" (MCA) have traditionally represented the only paradigm capable of providing a quantitative measure of the value of a property [43]. Compared to these established methodologies, the Maximum Entropy Principle with Lagrange multipliers aims to be a valid alternative with superior advantages.
Among the drawbacks of SGA is the strong instability in solving the mathematical system that supports it in the presence of possible anomalies or outliers, with direct consequences on the estimation of marginal prices of real estate characteristics considered for comparison. In MCA, marginal prices are almost always determined empirically, so their estimation might not accurately reflect the dynamics of real estate market price formation. Unlike these limitations, the method proposed in this study does not require inferring marginal prices of real estate characteristics and demonstrates considerable stability from a computational perspective. This is because entropy is exploited as a tool for modeling: by using the Maximum Entropy method, it is possible to determine the probability of a configuration or set of real estate characteristics for the purpose of property valuation.
Future developments of the methodology certainly concern the minimum size of the real estate sample necessary to implement the methodology, the incorporation of qualitative features in the dataset (e.g., architectural, cultural or environmental resources, panoramic views, etc.), the study of constraints and parameters capable of optimizing the result computationally, and the potential integration with other statistical methodologies to explain the marginal prices of real estate characteristics considered in the model (e.g., hedonic price models).

Conclusions
In the dynamic landscape of the real estate sector, the accuracy of property valuations has become a crucial element for market operators, investors, and industry professionals. The increasing complexity of factors influencing property values requires a more sophisticated approach, where scientific research and the use of advanced computational tools become essential. This study aimed to explore the convergence between scientific research and the practical needs of real estate valuation, highlighting the importance of robust and accurate statistical models to address current and future challenges in the real estate sector.
Traditional real estate valuations, based on conventional approaches, can often underestimate the complexity of market dynamics, macroeconomic trends, and specific property characteristics. The need for more in-depth analysis is evident, pushing industry experts to increasingly turn to advanced statistical tools to obtain more reliable and informed estimates.
The Maximum Entropy Principle applied in this study allows deriving models substantially analogous to those of Statistical Mechanics, without the need to formulate any specific hypotheses to develop models describing the statistical behavior of an economic system. This is not the first time that economic sciences draw from the "hard" sciences, or vice versa, namely scientific disciplines that use quantitative, empirical, and often experimental methods to study natural phenomena. However, compared to the past, the current perspective is entirely new: no longer are universal laws sought to define the economic nature of a good according to a deterministic view of reality, but rather, there are some fundamental mechanisms that naturally emerge in a network of interconnected elements, regardless of the nature of the network itself. There are, in other words, some dynamics that repeat when there is a set of interacting units, be they humans, particles, or economic goods and systems. Often, in practice, recourse is made to mean-field models, which imply strong assumptions and simplifications, but in reality, the interactions between individual units of a system do not always have constant intensity.
In this context, the present research aims to represent a significant innovation in real estate valuation methodologies. Currently, the utilization of sophisticated statistical models and the integration of complex variables are crucial aspects for overcoming the limitations of traditional valuations.

Figure 1. Average real estate values for the residential segment of Naples, with the "Centre" area delimited by a blue line (source: www.immobiliare.it (accessed on 20 December 2023) [40]).

Table 2. Statistical description of real estate dataset.

Table 3. Solutions for comparable no. 1 (if considered as subject).

Table 4. Solutions for comparable no. 2 (if considered as subject).

Table 5. Solutions for comparable no. 3 (if considered as subject).

Table 6. Solutions for comparable no. 4 (if considered as subject).

Table 7. Solutions for comparable no. 5 (if considered as subject).

Table 8. Solutions for comparable no. 6 (if considered as subject).

Table 9. Solutions for comparable no. 7 (if considered as subject).

Table 10. Solutions for comparable no. 8 (if considered as subject).

Table 11. Optimal solutions of the objective function for different standard deviation values.