Contextualized Property Market Models vs. Generalized Mass Appraisals: An Innovative Approach

The present research addresses the current and widespread need for rational valuation methodologies that can correctly interpret the available market data. An innovative automated valuation model has been applied simultaneously to three Italian study samples, each consisting of two hundred residential units sold in 2016–2017. The main contribution of the proposed methodology is its ability to generate a "unique" functional form for the three territorial contexts considered, in which the relationships between the influencing factors and the selling prices are specified by different multiplicative coefficients that represent the market phenomena of each case study. The method can support private operators in assessing the convenience of territorial investments, and public entities in the decision-making phases of future tax and urban planning policies.


Introduction
In recent years, the awareness of the "value of data" and its growing availability have led to an increasing diffusion of innovative techniques to measure, process, and interrelate the available information in all sectors. In the USA, the National Bureau of Economic Research has highlighted the opportunities offered by big data and its implications for the economics profession [1]. Several scholars [2][3][4] have outlined the prominent role of "spatial" big data, in both the public and private sectors, in the future management and governance of smart cities. In the urban planning sector, the concept of cognitive cities aims at overcoming the use of topographic maps and at making the interconnections among the subjects involved (governments, citizens, businesses) explicit, through the inclusion of cognitive decision-making processes that take into account a set of experiences and services that appropriately reflect collective needs [5,6].
Following the US subprime mortgage crisis, the real estate sector has also needed to systematize property data coming from heterogeneous sources [7,8] and to exploit the potential of more complex algorithms and software, usually named automated valuation models (AVMs), in processing the available big data [9], in order to guarantee more objective and reliable valuations and to explain the functional relationships that link market factors with property values. In particular, the term "Proptech" (property technology) refers to the new frontier of property data management: By accumulating real estate data (sales, management, user interaction with the site), it is possible to outline growth trends in specific market segments and/or geographical areas, allowing operators in the sector to identify initiatives that are attractive to potential investors. This new market largely takes into account art. 208 (3) (b) of the Capital Requirements Regulation (EU) No. 575/2013, which is the main EU law aimed at decreasing the likelihood that banks become insolvent, and which highlights the importance that credit institutions "use statistical methods to monitor the value of the immovable property and to identify the immovable property that needs revaluation".
Although the utility of AVMs for urban planning and for improving the performance of urban information systems has been widely recognized [10], several authors have highlighted the danger of using "black boxes", i.e., models that are not transparent and/or are difficult for less competent users to manage [11,12], who are inevitably forced to rely on experts, even when they are not sure how good these experts are [13]. This points out the need for mass appraisal models that can, on the one hand, be constantly updated in the short term according to variations in the starting scenario conditions and, on the other hand, ensure the empirical reliability of the results and provide functional relationships that are easily interpretable and repeatable in different territorial contexts, avoiding excessively complex solutions, which inevitably lead to excessive automation of the process and increased uncertainty for the end user [14].
The present paper aims at providing a contribution within the framework stated above. The research proposes the application of an innovative AVM, able to generate a "unique" functional form that explains the market behavior in the areas considered, to data samples collected in different territorial contexts. The relationships between the influencing factors and the selling prices are specified by different multiplicative coefficients that appropriately represent the market peculiarities of each case study. The analysis has been carried out on three study samples, each consisting of two hundred residential units sold in 2016-2017, located respectively in the South of Italy (Bari), in the Center of Italy (Rome), and in the North of Italy (Turin). For each study sample, the selling prices and the main technical and spatial factors that influence price formation have been recorded.
The advantages of the research relate to the definition of a methodology that can be implemented simultaneously in many territorial contexts: The outputs are models characterized by the same functional structure in terms of mathematical correlations among the variables, but differentiated by the multiplicative coefficients that reflect the specific market phenomena of each territorial area analyzed. Therefore, the proposed methodology can support all the professional subjects involved in property market dynamics: (i) The public administrations, who could use the procedure for future tax policies, for urban planning decisions, and for verifying the reliability of the property selling prices declared in private transactions; (ii) the institutional entities (e.g., banks, asset management companies), who could quickly monitor the volatility of the property market, as provided by art. 208 (3) (a) of the Capital Requirements Regulation (EU) No. 575/2013, which stresses the need for frequent verification of real estate values, and therefore the performance of the financed investments in any territorial area; (iii) the private/public investors interested in property development initiatives, who could automatically assess the revenues generated by these investments in different territorial contexts and identify the most convenient interventions; (iv) the private/public operators interested in investing in the existing property stock, who could define the combinations of properties located in different territorial areas that minimize the total differential between the actual market values and the respective asking prices; and (v) ordinary purchasers, who could independently verify the market reliability of the asking price of the property they are interested in.
Although the applications carried out have some limitations, mainly due to the sample sizes that it has been possible to collect, the research aims at providing a further contribution to the existing literature on mass appraisal methods through the analysis of a single functional form and the consequent generalization of the functional relationships among property market variables, even when the locations of the study samples are very different.
The paper is structured as follows. Section 2 reports the main reference literature on mass appraisal techniques. Section 3 presents the case studies and illustrates the influencing factors considered in the elaborations. Section 4 explains the implemented AVM. Section 5 describes the application of the method to the case studies, highlights the outputs obtained in terms of statistical performance and empirical reliability of the functional relationships, and interprets the results. In Section 6, the hedonic price method is applied to the sample data of each city in order to compare the results and outline the potential of the proposed methodology. Finally, Section 7 discusses the conclusions of the research.

Background on the Main Mass Appraisal Techniques
The numerous applications in the reference literature have made it possible to outline the potential of mass appraisal techniques. McCluskey and Anand [15] reviewed the main hybrid intelligent techniques for the mass appraisal of residential properties, highlighting their respective strengths and weaknesses. Pagourtzi et al. [16] and French and Gabrielli [17] collected the main AVMs employed in the real estate market, in order to provide a better understanding of the measurement of locational effects. Metzener and Kindt [18] schematized the main parameters to be considered when implementing an AVM for the assessment of residential properties. D'Amato and Kauko [19] highlighted the potential of AVMs for property assessments after the 2008 US subprime crisis.
In addition to the classic hedonic price method, numerous scholars have implemented different data-driven techniques, frequently characterized, on the one hand, by higher flexibility than the common hedonic price approach and, on the other hand, by the difficulty of providing an explicit empirical interpretation of the functional relationships between the influencing factors and property values [54]. Starting from the recent analysis reported in Kauko and D'Amato [55], the following mass appraisal methodologies have been studied in the reference literature and implemented in the real estate sector.
Fuzzy logic models have been proposed by Byrne [65], Bagnoli and Smith [66], and Mao and Wu [67]. Chang and Ko [68] combined fuzzy set theory with a multi-objective programming approach in order to determine the relationship between land use and future urban planning. Bonissone et al. [69] developed a neuro-fuzzy system for the property market. Siniak [70] elaborated a fuzzy method as an evolution of the standard valuation approaches.
Autoregressive integrated moving average models have been implemented in real estate appraisal by Tse [71] and Makridakis et al. [72]. Vector autoregressive approaches have been developed by Sivitanides et al. [73], Iacoviello [74], Elbourne [75], and Chen et al. [76]. Kim Hin and Calero Cuervo [77] employed a cointegration approach to demonstrate the correlation among residential property prices, real gross domestic product, and mortgage lending rates.
Methods of spatial statistical analysis have been applied to determine the influence of access to services [78][79][80][81]. Some authors have implemented spatial autocorrelation statistics [82,83] and kriging techniques [84,85] to identify the neighborhood factors that most affect property prices. La Rose [86] studied the potential of triangulated irregular networks for predictive purposes.
With reference to goal programming applied to the property market, Kettani et al. [87] developed an evaluation model that integrates the multicriteria approach with mathematical programming. Kettani and Khelifi [88] elaborated a decision support model for the rationalization of the municipal tax system. Estellita Lins et al. [89] designed a procedure based on data envelopment analysis for the evaluation of housing prices in different neighborhoods of Rio de Janeiro. Adolphson et al. [90] applied goal programming techniques to evaluate the obsolescence of properties owned by State Railways.
Recently, genetic algorithms have been applied to the real estate market by several authors. Król et al. [91] developed a fuzzy rule-based system to support property assessment, employing an evolutionary algorithm to generate the rule base. Wang [92] proposed a decision support system that, through data envelopment analysis, converts numerical data into information that can be used to evaluate real estate investments. Dzeng and Lee [93] elaborated a model to optimize the development schedule of resort projects using a polyploid genetic algorithm.
A specific multi-objective genetic algorithm approach for identifying the "best" regression models, named evolutionary polynomial regression, has recently been applied to several property issues. Initially applied to hydraulic systems [94][95][96], the methodology has been borrowed to elicit the correlations between the property market and socio-economic variables [97,98] and energy factors [99], and for property tax purposes [100]. Other studies have highlighted the advantages of the methodology compared to orthodox [101] and innovative mass appraisal approaches [102]. To the best of our knowledge, there are no other applications to the property market so far.

Case Studies
The three study sample areas are located in Italian territorial contexts that are different for geographical, socio-economic, and climatic conditions: The first one was collected in the city of Bari, which overlooks the Adriatic Sea and is the capital of the Apulia region, in Southern Italy; the second study sample was detected in the city of Rome, capital of Italy, located in the Lazio region, in Central Italy; the third study sample area is located in the city of Turin, capital of the Piedmont region, in Northern Italy.
A sample of two hundred residential units sold in 2016-2017 was collected for each of the three Italian cities analyzed. Regarding the selection of the factors that mainly contribute to the formation of the selling prices in the market segment of each property, the reference literature has widely highlighted the unavoidable trade-off, in this phase, between the bias from omitted variables and the increased sampling variance associated with collinearity [103][104][105][106], even though there is relative agreement on the major influencing factors [107,108]. In this research, the influencing factors have been identified by taking into account the indications of the local appraisers and real estate agents consulted [109]. Therefore, having selected the unit selling price, expressed in €/m² [P], as the dependent variable of the model, the following technical and spatial factors have been considered.

Technical factors:
- The presence of the lift [L]. In the model, this is a dummy variable: The presence of the service is represented by the value "one", and its absence by the value "zero";
- The quality of the maintenance condition of the apartment, taken as a qualitative variable and differentiated, through a synthetic evaluation, into the categories "to be restructured" [Mp], "good" [Mg], and "excellent" [Me]. Following the logic of dummy variables, the score "one" is assigned to the category that defines the specific quality of each property, and the score "zero" to the remaining two categories [110]. In particular, the "to be restructured" state refers to properties that require significant refurbishment, because the functionality of the property is compromised by the inadequate condition of its components; the "good" state indicates properties whose maintenance conditions are acceptable and whose functions can be carried out without heavy interventions. Finally, the "excellent" state refers to buildings of high constructive and aesthetic quality, possibly subject to recent redevelopment and renovation;
- The energy performance certificate (EPC) label [Ep], expressed, according to the current regulations, through the classes from A4 (the highest level) to G (the lowest level). In the present research, the EPC labels from A4 to A are grouped into a single explanatory variable (EpA). Therefore, the variables considered are specified by the following abbreviations, which recall the label they belong to: EpA, EpB, EpC, EpD, EpE, EpF, EpG. Each of them is treated as a dummy variable, assigning the score "one" to the EPC label of the property and the score "zero" to all the others;
- The age of the building in which the residential unit is located [O], calculated as the difference between the year in which the property was sold (2016 or 2017) and the year of construction of the building.

Spatial factors:
- The distance from the nearest highway [T], expressed as the driving distance in kilometers (determined through www.google.com/maps);
- The distance from the nearest subway station [W], expressed as the walking distance in kilometers (determined through www.google.com/maps);
- The municipal trade area in which the property is located, according to the geographical subdivision developed by the Italian Revenue Agency (http://www.agenziaentrate.gov.it), because of the different location characteristics that contribute to the formation of the selling prices. In general, the Italian Revenue Agency defines five trade areas: "central", "semi-central", "peripheral", "suburban", and "rural"; for the cities under analysis, it considers four: "central" [Uc], "semi-central" [Usc], "peripheral" [Up], and "suburban" [Usb]. For each property, the score "one" is assigned to the trade area the property belongs to, and the score "zero" to the remaining trade areas.
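The dummy coding described above can be sketched as follows. This is a minimal Python illustration, assuming a hypothetical record layout: the field names `lift`, `maintenance`, `epc`, `sale_year`, `construction_year`, `km_to_highway`, `km_to_subway`, and `trade_area` are not taken from the study, and only the factors described in this section are encoded.

```python
# Hypothetical field names; the variable codes (Mp, EpA, Uc, ...) follow the text.
MAINT_CODES = {"to be restructured": "Mp", "good": "Mg", "excellent": "Me"}
EPC_VARS = ("EpA", "EpB", "EpC", "EpD", "EpE", "EpF", "EpG")
AREA_CODES = {"central": "Uc", "semi-central": "Usc", "peripheral": "Up", "suburban": "Usb"}


def epc_variable(label: str) -> str:
    """Map an EPC label (A4 ... G) to its variable; A4, A3, A2, A1 and A collapse into EpA."""
    label = label.strip().upper()
    var = "EpA" if label.startswith("A") else "Ep" + label
    if var not in EPC_VARS:
        raise ValueError(f"unknown EPC label: {label!r}")
    return var


def encode(unit: dict) -> dict:
    """Turn one hypothetical sale record into the technical and spatial variables."""
    if unit["maintenance"] not in MAINT_CODES or unit["trade_area"] not in AREA_CODES:
        raise ValueError("unknown maintenance category or trade area")
    row = {"L": 1 if unit["lift"] else 0}
    # One-hot maintenance condition: exactly one of Mp, Mg, Me equals 1.
    for name, code in MAINT_CODES.items():
        row[code] = int(unit["maintenance"] == name)
    # One-hot EPC label after grouping A4..A into EpA.
    ep = epc_variable(unit["epc"])
    for var in EPC_VARS:
        row[var] = int(var == ep)
    # Age O = year of sale minus year of construction.
    row["O"] = unit["sale_year"] - unit["construction_year"]
    row["T"] = unit["km_to_highway"]  # driving distance, km
    row["W"] = unit["km_to_subway"]   # walking distance, km
    # One-hot trade area: exactly one of Uc, Usc, Up, Usb equals 1.
    for name, code in AREA_CODES.items():
        row[code] = int(unit["trade_area"] == name)
    return row


example = {
    "lift": True, "maintenance": "good", "epc": "A2",
    "sale_year": 2017, "construction_year": 1975,
    "km_to_highway": 3.2, "km_to_subway": 0.6, "trade_area": "semi-central",
}
row = encode(example)
print(row["EpA"], row["Mg"], row["O"], row["Usc"])  # 1 1 42 1
```

Raising an error on unknown categories, rather than silently emitting an all-zero group, keeps the "exactly one dummy per factor" property that the coding scheme relies on.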
Some considerations should be made on the sizes of the three data samples, which could appear to be the main limitation of the applications performed. Although they are not very large, the samples compare favorably with those of other Italian mass appraisal applications, especially considering the structural opacity that generally characterizes the Italian real estate market. For example, the sample size is 33 observations in Simonotti [111], 66 in Curto [112], 64 in Del Giudice and De Paola [113], and 114 in D'Amato [114]. The sample size clearly limits the number of influencing factors that can be included in the regression model. Furthermore, the sizes of the three data samples satisfy the rule of thumb established in the current literature [115], according to which N ≥ 50 + 8m for the multiple correlation and N ≥ 104 + m for the partial correlation, where N is the number of observations in each data sample and m is the number of influencing factors considered. Table 1 summarizes the variables in the analysis, specifying for each of them the acronym used, the type (cardinal or dummy), and the measurement unit. Tables 2-4 show the main descriptive statistics of the selling prices and of the influencing factors for the three cities analyzed. The average selling prices of the collected properties are consistent with the quotations determined by official entities for the period analyzed (www.scenariimmobiliari.it).
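Solving each inequality of the rule of thumb for m shows how the sample size bounds the number of factors. The Python sketch below does that; the sample sizes passed in are illustrative, and whether each dummy variable counts as a separate factor in m is left open here.

```python
def rule_of_thumb(n: int, m: int) -> tuple[bool, bool]:
    """Check N >= 50 + 8m (multiple correlation) and N >= 104 + m (partial correlation)."""
    if n < 0 or m < 0:
        raise ValueError("n and m must be non-negative")
    return n >= 50 + 8 * m, n >= 104 + m


def max_factors(n: int) -> tuple[int, int]:
    """Largest m satisfying each inequality for sample size n (negative if even m = 0 fails)."""
    return (n - 50) // 8, n - 104


print(max_factors(200))        # (18, 96)
print(rule_of_thumb(200, 18))  # (True, True)
print(rule_of_thumb(200, 19))  # (False, True)
```

With N = 200, the multiple-correlation rule is the binding one.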

The Method
The methodological approach adopted, named evolutionary polynomial regression (EPR), is a generalization of stepwise regression. The method employs a simple genetic algorithm engine to combine the variables and to explore the mathematical structures of the model through continuous iteration. Specifically, the central idea of EPR is to search for the best form of the function, which is a combination of independent variable vectors (the chosen variables, i.e., the model inputs), by performing a least squares regression to obtain the coefficient of each term, in which the variables are raised to the appropriate exponents. The technique is therefore based on both numerical and symbolic regression: EPR searches for the form of polynomial expressions in which the monomial terms can be more or less complex combinations of the input variables, and it finds the values of the constants in the expressions by least squares optimization. The generic expression of the methodology is represented by Equation (1), which describes a generic nonlinear model structure:

$$Y = a_0 + \sum_{i=1}^{n} a_i \cdot (X_1)^{ES(i,1)} \cdots (X_j)^{ES(i,j)} \cdot f\left[(X_1)^{ES(i,j+1)} \cdots (X_j)^{ES(i,2j)}\right] \qquad (1)$$

where Y is the dependent variable; n is the number of additive terms, i.e., the length of the polynomial expression (constant additive term excluded); a0 is the constant additive term; ai are the numerical parameters to be estimated; X1, ..., Xj are the j candidate explanatory variables; ES(i, l), with l = 1, ..., 2j, is the exponent of the l-th input within the i-th term of Equation (1); and f is a function constructed by the process. The exponents ES(i, l) are selected from a set of candidate values (real numbers) defined by the user. The structure of f is set by the user, according to any physical insight.
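For a fixed set of exponents, the least squares step of Equation (1) reduces to an ordinary linear regression on the monomial terms. The Python/NumPy sketch below illustrates only that inner step. It assumes that f is omitted and that the exponent matrix `ES` is given. The function names are hypothetical, and this is not the EPR toolbox code.

```python
import numpy as np


def design_matrix(X: np.ndarray, ES: np.ndarray) -> np.ndarray:
    """Constant column for a0, then one column per term: prod_l X[:, l] ** ES[i, l].

    X has shape (N, j) (N observations, j candidate inputs);
    ES has shape (n, j) (n additive terms). The inner function f is omitted.
    """
    with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
        terms = np.prod(X[:, None, :] ** ES[None, :, :], axis=2)  # shape (N, n)
    return np.column_stack([np.ones(X.shape[0]), terms])


def fit_structure(X: np.ndarray, y: np.ndarray, ES: np.ndarray):
    """Least squares estimate of a0, a1..an for a fixed exponent matrix ES.

    Returns (coefficients, SSE), or (None, inf) if some term is undefined,
    e.g. 0 raised to a negative exponent.
    """
    A = design_matrix(X, ES)
    if not np.all(np.isfinite(A)):
        return None, np.inf
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(np.sum((A @ coef - y) ** 2))
    return coef, sse


# Synthetic check: y = 1 + 2 * x1 * x2**0.5 + 0.5 / x1 (exactly representable structure).
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 5.0, size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] * np.sqrt(X[:, 1]) + 0.5 / X[:, 0]
ES = np.array([[1.0, 0.5], [-1.0, 0.0]])
coef, sse = fit_structure(X, y, ES)
print(coef, sse)  # coefficients close to [1, 2, 0.5], SSE close to 0
```

Rejecting structures with non-finite columns is one simple way to handle dummy inputs equal to zero combined with negative exponents. It is an assumption of this sketch, not a documented EPR rule.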
The iterative exploration of the model mathematical structures, carried out by testing combinations of exponents for each candidate input variable in Equation (1), follows a population-based strategy that applies the genetic algorithm: Its individuals are the sets of exponents in Equation (1), drawn from the candidate values chosen by the user according to the complexity required for the model. For example, restricting the exponents to the set {0, 1} allows each input to appear at most linearly in each term, whereas also including the exponent "2" admits quadratic terms. Therefore, the number and complexity of the solutions generated by the genetic algorithm depend on the maximum number of terms allowed and on the candidate exponents defined by the user in the preliminary phase. It should be pointed out that EPR does not require the mathematical expression and the number of parameters that best fit the detected data to be defined a priori, since the iterative process of the underlying genetic algorithm directly generates the best mathematical expression.
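The population-based search can be illustrated with a deliberately simple genetic algorithm in Python/NumPy. This sketch rests on several assumptions and is not the EPR implementation. Individuals are exponent matrices drawn from a small, illustrative candidate set. Fitness is the sum of squared errors only, a single objective. Selection, crossover, and mutation are generic textbook choices: binary tournament, uniform crossover, and random reset.

```python
import numpy as np

# Illustrative candidate exponents (the user-defined set in a real application).
CANDIDATES = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])


def sse_of(X, y, ES):
    """SSE of the least squares fit for exponent matrix ES (inf if a term is undefined)."""
    with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
        terms = np.prod(X[:, None, :] ** ES[None, :, :], axis=2)
    A = np.column_stack([np.ones(len(X)), terms])
    if not np.all(np.isfinite(A)):
        return np.inf
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((A @ coef - y) ** 2))


def tournament(pop, fitness, rng):
    """Binary tournament: return the better of two randomly drawn individuals."""
    i, k = rng.choice(len(pop), size=2, replace=False)
    return pop[i] if fitness[i] <= fitness[k] else pop[k]


def evolve(X, y, n_terms=2, pop_size=30, generations=40, p_mut=0.1, seed=0):
    """Single-objective GA over exponent matrices (accuracy only, not Pareto-based)."""
    rng = np.random.default_rng(seed)
    j = X.shape[1]
    pop = rng.choice(CANDIDATES, size=(pop_size, n_terms, j))
    fitness = np.array([sse_of(X, y, ind) for ind in pop])
    history = [fitness.min()]
    for _ in range(generations):
        children = [pop[fitness.argmin()].copy()]  # elitism: best survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop, fitness, rng), tournament(pop, fitness, rng)
            mask = rng.random(p1.shape) < 0.5        # uniform crossover
            child = np.where(mask, p1, p2)
            mutate = rng.random(child.shape) < p_mut  # reset some exponents at random
            child[mutate] = rng.choice(CANDIDATES, size=int(mutate.sum()))
            children.append(child)
        pop = np.array(children)
        fitness = np.array([sse_of(X, y, ind) for ind in pop])
        history.append(fitness.min())
    best = int(fitness.argmin())
    return pop[best], fitness[best], history


rng = np.random.default_rng(1)
X = rng.uniform(1.0, 5.0, size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] * np.sqrt(X[:, 1]) + 0.5 / X[:, 0]
best_ES, best_sse, history = evolve(X, y)
print(best_ES, best_sse)
```

Elitism guarantees that the best SSE never increases from one generation to the next. The candidate exponent set and the number of terms directly control the size of the space the algorithm has to explore.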
The main strength of EPR is its ability to pursue the optimal Pareto frontier of three objective functions. These objectives are conflicting and aim at (i) maximizing model accuracy, through the satisfaction of the appropriate statistical criteria for verifying the equation; (ii) maximizing model parsimony, through the minimization of the number of terms (ai) of the equation; and (iii) reducing model complexity, through the minimization of the number of explanatory variables (Xi) in the final equation. Through a spreadsheet implemented in Excel (www.hydroinformatics.it), the optimization strategy, based on the Pareto dominance criteria defined above, generates at the end of the modelling phase a set of model solutions that represent the Pareto front of optimal models for the three objectives. In this way, a range of models is offered to the operator, who can choose the most appropriate solution according to the specific needs, the knowledge of the phenomenon under analysis, and the type of experimental data used. This feature of EPR is an important innovation compared to other mass appraisal techniques (e.g., a canonical hedonic price method): In the real estate literature, the classic way of treating the correlation between property prices and influencing factors consists of testing the main functional forms (linear, semi-log, quadratic, log-linear) or employing the Box-Cox and Box-Tidwell transformations [116,117], and then selecting the function that best fits the data.
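The Pareto dominance criterion behind the set of optimal models can be shown in a few lines of Python. The candidate models and their objective values below are hypothetical. All three objectives are expressed as quantities to minimize: the SSE, the number of terms, and the number of explanatory variables.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    sse: float     # accuracy objective (lower is better)
    n_terms: int   # parsimony objective (lower is better)
    n_inputs: int  # complexity objective (lower is better)


def objectives(c: Candidate) -> tuple:
    return (c.sse, c.n_terms, c.n_inputs)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on at least one."""
    oa, ob = objectives(a), objectives(b)
    return all(x <= y for x, y in zip(oa, ob)) and any(x < y for x, y in zip(oa, ob))


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


models = [
    Candidate("A", 10.0, 1, 1),
    Candidate("B", 6.0, 2, 3),
    Candidate("C", 6.5, 3, 3),   # worse than B on SSE and terms, equal on inputs
    Candidate("D", 4.0, 4, 5),
    Candidate("E", 12.0, 1, 2),  # worse than A on SSE and inputs, equal on terms
]
print([c.name for c in pareto_front(models)])  # ['A', 'B', 'D']
```

The front contains a simple but less accurate model (A), a more accurate and more complex one (D), and a compromise (B). This is the kind of range from which the operator chooses.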
Taking into account the advantages of EPR, an evolution of the methodology has been proposed and tested in this research. This approach develops generalized prediction models by simultaneously considering the variables of several individual data samples. With reference to the property market, the methodological approach, named multi-case strategy for EPR [118], simultaneously identifies the best set of significant explanatory variables (i.e., the influencing factors) and their best combination to describe the same phenomenon (i.e., the selling prices) in all the data samples analyzed. The multi-case strategy borrows the genetic algorithm search procedure of EPR, using the least squares method to estimate the unknown polynomial coefficient values for all the data samples simultaneously, and calculating the three objective function values (sum of squared errors, number of polynomial terms, number of significant explanatory variables) to assess the fitness of each model structure. In particular, the multi-case strategy does not compute the accuracy of each data sample's model as a separate objective to be maximized (as the basic EPR does); instead, it merges the accuracy measures of all data samples into a single fitness value, named CODMCS and defined in Equation (2):

$$COD_{MCS} = 1 - \frac{N-1}{N} \cdot \frac{\sum_{k=1}^{m} \sum_{N_k} \left(y_k - y_{detected}\right)^2}{\sum_{k=1}^{m} \sum_{N_k} \left(y_{detected} - \mathrm{mean}(y_{detected})\right)^2} \qquad (2)$$

where m is the number of data samples for which a generalized prediction model is required (k = 1, ..., m); Nk is the size (i.e., number of individuals) of the k-th data sample; N is the total number of individuals in all the m data samples analyzed; yk is the value of the dependent variable estimated by the methodology through statistical inference on the k-th vector of parameters; ydetected is the corresponding observation; and mean(ydetected) is the average of the values collected in the m data samples.
The closer CODMCS is to one, the better the model structure represents the overall observed dataset.
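The multi-case idea can be summarized as one shared structure with separate coefficients per sample, scored with a single pooled fitness. The Python/NumPy sketch below assumes that CODMCS has the pooled form 1 − (N − 1)/N · Σk Σ(yk − ydetected)² / Σk Σ(ydetected − mean(ydetected))². In this form, the mean is taken over all observations of all samples. The three "cities" are synthetic, and the helper names are hypothetical.

```python
import numpy as np


def design(X, ES):
    """Constant column plus one monomial column per row of ES (f omitted)."""
    with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
        terms = np.prod(X[:, None, :] ** ES[None, :, :], axis=2)
    return np.column_stack([np.ones(len(X)), terms])


def cod_mcs(samples, ES):
    """Shared structure ES with separate least squares coefficients for each sample.

    samples: list of (X_k, y_k) pairs. Returns (COD_MCS, list of coefficient vectors),
    assuming the pooled form 1 - (N-1)/N * SSE_total / SST_total.
    """
    sse, ys, coefs = 0.0, [], []
    for X_k, y_k in samples:
        A = design(X_k, ES)
        coef, *_ = np.linalg.lstsq(A, y_k, rcond=None)
        sse += float(np.sum((A @ coef - y_k) ** 2))
        ys.append(y_k)
        coefs.append(coef)
    y_all = np.concatenate(ys)
    n = y_all.size
    sst = float(np.sum((y_all - y_all.mean()) ** 2))
    return 1.0 - (n - 1) / n * sse / sst, coefs


# Three hypothetical samples: same structure y = a0 + a1*x1 + a2/x2, different coefficients.
rng = np.random.default_rng(0)
true = [(7.0, 0.3, -0.5), (7.6, 0.2, -0.8), (7.2, 0.4, -0.3)]
samples = []
for a0, a1, a2 in true:
    X = rng.uniform(1.0, 5.0, size=(200, 2))
    samples.append((X, a0 + a1 * X[:, 0] + a2 / X[:, 1]))

good, coefs = cod_mcs(samples, np.array([[1.0, 0.0], [0.0, -1.0]]))
bad, _ = cod_mcs(samples, np.array([[0.0, 1.0]]))  # omits x1 entirely
print(round(good, 6), bad < good)  # 1.0 True
```

The shared exponent matrix plays the role of the "unique" functional form. The per-sample coefficient vectors correspond to the market-specific multiplicative coefficients.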

The Generalized Model Obtained and Its Specification to the Case Studies
The multi-case strategy for EPR has been implemented by considering the base model structure of Equation (1) with no function f selected. Each generated algebraic expression consists of at most ten terms, and each additive monomial term is a combination of the inputs (i.e., the explanatory variables) raised to the proper exponents. Since a wider exponent range increases the number of elaborations performed by the genetic algorithm underlying the method, nine candidate exponents have been allowed (−2, −1.5, −1, −0.5, 0, 0.5, 1, 1.5, 2), in order to limit the computational complexity while still generating a wide range of models from which to select the "best compromise" in terms of CODMCS and empirical reliability of the relationships between the candidate inputs and the dependent variable (selling prices).
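A rough count shows why the exponent range drives the computational load. With nine candidate exponents and j inputs, each term can take 9^j exponent vectors, and one of these (all zeros) reduces to a constant. The Python sketch below counts the unordered sets of distinct non-constant terms, up to the ten-term limit. The values of j are illustrative, and the count ignores any further constraints the actual algorithm may apply.

```python
from math import comb

EXPONENTS = (-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2)  # the nine candidate values


def n_monomials(j: int) -> int:
    """Distinct non-constant terms: one exponent per input, minus the all-zero vector."""
    return len(EXPONENTS) ** j - 1


def n_structures(j: int, max_terms: int = 10) -> int:
    """Distinct unordered sets of 1..max_terms different non-constant terms."""
    t = n_monomials(j)
    return sum(comb(t, n) for n in range(1, max_terms + 1))


print(n_monomials(2), n_structures(2, 2))  # 80 3240
print(n_monomials(3))                      # 728
```

Even for a handful of inputs, the number of candidate structures grows far beyond exhaustive enumeration. This is the motivation for a genetic search.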
Table 5 reports the mathematical specification of the implemented multi-case strategy for EPR model.
In accordance with the mathematical formulations generally found in the current literature concerning the property market [27,119,120,121], the log-linear model has been used; the dependent variable to be estimated is therefore Y = ln(P). Compared to the classical linear form [123], the log-linear form has several attractive characteristics [122]: (i) It allows for the joint determination of expenditures in the regression, i.e., the price of one component partially depends on the other characteristics of the house; (ii) the coefficients are simple to interpret, as they approximately represent the percentage change in the dependent variable given a unit change in the explanatory variable [124]; (iii) it partially mitigates a common form of heteroskedasticity; and (iv) the model is easy to implement and interpret. The application of the algorithm to the three case studies has generated nine equations, reported in Table 6, that simultaneously describe the main relationships between the selling prices and the influencing factors in the Italian cities considered. The models are characterized, on the one hand, by a progressively more articulated and complex algebraic form and, on the other hand, by gradually increasing statistical accuracy: The equations present an increasing number of monomial terms and independent variables and, at the same time, a value of the statistical performance indicator (CODMCS) closer to one. The CODMCS of the models obtained varies between a minimum of 60.00% (Equation (3) in Table 6) and a maximum of 78.32% (Equation (11) in Table 6).
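The percentage interpretation in point (ii) is exact only after exponentiation. In a model of the form ln(P) = a0 + b·x, a unit increase in x multiplies P by exp(b), i.e., changes it by 100·(exp(b) − 1)%. This is close to 100·b only when b is small. The short Python sketch below shows the gap for a few illustrative coefficient values, which are not estimates from this study. It applies to terms that enter linearly, not to the nonlinear monomials that EPR may generate.

```python
import math


def pct_change(b: float, delta: float = 1.0) -> float:
    """Exact % change in P when x rises by `delta` in a model ln(P) = ... + b * x."""
    return 100.0 * (math.exp(b * delta) - 1.0)


for b in (0.01, 0.05, 0.30):
    print(f"b = {b:.2f}: approx {100 * b:.1f}%, exact {pct_change(b):.2f}%")
# b = 0.01: approx 1.0%, exact 1.01%
# b = 0.05: approx 5.0%, exact 5.13%
# b = 0.30: approx 30.0%, exact 34.99%
```

For dummy variables, which switch from 0 to 1, the exact form 100·(exp(b) − 1) is the one to report when b is not small.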

Polynomial Expression Structure
In order to define a generalized model that simultaneously explains the price formation mechanism in the three market segments analyzed, the model of Equation (11) has been selected, since it has the highest statistical accuracy and takes into account the largest number of influencing factors. In particular, the explanatory variables identified by the algorithm in Equation (11) are the following: the location of the property in a central trade area (Uc); the distance from the nearest highway (T); the lowest EPC label (EpG); the highest EPC label (EpA); the "excellent" maintenance conditions (Me); the "to be restructured" maintenance conditions (Mp); the age of the building (O); the floor level of the property (F); the number of bathrooms (B); the presence of the lift (L); the total surface of the property (S); the distance from the nearest subway (W); and the location of the property in a suburban trade area (Usb). Therefore, among the explanatory variables initially identified and detected for the three case studies, the generalized model elaborated by the multi-case strategy for EPR does not consider the "intermediate" values of the dummy variables (Mg, EpB, EpC, EpD, EpE, EpF, Usc, Up) relevant for explaining the formation of selling prices in the residential markets of the three Italian cities under analysis.
The main advantage of the proposed methodology is the possibility of obtaining a unique functional form that is valid for all the case studies considered, with the coefficients ai determined for each sample according to the specific market conditions of the territorial context analyzed. Table 7 reports the parameters of the selected generalized model (Equation (11) in Table 6), specified for each of the three study samples. Table 8 shows the final models for the three study samples and the respective main statistical performance indicators, i.e., the root mean squared error (RMSE); the mean absolute percentage error (MAPE), that is, the average percentage error between the prices of the original sample and the estimated values; and the maximum absolute percentage error (MaxAPE), that is, the maximum percentage error between the prices of the original sample and the values estimated by the model. The indicators in Table 8 highlight, on the one hand, the good statistical reliability of the outputs obtained for all three case studies and, on the other hand, the ability of the genetic algorithm underlying the methodology to make the results statistically uniform, i.e., its capacity to pursue Pareto-optimal solutions for all the study samples simultaneously. Indeed, no single case study obtains the best value of every indicator: The lowest RMSE concerns the Rome case study (3.89%), which instead has the highest MAPE (3.02%); the lowest MAPE has been calculated for the Bari study sample (2.41%), which has the highest MaxAPE (9.62%); and the lowest MaxAPE has been verified for the Turin case study (8.24%), for which the highest RMSE has been determined (4.15%).
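The three indicators can be computed as sketched below in Python/NumPy. The prices are hypothetical. The text reports RMSE as a percentage, but the normalization behind that figure is not stated here. For that reason, this sketch returns RMSE in the same units as the prices. Dividing it by the mean observed price would be one possible relative variant, which is an assumption.

```python
import numpy as np


def error_metrics(observed, estimated):
    """RMSE (same units as the prices), MAPE and MaxAPE (both in %)."""
    obs = np.asarray(observed, dtype=float)
    est = np.asarray(estimated, dtype=float)
    ape = 100.0 * np.abs(est - obs) / np.abs(obs)  # absolute percentage error per unit
    return {
        "RMSE": float(np.sqrt(np.mean((est - obs) ** 2))),
        "MAPE": float(ape.mean()),
        "MaxAPE": float(ape.max()),
    }


# Hypothetical unit prices in EUR/m2
m = error_metrics([2000, 2500, 3000], [2100, 2400, 3000])
print({k: round(v, 2) for k, v in m.items()})  # {'RMSE': 81.65, 'MAPE': 3.0, 'MaxAPE': 5.0}
```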
To also investigate the stability of the models obtained, a ten-fold cross-validation [125] has been implemented on the starting database for each model. The outputs confirm the good predictive performance of the models: Table 9 reports, for each iteration, the average percentage errors between the detected prices and the estimated prices in the training set and in the validation set. In all the tests, this statistical indicator is less than 4.0%, and in several iterations (nos. 2, 4, 6, and 9 for the model of the city of Bari; nos. 3, 5, and 9 for the model of the city of Rome; nos. 2, 5, 8, and 9 for the model of the city of Turin) the average percentage error calculated for the validation set is lower than the corresponding indicator for the training set. Table 10 summarizes the explanatory variables identified by the genetic algorithm as the main factors influencing the selling prices of each residential market segment analyzed, which are able, through the combinations described by the models' expressions, to pursue the optimal Pareto frontier of the three objective functions (statistical accuracy, minimization of the number of terms, reduction of the complexity of the model).
Table 9. Average percentage errors [%] between the detected prices and the estimated prices obtained through a ten-fold cross-validation.
It should first be noted that, while the number of bathrooms (B) and the presence of a lift (L) are the only characteristics that appear in all three models, the city samples show different market behavior with regard to the other factors. With reference to the energy components, the model for the Bari study sample only takes into account the "extreme" EPC labels (EpA and EpG); the model for the Rome case study only selects the worst label (EpG); whereas the Turin model does not reveal a relevant market appreciation of energy characteristics in the formation of housing prices.
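The ten-fold cross-validation described above can be sketched as follows. The sketch assumes that the functional form is already fixed and that only its coefficients are re-estimated by least squares on each training split; the random fold assignment, the seed, and the synthetic design matrix are assumptions, since the paper does not detail them. For each iteration it reports the average percentage error on the training and validation prices, as in Table 9.

```python
import numpy as np


def kfold_indices(n, k=10, seed=0):
    """Randomly split indices 0..n-1 into k folds of (almost) equal size."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)


def mape(observed, estimated):
    return float(np.mean(np.abs(observed - estimated) / np.abs(observed)) * 100.0)


def cross_validate(A, log_price, k=10, seed=0):
    """Return (training MAPE, validation MAPE) in % for each of the k iterations.

    A is the design matrix of the already-selected expression; only the
    coefficients are re-estimated on each training split (assumption).
    """
    folds = kfold_indices(len(log_price), k, seed)
    price = np.exp(log_price)
    out = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coef, *_ = np.linalg.lstsq(A[train_idx], log_price[train_idx], rcond=None)
        est = np.exp(A @ coef)
        out.append((mape(price[train_idx], est[train_idx]),
                    mape(price[val_idx], est[val_idx])))
    return out


# Synthetic example: 200 observations, intercept plus two invented regressors.
rng = np.random.default_rng(1)
n = 200
A = np.column_stack([np.ones(n), rng.uniform(0, 1, n), rng.integers(0, 2, n)])
log_price = A @ np.array([7.8, -0.3, 0.2]) + rng.normal(0.0, 0.03, n)
for it, (train_err, val_err) in enumerate(cross_validate(A, log_price), start=1):
    print(f"iteration {it}: training {train_err:.2f}%  validation {val_err:.2f}%")
```

Because each validation fold holds only about 20 observations, its error can fall below the training error by chance, which is consistent with the iterations listed above where this occurs.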
With regard to the maintenance condition, the models for the Bari and Turin samples capture the contribution of the excellent state (Me) to the selling prices, whereas the Rome model reveals a greater importance of the worst condition (Mp). The floor of the property (F) and the total surface (S) are selected by the models for the cities of Bari and Rome, whereas the age of the building (O) is selected only by the model for the Rome study sample. The distance from the nearest subway (W) incorporates the contribution of the location factor in the model for the Bari sample, whereas the Rome model also includes the position in the central (Uc) and suburban (Usb) trade areas; finally, in the model for the Turin study sample, the spatial characteristics identified by the genetic algorithm are the distance from the nearest highway (T) and the position within the central perimeter of the city (Uc).

Empirical Analysis of the Functional Relationships in Each City Model
In order to verify the empirical coherence of the functional relationships determined by applying the multi-case strategy for EPR to the three study samples, beyond the statistical accuracy already pointed out, and to make explicit the quantitative contribution of each influencing factor to the formation of selling prices, a logical-mathematical approach has been adopted. This procedure is an exogenous, simplified method: instead of computing the partial derivative of the dependent variable with respect to the i-th variable, it varies the i-th variable over its range in the observed sample while keeping the mathematical terms of the other variables constant and equal to their respective average values. The dependent variable is expressed as $P = e^{Y}$, i.e., the inverse of the logarithmic transformation, so that the outputs are obtained in €/m².
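The procedure just described can be sketched as a function that sweeps one variable over a grid, freezes every term that does not contain it at its sample average, and returns $P = e^{Y}$. This is a minimal sketch under assumptions: the term representation `(coef, func, vars_used)`, the `fixed` argument for the other variables inside interaction terms, and all coefficients and data below are hypothetical. The invented interaction term S·Usb only mimics, qualitatively, the kind of suburban-dependent surface effect reported for Rome.

```python
import numpy as np


def price_profile(terms, a0, data, var, grid, fixed=None):
    """P = exp(Y) as `var` sweeps `grid`, other terms held at their sample means.

    terms : list of (coef, func, vars_used); func maps a dict of values to the term
    data  : dict name -> 1-D array with the observed sample
    fixed : values for the *other* variables inside terms that also contain `var`
            (default: their sample mean), e.g. {"Usb": 1.0} for a suburban curve
    """
    fixed = fixed or {}
    grid = np.asarray(grid, dtype=float)
    y = np.full(grid.shape, float(a0))
    for coef, func, used in terms:
        if var in used:
            point = {v: fixed.get(v, float(np.mean(data[v]))) for v in used if v != var}
            point[var] = grid
            y = y + coef * func(point)
        else:
            # Term does not contain `var`: freeze it at its average over the sample.
            y = y + coef * float(np.mean(func(data)))
    return np.exp(y)


# Invented model and sample, for illustration only.
terms = [
    (-0.001, lambda d: d["S"], {"S"}),
    (-0.004, lambda d: d["S"] * d["Usb"], {"S", "Usb"}),
    (0.2, lambda d: d["L"], {"L"}),
]
data = {"S": np.array([50.0, 100.0, 150.0]),
        "Usb": np.array([0.0, 1.0, 0.0]),
        "L": np.array([0.0, 1.0, 1.0])}
grid = np.arange(20, 401, 20)  # 20 m2 increments
central = price_profile(terms, 7.0, data, "S", grid, fixed={"Usb": 0.0})
suburban = price_profile(terms, 7.0, data, "S", grid, fixed={"Usb": 1.0})
step_change = 100 * (central[1:] / central[:-1] - 1)  # % change per 20 m2
```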
The application of the defined approach has allowed us to represent the outputs in Figures 1-3, which describe the contribution of the influencing factors identified for each study sample on the selling prices in the cities of Bari, Rome, and Turin.
The obtained results confirm the empirically expected relationships between housing prices and the influencing factors selected by the multi-case strategy for EPR.
The models for the cities of Bari and Rome show a negative functional relationship between the surface of the property (S) and the unit housing prices. For each 20 m² increment within the range of the samples collected [20–400 m²], the unit selling prices in the city of Bari decrease by 3.5%; for the city of Rome, instead, the model reveals a particular market behavior: the contribution of the surface to the unit selling prices is modest (−2.5%) if the property is located in a trade area other than the "suburban" one (Usb = 0), whereas it is substantial (−23%) if the property belongs to the suburban zone (Usb = 1). This phenomenon, clearly represented in graph I of Figure 2, is characteristic of the marginal zone of the city of Rome, where transactions of large properties are in fact substantially absent.
All three models confirm the empirical evidence of a positive correlation between housing prices and the presence of a lift (L), as well as between selling prices and the number of bathrooms (B). In particular, the increase of the unit selling price due to the presence of a lift is +22.22% for the Bari sample, +58.52% for the Rome sample, and +32.96% for the Turin case study, whereas the marginal contribution of the variable B is +6.21% for the city of Bari, +12% for the city of Rome, and +10% for the city of Turin.
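Because prices are obtained as $P = e^{Y}$, a percentage effect such as the lift premium can be converted to and from a coefficient in Y. The sketch below assumes that the variable enters Y through a single linear term a·x, which need not hold for the actual EPR expressions (where a factor may interact with others, as the Mp and EpG case for Rome shows); it only illustrates the arithmetic relationship 100·(e^(a·Δ) − 1).

```python
import math


def pct_effect(coef, delta=1.0):
    """Percent change of P = exp(Y) when a term coef * x in Y changes by delta.

    Valid only if x enters Y through this single linear term (assumption).
    """
    return 100.0 * math.expm1(coef * delta)


def coef_from_pct(pct, delta=1.0):
    """Coefficient implied by a percent change pct over an increment delta."""
    return math.log1p(pct / 100.0) / delta


# Lift dummy (delta = 1): a +22.22% effect corresponds to a coefficient of about 0.20.
a_lift = coef_from_pct(22.22)
# A depreciation of 23.10% corresponds to a negative coefficient.
a_label_g = coef_from_pct(-23.10)
print(round(a_lift, 4), round(pct_effect(a_lift), 2), round(a_label_g, 4))
```

Note that percentage effects of this kind compound multiplicatively rather than add: under the same single-term assumption, two consecutive increments of +10% give +21%, not +20%.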
It should be noted that the importance of the presence of a lift, especially in the city of Rome, reflects a specific circumstance of the Italian context, namely the high percentage of elderly people, who generally have a higher purchasing power than younger people: according to the United Nations Development Programme (www.undp.org), the average age of the Italian population (approximately 45 years) is the second highest in Europe, after Germany (approximately 46 years).
The floor on which the residential unit is located (F) is an influencing factor on the selling prices in the models for the cities of Bari and Rome; in particular, the models highlight a significant increase of the unit housing price if the same property is "transferred" from the ground floor to the first floor (+15.38% for the Bari sample and +31.07% for the Rome case study), whereas the positive marginal contribution is lower for each subsequent floor (+3.4% for the city of Bari and +6.5% for the city of Rome). The "excellent" maintenance condition of the property (Me) makes a relevant positive contribution in both the Bari and Turin residential segments (+27.70% and +45.82%, respectively), whereas the model for the city of Rome shows a strong inverse correlation (−41.42%) between the "to be restructured" maintenance condition (Mp) and housing prices, although this negative contribution only occurs when the property has the "G" EPC label.
With regard to the energy variables, according to the model generated for the Bari study sample, the "extreme" EPC labels (EpA and EpG) have a significant influence on the housing selling prices of this southern Italian city. In fact, other things being equal, the highest EPC label determines an increase of the unit selling prices of +26.07%, whereas the lowest EPC label depreciates the property value by 23.10%. A smaller contribution of the "G" EPC label has been detected by the model for the city of Rome, where this condition involves a decrease of the unit residential price of 10.46%. Finally, the model for the city of Turin has not revealed any correlation between the energy components and housing prices; this output is consistent with the recent research of Fregonara et al. [126] concerning the limited influence of energy components on selling prices in the Turin housing market. This circumstance is related, on the one hand, to the lack of specific Italian regulations constraining the transactions of existing properties with poor energy ratings (probably also to avoid further damaging an already depressed market) and, on the other hand, to the "ordinary" typology of properties sold in the Italian metropolitan cities. In fact, the Italian building stock is the oldest in Europe: 70% consists of buildings constructed before 1976, and 25% of the total housing stock has never undergone any renovation [127,128,129].
The age of the building in which the residential unit is located (O) is an influencing factor only in the model for the city of Rome. In particular, graph VII in Figure 2 confirms the empirical evidence of a negative correlation between this parameter and the unit housing price. The model reveals a difference in unit selling price of about −20% between new buildings and recent constructions (< 20 years), whereas the negative effect decreases for older properties (−6.0% every ten years).
With regard to the influence of spatial factors on housing prices, the models for the cities of Rome and Turin point out a significant influence (+35.45% and +32.14%, respectively) of the property position in the "central" trade area (Uc). Furthermore, the model for the Rome study sample shows an inverse correlation between housing prices and the location of the property in a suburban trade area (Usb), which determines a decrease of the selling prices of 35.39%. The distance from the nearest subway (W) has been recognized as an influencing factor in the models for the cities of Bari and Rome, with a negative marginal contribution of −6.0% for the Bari sample and −4.8% for the Rome case study. Finally, the contribution of the distance from the nearest highway (T) to the selling prices has been identified by the model for the city of Turin: in this case, a greater distance from zones of air and noise pollution leads to a higher appreciation of the residential units, equal to about +20% for each additional kilometer.

Comparison with the Hedonic Price Method
In order to highlight the potential of the proposed methodology, the hedonic price method has been applied to the sample data of each studied city. Tables 11, 12, and 13 show the hedonic models obtained by adopting the log-linear form and the variables selected by the multi-case strategy for EPR algorithm.
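For reference, the log-linear hedonic benchmark can be sketched with ordinary least squares on ln P. This minimal numpy-only version returns the coefficients with their Student's t-statistics, R², adjusted R², the F-statistic, the residual standard error, and the residual degrees of freedom n − k − 1; it omits p-values and VIF diagnostics, which a dedicated package such as statsmodels' `OLS` would report. The function name, the regressors `x1`/`x2`, and the synthetic data are hypothetical.

```python
import numpy as np


def hedonic_ols(X, price, names):
    """Log-linear hedonic model ln P = b0 + sum_k b_k x_k + e, estimated by OLS.

    Returns {name: (coefficient, t-statistic)}, R^2, adjusted R^2, F-statistic,
    residual standard error and residual degrees of freedom (n - k - 1).
    """
    y = np.log(np.asarray(price, dtype=float))
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - k - 1
    sigma2 = float(resid @ resid) / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    r2 = 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof
    f_stat = (r2 / k) / ((1.0 - r2) / dof)
    table = {nm: (b, b / s) for nm, b, s in zip(["const", *names], beta, se)}
    return table, r2, adj_r2, f_stat, float(np.sqrt(sigma2)), dof


# Synthetic data with invented coefficients, for illustration only.
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([rng.uniform(0, 1, n), rng.integers(0, 2, n)])
price = np.exp(7.5 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0.0, 0.05, n))
table, r2, adj_r2, f_stat, rse, dof = hedonic_ols(X, price, ["x1", "x2"])
```

Unlike the EPR-based models, the hedonic form fixes the functional shape a priori (one linear term per variable in ln P), which is why it cannot represent interactions such as the surface effect that depends on the suburban location.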
The better performance of the proposed methodology is evident. For the city of Bari, the coefficient of determination of the hedonic model (62.05%) was lower than that of the corresponding model in Table 8, and the Student's t-test suggested excluding some variables (total surface, number of bathrooms, and floor) that the corresponding model in Table 8 selects as significant explanatory factors. For the city of Rome, the coefficient of determination of the hedonic model was rather low (55.77%), the Student's t-test suggested excluding several variables (total surface, number of bathrooms, "to be restructured" maintenance conditions, and "G" Energy Performance Certificate label), and the positive correlation between the distance from the nearest subway (W) and the selling prices was not empirically consistent. Finally, the hedonic model for the city of Turin produced better outputs than the previous two hedonic models, as its coefficient of determination was 76.52% and the Student's t-test indicated that all the variables offer a significant contribution to the explanation of the selling prices; however, the multi-case strategy for EPR algorithm has identified a combination of factors characterized by a better statistical accuracy.
Notes: "***" denotes p < 0.01, "**" p < 0.05, "*" p < 0.1; sample size = 200; residual standard error = 0.3674; R² = 0.58; adjusted R² = 0.5577; F-statistic = 25.972 on 188 degrees of freedom, p-value < 0.0001; VIF statistics < 5.

Conclusions
Following the global economic crisis triggered by the US subprime mortgage market, the present research addresses the current need for rational valuation methodologies that are able to correctly interpret the available market data and to provide a quick tool for checking the individual assessments performed by valuers operating in the territory. To this end, an innovative AVM has been applied to three study samples collected in three different Italian cities. After the selling prices and the main influencing factors were detected for each case study, the implemented methodology, named multi-case strategy for EPR, generated a "unique" functional form. This form simultaneously describes the market relationships in the three study samples and, for each study sample, yields the endogenous specification of the multiplicative coefficients of the additive terms in the generalized model.
The application of the methodology has allowed us to define a model, valid at the same time for the three city samples, characterized by a good statistical performance; the specification of the multiplicative coefficients, which reflect the market dynamics of each case study, has allowed us to verify the empirical reliability of the functional relationships between the housing prices and the influencing factors selected by the algorithm of the methodology.
The main advantage of the implemented algorithm is its ability to select, automatically and quickly, the optimal regression function for different territorial contexts, in terms of both statistical accuracy and interpretability. The comparison with the results obtained by the hedonic price method has highlighted better statistical performance and higher empirical reliability of the models generated by the proposed methodology. Furthermore, the obtained models are easy to interpret and allow users to quickly check the empirical consistency of the functional relationships; in this sense, the methodology could further support valuers as a supplementary tool for verifying property valuations, avoiding the complexity of more sophisticated methods that, even if statistically more precise, often behave as "black boxes" for professional practitioners.
The main limitation of the implemented methodology concerns the sample sizes that could be collected, owing to the structural opacity that generally characterizes the Italian real estate market. Nevertheless, the methodology and its application offer a further contribution to the property valuation field and to the literature describing the steps and advances of artificial intelligence methods in the real estate sector. The definition of a single functional form, valid for several territorial contexts and able to synthesize the market relationships between the influencing factors and the selling prices, could support private operators in assessing the convenience of territorial investments and public entities in the decision-making phases regarding future tax and urban planning policies, especially when valuations are simultaneously required for very different locations. For this reason, further research could apply the methodology to many study samples, including different areas of the same city (given the frequent heterogeneity among neighborhoods and, consequently, the different market appreciation of the influencing factors in different territorial contexts), in order to define a generalized model, with coefficients specified according to the market peculiarities of the sites analyzed, that could be effectively used for social, environmental, and economic purposes.
Finally, further developments aimed at improving the obtained models could concern: the inclusion of the "supply of services" and "green areas" among the spatial factors [130]; the extension of the database, where possible, by collecting additional transaction data in order to better validate the analysis outputs; and the comparison of the models obtained for each city with the outputs of other methodologies (e.g., geographically weighted regression when the data are georeferenced).