Spatial Analysis of Housing Prices and Market Activity with the Geographically Weighted Regression

A central part of the study is to demonstrate that models which take spatial heterogeneity into account (Geographically Weighted Regression and Mixed Geographically Weighted Regression) reflect housing market relationships better than conventional regression models. The spatial heterogeneity of housing market determinants results in the spatial diversity of market activity, as well as of real estate prices and values. The main aim of the study was to analyse the effect of socio-demographic and environmental factors on average housing property prices and on the number of transactions in a spatial approach. In previous research conducted on a national scale, all variables were usually treated in the same way, i.e., either as global or as local variables. During the research, an attempt was therefore also made to determine which of the variables adopted for analysis have a local impact on prices and market activity, and which are global. The study was conducted in Poland and used data from 2018 on 380 counties (Local Administrative Units). The study showed that the determinants of both average prices and housing market activity exhibit spatial autocorrelation, with high–high and low–low clusters. Owing to the GWR and MGWR models, it was possible to draw specific conclusions on local determinants of flat prices and market activity in Poland. The study findings confirm that these models are a highly effective tool for spatial data analysis.


Introduction
Space, in its physical, economic, institutional, legal and social dimensions, is the basic element of the processes that take place on the housing market. The factors that have an impact on the housing market display spatial heterogeneity, which results in spatial diversity of market activity as well as of real estate prices and values. These factors include global factors, associated mainly with macroeconomic conditions, as well as local factors. It is accepted that global factors have the same impact on the market at each point in space and are independent of location, while the impact of local factors is non-stationary and variable, both temporally and spatially. Because real estate is permanently fixed in place, one can assume that spatial relations are of fundamental importance to the market mechanism on the real estate market. Proper understanding of how supply and demand form on the housing market requires special attention to spatial information, especially given the rapid development of spatial infrastructure and GIS (Geographic Information System) tools. Therefore, identification of sources of spatial diversity of supply and demand for real estate caused by exogenous (economic, legal, social, etc.) and endogenous (i.e., related to a real estate as a physical object and its [...]
De Bruyne and Hove [15] propose taking into account the number of apartments sold in relation to the available housing stock, as well as the number of newly designed residential buildings (both private and constructed by developers).
Among the factors affecting flat prices, those related to environmental quality and pollution are also important. Ridker [17], Kim et al. [18], Saphores et al. [19] and Lin et al. [9] pointed out that low air quality causes housing prices to decrease considerably. On the other hand, the availability of green areas can have a positive impact on prices.
In general, one can claim that demographic, economic and environmental factors have the greatest impact on prices and market activity. Lin et al. [9], for example, used twenty local indicators, including the population age, percentage of marriages, education, unemployment, safety, air quality, etc. The household income, rent/income ratio and percentage of Asian population were the most significant variables. It is notable that the market activity, measured by the number of transactions, proved to be a de-stimulant of the average price.
The regional diversity of socio-economic and environmental characteristics as well as the different intensity and directions of social processes result in different levels of housing demand in different parts of the country. Thus, it can be observed that the development of the housing market varies considerably in time and space. The spatial and temporal dynamics concern different levels of the spatial hierarchy, with economic and demographic processes relating to housing changes at these hierarchical levels in different ways [20,21]. For example, Reichert [13] claims that prices at the national level are affected to the greatest extent by mortgage interest rates, while prices at the regional level are affected mainly by population migration, the employment rate and household income.
The groups of factors mentioned above (demographic, socio-economic and environmental) can be quantified in regional and local terms, depending on the spatial resolution of statistical data. It would certainly be a great simplification to assume that the impact of the above-mentioned factors is equal throughout the country [21][22][23]. De Bruyne and Hove [15] point mainly to local factors, such as differences in income levels, demographic effects, government policies and the quality of life. They also point out that the relative location of individual areas is very important for the development of the housing market. In particular, housing prices are influenced by the distance and time of access to economic centres, which offer employment and an extensive network of services [24]. Furthermore, investments in the transport network, including roads, motorways and public transport systems, affecting travel time and distance, form the basis for the decision-making process of individuals and households, which choose the location of their future home.
Econometric modelling of real estate prices using socioeconomic and environmental factors has a relatively long tradition and has been often described in the literature. An extensive review of statistical models describing the relationships between housing prices and factors influencing them both at the national, regional and local level is presented by Gasparenie et al. [25], who mention both advantages and disadvantages of the models as well as their structural elements. It should be noted, however, that most of the models developed so far do not take into account spatial relationships, either as a geographical reference of the variables adopted or as a structural model. Spatial effects taken into account in price and market activity models, especially in regional terms, may concern both spatial autocorrelation and spatial heterogeneity. Spatial autocorrelation is included in spatial autoregressive models (SAR) as well as spatial panel models [26][27][28], while spatial heterogeneity can be presented with geographically weighted regression models. The occurrence of spatial autocorrelation may also form the basis for the application of the eigenvector spatial filtering (ESF) approach, which is a certain alternative to SAR models. In its basic format, the eigenvector spatial filtering method is an approach that captures spatial dependence applying map pattern variables obtained from spatial connectivity information using the Moran coefficient [29,30]. Spatial filtering addresses spatial autocorrelation from a quasi-semi-parametric point of view. Apart from the observed covariates, also known as the systematic component, spatial filtering techniques generate synthetic explanatory variables representing the dataset's spatial structure. More flexibility is added to the model by bringing these synthetic variables (considered the model's non-parametric component [31]) into the systematic part of a model. 
This approach produces unbiased parameter estimates, reduces spatial misspecification error, increases model fit, increases the normality of model residuals and can increase the homoscedasticity of model residuals with respect to spatial dependence and the spatial spill-over effect [32]. Although in many respects the ESF approach seems to be more advantageous than GWR modelling, the interpretation of the GWR model is certainly more intuitive and, importantly, makes it possible to determine whether the analysed dependencies are local or global. Griffith [33] also stated that there is an indirect relationship between GWR and spatial filtering via interaction terms. GWR can be seen as a special case of indirect spatial filtering. In other words, spatial filtering should be able to address apparent heterogeneity in behaviours by interacting eigenvectors (synthetic variables) with systematic covariates.
Geographically Weighted Regression (GWR) is widely used in real estate market research, primarily at the local scale (e.g., References [34][35][36][37][38][39]). The GWR model is used slightly less frequently for real estate market research at a regional or national level [40]. Spatial-temporal GWR models, which assume not only spatial but also temporal heterogeneity, play an increasingly important role in the study of the spatial diversity of real estate market determinants [41][42][43]. Basic GWR models assume that the influence of explanatory variables may differ at each point of the analysed space. It has been shown earlier that some of the variables may be global and some may be local. This assumption was the basis for creating Mixed Geographically Weighted Regression (MGWR) models. These models are used increasingly often in both local and regional research [40,41,43,44]. The studies presented so far in the literature indicate that MGWR models may give much better results than basic GWR models, i.e., they fulfil the statistical requirements better. It is therefore advisable to use mixed GWR models to analyse housing prices and housing market activity.

Geographically Weighted Regression (GWR)
Geographically Weighted Regression (GWR) originates from traditional regression methods that model the relationship between the response variable and the explanatory variables. In the classic linear regression model, the parameters are usually estimated with the Ordinary Least Squares (OLS) method, whereas the significance tests are performed with the F and t statistics [45]. Classical regression models do not directly take spatial interactions into account and assume that the process of price formation is constant across geographical space. Therefore, the significance of parameters does not depend on the spatial structure of the phenomenon under study, which may lead to a wrong interpretation of the results [46]. The GWR model extends the classical linear regression model by assigning weights to individual observations depending on their location. It is derived from non-parametric regression [42], and its essence lies in the construction of local linear regressions at each point where measurement data exist. The GWR model can be formulated in the following way [46]:

$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i$$

where $(u_i, v_i)$ describes the location expressed with coordinates $u_i$ and $v_i$. Estimation of the GWR model parameters is performed in a similar manner as in the classic models, but location-dependent weights of observations are taken into account:

$$\hat{\beta}(u_i, v_i) = \left( X^T W(u_i, v_i)\, X \right)^{-1} X^T W(u_i, v_i)\, y$$

where $W(u_i, v_i)$ is a diagonal matrix of weights, which are a function of the distance between the location given by coordinates $(u_i, v_i)$ and the location of each point at which an observation was made.
Functions with a shape similar to that of the Gauss curve are usually used to determine the weights, such as the bi-square kernel function, which depends on a bandwidth parameter [46]. The bandwidth describes the spatial range from which observations are taken for the calculation. The larger the bandwidth, the closer the GWR results are to those of the global multiple regression model. In practice, a non-Euclidean distance metric is also used to determine the bandwidth parameter [35]. Applying the GWR model yields a number of surfaces defined by the estimated parameters. The diversity of these parameters in space indicates local variability of the impact of the explanatory variables on the response variable, and thus the spatial heterogeneity of the phenomenon under study [46].
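To illustrate how kernel weights enter a local fit, the following minimal Python sketch computes bi-square weights and a weighted least squares fit at a single location. It assumes one common form of the bi-square kernel, w = (1 - (d/h)^2)^2 for d < h and 0 otherwise, Euclidean distances and a single explanatory variable. The function names and the toy data are hypothetical and are not taken from the study.

```python
import math

def bisquare(d, h):
    """Bi-square kernel weight for distance d and bandwidth h (assumed form)."""
    if d >= h:
        return 0.0
    return (1.0 - (d / h) ** 2) ** 2

def local_fit(coords, x, y, i, h):
    """Weighted least squares fit of y = b0 + b1 * x centred on location i."""
    u, v = coords[i]
    w = [bisquare(math.hypot(u - uj, v - vj), h) for uj, vj in coords]
    sw = sum(w)
    mx = sum(wj * xj for wj, xj in zip(w, x)) / sw   # weighted mean of x
    my = sum(wj * yj for wj, yj in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
    sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Toy data (hypothetical): slope 2 in the western cluster, slope 5 in the eastern one.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0, 5.0, 10.0, 15.0]

west = local_fit(coords, x, y, 0, h=5.0)      # only the western points get weight
east = local_fit(coords, x, y, 5, h=5.0)      # only the eastern points get weight
wide = local_fit(coords, x, y, 0, h=1000.0)   # weights near 1: close to global OLS
```

With the small bandwidth, each local fit recovers the slope of its own cluster, whereas the very large bandwidth gives a slope close to the single global estimate, which is the behaviour described above.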
The fit of the model to the data is assessed with the hat matrix S, which, when multiplied by the empirical values of the response variable, yields the theoretical values [47,48]:

$$\hat{y} = S y$$

The trace of matrix S (the sum of the elements on the main diagonal) in the global model is also the number of parameters. In the GWR model, the effective number of parameters is likewise derived from S; it depends on the number of explanatory variables and the bandwidth, and it is usually not an integer. To assess the goodness of the model fit, the corrected Akaike Information Criterion (AICc) is usually applied [49,50], especially when models with different numbers of explanatory variables are compared [48]. This criterion is applied not only to compare models but also to determine the optimum bandwidth.
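The use of the information criterion for bandwidth selection can be sketched as follows. The sketch assumes one widely cited form of the corrected AIC for GWR, AICc = 2n ln(sigma) + n ln(2 pi) + n (n + tr(S)) / (n - 2 - tr(S)) with sigma^2 = RSS / n; whether the study used exactly this variant is an assumption. The residual sums of squares and traces below are hypothetical values, not results from the study.

```python
import math

def aicc(rss, n, trace_s):
    """Corrected AIC from the residual sum of squares rss, the number of
    observations n and the hat-matrix trace trace_s (assumed formula)."""
    sigma_hat = math.sqrt(rss / n)
    return (2.0 * n * math.log(sigma_hat)
            + n * math.log(2.0 * math.pi)
            + n * (n + trace_s) / (n - 2.0 - trace_s))

# Hypothetical (rss, tr(S)) pairs for three candidate bandwidths, n = 380 units.
n = 380
candidates = {50.0: (80.0, 60.0), 100.0: (95.0, 30.0), 200.0: (120.0, 15.0)}

# A smaller bandwidth lowers rss but raises tr(S); AICc balances the two.
scores = {h: aicc(rss, n, tr) for h, (rss, tr) in candidates.items()}
best_bandwidth = min(scores, key=scores.get)
```

In practice the candidate bandwidths would be searched over a range and the model refitted for each one; the dictionary here only stands in for those fits.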
The form of the test statistics to test the null hypothesis indicating that there is no significant difference between the global regression model and GWR, is given, among others, by Leung et al. [51].
The following null hypothesis is put forward when testing the significance of the model's local parameters:

$$H_0: \beta_k(u_i, v_i) = 0 \quad \text{for each } k = 0, 1, 2, \ldots, p \text{ and } i = 1, 2, \ldots, n$$

The test statistic has the following form:

$$t = \frac{\hat{\beta}_k(u_i, v_i)}{\hat{\sigma}\sqrt{c_{kk}}}$$

and

$$\hat{\sigma}^2 = \frac{RSS}{\delta_1}$$

while $c_{kk}$ is the k-th diagonal element of the matrix $CC^T$, where:

$$C = \left( X^T W(u_i, v_i)\, X \right)^{-1} X^T W(u_i, v_i)$$

Here RSS is the residual sum of squares and $\delta_j = \mathrm{tr}\left( \left[ (I - S)^T (I - S) \right]^j \right)$ for j = 1, 2. The critical value of t is determined for $df_1 = \delta_1^2 / \delta_2$ degrees of freedom. However, researchers have realised that GWR and estimation with the Ordinary Least Squares method have some limitations, such as correlated model coefficients across study areas, a strong influence of outliers and the weak data problem [52,53]. Hence, a Bayesian approach, which largely eliminates these imperfections, may also be a solution [52]. The classical approach is nevertheless used in this work because of the well-established theoretical foundations of the method and the transparency of the interpretation of the results.

Mixed Geographically Weighted Regression (MGWR)
The degree of variability of local GWR coefficients may vary in an area covered by the study. Some of them can be seen as permanent (i.e., global, stationary), while others can be seen as local (non-stationary). An MGWR model can then be defined [46], which can be expressed as follows [54,55]:

$$y_i = \sum_{j=1}^{k_a} a_j\, x^{(a)}_{ij} + \sum_{l=1}^{k_b} b_l(u_i, v_i)\, x^{(b)}_{il} + \varepsilon_i$$

where y is the vector of the response (dependent) variable with elements $y_i$, $X_a$ is the matrix of the $k_a$ global variables $x^{(a)}$ and a is the vector of global coefficients, $X_b$ is the matrix of the $k_b$ local variables $x^{(b)}$ and b is the matrix of local coefficients $b_l(u_i, v_i)$.
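Estimation of the mixed model parameters is performed in a traditional manner [54], assuming that the fitted values consist of a global and a local component:

$$\hat{y} = \hat{y}_a + \hat{y}_b$$

The model fitting procedure can be described in six steps [54]:
Step 1. Supply an initial value for $\hat{y}_a$, say $\hat{y}_a^{(0)}$, using OLS (ordinary least squares).
Step 2. Set i = 1.
Step 3. Set $\hat{y}_b^{(i)} = S_b \left[ y - \hat{y}_a^{(i-1)} \right]$.
Step 4. Set $\hat{y}_a^{(i)} = S_a \left[ y - \hat{y}_b^{(i)} \right]$.
Step 5. Set i = i + 1.
Step 6. Return to Step 3, unless $\hat{y}^{(i)} = \hat{y}_a^{(i)} + \hat{y}_b^{(i)}$ has converged to $\hat{y}^{(i-1)}$.
Here $S_a$ denotes the hat matrix of the global regression on $X_a$ and $S_b$ the hat matrix of the GWR on $X_b$.
A slightly different method of MGWR model estimation is presented by Fotheringham et al. [46], who apply the method proposed by Speckman [56]. Furthermore, Wei and Qi [57] propose a constrained two-step estimation, which transforms the MGWR into a GWR and performs the estimation with the Lagrange Multiplier procedure.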
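The iterative idea behind the six-step procedure can be sketched in Python as alternating between a global and a local smoother. This is a simplified illustration under stated assumptions, not the implementation used in the study: it takes one global covariate without an intercept, a local intercept with one local covariate, a Gaussian kernel with a fixed bandwidth, and convergence measured on the fitted values. All function names and the synthetic data are hypothetical.

```python
import math

def gaussian(d, h):
    """Gaussian kernel weight (assumed here for simplicity)."""
    return math.exp(-0.5 * (d / h) ** 2)

def global_smooth(xa, r):
    """OLS projection of r onto the single global column xa (no intercept)."""
    a = sum(p * q for p, q in zip(xa, r)) / sum(p * p for p in xa)
    return a, [a * p for p in xa]

def local_smooth(coords, xb, r, h):
    """GWR fit of r on a local intercept and a local slope for xb."""
    fitted = []
    for i, (u, v) in enumerate(coords):
        w = [gaussian(math.hypot(u - uj, v - vj), h) for uj, vj in coords]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, xb)) / sw
        mr = sum(wj * rj for wj, rj in zip(w, r)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, xb))
        sxr = sum(wj * (xj - mx) * (rj - mr) for wj, xj, rj in zip(w, xb, r))
        fitted.append(mr + (sxr / sxx) * (xb[i] - mx))
    return fitted

def backfit(coords, xa, xb, y, h, tol=1e-8, max_iter=200):
    a, ya = global_smooth(xa, y)                    # Step 1: OLS start
    prev = None
    for it in range(1, max_iter + 1):               # Steps 2 and 5: counter
        yb = local_smooth(coords, xb, [t - g for t, g in zip(y, ya)], h)  # Step 3
        a, ya = global_smooth(xa, [t - l for t, l in zip(y, yb)])         # Step 4
        total = [g + l for g, l in zip(ya, yb)]
        if prev is not None and max(abs(p - q) for p, q in zip(total, prev)) < tol:
            break                                   # Step 6: fitted values settled
        prev = total
    return a, total, it

# Synthetic data (hypothetical): global effect 2 on xa, slope on xb growing eastwards.
coords = [(float(i), float(j)) for i in range(5) for j in range(5)]
xa = [float((2 * k) % 5 - 2) for k in range(25)]
xb = [1.0 + (3 * k) % 4 for k in range(25)]
y = [2.0 * p + (1.0 + 0.3 * u) * q for (u, v), p, q in zip(coords, xa, xb)]

a_hat, fitted, n_iter = backfit(coords, xa, xb, y, h=1.5)
```

The loop stops either when successive fitted values agree within the tolerance or when the iteration cap is reached, so the cap should be checked before the result is trusted.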
Determining which of the explanatory variables are global and which are local may be a problem in the MGWR model. Fotheringham et al. [46] adopt a step-by-step procedure for this purpose, in which all possible combinations of global and local variables are tested and the optimal mixed model is selected by minimising the AIC value. This approach is comprehensive but computationally expensive; an alternative is a Monte Carlo approach to testing the significance of the (spatial) variability of each regression coefficient from the basic GWR [46]. This approach determines the variability of each local regression coefficient in the basic GWR model and compares it with the variability determined from a series of randomised datasets. If the observed variance of the coefficient does not lie in the top 5% tail of the ranked results, then the null hypothesis (i.e., that the relationship between the dependent and independent variable is constant) cannot be rejected at the 5% significance level, and the corresponding relationship should be globally fixed when specifying the mixed GWR.
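The randomisation logic of the Monte Carlo test can be sketched as follows. This is an illustrative simplification under stated assumptions: a single covariate, a Gaussian kernel with a fixed bandwidth, the variance of the local slopes as the test statistic, and random reassignment of locations to observations as the randomisation scheme. The function names and the synthetic data are hypothetical.

```python
import math
import random

def gaussian(d, h):
    """Gaussian kernel weight (assumed here for simplicity)."""
    return math.exp(-0.5 * (d / h) ** 2)

def local_slopes(coords, x, y, h):
    """Local slope of y on x at every location (weighted simple regression)."""
    slopes = []
    for u, v in coords:
        w = [gaussian(math.hypot(u - uj, v - vj), h) for uj, vj in coords]
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slopes.append(sxy / sxx)
    return slopes

def variance(values):
    m = sum(values) / len(values)
    return sum((val - m) ** 2 for val in values) / len(values)

def monte_carlo_p(coords, x, y, h, n_sim=99, seed=0):
    """Share of randomised datasets whose slope variance is at least as large
    as the observed one (the observed dataset counts as one of the ranks)."""
    rng = random.Random(seed)
    observed = variance(local_slopes(coords, x, y, h))
    exceed = 0
    for _ in range(n_sim):
        shuffled = coords[:]
        rng.shuffle(shuffled)          # break the link between data and location
        if variance(local_slopes(shuffled, x, y, h)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)

# Synthetic data (hypothetical): the slope of y on x increases from west to east.
coords = [(float(i), float(j)) for i in range(6) for j in range(6)]
x = [1.0 + (3 * k) % 7 for k in range(36)]
y = [(1.0 + 0.5 * u) * xk for (u, v), xk in zip(coords, x)]

p_value = monte_carlo_p(coords, x, y, h=1.5)
```

A p-value above 0.05 would suggest treating the coefficient as global in the mixed model. Because the result depends on the random draws, a fixed seed is used here to keep the sketch reproducible.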

General Data Characteristics
A study which used the GWR model and the MGWR model to analyse price determinants and market activity was conducted based on data concerning the housing property market in Poland.
On the housing market in Poland in 2018, there was a slight improvement in the indicators of the economic and housing situation of households, compared to previous years. The number of inhabitants of the largest cities increased, but their depopulation was observed in smaller centres, partly as a result of migration. This is in line with the observed global trends, where development is concentrated in major cities. An increase in demand and a slightly smaller increase in supply were also observed. The relatively high demand for flats was a consequence of a significant increase in household wages and the maintenance of low nominal interest rates. Both prices and the number of transactions on local markets showed an upward trend of 5-10% annually.
The area of the country, as in many European countries, is economically, socially and even culturally diverse [58]. Local disparities are often associated with a division into the eastern and western parts of Poland. Therefore, it can be expected that the factors determining prices and market activity vary across geographical space. Counties were taken as the statistical unit corresponding to the space division, in accordance with the fourth level of the nomenclature of territorial units for statistics (NUTS) introduced in the EU countries by Regulation No. 1059/2003 of the European Parliament and of the Council of 26 May 2003. The impact of the spatial variability of the indices on real estate prices and market activity was analysed using available transaction and price data for 380 counties in Poland, shared by GUS (Central Statistical Office in Poland) and the Local Data Bank. Table 1 presents the variables adopted for analysis as a selected set of indices, chosen to represent socio-demographic, economic and environmental conditions. Variables were selected primarily on the basis of the literature and previous research. Although the factors taken into account are represented in the national statistics by a much larger number of indices, limiting their number minimises the risk of collinearity (i.e., their mutual correlation).
Table 1 (rows X4 to X10; symbol, index, unit):
X4: Migration index, persons/1000 population
X5: Average monthly gross remuneration, PLN/month
X6: Registered unemployment rate, %
X7: Entities registered in the business entities register, number/1000 population
X8: Emission of particulate pollutants PM10 (a mixture of airborne particles with a diameter of not more than 10 µm), t/km²
X9: Average floor area of a housing unit, m²
X10: New housing units completed, units/1000 population
The importance of variables X1-X3 is emphasised by many authors, among others, Engelhardt et al. [7] and Essafi and Simon [8]. The adoption of variable X4 results, among others, from research presented by Magnusson and Turner [10]. The selection of variables X5-X7 was influenced by research conducted, among others, by Reichert [13] and Gallin [11]. The adoption of variable X8 is justified by the results of studies carried out, among others, by Kim et al. [18] as well as Saphores et al. [19]. In addition, variables characterising existing real estate resources (variables X9 and X10) were used. The significance of these factors is also emphasised in the literature concerning Poland [59][60][61][62]. Table 2 shows the main descriptive statistics of the variables taken for analysis. The greatest variability was observed for variables X4 and X8, which denote the migration index and emission of particulate pollutants, respectively. The lowest variability was observed for variable X3, which denotes the percentage of the working-age population. Figure 1 shows the distribution of average prices and the number of transactions in individual statistical units (counties). The choropleth maps clearly show the disproportions between metropolitan areas, larger urban centres and areas attractive for tourism on the one hand, and counties of low investment and tourism attractiveness on the other.

Results and Discussion
It was assumed in the course of the study that spatial relationships, especially heterogeneity and spatial autocorrelation, may play a key role in explaining the role of socio-demographic, economic and environmental factors affecting housing prices. Since this may also apply to market activity measured by the number of transactions, the research was conducted in several stages. In the first stage, classic OLS (Ordinary Least Squares) models were built as a reference point for the evaluation of models which use spatial relationships. Subsequently, an analysis of the spatial autocorrelation of both response and explanatory variables was performed. In the next step, geographically weighted regression models were developed, and the diagnostics of these models were compared with those of the classic models. This made it possible to draw conclusions about the validity and possible advantage of using GWR models in explaining the phenomenon under study. In the course of the work, the R environment and the GWR4 and ArcGIS software were used to calculate and visualise the results. Parameters of the classic OLS models with response variables Y1 and Y2 are shown in Table 3. The majority of variables (eight) in the OLS1 model, in which the average price of 1 m² of a flat was the response variable, proved to be statistically significant at a significance level below 0.001. The p-values for the registered unemployment rate and pollution emissions (variables X6 and X8) were higher than 0.05, which might mean that their impact on average prices was not as obvious as it might seem. In the OLS2 model, with the relative number of transactions (per 1000 existing flats) as the response variable, only six variables proved to be significant, although the determination coefficient indicates that its fit to the data is slightly better.
A high p-value of the parameter at a given variable may indicate that the variable does not have a significant impact on the explained variable, but this concerns only the global relationship. It does not exclude the possibility that such variables turn out to be locally significant.
The spatial data autocorrelation was examined with the use of Moran's global and local statistics (Moran's I), expressed with the following formulas [63]:

$$I = \frac{n}{S_0} \cdot \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$I_i = \frac{x_i - \bar{x}}{m_2} \sum_{j=1}^{n} w_{ij} (x_j - \bar{x}), \qquad m_2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

where n is the number of spatial units, $x_i$ is the value of the variable in unit i, $\bar{x}$ is its mean, $w_{ij}$ are the elements of the spatial weights matrix and $S_0 = \sum_{i} \sum_{j} w_{ij}$.
Moran's I shows whether an agglomeration effect exists. Positive autocorrelation means that there are clusters of similar values (high or low), whereas negative values of Moran's I are interpreted as hot spots, i.e., islands of values that differ distinctly (high or low) from their surroundings. The principles of testing the significance of Moran's statistics are presented by Goodchild [64]. Figure 2 is a set of choropleth maps showing the spatial distribution of units in which the local Moran's I proved to be statistically significant. The figure also contains information on the global statistics values.
All the explanatory and response variables show a positive global spatial autocorrelation. The significance test of global Moran's I shows that global spatial autocorrelation is not significant only for variable X8. An analysis of local Moran's statistics shows that in each case, there are clusters of areas with a similar level of the variable in the administrative space. For average prices (variable Y1), high-high clusters occur mainly around Warsaw and Gdańsk. For variable Y2, which describes the market activity, the south-east of Poland is dominated by low-low clusters.
The above analysis results indicate the influence of spatial relations between administrative units of different intensity on the variables taken for analysis. Therefore, it is justifiable to include information on relations between counties in the econometric models describing variables characterising the housing market in Poland.
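The global and local Moran statistics used in this analysis can be illustrated with a short Python sketch. It assumes the standard unstandardised forms, I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2 and I_i = (z_i / m2) * sum_j w_ij z_j with z_i the deviation from the mean, and a binary contiguity matrix; the weighting scheme actually used in the study is not stated here, and the four-unit example is hypothetical.

```python
def morans_i(values, w):
    """Global Moran's I for a list of values and a spatial weights matrix w."""
    n = len(values)
    mean = sum(values) / n
    z = [val - mean for val in values]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)

def local_morans_i(values, w):
    """Local Moran's I_i for every unit (same weights, no row standardisation)."""
    n = len(values)
    mean = sum(values) / n
    z = [val - mean for val in values]
    m2 = sum(zi * zi for zi in z) / n
    return [(z[i] / m2) * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]

# Four units in a row; neighbours share a border (binary contiguity).
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

trend = [1.0, 2.0, 3.0, 4.0]       # similar values next to each other
checker = [1.0, 4.0, 1.0, 4.0]     # high and low values alternate

i_trend = morans_i(trend, w)       # positive autocorrelation
i_checker = morans_i(checker, w)   # negative autocorrelation
local_trend = local_morans_i(trend, w)
```

In this form the local statistics sum to S0 times the global statistic. A positive local value at a unit with an above-average value corresponds to a high-high cluster, and a positive local value at a unit with a below-average value to a low-low cluster.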
In the next step, the GWR model parameters were estimated under the assumption that all parameters are local. The variables shown in Table 1 were used in the models. The bi-square kernel function in accordance with Formula (4) was used for the estimation. The kernel function range (bandwidth) was determined by minimising the AIC criterion. The results are characterised in Table 4. The greatest differences between parameters in the GWR1 model were observed for variable X8 (emission of PM10, a mixture of airborne particles with a diameter of not more than 10 µm), whereas the smallest were observed for the parameter at variable X7 (entities in the business entities register). The greatest differences in the GWR2 model were also observed for variable X8, whereas the smallest were observed for the parameter at variable X10 (new housing units completed). Parameter differentiation can be a premise for determining which factors are global and which are local.
According to the assumption made during the study and to the study results, local GWR coefficients have different degrees of variability in the study area. Coefficients characterised by low variability can be seen as stationary, i.e., global in nature. A preliminary assessment of the global or local character of the variables may be made by assessing the overall characteristics of the parameters of the models in Table 4. However, in this case, it is difficult to establish a clear criterion of variability. In order to determine which coefficients are global and which are local, the Monte Carlo approach can be used to test the significance of the variability of each regression coefficient from the basic GWR model [46]. However, research has shown that this solution may yield unstable results (the simulation results may vary slightly from run to run). Therefore, the diff-criterion (the difference in a model indicator between the models) was used to test the variability of regression coefficients. The diff-criterion value shows the difference between the original GWR model and the switched model (MGWR), in which the evaluated coefficient is fixed, by comparing the selected model indicator (in this study, the AICc, i.e., the corrected Akaike criterion). Positive values indicate that the switched model has a better fit and the evaluated variable is stationary. The test results are presented in Table 5. Based on the GWR model parameter variability test, it was assumed that X1, X4 and X6 are global variables in the MGWR1 model, with the response variable Y1, and that X1, X2 and X4 are global variables in the MGWR2 model, with the response variable Y2. As a result, MGWR models were built with these variables regarded as global. Table 6 shows the results of the MGWR1 model parameter estimation. Among the global explanatory variables, only X6 (unemployment rate) explains the unit prices significantly. As expected, this variable is a de-stimulant.
Interestingly, the parameter for this variable in the OLS1 model proved to be insignificant. A slightly smaller parameter span was observed in the MGWR1 model than in the GWR1 model. Figure 3 shows the spatial distribution of the coefficients of the local variables in the MGWR1 model. To allow a comparative assessment of the impact of individual variables on average unit prices, the results are presented for the MGWR1 model estimated with standardised explanatory variables, which makes the impacts of individual variables comparable.
The spatial distribution of the parameters of the local variables in the MGWR1 model shows that variables X2, X3, X8 and X9 are de-stimulants in most of the area, whereas the other variables mostly have a positive effect on average prices. The largest differences, especially in the north and west of Poland, can be observed for variable X8 (emission of particulate pollutants PM10). The largest positive impact on average prices, practically throughout the country, is exerted by variable X7 (entities registered in the business register). The largest negative impact on prices was observed for variable X3 (percentage of working-age population in the total population), in the central and northern parts of Poland. Figure 4 shows the spatial distribution of the t statistic (the quotient of the estimated parameter and its standard error). A high absolute value of t indicates that the relationship in a given area is significant. Given the effective number of degrees of freedom, a significance level of 0.05 corresponds to a value of |t| of about 1.96.
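The significance rule just described can be written out in a few lines. This is a minimal sketch: the coefficient and standard-error arrays are hypothetical values for three counties, not estimates from the MGWR1 model.

```python
import numpy as np


def local_t_values(coef, se):
    """t statistic per location: estimated parameter divided by its standard error."""
    return np.asarray(coef, dtype=float) / np.asarray(se, dtype=float)


def is_significant(t, critical=1.96):
    """Flag locations where |t| exceeds the critical value (about 1.96 at the 0.05 level)."""
    return np.abs(t) > critical


# Hypothetical local estimates for one variable in three counties
coef = np.array([0.5, -0.9, 0.1])
se = np.array([0.2, 0.3, 0.2])

t = local_t_values(coef, se)
mask = is_significant(t)
print(t, mask)
```

Mapping `mask` over the counties gives the kind of significance pattern shown for each local variable.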
ISPRS Int. J. Geo-Inf. 2020, 9,   The analysis shows that certain areas can be identified where statistically significant relationships between a given variable and the average unit price are observed. For example, the relationship between the average number of births (variable X2) and the average price is the most significant in the north-west (the Szczecin-Gdańsk belt) and the south-east of the country. Table 7 shows the results of the MGWR2 model estimation for response variable Y2, which denotes the housing market activity. Among the explanatory variables of a global nature, a statistically significant impact on market activity was observed for variables X1 (population density) and X2 (number of births). It is notable that the parameter at variable X1 in the OLS2 model reached a much lower value. Figure 5 shows the spatial distribution of the coefficients at local variables in the MGWR2 model. This impact was presented with the results of the model estimation for standardised explanatory variables.
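Standardisation of the explanatory variables, which makes the coefficients comparable across variables, can be sketched as follows. The z-score form with the sample standard deviation is an assumption, since the text does not state which variant was used, and the data are invented.

```python
import numpy as np


def standardise(X):
    """Column-wise z-scores: subtract the mean, divide by the sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


# Hypothetical raw data: two variables on very different scales
X_raw = np.array([
    [120.0, 4.1],
    [950.0, 7.8],
    [60.0, 12.5],
    [2300.0, 3.2],
])

Z = standardise(X_raw)
print(Z.round(3))
```

After this transformation each coefficient expresses the change in the response per one standard deviation of the corresponding variable.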
The greatest negative impact on market activity was observed for variable X9 (average floor area of a housing unit) and the greatest positive impact was observed for variable X10. The impact of variable X10 is natural and expected. For variable X9, this relationship means that higher average prices correspond to smaller areas of housing units. For the other variables, an interesting phenomenon was observed: a given factor cannot be identified definitively as a stimulant or a de-stimulant. For example, the impact of variable X5 (average monthly salary) on the number of transactions is positive in the majority of the country, while in the north-eastern part of the country this variable has a negative impact on market activity. An interpretation of the impact strength should be accompanied by an assessment of the significance of the tested relationship. Therefore, the spatial distribution of the t statistic for the coefficients of the local variables in the MGWR2 model is presented in Figure 6.
Given the effective number of degrees of freedom, as in the MGWR1 model, a significance level of 0.05 corresponds to a value of |t| of about 1.96. For variables X3, X6 and X8, the relationships expressed by the MGWR2 model can hardly be regarded as significant in most of the country. For the other variables, the spatial distribution shows distinct areas in which the relationships described by the model are statistically significant.
The local coefficient of determination R² indicates the local fit of the individual models. Figure 7 shows R² for models GWR1, MGWR1, GWR2 and MGWR2. For the GWR1 model, in which the average unit price, Y1, was the response variable, the local coefficient of determination indicates the best fit in the northern part of the country and in the Warsaw area. The lowest values were noted in the central part (near Łódź) and the south-eastern part. In the case of the GWR2 model, the best fit was found in the south-eastern part of Poland. MGWR models are characterised by a similar distribution of the local coefficient of determination. Table 8 presents the basic diagnostic statistics of the models, which allow the models to be compared with each other and their fit to the empirical data to be assessed.
Both the information criteria and the determination coefficients clearly indicate that geographically weighted regression models reflect the relationships under study better than the global OLS models. For MGWR models, a slight improvement can be observed compared to the basic GWR models. The research confirmed that the method used gives better results than the classic approach, especially since it provides additional information about spatial relationships. This may indicate that it is justifiable to use mixed geographically weighted regression models (MGWRs) to analyse spatial relationships in socio-economic studies.
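The local coefficient of determination mapped in Figure 7 can be sketched as a geographically weighted version of the usual R². The definition below (kernel-weighted total and residual sums of squares around a weighted local mean) follows common GWR practice and is an assumption, as are the Gaussian kernel, the bandwidth and the synthetic data; the text does not state the exact formula used.

```python
import numpy as np


def gaussian_weights(coords, i, bandwidth):
    """Gaussian kernel weights of all locations relative to location i."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def local_r2(y, y_hat, coords, bandwidth):
    """Local R² at every location from observed and fitted values."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    r2 = np.empty(len(y))
    for i in range(len(y)):
        w = gaussian_weights(coords, i, bandwidth)
        y_bar = np.sum(w * y) / np.sum(w)      # weighted local mean
        tss = np.sum(w * (y - y_bar) ** 2)     # weighted total sum of squares
        rss = np.sum(w * (y - y_hat) ** 2)     # weighted residual sum of squares
        r2[i] = (tss - rss) / tss
    return r2


# Synthetic example: 50 locations on the unit square
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(50, 2))
y = rng.normal(size=50)

r2_perfect = local_r2(y, y, coords, bandwidth=0.3)  # fitted values equal observations
r2_mean = local_r2(y, np.full(50, y.mean()), coords, bandwidth=0.3)  # global mean as "model"
print(r2_perfect.min(), r2_mean.max())
```

A model that only predicts the global mean never beats the weighted local mean, so its local R² is never positive, whereas perfect predictions give a value of 1 everywhere.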

Summary and Conclusions
This study attempted to assess, in spatial terms, the relationship between socio-demographic, economic and environmental factors and the average prices and activity of the housing market in 2018. Both classic OLS regression models and Geographically Weighted Regression models (GWR and MGWR), supplemented by an analysis of spatial autocorrelation, were used in the study. The vast majority of the variables adopted as determinants proved to be statistically significant in terms of their impact on both prices and market activity, although the role of some factors (e.g., the registered unemployment rate and particulate emissions) is not as obvious as it might seem. The study findings show that the determinants of average prices and market activity are spatially differentiated, which is a consequence of economic as well as cultural or historical differences. This is indicated, among other things, by the analysis of spatial autocorrelation and the concentration of high-high and low-low clusters. Like most European countries, Poland is not homogeneous; therefore, socio-economic analyses carried out at the local level may differ significantly from those carried out at the national level.
The study has shown that GWR models are an extremely effective tool for analysing spatial data. The application of these models has shown that the impact of the analysed price determinants is spatially differentiated, and their greatest significance measured by the local coefficient of determination can be observed mainly in the Mazowieckie Voivodeship (near Warsaw) and Pomorskie Voivodeship (near Gdańsk). In the case of market activity determinants, their greatest significance was observed mainly in the south-eastern part of the country.
Studies show that treating all variables as local can be a simplification, just as treating them all as global is in the OLS model. Therefore, the MGWR model was used, in which the variables whose impact can be treated as global were selected using the diff-criterion. For the impact on average housing prices, these were the number of births, the migration rate and the registered unemployment rate. For the number of transactions, the global variables were the population density, the number of births and the migration rate.
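The idea of mixing global and local terms can be illustrated with a minimal back-fitting sketch. It is not the estimation routine used in the study: the Gaussian kernel, the fixed bandwidth, the alternating update scheme and the synthetic data are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def gwr_fitted(y, X_local, coords, bandwidth):
    """Fitted values of a basic GWR: one weighted least-squares fit per location."""
    fitted = np.empty(len(y))
    for i in range(len(y)):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)   # Gaussian kernel weights
        XtW = X_local.T * w                       # X^T W, shape (k, n)
        beta_i = np.linalg.solve(XtW @ X_local, XtW @ y)
        fitted[i] = X_local[i] @ beta_i
    return fitted


def mixed_gwr_backfit(y, X_global, X_local, coords, bandwidth, n_iter=50):
    """Alternate between the local (GWR) part and the global (OLS) part."""
    beta_g = np.zeros(X_global.shape[1])
    for _ in range(n_iter):
        # Local part: GWR on what the global terms leave unexplained
        local_fit = gwr_fitted(y - X_global @ beta_g, X_local, coords, bandwidth)
        # Global part: ordinary least squares on what the local terms leave unexplained
        beta_g, *_ = np.linalg.lstsq(X_global, y - local_fit, rcond=None)
    return beta_g, local_fit


# Synthetic example in which every coefficient is in fact constant over space
rng = np.random.default_rng(0)
n = 60
coords = rng.uniform(0.0, 1.0, size=(n, 2))
x_g = rng.normal(size=n)                 # variable treated as global
x_l = rng.normal(size=n)                 # variable treated as local
y = 2.0 * x_g + 1.0 + 3.0 * x_l          # noise-free response

X_global = x_g[:, None]
X_local = np.column_stack([np.ones(n), x_l])  # local intercept and local slope

beta_g, local_fit = mixed_gwr_backfit(y, X_global, X_local, coords, bandwidth=0.3)
print(beta_g)
```

In this noise-free case the global coefficient should settle at the value used to generate the data, which gives a quick sanity check of the alternating scheme before it is tried on real data.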
Both the GWR and the MGWR models fitted the empirical data much better than the global model. This is evidenced by both the determination coefficients and the information criteria based on the likelihood function (AIC, the Akaike information criterion, and BIC, the Bayesian information criterion). These models provided grounds for specific conclusions concerning local determinants of prices and market activity. It should be stressed, however, that although they met most expectations, they did not fit all spatial units sufficiently well. The GWR and MGWR models used also have some limitations, which can be at least partially overcome using, e.g., robust estimation methods or a Bayesian approach.
Comparison of analysis results with the results of previous research on the real estate market in Poland [59,61,62] confirms that the conditions, especially socio-demographic and economic, play a key role in shaping price formation processes and have a great impact on the activity of the housing market. It should be noted, however, that publications on the Polish housing market primarily use global models that do not allow a spatial approach to market processes across the country. Most studies using spatial models focus on local markets, hence the conclusions derived from them are also of a local nature.
The results indicate that the problem of including space in socio-economic research is very broad and the use of GWR and MGWR models can only be a starting point for further analyses, which should also take into account the dynamics of real estate market changes over time. The inclusion of space and time horizons could be an effective tool for the assessment, forecasting and simulation of real estate developments at both global and local levels.
The use of mixed GWR models (MGWR) significantly extends the possibilities of spatial analysis in socio-economic geography, where regionally operated variables are usually used that show spatial heterogeneity. The presented methodology may, above all, facilitate the understanding of value-forming processes on the market; at the same time, its application is not limited to diagnosing the existing state. It can also be successfully used to predict and simulate phenomena in the housing market.