House Price Forecasting from Investment Perspectives

Abstract: Housing market dynamics have primarily shifted from consumption- to investment-driven in many countries, including Australia. Building on investment theory, we investigated market dynamics by placing investment demand at the center using the error correction model (ECM). We found that house prices, rents, and interest rates are cointegrated in the long run under the present value investment framework. Other economic factors such as population growth, unemployment, migration, construction activities, and bank lending were also important determinants of the housing market dynamics. Our forecasting results show that the Sydney housing market will continue to grow with no significant price decline in the foreseeable future.


Introduction
A prosperous and healthy housing market is critical for the modern economy. Yet there are crucial research gaps, and housing market dynamics need to be better understood. One of the most critical observations of past decades is that housing market dynamics have primarily shifted from consumption- to investment-driven in many countries, including Australia. Investment activities (including speculation on capital gains) have been at the center of housing demand. More importantly, this investment-driven housing demand is believed to continue in the foreseeable future. With firm guidance from investment theory, this paper examines housing market dynamics through a lens that places investment demand at the center, using an augmented error correction model (ECM). Since the proposed ECM contains information on both current and past values of house prices, the results are useful for one-step-ahead price forecasts, which are important in real estate markets.
The ECM method has been widely applied to house price forecasting in many countries such as the US [1], the Netherlands [2], the Euro area [3], New Zealand [4], and Australia [5]. As pointed out by Wheaton, Chervachidze, and Nechayev [6], the method can handle both the stationarity and endogeneity problems that plague the time-series data for local housing price analysis. However, the empirical application of an ECM requires researchers to select possible "determinants" of the local housing market and distinguish their long- or short-term relationship in the market. Traditional reduced-form studies of housing market supply and demand struggle to provide firm guidance from theory on the selection of housing determinants, which means that the ECM specifications are often ad hoc in practice. The novelty of our approach is to specify an investment ECM based on the long-run relationship among house prices, rents, and interest rates and a variety of short-term impacts from other economic variables. To the best of our knowledge, this is [...] future. A significant downward market price adjustment is less likely. Note that our price forecasting was based on data up to 2018, i.e., the best available information at the time of writing this paper. Although we cannot forecast future events like a pandemic, their economic impacts on the housing market can readily be incorporated in our models via changes in interest rates and other economic variables in the specified dynamic system.
In the next section, we review the theoretical base for the modeling. We then outline the econometric tools used, describe the data, discuss the results, and provide the conclusions.

Theoretical Base for Modeling
The theoretical basis is the present-value model, which is commonly used to value stocks in the share market and is applied here to housing. Under this model, housing is valued as a financial asset whose price equals the expected future cash flows discounted to the present at an expected discount rate. For owner-occupied housing, there is an implicit rent (called a user cost or economic rent) paid by the homeowners; for investment properties, there is market rent paid by the renters. Thus, the housing price P at time t can be written as follows:

$$P_t = \sum_{i=1}^{n} \frac{E_t[D_{t+i}]}{(1+R)^{i}} + \frac{E_t[P_{t+n}]}{(1+R)^{n}} \tag{1}$$

where D_t is the dividend or cash flow at time t, R is the discount rate, and E denotes the expected value.
In the finance literature, the first term is often called the fundamental value, and the second term is the price bubble. When n is sufficiently large, the second term converges to zero. The model implies that the current asset price is simply the sum of all expected present values of future cash flows, discounted at a constant rate. The model has been widely applied in finance to value shares.
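The behavior of the two terms described above can be illustrated numerically. The following is a minimal sketch, assuming a constant cash flow, a constant discount rate, and a fixed terminal price; the function name and all input values are hypothetical and are not taken from the study.

```python
def present_value(cash_flows, terminal_price, discount_rate):
    """Split an n-period present-value price into its two terms.

    Returns (fundamental, terminal): the discounted cash flows and the
    discounted terminal price, respectively.
    """
    fundamental = sum(
        d / (1.0 + discount_rate) ** i
        for i, d in enumerate(cash_flows, start=1)
    )
    n = len(cash_flows)
    terminal = terminal_price / (1.0 + discount_rate) ** n
    return fundamental, terminal


# Hypothetical inputs: annual rent of 30,000, a 5% discount rate,
# and a terminal price held fixed at 800,000 for illustration only.
rent, rate = 30_000.0, 0.05
for n in (10, 50, 200):
    fundamental, terminal = present_value([rent] * n, 800_000.0, rate)
    print(n, round(fundamental), round(terminal))
```

Under these assumptions the terminal term shrinks toward zero as n grows, while the first term approaches rent divided by the discount rate, which is the capitalization shortcut used for income-producing properties.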
In the property market, a simplified version of the above present-value model, called the capitalization approach, is used to value income-producing properties. However, while the assumption of a constant expected income D and discount rate R is analytically convenient, it contradicts the evidence that both the expected income and the investor's expected rate of return vary over time. Campbell and Shiller [19,20] suggested a log-linear present-value model with time-varying expected returns, where the log asset price at time t is written as follows:

$$p_t = \frac{k\,(1-\rho^{n})}{1-\rho} + E_t\left[\sum_{j=0}^{n-1} \rho^{j}\left((1-\rho)\,d_{t+1+j} - r_{t+1+j}\right)\right] + \rho^{n} E_t[p_{t+n}] \tag{2}$$

Here, in the standard Campbell-Shiller notation, p_t and d_t are the log price and log dividend, r_t is the log return, ρ is a discount factor between zero and one, and k is a linearization constant. When the time horizon n increases to infinity, the third term, which is the discounted expected value of the asset price, shrinks to zero. Accordingly, the current asset price can be presented as follows:

$$p_t = \frac{k}{1-\rho} + E_t\left[\sum_{j=0}^{\infty} \rho^{j}\left((1-\rho)\,d_{t+1+j} - r_{t+1+j}\right)\right] \tag{3}$$

This equation can be rewritten in terms of the log dividend-price ratio, which is:

$$d_t - p_t = -\frac{k}{1-\rho} + E_t\left[\sum_{j=0}^{\infty} \rho^{j}\left(-\Delta d_{t+1+j} + r_{t+1+j}\right)\right] \tag{4}$$

Equation (4) is referred to by Campbell and Shiller [19,20] as the dividend-ratio model. It implies that the log dividend-price ratio should be stationary, provided the changes in log dividends and the expected stock return are stationary. Where the log dividend-price ratio is nonstationary, it is very likely that the expected stock return is nonstationary (highly persistent), even when the above present-value model is valid.

Econometric Tools for Model Development
The Augmented Dickey-Fuller (ADF) unit root test is applied to test for the stationarity of the time-series variables. As expected, they are integrated of order one, I(1), processes (Note 1). Because house prices, rents, and interest rates are I(1) variables, they are further tested for cointegration under the proposed present-value framework.
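For reference, the ADF test is usually based on a regression of the following general form. The lag length k and the choice of deterministic terms (constant, trend) are not stated in the text, so the symbols below are illustrative rather than the specification actually estimated.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0:\ \gamma = 0 \ \text{(unit root)}, \quad H_1:\ \gamma < 0 .
```

A series is typically classified as I(1) when the null hypothesis cannot be rejected for its level but is rejected for its first difference.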
Cointegration tests are widely used in time-series econometrics. In empirical tests, cointegration and unit root tests between stock prices and dividends give mixed findings depending on the time period studied. Using the annual US stock market data from 1871 to 1986, Campbell and Shiller [21] found that stock prices and dividends are not cointegrated. The deviation between prices and dividends is quite persistent. On the other hand, Diba and Grossman [22] indicated a possible cointegration relationship between stock prices and dividends for the US stock market. In terms of the housing market, Gallin [23] found the log rent-price ratio is stationary by using aggregated quarterly data for the US housing market. When Brooks, Katsaris, McGough, and Tsolacos [24] examined the monthly prices of UK equity-traded property stocks from 1986 to 1998, they found that prices and rents are not cointegrated over the sample period.
The error correction model (ECM) is the next application step in this study under the proposed present-value framework. The ECM is based on the assumptions that house prices, rents, and interest rates are cointegrated and that house prices are affected by both long-run cointegration and short-run dynamics among prices, rents, and interest rates. Since house prices are affected by other factors and tend to be seasonal, key micro- and macro-economic variables and seasonal quarterly dummy variables are included as external control variables in the ECM. The identified key economic variables include population and immigration growth, building costs, building supply, the amount of mortgage lending, and housing price growth in Australian capital cities. They are selected for the proposed ECM according to their statistical significance in the modeling. To be specific, we applied the Johansen cointegration test and estimated the ECM in Equation (5), in which x, y, and z are the log price, rent, and interest rate in first differences, α0 is a constant, μ is the error correction term, E is a vector of key micro- and macro-economic variables such as population, employment, immigration, bank lending, construction activities, etc. (see Table 1 for exact variables used in Equation (5)), S is a vector of seasonal dummies, and ε is white noise.
For simplicity, we excluded any potential structural breaks or regime shifts in the above models. This is justified as our main objective is forecasting rather than econometric modeling. The literature showed that forecasts are not substantially affected by the presence of structural breaks [25]. Elliott and Muller [26,27] further showed that gains from modeling a structural break might be offset by imprecisely estimated break dates and post-break parameters. Therefore, ignoring a break may lead to more accurate forecasts [28].
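A generic single-equation ECM consistent with the variable definitions above might look as follows. This is only a sketch: the lag length q, the coefficient symbols other than α0, the timing of E, and the log-level notation (p, rent, i) and the b coefficients in the cointegrating relation are assumptions made for illustration, not the authors' exact Equation (5).

```latex
x_t = \alpha_0
    + \sum_{i=1}^{q} \beta_i\, x_{t-i}
    + \sum_{i=1}^{q} \gamma_i\, y_{t-i}
    + \sum_{i=1}^{q} \delta_i\, z_{t-i}
    + \lambda\, \mu_{t-1}
    + \theta' E_t + \phi' S_t + \varepsilon_t ,
\qquad
\mu_{t-1} = p_{t-1} - b_0 - b_1\, \mathit{rent}_{t-1} - b_2\, i_{t-1} .
```

In this form, a negative λ means that a price above its long-run relationship with rents and interest rates is followed by slower price growth, which is the error-correcting behavior the model relies on.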

Variable Selection and Description
This analysis focused on the Greater Sydney Area, which covers 35 LGAs as defined by the Australian Bureau of Statistics (ABS) (Note 2). The sample period spanned from Q3 2004 to Q4 2018. The analysis started in 2004 mainly because of data availability; for example, there is no reliable rental time-series data prior to 2004. All time-series data are in quarterly frequencies and collected from various data sources. In total, 50 variables were considered in the proposed ECM, covering a broad range of national, regional, and local economies. Some key variables and data sources are listed below and detailed in Table 1 (Note 3).
(1) Australian Bureau of Statistics (ABS). This includes, for example, time-series data on regional population and immigration growth, nationwide bank mortgage lending, building construction cost, and price changes in other capital cities.
(2) Securities Industry Research Center of Asia-Pacific (SIRCA). This includes, for example, time-series data on median prices and rents for houses and units in the Greater Sydney Area. We assume that the mix and quality of quarterly residential property sales for houses and units (i.e., the number of bedrooms, bathrooms, car parks, lot size, land tenure, a mix of high- and low-value properties, and transaction types, etc.) are relatively stable over time. The impact of forced sales on reported median price indices, as discussed in Renigier-Biłozor, Walacik, Źróbek, and d'Amato [29], is small in this study.
(3) DataStream. This includes, for example, market data on Australia's interest rates and government bond yields.
(4) NSW Planning & Environment. This includes, for example, time-series data related to local housing and land supply.
All the variables and forecasting results are expressed here in nominal terms, i.e., they are the actual values at the time of the analysis without adjusting for inflation.
Depending on the purpose of their research, economists choose to use either real or nominal terms in their analyses. However, there is no consensus in the literature on what inflation indices should be used to deflate the variables. To avoid introducing unnecessary measurement errors in deflating variables for the analysis and later converting prices back to nominal values for the forecasting, nominal values are adopted in this market dynamic analysis.
Table 2 shows the ECM modeling results for the Greater Sydney Area between 2004 and 2018. Column (1) contains the results for houses, and column (2) is for units. The adjusted R-squared is 0.843 for houses, which means that 84.3% of the variation in quarterly house price changes is explained by the ECM house price model. For units, the adjusted R-squared is 0.812. Thus, both house and unit price changes are well modeled over the sample period.

Model Development
Under the present-value framework, asset prices, rents, and interest rates are expected to be cointegrated in the long run. The cointegration results in Table 2 show that prices, rents, and interest rates are indeed cointegrated (Note 4). Both house and unit price changes are positively related to rent changes but negatively related to interest rate changes. A one percent increase in rents raises house prices by about 0.262%, while a one percent increase in interest rates lowers house prices by 0.582%. In contrast, unit prices are more sensitive to rent changes and less sensitive to interest rate changes. A one percent increase in rents raises unit prices by 0.5%, while a one percent increase in interest rates lowers unit prices by about 0.412%. One possible explanation is that houses are more expensive than units, so house owners are more sensitive to interest rate changes than unit owners. Another explanation is that units are more likely to be held for investment purposes. Investors focus more on income than on interest rate changes because interest costs are tax-deductible for property investment.
The long-term relationship, as indicated by the cointegration term CointEq1, is negative (−0.048 for houses and −0.046 for units) and statistically significant, which means the speed of adjustment from a short-run towards a long-run equilibrium in the Sydney housing market is about 4.6-4.8 percent each quarter. The results indicate a very slow price adjustment process in the housing market.
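The adjustment speed can be translated into time horizons with simple arithmetic. The sketch below assumes that a deviation from equilibrium decays geometrically at the estimated rate each quarter and ignores the short-run dynamics in the model; the function name is hypothetical.

```python
import math


def adjustment_profile(alpha):
    """Summarise an error-correction coefficient (negative, per quarter).

    Returns (mean_lag, half_life) in quarters, assuming the gap to the
    long-run equilibrium shrinks by |alpha| of its size every quarter.
    """
    speed = abs(alpha)
    mean_lag = 1.0 / speed  # rough "quarters to adjust" rule of thumb
    half_life = math.log(0.5) / math.log(1.0 - speed)  # time to close half the gap
    return mean_lag, half_life


for label, alpha in (("houses", -0.048), ("units", -0.046)):
    mean_lag, half_life = adjustment_profile(alpha)
    print(f"{label}: 1/|alpha| = {mean_lag:.1f} quarters, "
          f"half-life = {half_life:.1f} quarters")
```

Under this simplification, the reciprocal of the coefficient is a little over 20 quarters for both property types, and roughly half of any gap would be closed after about 14 to 15 quarters.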
Short-term price dynamics also affect price changes. Table 2 shows that the current period price change is significantly related to the last period price change and to the price change in other Australian capital cities (apart from Sydney). A one percentage point increase in the last period price change increases the current period price change by 0.536% and 0.639% for houses and units, respectively. Although other economic factors are not statistically significant in the price model, they are important in the rent and interest rate models. For example, rents are found to be negatively related to the last period's net migration figure in New South Wales and the new dwelling supply in Sydney. Interest rates, however, are found to relate positively to the last period's government bond yields, construction costs, net migration figure, and dwelling supply. Interest rates are also negatively related to Australian mortgage lending (Note 5).
Overall, both house and unit price equations are well supported by the ECM modeling under the present-value framework. Our results show that prices are mostly driven by the short-run dynamics in the market, and the price adjustment process to its long-run equilibrium is very slow. It takes about 20 quarters for the housing market to adjust towards its long-run equilibrium. The results imply a self-fulfilling phenomenon in the Sydney housing market, which is consistent with the bubble literature on the Australian and New Zealand housing markets [5,30,31].
Table 1. Standard errors are shown in parentheses. *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.

Out-of-Sample Testing, Underlying Variable Assumptions, and Measures of Forecasting Accuracy
To validate the forecasting performance of the proposed ECM, we used an in-sample (training) data period from 2004 to 2013 to estimate the model and a pseudo-out-of-sample period between 2014 and 2018 to test its forecasting performance. In other words, we used the first 10 years of data to build the model and compared its forecasted prices to the actual prices in the next five years from 2014 to 2018. The ECM models developed for the period between 2004 and 2013 are included in Appendix A.
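The split described above can be laid out explicitly. This small sketch enumerates the quarters under the sample dates given in the text (Q3 2004 to Q4 2018) and separates the estimation and hold-out windows; the helper name is hypothetical.

```python
def quarters(start, end):
    """List (year, quarter) tuples from start to end, both inclusive."""
    year, quarter = start
    out = []
    while (year, quarter) <= end:
        out.append((year, quarter))
        if quarter < 4:
            quarter += 1
        else:
            year, quarter = year + 1, 1
    return out


sample = quarters((2004, 3), (2018, 4))
train = [p for p in sample if p[0] <= 2013]   # estimation window
test = [p for p in sample if p[0] >= 2014]    # pseudo-out-of-sample window
print(len(sample), len(train), len(test))
```

Keeping the hold-out window strictly after the estimation window is what makes the comparison a forecasting test rather than a measure of in-sample fit.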
Since other economic variables are included as exogenous inputs in the proposed ECM, we needed to make some assumptions about those variables in the out-of-sample testing period before forecasting prices. One approach was to take them as observable. In this case, forecasting errors between the forecasted and actual prices would have to be due to the ECM itself rather than any estimation errors introduced from the other forecasted variables. However, this take-as-given approach is not realistic. In reality, we simply do not know those economic variables in advance. Thus, we introduced a conditional forecasting method that uses time-series techniques to forecast the underlying economic variables. These techniques include the auto-regressive integrated moving average (ARIMA) and the auto-regressive (AR) methods.
Four statistical measures were used to evaluate forecast performance and the statistical difference between the forecasted and actual prices. They are the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and the Theil inequality coefficient. While the first three statistics are based on quadratic loss and average errors, the Theil coefficient measures how closely the forecast series matches the mean and the variance of the actual series. The model forecast evaluation results are shown in Table 3. Panel A of Table 3 shows the results for houses when the underlying variables are taken as given. Between the forecasted and actual quarterly house prices, the RMSE is AUD 22,578, the MAE is AUD 19,003, the MAPE is 2.207%, and the Theil inequality coefficient is 0.013. The average house price in the Greater Sydney Area is about AUD 863,000 during the forecasting period between 2014 and 2018. Thus, the forecasted quarterly house prices closely track the actual house prices, as demonstrated in Figure 1.

[Figure 1: quarterly price paths, 2014-2018. Series: actual median prices; forecasts with economic variables taken as given.]

Panel B of Table 3 shows the results of the conditional forecast when the underlying economic variables are unknown and must be forecasted. Two popular and widely applied time-series forecasting techniques (ARIMA and AR models) were considered in this analysis. The results show that using an AR(1) process in the underlying economic variable forecast produced the best forecasting results for units, while an AR(20) process produced the best forecasting results for houses. The ARIMA model produced forecasting results somewhere between those of the AR(1) and AR(20) processes. The results showed that the forecasting performance of the developed models is conditional on the assumed knowledge of the underlying variables. As the AR(1) and AR(20) processes tend to set the boundaries of the forecasts, we chose to rely on the more sophisticated ARIMA model together with the selected AR(1) and AR(20) processes for future price forecasting in the next section. The results provide some insights into forecasting for decision-making under different underlying variable assumptions.
Table 4 presents the forecast median house and unit prices for the Greater Sydney Area from 2019 to 2030. These prices are forecast based on the model developed using the data between 2004 and 2018 described in Section 4. The underlying economic variables are estimated using the ARIMA and selected AR models. In other words, these are fitted values for each of the macroeconomic variables. For both houses and units, the best-predicted price pattern was conditional on a high-order serial correlation in the underlying variable assumptions, such as an AR(20) process, while the worst predicted price path was found if the underlying economic variables follow a low-order serial correlation, such as an AR(1) process.
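The four accuracy measures can be computed in a few lines. The sketch below uses the bounded form of the Theil inequality coefficient commonly reported by econometrics packages (RMSE divided by the sum of the root mean squares of the forecast and actual series); the text does not say which variant was used, so that choice is an assumption, and the price figures are hypothetical.

```python
import math


def forecast_metrics(actual, forecast):
    """Return RMSE, MAE, MAPE (in percent) and a bounded Theil coefficient."""
    errors = [f - a for a, f in zip(actual, forecast)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    # Assumed variant: lies between 0 (perfect fit) and 1.
    theil = rmse / (
        math.sqrt(sum(f * f for f in forecast) / n)
        + math.sqrt(sum(a * a for a in actual) / n)
    )
    return {"rmse": rmse, "mae": mae, "mape": mape, "theil": theil}


# Hypothetical quarterly median prices (AUD), for illustration only.
actual = [800_000.0, 820_000.0, 850_000.0, 900_000.0]
forecast = [790_000.0, 835_000.0, 845_000.0, 920_000.0]
print(forecast_metrics(actual, forecast))
```

As a rough plausibility check of this variant, the reported RMSE of AUD 22,578 divided by twice the average price of about AUD 863,000 gives approximately 0.013, which is in line with the Theil value reported for houses.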
The results from the ARIMA variable assumptions were in between. Figure 2 shows the future price movements based on different underlying variable assumptions.

Forecasting Results
Figure 2 shows that housing prices are more likely to follow a high-growth scenario, as indicated by the ARIMA underlying variable assumptions; the market will bottom out in 2019-2020 and continue to grow in the future. Under this scenario, the median price will reach AUD 1,596,568 for houses and AUD 1,217,315 for units in 2030, representing an annual compounding growth rate of 5%. This forecast is supported by the alternative variable assumptions following an AR(20) process. Even in a low-growth scenario, as indicated by the AR(1) variable assumptions, there will be no significant price decline in the foreseeable future; rather, prices will mostly be flat or in a slow-growth mode with an average annual growth rate of 2% in the next decade.

Unconditional Forecast
The forecasting performance of the developed ECM model depends on the forecasting of underlying economic factors. This raises some concerns about the usefulness of the proposed ECM model. Alternatively, we used the unconditional forecast of the standard AR(1) process and ECM model without other economic factors to validate the results. The advantage of using an AR(1) process for house price forecasting is that it is simple and does not require any other variables in the modeling. As the standard ECM model only depends on prices, rents, and interest rates, there is no need to forecast other economic variables in this situation. Using the data from 2004 to 2013, we developed the standard AR(1) and ECM models and placed them in 5-year out-of-sample testing between 2014 and 2018 for performance comparison. The results are presented in Appendix B.
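To make the simplicity of the AR(1) benchmark concrete, the sketch below fits y_t = c + φ·y_{t-1} by ordinary least squares and iterates the fitted equation forward. Whether the benchmark in the study was fitted to price levels, logs, or changes is not stated, so the input series here is hypothetical, as are the function names.

```python
def fit_ar1(series):
    """OLS fit of y_t = c + phi * y_{t-1} + e_t; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted AR(1) equation forward from the last observation."""
    path, value = [], last_value
    for _ in range(steps):
        value = c + phi * value
        path.append(value)
    return path


# Hypothetical quarterly log price changes, for illustration only.
growth = [0.010, 0.014, 0.012, 0.018, 0.016, 0.020, 0.015, 0.011, 0.013, 0.017]
c, phi = fit_ar1(growth)
print(forecast_ar1(growth[-1], c, phi, steps=4))
```

The forecast needs nothing beyond the series' own history, which is the practical advantage noted above; the cost is that it cannot respond to rents, interest rates, or any other economic variable.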
Our results show that a standard ECM without underlying economic variables forecasts less accurately than a standard AR(1) process, despite the ECM having a higher adjusted R-squared (0.78) than the AR(1) process (0.62) in the model development period between 2004 and 2013. The findings are in line with the forecasting literature, which holds that models that fit the historical data well do not necessarily perform better than others in forecasting. It is not surprising that our proposed conditional ECM is superior to unconditional forecast models such as the standard AR(1) and ECM models. Our findings stress the importance of including other economic variables in the ECM development and, thereafter, forecasting.

Forecasted Median Rents
Our proposed ECM can also be used to predict future rents. To check the validity of our price model, we further checked whether the predicted rents were in line with market expectations. The predicted rental price movements are presented in Appendix C. Our results show that the rental markets for both houses and units will continue to creep up in the foreseeable future. For houses, median rents are forecasted to increase from AUD 550 per week in 2019 to about AUD 760 per week in 2030. For units, rents are forecasted to increase from AUD 530 per week in 2019 to AUD 840 per week in 2030, all in nominal prices. Our model shows faster rental growth for units than for houses. This could be due to a demand shift in the housing market as owning becomes unaffordable for many people, especially young people who choose to rent rather than own. As a result, rental demand in the unit market grows at a faster pace than in the house market.
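The difference in rental growth can be expressed as compound annual rates using the endpoint figures quoted above. This sketch assumes 11 annual compounding periods between 2019 and 2030; the function name is hypothetical.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


years = 2030 - 2019  # assumed number of annual compounding periods
house = cagr(550.0, 760.0, years)
unit = cagr(530.0, 840.0, years)
print(f"houses: {house:.2%} per year, units: {unit:.2%} per year")
```

On these figures, the implied nominal rental growth is roughly 3% per year for houses and a little over 4% per year for units.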

Forecasted Interest Rates
Interest rates (90-day bill rates) are a key variable in the proposed ECM forecasting model. As with rents, we checked what the "predicted" interest rates look like. The predicted interest rate graph is included in Appendix D. Our model forecasts a continually downward interest rate curve that falls to near zero in the future, and there is virtually no difference between the rates predicted in the house and unit markets. This forecast is in line with the recent interest rate cuts and the forecast by the Reserve Bank of Australia (RBA) that interest rates could stay low 'for a long, long time' [32]. Therefore, our model is robust in relation to the interest rate check. Note that interest rates, rents, and house prices are endogenously determined in the ECM, which means that we can only run robustness checks on their predicted values and cannot conduct stress tests related to an increase or decrease in these variables in the future period.

Conclusions
Housing market dynamics have shifted from traditional consumption-motivated to investment-driven in many developed countries, including Australia, over past decades. Traditional reduced-form studies of housing market supply and demand are ad hoc without firm guidance from theory and struggle to confront the challenges brought about by the dramatically increasing importance of the housing market for both policymakers and the public in their decision-making. Using sophisticated econometric tools available for modern time-series analysis and forecasting, we explored the housing market dynamics by placing the investment demand at the center guided by the present value investment theory. We found that the principal variables in the housing market dynamics (long-run relationship) are prices, rents, and interest rates. Meanwhile, other influencing variables, such as bank mortgage lending, building construction costs, population growth, dwelling supply, and net migration, affected the market in the short run via rent and interest rates. We further demonstrated that the proposed investment model has a superior forecasting performance compared to alternative models. The results showed that the Sydney housing market is more likely to follow a high-growth scenario in the foreseeable future with an average compounding growth rate of 5% per annum.
As the interest rate is a national variable set by the Reserve Bank of Australia (RBA), an important policy implication of this research is that if the State Government wants to stabilize housing prices to address housing affordability, rent control for residential properties is a sensible way to do it. Alternative tools and policies could include increasing the building supply, restricting migration, and putting in place various kinds of purchase restriction controls. From the perspective of the Central Government, the most effective way to slow the housing market is via macro-prudential tools such as placing a loan-to-value ratio restriction on housing purchases. Importantly, we need to point out that our forecast results are based on time-series data up to 2018. The model parameters could change when new information becomes available. One advantage of our ECM forecast is that the model can be automated for updates in an IT platform designed to manage house price forecasts. Of course, the accuracy of our forecast depends on our understanding of the housing market dynamics and the econometric tools adopted in this study.

Funding: NSW Landcom.

Acknowledgments:
We thank Yang Zhang for their assistance with this research. We acknowledge Landcom, the NSW Government's land and property development organization, for funding this project.

Notes
1. A time series process with a unit root (a random walk).
2. The statistical area of the Greater Sydney Area is maintained by the Australian Bureau of Statistics. For the 35 LGAs and their geographic locations and boundaries, please go to the Australian Bureau of Statistics website at: https://dbr.abs.gov.au/index.html (accessed on 25 July 2021).
3. The authors went through a wide range of data collection in this study. As not all variables are available or statistically significant in our model, we only report the key variables used in the ECM, as shown in Table 1. Please contact the corresponding author for a complete list of variables collected in this study.
4. Results of Johansen's cointegration test are available on request from the corresponding author.
5. Results are available on request by contacting the corresponding author.