Article

Why Market Prices May Not Be the Best Benchmark for Automated Valuation Models: Empirical Evidence of Ex Ante Unobservability of Gender-Associated Price Discrepancy in the Auckland House Market

Department of Property, The University of Auckland, 12 Grafton Road, Auckland 1010, New Zealand
*
Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2026, 19(3), 171; https://doi.org/10.3390/jrfm19030171
Submission received: 28 January 2026 / Revised: 26 February 2026 / Accepted: 26 February 2026 / Published: 28 February 2026
(This article belongs to the Special Issue Quantitative Finance in the Era of Big Data and AI)

Abstract

Automated Valuation Models (AVMs) are typically trained by learning to replicate observed housing transaction prices. This paper argues that such benchmarking is theoretically debatable. Market transaction prices are not direct measures of underlying property value but are realised outcomes of exchange processes that involve buyer-specific attributes that are unobservable prior to sale. Using residential housing transactions from Auckland, New Zealand, and buyers’ gender inferred from unstructured purchaser name data via artificial intelligence-based natural language processing, we provide empirical evidence that buyer attributes systematically affect transaction prices. Specifically, gender composition is shown to influence the discrepancy between AVM estimates and transaction prices, while no corresponding effect is found when AVMs are compared with capital values, which are the Council’s appraisals for rating purposes. This asymmetry reflects the shared information set of AVMs and professional appraisals, as both are based only on property and market information available prior to sale and do not incorporate buyer identity. The findings provide initial evidence for valuers to address the latest professional requirements of using AVMs.

1. Introduction

Automated Valuation Models (AVMs) have become central to contemporary property valuation, mortgage underwriting, taxation, and regulatory compliance (Karanikolas et al., 2025; Bjørgve et al., 2025; Jafary et al., 2024; Yiu & Cheung, 2025). Their credibility is typically assessed by how closely their estimates align with observed transaction prices. This paper challenges that convention. Whether transaction prices constitute an appropriate benchmark for evaluating valuation models depends on whether the AVM is intended to forecast realised exchange prices or to estimate an expected transaction price conditional on ex ante information, as required by appraisal standards. Market transaction prices are realised outcomes of exchange. They reflect not only the characteristics of the property and prevailing market conditions, but also buyer-specific attributes, such as preferences, gender, urgency, and financing constraints, that are revealed only at the point of sale. These buyer-side factors are fundamentally unobservable prior to the transaction and, therefore, lie outside the information set available to both professional valuers and AVMs at the time an estimate is produced. As a consequence, discrepancies between AVM estimates and transaction prices may arise even when the valuation model is correctly specified and faithfully reflects all relevant pre-sale information.
This distinction has an important implication for how price discrepancy is understood in AVM benchmarking exercises. When transaction prices are used as the reference point, any systematic association between buyer attributes and realised prices will appear as a persistent discrepancy in AVM estimates. However, such a discrepancy is not necessarily remediable. It does not stem from model misspecification, data quality problems, or algorithmic failure, but from the use of a benchmark that incorporates information unavailable to the valuation process itself. Attempting to eliminate these discrepancies by forcing AVMs to fit transaction prices risks redefining valuation as transaction prediction and embedding post hoc buyer effects into what is intended to be an estimate of market value formed under standard valuation assumptions.
In contrast, professional appraisals and AVMs share the same knowledge foundation. Both are ex ante constructs that rely on property characteristics, market evidence, and comparable sales information observable prior to sale, and both operate under the assumption of a typical market participant rather than a specific buyer. From this perspective, divergence between AVM estimates and transaction prices should not be interpreted mechanically as valuation error. Instead, such divergence may reflect buyer-side heterogeneity that cannot be learned, predicted, or generalised by design. Recognising this distinction is essential for a coherent interpretation of AVM performance and for aligning empirical validation practices with the conceptual foundations of valuation.
Therefore, we argue that market transaction prices are not the appropriate theoretical benchmark for training AVMs when the purpose is to estimate ex ante market value under appraisal standards rather than to forecast realised exchange prices, because they embed buyer-specific attributes and idiosyncratic decision processes that are fundamentally unobservable prior to sale. That means not all apparent discrepancies identified through benchmarking against transaction prices are, in principle, remediable.
IVSC (2024, 2025) and RICS (2025) have issued new standards and guidance on the use of AVMs and artificial intelligence in valuation practice. Both emphasise due diligence, transparency about inputs and limitations, and the identification and management of bias in model outputs. An important implication is that some measured discrepancies in AVM benchmarking exercises can reflect structural features of market price formation rather than deficiencies in valuation models. Efforts to eliminate these discrepancies by forcing AVMs to fit transaction prices can blur the boundary between valuation and transaction prediction and can inadvertently impose ex post buyer effects into what is intended to be an ex ante estimate of market value under standard valuation assumptions. For this reason, it is necessary to clarify why certain forms of apparent discrepancy cannot be eliminated when AVMs are trained or validated primarily against market transaction data.
Using residential housing transactions from Auckland, New Zealand, this paper provides empirical evidence consistent with this interpretation. Buyer attributes, proxied by gender inferred from purchaser name strings using an artificial intelligence-based classification approach, are shown to have a statistically significant association with transaction prices. At the same time, no corresponding association is observed when AVM estimates are compared with contemporaneous capital values, which are appraisal-based benchmarks constructed under the same ex ante information constraints. This asymmetry supports the theoretical claim that transaction prices are contingent outcomes rather than objective realisations of underlying value, and that some forms of apparent AVM discrepancy arise unavoidably from the use of an inappropriate benchmark.
The contribution of this paper is threefold. First, it develops an information set argument showing why AVMs, as ex ante valuation tools, cannot and should not be expected to match realised transaction prices closely in all cases. Second, it provides empirical evidence from the Auckland house market indicating that transaction prices incorporate buyer-specific attributes, proxied by gender composition inferred from purchaser name strings, in ways that systematically affect AVM-to-transaction price discrepancies. Third, it reframes AVM evaluation by arguing that appraisal-based benchmarks constructed under the same ex ante information constraints offer a more coherent reference point for assessing AVM performance than transaction prices alone, if the purpose is to estimate an expected transaction price conditional on ex ante information as required by appraisal standards.

2. Related Theory and Methods

2.1. Appraisals as Ex Ante Valuation Constructs

Professional property valuation is an ex ante exercise. Appraisers estimate value based on known information: physical attributes, location, zoning, market conditions, and comparable sales. Critically, appraisers do not, and cannot, condition their estimates on the identity, preferences, or behavioural traits of the eventual buyer.
AVMs are constructed under the same constraint. Regardless of methodological sophistication, AVMs are limited to the observable data available before the transaction occurs. As such, AVMs are best understood as algorithmic appraisers, not transaction predictors.

2.2. Market Prices as Bilateral Outcomes

Market transaction prices are not valuations; they are equilibrium outcomes of a bilateral bargaining or matching process. Even in competitive housing markets, the final price reflects the following:
  • The reservation price of the seller;
  • The willingness-to-pay of a specific buyer;
  • Negotiation dynamics, timing, and strategic behaviour.
Buyer attributes, such as risk tolerance, income expectations, household composition, and behavioural biases, affect willingness-to-pay but are not observable or predictable prior to sale. Consequently, transaction prices are inherently noisy signals of value.
There have been very few empirical studies testing buyers’ gender effects on house prices, probably because the information is usually not available in housing transaction datasets. However, a recent study by the US Federal Housing Finance Agency (FHFA), Bosshardt et al. (2025), empirically found a gender bias in property valuation for mortgage refinancing. The authors could gather the information of the buyers because FHFA is the authority that is in possession of the proprietary version of the new Uniform Appraisal Dataset (UAD), and the home buyers are the applicants for mortgage refinances.
Yet, with the advancement of artificial intelligence (AI) and generative AI, gender can now be estimated with reasonable accuracy by Large Language Models (LLMs) based on buyers’ names (Burtch & Zentner, 2024). This makes it possible for our study to test the hypotheses developed in Section 3.

2.3. Why AVMs Cannot Learn Buyer Attributes

Machine learning models require stable and generalisable relationships between inputs and outputs. Buyer-specific attributes violate this requirement for three reasons. First, buyer identity is unknown before a transaction; in other words, buyer-specific attributes are ex ante unobservable. Second, buyer preferences can vary across individuals in ways that are not structurally linked to property attributes; such idiosyncrasy makes the relationships unstable. Third, buyer attributes vary widely; each transaction reflects a unique buyer–property match that cannot be replicated accurately based on past probabilities. Therefore, even in principle, AVMs cannot be trained to account for buyer-specific effects without contaminating the valuation task with post hoc information.
More broadly, the general mechanism proposed in this paper is that buyer-side behaviour and constraints can shape realised transaction outcomes in ways that are difficult to anticipate ex ante. This is consistent with behavioural evidence from household finance, such as Mugerman et al. (2016), who show that borrowers’ mortgage choices are often influenced by salient cues, framing, and simple heuristics rather than by fully forward-looking calculations. Such behavioural responses can generate systematic deviations between predicted and realised outcomes, even when valuation models incorporate rich market and property information.

2.4. Gendered Price Differentials

Gendered pricing dynamics have been widely studied in various disciplines. For example, Kim et al. (2019, p. 2) found in the US housing markets that “female homebuyers pay a 2% premium on average”. Goldsmith-Pinkham and Shue (2023, p. 1097) studied sellers’ gender and found that “single women earn 1.5 percentage points lower annualized returns on housing”. They partially explained it by gender differences in negotiation strategies, which have also been confirmed in other disciplines (Roussille, 2024; Castillo et al., 2013).
Harten (2021, p. 85) also found in China’s shared rental market that women paid “a premium of almost 10% to rent in better, less crowded conditions”. Such premiums may arise from gender disparities in the willingness to pay for specific housing attributes (Li & Ellis, 2014). For example, Andersen et al. (2021) demonstrated that gendered price differences in real estate transactions can be eliminated when property attributes are controlled for.

3. Development of Hypotheses

3.1. Valuation and Transaction Prices Under Different Information Sets

From an information economics perspective, valuation and transaction prices are generated under different information sets. Let $I_V$ denote the information set available at the time a valuation is produced, and $I_T$ denote the information set at the point of transaction. By construction, $I_V \subset I_T$. Professional appraisals and Automated Valuation Models are ex ante constructs formed using property characteristics, market conditions, and comparable evidence observable prior to sale. Buyer identity, preferences, constraints, and negotiation behaviour belong to $I_T$ but are not observable within $I_V$.
This distinction has direct implications for model evaluation. An AVM trained or validated against outcomes generated under $I_T$ is assessed against information that was not available to the valuation process itself. As a result, discrepancies between AVM estimates and transaction prices may arise even when the valuation model correctly reflects all relevant pre-sale information.
In a rational expectations framework, transaction prices can be decomposed into an expected market value component and an idiosyncratic buyer-specific shock:
$$P_i = V_i + \varepsilon_i$$
where $V_i = E[P_i \mid I_V]$ represents market value conditional on the valuation information set, and $\varepsilon_i$ captures buyer-specific factors such as preferences, urgency, financing constraints, and negotiation dynamics. By definition, $E[\varepsilon_i \mid I_V] = 0$, implying that these buyer-side components are orthogonal to the information used by valuers and AVMs. AVMs are designed to approximate $V_i$, not the realised transaction price $P_i$. Consequently, even a perfectly specified AVM cannot eliminate discrepancies relative to transaction prices when those prices incorporate idiosyncratic buyer-side shocks. Such discrepancies do not represent valuation error but reflect irreducible variation arising from the exchange process.
The information set distinction implies an asymmetric prediction, which is a testable hypothesis and thus motivates our empirical tests. If buyer attributes affect transaction prices through $\varepsilon_i$, then measures of AVM discrepancy relative to transaction prices should be systematically related to buyer characteristics. In contrast, no such relationship should exist when AVM estimates are compared with appraisal-based benchmarks constructed under the same ex ante information constraints, such as capital values.
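The decomposition above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's method: the buyer-type shares, the size of the buyer-side shocks, and the value range are all hypothetical, chosen only so that the shocks average to zero across buyer types. It shows that an AVM returning exactly the ex ante value still exhibits a discrepancy against realised prices, and that this discrepancy varies with a buyer type the AVM never observes.

```python
import random
import statistics

random.seed(0)

# Hypothetical buyer types: (population share, mean proportional shock).
# Assumed values for illustration; the shares make the shocks average to zero.
BUYER_TYPES = {
    "female": (1 / 7, 0.05),
    "male": (1 / 7, -0.05),
    "mixed": (5 / 7, 0.00),
}


def simulate(n=20000):
    """Return (buyer type, |AVM - TP| / TP) pairs for n simulated sales."""
    types = list(BUYER_TYPES)
    weights = [BUYER_TYPES[t][0] for t in types]
    rows = []
    for _ in range(n):
        value = random.uniform(500_000, 2_000_000)   # V_i, known ex ante
        buyer = random.choices(types, weights)[0]    # revealed only at sale
        shock = BUYER_TYPES[buyer][1] + random.gauss(0, 0.03)
        price = value * (1 + shock)                  # P_i = V_i + eps_i
        avm = value                                  # a "perfect" AVM returns V_i
        rows.append((buyer, abs(avm - price) / price))
    return rows


rows = simulate()
mean_gap = {
    t: statistics.mean(gap for buyer, gap in rows if buyer == t)
    for t in BUYER_TYPES
}
for buyer, gap in mean_gap.items():
    print(f"{buyer:>6}: mean |AVM - TP| / TP = {gap:.3f}")
```

In this setup the single-buyer groups show a larger mean absolute discrepancy than the mixed group even though the simulated AVM makes no error relative to $V_i$.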

3.2. Hypotheses

This asymmetry provides a direct empirical test of whether observed AVM discrepancies reflect buyer-side heterogeneity rather than valuation model deficiencies. Based on the foregoing discussion, we propose the following three hypotheses.
Hypothesis 1 (Transaction Price Sensitivity).
AVM discrepancies relative to transaction prices are systematically associated with buyer attributes that are unobservable at the time of valuation.
Hypothesis 2 (Appraisal Benchmark Neutrality).
AVM discrepancies relative to appraisal-based benchmarks constructed under ex ante information constraints are not systematically associated with buyer attributes.
Hypothesis 3 (Benchmark Asymmetry).
The association between buyer attributes and AVM discrepancies is significantly stronger when transaction prices are used as the benchmark than when appraisal-based benchmarks are used.

4. Data and Methods

This study used a dataset of 16,728 residential freehold-interest housing transactions in Auckland, New Zealand, during the 2024 calendar year, matched with their contemporaneous council appraisals (capital values, CVs), AVM estimates, and the corresponding buyers’ names. The transaction prices and housing attributes were obtained from Cotality (2024). The CVs were assessed by Auckland Council valuers in May 2024 for rating valuation purposes (Auckland Council, 2024). The AVM valuations were estimated by Quotable Value Limited in early 2024 (QV, 2024) and reported by Relab (2024).
Some previous studies, such as Kim et al. (2019), relied on social security data and hospital birth records to identify home traders’ genders. However, this method can fail when many sellers and buyers are not locals. More importantly, many studies did not differentiate between single buyers and multiple buyers. For example, Kim et al. (2019) assumed a female buyer, “if one of the buyers is female”, in the case of multiple buyers, which would confuse mixed-gender buyer effects and female buyer effects. Andersen et al. (2021), meanwhile, considered single females, single males and couples, without differentiating same-gender couples. The price differences were mainly attributable to single versus multiple buyers, rather than gendered price differentials. This study, therefore, categorised gender into six categories, viz., single female buyers, single male buyers, mixed-gender buyers, all-female multiple buyers, all-male multiple buyers, and no-gender (such as company) buyers. This allowed the separate identification of gendered price differentials against mixed-gender buyer prices.

4.1. Identification of Buyers’ Gender by an LLM

Property title records in New Zealand show property owners’ names and other property information registered in LINZ (2026). The owners’ name records contain all the full names of multiple owners, including corporate owners, separated by delimiters. However, neither the property title records nor the housing transaction information contains structured demographic information on buyers. Buyer identity is recorded only as unstructured text strings of purchaser names, which may represent individuals, couples, trusts, or corporate entities. As a result, buyer attributes, such as gender, cannot be directly observed and can only be inferred probabilistically.
We employed a natural language processing (NLP)-based zero-shot classification approach using OpenAI’s ChatGPT-4 (version 4.0) API to infer the buyer’s gender from purchaser name strings. Unlike supervised classifiers trained on name–gender dictionaries, this approach allows contextual interpretation of compound names (e.g., multiple purchasers, legal entities, and trusts), reducing mechanical misclassification. Each transaction was classified into one of seven mutually exclusive categories based on the purchaser’s name string:
  • Company—corporate entities, trusts, limited liability companies, or institutional buyers;
  • Female—individual buyers inferred to be female;
  • Male—individual buyers inferred to be male;
  • Mixed—multiple buyers inferred to include both male and female individuals (e.g., couples);
  • All female—multiple buyers inferred to be all female;
  • All male—multiple buyers inferred to be all male;
  • NA—indeterminate cases where gender cannot be inferred with reasonable confidence (e.g., no names, initials only, ambiguous or rare names).
In the classification process, the name strings were first tokenised. Strings containing keywords such as “Limited”, “Ltd.”, “Trust”, “Trustee”, “Holdings”, “Company”, or similar legal-entity markers were classified as “Company”. “NA” was assigned to cases where only initials were provided, where no clear personal or company name appeared, or where the model could not assign gender with reasonable confidence.
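The category logic described above can be sketched in a few lines. This is an illustrative simplification, not the study's pipeline: the study passes the whole name string to an LLM, whereas the sketch splits the string per person and delegates each name to `infer_gender`, a hypothetical callable standing in for the model. The `;` delimiter, the initials rule, and the toy lookup used for the demonstration are likewise assumptions.

```python
import re

# Legal-entity markers quoted in the text; illustrative, not exhaustive.
ENTITY_PATTERN = re.compile(
    r"\b(limited|ltd|trust|trustee|trustees|holdings|company)\b", re.IGNORECASE
)


def has_only_initials(name):
    """True when no full given name is present (assumes the surname comes last)."""
    tokens = name.replace(".", " ").split()
    given = tokens[:-1]
    return not given or all(len(token) == 1 for token in given)


def combine_labels(labels):
    """Map per-person labels ('F', 'M', or None) to a transaction category."""
    if not labels or any(label is None for label in labels):
        return "NA"
    kinds = set(labels)
    if len(labels) == 1:
        return "Female" if kinds == {"F"} else "Male"
    if kinds == {"F"}:
        return "All female"
    if kinds == {"M"}:
        return "All male"
    return "Mixed"


def classify_transaction(name_string, infer_gender, delimiter=";"):
    """Classify one purchaser name string into one of the seven categories.

    `infer_gender` is a hypothetical stand-in for the LLM call: it takes one
    personal name and returns 'F', 'M', or None when it cannot decide.
    """
    if ENTITY_PATTERN.search(name_string):
        return "Company"
    names = [n.strip() for n in name_string.split(delimiter) if n.strip()]
    labels = [
        None if has_only_initials(name) else infer_gender(name) for name in names
    ]
    return combine_labels(labels)


def toy_infer_gender(name):
    """Toy lookup used only to exercise the sketch; not a real classifier."""
    lookup = {"jane": "F", "mary": "F", "john": "M"}
    return lookup.get(name.split()[0].lower())


for example in ["ABC Holdings Ltd.", "Jane Doe; John Doe", "J K Smith"]:
    print(example, "->", classify_transaction(example, toy_infer_gender))
```

Checking entity markers before any gender inference mirrors the order given in the text, so that a trust named after a person is not mistaken for an individual buyer.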

4.2. Validation of the LLM-Based Gender Classification

The validity and reproducibility of the NLP-based gender classification were evaluated using an external New Zealand gender-known dataset obtained from the New Zealand Companies Office (NZCO, 2024), administered by the Ministry of Business, Innovation and Employment. The registry contains more than 18,000 companies and provides person-level information on company directors and managers, including full names and self-reported gender. This administrative dataset serves as an independent ground-truth benchmark for assessing NLP-based classification accuracy of gender in the New Zealand context.
To avoid mechanical identification, honorific prefixes (e.g., Mr. or Ms.) were removed prior to classification. The model input consisted solely of the director’s full name. The exact prompt used was “The gender of the person is {Male, Female}.” The temperature parameter was set to 0 to minimise sampling variability in the output. Under this configuration, repeated calls produced similar classifications, supporting reproducibility. We also tried four different prompts, and the accuracy rates were close.
The validation results show that the probability of correct predictions was 94.1% for female and 95.9% for male observations. These accuracy rates were relatively high and remained stable across repeated evaluations. While both classification probabilities were strong, the results indicate a modest asymmetry, with male names being slightly more likely to be correctly identified than female names. Overall, the validation exercise suggests that the NLP-based procedure provides a reliable gender classification for New Zealand personal names, subject to a small and measurable degree of classification error.
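One way to compute class-specific accuracy rates of this kind is sketched below. It reads "probability of correct predictions" as the share of each true class that the classifier labels correctly, which is an interpretation rather than a definition given in the text. The labels are toy values, not the Companies Office data.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Share of observations in each true class that were predicted correctly."""
    totals, correct = Counter(), Counter()
    for truth, predicted in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if predicted == truth:
            correct[truth] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Toy labels for illustration only.
truth = ["F", "F", "F", "F", "M", "M", "M", "M"]
predicted = ["F", "F", "F", "M", "M", "M", "M", "M"]

accuracy = per_class_accuracy(truth, predicted)
print(accuracy)
```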

4.3. Descriptive Statistics

After removing “NA” buyers and 9 outliers (transactions of excessively low price), the cleaned dataset comprised 10,825 transactions (full sample), with buyer gender estimated. Further removing “company” buyers, the dataset had 8692 valid transactions (Subsample 1), with the descriptive statistics of the dataset in Table 1. After further removing transactions of same-gender multiple buyers (all female and all male), the sample size was reduced to 6954 (Subsample 2). Details of the data cleaning and subsampling criteria are in Appendix A.
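The nesting of the three samples can be expressed as successive filters. The sketch below assumes each transaction already carries a `category` label and a hypothetical `outlier` flag; the actual outlier criterion is the one described in Appendix A and is not reproduced here.

```python
def build_samples(transactions):
    """Return (full sample, Subsample 1, Subsample 2) from labelled transactions."""
    # Full sample: drop indeterminate buyers and flagged outliers.
    full = [t for t in transactions if t["category"] != "NA" and not t["outlier"]]
    # Subsample 1: additionally drop company buyers.
    sub1 = [t for t in full if t["category"] != "Company"]
    # Subsample 2: additionally drop same-gender multiple buyers.
    sub2 = [t for t in sub1 if t["category"] not in {"All female", "All male"}]
    return full, sub1, sub2


# Toy records for illustration only.
toy = [
    {"category": "Female", "outlier": False},
    {"category": "Mixed", "outlier": False},
    {"category": "All male", "outlier": False},
    {"category": "Company", "outlier": False},
    {"category": "NA", "outlier": False},
    {"category": "Male", "outlier": True},
]
full, sub1, sub2 = build_samples(toy)
print(len(full), len(sub1), len(sub2))
```

Applied to the reported counts, the same nesting implies 10,825 − 8692 = 2133 company-buyer transactions and 8692 − 6954 = 1738 same-gender multiple-buyer transactions.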

4.4. Empirical Models

A simple t-test was conducted to test the equality of the means of AVM − CV and AVM − TP, that is, whether the AVM estimates were closer to CV or to TP. Regression models were also estimated to compare gender-associated price discrepancies across the two benchmarks.
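$$\frac{|AVM_i - TP_i|}{TP_i} = \alpha_1 + \beta_1 M_i + \gamma_1 F_i + \pi_1 N_i + X\mu_{x,1} + \varepsilon_{1,i} \qquad (1)$$
$$\frac{|AVM_i - CV_i|}{CV_i} = \alpha_2 + \beta_2 M_i + \gamma_2 F_i + \pi_2 N_i + X\mu_{x,2} + \varepsilon_{2,i} \qquad (2)$$
$$\frac{|AVM_i - TP_i|}{TP_i} - \frac{|AVM_i - CV_i|}{CV_i} = \alpha_3 + \beta_3 M_i + \gamma_3 F_i + \pi_3 N_i + X\mu_{x,3} + \varepsilon_{3,i} \qquad (3)$$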
This study estimated the regression models in Equations (1)–(3), where $|AVM_i - TP_i|$ and $|AVM_i - CV_i|$ are the absolute discrepancies of the AVM estimate $AVM_i$ against the transaction price $TP_i$ and against the capital value $CV_i$ of house $i$; $M_i$ and $F_i$ are dummy variables that equal 1 when the buyer is male or female, respectively, and 0 otherwise. The omitted base category for buyer gender is multiple buyers with mixed genders. $N_i$ is the number of buyers. $X$ is a vector of property characteristics, including number of bedrooms, building cohorts, land area, tenure, etc. $\varepsilon$ is the error term.
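The three regressions can be sketched on synthetic data as follows. This is a minimal illustration using NumPy least squares, not the authors' estimation code: the sample, the buyer-side shock of ±6%, the noise levels, and the single control variable standing in for $X$ are all assumptions, and no standard errors are computed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic data for illustration only (assumed values, not the Auckland sample).
male = rng.random(n) < 0.15                       # M_i: single male buyer
female = ~male & (rng.random(n) < 0.18)           # F_i: single female buyer
n_buyers = np.where(male | female, 1, rng.integers(2, 4, n))   # N_i
bedrooms = rng.integers(1, 6, n)                  # one control standing in for X

cv = rng.uniform(0.5e6, 2.0e6, n)                 # CV_i
avm = cv * (1 + rng.normal(0, 0.05, n))           # AVM_i uses ex ante information
shock = 0.06 * male - 0.06 * female               # assumed buyer-side effect
tp = cv * (1 + shock + rng.normal(0, 0.05, n))    # TP_i embeds the buyer shock

y1 = np.abs(avm - tp) / tp                        # dependent variable, Equation (1)
y2 = np.abs(avm - cv) / cv                        # dependent variable, Equation (2)
y3 = y1 - y2                                      # dependent variable, Equation (3)

X = np.column_stack([np.ones(n), male, female, n_buyers, bedrooms]).astype(float)
names = ["const", "M", "F", "N", "bedrooms"]


def ols(y, X):
    """Ordinary least squares coefficients via numpy.linalg.lstsq."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


for label, y in [("(1) vs TP", y1), ("(2) vs CV", y2), ("(3) difference", y3)]:
    print(label, dict(zip(names, ols(y, X).round(4))))
```

Because the three models share the same regressors, the least-squares coefficients of Equation (3) equal those of Equation (1) minus those of Equation (2). When the Equation (2) coefficients are close to zero, the Equation (3) estimates will therefore be close to those of Equation (1).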

5. Results

5.1. Comparison of the Magnitudes of Discrepancies in AVM Estimates

The t-test results are shown in Table 2. Panel A tests whether the mean of the signed within-transaction difference equals zero. The estimated mean (11,545) is positive and statistically significant (t = 4.85; p < 0.001), indicating that the AVM deviation associated with TP exceeds that associated with CV. Because the gaps are signed, this test alone does not establish which benchmark yields larger valuation errors. Panel B evaluates differences in error magnitude by testing whether the mean of the within-transaction absolute difference equals zero. The mean (15,454) is positive and highly statistically significant (t = 8.42; p < 0.001), indicating that the absolute AVM deviation relative to TP is systematically larger than that relative to CV for the same transactions. Unlike Panel A, this result directly measures predictive accuracy because absolute differences eliminate directional effects. Figure 1 presents box plots showing that both the signed and the absolute paired AVM discrepancies against transaction prices are larger than those against capital values.
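The difference between the signed and absolute tests can be made concrete with a small paired example. The sketch below computes a one-sample t statistic on within-transaction differences using only the standard library. The exact construction of the paired differences in Table 2 is not spelled out in the text, so the definitions used here are assumptions, and the five transactions are toy values.

```python
import math
import statistics


def paired_t(differences):
    """Mean and t statistic for H0: mean within-transaction difference = 0."""
    n = len(differences)
    mean = statistics.mean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return mean, mean / standard_error


# Toy values for illustration only; not the study's data.
avm = [910_000, 1_250_000, 780_000, 1_600_000, 1_020_000]
tp = [850_000, 1_330_000, 800_000, 1_450_000, 1_100_000]
cv = [900_000, 1_240_000, 790_000, 1_580_000, 1_000_000]

# Assumed Panel A analogue: signed gap against TP minus signed gap against CV.
signed = [(a - t) - (a - c) for a, t, c in zip(avm, tp, cv)]
# Assumed Panel B analogue: absolute gap against TP minus absolute gap against CV.
absolute = [abs(a - t) - abs(a - c) for a, t, c in zip(avm, tp, cv)]

signed_mean, signed_t = paired_t(signed)
absolute_mean, absolute_t = paired_t(absolute)
print(f"signed:   mean = {signed_mean:,.0f}, t = {signed_t:.2f}")
print(f"absolute: mean = {absolute_mean:,.0f}, t = {absolute_t:.2f}")
```

In the toy data, positive and negative signed gaps largely cancel, while the absolute gaps do not, which is why the absolute test is the one that speaks to error magnitude.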

5.2. Gender Effects on AVM Discrepancies

After controlling for property characteristics, the absolute proportional AVM discrepancies against transaction prices are found to depend on buyer gender, whereas those against CVs show no, or much weaker, dependence (Table 3). This finding suggests that transaction prices embed buyer-specific behavioural or preference-driven components beyond what can be explained by ex ante observable housing attributes.
For example, the Model 1 results show that gender-associated absolute paired discrepancies against TPs are about 7–9%, and the number of co-buyers is also associated with a discrepancy of about 7%. Yet most of these effects become statistically insignificant and much smaller in magnitude for the absolute paired discrepancies against CVs (Model 2). The Model 3 results further demonstrate that the differences between the paired discrepancies are associated with buyer gender and the number of co-buyers, with almost the same magnitudes as in Model 1. This implies that the gender-associated paired discrepancies arise mainly from the AVM discrepancy against TPs. The effects are statistically significant and persist after controlling for property characteristics, such as number of bedrooms, building floor area, year built, land size, and tenure type. Additional control variables, such as time dummies, were also tried, and the results were unchanged.
This pattern suggests that buyers’ characteristics can significantly explain the AVM absolute paired discrepancy against transaction prices, but not against valuation benchmarks that are constructed independently of buyer identity. The results support our argument that capital values, like AVM estimates, are produced without knowledge of the eventual buyer’s attributes. Consistent with this shared information set, the regression results show no, or weaker, statistically significant differences in $|AVM_i - CV_i| / CV_i$ gaps across buyer gender categories. This finding provides direct empirical support for the proposition that AVMs and valuers operate within a similar framework, in which both estimate value based on property and market attributes known prior to sale.
These findings do not imply inefficiency or discrimination in the housing market. Rather, they demonstrate that market prices are conditional outcomes, shaped by who the buyer is, not merely what the property is. Such effects are unobservable to both appraisers and AVMs at the time when valuations are produced.

5.3. Robustness Test on Institutional Effects on AVM Discrepancies

The significant effects of the number of co-buyers on AVM discrepancies against transaction prices may indicate institutional effects, in that certain types of houses are commonly purchased by institutions that register a large number of co-buyers. To further test whether institutional buyers contribute to AVM discrepancy, we conducted a robustness test by including “companies” in the dataset.
Table 4 reports the results of the robustness test. Company buyers are found to have a much higher AVM absolute paired discrepancy against TPs than non-company buyers (Model 4), and the effect of the number of co-buyers is mainly found among company buyers. However, such institutional effects are not found in the AVM absolute paired discrepancy against CVs, where the estimated magnitudes are much smaller (Model 5). The Model 6 results also confirm that the institution-associated paired discrepancies are mainly due to the AVM discrepancies against TPs, rather than against CVs. This is probably because houses commonly purchased by institutions, such as social housing institutes, companies, and trusts, are priced with business considerations. This is a topic for further study on AVM discrepancy.

6. Discussion

6.1. Market Prices Are Not “True Values”

A widespread but analytically problematic assumption in both academic research on AVMs and professional practice is that observed transaction prices represent “true” market values for training AVMs. In reality, housing transaction prices are equilibrium outcomes of individualised exchange processes rather than objective measures of underlying asset value. They can be affected by a combination of search frictions, behavioural biases, liquidity constraints, and buyer-specific preferences, many of which operate at the point of transaction and vary across buyers in ways that are not observable before transacting.
These factors may not reflect the true value of the property; some of them can even be noise. Search frictions, for example, imply that buyers and sellers transact under imperfect information and time constraints, which often lead to outcomes that deviate from market valuation. Behavioural factors of buyers, such as over-optimism, anchoring, or loss aversion, can also distort willingness to pay. Liquidity constraints and financing conditions of buyers can either inflate or suppress prices, but are not observable before transactions. Most importantly, buyer-specific preferences introduce idiosyncratic variation that is unique to each transaction and cannot be inferred probabilistically from property transactions alone.
These sources of variation generate “idiosyncratic transaction noise”. Because this noise is not systematic or observable prior to sale, it cannot be learned by statistical or machine learning models. Consequently, deviations between AVM estimates and transaction prices should not be interpreted as valuation errors.

6.2. Why AVMs Fit Appraisals Better than Transactions

The empirical findings of this study support the view that AVMs are structurally aligned with professional appraisals more than with realised transaction prices. Both AVMs and appraisers operate under similar conditions and constraints: they rely exclusively on information observable prior to sale, assume a typical willing buyer, and aim to estimate market value rather than a realised exchange price. From this perspective, a closer match between AVM estimates and appraisal-based valuations is not only expected but theoretically supported.
The empirical evidence from the Auckland housing market further clarifies why deviations between AVMs and transaction prices are unavoidable in transactions involving different buyers. For example, female-only, male-only, and mixed-gender buyers occur in approximately equal proportions (1:1:5). Because buyers’ gender is unobservable at the time of valuation, AVMs cannot condition their estimates on buyer composition. At best, they can implicitly approximate expected transaction prices by forming probability-weighted expectations over an unknown distribution of buyer types.
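The probability-weighted expectation mentioned above can be written out explicitly. The notation is introduced here for illustration and is not the paper's: $g$ indexes buyer type (female-only $f$, male-only $m$, mixed $mix$), $\pi_g$ is the share of type $g$, and the weights simply take the approximate 1:1:5 ratio at face value.

```latex
\hat{V}_i \;=\; \sum_{g \in \{f,\, m,\, mix\}} \pi_g \, E\!\left[P_i \mid g, I_V\right],
\qquad
(\pi_f, \pi_m, \pi_{mix}) \approx \left(\tfrac{1}{7}, \tfrac{1}{7}, \tfrac{5}{7}\right)

P_i - \hat{V}_i \;=\;
\underbrace{E\!\left[P_i \mid g_i, I_V\right] - \hat{V}_i}_{\text{buyer-type component}}
\;+\;
\underbrace{P_i - E\!\left[P_i \mid g_i, I_V\right]}_{\text{remaining idiosyncratic component}}
```

Under this reading, the first term is zero only if every buyer type pays the same expected price. Otherwise it is non-zero for at least some realised buyers even when $\hat{V}_i$ is computed correctly, because the realised type $g_i$ is not in $I_V$.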
In other words, an AVM estimate for a house is not a point prediction of the realised transaction price, but an expected market value across heterogeneous buyers. Single-gender buyers are found to have an average absolute deviation of about 7–9% between actual transaction prices and AVM estimates, after controlling for other property-related factors. The absence of statistically significant gender effects in the difference between AVM estimates and capital values reinforces this interpretation. It indicates that AVM estimates deviate less from valuation benchmarks constructed under identical information conditions, at least with respect to buyers’ gender. Thus, market prices incorporate buyer-side idiosyncrasies that are fundamentally unobservable and not learnable ex ante, implying that training AVMs on transaction prices is intrinsically subject to idiosyncratic transaction noise.
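The probability-weighted expectation described above can be illustrated with a minimal sketch. The buyer-type counts are those reported for single female, single male and mixed-gender buyers in Table 1; the type-specific prices are purely hypothetical values chosen for illustration and are not estimates from this study.

```python
# Buyer-type counts reported in Table 1 (single female, single male, mixed-gender).
counts = {"female": 1159, "male": 1075, "mixed": 4720}

# HYPOTHETICAL prices (NZD) that each buyer type might pay for the same house.
hypothetical_price = {"female": 1_030_000, "male": 1_050_000, "mixed": 1_000_000}

total = sum(counts.values())
shares = {k: v / total for k, v in counts.items()}

# Expected price when the buyer type is unknown before the sale.
expected_price = sum(shares[k] * hypothetical_price[k] for k in counts)

# Whichever buyer type turns up, the realised price deviates from the expectation.
deviation = {k: hypothetical_price[k] - expected_price for k in counts}

for k in counts:
    print(f"{k:>6}: share={shares[k]:.3f}, deviation={deviation[k]:,.0f} NZD")
```

Under these assumptions the share-weighted deviations sum to zero, yet every individual transaction still departs from the expected value. This is the sense in which such discrepancies cannot be removed by a model that does not observe the buyer.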

6.3. Implications for AVM Validation and Regulation

These findings have important implications for how AVMs are evaluated and interpreted. First, AVM performance should be assessed primarily against professional appraisal benchmarks rather than against transaction prices alone, if the purpose is to estimate ex-ante market value under appraisal standards rather than to forecast realised exchange prices. Benchmarking against realised prices confuses valuation accuracy with price prediction and unfairly penalises models for differences that are theoretically unavoidable.
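As a minimal sketch of what benchmarking against two alternative references involves, the function below computes the mean absolute percent difference used in Table 1, once against transaction prices and once against capital values. The function name and all figures are hypothetical and serve only to show the calculation.

```python
def mean_abs_pct_diff(avm, benchmark):
    """Mean of |AVM - benchmark| / benchmark over paired observations."""
    if len(avm) != len(benchmark):
        raise ValueError("inputs must be paired")
    return sum(abs(a - b) / b for a, b in zip(avm, benchmark)) / len(avm)

# HYPOTHETICAL figures (NZD) for three houses; not data from this study.
avm = [900_000, 1_200_000, 750_000]   # AVM estimates
tp = [1_000_000, 1_100_000, 800_000]  # realised transaction prices
cv = [920_000, 1_180_000, 760_000]    # council capital values

gap_tp = mean_abs_pct_diff(avm, tp)
gap_cv = mean_abs_pct_diff(avm, cv)
print(f"vs transaction price: {gap_tp:.3f}; vs capital value: {gap_cv:.3f}")
```

The same AVM output yields two different discrepancy figures depending on the benchmark, so any reported accuracy measure should state which benchmark it refers to.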
Second, valuation standards and regulatory frameworks, such as those promulgated by the International Valuation Standards Council (IVSC) and Royal Institution of Chartered Surveyors (RICS), should more explicitly distinguish between the accuracy of a valuation model and its ability to predict transaction outcomes. Failure to make this distinction may lead to inappropriate expectations of AVM performance and overly restrictive regulatory responses.
Finally, the results call for greater caution in empirical research that interprets AVM discrepancy against transaction prices as evidence of model failure. Such interpretations may mischaracterise the nature of market prices and overlook the role of buyer-specific heterogeneity. Recognising that transaction prices are not synonymous with value provides a more coherent foundation for both AVM research and valuation practice.

7. Conclusions

This paper argues that if AVMs are theoretically designed to approximate appraisal-based valuations rather than realised market transaction prices, then it is more appropriate to train the algorithm on appraised estimates than on market transaction prices, because market transactions can involve factors that are unlearnable owing to their unpredictability and unobservability before the transaction. This study, for example, investigated the association between buyer attributes that are unobservable at the time of valuation, such as buyer gender, and the discrepancy between AVM estimates and transaction prices. These buyer-specific effects introduce irreducible idiosyncratic noise that AVMs cannot learn and that professional appraisal practice cannot incorporate. As a result, discrepancies between AVM estimates and transaction prices should not be interpreted as AVM errors, but as a structural feature of market price formation. Recognising this distinction reframes how AVMs should be evaluated, regulated, and interpreted. AVMs and appraisers operate under similar conditions and information constraints and share the same objective: estimating property value based on observable property and market characteristics prior to sale. Expecting AVMs to accurately predict realised transaction prices confuses valuation accuracy with transaction predictability and risks misattributing unavoidable deviations to model failure.
This is a preliminary yet novel empirical study intended to illustrate the concept. It has several limitations that point to directions for future research. First, buyer attributes are proxied through gender classification inferred from purchaser name strings using natural language processing techniques. While this approach is viable for identifying buyer-side heterogeneity, it is subject to biases that require further validation and should not be interpreted as precise demographic identification. Second, the empirical analysis focuses on a single metropolitan housing market, namely Auckland, whose policy context, buyer behaviour and market dynamics may differ from those of other cities. In other words, even though the concept is applicable across housing markets in other countries, the empirical magnitude of gender-associated price discrepancies may vary depending on institutional settings, market structure, and buyer composition. Third, the analysis examines only one dimension of buyer heterogeneity; other unobservable attributes, such as risk preferences, financing constraints, or investment motivations, may further contribute to transaction-level price variation. Fourth, due to the limitations of the dataset, some micro-level variables, such as renovation quality, micro-location amenities, contract terms and marketing channel, are not included among the controls and fixed effects. This can result in omitted variable bias if buyer gender composition is correlated with the omitted variables.
Further studies could explore alternative proxies for buyer heterogeneity, examine additional demographic or behavioural dimensions, or test the proposed framework across different housing markets and regulatory environments. More broadly, future work could formalise the distinction between valuation and transaction processes within structural models of housing markets, providing deeper theoretical foundations for AVM evaluation. Research comparing AVMs directly with professional appraisals across varying market conditions would also be valuable in refining regulatory benchmarks. Furthermore, the smoothing and lagging issues in appraisal-based series (Geltner, 1993) can also be examined.
Finally, it is useful to situate the findings within the broader literature on AVM uncertainty and calibration. While the present study does not estimate prediction intervals (Krause et al., 2020), the documented gap between AVM outcomes and alternative benchmarks can be interpreted as part of the residual, transaction-level discrepancy that remains after observable information is exhaustively included. From this perspective, the results help clarify what users may regard as an “irreducible” component of AVM disagreement, even in the absence of formally calibrated intervals. In conclusion, this study contributes to a clearer conceptual understanding of what AVMs can and cannot be expected to achieve. By distinguishing AVM estimates from transaction outcomes, it offers a more coherent basis for interpreting AVM performance and for aligning empirical research, professional practice, and regulatory expectations. It provides some initial evidence to help valuers address the latest professional requirements of IVSC (2025) and RICS (2025) when using AVMs. Lastly, we emphasise that our results document an association between inferred buyer gender composition and AVM discrepancies relative to transaction prices; they are not evidence of unfair treatment or discriminatory valuation.

Author Contributions

Conceptualization, C.Y.Y. and K.S.C.; methodology, C.Y.Y. and K.S.C.; validation, C.Y.Y. and K.S.C.; formal analysis, C.Y.Y. and K.S.C.; investigation, C.Y.Y. and K.S.C.; resources, C.Y.Y. and K.S.C.; data curation, C.Y.Y. and K.S.C.; writing—original draft preparation, C.Y.Y.; writing—review and editing, C.Y.Y. and K.S.C.; visualization, C.Y.Y. and K.S.C.; project administration, C.Y.Y.; funding acquisition, C.Y.Y. and K.S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This research was conducted under ethics approval granted by The University of Auckland Human Participants Ethics Committee (UAHPEC), Review Reference UAHPEC27260, under the approved project titled “Biases of ChatGPT: Measuring the Algorithmic Discrimination in Artificial Intelligence.”

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from Cotality (2024), Relab (2024) and NZCO (2024) by data subscriptions, which are available at https://www.cotality.com/nz/industries/real-estate and https://relab.co.nz/, accessed on 1 September 2025 with the permission of Cotality, Relab and NZCO via the University of Auckland library’s subscriptions.

Acknowledgments

During the preparation of this manuscript/study, the authors used ChatGPT, version 4.0, for the purposes of identifying buyers’ gender. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest. Both authors are chartered members of RICS.

Abbreviations

The following abbreviations are used in this manuscript:
AVM: Automated Valuation Model
CV: Capital value (referring to council-assessed rating values of properties)
TP: Transaction price
IVSC: International Valuation Standards Council
RICS: Royal Institution of Chartered Surveyors

Appendix A

The data-cleaning process and the sample selection criteria are shown in Table A1 and Figure A1. First, the data-cleaning process removed transactions with incomplete data, “NA” buyers and 9 outliers (transactions at excessively low prices), resulting in 10,825 transactions (full sample) with buyer gender estimated. Second, a subsample was built by removing “company” buyers. This subsample had 8692 valid transactions (subsample 1); its descriptive statistics are given in Table 1. Third, another subsample was created by further removing transactions with same-gender multiple buyers (subsample 2), which reduced the sample size to 6954.
Figure A1. Data cleaning and sample selection process.
Table A1. Sample selection criteria.
| Data Sample Name | Number of Transactions | Process | Selection Criteria |
|---|---|---|---|
| Full Sample | 10,825 | Data cleaning: New Zealand market transactions of residential properties in freehold interest | Remove transactions with incomplete data, remove “NA” buyers, remove 9 outliers of extremely low-price transactions |
| Subsample 1 | 8692 | Subsampling process 1 | Exclude transactions of company buyers |
| Subsample 2 | 6954 | Subsampling process 2 | Exclude transactions of company buyers and of same-gender multiple buyers |
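The sample counts above can be cross-checked against the buyer-type counts in Table 1. The sketch below uses only figures reported in the paper; the variable names are illustrative.

```python
# Sample sizes reported in Appendix A.
full_sample = 10_825
subsample_1 = 8_692   # after excluding company buyers
subsample_2 = 6_954   # after also excluding same-gender multiple buyers

# Buyer-type counts for Subsample 1, as reported in Table 1.
buyer_counts = {"F": 1159, "M": 1075, "Mix": 4720, "All_F": 697, "All_M": 1041}

# Transactions dropped at each subsampling step.
company_removed = full_sample - subsample_1
same_gender_multi = buyer_counts["All_F"] + buyer_counts["All_M"]

# The buyer types partition Subsample 1, and dropping the two
# same-gender multiple-buyer groups leaves Subsample 2.
assert sum(buyer_counts.values()) == subsample_1
assert subsample_1 - same_gender_multi == subsample_2

print(company_removed, same_gender_multi)
```

Both assertions hold for the reported figures, so Subsample 2 consists of the single female, single male and mixed-gender transactions only.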

References

  1. Andersen, S., Marx, J., Nielsen, K. M., & Vesterlund, L. (2021). Gender differences in negotiation: Evidence from real estate transactions. The Economic Journal, 131(638), 2304–2332.
  2. Auckland Council. (2024). General property revaluation. Auckland Council, New Zealand. Available online: https://www.aucklandcouncil.govt.nz/en/property-rates-valuations/our-valuation-of-your-property/general-property-revaluation.html (accessed on 31 December 2025).
  3. Bjørgve, E., Oust, A., Pollestad, A. J., Sandnes, C., & Sønstebø, O. J. (2025). Comparing housing valuation techniques and stacked generalization: Exploiting explainable AI. Journal of Real Estate Finance and Economics, 72, 640–665.
  4. Bosshardt, J., Kedia, S., & Zhang, T. (2025). The male and female gap in home appraisals. Working Paper 25-01. Federal Housing Finance Agency (FHFA), US. Available online: https://www.fhfa.gov/document/wp2501.pdf (accessed on 28 January 2026).
  5. Burtch, G., & Zentner, A. (2024). Gender bias and property taxes. arXiv, arXiv:2412.12610.
  6. Castillo, M., Petrie, R., Torero, M., & Vesterlund, L. (2013). Gender differences in bargaining outcomes: A field experiment on discrimination. Journal of Public Economics, 99, 35–48.
  7. Cotality. (2024). Property guru—Unlock the future of property data, residential real estate. Available online: https://www.cotality.com/nz/industries/real-estate (accessed on 20 October 2025).
  8. Geltner, D. M. (1993). Estimating market values from appraised values without assuming an efficient market. Journal of Real Estate Research, 8(3), 325–346.
  9. Goldsmith-Pinkham, P., & Shue, K. (2023). The gender gap in housing returns. The Journal of Finance, 78(2), 1097–1145.
  10. Harten, J. G. (2021). Housing single women: Gender in China’s shared rental housing market. Journal of the American Planning Association, 87(1), 85–100.
  11. IVSC. (2024). International valuation standards 2025. International Valuation Standards Council. Available online: https://ivsc.org/new-edition-of-the-international-valuation-standards-ivs-published/ (accessed on 29 January 2026).
  12. IVSC. (2025). Navigating the rise of artificial intelligence in valuation: Opportunities, risks, and standards. Standard Review Board, International Valuation Standards Council. Available online: https://ivsc.org/wp-content/uploads/2025/07/Navigating-the-Rise-of-AI-in-Valuation-Opportunities-Risks-and-Standards.pdf (accessed on 29 January 2026).
  13. Jafary, P., Shojaei, D., Rajabifard, A., & Ngo, T. (2024). Automated land valuation models: A comparative study of four machine learning and deep learning methods based on a comprehensive range of influential factors. Cities, 151, 105115.
  14. Karanikolas, N., Kyriakidou, E., & Athanasouli, E. (2025). Artificial intelligence and real estate valuation: The design and implementation of a multimodal model. Information, 16(12), 1049.
  15. Kim, M., Norwood, B., O’Connor, S., & Shen, L. (2019). I am Jane. Do I pay more in the housing market? Economics Bulletin, 39(2), 1612–1620.
  16. Krause, A., Martin, A., & Fix, M. (2020). Uncertainty in automated valuation models: Error-based versus model-based approaches. Journal of Property Research, 37(4), 308–339.
  17. Li, Y. M., & Ellis, J. L. (2014). Consumers’ willingness to pay using an experimental auction methodology: Applications to brand equity. International Journal of Consumer Studies, 38(4), 435–440.
  18. LINZ. (2026). NZ property titles. Land Information New Zealand, New Zealand Government. Available online: https://data.linz.govt.nz/layer/50804-nz-property-titles/ (accessed on 16 February 2026).
  19. Mugerman, Y., Ofir, M., & Wiener, Z. (2016). How do homeowners choose between fixed and adjustable rate mortgages? The Quarterly Journal of Finance, 6(4), 1650013.
  20. NZCO. (2024). Companies register. New Zealand Companies Office. Available online: https://companies-register.companiesoffice.govt.nz/ (accessed on 24 May 2024).
  21. QV. (2024). Property search—Access property valuation estimates and information for every property in New Zealand. Quotable Value. Available online: https://www.qv.co.nz/property-search/ (accessed on 20 October 2025).
  22. Relab. (2024). Property portal Limited (Relab). Available online: https://relab.co.nz/ (accessed on 20 October 2025).
  23. RICS. (2025). Responsible use of AI—A new RICS professional standard, artificial intelligence in the natural and built environment sector. Royal Institution of Chartered Surveyors. Available online: https://www.rics.org/profession-standards/rics-standards-and-guidance/conduct-competence/responsible-use-of-ai (accessed on 29 January 2026).
  24. Roussille, N. (2024). The role of the ask gap in gender pay inequality. The Quarterly Journal of Economics, 139(3), 1557–1610.
  25. Yiu, C. Y., & Cheung, K. S. (2025). Enhancing explainable AI land valuations reporting for consistency, objectivity, and transparency. Land, 14(5), 927.
Figure 1. Signed and absolute paired discrepancies, (AVM − TP) − (AVM − CV) and |AVM − TP| − |AVM − CV|. The × marker and the number above it indicate the mean of the difference.
Table 1. Descriptive statistics of variables (Subsample 1).
| Variable | Description | Count | Mean | SD | Min. | Max. |
|---|---|---|---|---|---|---|
| $AVM_i - TP_i$ | Differences between AVM estimate and transaction price (TP) of house i, in NZD | 8692 | −3979.5 | 205,060.0 | −9,297,000 | 3,921,644 |
| $AVM_i - CV_i$ | Differences between AVM estimate and capital value (CV) of house i, in NZD | 8692 | −15,525.1 | 133,932.8 | −2,392,000 | 3,736,644 |
| $(AVM_i - TP_i) - (AVM_i - CV_i)$ | | 8692 | 11,545.6 | 221,805.3 | −9,480,000 | 1,850,000 |
| $(AVM_i - TP_i)/TP_i$ | Percent differences between AVM estimates and transaction price of house i | 8692 | 0.045 | 0.633 | −0.903 | 35.116 |
| $(AVM_i - CV_i)/CV_i$ | Percent differences between AVM estimates and capital value of house i | 8692 | 0.012 | 0.322 | −0.767 | 24.712 |
| $\lvert AVM_i - TP_i \rvert$ | Absolute differences between AVM estimate and transaction price (TP) of house i, in NZD | 8692 | 89,305.2 | 184,632.3 | 0 | 9,297,000 |
| $\lvert AVM_i - CV_i \rvert$ | Absolute differences between AVM estimate and capital value (CV) of house i, in NZD | 8692 | 73,850.3 | 112,803.2 | 0 | 3,736,644 |
| $\lvert AVM_i - TP_i \rvert - \lvert AVM_i - CV_i \rvert$ | | 8692 | 15,454.9 | 171,110.7 | −1,145,435 | 9,114,000 |
| $\left\lvert \frac{AVM_i - TP_i}{TP_i} \right\rvert$ | Absolute percent differences between AVM estimates and transaction price of house i | 8692 | 0.133 | 0.620 | 0 | 35.116 |
| $\left\lvert \frac{AVM_i - CV_i}{CV_i} \right\rvert$ | Absolute percent differences between AVM estimates and capital value of house i | 8692 | 0.089 | 0.309 | 0 | 24.712 |
| $Gender_i$ | Buyer’s gender of house i | Dummy variables | | | | |
| $F_i$ | Female buyer | 1159 | | | 0 | 1 |
| $M_i$ | Male buyer | 1075 | | | 0 | 1 |
| $Mix_i$ | Mixed-gender buyers | 4720 | | | 0 | 1 |
| $All\_F_i$ | All-female buyers | 697 | | | 0 | 1 |
| $All\_M_i$ | All-male buyers | 1041 | | | 0 | 1 |
Table 2. Paired t-tests on within-transaction differences (Subsample 1).
Panel A: Signed Differences
| Variable | Count | Mean | SD |
|---|---|---|---|
| $(AVM_i - TP_i) - (AVM_i - CV_i)$ | 8692 | 11,545.6 | 221,805.3 |

| Method | Null Hypothesis | Value | Prob. |
|---|---|---|---|
| t-test | Means are equal | 4.85 | 0.000 |

Panel B: Absolute Differences
| Variable | Count | Mean | SD |
|---|---|---|---|
| $\lvert AVM_i - TP_i \rvert - \lvert AVM_i - CV_i \rvert$ | 8692 | 15,454.9 | 171,110.7 |

| Method | Null Hypothesis | Value | Prob. |
|---|---|---|---|
| t-test | Means are equal | 8.42 | 0.000 |

Notes: Panel A tests whether the mean of the signed within-transaction difference (AVM − TP) − (AVM − CV) equals zero. Panel B evaluates differences in error magnitude by testing whether the mean of the within-transaction absolute difference |AVM − TP| − |AVM − CV| equals zero.
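The t-statistics in Table 2 can be recovered from the reported summary statistics alone. The sketch below applies the standard one-sample formula to the within-transaction differences, t = mean / (SD / √n); the function name is illustrative, and the reported SD is assumed to be the sample standard deviation of the paired differences.

```python
import math

def paired_t(mean_diff, sd_diff, n):
    """t-statistic for H0: the mean of the paired differences is zero."""
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error

# Summary statistics reported in Table 2 (Subsample 1, n = 8692).
t_signed = paired_t(11_545.6, 221_805.3, 8692)    # Panel A
t_absolute = paired_t(15_454.9, 171_110.7, 8692)  # Panel B

print(round(t_signed, 2), round(t_absolute, 2))
```

To two decimal places, these calculations reproduce the reported values of 4.85 and 8.42.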
Table 3. Regression model results (Subsample 2).
| Variable | Model 1, Equation (1) | Model 2, Equation (2) | Model 3, Equation (3) |
|---|---|---|---|
| Dependent variable | $\frac{AVM_i - TP_i}{TP_i}$ | $\frac{AVM_i - CV_i}{CV_i}$ | $\frac{AVM_i - TP_i}{TP_i} - \frac{AVM_i - CV_i}{CV_i}$ |
| | Coeff. (t-stat) | Coeff. (t-stat) | Coeff. (t-stat) |
| Male Single Buyers, $M_i$ | 0.09 *** (3.157) | 0.03 *** (3.489) | 0.06 ** (2.012) |
| Female Single Buyers, $F_i$ | 0.07 ** (2.558) | 0.003 (0.326) | 0.07 ** (2.40) |
| No. of Co-Buyers, $N_i$ | 0.07 *** (3.766) | 0.007 (1.192) | 0.07 *** (3.310) |
| No. of Bedrooms, $BR_i$ | −0.04 *** (−5.422) | −0.015 *** (−6.998) | −0.02 *** (−3.148) |
| Living Area, $AREA_i$ | 0.0004 *** (2.971) | $3.68 \times 10^{-5}$ (0.914) | 0.0003 *** (2.620) |
| Land Size, $LAND_i$ | $8.85 \times 10^{-7}$ ** (2.462) | $7.50 \times 10^{-8}$ (0.664) | $8.10 \times 10^{-7}$ ** (2.200) |
| Year Built, $YB_i$ | $-3.03 \times 10^{-5}$ *** (−2.757) | $-1.60 \times 10^{-5}$ *** (−4.639) | $-1.43 \times 10^{-5}$ (−1.269) |
| Constant, $\alpha$ | 0.09 * (1.906) | 0.137 *** (9.116) | −0.046 (−0.934) |
| Fixed Tenure Effects | Yes (freehold properties only) | Yes (freehold properties only) | Yes (freehold properties only) |
| No. of Observations | 6954 | 6954 | 6954 |
| Adj. R-Squared | 0.007 | 0.017 | 0.003 |

Notes: ***, ** and * represent statistical significance at the 1%, 5% and 10% levels, respectively. Buyers who are institutions, such as companies and trusts, were excluded from this test.
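A simple consistency check can be run on the coefficients in Table 3. It rests on an assumption read from the column headers, not stated explicitly in this section: that the dependent variable of Model 3 is the Model 1 dependent variable minus the Model 2 dependent variable, and that all three models use the same regressors and sample. Under that assumption, ordinary least squares implies that each Model 3 coefficient equals the Model 1 coefficient minus the Model 2 coefficient, up to rounding in the reported figures.

```python
# Coefficients as reported (rounded) in Table 3: (Model 1, Model 2, Model 3).
reported = {
    "Male single buyers": (0.09, 0.03, 0.06),
    "Female single buyers": (0.07, 0.003, 0.07),
    "No. of co-buyers": (0.07, 0.007, 0.07),
    "No. of bedrooms": (-0.04, -0.015, -0.02),
    "Constant": (0.09, 0.137, -0.046),
}

# Allowance for rounding: Models 1 and 3 are mostly reported to two decimals,
# so each may be off by up to 0.005 (this tolerance is an assumption).
TOLERANCE = 0.011

gaps = {name: abs((m1 - m2) - m3) for name, (m1, m2, m3) in reported.items()}
consistent = all(gap <= TOLERANCE for gap in gaps.values())

for name, gap in gaps.items():
    print(f"{name}: |(M1 - M2) - M3| = {gap:.3f}")
print("consistent within rounding:", consistent)
```

All five gaps fall within the rounding allowance, which is in line with Model 3 measuring the gender effect on the transaction-price discrepancy net of the capital-value discrepancy.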
Table 4. Robustness test results (full sample).
| Variable | Model 4 | Model 5 | Model 6 |
|---|---|---|---|
| Dependent variable | $\frac{AVM_i - TP_i}{TP_i}$ | $\frac{AVM_i - CV_i}{CV_i}$ | $\frac{AVM_i - TP_i}{TP_i} - \frac{AVM_i - CV_i}{CV_i}$ |
| | Coeff. (t-stat) | Coeff. (t-stat) | Coeff. (t-stat) |
| No. of Co-Buyers, $N_i$ | 0.02 * (1.832) | −0.003 (−0.481) | 0.03 * (1.912) |
| Company Buyers, $Com_i$ | 0.16 *** (4.432) | 0.03 (1.6111) | 0.14 *** (3.486) |
| $N_i \times Com_i$ | −0.05 *** (−2.731) | −0.01 (−1.351) | −0.04 ** (−2.001) |
| No. of Bedrooms, $BR_i$ | −0.07 *** (−13.420) | −0.01 *** (−5.512) | −0.06 *** (−10.296) |
| Living Area, $AREA_i$ | 0.001 *** (22.090) | $7.73 \times 10^{-5}$ *** (3.107) | 0.001 *** (19.394) |
| Land Size, $LAND_i$ | $2.38 \times 10^{-7}$ (1.025) | $2.70 \times 10^{-8}$ (0.265) | $2.11 \times 10^{-7}$ (0.850) |
| Year Built, $YB_i$ | $-6.61 \times 10^{-5}$ *** (−7.541) | $-2.25 \times 10^{-5}$ *** (−5.844) | $-4.36 \times 10^{-5}$ *** (−4.658) |
| Constant, $\alpha$ | 0.28 *** (9.491) | 0.16 *** (12.651) | 0.12 *** (3.691) |
| Fixed Tenure Effects | Yes (freehold properties only) | Yes (freehold properties only) | Yes (freehold properties only) |
| No. of Observations | 10,825 | 10,825 | 10,825 |
| Adj. R-Squared | 0.054 | 0.007 | 0.039 |

Notes: ***, ** and * represent statistical significance at the 1%, 5% and 10% levels, respectively.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yiu, C.Y.; Cheung, K.S. Why Market Prices May Not Be the Best Benchmark for Automated Valuation Models: Empirical Evidence of Ex Ante Unobservability of Gender-Associated Price Discrepancy in the Auckland House Market. J. Risk Financial Manag. 2026, 19, 171. https://doi.org/10.3390/jrfm19030171

AMA Style

Yiu CY, Cheung KS. Why Market Prices May Not Be the Best Benchmark for Automated Valuation Models: Empirical Evidence of Ex Ante Unobservability of Gender-Associated Price Discrepancy in the Auckland House Market. Journal of Risk and Financial Management. 2026; 19(3):171. https://doi.org/10.3390/jrfm19030171

Chicago/Turabian Style

Yiu, Chung Yim, and Ka Shing Cheung. 2026. "Why Market Prices May Not Be the Best Benchmark for Automated Valuation Models: Empirical Evidence of Ex Ante Unobservability of Gender-Associated Price Discrepancy in the Auckland House Market" Journal of Risk and Financial Management 19, no. 3: 171. https://doi.org/10.3390/jrfm19030171

APA Style

Yiu, C. Y., & Cheung, K. S. (2026). Why Market Prices May Not Be the Best Benchmark for Automated Valuation Models: Empirical Evidence of Ex Ante Unobservability of Gender-Associated Price Discrepancy in the Auckland House Market. Journal of Risk and Financial Management, 19(3), 171. https://doi.org/10.3390/jrfm19030171
