Article

Decision-Making Within Technical Due Diligence for Land Development Using Machine Learning Algorithms

by Elżbieta Radziszewska-Zielina 1, Marcin Waga 2,* and Bartłomiej Sroka 1
1 Faculty of Civil Engineering, Cracow University of Technology, 31-155 Kraków, Poland
2 Doctoral School, Cracow University of Technology, 31-155 Kraków, Poland
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(11), 5274; https://doi.org/10.3390/app16115274
Submission received: 17 March 2026 / Revised: 5 May 2026 / Accepted: 7 May 2026 / Published: 25 May 2026

Abstract

The Technical Due Diligence (TDD) process plays a key role in decisions on purchasing land for construction investments. In accordance with current market practice, it precedes both land acquisition and the start of construction, and it evaluates the feasibility of the planned investment. This article analyzes how selected factors affecting a future construction investment influence the decision to purchase a land property. To support this decision, several widely used machine learning algorithms were applied and compared: Decision Trees, Random Forests, k-Nearest Neighbors, Support Vector Machines, and Artificial Neural Networks (ANNs). On the test dataset, the ANN model achieved the highest accuracy, precision, and recall (ACC, PPV, and REC). Decision trees, in turn, are distinguished from the other methods by the high interpretability of their results. Machine learning methods may be used to develop a system supporting investment decisions on the purchase of land for future construction projects; however, the final decision will always be made by the investor based on their subjective assessment.

1. Introduction

The term Due Diligence [1] originated in the United States in the 1930s, where it denoted the obligation of sellers of financial instruments to disclose information mandated by local regulations. Today, according to the American Dictionary of Legal Terms [2], it refers to an analysis conducted by companies prior to making business decisions, particularly in areas such as mergers and acquisitions or the buying and selling of significant assets. According to [3], the primary objective of a due diligence process is to mitigate risk by protecting the buyer from unforeseen costs arising from issues discovered only after the transaction has been completed, when reversing it is no longer feasible.
In accordance with market practice, the purchase of a building plot and the commencement of a construction project are preceded by the preparation of a Technical Due Diligence (TDD) report. The preparation of the report involves addressing a classification problem [4], which entails verifying whether a planned building can be constructed on a given plot in compliance with current regulations and the investor’s requirements. This report is essential for the effective execution of the investment, contributing to the further development of the construction market in Poland, both for investors and construction companies [5]. Figure 1 illustrates the main phases of the TDD process.
During the Technical Due Diligence (TDD) process, two reports are prepared. The preliminary report provides an assessment of legal, technical, environmental, social, and economic constraints that may make the realization of the proposed investment impossible, while the final report is subsequently issued upon completion of the entire analysis.
This article focuses on the TDD process, concentrating primarily on technical issues related to the land property [6]. In line with the RICS (Royal Institution of Chartered Surveyors) [7], the purpose of TDD is to identify physical defects or instances of non-compliance with local regulations prior to the sale of a property, as these may influence its value.
In scientific literature, several methodological approaches to the preparation of the Technical Due Diligence (TDD) process can be identified, including the Analytic Hierarchy Process (AHP), machine learning techniques, expert interviews, document analysis, and land inspections [8]. However, the authors did not find any comparative analysis of machine learning methods that could be applied to the development of a decision-support system for investors regarding the purchase of land property. The lack of dedicated decision-support tools for investors acquiring land for construction projects, combined with the high complexity and risk of this process, justifies the application of machine learning methods to improve the quality of decision-making. Compared to the standard approach, a decision-support system based on machine learning could define a consistent methodology for the preparation of TDD reports and may serve as a tool supporting investment decisions.
The preparation of a TDD report can be reduced to solving a binary classification task [9], in which the input consists of factors affecting the future investment, treated as independent variables, and the output is a recommendation concerning the potential purchase of a given property, which is the dependent variable.
One of the most popular approaches to solving classification tasks is the use of methods based on machine learning algorithms [10]. The aim of this article is to test machine learning models and identify the optimal algorithm for building a classifier that would form the basis of a decision-support system for investors regarding the potential purchase of land property. Figure 2 presents the scheme of the investor decision-support system.

2. Data Set

For the development of the machine learning models, a database consisting of 100 projects was used. The dataset was divided into training (60 projects), validation (20 projects), and test (20 projects) sets. The division into validation and test sets was applied to reliably assess the performance of the individual models and to reduce the risk of overfitting. Each project included in the database was described using 25 factors grouped into five categories: legal, technical, environmental, social, and economic. The database was prepared based on the authors’ professional experience and expert interviews. Based on their own analysis, the authors selected the factors that have the most significant impact on the purchase decision, following the principle that each group should be represented by five factors. All factors used to describe the projects are presented in Table 1.
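The record structure described above can be sketched as follows. This is a minimal illustration, assuming each of the 25 factors from Table 1 is stored as 1 (favorable) or 0 (unfavorable); the numeric encoding, the names `FACTOR_GROUPS` and `encode_project`, and the example project are hypothetical and are not taken from the authors' database.

```python
# Factor symbols grouped as in Table 1 (X1..X25, five per group).
FACTOR_GROUPS = {
    "technical": [f"X{i}" for i in range(1, 6)],
    "environmental": [f"X{i}" for i in range(6, 11)],
    "legal": [f"X{i}" for i in range(11, 16)],
    "economic": [f"X{i}" for i in range(16, 21)],
    "social": [f"X{i}" for i in range(21, 26)],
}
FACTORS = [symbol for group in FACTOR_GROUPS.values() for symbol in group]


def encode_project(unfavorable):
    """Return a 25-element 0/1 vector; 0 marks an unfavorable factor."""
    unknown = set(unfavorable) - set(FACTORS)
    if unknown:
        raise ValueError(f"unknown factor symbols: {sorted(unknown)}")
    return [0 if symbol in unfavorable else 1 for symbol in FACTORS]


# Hypothetical project: site contamination (X7) and plot price (X16) unfavorable.
example = encode_project({"X7", "X16"})
```

A vector of this kind would be the input of a classifier, and the purchase recommendation (for instance 1 for "buy", 0 for "do not buy") would be its output.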
Based on the above-described database, classifiers supporting investor decision-making were developed. To evaluate the performance of the individual models, cross-validation [11], the confusion matrix [12], and the metrics ACC, PPV, REC, and F1 were used, as described in Table 2, Table 3 and Table 4.
For the database described in this article, k = 4: the 80 projects not reserved for testing were divided into four folds of 20 projects each, so that every iteration uses 60 projects for training and 20 for validation. The remaining 20 projects constitute the test set. The ACC, PPV, REC, and F1 metrics are then calculated from the entries of the confusion matrix.
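For reference, the textbook definitions of the four metrics in terms of the confusion matrix entries TP, TN, FP, and FN (as named in Table 3) are given below. They are assumed, not quoted, to match the formulas used by the authors.

```latex
\[
\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{PPV} = \frac{TP}{TP + FP}, \qquad
\mathrm{REC} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{REC}}{\mathrm{PPV} + \mathrm{REC}}
\]
```

A low PPV corresponds to many Type I errors (FP), and a low REC to many Type II errors (FN).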

3. Machine Learning Models

3.1. Decision Trees

One of the most widely used methods for solving classification problems is the application of decision trees [13], which aims to partition data into the most homogeneous possible groups with respect to the dependent variable [14]. A decision tree model splits the training data by asking successive questions about various factors [15]. Each question represents a node in the tree, and the answers lead to subsequent branches. At the end of each branch, there is a decision—e.g., to which class the object belongs. Decision trees are intuitive, easy to interpret, and can be easily presented in a graphical form; however, they are prone to overfitting.
Based on the CART algorithm [16] and a training dataset of 80 cases, a decision tree was generated (Figure 3) that can be used to support investor decision-making. An additional benefit of the decision tree algorithm is that it identifies the factors that are key to a positive purchase decision: 13 of the 25 factors appear in the tree. These factors can be analyzed in the preliminary report phase of the TDD process, which makes the process more efficient. The algorithm also highlights factors such as plot price and planning decisions, whose fulfillment is essential for a positive purchase decision and which should be verified at the start of preparing the preliminary report. The confusion matrix of the decision tree method for the test dataset is presented in Table 5.
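How a CART-style tree chooses the factor to ask about at a node can be sketched in a few lines. The sketch assumes binary 0/1 factors and the Gini impurity criterion commonly used by CART for classification; the source does not state which criterion or software was used, and the function names and sample data are made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (factor_index, weighted_gini) of the best binary split."""
    n = len(rows)
    best = (None, gini(labels))
    for j in range(len(rows[0])):
        left = [y for x, y in zip(rows, labels) if x[j] == 0]
        right = [y for x, y in zip(rows, labels) if x[j] == 1]
        if not left or not right:
            continue  # this factor does not separate the sample
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (j, score)
    return best


# Made-up sample: column 0 mimics a factor that fully determines the decision.
rows = [[1, 0], [1, 1], [0, 1], [0, 0]]
labels = [1, 1, 0, 0]
split_index, split_gini = best_split(rows, labels)
```

A full tree repeats this selection recursively on each resulting subset; factors that are never selected do not appear in the tree, which is how a subset of key factors can emerge.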

3.2. Random Forests

A random forest [17] is a collection of multiple decision trees. Each tree is trained on a slightly different random subset of data and factors. Each tree makes an individual prediction, and the final prediction of the random forest is determined by majority vote. Based on tests performed on the validation data, 200 decision trees were used for the calculations in this article.
A single tree is prone to errors, but multiple decision trees reduce overfitting and provide a more stable and accurate result. Random forests are highly effective, resistant to overfitting, and automatically detect the importance of factors; however, they are difficult to interpret. The confusion matrix of the random forest method for the test dataset is presented in Table 6.
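The majority-voting step can be sketched as follows. The function name, the tie-breaking rule, and the example votes are assumptions made for illustration; with an even number of trees (such as 200) ties are possible, and the source does not say how they were resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return 1 if strictly more trees vote 1 than 0, otherwise 0."""
    counts = Counter(predictions)
    # Ties go to the negative class (0) as a cautious default (assumption).
    return 1 if counts[1] > counts[0] else 0


# Hypothetical votes of five trees for one project (1 = buy, 0 = do not buy).
votes = [1, 0, 1, 1, 0]
decision = majority_vote(votes)
```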

3.3. Nearest Neighbors (k-Nearest Neighbors)

KNN (the k-Nearest Neighbors algorithm) [18] is a very simple model: when a new data point appears, the k nearest points from the training dataset are identified, and a positive or negative decision is made on this basis by majority voting. Based on tests performed on the validation dataset, k = 3 was adopted for the calculations in this paper. KNN is simple and intuitive and does not require model training; however, it is sensitive to irrelevant features. The confusion matrix of the k-Nearest Neighbors model for the test dataset is presented in Table 7.
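A minimal sketch of the k = 3 rule follows. It assumes 0/1 factor vectors and the Hamming distance (the number of differing factors); the distance measure actually used is not stated in the source, and the training rows below are invented.

```python
def hamming(a, b):
    """Number of positions at which two 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_rows, train_labels, query, k=3):
    """Majority label among the k training rows closest to the query."""
    # sorted() is stable, so equally distant rows keep their training order.
    ranked = sorted(zip(train_rows, train_labels),
                    key=lambda pair: hamming(pair[0], query))
    nearest = [label for _, label in ranked[:k]]
    return 1 if sum(nearest) * 2 > len(nearest) else 0


train_rows = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0]]
train_labels = [1, 1, 0, 0]
prediction = knn_predict(train_rows, train_labels, [1, 0, 1])
```

Because every factor contributes equally to the distance, factors that do not influence the decision still affect which neighbors are selected, which illustrates the sensitivity to irrelevant features.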

3.4. Support Vector Machine (SVM)

SVM (Support Vector Machine) [19] identifies a ‘boundary’ (hyperplane) that best separates the classes in the dataset. ‘Best’ means that it maximizes the margin, i.e., the distance to the nearest points from each class (the so-called support vectors). For non-linear data, kernel functions are applied to ‘map’ the data into higher-dimensional spaces, where separation becomes easier. The method is characterized by high effectiveness when dealing with data containing many features and is resistant to overfitting; however, it is difficult to interpret. The confusion matrix of the Support Vector Machine method for the test dataset is presented in Table 8.
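The margin-maximization idea can be written compactly for the linearly separable case. The notation is an assumption introduced here: training points x_i with labels y_i in {-1, +1}, weight vector w, and bias b.

```latex
\[
\min_{w,\, b} \ \tfrac{1}{2} \lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n
\]
```

The width of the resulting margin is 2 / ‖w‖, so minimizing ‖w‖ maximizes the margin, and the points for which the constraint holds with equality are the support vectors. With a kernel function K(x, z) = φ(x)ᵀφ(z), the same problem is solved implicitly in the higher-dimensional space defined by φ. The source does not state which kernel was used.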

3.5. Artificial Neural Networks

ANNs (Artificial Neural Networks) [20] are computational models inspired by the functioning of the human brain, used in machine learning and artificial intelligence. ANNs learn complex non-linear relationships and are characterized by high effectiveness; however, they are difficult to interpret and require large amounts of data. For the purposes of this study, a Multi-Layer Perceptron (MLP) [21] network was applied, consisting of one hidden layer with four neurons and the ReLU activation function. The confusion matrix of the ANN method for the test dataset is presented in Table 9.
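The forward pass of a network with this shape can be sketched as follows. The one hidden layer, four neurons, and ReLU activation come from the text; the 25 inputs (one per factor), the sigmoid output unit, and the random untrained weights are assumptions made only to keep the example runnable.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> hidden ReLU neurons -> one sigmoid output."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    score = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return hidden, sigmoid(score)


rng = random.Random(0)  # fixed seed; weights are random, not trained
n_inputs, n_hidden = 25, 4
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
            for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
b_out = 0.0

# Hypothetical project with all 25 factors favorable.
hidden, probability = mlp_forward([1] * n_inputs, w_hidden, b_hidden, w_out, b_out)
```

With this layout the network has 25 × 4 + 4 hidden-layer parameters and 4 + 1 output parameters, 109 in total, which is small but still comparable to the number of available training cases.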

3.6. Summary of Results

Table 10 presents the results obtained for all tested machine learning models on the validation and test datasets.

4. Discussion

The literature includes numerous studies describing decision-support systems across various fields (e.g., decisions of an economic nature [22], decisions made within construction companies [23], and the selection of an optimal subcontractor [24]); however, there is no system specifically dedicated to the purchase of land properties intended for construction investment.
Based on the conducted research, a methodology for creating an investment decision-support system using machine learning methods was proposed. The most widely recognized machine learning techniques were applied to construct the investor decision-support system, including Decision Trees, Random Forests, k-Nearest Neighbors classifiers, Support Vector Machines, and Artificial Neural Networks. The models’ performance was evaluated using validation and test datasets, for which confusion matrices were generated and performance metrics such as ACC (accuracy), PPV (precision), and REC (recall) were calculated.
To mitigate the risk of overfitting, several strategies were applied. The dataset was divided into training, validation, and test subsets, enabling independent model tuning and evaluation. Additionally, k-fold cross-validation (k = 4) was implemented to ensure a more robust assessment of model performance and to reduce dependence on a single data split.
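The split described above (20 test projects held out, 4 folds of 20 on the remaining 80) can be sketched as follows. The sketch assumes contiguous folds without shuffling and that the last 20 projects form the test set; how the authors actually assigned projects to folds is not stated.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, equal-sized folds."""
    if n_samples % k != 0:
        raise ValueError("n_samples must be divisible by k in this sketch")
    fold_size = n_samples // k
    indices = list(range(n_samples))
    for fold in range(k):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        val_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        yield train_idx, val_idx


# 100 projects: the last 20 are held out as the test set (assumption),
# the first 80 are split into 4 folds of 20.
all_idx = list(range(100))
test_idx = all_idx[80:]
folds = list(k_fold_indices(80, 4))
```

Each of the four iterations trains on 60 projects and validates on 20, and every one of the 80 projects is used for validation exactly once.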
Model complexity was intentionally limited, particularly in the case of the ANN, which was designed with a simple architecture (one hidden layer with four neurons) to match the size of the dataset. Furthermore, model performance was evaluated using multiple metrics (ACC, PPV, REC, and F1) across validation and test datasets, allowing for a comprehensive assessment of generalization ability.
Finally, it should be emphasized that the decision-making process is relatively complex. The twenty-five factors used to describe the project do not exhaust the full range of factors that investors must analyze before making a purchase decision, which constitutes a limitation of the proposed models. However, once the key decision-making factors are identified in cooperation with investors, the developed models may serve as a tool supporting investment decisions.
In future research, the authors plan to increase the number of analyzed cases and the number of factors considered, while being aware of the challenges of collecting appropriate data, in order to improve the system’s accuracy and reduce the number of Type I and Type II errors.

5. Conclusions

Decision-making regarding the purchase of land property for the purpose of construction investment is a complex transaction that, in accordance with market practice, requires conducting a Technical Due Diligence (TDD) process, which is influenced by numerous factors belonging to various fields. The main contribution of this study was to demonstrate that artificial intelligence-based machine learning models can be applied to develop a decision-support model for investors in the process of land acquisition. Additionally, based on the Decision Tree model (built using the CART algorithm), it is possible to identify key factors whose fulfillment is necessary to make a positive purchase decision. Factors such as the price of the land property and planning decisions were identified by the algorithm as essential for making a positive purchase decision.
Furthermore, factors such as the attitude of immediate neighbors, deficiencies in formal documentation related to existing buildings, heritage protection status of the site, land contamination, mining damage, access conditions to public roads, rental market trends, and overall economic conditions were identified as having a significant impact on the decision-making process. Simple methods such as Decision Trees offer full interpretability, allowing investors to understand the model’s decisions, but at the cost of lower accuracy, especially on the test dataset. More complex methods, such as ANNs, provide higher effectiveness, but with limited interpretability.
The following machine learning methods were tested in the study: Decision Trees, Random Forests, k-Nearest Neighbors, Support Vector Machines, and Artificial Neural Networks (ANNs). The highest model accuracy (ACC) on the test dataset, 80%, was achieved by ANNs, while on the validation dataset the highest value (79%) was obtained by Random Forests. The ANN model also achieved the highest precision (PPV) on the test dataset. Precision reflects the share of Type I errors: the lower it is, the more actual negative decisions the model classifies as positive, which may lead to concluding an unfavorable transaction and to significant financial losses for the investor.
Regarding recall (REC), which reflects Type II errors (actual positive decisions classified as negative), the highest value was also achieved by ANNs on the test dataset. The F1 score, the harmonic mean of precision and recall, also reached its highest value for the ANN model.
ANNs proved to be the most effective method in the context of supporting investment decision-making in the studied database, due to the model’s ability to capture nonlinear relationships among decision factors. Other methods, such as Decision Trees, Random Forests, SVM, and KNN, showed lower accuracy on the test dataset, which can be attributed, among other things, to their limited ability to capture complex relationships in a relatively small dataset. These methods were more sensitive to the limited number of cases and the uneven distribution of decision-influencing factors, resulting in lower classification performance.
The results obtained in this study are specific to the problem analyzed and may not fully reflect the performance of these methods in other classification tasks. The results depend on the nature of the data, the number of available samples, the number and type of factors, and the degree of nonlinearity in relationships. Therefore, conclusions regarding the superiority of ANNs should be considered specific to the studied dataset and problem, and any generalization should be preceded by testing on larger and more diverse datasets.
For the ANN, a difference between test accuracy (0.80) and validation accuracy (0.71) was observed. The test set (20 projects) may have been less complex or more homogeneous than the cross-validation folds, which could have resulted in higher performance on this subset.
This 9-percentage-point gap may indicate that the model’s performance on unseen data is overestimated. Further validation on larger and more diverse datasets is therefore needed in future work.
Due to the small dataset (80 training projects and 20 test projects), all models had low computational requirements and short training times. ANNs required the most computational resources, although with such a small sample the difference in time was minimal. Decision Trees and Random Forests had the shortest prediction times, which may be important for larger datasets.
Decision trees offer a transparent and interpretable structure, allowing investors to directly follow decision rules and understand how specific input factors influence the outcome. This makes them especially useful in situations where explainability and traceability of decisions are required.
ANNs, while often achieving higher predictive accuracy, operate as black-box models. Their output should therefore be interpreted primarily as classification predictions, without direct insight into the internal decision logic. For practical use, this implies that ANNs are more suitable in contexts where predictive performance is prioritized over interpretability.
Decision-support systems for land acquisition may assist investors in making more reliable decisions. However, it should be emphasized that the final decision regarding the purchase of land property is ultimately made by the investor based on their subjective assessment of the factors influencing the future investment.

Author Contributions

Conceptualization, M.W.; methodology, E.R.-Z.; software, M.W.; validation, B.S.; formal analysis, M.W.; resources, M.W.; writing—original draft preparation, M.W.; supervision, E.R.-Z. and B.S.; funding acquisition, E.R.-Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
TDD: Technical Due Diligence
ANNs: Artificial Neural Networks
RICS: Royal Institution of Chartered Surveyors
AHP: Analytic Hierarchy Process
CART: Classification and Regression Trees
KNN: k-Nearest Neighbors
SVM: Support Vector Machine
MLP: Multi-Layer Perceptron

References

  1. Sanz-Prieto, I.; de-la-fuente-Valentín, L.; Ríos-Aguilar, S. Technical Due Diligence as a Methodology for Assessing Risks in Start-up Ecosystems: An Advanced Approach. Inf. Process. Manag. 2021, 58, 102617.
  2. Due Diligence Law and Legal Definition|USLegal, Inc. Available online: https://definitions.uslegal.com/d/due-diligence/#google_vignette (accessed on 27 January 2024).
  3. Kutera, B.; Anysz, H. The Methodology of Technical Due Diligence Report Preparation for an Office, Residential and Industrial Buildings. In Proceedings of the MATEC Web of Conferences, Hong Kong, China, 26–27 April 2016; Volume 86.
  4. Rojek, I.; Burduk, R.; Heda, P. Ensemble Selection in One-versus-One Scheme—Case Study for Cutting Tools Classification. Bull. Pol. Acad. Sci. Tech. Sci. 2021, 69, 136044.
  5. Radziszewska-Zielina, E. Analysis of the Impact of the Level of Partnering Relations on the Selected Indexes of Success of Polish Construction Enterprises. Eng. Econ. 2010, 21, 324–335.
  6. Reich, S. Technical Due Diligence; Springer: Cham, Switzerland, 2018; Volume Part F614.
  7. New Guidance Note: Building Surveys and Technical Due Diligence of Commercial Property. Struct. Surv. 2011, 29, 39–41.
  8. Waga, M.; Radziszewska-Zielina, E.; Sroka, B. Review of Methods for Preparing Technical Due Diligence Reports for the Purchase of Commercial Real Estate. Przegląd Bud. 2025, 96, 102–106.
  9. Surma, J. Business Intelligence: Making Decisions Through Data Analytics; Pearson: Harlow, UK, 2011.
  10. Osisanwo, F.Y.; Akinsola, J.E.T.; Awodele, O.; Hinmikaiye, J.O.; Olakanmi, O.; Akinjobi, J. Supervised Machine Learning Algorithms: Classification and Comparison. Int. J. Comput. Trends Technol. 2017, 48, 128–138.
  11. Jonathan, O.; Omoregbe, N.; Misra, S. Empirical Comparison of Cross-Validation and Test Data on Internet Traffic Classification Methods. J. Phys. Conf. Ser. 2019, 1299, 012044.
  12. Bhandari, A. Understanding & Interpreting Confusion Matrix in Machine Learning (Updated 2024); Analytics Vidhya: Gurugram, India, 2024.
  13. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; CRC: Boca Raton, FL, USA, 2017.
  14. Fattah, A.; Fouad, M.M.; Philip, S.Y.; Gharib, T.F. A Decision Tree Classification Model for University Admission System. Int. J. Adv. Comput. Sci. Appl. 2012, 3, 031003.
  15. Quinlan, J.R. Induction of Decision Trees. Mach. Learn. 1986, 1, 81–106.
  16. Yao, Y.; Li, J.; Zhang, X.; Duan, P.; Li, S.; Xu, Q. Investigation on the Expansion of Urban Construction Land Use Based on the CART-CA Model. ISPRS Int. J. Geoinf. 2017, 6, 149.
  17. Ali, M.R.; Nipu, S.M.A.; Khan, S.A. A Decision Support System for Classifying Supplier Selection Criteria Using Machine Learning and Random Forest Approach. Decis. Anal. J. 2023, 7, 100238.
  18. Suyal, M.; Goyal, P. A Review on Analysis of K-Nearest Neighbor Classification Machine Learning Algorithms Based on Supervised Learning. Int. J. Eng. Trends Technol. 2022, 70, 43–48.
  19. Gandhi, R. Support Vector Machine—Introduction to Machine Learning Algorithms. Towards Data Sci. 2018. Available online: https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47 (accessed on 6 May 2026).
  20. Qamar, R.; Zardari, B.A. Artificial Neural Networks: An Overview. Mesopotamian J. Comput. Sci. 2023, 2023, 130–139.
  21. Molina, E.; Parraga-Alava, J. Artificial Neural Networks for Classification Tasks: A Systematic Literature Review. Enfoque UTE 2024, 15, 1058.
  22. Liu, B.; Sun, Z. Global Economic Market Forecast and Decision System for IoT and Machine Learning. Mob. Inf. Syst. 2022, 2022, 8344791.
  23. Radziszewska-Zielina, E. Fuzzy Control of Partnering Relations of a Construction Enterprise. J. Civ. Eng. Manag. 2011, 17, 5–15.
  24. Radziszewska-Zielina, E. The Application of Multi-Criteria Analysis in the Evaluation of Partering Relations and the Selection of a Construction Company for the Purposes of Cooperation. Arch. Civ. Eng. 2016, 62, 167–182.
Figure 1. Phases of the TDD process implementation for a construction plot.
Figure 2. Operation scheme of the investor decision-support system.
Figure 3. Decision tree generated using the CART algorithm for 80 projects.
Table 1. List of factors characterizing the projects.
Group of factors | Factor | Description | Factor symbol
Technical | Soil conditions
  • The building’s foundation may be classified as geotechnical category I or II—favorable.
  • The building’s foundation may be classified as geotechnical category III—unfavorable.
X1
Connection conditions to public roads
  • The costs related to connection to public roads are accepted by the investor—favorable.
  • The costs related to connection to public roads are not accepted by the investor—unfavorable.
X2
Utility connection requirements
  • The costs of implementing all utility connections are accepted by the investor—favorable.
  • The costs of implementing all utility connections are not accepted by the investor—unfavorable.
X3
Public transport
  • Availability of public transport—enabling employees to commute to work—favorable.
  • Insufficient public transport access for employees—unfavorable.
X4
Mining damages
  • The building will be founded in mining damage category I–II—favorable.
  • The building will be founded in mining damage category III–V—unfavorable.
X5
Environmental | Greenery inventory
  • The study does not identify any trees that could be classified as natural monuments—favorable.
  • The inventory includes trees that may be regarded by authorities as natural monuments, and permission for their removal may be denied, which could restrict the buildable area—unfavorable.
X6
Site contamination
  • There is no evidence of soil or groundwater contamination on the building plot—favorable.
  • The site is contaminated and necessitates remediation works—unfavorable.
X7
Carbon footprint
  • The investor is not obligated to reduce the carbon footprint during the project—favorable.
  • The investor is obligated not to exceed the carbon footprint limit per square meter of the future building’s usable floor area—unfavorable.
X8
Flood risk
  • The project site is not within flood risk zones—favorable.
  • Location in flood risk areas—unfavorable.
X9
Environmental certification
  • Project complies with preliminary assessment requirements without extra costs—favorable.
  • Project needs extra funding to comply with certification requirements—unfavorable.
X10
Legal | Development conditions decision
  • Planning decisions or an extract from the local zoning plan permit development consistent with the investor’s expectations—favorable.
  • Planning decisions or extracts from the local zoning plan do not permit development consistent with the investor’s expectations—unfavorable.
X11
Conservation protection
  • The building plot is not situated within a heritage protection area—favorable.
  • The building plot is located within a heritage protection area—unfavorable.
X12
Legal status of adjacent land parcels
  • The legal status of adjacent plots is identified—favorable.
  • The legal status of adjacent plots is unclear (ownership of the plots is not clearly established)—unfavorable.
X13
Need to obtain deviations from the building regulations
  • No need to obtain deviations from technical conditions—favorable.
  • Need to apply for deviations from technical conditions—unfavorable.
X14
Lack of formal documents
  • Formal documents exist for the existing buildings on the plot—favorable.
  • No formal documents exist for the existing buildings on the plot—unfavorable.
X15
Economic | Purchase price of the plot
  • The plot price is acceptable to the Investor—favorable.
  • The plot price is not acceptable to the Investor—unfavorable.
X16
Trend in the construction market
  • Trends in the construction market are favorable for investors—favorable (the number of construction investments is decreasing compared to the previous year).
  • Trends in the construction market are unfavorable—unfavorable (the number of construction investments is increasing compared to the previous year).
X17
Global economic situation
  • The global economic situation is favorable.
  • The global economic situation is unfavorable.
X18
Building size in relation to the leasable area
  • The building size is optimal in terms of construction costs—favorable.
  • The building size is not optimal in terms of construction costs—unfavorable.
X19
Trends in the rental market
  • Trends in the rental market are favorable (demand for rental space is increasing compared to the previous year).
  • Trends in the rental market are unfavorable (demand for rental space is decreasing compared to the previous year).
X20
Social | Attitude of the immediate neighbors toward the project
  • Neighbors’ attitude is favorable—favorable (issued administrative decisions were not protested).
  • Neighbors’ attitude is unfavorable—unfavorable (issued administrative decisions were protested by the neighbors).
X21
Attitude of the local community
  • The attitude of the local community is favorable (local associations did not protest the issued administrative decisions).
  • The attitude of the local community is unfavorable (local associations protested the issued administrative decisions).
X22
Attitude of the city authorities
  • The city authorities’ attitude is favorable for the investment.
  • The city authorities’ attitude is unfavorable for the investment.
X23
Social benefits for the city
  • The implementation of the investment is socially favorable.
  • The implementation of the investment is socially unfavorable.
X24
Aesthetic benefits for the city
  • The investment will contribute aesthetic value to the urban fabric—favorable.
  • The investment will not provide aesthetic benefits to the built environment—unfavorable.
X25
Table 2. Cross-validation. Definition of the confusion matrix and the ACC, PPV, REC, and F1 metrics (Table 3 and Table 4).
| Iteration | Training Data / Validation Data | Confusion Matrix | Metrics |
|---|---|---|---|
| Iteration 1 | [image i001] | Confusion matrix no. 1 | ACC, PPV, REC, F1 |
| Iteration 2 | [image i002] | Confusion matrix no. 2 | ACC, PPV, REC, F1 |
| Iteration 3 | [image i003] | Confusion matrix no. 3 | ACC, PPV, REC, F1 |
| Iteration 4 | [image i004] | Confusion matrix no. 4 | ACC, PPV, REC, F1 |
| Iteration k | [image i005] | Confusion matrix no. k | ACC, PPV, REC, F1 |
| Training data | Training and validation data [image i006] | | ACC, PPV, REC, F1 |
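The cross-validation scheme in Table 2 can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the helper names (`k_fold_indices`, `cross_validate`), the contiguous fold layout, the majority-class stand-in model, and the toy labels are all assumptions made for illustration.

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(X, y, k, fit, predict, score):
    """Run k iterations; each fold serves exactly once as validation data."""
    scores = []
    for val_idx in k_fold_indices(len(X), k):
        val_set = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in val_set]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = predict(model, [X[i] for i in val_idx])
        y_true = [y[i] for i in val_idx]
        scores.append(score(y_true, y_pred))
    # The mean over the k iterations is the value reported for validation data.
    return mean(scores), scores

# Hypothetical stand-in model: always predicts the most frequent training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, X_val):
    return [model] * len(X_val)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data (assumed): 10 samples, 1 = purchase, 0 = do not purchase.
X = [[i] for i in range(10)]
y = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
mean_acc, per_fold = cross_validate(X, y, 5, fit_majority, predict_majority, accuracy)
print(per_fold, mean_acc)
```

Any classifier can be substituted by passing different `fit` and `predict` callables; shuffling or stratifying the folds before splitting is a common refinement that this sketch leaves out.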
Table 3. Confusion matrix.
Confusion Matrix Construction

| | Predicted positive class | Predicted negative class |
|---|---|---|
| Actual positive class | True Positive (TP): correct classification | False Negative (FN): incorrect classification |
| Actual negative class | False Positive (FP): incorrect classification | True Negative (TN): correct classification |

Meaning:
  • TP (True Positive): the model correctly identified a positive case.
  • TN (True Negative): the model correctly identified a negative case.
  • FP (False Positive): the model incorrectly predicted a positive class; the number of negative cases wrongly classified as belonging to the positive class. FP errors are called Type I errors.
  • FN (False Negative): the model failed to detect a positive class that actually occurred; the number of positive cases incorrectly classified as belonging to the negative class. FN errors are called Type II errors.
Table 4. Definitions of the ACC, PPV, REC, and F1 metrics.
| Symbol | Description | Formula | Formula for the Validation Dataset |
|---|---|---|---|
| ACC | Accuracy | $(TP + TN)/(TP + TN + FP + FN)$ | $\text{mean ACC} = \frac{1}{k}\sum_{i=1}^{k} ACC_i$ |
| PPV | Precision | $TP/(TP + FP)$ | $\text{mean PPV} = \frac{1}{k}\sum_{i=1}^{k} PPV_i$ |
| REC | Recall/Sensitivity | $TP/(TP + FN)$ | $\text{mean REC} = \frac{1}{k}\sum_{i=1}^{k} REC_i$ |
| F1 | F1-score (harmonic mean of precision and recall) | $2 \times PPV \times REC/(PPV + REC)$ | $\text{mean F1} = \frac{1}{k}\sum_{i=1}^{k} F1_i$ |
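The definitions in Table 4 translate directly into code. The following is a minimal sketch, not the authors' implementation: the function names, the convention of returning 0.0 when a denominator is zero, and the two sets of fold counts are assumptions made for illustration.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute ACC, PPV, REC, and F1 from the four confusion-matrix counts."""
    total = tp + fn + fp + tn
    acc = (tp + tn) / total
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * ppv * rec / (ppv + rec) if (ppv + rec) else 0.0
    return {"ACC": acc, "PPV": ppv, "REC": rec, "F1": f1}

def mean_metrics(per_fold):
    """Average each metric over the k cross-validation iterations."""
    k = len(per_fold)
    return {name: sum(m[name] for m in per_fold) / k for name in per_fold[0]}

# Hypothetical counts (TP, FN, FP, TN) for two validation folds.
fold_counts = [(8, 2, 2, 8), (6, 4, 2, 8)]
per_fold = [classification_metrics(*counts) for counts in fold_counts]
averaged = mean_metrics(per_fold)
print(averaged)
```

Averaging the per-fold values, rather than pooling the counts into one matrix, matches the right-hand column of Table 4; the two approaches generally give slightly different numbers.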
Table 5. Confusion matrix for the Decision Tree model—test dataset.
Model Prediction (Decision Tree)

| Actual Value | Positive | Negative |
|---|---|---|
| Positive | 4 | 1 |
| Negative | 6 | 9 |
Table 6. Confusion matrix for the Random Forest model—test dataset.
Model Prediction (Random Forests)

| Actual Value | Positive | Negative |
|---|---|---|
| Positive | 6 | 2 |
| Negative | 4 | 8 |
Table 7. Confusion matrix for the k-Nearest Neighbors method—test data.
Model Prediction (k-Nearest Neighbors)

| Actual Value | Positive | Negative |
|---|---|---|
| Positive | 4 | 0 |
| Negative | 6 | 10 |
Table 8. Confusion matrix for the Support Vector Machine—test data.
Model Prediction (SVM)

| Actual Value | Positive | Negative |
|---|---|---|
| Positive | 5 | 1 |
| Negative | 5 | 9 |
Table 9. Confusion matrix for ANNs—test data.
Model Prediction (ANNs)

| Actual Value | Positive | Negative |
|---|---|---|
| Positive | 7 | 1 |
| Negative | 3 | 9 |
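As a worked example, the test-set metrics for ANNs can be recomputed from the counts in Table 9. The calculation assumes the layout of Table 3, with rows as actual values and columns as model predictions, so that TP = 7, FN = 1, FP = 3, and TN = 9 over 20 test cases.

```latex
\begin{align*}
\mathrm{ACC} &= \frac{TP + TN}{TP + TN + FP + FN} = \frac{7 + 9}{20} = 0.80 \\
\mathrm{PPV} &= \frac{TP}{TP + FP} = \frac{7}{7 + 3} = 0.70 \\
\mathrm{REC} &= \frac{TP}{TP + FN} = \frac{7}{7 + 1} = 0.875 \approx 0.88 \\
\mathrm{F1}  &= \frac{2 \times \mathrm{PPV} \times \mathrm{REC}}{\mathrm{PPV} + \mathrm{REC}}
              = \frac{2 \times 0.70 \times 0.875}{0.70 + 0.875} \approx 0.78
\end{align*}
```

These values agree with the ANN column reported for the test data in Table 10.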
Table 10. ACC, PPV, REC, and F1 values for individual machine learning methods on the validation and test datasets.
Validation Data

| Metric | Decision Trees | Random Forests | k-Nearest Neighbors Method | Support Vector Machine Method | Artificial Neural Networks |
|---|---|---|---|---|---|
| Mean ACC (accuracy) | 0.74 | 0.79 | 0.56 | 0.68 | 0.71 |
| Mean PPV (precision) | 0.77 | 0.75 | 0.43 | 0.67 | 0.70 |
| Mean REC (sensitivity) | 0.73 | 0.83 | 0.59 | 0.68 | 0.72 |
| Mean F1 (harmonic mean of PPV and REC) | 0.75 | 0.78 | 0.50 | 0.67 | 0.71 |

Test Data

| Metric | Decision Trees | Random Forests | k-Nearest Neighbors Method | Support Vector Machine Method | Artificial Neural Networks |
|---|---|---|---|---|---|
| ACC (accuracy) | 0.65 | 0.70 | 0.70 | 0.70 | 0.80 |
| PPV (precision) | 0.40 | 0.60 | 0.40 | 0.50 | 0.70 |
| REC (sensitivity) | 0.80 | 0.75 | 1.00 | 0.83 | 0.88 |
| F1 (harmonic mean of PPV and REC) | 0.53 | 0.67 | 0.57 | 0.63 | 0.78 |

Radziszewska-Zielina, E.; Waga, M.; Sroka, B. Decision-Making Within Technical Due Diligence for Land Development Using Machine Learning Algorithms. Appl. Sci. 2026, 16, 5274. https://doi.org/10.3390/app16115274
