Decision-Making Within Technical Due Diligence for Land Development Using Machine Learning Algorithms

Radziszewska-Zielina, Elżbieta; Waga, Marcin; Sroka, Bartłomiej

doi:10.3390/app16115274

by Elżbieta Radziszewska-Zielina ¹, Marcin Waga ²,* and Bartłomiej Sroka ¹

¹ Faculty of Civil Engineering, Cracow University of Technology, 31-155 Kraków, Poland
² Doctoral School, Cracow University of Technology, 31-155 Kraków, Poland
* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(11), 5274; https://doi.org/10.3390/app16115274

Submission received: 17 March 2026 / Revised: 5 May 2026 / Accepted: 7 May 2026 / Published: 25 May 2026

(This article belongs to the Special Issue Current Technological, Methodological, and Organizational Research Trends in the Construction Industry, Third Edition)

Abstract

In the decision-making process related to the purchase of land properties intended for construction investments, the Technical Due Diligence (TDD) process plays a key role. In accordance with current market practice, this process precedes both land acquisition and the commencement of a construction investment, and it evaluates the feasibility of the planned investment. This article analyzes how selected factors affecting the implementation of a future construction investment influence the decision to purchase a land property. To support the decision-making process, five widely used machine learning algorithms were applied and compared: Decision Trees, Random Forests, k-Nearest Neighbors, Support Vector Machines, and Artificial Neural Networks (ANNs). The analysis demonstrated that the highest accuracy, precision, and recall (ACC, PPV, and REC indicators) in making correct purchase decisions were achieved by the ANN model on the test dataset. Additionally, decision trees are characterized by high interpretability of results, which distinguishes them from the other methods. Machine learning methods may be used to develop a system supporting investment decisions related to the purchase of land properties for future construction projects; however, the final decision will always be made by the investor based on their subjective assessment.

Keywords: technical due diligence; machine learning; construction plot

1. Introduction

The term Due Diligence [1] originated in the 1930s in the United States, where it required sellers of financial instruments to disclose information mandated by local regulations. Today, according to the American Dictionary of Legal Terms [2], it refers to an analysis conducted by companies prior to making business decisions, particularly in areas such as mergers and acquisitions or the buying and selling of significant assets. According to [3], the primary objective of a due diligence process is to mitigate risk by protecting the buyer from unforeseen costs arising from issues discovered only after the transaction has been completed, when reversing it is no longer feasible.

In accordance with market practice, the purchase of a building plot and the commencement of a construction project are preceded by the preparation of a Technical Due Diligence (TDD) report. The preparation of the report involves addressing a classification problem [4], which entails verifying whether a planned building can be constructed on a given plot in compliance with current regulations and the investor’s requirements. This report is essential for the effective execution of the investment, contributing to the further development of the construction market in Poland, both for investors and construction companies [5]. Figure 1 illustrates the main phases of the TDD process.

During the Technical Due Diligence (TDD) process, two reports are prepared. The preliminary report provides an assessment of legal, technical, environmental, social, and economic constraints that may make the realization of the proposed investment impossible, while the final report is subsequently issued upon completion of the entire analysis.

This article focuses on the TDD process, concentrating primarily on technical issues related to the land property [6]. In line with the RICS (Royal Institution of Chartered Surveyors) [7], the purpose of TDD is to identify physical defects or instances of non-compliance with local regulations prior to the sale of a property, as these may influence its value.

In scientific literature, several methodological approaches to the preparation of the Technical Due Diligence (TDD) process can be identified, including the Analytic Hierarchy Process (AHP), machine learning techniques, expert interviews, document analysis, and land inspections [8]. However, the authors did not find any comparative analysis of machine learning methods that could be applied to the development of a decision-support system for investors regarding the purchase of land property. The lack of dedicated decision-support tools for investors acquiring land for construction projects, combined with the high complexity and risk of this process, justifies the application of machine learning methods to improve the quality of decision-making. Compared to the standard approach, a decision-support system based on machine learning could define a consistent methodology for the preparation of TDD reports and may serve as a tool supporting investment decisions.

The preparation of a TDD report can be reduced to solving a binary classification task [9], in which the input consists of factors affecting the future investment, treated as independent variables, and the output is a recommendation concerning the potential purchase of a given property, which is the dependent variable.

One of the most popular approaches to solving classification tasks is the use of methods based on machine learning algorithms [10]. The aim of this article is to test machine learning models and identify the optimal algorithm for building a classifier that would form the basis of a decision-support system for investors regarding the potential purchase of land property. Figure 2 presents the scheme of the investor decision-support system.

2. Data Set

For the development of the machine learning models, a database consisting of 100 projects was used. The dataset was divided into training (60 projects), validation (20 projects), and test (20 projects) sets. The division into validation and test sets was applied to reliably assess the performance of the individual models and to reduce the risk of overfitting. Each project included in the database was described using 25 factors grouped into five categories: legal, technical, environmental, social, and economic. The database was prepared based on the authors’ professional experience and expert interviews. Based on their own analysis, the authors selected the factors that have the most significant impact on the purchase decision, following the principle that each group should be represented by five factors. All factors used to describe the projects are presented in Table 1.
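As a minimal sketch of how a single project record could be encoded for such models, the snippet below assumes that each of the 25 factors in Table 1 is stored as a binary value (1 = favorable, 0 = unfavorable) and that the purchase recommendation is a binary label. The encoding, the `Project` class, and the `factor_group` helper are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

# Factor groups in the order used in Table 1 (five factors per group).
GROUPS = ["Technical", "Environmental", "Legal", "Economic", "Social"]


def factor_group(index: int) -> str:
    """Return the Table 1 group for factor X<index>, with index in 1..25."""
    if not 1 <= index <= 25:
        raise ValueError("factor index must be between 1 and 25")
    return GROUPS[(index - 1) // 5]


@dataclass
class Project:
    # Assumed encoding: 1 = favorable, 0 = unfavorable, ordered X1..X25.
    factors: list
    # Assumed label: 1 = positive purchase recommendation, 0 = negative.
    purchase: int

    def __post_init__(self):
        if len(self.factors) != 25 or any(v not in (0, 1) for v in self.factors):
            raise ValueError("expected 25 binary factor values")


# Hypothetical project: everything favorable except the plot price (X16).
example = Project(factors=[1] * 15 + [0] + [1] * 9, purchase=0)
```

A flat list ordered X1 to X25 keeps the record directly usable as a feature vector, while `factor_group` recovers the category when results need to be reported per group.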

Based on the above-described database, classifiers supporting investor decision-making were developed. To evaluate the performance of the individual models, cross-validation [11], the confusion matrix [12], and the metrics ACC, PPV, REC, and F1 were used, as described in Table 2, Table 3 and Table 4.

In the case of the database described in this article, k = 4: the 80 projects used for training and validation were divided into four folds of 20 projects each, with each fold serving once as the validation set while the remaining 60 projects were used for training. The remaining group of 20 projects constitutes the test set. The confusion matrix and the metrics derived from it are defined in Table 3 and Table 4.
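The split described above can be sketched as follows. This is a sketch under stated assumptions: projects are identified by index only, the shuffle seed is arbitrary, and the function names are hypothetical; the article does not state how projects were assigned to folds.

```python
import random


def split_projects(n_projects=100, n_test=20, k=4, seed=0):
    """Hold out a test set, then cut the remaining indices into k equal folds."""
    indices = list(range(n_projects))
    random.Random(seed).shuffle(indices)  # the seed is an arbitrary assumption
    test, rest = indices[:n_test], indices[n_test:]
    fold_size = len(rest) // k
    folds = [rest[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    return test, folds


def cross_validation_rounds(folds):
    """Yield (training indices, validation indices) for each of the k rounds."""
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


test_set, folds = split_projects()
rounds = list(cross_validation_rounds(folds))
```

Each of the four rounds then trains on 60 projects and validates on 20, and the 20 held-out test projects are never seen during tuning.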

3. Machine Learning Models

3.1. Decision Trees

One of the most widely used methods for solving classification problems is the application of decision trees [13], which aims to partition data into the most homogeneous possible groups with respect to the dependent variable [14]. A decision tree model splits the training data by asking successive questions about various factors [15]. Each question represents a node in the tree, and the answers lead to subsequent branches. At the end of each branch, there is a decision—e.g., to which class the object belongs. Decision trees are intuitive, easy to interpret, and can be easily presented in a graphical form; however, they are prone to overfitting.

Based on the CART algorithm [16] and a training dataset consisting of 80 cases, a decision tree was generated (shown in Figure 3), which can be used to support investor decision-making. An additional benefit of the decision tree algorithm is that it identifies the 13 factors (out of 25) that appear in the tree and are therefore key to making a positive purchase decision regarding land property. These factors can be analyzed at the preliminary report phase of the TDD process, which allows the process to be carried out more efficiently. The algorithm also highlights factors such as plot price and planning decisions, whose fulfillment is essential for a positive purchase decision and which should be verified at the very beginning of preparing the preliminary report. The confusion matrix of the decision tree method for the test dataset is presented in Table 5.
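To illustrate how a CART-style tree chooses which factor to ask about first, the sketch below scores each binary factor by the weighted Gini impurity of the split it produces. Gini impurity is the criterion commonly associated with CART classification trees; the article does not state which criterion or software was used, so this is an assumption, and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def split_impurity(rows, labels, factor):
    """Weighted Gini impurity after splitting on one binary factor."""
    left = [y for x, y in zip(rows, labels) if x[factor] == 0]
    right = [y for x, y in zip(rows, labels) if x[factor] == 1]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_factor(rows, labels):
    """Index of the factor whose split leaves the lowest weighted impurity."""
    return min(range(len(rows[0])), key=lambda f: split_impurity(rows, labels, f))


# Hypothetical toy data with two factors; the label follows factor 0 exactly.
rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = [1, 1, 0, 0]
```

In this toy case, splitting on factor 0 yields two pure groups, so `best_factor` selects it; a full tree repeats the same choice recursively inside each group.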

3.2. Random Forests

A random forest [17] is a collection of multiple decision trees. Each tree is trained on a slightly different random subset of data and factors. Each tree makes an individual prediction, and the final prediction of the random forest is determined by majority vote over these predictions. For the purposes of calculations in this article, based on tests performed on the validation data, an optimal number of 200 decision trees was adopted.
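The aggregation step can be sketched in a few lines. The "trees" here are hypothetical stand-ins (plain functions that each look at one factor) rather than trained models, and sending ties to the negative class is an assumption made for this illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class predicted by most trees (ties go to the negative class)."""
    counts = Counter(predictions)
    return 1 if counts[1] > counts[0] else 0


def forest_predict(trees, project):
    """Aggregate the individual tree predictions for one project."""
    return majority_vote([tree(project) for tree in trees])


# Hypothetical stand-ins for trained trees: each looks at a single factor.
trees = [lambda p: p[0], lambda p: p[1], lambda p: p[2]]
```

With an odd number of trees, ties cannot occur; with an even number such as 200, a tie-breaking rule like the one assumed here is needed.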

A single tree is prone to errors, but multiple decision trees reduce overfitting and provide a more stable and accurate result. Random forests are highly effective, resistant to overfitting, and automatically detect the importance of factors; however, they are difficult to interpret. The confusion matrix of the random forest method for the test dataset is presented in Table 6.

3.3. k-Nearest Neighbors (KNN)

KNN (the k-Nearest Neighbors algorithm) [18] is a very simple model: when a new data point appears, the k nearest points from the training dataset are identified, and on this basis a positive or negative decision is made (majority voting). For the purposes of the calculations in this paper, based on tests performed on the validation dataset, the optimal value of k = 3 was adopted. KNN is a simple and intuitive method that does not require model training. Unfortunately, the model is sensitive to irrelevant features. The confusion matrix for the k-Nearest Neighbors Model applied to the test dataset is presented in Table 7.
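A minimal sketch of the k = 3 vote is given below. It assumes binary-encoded factors and uses the Hamming distance (the number of factors on which two projects differ); the article does not state which distance measure was used, and the toy training set is hypothetical.

```python
def hamming(a, b):
    """Number of factors on which two projects differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k training projects closest to the query."""
    ranked = sorted(zip(train_rows, train_labels), key=lambda rl: hamming(rl[0], query))
    votes = [label for _, label in ranked[:k]]
    return 1 if sum(votes) * 2 > len(votes) else 0


# Hypothetical toy training set with four binary factors per project.
train_rows = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [0, 0, 0, 0], [0, 0, 0, 1]]
train_labels = [1, 1, 1, 0, 0]
```

Because every factor contributes equally to the distance, a factor that is irrelevant to the decision still moves projects closer together or further apart, which is the sensitivity mentioned above.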

3.4. Support Vector Machine (SVM)

SVM (Support Vector Machine) [19] identifies a ‘boundary’ (hyperplane) that best separates the classes in the dataset. ‘Best’ means that it maximizes the margin, i.e., the distance to the nearest points from each class (the so-called support vectors). For non-linear data, kernel functions are applied to ‘map’ the data into higher-dimensional spaces, where separation becomes easier. The method is characterized by high effectiveness when dealing with data containing many features and is resistant to overfitting; however, it is difficult to interpret. The confusion matrix of the Support Vector Machine method for the test dataset is presented in Table 8.
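For reference, the margin-maximization idea can be written in its standard linear, hard-margin form. This is textbook background rather than the configuration used in the study, whose kernel and parameters are not stated here. For training points $x_i$ with labels $y_i \in \{-1, +1\}$, the separating hyperplane $w^{\top} x + b = 0$ is found by solving:

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n .
```

The width of the resulting margin is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the margin; the support vectors are the points for which the constraint holds with equality.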

3.5. Artificial Neural Networks

ANNs (Artificial Neural Networks) [20] are computational models inspired by the functioning of the human brain, used in machine learning and artificial intelligence. ANNs learn complex non-linear relationships and are characterized by high effectiveness; however, they are difficult to interpret and require large amounts of data. For the purposes of this study, an MLP [21] (Multi-Layer Perceptron) network was applied, consisting of one hidden layer with four neurons and the ReLU activation function. The confusion matrix of the ANNs method for the test dataset is presented in Table 9.
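The forward pass of a network with this shape can be sketched as follows. Only the architecture (25 inputs, one hidden layer with four ReLU neurons) comes from the text; the sigmoid output, the 0.5 decision threshold, and the random untrained weights are assumptions made purely for illustration.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> 4 ReLU hidden neurons -> 1 sigmoid output."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Untrained, randomly initialised weights for illustration only.
rng = random.Random(0)
n_inputs, n_hidden = 25, 4
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
b_out = 0.0

probability = mlp_forward([1] * n_inputs, w_hidden, b_hidden, w_out, b_out)
decision = 1 if probability >= 0.5 else 0  # the 0.5 threshold is an assumption
```

A network of this size has 25 × 4 + 4 hidden-layer parameters plus 4 + 1 output parameters, 109 in total, which gives a sense of how small the model is relative to the 80 training projects.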

3.6. Summary of Results

Table 10 presents the results obtained for all tested machine learning models on the validation and test datasets.

4. Discussion

The literature includes numerous studies describing decision-support systems across various fields (e.g., decisions of an economic nature [22], decisions made within construction companies [23], and the selection of an optimal subcontractor [24]); however, there is no system specifically dedicated to the purchase of land properties intended for construction investment.

Based on the conducted research, a methodology for creating an investment decision-support system using machine learning methods was proposed. The most widely recognized machine learning techniques were applied to construct the investor decision-support system, including Decision Trees, Random Forests, k-Nearest Neighbors classifiers, Support Vector Machines, and Artificial Neural Networks. The models’ performance was evaluated using validation and test datasets, for which confusion matrices were generated and performance metrics such as ACC (accuracy), PPV (precision), and REC (recall) were calculated.

To mitigate the risk of overfitting, several strategies were applied. The dataset was divided into training, validation, and test subsets, enabling independent model tuning and evaluation. Additionally, k-fold cross-validation (k = 4) was implemented to ensure a more robust assessment of model performance and to reduce dependence on a single data split.

Model complexity was intentionally limited, particularly in the case of the ANN, which was designed with a simple architecture (one hidden layer with four neurons) to match the size of the dataset. Furthermore, model performance was evaluated using multiple metrics (ACC, PPV, REC, and F1) across validation and test datasets, allowing for a comprehensive assessment of generalization ability.

Finally, it should be emphasized that the decision-making process is relatively complex. The twenty-five factors used to describe the project do not exhaust the full range of factors that investors must analyze before making a purchase decision, which constitutes a limitation of the proposed models. However, once the key decision-making factors are identified in cooperation with investors, the developed models may serve as a tool supporting investment decisions.

As part of future research, and although the authors are aware of the challenges associated with collecting appropriate data, it is planned to increase the number of analyzed cases and considered factors in order to improve the system’s accuracy and reduce the number of Type I and Type II errors.

5. Conclusions

Decision-making regarding the purchase of land property for the purpose of construction investment is a complex transaction that, in accordance with market practice, requires conducting a Technical Due Diligence (TDD) process, which is influenced by numerous factors belonging to various fields. The main contribution of this study was to demonstrate that artificial intelligence-based machine learning models can be applied to develop a decision-support model for investors in the process of land acquisition. Additionally, based on the Decision Tree model (built using the CART algorithm), it is possible to identify key factors whose fulfillment is necessary to make a positive purchase decision. Factors such as the price of the land property and planning decisions were identified by the algorithm as essential for making a positive purchase decision.

Furthermore, factors such as the attitude of immediate neighbors, deficiencies in formal documentation related to existing buildings, heritage protection status of the site, land contamination, mining damage, access conditions to public roads, rental market trends, and overall economic conditions were identified as having a significant impact on the decision-making process. Simple methods such as Decision Trees offer full interpretability, allowing investors to understand the model’s decisions, but at the cost of lower accuracy, especially on the test dataset. More complex methods, such as ANNs, provide higher effectiveness, but with limited interpretability.

The following machine learning methods were tested in the study: Decision Trees, Random Forests, k-Nearest Neighbors, Support Vector Machines, and Artificial Neural Networks (ANNs). The highest model accuracy (ACC) for the test dataset was achieved by ANNs at 80%, while for the validation dataset the highest value (79%) was obtained by Random Forests. Additionally, it should be noted that the highest precision (PPV), which determines the magnitude of Type I error, was achieved by the ANN model on the test dataset. This metric is particularly important because it indicates the number of actual negative decisions classified by the model as positive, which may lead to concluding an unfavorable transaction and incurring significant financial losses for the investor.

Regarding recall (REC), which reflects the number of Type II errors (actual positive decisions classified as negative), the highest value was also achieved by ANNs on the test dataset. The F1 score, being the harmonic mean of precision and recall, also reached its highest value for the ANN model.

ANNs proved to be the most effective method in the context of supporting investment decision-making in the studied database, due to the model’s ability to capture nonlinear relationships among decision factors. Other methods, such as Decision Trees, Random Forests, SVM, and KNN, showed lower accuracy on the test dataset, which can be attributed, among other things, to their limited ability to capture complex relationships in a relatively small dataset. These methods were more sensitive to the limited number of cases and the uneven distribution of decision-influencing factors, resulting in lower classification performance.

The results obtained in this study are specific to the problem analyzed and may not fully reflect the performance of these methods in other classification tasks. The results depend on the nature of the data, the number of available samples, the number and type of factors, and the degree of nonlinearity in relationships. Therefore, conclusions regarding the superiority of ANNs should be considered specific to the studied dataset and problem, and any generalization should be preceded by testing on larger and more diverse datasets.

For the ANN, a difference between test accuracy (0.80) and validation accuracy (0.71) was observed. The test set (20 projects) may have been less complex or more homogeneous than the cross-validation folds, which could have resulted in higher performance on this subset.

The observed 9-percentage-point gap may indicate that the model’s performance on unseen data is potentially overestimated. Further validation on larger and more diverse datasets is therefore needed in future work.

Due to the small dataset (80 training projects and 20 test projects), all models had low computational requirements and short training times. ANNs required the most computational resources, although with such a small sample the difference in time was minimal. Decision Trees and Random Forests had the shortest prediction times, which may be important for larger datasets.

Decision trees offer a transparent and interpretable structure, allowing investors to directly follow decision rules and understand how specific input factors influence the outcome. This makes them especially useful in situations where explainability and traceability of decisions are required.

ANNs, while often achieving higher predictive accuracy, operate as black-box models. Their output should therefore be interpreted primarily as classification predictions, without direct insight into the internal decision logic. For practical use, this implies that ANNs are more suitable in contexts where predictive performance is prioritized over interpretability.

Decision-support systems for land acquisition may assist investors in making more reliable decisions. However, it should be emphasized that the final decision regarding the purchase of land property is ultimately made by the investor based on their subjective assessment of the factors influencing the future investment.

Author Contributions

Conceptualization, M.W.; methodology, E.R.-Z.; software, M.W.; validation, B.S.; formal analysis, M.W.; resources, M.W.; writing—original draft preparation, M.W.; supervision, E.R.-Z. and B.S.; funding acquisition, E.R.-Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

TDD	Technical Due Diligence
ANNs	Artificial Neural Networks
RICS	Royal Institution of Chartered Surveyors
AHP	Analytic Hierarchy Process
CART	Classification and Regression Trees
KNN	k-Nearest Neighbors
SVM	Support Vector Machine
MLP	Multi-Layer Perceptron

References

1. Sanz-Prieto, I.; de-la-fuente-Valentín, L.; Ríos-Aguilar, S. Technical Due Diligence as a Methodology for Assessing Risks in Start-up Ecosystems: An Advanced Approach. Inf. Process. Manag. 2021, 58, 102617.
2. Due Diligence Law and Legal Definition|USLegal, Inc. Available online: https://definitions.uslegal.com/d/due-diligence/#google_vignette (accessed on 27 January 2024).
3. Kutera, B.; Anysz, H. The Methodology of Technical Due Diligence Report Preparation for an Office, Residential and Industrial Buildings. In Proceedings of the MATEC Web of Conferences, Hong Kong, China, 26–27 April 2016; Volume 86.
4. Rojek, I.; Burduk, R.; Heda, P. Ensemble Selection in One-versus-One Scheme—Case Study for Cutting Tools Classification. Bull. Pol. Acad. Sci. Tech. Sci. 2021, 69, 136044.
5. Radziszewska-Zielina, E. Analysis of the Impact of the Level of Partnering Relations on the Selected Indexes of Success of Polish Construction Enterprises. Eng. Econ. 2010, 21, 324–335.
6. Reich, S. Technical Due Diligence; Springer: Cham, Switzerland, 2018; Volume Part F614.
7. New Guidance Note: Building Surveys and Technical Due Diligence of Commercial Property. Struct. Surv. 2011, 29, 39–41.
8. Waga, M.; Radziszewska-Zielina, E.; Sroka, B. Review of Methods for Preparing Technical Due Diligence Reports for the Purchase of Commercial Real Estate. Przegląd Bud. 2025, 96, 102–106.
9. Surma, J. Business Intelligence: Making Decisions Through Data Analytics; Pearson: Harlow, UK, 2011.
10. Osisanwo, F.Y.; Akinsola, J.E.T.; Awodele, O.; Hinmikaiye, J.O.; Olakanmi, O.; Akinjobi, J. Supervised Machine Learning Algorithms: Classification and Comparison. Int. J. Comput. Trends Technol. 2017, 48, 128–138.
11. Jonathan, O.; Omoregbe, N.; Misra, S. Empirical Comparison of Cross-Validation and Test Data on Internet Traffic Classification Methods. J. Phys. Conf. Ser. 2019, 1299, 012044.
12. Bhandari, A. Understanding & Interpreting Confusion Matrix in Machine Learning (Updated 2024); Analytics Vidhya: Gurugram, India, 2024.
13. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; CRC: Boca Raton, FL, USA, 2017.
14. Fattah, A.; Fouad, M.M.; Philip, S.Y.; Gharib, T.F. A Decision Tree Classification Model for University Admission System. Int. J. Adv. Comput. Sci. Appl. 2012, 3, 031003.
15. Quinlan, J.R. Induction of Decision Trees. Mach. Learn. 1986, 1, 81–106.
16. Yao, Y.; Li, J.; Zhang, X.; Duan, P.; Li, S.; Xu, Q. Investigation on the Expansion of Urban Construction Land Use Based on the CART-CA Model. ISPRS Int. J. Geoinf. 2017, 6, 149.
17. Ali, M.R.; Nipu, S.M.A.; Khan, S.A. A Decision Support System for Classifying Supplier Selection Criteria Using Machine Learning and Random Forest Approach. Decis. Anal. J. 2023, 7, 100238.
18. Suyal, M.; Goyal, P. A Review on Analysis of K-Nearest Neighbor Classification Machine Learning Algorithms Based on Supervised Learning. Int. J. Eng. Trends Technol. 2022, 70, 43–48.
19. Gandhi, R. Support Vector Machine—Introduction to Machine Learning Algorithms. Towards Data Sci. 2018. Available online: https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47 (accessed on 6 May 2026).
20. Qamar, R.; Zardari, B.A. Artificial Neural Networks: An Overview. Mesopotamian J. Comput. Sci. 2023, 2023, 130–139.
21. Molina, E.; Parraga-Alava, J. Artificial Neural Networks for Classification Tasks: A Systematic Literature Review. Enfoque UTE 2024, 15, 1058.
22. Liu, B.; Sun, Z. Global Economic Market Forecast and Decision System for IoT and Machine Learning. Mob. Inf. Syst. 2022, 2022, 8344791.
23. Radziszewska-Zielina, E. Fuzzy Control of Partnering Relations of a Construction Enterprise. J. Civ. Eng. Manag. 2011, 17, 5–15.
24. Radziszewska-Zielina, E. The Application of Multi-Criteria Analysis in the Evaluation of Partnering Relations and the Selection of a Construction Company for the Purposes of Cooperation. Arch. Civ. Eng. 2016, 62, 167–182.

Figure 1. Phases of the TDD process implementation for a construction plot.

Figure 2. Operation scheme of the investor decision-support system.

Figure 3. Decision tree generated using the CART algorithm for 80 projects.

Table 1. List of factors characterizing the projects.

Group of Factors	Factor	Description	Factor Symbol
Technical	Soil conditions	The building’s foundation may be classified as geotechnical category I or II—favorable. The building’s foundation may be classified as geotechnical category III—unfavorable.	X1
	Connection conditions to public roads	The costs related to connection to public roads are accepted by the investor—favorable. The costs related to connection to public roads are not accepted by the investor—unfavorable.	X2
	Utility connection requirements	The costs of implementing all utility connections are accepted by the investor—favorable. The costs of implementing all utility connections are not accepted by the investor—unfavorable.	X3
	Public transport	Availability of public transport—enabling employees to commute to work—favorable. Insufficient public transport access for employees—unfavorable.	X4
	Mining damages	The building will be founded in mining damage category I–II—favorable. The building will be founded in mining damage category III–V—unfavorable.	X5
Environmental	Greenery inventory	The study does not identify any trees that could be classified as natural monuments—favorable. The inventory includes trees that may be regarded by authorities as natural monuments, and permission for their removal may be denied, which could restrict the buildable area—unfavorable.	X6
	Site contamination	There is no evidence of soil or groundwater contamination on the building plot—favorable. The site is contaminated and necessitates remediation works—unfavorable.	X7
	Carbon footprint	The investor is not obligated to reduce the carbon footprint during the project—favorable. The investor is obligated not to exceed the carbon footprint limit per square meter of the future building’s usable floor area—unfavorable.	X8
	Flood risk	The project site is not within flood risk zones—favorable. Location in flood risk areas—unfavorable.	X9
	Environmental certification	Project complies with preliminary assessment requirements without extra costs—favorable. Project needs extra funding to comply with certification requirements—unfavorable.	X10
Legal	Development conditions decision	Planning decisions or an extract from the local zoning plan permit development consistent with the investor’s expectations—favorable. Planning decisions or extracts from the local zoning plan do not permit development consistent with the investor’s expectations—unfavorable.	X11
	Conservation protection	The building plot is not situated within a heritage protection area—favorable. The building plot is located within a heritage protection area—unfavorable.	X12
	Legal status of adjacent land parcels	The legal status of adjacent plots is identified—favorable. The legal status of adjacent plots is unclear (ownership of the plots is not clearly established)—unfavorable.	X13
	Need to obtain deviations from the building regulations	No need to obtain deviations from technical conditions—favorable. Need to apply for deviations from technical conditions—unfavorable.	X14
	Lack of formal documents	Formal documents exist for the existing buildings on the plot—favorable. No formal documents exist for the existing buildings on the plot—unfavorable.	X15
Economic	Purchase price of the plot	The plot price is acceptable to the Investor—favorable. The plot price is not acceptable to the Investor—unfavorable.	X16
	Trend in the construction market	Trends in the construction market are favorable for investors—favorable (the number of construction investments is decreasing compared to the previous year). Trends in the construction market are unfavorable—unfavorable (the number of construction investments is increasing compared to the previous year).	X17
	Global economic situation	The global economic situation is favorable. The global economic situation is unfavorable.	X18
	Building size in relation to the leasable area	The building size is optimal in terms of construction costs—favorable. The building size is not optimal in terms of construction costs—unfavorable.	X19
	Trends in the rental market	Trends in the rental market are favorable (demand for rental space is increasing compared to the previous year). Trends in the rental market are unfavorable (demand for rental space is decreasing compared to the previous year).	X20
Social	Attitude of the immediate neighbors toward the project	Neighbors’ attitude is favorable—favorable (issued administrative decisions were not protested). Neighbors’ attitude is unfavorable—unfavorable (issued administrative decisions were protested by the neighbors).	X21
	Attitude of the local community	The attitude of the local community is favorable (local associations did not protest the issued administrative decisions). The attitude of the local community is unfavorable (local associations protested the issued administrative decisions).	X22
	Attitude of the city authorities	The city authorities’ attitude is favorable for the investment. The city authorities’ attitude is unfavorable for the investment.	X23
	Social benefits for the city	The implementation of the investment is socially favorable. The implementation of the investment is socially unfavorable.	X24
	Aesthetic benefits for the city	The investment will contribute aesthetic value to the urban fabric—favorable. The investment will not provide aesthetic benefits to the built environment—unfavorable.	X25

Table 2. Cross-validation. Definition of the confusion matrix and the ACC, PPV, REC, and F1 metrics (Table 3 and Table 4).

Iteration	Training Data				Validation Data	Confusion Matrix	Metrics
Iteration 1					x	Confusion matrix no. 1	ACC, PPV, REC, F1
Iteration 2				x		Confusion matrix no. 2	ACC, PPV, REC, F1
Iteration 3			x			Confusion matrix no. 3	ACC, PPV, REC, F1
Iteration 4		x				Confusion matrix no. 4	ACC, PPV, REC, F1
…					…
…					…
Iteration k	x					Confusion matrix no. k	ACC, PPV, REC, F1
Training data	Training and validation data						ACC, PPV, REC, F1
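
The scheme in Table 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes contiguous, unshuffled folds, and the names `k_fold_indices`, `cross_validate`, and `majority_baseline`, as well as the toy labels, are hypothetical.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each fold is the validation set once."""
    # Spread the remainder so that fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_validate(labels, k, fit_predict):
    """Return the mean validation accuracy over k folds.

    fit_predict(train_idx, val_idx) is a hypothetical callback that trains on
    the training indices and returns predictions for the validation indices.
    """
    fold_acc = []
    for train_idx, val_idx in k_fold_indices(len(labels), k):
        predicted = fit_predict(train_idx, val_idx)
        correct = sum(p == labels[i] for p, i in zip(predicted, val_idx))
        fold_acc.append(correct / len(val_idx))
    return mean(fold_acc)


# Toy labels (hypothetical): 1 = positive class, 0 = negative class.
labels = [1, 0, 1, 1, 0, 1, 1, 1, 1, 0]


def majority_baseline(train_idx, val_idx):
    """Stand-in model: always predict the most frequent training label."""
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return [majority] * len(val_idx)


mean_acc = cross_validate(labels, k=5, fit_predict=majority_baseline)
print(f"mean ACC over 5 folds: {mean_acc:.2f}")
```

The same loop extends to PPV, REC, and F1 by building one confusion matrix per fold and averaging each metric over the k folds.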

Table 3. Confusion matrix.

Confusion matrix construction
	Predicted positive class	Predicted negative class
Actual positive class	True Positive (TP), correct classification	False Negative (FN), incorrect classification
Actual negative class	False Positive (FP), incorrect classification	True Negative (TN), correct classification

Meaning:
TP (True Positive): the model correctly identified a positive case.
TN (True Negative): the model correctly identified a negative case.
FP (False Positive): the model incorrectly predicted the positive class; FP is the number of negative cases wrongly classified as positive. These are Type I errors.
FN (False Negative): the model failed to detect a positive case that actually occurred; FN is the number of positive cases wrongly classified as negative. These are Type II errors.
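
The four cells of Table 3 can be counted directly from paired lists of actual and predicted labels. The sketch below is illustrative only; the function name `confusion_counts` and the sample labels are hypothetical and do not come from the study's data.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FN, FP, and TN for a binary classification problem."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            # Actual positive: detected (TP) or missed (FN, Type II error).
            counts["TP" if p == positive else "FN"] += 1
        else:
            # Actual negative: false alarm (FP, Type I error) or correct (TN).
            counts["FP" if p == positive else "TN"] += 1
    return counts


# Hypothetical labels: 1 = positive class, 0 = negative class.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(actual, predicted, positive=1)
print(counts)
```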

Table 4. Definitions of the ACC, PPV, REC, and F1 metrics.

Symbol	Description	Formula	Formula for the Validation Dataset
ACC	Accuracy	$(TP + TN) / (TP + TN + FP + FN)$	$\mathrm{mean\ ACC} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{ACC}_{i}$
PPV	Precision	$TP / (TP + FP)$	$\mathrm{mean\ PPV} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{PPV}_{i}$
REC	Recall/Sensitivity	$TP / (TP + FN)$	$\mathrm{mean\ REC} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{REC}_{i}$
F1	F1-score (harmonic mean of precision and recall)	$2 \cdot \mathrm{PPV} \cdot \mathrm{REC} / (\mathrm{PPV} + \mathrm{REC})$	$\mathrm{mean\ F1} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{F1}_{i}$
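
As a worked illustration of the formulas in Table 4, consider hypothetical counts (not taken from the study) of TP = 8, FN = 2, FP = 4, and TN = 6. The last line also shows that F1 can be written directly in terms of the counts.

```latex
\begin{align*}
\mathrm{ACC} &= \frac{TP + TN}{TP + TN + FP + FN} = \frac{8 + 6}{8 + 6 + 4 + 2} = \frac{14}{20} = 0.70 \\
\mathrm{PPV} &= \frac{TP}{TP + FP} = \frac{8}{8 + 4} \approx 0.67 \\
\mathrm{REC} &= \frac{TP}{TP + FN} = \frac{8}{8 + 2} = 0.80 \\
\mathrm{F1} &= \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{REC}}{\mathrm{PPV} + \mathrm{REC}}
             = \frac{2\,TP}{2\,TP + FP + FN} = \frac{16}{22} \approx 0.73
\end{align*}
```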

Table 5. Confusion matrix for the Decision Tree model—test dataset.

	Model Prediction—Decision Tree.
Actual Value		Positive	Negative
	Positive	4	1
	Negative	6	9

Table 6. Confusion matrix for the Random Forest model—test dataset.

	Model Prediction—Random Forests.
Actual Value		Positive	Negative
	Positive	6	2
	Negative	4	8

Table 7. Confusion matrix for the k-Nearest Neighbors method—test data.

	k-Nearest Neighbors Model Prediction
Actual Value		Positive	Negative
	Positive	4	0
	Negative	6	10

Table 8. Confusion matrix for the Support Vector Machine—test data.

	Model Prediction—SVM
Actual Value		Positive	Negative
	Positive	5	1
	Negative	5	9

Table 9. Confusion matrix for ANNs—test data.

	Model Prediction—ANNs
Actual Value		Positive	Negative
	Positive	7	1
	Negative	3	9

Table 10. ACC, PPV, REC, and F1 values for individual machine learning methods on the validation and test datasets.

	Validation Data
	Decision Trees	Random Forests	k-Nearest Neighbors Method	Support Vector Machine Method	Artificial Neural Networks
Mean ACC (accuracy)	0.74	0.79	0.56	0.68	0.71
Mean PPV (precision)	0.77	0.75	0.43	0.67	0.70
Mean REC (sensitivity)	0.73	0.83	0.59	0.68	0.72
Mean F1 (harmonic mean of PPV and REC)	0.75	0.78	0.50	0.67	0.71
	Test Data
	Decision Trees	Random Forests	k-Nearest Neighbors Method	Support Vector Machine Method	Artificial Neural Networks
ACC (accuracy)	0.65	0.70	0.70	0.70	0.80
PPV (precision)	0.40	0.60	0.40	0.50	0.70
REC (sensitivity)	0.80	0.75	1.00	0.83	0.88
F1 (harmonic mean of PPV and REC)	0.53	0.67	0.57	0.63	0.78
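
The test-data block of Table 10 can be recomputed from the confusion matrices in Tables 5 to 9 with the formulas in Table 4. The sketch below reads the matrices as printed, with rows as actual classes and columns as predicted classes; the names `MATRICES` and `metrics`, and the choice to return 0.0 when a denominator is zero, are assumptions made for this illustration.

```python
# Test-set confusion matrices from Tables 5-9 as (TP, FN, FP, TN).
MATRICES = {
    "Decision Trees": (4, 1, 6, 9),
    "Random Forests": (6, 2, 4, 8),
    "k-Nearest Neighbors": (4, 0, 6, 10),
    "Support Vector Machine": (5, 1, 5, 9),
    "Artificial Neural Networks": (7, 1, 3, 9),
}


def metrics(tp, fn, fp, tn):
    """Return ACC, PPV, REC, and F1 for one confusion matrix."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    ppv = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * rec / (ppv + rec) if ppv + rec else 0.0
    return {"ACC": acc, "PPV": ppv, "REC": rec, "F1": f1}


results = {name: metrics(*counts) for name, counts in MATRICES.items()}
for name, values in results.items():
    formatted = ", ".join(f"{key}={value:.3f}" for key, value in values.items())
    print(f"{name}: {formatted}")
```

Rounded to two decimal places, these values agree with the test-data rows of Table 10.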

