1. Introduction
The construction industry stands at a critical inflection point as Industry 5.0 principles drive fundamental transformation from traditional labor-intensive practices toward human-AI collaborative systems [1,2,3]. While artificial intelligence promises substantial operational improvements spanning cost optimization, design automation, project monitoring, and resource management [4,5], successful adoption hinges not merely on technical capability but critically on workforce perception, acceptance, and integration willingness [6,7]. Recent estimates suggest that construction productivity improvements of 30–45% are achievable through AI deployment [8,9], yet adoption rates remain below 15% across European markets [10], with perception barriers cited as the dominant impediment, surpassing technical and financial constraints [11]. Understanding and predicting these perception patterns represents a strategic priority for organizations navigating digital transformation, enabling proactive interventions that address resistance, customize training, and optimize change management resources.
Critically, the transition from Industry 4.0 to Industry 5.0 introduces a fundamental shift in how adoption barriers must be understood and addressed. Industry 4.0 adoption models predominantly focused on technical readiness, infrastructure investment, and process integration—factors amenable to straightforward resource allocation decisions [12,13,14]. In contrast, Industry 5.0 explicitly positions human-centricity alongside sustainability and resilience as co-equal pillars, recognizing that technology value creation depends primarily on effective human–technology collaboration rather than automation capability alone [15,16]. This paradigm shift means that workforce perception constitutes a distinct barrier category that cannot be overcome through technical solutions or capital investment. Unlike infrastructure gaps that can be closed with funding or skill deficits addressable through training, perception barriers reflect attitudinal, cultural, and psychological factors that require targeted communication, trust-building, and demonstrated value before adoption proceeds. Construction industry surveys consistently reveal that even organizations with adequate technical infrastructure and trained personnel report adoption failures attributable to workforce resistance, uncertainty about role changes, and skepticism regarding AI reliability in safety-critical contexts [11,17].
Predicting AI adoption perceptions faces a fundamental methodological challenge characteristic of early-stage technological transitions: data scarcity imposed by limited expert respondent pools, nascent awareness levels, and the specialized nature of industry-specific surveys [12]. Construction industry surveys typically achieve response rates of 8–12%, with meaningful samples rarely exceeding 50–100 respondents despite extensive recruitment efforts [18]. This constraint creates a tension between predictive modeling aspirations and statistical reality—conventional machine learning approaches assume hundreds or thousands of training samples, while practical applications must operate effectively with tens of samples [19,20]. The predominant focus in contemporary machine learning on large-scale datasets and deep architectures provides limited guidance for practitioners confronting small sample industrial contexts, creating a methodological gap between algorithmic capabilities and practical requirements [21].
Existing research on technology acceptance modeling, exemplified by frameworks such as TAM [22], UTAUT [23], and DOI [24], predominantly employs descriptive statistical analysis or structural equation modeling rather than predictive machine learning approaches. Conversely, the machine learning literature extensively addresses large-scale prediction tasks but provides sparse guidance on small sample regimes where the bias-variance trade-off fundamentally shapes algorithm selection [25,26]. This research bridges that gap by developing a methodologically rigorous framework specifically designed for perception prediction under realistic data constraints encountered in Industry 5.0 industrial surveys. The primary objective is to demonstrate that statistically principled approaches can achieve meaningful predictive performance with limited samples by explicitly optimizing the bias-variance trade-off through target simplification, dimensionality reduction, regularized model selection, and appropriate validation protocols [27,28].
This work makes four principal contributions. First, it establishes a complete methodological framework for small sample perception prediction encompassing data engineering, model selection, validation strategy, and performance interpretation specifically adapted to industry survey contexts. Second, it provides empirical evidence that R2 ≈ 0.50 and F1 ≈ 0.68 represent achievable and meaningful performance levels for perception prediction with n = 51 samples, establishing realistic performance expectations for similar applications. Third, it demonstrates practical applicability by enabling actionable predictions supporting targeted interventions in AI adoption initiatives. Fourth, it delivers a replicable blueprint transferable across industries, geographies, and technology adoption contexts where similar data constraints apply.
The remainder of this paper proceeds as follows. Section 2 reviews relevant literature spanning Industry 5.0 transformation dynamics, AI perception factors, technology acceptance frameworks, and machine learning approaches for small sample regimes. Section 3 presents the complete methodological framework, including experimental setup, system architecture, model selection rationale, and validation protocols. Section 4 reports experimental results for both regression and classification tasks with statistical interpretation. Section 5 discusses implications, limitations, and future research directions. Section 6 concludes with a synthesis of contributions and broader significance for Industry 5.0 applications.
3. Experimental Research
This section presents the methodological framework, system architecture, and experimental procedures employed in developing an AI-based perception prediction system for Industry 5.0 applications [70]. All Python (3.14) scripts comprising the system were systematically reviewed to ensure accurate representation of the implemented computational pipeline. The methodology adheres to rigorous statistical learning principles appropriate for small sample regimes, with explicit consideration of the bias-variance trade-off inherent in limited-data scenarios [71].
3.1. Problem Statement and the Aim of Experimental Research
The construction industry is increasingly adopting artificial intelligence (AI) to support planning, design, and project delivery [72]. Recent review evidence confirms that AI methods are being applied across the entire construction value chain—from pre-construction planning and design through construction execution to operations and facility management [73]. In practice, however, reported benefits differ substantially across organizations and job roles. This variability indicates that AI outcomes in construction are shaped not only by the availability of tools but also by the conditions under which they are introduced and used—particularly digital competencies, the intensity of ICT utilization, the extent of AI training and experience, and actual AI usage by both individuals and companies.
From the perspective of Industry 4.0 and the emerging Industry 5.0 framework, AI should be viewed as part of a broader digital ecosystem grounded in connectivity and data exchange, often complemented by IoT-enabled data sources and real-time data flows [74]. The original Industry 4.0 initiative emphasizes connected systems and data-supported process integration [14], while subsequent research highlights the diversity of definitions and implementation approaches across organizations and contexts [75]. In parallel, Industry 5.0 explicitly extends Industry 4.0 by placing stronger emphasis on sustainability, human-centricity, and resilience, where value is created primarily through effective human–technology collaboration rather than automation alone [13]. This is particularly relevant in construction, where project outcomes depend on coordination among multiple stakeholders, changing site conditions, and the quality of day-to-day decision-making.
Although the literature provides a broad overview of AI applications in construction, stakeholders still lack empirically validated approaches that would allow them to estimate the expected impact of AI adoption based on measurable readiness and usage characteristics. This limits evidence-based decisions about investments in AI tools and capability building. To address this gap, the present study operationalizes “AI impact” using (i) a composite AI Impact Index and (ii) two process-oriented outcome dimensions—perceived task automation and perceived cost reduction. These outputs are modeled as a function of digital competencies, ICT utilization intensity, AI training and experience, and AI usage at both individual and organizational levels, while accounting for demographic and organizational characteristics [76,77].
Aim of the experimental research. The aim of this paper is to develop and empirically validate a predictive AI model (a decision support tool) that estimates the expected impact of AI adoption in the construction sector based on digital competencies, ICT utilization intensity, AI training and experience, and AI usage at both individual and organizational levels. The impact is operationalized as (i) a composite AI Impact Index and (ii) two process-oriented outcome dimensions—perceived task automation and perceived cost reduction—while controlling for demographic and organizational characteristics.
Research hypothesis. Based on the literature review establishing that organizational resources, technology exposure, and facilitating conditions influence AI adoption attitudes [23,40], this research tests the hypothesis that company size positively predicts AI perception: larger organizations (medium and large enterprises) are expected to demonstrate more favorable AI adoption perceptions than smaller organizations (micro and small enterprises), reflecting differential resource availability, technology infrastructure, and formal training opportunities. This hypothesis is operationalized through statistical comparison of perception scores across company size categories and through inclusion of company size as a predictor in the regression framework, with results reported in Section 4.2 and Section 4.3.
3.2. Experimental Setup
The experimental foundation rests upon a survey-based dataset comprising perception responses related to artificial intelligence adoption in the construction industry, a sector undergoing significant transformation within the Industry 5.0 paradigm. The dataset contains 51 valid response samples collected through a structured questionnaire administered in 2025, with responses encoded in the Slovak language. The raw dataset encompasses 23 primary variables organized into four conceptual categories. Demographic variables include age group measured across five ordinal levels, company size spanning four levels from micro-enterprise to large corporation, job position distributed across seven hierarchical levels from preparation specialist to executive, and work experience captured in five duration bands ranging from less than one year to more than ten years. Technology exposure variables comprise ICT utilization level, personal AI usage, digital competencies, company AI adoption maturity, organizational digitalization level, and AI training exposure, with each measured on ordinal scales ranging from one to five. Perceived AI impact variables capture domain-specific impact assessments covering budgeting and cost management, design and planning, construction project management, marketing and customer relationship management, and material delivery and logistics, with each rated numerically. Finally, perception target variables constitute the primary outcome measures, including perceived AI impact on cost reduction serving as the primary regression target, alongside six classification targets addressing automation potential, materials optimization, project monitoring, human resources management, administrative burden reduction, and intelligent planning.
The sample size of n = 51 represents a boundary condition that fundamentally shapes the methodological approach. This constraint is not a limitation of the research design but rather reflects the realistic data availability in specialized industrial surveys where respondent pools are inherently restricted. The construction industry’s fragmented structure, combined with the nascent state of AI adoption awareness, naturally limits accessible expert respondents. With p = 15 baseline features and n = 51 samples, the effective samples-per-feature ratio approximates n/p ≈ 3.4, which falls below the commonly recommended threshold of 10–20 samples per feature for stable parameter estimation. This ratio necessitates explicit dimensionality reduction and regularization strategies to prevent overfitting and ensure generalization validity.
Target Variable Definitions
The prediction framework addresses one regression target and six classification targets, each representing distinct dimensions of perceived AI impact in construction operations.
Table 1 presents the complete target variable specification.
The primary regression target—perceived AI impact on cost reduction—was selected based on its centrality to construction business decisions and its demonstrated variance in preliminary data exploration. The six classification targets were originally measured on 5-point Likert scales but consolidated to 3-class ordinal categories (low/medium/high) to address severe class imbalance, as detailed in Section 3.3. All targets capture forward-looking perceptions rather than retrospective assessments, consistent with the predictive decision support purpose of the framework.
3.3. System Architecture
The system architecture implements a four-phase optimization strategy specifically designed for small sample statistical learning. Phase one addresses data engineering through target simplification via ordinal class consolidation from five classes to three classes, combined with feature dimensionality reduction through theoretically grounded composite index construction. Phase two governs model selection, enforcing exclusive use of low-variance, regularized, and interpretable models while explicitly excluding high-variance architectures such as random forests, gradient boosting, and deep neural networks based on bias-variance considerations. Phase three establishes the validation strategy through implementation of leave-one-out cross-validation as the primary evaluation protocol, maximizing training data utilization while providing nearly unbiased performance estimates. Phase four handles ordinal target characteristics through class weighting for residual imbalance and ordinal-appropriate regression and classification methods.
The preprocessing module implements systematic encoding transformations for all categorical variables, mapping Slovak-language survey responses to numerical ordinal scales through pattern-matching functions that handle linguistic variations and encoding artifacts. The transformation preserves ordinal relationships essential for downstream statistical analysis.
3.3.1. Encoding Procedures
All survey variables were encoded according to their measurement properties. Ordinal variables (5-point Likert scales for perception items, ordered categories for age groups and experience levels) were mapped to integer sequences preserving rank order. Specifically:
Age groups: [18–24, 25–34, 35–44, 45–54, 55+] → [1, 2, 3, 4, 5]
Company size: [Micro (1–10), Small (10–50), Medium (50–250), Large (>250)] → [1, 2, 3, 4]
Work experience: [<1 year, 1–3 years, 3–5 years, 5–10 years, >10 years] → [1, 2, 3, 4, 5]
Perception scales: 5-point Likert items [Strongly Disagree → Strongly Agree] → [1, 2, 3, 4, 5]
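A minimal sketch of this encoding step is shown below, assuming pandas is used for data handling; the dictionary contents mirror the mappings listed above, while the function name and label spellings are illustrative rather than the study’s actual identifiers.

```python
# Illustrative rank-preserving encoding of ordered survey categories.
import pandas as pd

AGE_MAP = {"18-24": 1, "25-34": 2, "35-44": 3, "45-54": 4, "55+": 5}
SIZE_MAP = {"Micro (1-10)": 1, "Small (10-50)": 2,
            "Medium (50-250)": 3, "Large (>250)": 4}
EXPERIENCE_MAP = {"<1 year": 1, "1-3 years": 2, "3-5 years": 3,
                  "5-10 years": 4, ">10 years": 5}

def encode_ordinal(series: pd.Series, mapping: dict) -> pd.Series:
    """Map ordered categorical labels to integers, preserving rank order."""
    return series.map(mapping).astype("Int64")
```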
Nominal variables (job position) were treated as ordinal approximations based on hierarchical level, though sensitivity analysis confirmed that alternative encodings produced equivalent model performance. Missing values were minimal (<3% of observations) and handled through listwise deletion given the small sample size, as imputation methods introduce additional variance in small sample contexts [52].
The original five-class perception targets exhibited severe class imbalance, with minority classes containing as few as one to two samples, and distributions that violate statistical assumptions required for reliable classifier training. The three-class consolidation scheme addresses this by mapping classes one and two to a low category representing negative or skeptical perception, class three to a medium category representing neutral or uncertain perception, and classes four and five to a high category representing positive perception. This transformation ensures minimum class sizes of eight to ten samples, enabling stable parameter estimation and meaningful cross-validation fold composition.
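A minimal sketch of the consolidation map under these assumptions (integer-coded 5-point responses; the function name is illustrative):

```python
# 5-class -> 3-class ordinal consolidation described above:
# classes 1-2 -> low (1), class 3 -> medium (2), classes 4-5 -> high (3).
CONSOLIDATION_MAP = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}

def consolidate_target(y_5class):
    """Collapse 5-point Likert targets into 3 ordinal classes."""
    return [CONSOLIDATION_MAP[v] for v in y_5class]
```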
Three composite indices were constructed to reduce the feature space from 15 baseline features to seven optimized features (Table 2). The AI Experience Index represents the arithmetic mean of personal AI utilization, company AI adoption, and AI training level, capturing complementary aspects of individual AI ecosystem exposure that demonstrate empirical correlation with typical Spearman correlation coefficients exceeding 0.4. The Digitalization Index composites ICT utilization, organizational digitalization level, and self-reported digital competencies, capturing the digital environment maturity surrounding each respondent. The AI Impact Index calculates the mean of five domain-specific AI impact ratings spanning budgeting, design, project management, marketing, and logistics, measuring a common latent construct of perceived AI operational impact across business functions. The optimized feature set achieves an improved samples-per-feature ratio of n/p ≈ 7.3, substantially enhancing estimation stability.
Equal weighting was applied across component variables based on: (a) absence of theoretical justification for differential weights, (b) similar variance contributions across components, and (c) simplicity and reproducibility considerations. All component variables were measured on comparable 1–5 ordinal scales, eliminating the need for standardization prior to aggregation.
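An illustrative construction of the three indices is sketched below; the column names are assumptions standing in for the survey’s actual field identifiers, and each index is a plain row-wise mean, consistent with the equal-weighting rationale above.

```python
# Sketch of equally weighted composite index construction (assumed columns).
import pandas as pd

def add_composite_indices(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["AI_Experience_Index"] = df[
        ["AI_Util_Personal_Numeric", "AI_Util_Company_Numeric",
         "AI_Training_Numeric"]].mean(axis=1)
    df["Digitalization_Index"] = df[
        ["ICT_Utilization_Numeric", "Org_Digitalization_Numeric",
         "Digital_Competencies_Numeric"]].mean(axis=1)
    df["AI_Impact_Index"] = df[
        ["AI_Impact_Budgeting", "AI_Impact_Design", "AI_Impact_ProjectMgmt",
         "AI_Impact_Marketing", "AI_Impact_Logistics"]].mean(axis=1)
    return df
```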
As an alternative to theory-driven composites, recursive feature elimination (RFE) was applied to select the most predictive subset directly from the 15 baseline features. Using Lasso regression as the base estimator with LOOCV for evaluation, RFE identified six features that maximized generalization performance: Age_Numeric, Company_Size_Numeric, ICT_Utilization_Numeric, AI_Util_Personal_Numeric, AI_Training_Numeric, and AI_Impact_Budgeting. This data-driven selection achieved R2 = 0.501, marginally outperforming the composite index approach (R2 = 0.45–0.48) and providing the final feature configuration used in reported results.
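A hedged sketch of this selection step using scikit-learn’s RFE with a Lasso base estimator follows; the alpha value is illustrative, and X_baseline and y stand for the 15 encoded features and the regression target.

```python
# Sketch of RFE with a Lasso base estimator retaining six features.
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso

rfe = RFE(estimator=Lasso(alpha=0.1, max_iter=10_000),  # illustrative alpha
          n_features_to_select=6)
rfe.fit(X_baseline, y)                 # X_baseline: DataFrame of 15 features
selected = X_baseline.columns[rfe.support_]
print(list(selected))                  # e.g., Age_Numeric, Company_Size_Numeric, ...
```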
3.3.2. Target Simplification Justification
The consolidation from five ordinal classes to three classes was evaluated against alternative schemes to ensure methodological robustness.
Table 3 presents the comparative analysis.
The adopted 3-class scheme (low: 1–2, medium: 3, high: 4–5) provides the best balance between class granularity and statistical stability. While 2-class consolidation yields marginally higher F1, it sacrifices the ability to distinguish neutral perceptions from positive ones—an important distinction for intervention targeting. The 5-class original scheme produced unstable estimates with high variance across LOOCV folds, confirming that minority classes with n < 5 samples prevent reliable parameter estimation.
3.4. Experimental Configuration
The fundamental challenge in small sample learning is the bias-variance decomposition of expected prediction error, expressed as E[(y − ŷ)^2] = Bias^2 + Variance + σ^2_noise. For n = 51 samples, high-complexity models such as random forests, gradient boosting, and deep neural networks exhibit low bias but critically high variance, leading to severe overfitting. The implemented strategy accepts moderately higher bias to achieve substantial variance reduction.
3.4.1. High Variance Baseline Comparison
To empirically validate the model selection rationale, constrained high variance models were trained and evaluated under identical LOOCV conditions.
Table 4 presents the comparative results.
Even with aggressive depth constraints (max_depth = 3), ensemble methods substantially underperform regularized linear models. Random forest achieves R2 = 0.412 (18% lower than Lasso), while gradient boosting produces R2 = 0.292 (42% lower). This performance gap demonstrates that for n = 51 samples, the variance component dominates: ensemble methods’ capacity to capture complex interactions becomes a liability rather than an asset, as they fit noise patterns that do not generalize. The results empirically confirm the theoretical bias-variance trade-off rationale and justify the exclusive use of regularized linear models.
The regression model suite comprises five carefully selected algorithms. Ridge regression employs L2 regularization, where coefficient shrinkage reduces effective model complexity, with regularization strength α selected via five-fold internal cross-validation from the range α ∈ [10^−3, 10^3]. Lasso regression utilizes L1 regularization that induces sparsity through coefficient elimination, providing implicit feature selection with maximum iterations set to 10,000 to ensure convergence. The k-nearest neighbors regressor with k = 7 implements a local averaging approach robust to noise, where the value k ≈ 7 follows established heuristics for small samples, using the distance-weighted Manhattan metric. A shallow decision tree with a maximum depth of three constrains complexity to a maximum of eight leaf nodes, ensuring an average leaf sample size of approximately six. Finally, an ensemble regressor performs simple averaging of Ridge and k-NN predictions, reducing variance without increasing individual model complexity through a statistically sound approach for small sample sizes.
The classification model suite similarly comprises algorithms selected for low variance characteristics. Logistic regression with L2 regularization provides calibrated probability outputs with balanced class weighting to address residual imbalance. The Ridge classifier offers fast linear classification with L2 regularization. The k-NN classifier with k = 5 uses an odd value to avoid ties, while distance weighting enhances performance. Gaussian Naive Bayes operates under a strong independence assumption that induces high bias but extremely low variance, often proving effective for small datasets. A shallow decision tree with a maximum depth of three provides interpretable decision rules with class balancing. Random forests with default parameters, gradient boosting machines, support vector machines with complex kernels, and neural network architectures were explicitly excluded from consideration. These exclusions are methodologically justified because with n = 51 samples, such models would fit noise rather than signal, producing optimistic training metrics that fail to generalize.
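The suites above could be assembled as follows; hyperparameter values mirror the text, while remaining settings are scikit-learn defaults and the dictionary layout is an assumption about the implementation.

```python
# Sketch of the low-variance model suites described in the text.
from sklearn.linear_model import Ridge, Lasso, LogisticRegression, RidgeClassifier
from sklearn.neighbors import KNeighborsRegressor, KNeighborsClassifier
from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import VotingRegressor

regressors = {
    "ridge": Ridge(alpha=1.0),                 # alpha tuned via inner CV
    "lasso": Lasso(alpha=0.1, max_iter=10_000),
    "knn": KNeighborsRegressor(n_neighbors=7, weights="distance",
                               metric="manhattan"),
    "tree": DecisionTreeRegressor(max_depth=3, max_leaf_nodes=8),
}
# Simple equal-weight averaging of Ridge and k-NN predictions
regressors["ensemble"] = VotingRegressor(
    [("ridge", regressors["ridge"]), ("knn", regressors["knn"])])

classifiers = {
    "logreg": LogisticRegression(class_weight="balanced", max_iter=1000),
    "ridge_clf": RidgeClassifier(class_weight="balanced"),
    "knn": KNeighborsClassifier(n_neighbors=5, weights="distance"),
    "gnb": GaussianNB(),
    "tree": DecisionTreeClassifier(max_depth=3, class_weight="balanced"),
}
```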
Leave-one-out cross-validation serves as the primary validation strategy, computed as LOOCV Error = (1/n) Σ_{i=1}^{n} L(y_i, ŷ_{−i}), where ŷ_{−i} denotes the prediction for sample i from a model trained on all samples except i. This approach offers substantial advantages for the given sample size: it maximizes training set size by using 50 samples per fold, provides nearly unbiased performance estimates, and operates deterministically without random fold assignment. For classification tasks with minimum class counts below five samples, LOOCV becomes mandatory, while stratified five-fold cross-validation preserves class proportions across folds when class sizes permit.
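A sketch of this protocol is shown below. Note that with a single held-out sample per fold, per-fold R2 is undefined, so predictions are pooled across all 51 folds before scoring; X, y and the illustrative alpha are assumptions.

```python
# Pooled-prediction LOOCV evaluation for the regression target.
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import r2_score, mean_absolute_error

lasso = Lasso(alpha=0.1, max_iter=10_000)          # illustrative alpha
y_pred = cross_val_predict(lasso, X, y, cv=LeaveOneOut())
print("LOOCV R2 :", r2_score(y, y_pred))
print("LOOCV MAE:", mean_absolute_error(y, y_pred))
```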
3.4.2. Validation Protocol and Leakage Prevention
The validation framework implements strict separation between training and evaluation to prevent information leakage:
Outer evaluation: leave-one-out cross-validation (51 folds, each holding out one sample for testing)
Inner tuning: 5-fold stratified cross-validation for hyperparameter selection within each outer fold
Random state: fixed seed (42) ensures reproducibility across all stochastic operations
Feature selection: when RFE is applied, selection occurs within each outer fold using only training data
All preprocessing transformations (encoding and composite index calculation) use globally defined mappings that do not depend on target values, eliminating preprocessing-induced leakage. Regularization parameters (α for Ridge/Lasso) are selected via GridSearchCV within the inner loop, with separate tuning for each outer fold to prevent optimistic bias from parameter sharing.
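A hedged sketch of this leakage-safe nesting follows: RFE and the alpha grid search are refit inside every outer fold on training data only. X and y are assumed to be NumPy arrays; the grid and the inner KFold (rather than stratified splitting, since this is the regression target) are illustrative.

```python
# Nested evaluation: GridSearchCV inner tuning inside an outer LOOCV loop.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, KFold, LeaveOneOut
from sklearn.feature_selection import RFE
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score

pipe = Pipeline([
    # Feature selection is refit per outer fold, so the held-out sample
    # never influences which features are kept.
    ("rfe", RFE(Lasso(alpha=0.1, max_iter=10_000), n_features_to_select=6)),
    ("model", Lasso(max_iter=10_000)),
])
inner = GridSearchCV(pipe, {"model__alpha": np.logspace(-3, 3, 13)},
                     cv=KFold(5, shuffle=True, random_state=42))

preds = np.empty(len(y))
for train_idx, test_idx in LeaveOneOut().split(X):
    inner.fit(X[train_idx], y[train_idx])   # tuning sees training folds only
    preds[test_idx] = inner.predict(X[test_idx])
print("nested LOOCV R2:", r2_score(y, preds))
```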
Three complementary feature selection methods were implemented and consolidated to ensure robust variable selection. Spearman rank correlation is appropriate for ordinal data as it does not assume an interval scale or normality, with features ranked by absolute correlation coefficient with the target. Mutual information captures non-linear dependencies between features and target, computed using k = 5 nearest neighbors estimation. Recursive feature elimination iteratively removes the least important features based on Ridge regression coefficients until six features remain. The final feature ranking is computed as the average rank across all three methods, ensuring robust selection not dependent on any single criterion’s assumptions.
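A compact sketch of this consolidation, assuming X is a pandas DataFrame of encoded features; the rank-averaging convention (lower average rank = more important) follows the description above.

```python
# Consolidated feature ranking: Spearman + mutual information + RFE ranks.
import pandas as pd
from scipy.stats import spearmanr
from sklearn.feature_selection import mutual_info_regression, RFE
from sklearn.linear_model import Ridge

def consolidated_ranking(X: pd.DataFrame, y) -> pd.Series:
    spear = X.apply(lambda col: abs(spearmanr(col, y)[0]))
    mi = pd.Series(mutual_info_regression(X, y, n_neighbors=5,
                                          random_state=42), index=X.columns)
    rfe = RFE(Ridge(), n_features_to_select=6).fit(X, y)
    rfe_score = pd.Series(-rfe.ranking_, index=X.columns)  # higher = better
    ranks = pd.concat([spear.rank(ascending=False),
                       mi.rank(ascending=False),
                       rfe_score.rank(ascending=False)], axis=1)
    return ranks.mean(axis=1).sort_values()  # lowest average rank = best
```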
3.5. Experimental Procedure
The training procedure executes a systematic sequence beginning with data loading and preprocessing, where raw CSV data is loaded, cleaned of empty rows, and transformed through the encoding pipeline. Class distribution analysis follows, comparing original versus simplified target distributions to validate the three-class consolidation decision. Baseline evaluation trains all models on 15 original features with five-class targets to establish performance benchmarks. The optimized evaluation phase retrains models on seven composite features, retaining the original five-class target for regression to preserve prediction granularity while using three-class targets for classification tasks. A hybrid approach applies recursive feature elimination to select the top six features from the original fifteen, providing a data-driven feature subset alternative to the theory-driven composite indices. Advanced methods, including ensemble models combining Ridge and k-NN averaging, along with ordinal-aware methods where available, are evaluated for marginal improvements. Finally, a hyperparameter tuning experiment employing Optuna (v0.20.0) for nested cross-validation with 100 trials demonstrates the performance ceiling, using five-fold cross-validation in the inner loop for parameter selection and LOOCV in the outer loop for unbiased evaluation.
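A compact sketch of the inner tuning objective is given below; the outer LOOCV evaluation proceeds as in the earlier nested protocol. The search space, the Ridge choice, and the variable names X_train, y_train (the per-outer-fold training data) are illustrative, and the API shown targets recent Optuna releases (the pinned v0.20.0 would use trial.suggest_loguniform instead of suggest_float with log=True).

```python
# Sketch of an Optuna objective for the inner five-fold tuning loop.
import optuna
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

def objective(trial):
    alpha = trial.suggest_float("alpha", 1e-3, 1e3, log=True)
    scores = cross_val_score(Ridge(alpha=alpha), X_train, y_train,
                             cv=KFold(5, shuffle=True, random_state=42),
                             scoring="neg_mean_absolute_error")
    return scores.mean()

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=42))
study.optimize(objective, n_trials=100)   # inner loop of the nested scheme
best_alpha = study.best_params["alpha"]
```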
The computational complexity analysis reveals tractable resource requirements. Ridge and Lasso regression require O(p^2·n + p^3) time for coefficient computation via normal equations or iterative solvers, which with p = 7 and n = 51 is effectively constant-time. The k-NN algorithm requires O(n·p) time per prediction for brute-force distance computation and O(n^2·p) for full LOOCV, which remains acceptable for the given sample size. Decision trees require O(n·p·log n) time for training with depth constraints. The LOOCV protocol multiplies training cost by n, but with simple models this remains tractable with total execution under 60 s on standard hardware. Space complexity analysis shows feature matrices requiring O(n·p) ≈ 51 × 7 = 357 floating-point values, while model parameters for Ridge store p coefficients and k-NN stores the entire training set at O(n·p), yielding a total memory footprint under one megabyte. The computational approach prioritizes methodological correctness over algorithmic efficiency, which is appropriate given the small data scale where even exhaustive methods complete rapidly.
4. Results
The primary regression task predicts perceived AI impact on cost reduction, with performance reported using the coefficient of determination defined as R2 = 1 − Σ_i (y_i − ŷ_i)^2 / Σ_i (y_i − ȳ)^2. The achieved regression performance of R2 = 0.501 was obtained using Lasso regression with RFE-selected features under LOOCV evaluation, with a corresponding Mean Absolute Error of 0.551 and Root Mean Square Error of 0.709, indicating predictions deviate by less than one ordinal class on average. An R2 of 0.501 indicates that the model explains approximately 50 percent of variance in perception responses, which for perception prediction tasks based on survey data represents strong performance. This interpretation rests on three considerations. First, irreducible noise in survey responses limits predictable variance because human perception is inherently variable, with the same respondent potentially providing different answers on different occasions due to mood, context, or question interpretation, establishing a hard ceiling regardless of model sophistication. Second, high dimensionality of latent factors means that perception formation involves psychological, social, and experiential factors not fully captured by available survey variables, with unmeasured confounders contributing to residual variance. Third, sample size constraints mean that with n = 51, even optimal models exhibit estimation variance, with the true population R2 likely lying within ±0.10 of the observed value.
4.1. Regularized Model Comparison
Table 5 presents the side-by-side comparison of regularized regression models using identical LOOCV evaluation.
Lasso regression achieves the highest R2 (0.501), marginally outperforming Ridge (0.494) and Elastic Net (0.493). The performance differences are small (ΔR2 < 0.01), consistent with expectations for regularized linear models on similar feature sets. Lasso’s slight advantage likely reflects its implicit feature selection property (L1 penalty inducing sparsity), which provides additional regularization benefit in the small sample context.
4.2. Feature Importance Analysis
To support interpretability and align findings with technology acceptance constructs, feature importance was assessed through both permutation importance and coefficient magnitude analysis.
Table 6 presents the ranked importance of RFE-selected features.
The feature importance rankings align with established technology acceptance constructs [22,23]. Personal AI usage (β = 0.359) emerges as the strongest predictor, consistent with TAM’s “perceived usefulness” construct—direct experience with AI tools shapes expectations of future value. AI impact on budgeting (β = 0.310) reflects prior positive AI outcomes serving as a trust proxy. ICT utilization (β = 0.304) captures general digital competency aligned with “perceived ease of use”—respondents comfortable with digital tools anticipate smoother AI integration. Company size (β = 0.207) represents organizational facilitating conditions, confirming the company-size hypothesis developed earlier. Age emerges as a weak predictor (β = 0.084), suggesting that chronological age is less influential than experiential factors once AI exposure is controlled.
4.3. Sample Characteristics and Distribution
Table 7 presents the distribution of respondents across company size categories, revealing a concentration in medium and large enterprises that reflects the survey recruitment strategy targeting organizations with sufficient AI exposure to provide informed responses.
Table 8 shows the distribution of the primary target variable—perceived AI impact on cost reduction—across the five-point ordinal scale, demonstrating the positive skew characteristic of perception surveys where respondents tend toward favorable assessments.
The three-class consolidation scheme transforms this distribution as shown in Table 9, addressing the statistical instability of minority classes while preserving meaningful ordinal distinctions.
4.4. Company Size and AI Perception Analysis
A central hypothesis of this research posits that organizational scale influences AI adoption perceptions, with larger companies expected to demonstrate more positive attitudes due to greater resource availability, technology exposure, and organizational readiness.
Table 10 presents descriptive statistics supporting this hypothesis.
The data reveal a monotonic increase in mean perception scores with company size, from 2.75 for micro-enterprises to 4.31 for large corporations.
Table 11 presents the aggregated comparison between smaller and larger organizations.
Statistical Tests:
Mann–Whitney U = 89.5, p < 0.001
Cohen’s d = 1.47 (large effect)
Spearman ρ = 0.52, p < 0.001
The observed difference of 1.26 points on the 5-point scale represents a substantial and statistically significant effect. Cohen’s d of 1.47 exceeds the conventional threshold for a large effect (d > 0.8), indicating that company size explains meaningful variance in AI perception beyond chance. The Mann–Whitney U test confirms statistical significance (p < 0.001), while the Spearman correlation (ρ = 0.52) indicates a moderate-to-strong monotonic relationship between organizational scale and AI perception.
Table 12 presents the cross-tabulation of company size against consolidated perception classes, revealing the distributional patterns underlying the mean differences.
Chi-square test: χ2 = 18.01, df = 6, p = 0.006
The chi-square test confirms that the association between company size and perception class is statistically significant (p = 0.006). Notably, 92.3% of large company respondents fall in the high perception category compared to only 25.0% of micro-enterprise respondents—a striking 67-percentage-point gap that underscores the practical significance of organizational scale as a perception determinant.
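The reported tests can be reproduced with SciPy as sketched below; small and large stand for perception-score arrays of micro/small versus medium/large firms, company_size and perception for the respondent-level vectors, and crosstab for the Table 12 counts, all assumptions about variable naming.

```python
# Reproducing the reported group-difference statistics.
import numpy as np
from scipy import stats

u_stat, p_u = stats.mannwhitneyu(small, large, alternative="two-sided")
rho, p_rho = stats.spearmanr(company_size, perception)

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) +
                      (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2))
    return (np.mean(b) - np.mean(a)) / pooled

d = cohens_d(small, large)
chi2, p_chi, dof, _ = stats.chi2_contingency(crosstab)  # counts from Table 12
```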
4.5. Hypothesis Interpretation and Practical Implications
The finding that larger companies demonstrate significantly higher AI perception scores (Δ = 1.26 points, p < 0.001) provides partial support for the hypothesis that organizational scale influences AI adoption attitudes. However, this result should not be interpreted as suggesting that AI investment is redundant for smaller companies. Rather, the evidence suggests differentiated adoption strategies:
Small companies require targeted AI solutions: the lower perception scores among micro and small enterprises may reflect legitimate concerns about implementation complexity, resource constraints, and uncertain ROI at smaller scales. AI solutions for smaller companies should emphasize ease of implementation, minimal infrastructure requirements, and rapid time-to-value.
ROI expectations should be adjusted for scale: larger organizations benefit from economies of scale in AI deployment, amortizing fixed implementation costs across larger operations. Smaller companies should calibrate expectations accordingly, focusing on high-impact, narrowly scoped applications rather than enterprise-wide transformations.
Cloud-based AI services may be more appropriate than infrastructure investments: the perception gap may partially reflect awareness of capital-intensive AI implementations unsuitable for smaller organizations. Cloud-based, subscription-model AI services lower barriers to entry and may be more appropriate for companies lacking dedicated IT infrastructure.
Training and exposure drive perception improvements: the strong correlation between company size and AI perception likely reflects differential exposure to AI technologies and training opportunities. Targeted educational initiatives for smaller company employees could narrow this perception gap.
4.6. Classification Performance
Six perception classification targets were evaluated using the weighted F1-score, calculated as F_1 = 2 · (Precision · Recall) / (Precision + Recall). The achieved average F1-score of 0.681 reflects individual target performance ranging from F1 = 0.598 for human resources perception, which proved most challenging due to class imbalance, to F1 = 0.756 for administrative burden perception, with logistic regression using balanced class weights achieving the best results on most targets. The weighted F1-score of 0.681 substantially exceeds the stratified random baseline of approximately 0.35 to 0.40, demonstrating that learned models capture meaningful signal beyond class frequency memorization. For three-class ordinal classification with limited samples, this performance indicates effective class separation, where models successfully distinguish between low, medium, and high perception categories using available features; robustness to imbalance, where class weighting successfully mitigates remaining imbalance after three-class consolidation; and generalization validity, where LOOCV evaluation ensures reported metrics reflect out-of-sample performance without optimistic bias.
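A sketch of this evaluation loop follows, using the same pooled-prediction LOOCV convention as the regression task; the targets dictionary mapping the six consolidated 3-class outcomes is an assumed data structure.

```python
# Weighted-F1 LOOCV evaluation across the six classification targets.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import f1_score

clf = LogisticRegression(class_weight="balanced", max_iter=1000)
for name, y_cls in targets.items():        # six consolidated 3-class targets
    y_hat = cross_val_predict(clf, X, y_cls, cv=LeaveOneOut())
    print(name, "weighted F1:",
          round(f1_score(y_cls, y_hat, average="weighted"), 3))
```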
4.7. Performance Ceiling Analysis
Extensive hyperparameter tuning via Optuna with nested cross-validation confirmed that R2 ≈ 0.50 to 0.55 represents the achievable ceiling for this dataset. The gap between optimistic non-nested and honest nested evaluation estimates approached 0.05 to 0.08, highlighting overfitting risk even with 100 tuning trials. Expecting R2 > 0.55 is unrealistic without fundamental changes to data collection for three reasons. The Bayes error floor established by survey response noise creates irreducible error independent of model choice. Effective degrees of freedom constraints with n = 51 and p_eff ≈ 6 to 7 features necessarily limit model flexibility. Missing explanatory variables mean that perception formation depends on factors (personality traits, prior experiences, and organizational culture) that are not measured in the current instrument.
4.8. Decision Support Interface Outputs and Correlation Insights
To complement the quantitative model evaluation, the developed system provides an interpretable decision support interface that translates model inputs and composite indices into user-facing profile outputs. The interface is designed to support practical interpretation of predicted perceptions and expected impacts at both the individual respondent level and the organizational (company) level, thereby enabling targeted interventions aligned with Industry 5.0’s human-centric adoption logic.
Figure 1 presents the Individual Profile Assessment view, which allows users to enter basic demographic descriptors (job position, work experience, and age group) together with key readiness variables (digital competencies, personal AI usage, AI training, and ICT utilization). The system subsequently generates an Individual Competency Profile (radar chart) and an Individual AI Readiness score (gauge), computed from the composite readiness structure used in the modeling pipeline. The visualization provides an immediate summary of how a respondent’s capability and exposure profile relate to expected AI adoption perceptions.
From an interpretability standpoint, this output supports two main functions:
- (i) it clarifies which readiness components dominate the respondent’s profile (e.g., stronger digital competencies but only moderate AI usage), and
- (ii) it enables practical discussion of interventions (e.g., training or structured exposure) without requiring stakeholders to interpret model coefficients directly.
Figure 2 shows the company characteristics and expected AI impact assessment views. The tool captures company-level descriptors (company size, organizational digitalization level, and company AI usage level) and links them to two outputs:
- (i) a company digital maturity indicator (gauge-based digitalization index), and
- (ii) an expected AI impact profile across operational domains (budgeting, design, project management, marketing, and logistics), consistent with the AI Impact Index construction described in the methodology.
This view is particularly useful because it aligns model reasoning with decision contexts familiar to managers: instead of predicting abstract “AI adoption”, the tool expresses expected impacts in recognizable functional areas. The visual profile also supports the interpretation of heterogeneity observed in the dataset: even with similar demographic composition, organizations may differ substantially in digital maturity and AI usage, which is reflected in expected impact patterns.
To support managerial interpretation, the interface includes a hypothesis-oriented comparison focusing on organizational scale.
Figure 3 illustrates the Hypothesis Analysis: small company AI investment, where AI perception is compared across company size categories using a consistent gauge visualization. The displayed pattern is monotonic, with higher perceived AI readiness/perception associated with larger company size categories, which is consistent with the statistical results reported earlier (significant differences between smaller and larger organizations).
Importantly, the interface does not present this finding as a simplistic “small companies should not invest” conclusion. Instead, it provides an ROI Considerations by Company Size view that frames adoption as a scale-sensitive decision problem. Table 13 emphasizes proportional investment, realistic ROI horizons, and matching solution complexity to organizational resources (e.g., lightweight SaaS automation and assistive AI for micro/small firms versus integrated solutions and custom modeling for medium/large firms). This framing is consistent with the empirical finding that perception and readiness are not uniform across organizational scale, and it translates statistical evidence into a practical decision narrative.
Figure 4 presents a correlation heatmap summarizing relationships among demographic factors, readiness variables, composite indices, and perception/impact outcomes. The results support the conceptual logic embedded in the predictive framework: the AI Impact Index and perception-related outcomes show meaningful positive associations with AI experience/exposure, ICT utilization, and AI usage intensity, which indicates that perceived value is linked to actual exposure and capability-building rather than to demographic factors alone. In practical terms, this strengthens the interpretation that adoption perceptions can be improved through targeted interventions such as training, structured experimentation, and increased workflow integration of AI tools.
The heatmap also illustrates that organizational characteristics contribute to perception patterns, supporting the subgroup results presented in the company-size analysis. This provides a coherent triangulation: (i) statistical tests show significant group differences, (ii) model performance indicates predictable structure in the data, and (iii) correlation patterns confirm that readiness and usage variables align with perceived impact dimensions.
5. Discussion
5.1. Conclusion Derived from Experimental Results
The experimental results validate the methodological framework’s effectiveness for small sample perception prediction in Industry 5.0 contexts. The achieved regression performance of R2 = 0.501 and classification performance of F1 = 0.681 approach the empirically estimated performance ceiling (Section 4.7) given the boundary conditions of limited sample size and high-dimensional perception targets.
The methodological contributions of this work span four key areas. Bias-variance optimization through explicit model selection based on variance reduction rather than flexibility maximization proved essential, with low variance models, including Ridge, k-NN with moderate neighborhood size, and shallow trees consistently outperforming complex alternatives that would overfit 51 samples. Target simplification through five-class to three-class consolidation addressed statistical instability arising from severely imbalanced minority classes while preserving meaningful ordinal distinctions. Composite index construction employing theory-driven feature aggregation reduced dimensionality from 15 to seven features, improving the samples-per-feature ratio from 3.4 to 7.3 without sacrificing predictive signal. LOOCV validation maximized training data utilization and provided nearly unbiased performance estimates appropriate for the sample size.
Evidence confirming successful overfitting prevention includes regularization parameters selected via internal cross-validation rather than manual tuning, LOOCV ensuring each test prediction uses only training data for that fold, nested cross-validation for hyperparameter tuning experiments isolating tuning decisions from final evaluation, and baseline comparisons with the mean predictor and stratified random classifier verifying that models learn beyond trivial patterns.
5.1.1. Connection to Prior Literature
The findings substantively extend, confirm, and provide nuance to prior work on technology acceptance in construction contexts. The achieved R2 = 0.501 compares favorably to benchmarks from related small sample perception studies: customer satisfaction prediction typically achieves R2 ≈ 0.35–0.50 [48], while healthcare quality perception studies report R2 ≈ 0.45–0.60 [48]. The current results position construction AI perception prediction within the upper range of comparable domains, suggesting that the predictive signal in AI adoption perceptions is neither weaker nor stronger than other attitudinal constructs.
The company-size effect (Cohen’s d = 1.47) extends findings from Tjebane et al. [40], who reported qualitative differences in AI readiness by organizational scale. The current results quantify this effect with statistical precision: the 1.26-point perception gap between smaller and larger organizations represents a meaningful barrier that prior descriptive studies identified but did not estimate. This quantification enables resource allocation decisions—organizations can now estimate the magnitude of perception intervention required for different company size segments.
The feature importance rankings confirm and extend TAM and UTAUT constructs [22,23] to the AI-specific context. Personal AI usage emerging as the strongest predictor (β = 0.359) aligns with the “perceived usefulness” construct but adds the insight that direct experience matters more than formal training (AI_Training β = 0.244). This extends Emaminejad et al.’s [39] findings on trust in construction AI by suggesting that hands-on exposure may be more influential than structured educational interventions—a finding with direct implications for training program design.
The results partially contradict assumptions about demographic determinism in technology acceptance. Age emerged as the weakest predictor (β = 0.084), challenging narratives that frame AI adoption barriers as generational issues. This finding aligns with Lu and Deng’s [41] observation that digital competency mediates age effects, suggesting that chronological age itself is less relevant than accumulated digital experience.
5.1.2. Managerial Implications
The predictive framework enables concrete organizational interventions based on empirically validated perception determinants:
Targeted Training Programs: given that personal AI usage (β = 0.359) and AI training (β = 0.244) are significant predictors, organizations should prioritize hands-on AI exposure over lecture-based training. Recommended approach: implement structured AI tool trials where employees use AI-assisted design, scheduling, or budgeting tools on pilot projects, with 2–4 h of supervised experimentation weekly over 4–6 weeks before formal adoption decisions.
Role-Specific Communication Strategies: the model enables perception prediction for individual employees based on their profile characteristics. Organizations can segment their workforce and customize messaging: employees with low predicted perception scores (high ICT utilization but low AI exposure) may respond to demonstrations of AI-human collaboration, while those with prior negative AI experiences may require addressing specific concerns before broader adoption messaging.
Strategic Pilot Selection: the company-size effect (Δ = 1.26 points) suggests that AI pilots in smaller firms require additional support infrastructure. For micro and small enterprises, recommended approach: partner with technology vendors or industry associations to provide shared AI resources, reducing per-firm investment while enabling exposure that improves perception.
Workflow Redesign Before Technology Introduction: the importance of ICT utilization (β = 0.304) suggests that general digital competency precedes AI-specific acceptance. Organizations with low baseline digitalization should first establish foundational digital workflows (digital document management, collaborative scheduling tools) before introducing AI capabilities—attempting to leapfrog directly to AI may encounter resistance rooted in general digital discomfort.
Intervention Prioritization: using the model coefficients, organizations can estimate return on investment for different perception interventions. Increasing personal AI usage by one Likert point yields approximately 0.36 points improvement in cost reduction perception, while company-wide training programs yield approximately 0.24 points. This enables evidence-based allocation of limited change management resources.
5.2. Future Improvements
Incremental improvements to prediction performance are achievable through targeted extensions. Sample size expansion from 51 to 150–200 respondents would enable more complex model exploration, including ensemble methods and shallow neural networks, reduced confidence intervals around performance estimates, and potential for held-out test set evaluation beyond cross-validation. Feature space enrichment through the introduction of additional explanatory variables would incorporate psychological factors such as technology acceptance disposition and risk tolerance, demographic details, including education level and prior industry experience, and contextual variables encompassing regional economic indicators and company financial performance. Model refinement with larger samples would make viable ordinal regression architectures using proportional odds models, attention-weighted feature combinations, and transfer learning from related industrial perception datasets. These extensions represent data-driven improvements rather than algorithmic advances, consistent with the finding that current performance bounds are determined by data characteristics rather than model capacity. Performance improvements of 0.05 to 0.10 in R2 per 50 additional respondents represent realistic expectations, with asymptotic limits likely near R2 ≈ 0.65 to 0.70 even with substantially expanded datasets, reflecting fundamental uncertainty in human perception measurement.
6. Conclusions
This research establishes a methodologically rigorous framework for AI-based perception prediction in data-constrained Industry 5.0 environments, with specific application to artificial intelligence adoption in the construction sector. The work addresses a fundamental challenge in contemporary industrial research: developing reliable predictive systems when sample acquisition is inherently limited by specialized domain expertise requirements, survey response rates, and nascent technology adoption stages.
The principal contribution of this work lies in demonstrating that statistically principled approaches specifically designed for small sample regimes can achieve meaningful predictive performance without resorting to complex architectures that would inevitably overfit limited data. The achieved performance metrics of R2 = 0.501 for regression and weighted F1 = 0.681 for classification represent not merely acceptable results, but outcomes near the empirical performance ceiling established by sample size, feature dimensionality, and the inherent variability of human perception measurement. This finding carries significant implications for the broader machine learning community, where the predominant focus on large-scale datasets and deep architectures often overshadows the practical reality that many industrial and social science applications operate in data-scarce environments.
The methodological framework developed herein provides a replicable blueprint for researchers and practitioners confronting similar constraints. The systematic approach encompassing target simplification through ordinal consolidation, dimensionality reduction via theoretically grounded composite indices, exclusive deployment of low variance regularized models, and rigorous validation through leave-one-out cross-validation collectively ensures that reported performance reflects genuine generalization capability rather than spurious pattern memorization. This framework extends beyond the specific context of construction industry AI perception to any domain where expert opinion measurement intersects with limited respondent availability.
From an Industry 5.0 perspective, this research addresses a critical gap in understanding human-AI collaboration dynamics during the early stages of technological transformation. The construction sector exemplifies industries undergoing fundamental restructuring, where artificial intelligence promises significant operational improvements yet faces substantial adoption barriers rooted in workforce perception, organizational culture, and technological readiness. The ability to predict perception patterns from measurable demographic and experiential variables enables targeted intervention strategies, including customization of training programs based on predicted resistance levels, prioritization of communication efforts toward high-impact demographic segments, and resource allocation for change management initiatives aligned with predicted adoption trajectories. These applications transform perception prediction from an academic exercise into an actionable decision support tool for organizational leaders navigating technological transitions.
The research acknowledges inherent limitations that bound the scope of its conclusions. The sample size of 51 respondents, while typical for specialized industrial surveys, fundamentally constrains model complexity and introduces estimation uncertainty that wider sampling would reduce. The geographic and temporal specificity of data collection within the Slovak construction industry in 2025 limits direct generalizability to other regions, industries, or time periods, though the methodological approach remains transferable. The reliance on self-reported survey data introduces measurement error through respondent interpretation variability, social desirability bias, and potential gaps between stated perceptions and actual behaviors. The cross-sectional design captures perception at a single temporal snapshot, whereas longitudinal tracking would reveal perception evolution as AI adoption progresses and industry experience accumulates.
Extended Limitations Discussion
Geographic and Cultural Context. The Slovak construction industry operates within a Central European economic and regulatory environment that may influence AI perception patterns in ways not generalizable to other contexts. Slovakia’s construction sector is characterized by a high proportion of small and medium enterprises, relatively recent digital transformation initiatives compared to Western European markets, and specific labor market dynamics influenced by EU integration and regional economic development patterns. Cultural factors, including attitudes toward technology, organizational hierarchy, and risk tolerance, may systematically differ from other national contexts. Readers should interpret findings as hypothesis-generating for other regions rather than directly applicable predictions.
Methodological Limits of Perception Prediction. Predicting subjective perceptions from small samples faces fundamental methodological constraints. First, perception is inherently noisy—the same individual may provide different responses on different occasions due to mood, recent experiences, or question interpretation. This test-retest variability establishes an irreducible error floor that no model can overcome. Second, perception formation involves psychological constructs (risk tolerance, openness to experience, and cognitive style) not captured in the current instrument, meaning that a portion of between-individual variance is systematically unmeasurable with available features. Third, small samples (n = 51) provide limited statistical power for detecting moderate effects and produce wide confidence intervals around parameter estimates—the true population R2 may plausibly range from 0.35 to 0.65 given sampling variability.
Future Improvements for Validity. Longitudinal and multi-source data collection approaches could substantially improve prediction validity. Longitudinal designs tracking the same respondents over 12–24 months as AI exposure accumulates would enable: (a) validation of prediction stability, (b) investigation of perception change dynamics, and (c) assessment of intervention effectiveness through before-after comparisons. Multi-source validation incorporating behavioral data (actual AI tool usage logs and productivity metrics) alongside self-reported perceptions would address social desirability bias and close the attitude-behavior gap. Multi-informant approaches collecting perceptions from both individual workers and their supervisors would enable triangulation and reduce single-source bias.
Future research directions emerge naturally from these limitations and the established foundation. Longitudinal extensions tracking the same respondent cohort across multiple time points would enable investigation of perception dynamics, validation of prediction stability, and assessment of intervention effectiveness. Cross-industry comparative studies applying the identical methodological framework to manufacturing, healthcare, or logistics sectors would test framework generalizability and identify industry-specific versus universal perception determinants. International replication studies across diverse geographic and cultural contexts would illuminate the extent to which perception formation mechanisms transcend regional boundaries versus require localization. Integration of behavioral data linking predicted perceptions to actual technology adoption decisions would validate the practical utility of perception prediction and enable closed-loop optimization of intervention strategies. Advanced modeling explorations viable with expanded sample sizes could investigate hierarchical models accounting for organizational clustering effects, mixture models identifying latent perception subgroups, and causal inference frameworks distinguishing predictive associations from causal mechanisms.
The broader significance of this work extends beyond its immediate empirical findings to its demonstration that methodological rigor can extract meaningful insights even from severely constrained data environments. In an era where machine learning discourse often equates sophistication with architectural complexity and dataset scale, this research affirms the enduring value of statistical fundamentals: understanding bias-variance trade-offs, matching model complexity to sample size, implementing appropriate regularization, and validating performance through honest evaluation protocols. These principles, though developed decades ago in classical statistical learning theory, remain as essential in contemporary AI applications as they were in their original formulation.
For practitioners in Industry 5.0 contexts, this work delivers a clear message: effective AI deployment does not require massive datasets or complex deep learning systems when the problem structure, proper feature engineering, and appropriate model selection can compensate for data limitations. The construction industry and similar sectors should not delay digital transformation initiatives while awaiting ideal data conditions that may never materialize. Instead, they can proceed confidently with available data by applying methodologically sound approaches that acknowledge and work within realistic constraints.
This research ultimately contributes to the evolving understanding of human-centered artificial intelligence in industrial contexts, where technological capability must align with human perception, acceptance, and integration to achieve sustainable transformation. The developed framework and empirical findings provide both a methodological template and empirical evidence supporting the feasibility of perception-aware AI deployment strategies in data-constrained Industry 5.0 environments. Future work building upon this foundation will further refine our ability to predict, understand, and ultimately shape the human dimensions of industrial AI adoption.