Article

From Key Role to Core Infrastructure: Platforms as AI Enablers in Hospitality Management

Dipartimento di Ingegneria dell’Innovazione, Università del Salento, 73100 Lecce, Italy
*
Author to whom correspondence should be addressed.
Platforms 2025, 3(3), 16; https://doi.org/10.3390/platforms3030016
Submission received: 2 June 2025 / Revised: 8 August 2025 / Accepted: 29 August 2025 / Published: 4 September 2025

Abstract

The increasing complexity of managing maintenance activities across geographically dispersed hospitality facilities necessitates advanced digital solutions capable of effectively balancing operational costs and service quality. This study addresses this challenge by designing and validating an intelligent Prescriptive Maintenance module, leveraging advanced Reinforcement Learning (RL) techniques within a Digital Twin (DT) infrastructure, specifically tailored for luxury hospitality networks characterized by high standards and demanding operational constraints. The proposed framework is based on an RL agent trained through Proximal Policy Optimization (PPO), which allows the system to dynamically prescribe preventive and corrective maintenance interventions. With such an AI-driven approach, platforms become the enablers that minimize service disruptions, optimize operational efficiency, and proactively manage resources in dynamic and extended operational contexts. Experimental validation highlights the potential of the developed solution to significantly enhance resource allocation strategies and operational planning compared to traditional preventive approaches, particularly under varying resource availability conditions. By providing a comprehensive and generalizable representation model of maintenance management, this study delivers valuable insights for both researchers and industry practitioners aiming to leverage digital transformation and AI for sustainable and resilient hospitality operations.

1. Introduction

In the hospitality industry, the growing adoption of digital technologies and online platforms is raising a broad array of scientific research questions and opportunities [1]. On one hand, digital transformation prompts issues related to effectively integrating tools like artificial intelligence, big data, and augmented reality into both operational and strategic management for hotels, travel agencies, and related businesses. The need for specialized expertise and substantial investment can lead to disparities between large chains and small family-run establishments, signaling a clear area of inquiry on how to reduce the technological divide and foster inclusive digitization [2]. Moreover, the intensive use of automated systems and online services poses questions regarding the ethical handling of sensitive customer data, potential invasions of privacy, and the delicate balance between offer personalization and the protection of individual rights [1,3]. Another key aspect is the reorganization of power dynamics within the tourism supply chain, where digital platforms influence competition and collaboration models among different stakeholders. Market share distribution, platform-levied commissions, and growing dependence on online intermediaries raise questions about how to structure a sustainable business model, especially for smaller-scale operators [1,4]. At the same time, real-time data availability and advanced analytics solutions offer new research avenues for creating more flexible, adaptable services that can swiftly respond to external shocks like public health emergencies or sudden changes in tourism demand [3,5]. These topics concern not only the corporate sphere but also public policy, as government regulations on platforms, best practices for cybersecurity, and incentives for investment in green technologies are all critical to ensuring industry growth that emphasizes both efficiency and social and environmental responsibility. 
From a methodological perspective, the variety of potential approaches—from quantitative analyses of big data to qualitative case studies—opens up interesting research possibilities. Using multiple methods may allow for deeper insights into the internal dynamics of technological innovation processes and their external impacts on the labor market, local communities, and territorial cohesion. For instance, exploring how new digital skill requirements influence workforce training and role restructuring in the hotel sector could provide practical ideas for more effective professional development programs [3]. Meanwhile, there is room to design models for measuring customer satisfaction in highly automated environments where chatbots and IoT devices continually interface with users [6]. Recognizing that different business functions are part of a broader digital ecosystem also encourages investigation into governance mechanisms and partnership strategies with external actors. Faced with the complexity of global markets, hospitality businesses must forge agreements with software developers, cloud service providers, payment platforms, and digital marketing agencies, leading to new value chains and challenges in revenue-sharing and accountability [5]. While this network of interdependencies complicates resource management, it simultaneously creates opportunities for open innovation, joint research and development efforts, and exchanges of best practices that can accelerate the spread of more modern and scalable business models. Taken as a whole, it becomes clear that technical, organizational, legal, and ethical considerations are interlinked and demand a multidisciplinary research approach. From a scientific point of view, examining how a hotel or tourism operator can transform into a service provider on digital platforms requires rethinking relationships with customers, brand identity, and even economic and financial structures. 
Findings in this domain extend beyond the competitiveness of individual businesses, helping outline a smarter, more inclusive, and more sustainable hospitality ecosystem [3]. The uncertainty associated with the rapid pace of technological change, as well as heterogeneous adoption strategies, justifies further research that can guide managerial decisions, public policies, and training strategies, with the ultimate aim of creating shared value for industry stakeholders and for the communities in which they operate [5].
While the digitalization of the luxury hospitality sector is widely discussed in the literature, specific operational gaps remain. The majority of existing studies emphasize predictive maintenance at the single-hotel level, overlooking the integration of advanced, adaptive AI methodologies into maintenance planning, particularly within luxury hospitality settings characterized by high customer expectations and significant revenue risks associated with room downtime. Therefore, despite a broad exploration of maintenance strategies and digital tools, few studies rigorously address how intelligent, prescriptive policies that exploit real-time data streams can dynamically balance operational costs with room availability, and there is still a lack of rigorous frameworks that tackle the combined challenge of maintenance planning and optimal allocation of shared technical teams across geographically dispersed hotels.
This study addresses this gap by designing and validating an intelligent Prescriptive Maintenance module that can be integrated into existing Digital Twin (DT) infrastructure. Assuming such an infrastructure already provides real-time data and a modeling substrate, the research exclusively concentrates on developing a Reinforcement Learning (RL) agent that generates maintenance policies compliant with operational constraints, explicitly focusing on preventive and corrective interventions. The objectives are therefore twofold: firstly, to show how the system’s efficiency varies as operational constraints change, specifically in relation to the number of maintenance teams available for the entire network; secondly, to evaluate this AI component in a luxury hospitality context, demonstrating that RL can effectively address the complexity and dynamic nature of Prescriptive Maintenance decisions, thus offering a scalable and practical service model for hospitality platforms.

2. Background and Literature Review

2.1. Digital Platforms and Smart Tourism

The author of [1] systematically links successive technological waves to the evolution of hospitality business models, illustrating how the rise of digital platforms has reshaped roles, processes, and competitive strategies in the industry. Specifically, the author reveals a progression from a traditional model heavily reliant on offline channels and intermediaries to a paradigm that emphasizes innovation and personalized customer experiences. By mapping out the successive “revolutions” in technology, ranging from early transport and distribution breakthroughs to advanced AI and immersive tools, the analysis demonstrates how new actors and practices have emerged, prompting firms to restructure their organizational models. Moreover, the article offers a forward-looking perspective by highlighting how the convergence of robotics, artificial intelligence, and virtual reality may further transform service offerings and customer relationships, while simultaneously reinforcing the emphasis on sustainability and responsible tourism. Overall, this work moves beyond mere technological implications, documenting how these shifts cascade across strategic, operational, and market dimensions of the hospitality sector. The article identifies three key limitations. First, it relies on a systematic literature review without empirical data, so future investigations should use data-driven methodologies to offer concrete evidence of how technology impacts hospitality business models. Second, its broad analysis of industry-wide transformations does not consider regional or platform-specific variations, suggesting that narrower studies could illuminate differences across countries or technologies. Third, while the paper discusses the benefits of digitization, it does not deeply analyze potential risks, such as ethical dilemmas and data privacy challenges, highlighting the need for additional research on unintended consequences. 
Addressing these gaps would yield more nuanced findings, informing both academic inquiry and practical implementation. The authors of [7] investigate how small and medium enterprises prepare for and implement business model innovations in the context of Industry 4.0, emphasizing that SMEs often approach technological shifts cautiously and must balance limited resources with competitive pressures. Their study highlights the importance of adaptability and flexibility as factors that influence the success of such initiatives. By examining multiple case studies, they find that clear leadership and vision can mitigate barriers to Industry 4.0 adoption, concluding that SMEs committed to ongoing learning and planning tend to benefit most from these transformations. The authors of [8] explore how servitization converges with Industry 4.0, transforming the strategies of product-oriented firms. They demonstrate that the shift toward services—driven by digital technologies—generates additional revenue streams and strengthens customer relationships. Their findings show the pivotal role of data analytics, cloud computing, and IoT in enabling value-added services, while also noting the challenges arising from changes in organizational culture and workforce skills. The authors of [9] analyze the ways in which digital platform-based ecosystems reshape collaboration and competition in sectors like hospitality, emphasizing the need for incumbent firms to adapt quickly to emerging platform-based players with direct customer access. They suggest that boundaries between industries are dissolving as platforms establish new market areas and value networks, arguing that ecosystem governance and strategic partnerships are essential for succeeding in such dynamic settings. The authors of [10] review existing literature on industrial digital platforms, proposing a business model perspective on how platforms can drive innovation. 
They point to the interdependent roles of platform owners, complementors, and end-users, stressing that robust governance structures and a clear strategic vision are critical for preventing conflicts and ensuring balanced value exchange among participants. The authors of [11] present a systematic review of digital technology and business model innovation, arguing that the rapid pace of technological progress obliges organizations to proactively reinvent internal structures and processes. Their study shows how data-driven insights can fuel more agile, consumer-focused models, particularly when organizations combine ambidexterity and foresight with openness to external partnerships. The authors of [12] examine how AI solutions help tourism businesses guide travellers from mere satisfaction to long-term service usage, emphasizing that chatbots and personalized recommendations significantly boost engagement in the pre-trip phase. Their findings illustrate that operational efficiency improves as AI automates routine tasks, although trust and transparency remain key to customer acceptance of AI-driven services. The authors of [13] look at the potential of Augmented Reality (AR) for smaller tourism enterprises, noting that immersive AR experiences can create a competitive advantage but also involve technological barriers and development costs. By reviewing real-world AR applications, they argue that thoughtful planning, intuitive interfaces, and a clear value proposition are crucial for maximizing impact. The authors of [14] examine how COVID-19 has accelerated digital transformation in hospitality, with hotels adopting remote interactions, contactless technology, and flexible booking policies, while underlining how the need for transparent safety protocols has increased trust-building efforts. They conclude that agile adaptation during the pandemic offers long-term competitive benefits. 
The authors of [5] investigate how industrial service providers transition to platform-based business models, proposing a set of strategic options including the creation of multi-sided platforms that orchestrate data exchange across entire value chains. They focus on governance and transparency as vital elements for aligning stakeholder incentives in such platforms, indicating that, although new revenue opportunities arise, substantial organizational changes are required. Finally, the authors of [15] analyze business model innovation in hospitality amid the COVID-19 crisis, showing that while tactical shifts like contactless check-ins or digital payments can mitigate short-term losses, deeper cultural and technological transformations are necessary for long-term resilience. They argue that having a flexible mindset supported by technology investments and risk management helps hospitality firms adapt to uncertainty and maintain a strategic edge. In [3], the authors provide a comprehensive viewpoint on how the synergy between AI and IoT elevates both operational efficiency and sustainability in the hotel industry. Through a survey of hotel managers and subsequent structural equation modelling, the authors illustrate that AI’s data-driven insights—together with IoT’s real-time monitoring capabilities—lead to notable reductions in resource consumption and environmental impact. Notably, they confirm that predictive analytics and automation can ease mundane tasks, allowing staff to refocus on higher-value responsibilities and thereby improving guest experiences. At the same time, the study acknowledges that adopting these advanced technologies entails organizational changes and potential pitfalls, including investment costs, data privacy issues, and the need for staff training. 
Nonetheless, the findings suggest that these challenges are outweighed by benefits such as robust energy management, minimized waste, and the capacity to meet emergent global sustainability benchmarks. By highlighting the importance of strategic planning and systemic integration, the research underscores that AI–IoT deployment, when carefully implemented, not only enhances operational processes but also positions hotels to gain significant competitive advantages while honoring principles of environmental stewardship. Recent works further elaborate on the strategic adoption of AI in hospitality. The authors of [16] analyze the opportunities and barriers to AI integration, highlighting both operational benefits and ethical or organizational challenges. The authors of [17] propose a framework to assess AI adoption susceptibility, showing that tools such as booking engines and chatbots have the highest potential due to clear advantages and low complexity. In [4], the authors provide a comprehensive look at the growing “platformization” of the tourism industry, illustrating both the historical development of this phenomenon and its practical consequences for market concentration and regulation in Europe. They begin by tracing how Global Distribution Systems (GDSs) laid the groundwork for digital intermediation, explaining that although GDSs were initially designed to manage airline reservations more efficiently, they evolved into large-scale business-to-business platforms connecting airlines, hotels, and travel agents. The emergence of internet-based Online Travel Agencies (OTAs) such as Booking.com and Expedia in the 1990s and early 2000s led to an even more pronounced consolidation of market power, with these two key players progressively absorbing or outcompeting smaller competitors. 
The authors point out that the so-called “sharing economy” platforms further disrupted the sector by introducing new, peer-to-peer accommodations and services, yet eventually many of these enterprises—Airbnb in particular—also acquired considerable market share and contributed to broader discussions about fair competition, data transparency, and local housing shortages.
From a regulatory standpoint, the authors of [4] highlight the tension between the positive impacts of digital innovation—improved consumer choice, potential economic gains for hosts, and data-driven decision-making—and the emergence of oligopolistic or even monopolistic tendencies that threaten smaller businesses and may harm consumers in the long run. They cite empirical data showing that a small group of major platforms controls a significant portion of online bookings, illustrating how “price parity” clauses and algorithmic ranking systems give these companies strong leverage over accommodation providers. The article also examines examples of European regulatory responses, from the gradual phasing out of broad parity clauses to more recent legislative frameworks (such as the Digital Markets Act) designed to ensure greater competition and fairness within the platform economy. Ultimately, the authors argue that while platformization brings undeniable efficiency gains and revenue opportunities for both large and small enterprises, policymaking must strike a delicate balance: promoting technological innovation and improved services on the one hand, while safeguarding market plurality, consumer welfare, and local communities on the other. In [18], the author explores how digital twins can serve as strategic tools for facing current and emerging challenges in the development of Smart Tourist Destinations (STDs). The paper begins by defining a smart tourist destination as a place that consistently employs technology, innovation, sustainability, and inclusivity to strengthen residents’ quality of life and enhance tourists’ overall experiences. Against this backdrop, digital twins are presented as virtual replicas of real-world systems or locations that can operate in sync with physical counterparts, allowing governments and tourism professionals to run simulations, predict disruptions, and discover new opportunities for seamless and sustainable operations. 
The research highlights various scenarios where digital twins support STD activities, including improving urban planning, bolstering environmental management, and optimizing visitor flow. By continuously gathering real-time data through an ecosystem of tools—such as the Internet of Things (IoT), Artificial Intelligence (AI), and cloud computing—digital twins can dynamically model traffic patterns, tourist behaviors, and energy consumption. This high-level data integration, the paper stresses, enables decision-makers to spot bottlenecks, craft more targeted marketing strategies, strengthen data-driven security, and introduce proactive sustainability measures (like water or resource conservation). Several case studies are referenced to illustrate how digital twin platforms offer city managers a clearer perspective on environmental threats, capacity thresholds, or future climate impacts. The article concludes that while digital twins are still an emerging concept in tourism, they have significant potential to reshape decision-making at STDs: from predicting how infrastructure will withstand natural disasters to improving marketing reach by customizing travel itineraries for different visitor segments. Ultimately, the paper underscores that realizing this potential requires strong collaboration among public administrations, private tourism stakeholders, and communities to ensure that data flows securely, ethically, and in a way that fosters responsible urban and tourism development. The authors of [6] examine how the integration of advanced technologies—especially AI, IoT devices, and predictive analytics—can redefine service delivery and boost customer satisfaction across the hospitality sector. The authors begin by describing ways in which disruptive tools (e.g., VR/AR, mobile apps, AI chatbots, predictive maintenance systems) are reshaping traditional hospitality models. 
They argue that these innovative platforms facilitate more personalized and frictionless guest experiences, such as automated in-room controls and voice-activated assistants, which ultimately drive operational efficiency and improve brand competitiveness. A central focus is on the role of machine learning for tasks like demand forecasting, sentiment analysis, and resource allocation. By leveraging large volumes of guest data—e.g., on check-in patterns, service requests, and social media reviews—hotels can anticipate supply needs and align staffing or pricing in near-real-time. The paper also emphasizes predictive maintenance, explaining how IoT sensors in equipment can identify early signs of failure, thus reducing downtime and ensuring consistent service levels. Overall, the authors highlight how these technologies—backed by robust data analytics—lead to stronger guest loyalty, higher revenue, and more sustainable business practices, but also note the importance of overcoming adoption barriers such as cost, expertise, and data privacy concerns.

2.2. Maintenance Strategies in Luxury Hospitality

According to ISO 15686-1, maintenance is defined as the set of technical and administrative actions—including supervision—aimed at preserving or restoring an asset so that it can fulfill its intended function. This comprehensive definition encompasses both operational and managerial aspects, such as financial planning and staff organization, to ensure long-term functionality, safety, and value. In the context of luxury hospitality, maintenance is critical for ensuring service quality, operational continuity, cost efficiency, regulatory compliance, and guest safety. The field is increasingly characterized by digital innovation: data analytics, advanced monitoring, and machine learning now enable more accurate fault prediction, cost optimization, and targeted interventions. Consequently, effective maintenance management requires not only technical skills but also advanced managerial and administrative competencies, including budgeting, scheduling, and resource allocation.
Maintenance can be categorized into two main domains: managerial and technical. From a managerial perspective, a distinction is made between ordinary maintenance—routine, predictable actions that do not alter the asset’s structure—and extraordinary maintenance—significant interventions that require additional funding or authorization. From a technical standpoint, the principal maintenance strategies are as follows:
  • Corrective maintenance (run-to-failure): reactive interventions conducted in response to asset failures. These may be planned or unplanned and are generally not preceded by preventive measures.
  • Preventive maintenance: scheduled interventions performed regardless of the actual condition of the asset, aimed at reducing the probability of failure.
  • Condition-Based Maintenance (CBM): maintenance actions triggered by real-time monitoring of asset condition, based on parameter thresholds. This approach optimizes resource utilization but necessitates investment in monitoring infrastructure and data analytics.
  • Predictive maintenance: integrates real-time data and advanced algorithms to estimate the Remaining Useful Life (RUL) of components and schedule interventions before failures occur. Predictive maintenance encompasses anomaly detection, diagnostics (identifying root causes), and prognostics (estimating RUL). Its effectiveness depends on high-quality monitoring and reliable data analytics, often utilizing models such as the P-F (Potential Failure–Functional Failure) curve to optimize intervention timing.
  • Prescriptive maintenance: extends predictive approaches by recommending not only when to intervene but also how to act and which resources to allocate.
In luxury hospitality, maintenance constitutes an integrated system that combines planned, preventive, and increasingly predictive strategies to maintain the high standards expected by guests and protect asset value. As highlighted by [19], insufficient prioritization and resource allocation for maintenance negatively impact overall performance, revealing persistent issues such as an overly technical focus, unclear objectives, inadequate data, and higher costs compared to other sectors. Each property presents unique challenges in terms of design, usage, and required services. Continuous, high-quality service requires that routine and preventive maintenance be treated as strategic priorities, directly influencing guest comfort, safety, and brand perception. As demonstrated by [20], even minor faults or delays can significantly undermine customer satisfaction and reputation.
Maintenance not only preserves asset value by maintaining technical, functional, and aesthetic attributes but also significantly reduces the risk of costly renovations and mitigates the potential deterioration of the brand image [19]. In this context, the core systems that require consistent attention, such as HVACR, lighting, electrical systems, elevators, and plumbing, are maintained through preventive and predictive strategies. These strategies, typically based on regular inspections, advanced monitoring, and IoT-enabled anomaly detection, ensure the ongoing reliability and efficiency of key assets.
Operational efficiency, moreover, is strongly dependent on rigorous maintenance planning [21,22], which is increasingly supported by digital tools capable of tracking and analyzing work orders and performance data. A thoughtful approach to maintainability design not only contributes to reduced long-term costs but also fosters improved organizational alignment.
Furthermore, compliance and sustainability are essential drivers of maintenance planning: integrated approaches facilitate reduced energy consumption, reinforce regulatory adherence, and ultimately improve both brand reputation and guest loyalty [2,19,22]. Outsourcing maintenance activities can further increase access to specialized expertise and optimize resource allocation, provided that contractual frameworks are clearly defined and supported by robust performance monitoring.
The ongoing digitalization of maintenance processes, particularly through Computerized Maintenance Management Systems (CMMSs), brings additional benefits by enhancing scheduling, resource management, and cost control, while also enabling the deployment of predictive analytics. Empirical evidence, such as that presented in [23], demonstrates that digital maintenance management not only reduces downtime and improves planning but also, when integrated with IoT and AI, further strengthens predictive capabilities and optimizes resource utilization.
Within this framework, key performance metrics, including availability and business availability, Mean Time To Failure (MTTF), Mean Time To Repair (MTTR), Preventive Maintenance Ratio (PMR), Urgent Repair Request Index (URRI), Energy Use Index (EUI), and average cost per code, assume a central role. Ultimately, the adoption of data-driven maintenance strategies is fundamental to achieving operational excellence and to safeguarding the long-term value and reputation of luxury hospitality assets.
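As a minimal illustration of how some of these indicators can be computed from a work-order log, the sketch below uses common textbook forms: MTTF and MTTR as simple averages, steady-state availability as MTTF / (MTTF + MTTR), and PMR as the share of preventive work orders. These definitions, the function names, and the sample figures are assumptions made for illustration; the source does not specify its formulas.

```python
def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    values = list(values)
    if not values:
        raise ValueError("at least one observation is required")
    return sum(values) / len(values)


def mttf(uptime_hours):
    """Mean Time To Failure: average operating time before a failure (assumed form)."""
    return _mean(uptime_hours)


def mttr(repair_hours):
    """Mean Time To Repair: average duration of a repair (assumed form)."""
    return _mean(repair_hours)


def availability(mttf_hours, mttr_hours):
    """Steady-state availability, assumed here to be MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def preventive_maintenance_ratio(preventive_orders, total_orders):
    """Assumed definition: preventive work orders divided by all work orders."""
    if total_orders <= 0:
        raise ValueError("total_orders must be positive")
    return preventive_orders / total_orders


# Hypothetical log for a single HVAC unit (hours).
uptimes = [700.0, 800.0, 900.0]
repairs = [4.0, 8.0, 12.0]

unit_availability = availability(mttf(uptimes), mttr(repairs))
unit_pmr = preventive_maintenance_ratio(preventive_orders=30, total_orders=40)
```

In practice such helpers would be fed from CMMS work-order exports, and a hotel network may prefer a revenue-weighted variant for business availability.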

2.3. Reinforcement Learning for Prescriptive Maintenance

While current maintenance management systems, empowered by digital platforms and advanced analytics, enable efficient scheduling, monitoring, and resource allocation across diverse assets, the complexity of real-world hospitality networks introduces new challenges. Traditional maintenance optimization models typically focus on a single facility or a homogeneous set of assets, optimizing interventions in isolation. Such single-site approaches simplify the problem (often assuming identical deterioration profiles and costs across assets) to remain tractable. However, this simplification ignores the heterogeneity and interdependencies that naturally arise in a distributed network of hospitality facilities. In the Salento region of Italy, for example, hotels, resorts, and other tourism assets exhibit diverse usage patterns and wear rates, so treating them as identical would be unrealistic. Moreover, optimizing each facility separately fails to capture system-level trade-offs, such as how limited maintenance crews or budgets are best allocated across multiple sites. Extending to a region-wide, multi-facility context introduces additional complexity and novelty: facility-specific deterioration processes and costs must be accounted for (each site has a unique “asset profile”), while decisions are also coordinated across sites. Classical “top-down” infrastructure models that attempt a network-level plan often have to assume that all facilities are identical to make the computation feasible. In contrast, a bottom-up paradigm first optimizes each facility independently and then reconciles these plans at the system level. This allows heterogeneity (each facility’s unique characteristics) to be considered, but it still falls short: initial facility-level plans neglect interactions and shared constraints, requiring suboptimal after-the-fact adjustments. 
Decomposing the multi-facility maintenance problem into a separate Markov Decision Process (MDP) per facility handles heterogeneity, but it then requires heuristics to combine the facility-level solutions under budget limits. Such two-stage methods can produce feasible plans, but they do not learn explicit coordination policies: they optimize sequentially rather than holistically.
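To make the two-stage idea concrete, the following sketch shows one possible reconciliation heuristic. It assumes that each facility has already produced its own proposal (stage 1) and that the shared limit is a number of maintenance crews; the greedy benefit-per-crew rule, the `Proposal` type, and all figures are hypothetical and are not the method of any cited work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """Stage-1 output of one facility-level plan (hypothetical structure)."""
    facility: str
    action: str          # e.g. "preventive" or "corrective"
    benefit: float       # facility-level estimate of the value of acting now
    crews_needed: int    # shared resource consumed by the intervention


def reconcile(proposals, crews_available):
    """Stage 2: greedily accept proposals by benefit per crew until crews run out."""
    ranked = sorted(proposals, key=lambda p: p.benefit / p.crews_needed, reverse=True)
    accepted, remaining = [], crews_available
    for proposal in ranked:
        if proposal.crews_needed <= remaining:
            accepted.append(proposal)
            remaining -= proposal.crews_needed
    return accepted


# Hypothetical stage-1 proposals from three hotels.
proposals = [
    Proposal("hotel_a", "corrective", benefit=10.0, crews_needed=1),
    Proposal("hotel_b", "preventive", benefit=8.0, crews_needed=2),
    Proposal("hotel_c", "preventive", benefit=6.0, crews_needed=1),
]
plan = reconcile(proposals, crews_available=2)
```

The sketch also exposes the weakness noted above: the ranking is applied after each facility has planned in isolation, so no facility-level plan anticipates that crews are shared.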
To overcome these limitations, there is a growing need for data-driven approaches capable of learning coordinated and adaptive maintenance policies across heterogeneous, distributed assets. In this context, Reinforcement Learning (RL) has emerged as a prime candidate for optimizing sequential decisions in stochastic, non-stationary maintenance environments. Unlike deterministic optimization, RL dispenses with a priori transition models by learning directly through interaction with the plant, thus adapting as asset conditions, workforce availability, or demand fluctuate.
A maintenance process with infinite horizon is formalized as a Markov Decision Process (MDP) $\mathcal{M} = (\mathcal{S}, \mathcal{A}, r, p, \gamma)$, where $\gamma \in [0, 1]$ is the discount factor for future rewards, $p(s' \mid s, a)$ denotes the (generally unknown) transition probability function, and $r(s, a)$ is the immediate reward [24].
A stochastic policy $\pi(a \mid s)$ assigns a probability distribution over actions for each state. The goal is to find an optimal policy $\pi^{*}$ that maximizes the expected discounted sum of rewards (return). For a given initial state $s_0$, the value function is defined as:
$$V^{\pi}(s_0) = \mathbb{E}_{\tau \sim \rho^{\pi}}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 \right],$$
where $\tau = (s_0, a_0, r_0, s_1, a_1, r_1, \dots)$ is a trajectory induced by $\pi$ and $\rho^{\pi}(\tau) = \prod_{t=0}^{\infty} \pi(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)$ is the probability of observing $\tau$ under policy $\pi$. Expanding, we obtain the following:
$$V^{\pi}(s_0) = \sum_{\tau} \rho^{\pi}(\tau) \sum_{t=0}^{\infty} \gamma^{t} r_t.$$
By recursively expanding the sum, the Bellman equation for the value function can be written as:
$$V^{\pi}(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s) \left[ r(s, a) + \gamma \sum_{s' \in \mathcal{S}} p(s' \mid s, a)\, V^{\pi}(s') \right].$$
The action-value function, or Q-function, is defined as:
$$Q^{\pi}(s, a) = r(s, a) + \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s, a)}\left[ V^{\pi}(s') \right],$$
which quantifies the value of taking action $a$ in state $s$ and following $\pi$ thereafter.
Classical dynamic programming methods, such as value iteration and policy iteration, iteratively solve the Bellman equations. The value iteration updates the value function directly until convergence, while the policy iteration alternates between evaluating a policy and improving it. Both exploit the recursive structure of the Bellman equation to avoid exhaustive enumeration of all possible trajectories.
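As a minimal sketch of value iteration, consider a hypothetical two-state room model with states `ok` and `broken` and actions `wait` and `repair`. The transition probabilities, rewards, and discount factor below are illustrative assumptions and are not taken from the study; rewards are attached to individual transitions, so their expectation plays the role of $r(s, a)$.

```python
# Toy MDP (illustrative numbers only, not taken from the paper).
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    "ok": {
        "wait": [(0.9, "ok", 100.0), (0.1, "broken", 0.0)],
        "repair": [(1.0, "ok", -20.0)],
    },
    "broken": {
        "wait": [(1.0, "broken", -50.0)],
        "repair": [(1.0, "ok", -80.0)],
    },
}
GAMMA = 0.9  # assumed discount factor


def q_value(V, s, a):
    """One-step lookahead: expected reward plus discounted next-state value."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-8, max_iter=10_000):
    """Repeat the Bellman optimality backup until the largest change is below tol."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iter):
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
    return V, policy


V, policy = value_iteration()
print(policy)
```

The sweep touches every state and action once per iteration, which is why this family of methods needs the transition model explicitly and does not scale to the large state spaces discussed later.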
When the transition probability functions $p(s' \mid s, a)$ are unknown or not explicitly modeled, model-free methods are employed. Monte Carlo methods estimate value functions from sampled episodes by averaging returns, without requiring a model of the environment. Updates are performed incrementally at the end of each episode:
$$V^{\pi}(s) \leftarrow V^{\pi}(s) + \alpha \left[ G - V^{\pi}(s) \right],$$
where $G$ is the empirical return observed upon the first visit to state $s$ in the episode.
Temporal Difference (TD) learning combines ideas from Monte Carlo and dynamic programming: it updates value estimates using other learned estimates as proxies for expected future returns, and it does so at each step rather than at episode end. The TD(0) update for state value is:
$$V^{\pi}(s_t) \leftarrow V^{\pi}(s_t) + \alpha \left[ r_t + \gamma V^{\pi}(s_{t+1}) - V^{\pi}(s_t) \right].$$
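The Monte Carlo and TD(0) updates differ only in the target they move the estimate toward. The sketch below shows both for a tabular value function; the state names and numeric values are hypothetical and serve only to illustrate the arithmetic.

```python
def mc_update(V, state, G, alpha):
    """Monte Carlo: move V[state] toward the full empirical return G."""
    V[state] += alpha * (G - V[state])


def td0_update(V, s, r, s_next, alpha, gamma):
    """TD(0): move V[s] toward the bootstrapped target r + gamma * V[s_next]."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error


# Hypothetical two-state table, initialised to zero.
V = {"ok": 0.0, "broken": 0.0}
err = td0_update(V, "ok", r=100.0, s_next="ok", alpha=0.1, gamma=0.9)
mc_update(V, "broken", G=-50.0, alpha=0.5)
print(V, err)
```

The TD(0) call can be issued after every transition, whereas the Monte Carlo call needs the return `G`, which is only known once the episode has ended.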
In value-based methods, the objective is to estimate the Q-function. SARSA is an on-policy algorithm that updates $Q(s, a)$ using observed transitions and the next action actually taken:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right].$$
Q-learning is an off-policy algorithm that updates $Q(s, a)$ towards the best possible next action, regardless of the behavior policy:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a \in \mathcal{A}} Q(s_{t+1}, a) - Q(s_t, a_t) \right].$$
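The on-policy versus off-policy distinction can be seen by applying both updates to the same transition. In the following sketch the states, actions, and Q-values are hypothetical; only the two update rules correspond to the formulas above.

```python
from collections import defaultdict

ACTIONS = ("wait", "repair")  # hypothetical action set


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: bootstrap from the action actually taken in s_next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy: bootstrap from the greedy action in s_next."""
    target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def fresh_table():
    """Zero-initialised table with two known values for the next state."""
    Q = defaultdict(float)
    Q[("s1", "wait")] = 1.0
    Q[("s1", "repair")] = 5.0
    return Q


Q_sarsa, Q_ql = fresh_table(), fresh_table()
# Same transition (s0, wait, r=2, s1); the behavior policy picks "wait" next.
sarsa_update(Q_sarsa, "s0", "wait", 2.0, "s1", "wait", alpha=0.5, gamma=0.9)
q_learning_update(Q_ql, "s0", "wait", 2.0, "s1", alpha=0.5, gamma=0.9)
print(Q_sarsa[("s0", "wait")], Q_ql[("s0", "wait")])
```

SARSA's estimate reflects the value of the next action the agent really took, while Q-learning's estimate is pulled toward the best available next action even though it was not selected.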
When the state–action space is large or continuous, function approximation is used. Deep Q-Networks (DQNs) use neural networks parameterized by weights $w$ to represent $Q$. The weights are updated by gradient descent on the squared error between the current estimate and the target, with the target treated as a constant:
$$\Delta w = \alpha \left[ r + \gamma \max_{a' \in \mathcal{A}} Q(s', a'; w) - Q(s, a; w) \right] \nabla_{w} Q(s, a; w).$$
DQNs improve stability with two techniques: the experience replay buffer (to remove correlations between samples and mitigate biases originating from sequential transitions) and the target network (a periodically updated copy to reduce non-stationarity in targets).
Policy gradient methods directly parameterize the policy as $\pi_{\theta}(a \mid s)$ and optimize the expected return with respect to the policy parameters $\theta$:
$$\theta^{*} = \arg\max_{\theta}\, \mathbb{E}_{\pi_{\theta}}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \right].$$
A classical approach is the REINFORCE algorithm, which estimates the policy gradient as:
$$\Delta\theta_t = \alpha\, \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t) \left( G_t - b(s_t) \right),$$
where $G_t$ is the empirical return from time $t$ and $b(s_t)$ is a baseline (often $V^{\pi}(s_t)$) to reduce variance. Advanced versions employ the advantage function $A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s)$ for further variance reduction:
$$\nabla_{\theta} J(\theta) = \mathbb{E}_{\tau}\left[ \sum_{t=0}^{T-1} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\, \hat{A}^{\pi}(s_t, a_t, w) \right].$$
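The quantities $G_t$ and $G_t - b(s_t)$ that weight the log-probability gradients can be computed with a single backward pass over an episode. The sketch below assumes a hypothetical three-step episode and a constant baseline; it illustrates the bookkeeping only and does not perform the gradient step itself.

```python
def returns_to_go(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step, working backwards."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    out.reverse()
    return out


def baseline_advantages(returns, baselines):
    """Advantage estimate G_t - b(s_t) used to weight each log-probability gradient."""
    return [G - b for G, b in zip(returns, baselines)]


# Hypothetical episode: three steps with unit reward, gamma = 0.5.
returns = returns_to_go([1.0, 1.0, 1.0], gamma=0.5)
advantages = baseline_advantages(returns, [1.0, 1.0, 1.0])
print(returns, advantages)
```

Subtracting the baseline leaves the expected gradient unchanged but shrinks the magnitude of the weights, which is the variance-reduction effect described above.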
A major innovation in policy gradient methods is the class of Proximal Policy Optimization (PPO) algorithms [25], which restrict the policy update at each iteration to avoid large, destabilizing steps. The PPO objective is:
$$L^{\mathrm{CLIP}}_{\theta_k}(\theta) = \mathbb{E}_{\tau \sim \pi_{\theta_k}}\left[ \sum_{t=0}^{T} \min\left( r_t(\theta)\, \hat{A}^{\pi}_t,\ \operatorname{clip}\left( r_t(\theta), 1 - \varepsilon, 1 + \varepsilon \right) \hat{A}^{\pi}_t \right) \right],$$
where $r_t(\theta) = \frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_k}(a_t \mid s_t)}$ is the probability ratio between the new and old policies and $\varepsilon$ is a small constant to control step size. The policy is updated as
$$\theta_{k+1} = \arg\max_{\theta} L^{\mathrm{CLIP}}_{\theta_k}(\theta).$$
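To make the effect of clipping concrete, the sketch below evaluates the clipped surrogate on a handful of sampled steps. It is a plain-Python illustration, not the implementation used in the study: the log-probabilities and advantages are hypothetical inputs, and the sample mean stands in for the expectation (a rescaling of the summed form above).

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Sample estimate of the PPO clipped objective (to be maximised)."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                        # r_t(theta)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(r_t, 1-eps, 1+eps)
        total += min(ratio * adv, clipped * adv)         # pessimistic bound
    return total / len(advantages)


# Hypothetical single-step cases: ratio 1.5 or 0.5, advantage +1 or -1.
up, down = math.log(1.5), math.log(0.5)
gain_capped = clipped_surrogate([up], [0.0], [1.0])      # capped at 1 + eps
loss_uncapped = clipped_surrogate([up], [0.0], [-1.0])   # full penalty kept
print(gain_capped, loss_uncapped)
```

The `min` makes the objective a lower bound: gains from moving the ratio outside the interval are cut off, while losses are kept in full, so there is no incentive to take a large step away from the old policy.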
The introduction of stable and effective methods such as PPO has accelerated the practical adoption of reinforcement learning in maintenance scheduling, particularly for complex environments characterized by uncertainty, stochastic dynamics, and multiple interacting decision-makers. Recent research has progressively addressed challenges arising from heterogeneous equipment conditions, fluctuating resource availability, and real-time decision constraints. These methodological advances facilitate the modeling of realistic maintenance scenarios, extending RL applicability beyond simplified theoretical frameworks toward concrete industry-scale case studies. Notably, contemporary literature highlights successful applications of PPO-based frameworks and multi-agent architectures, demonstrating their effectiveness in optimizing complex predictive-maintenance tasks and in managing system-wide trade-offs among cost, availability, and reliability.
The authors of [26] propose a multi-agent deep reinforcement-learning framework that addresses predictive-maintenance scheduling as a stochastic control problem in three inseparable phases—remote monitoring, failure prediction, and task scheduling—thereby overcoming the deterministic assumptions that dominate conventional optimization models. Whereas meta-heuristics and mathematical programming rely on Monte Carlo post-processing to cope with unforeseen failures, fluctuating technician availability, and shifting priorities, an RL agent can update its policy on-line and thus react adaptively to such volatility. In the reference factory setting, M identical machines, each decomposed into component sets C m , evolve on a discrete horizon through working, breakdown, and maintenance modes; component lifetimes follow a two-parameter Weibull distribution, and technicians with heterogeneous skills restore functionality in stochastic repair times. The problem is formalized as a Markov game in which each machine-centered agent observes a partial system state vector—the global machine and technician status together with local component ages and residual service times—and selects either a technician–component pair or a deferred action, receiving shaped rewards that favor prolonged working states, timely preventive interventions, and legality of choices. Corrective, random, and periodic baseline policies serve as benchmarks, and extensive simulations with PPO, curiosity-driven exploration, and action masking, implemented in RLlib, demonstrate superior uptime and cost profiles across scenarios of three and five machines after five million learning steps on a modest two-layer neural architecture.
The authors of [27] extend the agenda to multi-component systems with economic, structural, and stochastic dependencies, a situation common in luxury hospitality where diverse subsystems—HVAC, plumbing, fire safety—degrade interactively. Their two-stage pipeline first trains neural models of degradation dynamics and aggregated maintenance costs from historical data, then embeds these surrogates in a deep-RL optimizer that minimizes life-cycle expenditure while accommodating coupled failure processes; the multi-agent variant distributes the decision burden and preserves solution quality at scale.
The authors of [28] concentrate on coastal hotels and resorts whose infrastructure deteriorates rapidly under aggressive environmental exposure. Combining deterministic regression, Markov chains, and monthly simulation, they compare strategies that either minimize total economic cost or maximize average condition under budgetary constraints, revealing the delicate trade-off between service quality and expenditure.
Finally, the authors of [29] evaluate dynamic maintenance scheduling under high uncertainty by contrasting PPO with a genetic-algorithm simheuristic and classical dispatching rules. Machines fail according to Weibull laws, technicians possess skill-dependent repair times, and an action-masked PPO agent, trained for three million steps over a 168 h horizon on a 256 × 256 actor–critic network, attains the highest cumulative uptime and lowest mean-time-to-repair, confirming the practical advantage of deep RL for real-time, constraint-aware maintenance planning.

3. Methodological Framework

3.1. Application Scenario and Objectives

This study considers a fictitious company operating in the luxury hospitality sector that manages multiple guestrooms—possibly distributed across several hotels within the same network—whose commercial value depends on uninterrupted availability and consistently high service standards.
Each room is treated as an individual, yet interacting, asset whose physical condition deteriorates with use and whose occupancy is governed by a stochastic booking process reflecting seasonal demand. For modeling purposes, and consistently with the validation campaign described later, rooms are assumed to behave independently; any cross-facility coupling (e.g., shared utilities or correlated demand shocks) is neglected. This simplification isolates the maintenance–revenue trade-off at room level and enables the learning agent to scale to large, geographically dispersed portfolios, albeit at the cost of potentially underestimating systemic effects such as cascading failures or inter-property resource-allocation delays.
The primary objective is to optimize the company's long-run economic performance by trading off three competing effects:
  • daily revenue earned when a confirmed guest occupies an available room;
  • preventive maintenance expenditure that restores reliability without upsetting future bookings;
  • corrective maintenance expenditure and reputational loss caused by unexpected breakdowns that displace guests or lower perceived quality.

3.2. Problem Formalization

The daily maintenance planning task is formulated as a finite-horizon, discrete-time MDP in which maintenance decisions are made and revised once per day. The MDP is defined by the tuple $\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, P, R, \gamma \rangle$, with decision epochs $t = 0, 1, \dots, T$ and one-day time steps. States encode, for every room, the time since the last intervention, the time to the next confirmed arrival, the operational status, and the days remaining until the maintenance deadline. Actions are binary maintenance indicators for each room, subject to resource constraints. The reward credits occupied, available rooms and penalizes preventive or corrective interventions. The objective is to maximize the expected discounted return over the planning horizon of $T = 90$ days. Full mathematical details, including transition dynamics, reward shaping, and constraint formalization, are provided in Appendix A.
Given the high-dimensional state and action spaces, stochastic environment dynamics, and operational constraints, we employ a model-free, policy gradient method for optimization, with the policy parameterized by a neural network and trained solely through simulated experience. The adoption of this approach is primarily motivated by several factors. First, the analytical intractability of the problem: the combined stochastic dynamics of room aging, maintenance, guest bookings, and resource constraints preclude the derivation of closed-form analytical solutions. Second, scalability: policy gradient methods can effectively leverage deep neural networks to manage high-dimensional state and action spaces, with sample complexity and computational requirements that scale linearly with the number of rooms. Third, operational constraints are natively addressed through action masking, which enforces domain-specific feasibility conditions at the policy output level, thereby eliminating the need for ad hoc post-processing or repair of infeasible actions. Finally, this approach offers a high degree of flexibility, as the same methodological framework can be easily adapted to alternative maintenance regimes, booking dynamics, or resource constraints, with only minor modifications to the underlying architecture.

3.3. Constraint Handling

Feasibility constraints imposed by room occupancy, maintenance resource limits, and preventive maintenance deadlines must be enforced throughout learning and inference. To ensure that the policy never samples infeasible actions, we adopt an action-masking strategy at the policy output level. Specifically, for any state–action pair where the corresponding action is infeasible (e.g., no maintenance resources available, room occupied, or not eligible for maintenance), the associated neural network logit is set to $-\infty$ before the sigmoid activation. This procedure deterministically zeros the Bernoulli probability, strictly restricting the policy’s support to the feasible action set at every time step.
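The masking mechanism can be sketched in a few lines of plain Python. The function names, logits, and feasibility flags below are hypothetical; the sketch only shows how an infinitely negative logit yields a zero Bernoulli probability, and it does not show how a joint resource budget across rooms is enforced.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function; returns 0.0 for x = -inf."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # exp(-inf) evaluates to 0.0
    return z / (1.0 + z)


def masked_probs(logits, feasible):
    """Per-room maintenance probabilities with infeasible rooms forced to zero."""
    return [sigmoid(l if ok else -math.inf) for l, ok in zip(logits, feasible)]


def sample_actions(probs, rng):
    """Draw one independent Bernoulli action per room."""
    return [1 if rng.random() < p else 0 for p in probs]


# Hypothetical three-room example: room 1 is occupied, hence infeasible.
probs = masked_probs([0.0, 2.0, -1.0], [True, False, True])
actions = sample_actions(probs, random.Random(0))
print(probs, actions)
```

Because the masked probability is exactly zero, the infeasible room can never be selected, and its log-probability term for the action 0 is exactly zero, so it contributes nothing to the policy gradient.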

3.4. Optimization Algorithm

To optimize the policy parameters, we employ Proximal Policy Optimization (PPO), a first-order, model-free actor–critic algorithm with proven stability in high-dimensional, discrete-action settings. The PPO update is based on maximizing a clipped surrogate objective, which trades off sample efficiency, robustness to large policy changes, and ease of implementation. The loss function is augmented with an entropy regularization term to encourage exploration and mitigate premature convergence to sub-optimal deterministic policies.
The neural network policy is constructed as a Multi-Layer Perceptron (MLP) to exploit the compact vectorial representation of the state and action spaces. The MLP architecture is selected to balance expressiveness with computational tractability and scalability, ensuring that both forward inference and policy optimization remain feasible even as the number of rooms increases. The rationale for adopting independent Bernoulli outputs is to allow simultaneous, per-room maintenance decisions, reflecting the decoupled nature of the underlying control problem.

4. Experimental Campaign

This section details the experimental protocol, validation strategy, and evaluation methodology adopted to assess the proposed reinforcement learning approach for the maintenance planning problem.

4.1. Environment Parameterization

The simulation environment is configured with the following key parameters (Table 1):

4.2. Experimental Setup

Experiments are conducted using a custom simulation environment implementing the mathematical model formalized in Section 3.2. The environment is developed atop the gymnasium framework and accurately emulates the daily operational dynamics, stochastic guest demand, and maintenance scheduling for a luxury-hospitality property.
  • Key technical features include the following:
    • State Representation: The agent observes a flattened vector constructed as the concatenation of $(\tau_i, o_i, s_i, \delta_i)$ for each room and the global variable $\rho_t$. The observation is implemented as a Box array with bounds specified for each variable according to environment parameters. Internal environment variables (timers, booking schedules, pending requests) are updated deterministically but remain hidden from the agent.
    • Action Representation: The action space is an N-dimensional MultiBinary, with action masking enforced at each step via an explicit action mask. Infeasible actions are masked before sampling by the agent.
    • Transition Simulation: Environment transitions (aging, breakdowns, bookings, maintenance, resource updates) are updated room by room at each time step, with stochastic events (breakdown, booking arrivals) sampled from the relevant distributions. The booking process uses a Poisson arrival rate with fixed mean.
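The daily booking arrivals described in the last item can be illustrated with a standard-library Poisson sampler (Knuth's multiplication method). The mean of 2.0 requests per day, the seed, and the function name are hypothetical choices for this sketch; the study's actual rate is among the environment parameters.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate by multiplying uniforms until below exp(-lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


# Hypothetical fixed mean of 2.0 booking requests per day over a 90-day horizon.
rng = random.Random(42)
daily_requests = [sample_poisson(2.0, rng) for _ in range(90)]
print(sum(daily_requests) / len(daily_requests))
```

Each simulated day would then append the sampled number of requests to the pending set before rooms are assigned. The method is adequate for small means such as this one, although its cost grows linearly with `lam`.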
All experiments are executed on a standard workstation equipped with an Intel i7 6700 CPU, 16 GB RAM, and an NVIDIA GTX 1060 GPU. The training pipeline is implemented in Python 3.12.0 using stable_baselines3.

4.3. Training and Evaluation

During training, agent performance is systematically monitored using periodic evaluation callbacks derived from EvalCallback in stable_baselines3. Each training session spans $1 \times 10^{6}$ timesteps, with data collected from 6 parallel environments to accelerate experience accumulation and decorrelate training data. Every 2000 timesteps, the current policy is evaluated on 5 dedicated validation environments with fixed seeds; mean and standard deviation of cumulative reward are recorded via custom callbacks. All training, validation, and model checkpointing operations are fully automated.
Training progress and policy convergence are tracked using TensorBoard, which provides real-time learning curves (reward evolution plots) for each hyperparameter configuration, facilitating rapid comparison and early detection of convergence or instability issues.
Upon completion of training, learned policies are systematically evaluated on multiple random seeds not encountered during training, in order to assess both generalization capability and robustness to stochastic environment realizations. For each hyperparameter configuration, two main analyses are conducted:
  • Learning curves: The evolution of mean and standard deviation of cumulative reward is monitored over training epochs on fixed-seed validation environments. These curves are automatically logged and visualized via TensorBoard and custom plotting utilities, enabling direct comparison across different settings and policies.
  • State heatmaps: For selected evaluation episodes and seeds, custom heatmaps are generated to visualize the temporal evolution of each room’s operational state. Each cell encodes the state (available, occupied, preventive maintenance, corrective maintenance, breakdown with or without guest) using a dedicated color map. This enables rapid, qualitative inspection of emergent maintenance strategies and identification of operational bottlenecks, such as clustered breakdowns or suboptimal intervention timing.
This integrated quantitative and qualitative evaluation protocol enables comprehensive assessment of both policy convergence dynamics and the effectiveness of the learned maintenance strategies across diverse operational scenarios.

4.4. Hyperparameter Tuning

The principal hyperparameters are systematically tuned via grid search:
  • Entropy coefficient: controls the exploration–exploitation tradeoff.
  • Learning rate: determines the optimization step size.
  • Network architecture: defines the depth and width of the policy MLP.
  • Maintenance resources: maximum number of simultaneous maintenance interventions.
Grid search is orchestrated by an automated routine, which spawns independent training jobs across combinations of hyperparameter settings.
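A grid of this kind can be enumerated with `itertools.product`. The sketch below is only an illustration: the candidate values and dictionary keys are hypothetical placeholders, not the settings explored in the study, and the training call itself is left out.

```python
from itertools import product

# Hypothetical candidate values for the four tuned quantities.
GRID = {
    "ent_coef": [0.0, 0.01],
    "learning_rate": [1e-4, 3e-4],
    "net_arch": [(64, 64), (256, 256)],
    "maintenance_resources": [1, 2],
}


def expand_grid(grid):
    """Return one configuration dict per combination of candidate values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


configs = expand_grid(GRID)
print(len(configs))  # one independent training job per configuration
```

Since the number of jobs is the product of the list lengths, adding a third value to every entry would raise the count from 16 to 81, which is the main practical limit of exhaustive grid search.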

5. Results

After completion of the training phase, learned policies were evaluated using multiple random seeds to assess both the diversity of agent behaviors and the generalization capacity of the policies.
The experiments confirm the critical role of accurate hyperparameter tuning and parallel environment structure for successful reinforcement learning in this domain. The detailed visualization of room states and associated economic performance provides deeper insight into the learned behaviors and enables rapid identification of optimal or suboptimal strategies.

5.1. Cyclic Preventive Policy—Infinite Capacity

Experiments with the number of maintenance resources equal to the number of rooms (i.e., infinite operational capacity) highlight that the agent oscillates between two distinct strategies depending on the hyperparameter regime: a strictly cyclic preventive maintenance policy and an anticipatory preventive policy.
The cyclic preventive policy arises exclusively in underfitting scenarios, characterized by insufficient exploration. This is typically induced by a low entropy coefficient, excessively small learning rate, or an overly large neural network (capacity), which compromises learning effectiveness. In these settings, the learning curve (Figure 1) consistently converges to a mean cumulative reward of approximately EUR −30,000 with high variance, due to the stochasticity of the environment.
This behavior emerges because the agent learns a suboptimal policy, which essentially triggers scheduled maintenance as soon as the preventive timer expires, regardless of the underlying risk of breakdowns. The corresponding heatmap (Figure 2) clearly shows that preventive interventions are performed only when breakdowns do not occur before the scheduled window.

5.2. Optimal Preventive Policy—Infinite Capacity

By increasing the entropy coefficient or the learning rate, the agent is enabled to better explore the environment and more accurately evaluate the impact of its actions. In these scenarios, after roughly 400,000 training steps, the policy begins to converge towards an anticipatory preventive regime. As shown in Figure 3, the mean cumulative reward stabilizes at approximately EUR +38,000, indicating significant improvement over the cyclic baseline.
The learned policy in this setting converges to an optimal preventive cycle of 8 days, corresponding to a 97.5% probability of avoiding breakdowns. This is achieved by systematically anticipating preventive interventions, particularly when guest bookings might preclude maintenance at the ideal time. The heatmap in Figure 4 highlights the more frequent and timely preventive interventions, resulting in very few breakdowns and higher overall profit.

5.3. Optimal Preventive Policy—Finite Capacity (Favorable Case)

In the presence of tight resource constraints ($\rho = 2$), the agent is forced to further adapt its strategy. Although the optimal preventive period remains approximately the same as in the infinite-capacity case, the agent must plan interventions with greater caution to respect the limited number of simultaneous maintenance operations. This leads to a higher frequency of preventive actions and an overall 13% decrease in cumulative reward (Figure 5), reflecting the cost of increased intervention frequency required to meet resource limits.
The corresponding heatmap (Figure 6) illustrates that, despite the constraints, the agent maintains high operational uptime and minimizes breakdowns, confirming effective adaptation to stricter operational conditions.

5.4. Optimal Preventive Policy—Finite Capacity (Unfavorable Case)

In the most restrictive case ($\rho = 1$), the agent is unable to maintain the optimal preventive cycle for all rooms. The learning curve (Figure 7) shows convergence to a markedly suboptimal solution, with cumulative reward remaining negative. The learned policy is predominantly reactive, as the limited resource capacity prevents the agent from acting preventively in a timely manner.
The heatmap in Figure 8 demonstrates extended periods with multiple rooms in breakdown, reflecting the inability of the agent to prevent failures across all rooms. Under these extreme constraints, the agent is compelled to intervene frequently, with a substantial portion of actions triggered only after breakdowns have already occurred. In certain episodes, the policy must even schedule maintenance on rooms that are still operational to avoid further penalties from delays on already failed rooms. The result is an overall degradation in system performance, with cascading negative effects on uptime, costs, and guest satisfaction.
These results collectively highlight the strong dependence of policy effectiveness on both hyperparameter selection and resource availability. While anticipatory preventive strategies yield substantial benefits under sufficient capacity, severe resource constraints impose structural limits on achievable performance, underscoring the need for adequate maintenance planning in real-world deployments.

6. Conclusions

This study set out to (i) design and validate an RL-driven, DT-enabled Prescriptive Maintenance module for luxury hospitality and (ii) analyze how its effectiveness varies with the number of available maintenance resources. Accordingly, it explored whether an anticipatory policy can outperform fixed-cycle baselines across different crew capacities, and whether the proposed DT–RL architecture can operate feasibly within stringent service quality constraints. The results confirm the effectiveness of RL in optimizing maintenance strategies in the luxury hospitality sector under various operational constraints, and they highlight the importance of sufficient maintenance resources and accurate hyperparameter tuning for obtaining anticipatory and cost-effective maintenance policies. The agent is able to discover effective preventive regimes when resource availability allows, but severe limitations inevitably drive the system toward suboptimal, reactive strategies.
Several avenues remain open for future research and development. The simulated environment can be further enhanced to better capture the complexity of real-world hotel operations, a gain that could be achieved by integrating the RL agent into an extended platform for managing maintenance requests. The model does not incorporate external factors such as seasonal demand variation, market price fluctuations, or unexpected events that may impact room occupancy and maintenance planning. A promising direction is the integration of predictive demand models, which would allow both booking management and maintenance scheduling to adapt dynamically to anticipated occupancy patterns.
A notable omission in the current model is the absence of booking cancellation dynamics. Introducing a cancellation probability ($p_{\text{cancel}}$) would account for the realistic possibility that confirmed reservations are withdrawn before guest arrival, which directly affects room availability and maintenance scheduling. If cancellations occur, the agent could opportunistically exploit unexpected idle periods to perform maintenance, thereby reducing the risk of breakdowns during high-occupancy periods.
At present, the learned maintenance policies rely on relatively rigid scheduling logic, primarily driven by fixed preventive cycles. Future work should investigate the introduction of dynamic postponement mechanisms, whereby maintenance actions can be rescheduled based on room condition and predicted failure risk. Such flexibility could further optimize the trade-off between preventive and corrective interventions, ultimately improving cost efficiency and room availability.
The current environment represents the condition of the room naively, relying solely on the elapsed time since the last maintenance intervention. A more sophisticated approach could incorporate health indicators that also reflect cumulative guest usage and the duration of stays. These health signals could be included in the agent’s observation space or even directly influence the reward function, enabling more granular and effective maintenance scheduling.
Integrating IoT technologies and the related data through an extended platform represents a major opportunity for future enhancements. Environmental sensors (measuring temperature, humidity, energy consumption, etc.) could provide real-time data to assess room degradation. Using protocols such as MQTT, continuous sensor streams could update the agent’s decision model online. These data could also be leveraged by anomaly detection algorithms to generate automatic maintenance alerts before failures occur, enabling a shift toward fully prescriptive and predictive maintenance.
The present simulation treats the portfolio of rooms as if they were located in a single building; in reality, luxury chains often operate multiple properties spread over a region. Extending the environment to account for travel times, crew dispatch priorities, and inter-property constraints would enable an assessment of how the RL agent scales when maintenance resources must be routed dynamically across hotels. Modeling these spatial frictions is a necessary step toward validating the approach in truly distributed scenarios and may reveal further trade-offs between preventive timing and logistical cost.
In conclusion, the results of this study highlight several actionable insights for hotel management. The adoption of reinforcement learning-based maintenance policies can significantly reduce operational costs and minimize room downtime, especially when sufficient maintenance resources are available. The analysis demonstrates that proactive, anticipatory preventive maintenance yields substantial benefits over reactive or cyclic strategies, both in terms of guest satisfaction and economic performance. The proposed framework highlights the value of integrating predictive analytics and IoT technologies into hotel operations. By leveraging real-time data and adaptive algorithms, hotel managers can transition from rigid, schedule-driven maintenance to truly data-driven, flexible asset management, ultimately enhancing service quality, competitiveness, and long-term profitability.

Author Contributions

Conceptualization, P.M. and A.G.; methodology, P.M.; software, P.C.; validation, P.M., A.G. and P.C.; data curation, P.M.; writing—original draft preparation, P.M.; writing—review and editing, A.G.; funding acquisition, A.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by ArcoNuovo Srl CUP code: B85H24002340009, Pratica code: MPT001635 and Sunsea Yellow Srl, CUP code: B95H24000970009, Pratica code: MPT001520.

Data Availability Statement

All data used in the article can be reproduced based on the data provided.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
DT: Digital Twin
IoT: Internet of Things
MDP: Markov Decision Process
MLP: Multi-Layer Perceptron
PPO: Proximal Policy Optimization
RL: Reinforcement Learning

Appendix A. MDP Formulation

Appendix A.1. State Space

The property operates $N$ guestrooms. For each room $i \in \{1, \dots, N\}$, the observable state is characterized by a four-tuple $x_i = (\tau_i, o_i, s_i, \delta_i)$, where:
  • $\tau_i \geq 0$ counts the days since its last maintenance intervention.
  • $o_i \geq 0$ is the countdown in days to the next confirmed guest arrival; $o_i = 0$ indicates that the room is occupied on day $t$.
  • $s_i \in \{0, 1, 2\}$ denotes the operational mode of the room: 0 for available, 1 for under maintenance (either preventive or corrective), and 2 for in a breakdown state.
  • $\delta_i \geq 0$ records the days remaining until the next cyclic maintenance service; $\delta_i = 0$ indicates that the maintenance service is currently due.
The global state at decision epoch $t$ is the collection of all room states, $s_t = (x_{1,t}, \dots, x_{N,t}) \in \mathcal{S}$.
In addition to the room-specific states, a critical global variable is $\rho_t$, representing the number of maintenance resources (e.g., teams or technicians) available on day $t$. The set of these resources is limited, and all resources are assumed to be interchangeable and capable of handling any type of maintenance or breakdown event. The availability of $\rho_t$ directly constrains the number of maintenance actions that can be scheduled concurrently. The agent’s observation at each time step $t$ is a flattened array concatenating the per-room variables for all $N$ rooms with the global variable $\rho_t$, resulting in a state vector $s_t = (\tau_{1,t}, o_{1,t}, s_{1,t}, \delta_{1,t}, \dots, \tau_{N,t}, o_{N,t}, s_{N,t}, \delta_{N,t}, \rho_t)$ of dimension $4N + 1$.
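The flattening described above amounts to concatenating the four per-room variables in room order and appending the resource count. The following sketch uses a plain Python list and hypothetical room values; in the actual environment the result would be a bounded numeric array.

```python
def flatten_observation(rooms, resources):
    """Concatenate (tau, o, s, delta) for each room, then append rho_t.

    rooms: list of 4-tuples, one per room; resources: available crews today.
    Returns a flat list of length 4 * N + 1.
    """
    obs = []
    for tau, o, s, delta in rooms:
        obs.extend((tau, o, s, delta))
    obs.append(resources)
    return obs


# Hypothetical two-room example: room 0 is occupied (o = 0), room 1 is in
# breakdown (s = 2) with its preventive service already due (delta = 0).
obs = flatten_observation([(3, 0, 0, 5), (10, 2, 2, 0)], resources=2)
print(obs, len(obs))
```

Keeping a fixed room order matters because the policy's $i$-th output is tied to the $i$-th block of four inputs.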
In addition, the environment internally tracks several other variables crucial for simulating its dynamics, managing maintenance tasks, and handling the booking process. These include:
  • $\Pi_{i,t} \ge 0$: a timer for room $i$ indicating the remaining duration in days until the current maintenance intervention (preventive or corrective) is completed. This timer is active when $s_{i,t} = 1$.
  • $T^{D}_{i,t} \ge 0$: a timer for room $i$ specifying the number of days remaining until the current guest checks out. This variable is relevant when the room is occupied (i.e., $o_{i,t} = 0$).
  • $\mathcal{B}_{i,t}$: an ordered set of confirmed future bookings assigned to room $i$. Each booking is typically represented by a tuple (check-in day, check-out day). The variables $o_{i,t}$ and $T^{D}_{i,t}$ can be derived from this set and the current day $t$.
  • $\mathcal{P}_t$: the set of pending booking requests that have been generated (e.g., through a daily arrival process) but have not yet been assigned to any room.

Appendix A.2. Action Space

At each time step $t$, the agent selects a binary action vector $a_t = (a_{1,t}, \dots, a_{N,t})$. This vector belongs to the action space $\mathcal{A} = \{0, 1\}^{N}$.
  • For each room $i$:
    • $a_{i,t} = 1$ signifies a proactive decision by the agent to attempt to initiate a maintenance service on room $i$ starting the next day.
      If room $i$ is currently available ($s_{i,t} = 0$) and not yet due for scheduled preventive maintenance ($\delta_{i,t} > 0$), this action represents a request for early or opportunistic preventive maintenance.
      If room $i$ is in breakdown state ($s_{i,t} = 2$), this action represents a request for corrective maintenance.
    • $a_{i,t} = 0$ signifies that the agent chooses not to initiate a new maintenance service for room $i$. The room’s state will then evolve based on other factors: it may continue its normal operation, potentially experience a stochastic breakdown, or undergo a forced maintenance if system rules dictate (e.g., a preventive maintenance becoming critically overdue, as detailed in Appendix A.3).
The actual execution of a maintenance action $a_{i,t} = 1$ is contingent upon certain feasibility conditions evaluated by the environment at time $t$.
  • Resource availability: There must be at least one maintenance resource unit available (i.e., $\rho_t > 0$). Both preventive and corrective maintenance tasks require one unit of resource for their entire duration. The total number of available resources per day is fixed.
  • Room occupancy: To initiate preventive maintenance on an available room ($s_{i,t} = 0$), the room must not be occupied by a guest on the day the maintenance would commence (i.e., $o_{i,t} > 0$). This constraint does not apply to corrective maintenance tasks, as rooms in breakdown state are already unavailable for guests.
If the agent selects $a_{i,t} = 1$ for a room $i$, but one or more of these feasibility conditions are not met, the environment automatically overrides the agent’s choice, and the action for that room is considered $a_{i,t} = 0$.
Conversely, the environment also enforces mandatory maintenance. If a room $i$’s scheduled preventive maintenance becomes due (i.e., $\delta_{i,t} = 0$), the system effectively mandates $a_{i,t} = 1$ for that room, overriding any $a_{i,t} = 0$ choice made by the agent. For such instances of cyclic preventive maintenance, room availability is guaranteed by the booking system, which updates the room’s booking schedule $\mathcal{B}_{i,t}$ to block room $i$ in anticipation by considering both the preventive maintenance due date (when $\delta_{i,t}$ reaches 0) and its duration ($T_{\text{maintenance}}$). Nevertheless, the final execution of any maintenance task remains contingent upon the availability of a maintenance resource unit ($\rho_t > 0$) at the time of allocation.
Thus, the agent’s scope regarding preventive maintenance decisions is primarily focused on whether or not to strategically anticipate an already scheduled intervention. Opting for such an early maintenance intervention then results in the rescheduling of that room’s subsequent periodic maintenance cycle.
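The override rules of this appendix can be sketched as a small filter that turns the agent's requested action into the action actually executed. This is an illustrative reading, not the authors' code: the function name `effective_actions`, the `(o, s, delta)` tuple layout, the allocation of scarce resources in room-index order, and the treatment of rooms already under maintenance are all assumptions made here.

```python
from typing import List, Tuple

AVAILABLE, MAINTENANCE, BREAKDOWN = 0, 1, 2


def effective_actions(
    requested: List[int],
    rooms: List[Tuple[int, int, int]],  # (o, s, delta) for each room
    rho: int,
) -> List[int]:
    """Apply feasibility and mandatory-maintenance rules to the requested action."""
    effective = []
    for a, (o, s, delta) in zip(requested, rooms):
        # Cyclic maintenance that is due (delta == 0) overrides a == 0.
        wants = a == 1 or (s == AVAILABLE and delta == 0)
        # Assumption: a room already under maintenance cannot start a new task.
        if s == MAINTENANCE:
            wants = False
        # Early preventive maintenance requires an unoccupied room (o > 0).
        if wants and s == AVAILABLE and delta > 0 and o == 0:
            wants = False
        # Every task needs one free resource unit.
        # Assumption: units are handed out in room-index order.
        if wants and rho > 0:
            rho -= 1
            effective.append(1)
        else:
            effective.append(0)
    return effective


rooms = [(3, AVAILABLE, 10), (0, AVAILABLE, 10), (4, AVAILABLE, 0), (1, BREAKDOWN, 5)]
print(effective_actions([1, 1, 0, 1], rooms, rho=2))
```

In this example the second room is occupied, so its request is dropped; the third room is due for cyclic maintenance, so it is scheduled although the agent chose 0; and the broken room is left waiting because both resource units are already in use.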

Appendix A.3. Transition Dynamics

The environment transitions from its current state $s_t$ to a new state $s_{t+1}$ at each time step $t$, representing the progression of one day. This transition is governed by the current state, the action $a_t$ selected by the agent (and potentially modified by system rules), and stochastic events.
  • Room Aging: for a room $i$ in available mode ($s_{i,t} = 0$) where no maintenance is initiated ($a_{i,t} = 0$):
    Its age increases by one day: $\tau_{i,t+1} \leftarrow \tau_{i,t} + 1$.
    A breakdown may occur with probability $P_{\text{breakdown}}(\tau_{i,t}) = 1 - \exp\left(-\left(\tau_{i,t} / \lambda\right)^{k}\right)$, where $k$ and $\lambda$ are the Weibull shape and scale parameters. If a breakdown occurs, the room’s state transitions to breakdown: $s_{i,t+1} \leftarrow 2$. Each breakdown is assumed to be of a single generic type, and the repair procedure does not differentiate between failure modes.
    If no breakdown occurs and the room is not imminently scheduled for maintenance ($\delta_{i,t} > 0$), it is considered available. If it is occupied on day $t$, indicated by $o_{i,t} = 0$, it earns the daily revenue $R$.
  • Initiating Preventive Maintenance: for a room $i$ in available mode ($s_{i,t} = 0$) where a preventive maintenance action is chosen ($a_{i,t} = 1$), and this action is feasible:
    The room state transitions to maintenance: $s_{i,t+1} \leftarrow 1$.
    The maintenance duration timer is set: $\Pi_{i,t+1} \leftarrow T_{\text{maintenance}}$.
    The age counter is frozen: $\tau_{i,t+1} \leftarrow \tau_{i,t}$.
    The preventive maintenance cycle timer is frozen during maintenance and will be reset upon completion: $\delta_{i,t+1} \leftarrow \delta_{i,t}$.
    Any conflicting guest bookings for the duration of the maintenance, as recorded in its booking set $\mathcal{B}_{i,t}$, are cancelled, and $\mathcal{B}_{i,t+1}$ is updated accordingly.
    The count of available maintenance resources is decremented: $\rho \leftarrow \rho - 1$.
  • Initiating Corrective Maintenance: for a room $i$ in a breakdown state ($s_{i,t} = 2$) where a corrective maintenance action is chosen ($a_{i,t} = 1$), and this action is feasible:
    The room state transitions to maintenance: $s_{i,t+1} \leftarrow 1$.
    The maintenance duration timer is set: $\Pi_{i,t+1} \leftarrow T_{\text{breakdown}}$, where $T_{\text{breakdown}} > T_{\text{maintenance}}$.
    The age counter is frozen: $\tau_{i,t+1} \leftarrow \tau_{i,t}$.
    The preventive maintenance cycle timer is frozen during maintenance and will be reset upon completion: $\delta_{i,t+1} \leftarrow \delta_{i,t}$.
    There is no need to check for conflicting bookings because a room in breakdown cannot be booked.
    The count of available maintenance resources is decremented: $\rho \leftarrow \rho - 1$.
  • Ongoing Maintenance: for a room $i$ already undergoing maintenance ($s_{i,t} = 1$):
    The remaining maintenance duration timer decrements by one day: $\Pi_{i,t+1} \leftarrow \Pi_{i,t} - 1$.
    If the timer $\Pi_{i,t+1}$ reaches 0:
      The room returns to available mode: $s_{i,t+1} \leftarrow 0$.
      Its age is reset: $\tau_{i,t+1} \leftarrow 0$.
      The preventive maintenance cycle timer $\delta_{i,t+1}$ is reset to its nominal interval, i.e., the preventive maintenance cycle length listed as $\delta_i$ (reset) in Table 1.
      The maintenance resource unit it occupied becomes available: $\rho \leftarrow \rho + 1$.
    Otherwise, the room remains in maintenance ($s_{i,t+1} \leftarrow 1$) and its age and preventive maintenance cycle timer remain frozen ($\tau_{i,t+1} \leftarrow \tau_{i,t}$, $\delta_{i,t+1} \leftarrow \delta_{i,t}$).
  • Unmanaged Breakdowns: for a room $i$ in a breakdown state ($s_{i,t} = 2$) where no corrective maintenance starts, either because the agent chose $a_{i,t} = 0$ or because the requested action is not feasible:
    The room remains in the breakdown state: $s_{i,t+1} \leftarrow 2$.
    Its age and preventive maintenance cycle timer remain frozen: $\tau_{i,t+1} \leftarrow \tau_{i,t}$, $\delta_{i,t+1} \leftarrow \delta_{i,t}$.
  • Updates at end of day: for rooms not involved in maintenance activities (i.e., $s_{i,t+1} \in \{0, 2\}$), whose timers are therefore not reset or frozen by a maintenance task, the relevant timers are decremented. Specifically:
    If room $i$ is vacant ($o_{i,t} > 0$), its vacancy duration $o_{i,t}$ decrements: $o_{i,t+1} \leftarrow o_{i,t} - 1$.
    If room $i$ is occupied ($o_{i,t} = 0$), its guest checkout timer $T^{D}_{i,t}$ (which is derived from the booking set $\mathcal{B}_{i,t}$) decrements as one day passes. Once $T^{D}_{i,t}$ reaches zero relative to the start of the next day, the guest has checked out and the room may become available. The status $o_{i,t+1}$ (and $T^{D}_{i,t+1}$, if the room is immediately reoccupied) for day $t + 1$ is determined after all end-of-day processes, based on the updated booking set $\mathcal{B}_{i,t+1}$.
    For rooms in available mode ($s_{i,t} = 0$) and not undergoing preventive maintenance initiation, the preventive maintenance cycle timer $\delta_{i,t}$ decrements: $\delta_{i,t+1} \leftarrow \max(0, \delta_{i,t} - 1)$.
  • Booking Process:
    New reservation requests arrive according to a Poisson process with intensity $\lambda_b$.
    Each request is characterized by a stochastic lead time $L$ (e.g., drawn from a geometric distribution with parameter $p_{\text{lead}}$) and a stay length $S$ (e.g., drawn from a truncated normal distribution $\mathcal{N}^{+}(\mu_s, \sigma_s^2)$).
    Pending requests $\mathcal{P}_t$ (newly arrived and previously unassigned) are assigned to available rooms ($s_{i,t+1} = 0$) based on a defined heuristic (e.g., prioritizing rooms that will be vacant longest, i.e., largest $o_{i,t+1}$, or are furthest from their next preventive maintenance, i.e., largest $\delta_{i,t+1}$). This process updates the set of confirmed future bookings $\mathcal{B}_{i,t+1}$ for the assigned rooms. Subsequently, the occupancy status $o_{i,t+1}$ and guest checkout timer $T^{D}_{i,t+1}$ are derived from this updated $\mathcal{B}_{i,t+1}$.
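The per-room part of these dynamics can be sketched as a single daily update. The snippet below is a simplified illustration under stated assumptions, not the authors' simulator: the names `Room`, `breakdown_probability`, and `step_room` are hypothetical, the action `a` is assumed to have already passed the feasibility checks of Appendix A.2, bookings and occupancy are left out, and the cycle-reset value is passed as a separate `cycle` argument (30 days in Table 1).

```python
import math
import random
from dataclasses import dataclass

AVAILABLE, MAINTENANCE, BREAKDOWN = 0, 1, 2


@dataclass
class Room:
    tau: int        # age in days since the last maintenance
    s: int          # operational mode
    delta: int      # days until the next cyclic maintenance
    timer: int = 0  # remaining maintenance days (Pi in the text)


def breakdown_probability(tau: int, k: float, lam: float) -> float:
    """Weibull form 1 - exp(-(tau / lam) ** k) used for the breakdown event."""
    return 1.0 - math.exp(-((tau / lam) ** k))


def step_room(room: Room, a: int, rng: random.Random, *, k: float, lam: float,
              t_maint: int, t_break: int, cycle: int) -> int:
    """Advance one room by one day; return the change in free resources."""
    if room.s == MAINTENANCE:
        room.timer -= 1
        if room.timer == 0:  # task finished: reset age and cycle, free the unit
            room.s, room.tau, room.delta = AVAILABLE, 0, cycle
            return +1
        return 0
    if a == 1:  # preventive start (from available) or corrective start (from breakdown)
        room.timer = t_maint if room.s == AVAILABLE else t_break
        room.s = MAINTENANCE  # tau and delta stay frozen
        return -1
    if room.s == AVAILABLE:  # aging, possible breakdown, cycle countdown
        p = breakdown_probability(room.tau, k, lam)
        room.tau += 1
        room.delta = max(0, room.delta - 1)
        if rng.random() < p:
            room.s = BREAKDOWN
    return 0  # an unmanaged breakdown leaves s, tau and delta unchanged
```

Passing the random generator explicitly keeps the stochastic breakdown event reproducible, which is convenient when comparing policies on identical failure sequences.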

Appendix A.4. Reward Function

The net reward $r_{i,t}$ for each room $i$ at time step $t$ is defined based on its state and trajectory:
$$
r_{i,t} =
\begin{cases}
R, & \text{if } s_{i,t} = 0 \text{ (available) and } o_{i,t} = 0 \text{ (occupied by guest)}, \\
-C_{\text{maintenance}}, & \text{if } s_{i,t} = 1 \text{ (maintenance) and room } i \text{ transitioned from } s_i = 0, \\
-C_{\text{breakdown}}, & \text{if } s_{i,t} = 1 \text{ (maintenance) and room } i \text{ transitioned from } s_i = 2, \\
0, & \text{otherwise.}
\end{cases}
$$
Here, $R$ is the daily revenue, $C_{\text{maintenance}}$ is the cost of preventive maintenance, and $C_{\text{breakdown}}$ is the cost of corrective maintenance, with $C_{\text{breakdown}} \gg C_{\text{maintenance}} > 0$.
Thus, the total daily hotel reward is $r_t^{\text{daily}} = \sum_{i=1}^{N} r_{i,t}$.
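A minimal sketch of this reward computation is given below. It assumes that costs enter the net reward with a negative sign and that the environment records, for each room under maintenance, the mode it came from; the names `room_reward`, `daily_reward`, and `came_from` are hypothetical, and the default amounts are the values listed in Table 1.

```python
from typing import Iterable, Tuple

AVAILABLE, MAINTENANCE, BREAKDOWN = 0, 1, 2


def room_reward(s: int, o: int, came_from: int,
                R: float, c_maint: float, c_break: float) -> float:
    """Net reward of one room for one day, following the case distinction above."""
    if s == AVAILABLE and o == 0:
        return R  # available and occupied by a guest
    if s == MAINTENANCE and came_from == AVAILABLE:
        return -c_maint  # preventive maintenance
    if s == MAINTENANCE and came_from == BREAKDOWN:
        return -c_break  # corrective maintenance
    return 0.0


def daily_reward(rooms: Iterable[Tuple[int, int, int]],
                 R: float = 150.0, c_maint: float = 150.0,
                 c_break: float = 1000.0) -> float:
    """Sum the per-room rewards; each room is given as (s, o, came_from)."""
    return sum(room_reward(s, o, prev, R, c_maint, c_break) for s, o, prev in rooms)


# One occupied room, one vacant room, one preventive and one corrective task,
# and one room waiting in breakdown.
example = [(0, 0, 0), (0, 3, 0), (1, 1, 0), (1, 1, 2), (2, 1, 2)]
total = daily_reward(example)
```

With the Table 1 values, a single day of corrective maintenance offsets the revenue of several occupied rooms, which is the trade-off the agent has to learn.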

Appendix A.5. Objective

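The objective is to find an optimal policy $\pi^{*} : \mathcal{S} \to \mathcal{A}$ that maximizes the expected total discounted sum of rewards over the finite horizon $T$:
$$
\pi^{*} = \arg\max_{\pi} \, \mathbb{E}_{\pi} \left[ \sum_{t=0}^{T} \gamma^{t} \, r_t^{\text{daily}} \right],
$$
where $\gamma \in (0, 1]$ is the economic discount factor.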
The policy must achieve this objective while respecting guest occupancy, maintenance resource limits, and regulatory maintenance schedules.
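For a single simulated episode, the quantity inside the expectation can be evaluated as sketched below; averaging it over many episodes would give a Monte Carlo estimate of the objective for a fixed policy. The function name `discounted_return` is an illustrative choice, not part of the original formulation.

```python
from typing import Sequence


def discounted_return(daily_rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * r_t for one episode, accumulated backwards."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    total = 0.0
    for r in reversed(daily_rewards):
        total = r + gamma * total
    return total


# Three days of unit reward with gamma = 0.5: 1 + 0.5 + 0.25.
value = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

With $\gamma = 1$ the function reduces to the plain sum of the daily rewards over the episode.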

Appendix A.6. Policy Parametrization

The policy $\pi_\theta$ is represented as a parametric stochastic mapping from the observed state $s \in \mathcal{S}$ to a binary action vector $a \in \mathcal{A}$. Formally, the policy outputs a vector of Bernoulli parameters:
$$
\pi_\theta(s) = \sigma(f_\theta(s)) \in \; ]0, 1[^{N}
$$
where $f_\theta(\cdot)$ denotes a fully-connected neural network with ReLU activations and $\sigma(\cdot)$ is the element-wise logistic sigmoid. Each component $p_i$ corresponds to the probability of selecting a maintenance action for room $i$, with the final action $a_i \sim \text{Bernoulli}(p_i)$ sampled independently for each room. The conditional-independence assumption factorizes the joint policy as:
$$
\pi_\theta(a \mid s) = \prod_{i=1}^{N} p_i^{a_i} (1 - p_i)^{1 - a_i}
$$
This factorization enables efficient sampling and closed-form computation of the log-probability and its gradients, which are essential for policy gradient updates.
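The sampling step and the closed-form log-probability can be illustrated with plain Python. In this sketch the list `logits` stands in for the network output $f_\theta(s)$, which is not implemented here; the function names are illustrative, and the simple `sigmoid` is only meant for moderate logit values.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic sigmoid; adequate for moderate |z|."""
    return 1.0 / (1.0 + math.exp(-z))


def sample_action(logits: Sequence[float], rng: random.Random) -> List[int]:
    """Draw a_i ~ Bernoulli(p_i) independently for each room."""
    return [1 if rng.random() < sigmoid(z) else 0 for z in logits]


def log_prob(action: Sequence[int], logits: Sequence[float]) -> float:
    """log pi(a | s) = sum_i [a_i log p_i + (1 - a_i) log(1 - p_i)]."""
    total = 0.0
    for a, z in zip(action, logits):
        p = sigmoid(z)
        total += a * math.log(p) + (1 - a) * math.log(1.0 - p)
    return total


logits = [0.3, -1.2, 2.0]  # placeholder for f_theta(s) with N = 3 rooms
action = sample_action(logits, random.Random(1))
lp = log_prob(action, logits)
```

Because the joint probability is a product over rooms, the log-probability is a sum of $N$ independent terms, so its cost grows linearly with the number of rooms even though the action space has $2^N$ elements.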

References

  1. Zeqiri, A. From Traditional to Digital: The Evolution of Business Models in Hospitality Through Platforms. Platforms 2024, 2, 221–233. [Google Scholar] [CrossRef]
  2. Ceylan, E.N.; Tülbentçi, T. Facility operation and maintenance management model for small and medium-sized hotels in Turkey. Int. J. Adv. Appl. Sci. 2020, 7, 1–18. [Google Scholar] [CrossRef]
  3. Gajić, T.; Petrović, M.D.; Pešić, A.M.; Conić, M.; Gligorijević, N. Innovative Approaches in Hotel Management: Integrating Artificial Intelligence (AI) and the Internet of Things (IoT) to Enhance Operational Efficiency and Sustainability. Sustainability 2024, 16, 7279. [Google Scholar] [CrossRef]
  4. Turnšek, M.; Radivojević, V. Platformization in Tourism: Typology of Business Models, Evolution of Market Concentration and European Regulation Responses. Platforms 2025, 3, 1. [Google Scholar] [CrossRef]
  5. Beverungen, D.; Kundisch, D.; Wünderlich, N.V. Transforming into a platform provider: Strategic options for industrial smart service providers. J. Serv. Manag. 2020, 32, 507–532. [Google Scholar] [CrossRef]
  6. Osadare, O.O.; Akande, O.N.; Soladoye, A.A.; Sobowale, P.O. Smart Hospitality: Leveraging Technological Advances to Enhance Customer Satisfaction. FUOYE J. Eng. Technol. 2024, 9, 553–557. [Google Scholar] [CrossRef]
  7. Müller, J.M.; Buliga, O.; Voigt, K. Fortune favors the prepared: How SMEs approach business model innovations in Industry 4.0. Technol. Forecast. Soc. Change 2018, 132, 2–17. [Google Scholar] [CrossRef]
  8. Frank, A.G.; De Sousa Mendes, G.H.; Ayala, N.F.; Ghezzi, A. Servitization and Industry 4.0 convergence in the digital transformation of product firms: A business model innovation perspective. Technol. Forecast. Soc. Change 2019, 141, 341–351. [Google Scholar] [CrossRef]
  9. Cozzolino, A.; Corbo, L.; Aversa, P. Digital platform-based ecosystems: The evolution of collaboration and competition between incumbent producers and entrant platforms. J. Bus. Res. 2021, 126, 385–400. [Google Scholar] [CrossRef]
  10. Madanguli, A.; Parida, V.; Sjödin, D.; Oghazi, P. Literature review on industrial digital platforms: A business model perspective and suggestions for future research. Technol. Forecast. Soc. Change 2023, 194, 122606. [Google Scholar] [CrossRef]
  11. Ancillai, C.; Sabatini, A.; Gatti, M.; Perna, A. Digital technology and business model innovation: A systematic literature review and future research agenda. Technol. Forecast. Soc. Change 2023, 188, 122307. [Google Scholar] [CrossRef]
  12. Ku, E.C.S.; Chen, C.D. Artificial intelligence innovation of tourism businesses: From satisfied tourists to continued service usage intention. Int. J. Inf. Manag. 2024, 76, 102757. [Google Scholar] [CrossRef]
  13. Cranmer, E.E.; Urquhart, C.; tom Dieck, M.C.; Jung, T. Developing augmented reality business models for SMEs in tourism. Inf. Manag. 2021, 58, 103551. [Google Scholar] [CrossRef]
  14. Ben Youssef, A.; Redžepagić, S.; Zeqiri, A. The key changes to the hospitality business model under COVID-19. Strateg. Manag. 2022, 27, 55–64. [Google Scholar]
  15. Breier, M.; Kallmuenzer, A.; Clauß, T.; Gast, J.; Kraus, S.; Tiberius, V. The role of business model innovation in the hospitality industry during the COVID-19 crisis. Int. J. Hosp. Manag. 2021, 92, 102723. [Google Scholar] [CrossRef]
  16. Zahidi, F.; Kaluvilla, B.B.; Mulla, T. Embracing the new era: Artificial intelligence and its multifaceted impact on the hospitality industry. J. Open Innov. Technol. Mark. Complex. 2024, 10, 100390. [Google Scholar] [CrossRef]
  17. Huang, A.; Chao, Y.; de la Mora Velasco, E.; Bilgihan, A.; Wei, W. When artificial intelligence meets the hospitality and tourism industry: An assessment framework to inform theory and management. J. Hosp. Tour. Insights 2021, 5, 228–248. [Google Scholar] [CrossRef]
  18. Florido-Benítez, L. The Use of Digital Twins to Address Smart Tourist Destinations’ Future Challenges. Platforms 2024, 2, 234–254. [Google Scholar] [CrossRef]
  19. Ghazi, K.M. Hotel maintenance management practices. J. Hotel Bus. Manag. 2016, 5, 136. [Google Scholar] [CrossRef]
  20. Chan, K.T.; Lee, R.H.K.; Burnett, J. Maintenance performance: A case study of hospitality engineering systems. Facilities 2001, 19, 494–504. [Google Scholar] [CrossRef]
  21. Longart, P. Understanding hotel maintenance management. J. Qual. Assur. Hosp. Tour. 2019, 21, 267–296. [Google Scholar] [CrossRef]
  22. Ihsan, B.; Alshibani, A. Factors affecting operation and maintenance cost of hotels. Prop. Manag. 2018, 36, 296–313. [Google Scholar] [CrossRef]
  23. Lai, J.H.K. An analysis of maintenance demand, manpower, and performance of hotel engineering facilities. J. Hosp. Tour. Res. 2013, 37, 426–444. [Google Scholar] [CrossRef]
  24. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2020. [Google Scholar]
  25. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar] [CrossRef]
  26. Ruiz Rodríguez, M.L.; Kubler, S.; de Giorgio, A.; Cordy, M.; Robert, J.; Le Traon, Y. Multi-Agent Deep Reinforcement Learning-Based Predictive Maintenance on Parallel Machines. Robot. Comput.-Integr. Manuf. 2022, 78, 102406. [Google Scholar] [CrossRef]
  27. Nguyen, V.T.; Do, P.; Voisin, A. Artificial-Intelligence-Based Maintenance Scheduling for Complex Systems with Multiple Dependencies. PHM Soc. Eur. Conf. 2022, 7, 586–589. [Google Scholar] [CrossRef]
  28. Ghaly, A.; Amin, M.; Tedla, T.; Hosny, O.; Elbehairy, H. Coastal Hotels and Resorts: Infrastructure Asset Management System Model. In Proceedings of the Canadian Society of Civil Engineering Annual Conference 2022 (CSCE 2022), Whistler, BC, Canada, 25–28 May 2022; Gupta, R., Sun, M., Brzev, S., Alam, M.S., Wai Ng, K.T., Li, J., El Damatty, A., Lim, C., Eds.; Lecture Notes in Civil Engineering. Springer: Cham, Switzerland, 2023; Volume 363. [Google Scholar] [CrossRef]
  29. Ruiz-Rodríguez, M.L.; Kubler, S.; Robert, J.; Le Traon, Y. Dynamic Maintenance Scheduling Approach under Uncertainty: Comparison between Reinforcement Learning, Genetic-Algorithm Simheuristic and Dispatching Rules. Expert Syst. Appl. 2024, 248, 123404. [Google Scholar] [CrossRef]
Figure 1. Learning curve (mean ± standard deviation) under infinite capacity, showing convergence to a suboptimal solution (cyclic preventive policy, underfitting regime).
Figure 2. Room state heatmap in the cyclic preventive regime (infinite capacity). Rows: rooms; columns: days. Colors indicate operational state, as detailed in the legend.
Figure 3. Learning curve for the anticipatory preventive policy (infinite capacity). Substantial reward improvement is observed as the policy converges.
Figure 4. Room state heatmap for the anticipatory preventive regime (infinite capacity). Preventive interventions are performed proactively to maximize operational uptime.
Figure 5. Learning curve for the anticipatory preventive policy with finite resource ($\rho = 2$). A reduction in reward is observed due to resource constraints.
Figure 6. Room state heatmap under finite resource constraints ($\rho = 2$). The agent increases preventive frequency to maintain performance.
Figure 7. Learning curve for the resource-constrained case ($\rho = 1$). The policy converges, but performance is severely limited by resource scarcity.
Figure 8. Room state heatmap for the most restrictive scenario ($\rho = 1$). Long periods of breakdown are frequent, and preventive actions are mostly replaced by corrective maintenance.
Table 1. Summary of environment parameters used in the experiments.

| Parameter | Description | Value |
| --- | --- | --- |
| $N$ | Number of rooms | 10 |
| $T$ | Episode duration | 90 days |
| $T_{\text{maintenance}}$ | Preventive maintenance duration | 1 day |
| $T_{\text{breakdown}}$ | Corrective maintenance duration | 4 days |
| $\delta_i$ (reset) | Preventive maintenance cycle | 30 days |
| $R$ | Daily revenue per occupied room | EUR 150 |
| $C_{\text{maintenance}}$ | Preventive maintenance cost | EUR 150 |
| $C_{\text{breakdown}}$ | Corrective maintenance cost | EUR 1000 |
| $p_{\text{lead}}$ | Probability for booking lead time (geometric) | 0.3 |
| $\lambda_b$ | Booking rate (Poisson mean) | 3 per day |
| $\mu_s$ | Average stay duration (Poisson mean) | 3 days |
| $k$, $\lambda$ | Weibull shape and scale (failure time) | $k = 2.0$, $\lambda = 50.0$ |
| $\rho$ | Number of maintenance resources | from 1 to 10 |
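
As a worked illustration of the Weibull parameters in Table 1, the values below substitute $k = 2.0$ and $\lambda = 50.0$ into the breakdown probability of Appendix A.3, assuming the Weibull form $1 - \exp(-(\tau/\lambda)^{k})$. The ages of 10, 30, and 50 days are chosen here for illustration only and are not results from the experiments.

```latex
P_{\text{breakdown}}(\tau) = 1 - \exp\left(-\left(\tau / 50\right)^{2}\right)

P_{\text{breakdown}}(10) = 1 - e^{-0.04} \approx 0.039
P_{\text{breakdown}}(30) = 1 - e^{-0.36} \approx 0.302
P_{\text{breakdown}}(50) = 1 - e^{-1}    \approx 0.632
```

Under this form, a room that reaches the 30-day preventive cycle without intervention has a breakdown probability of roughly 0.30 on that day.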
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Grieco, A.; Caricato, P.; Margiotta, P. From Key Role to Core Infrastructure: Platforms as AI Enablers in Hospitality Management. Platforms 2025, 3, 16. https://doi.org/10.3390/platforms3030016
