1. Introduction
Large thermal-energy systems are increasingly managed as data-rich engineered processes in which sensing, prediction, optimization, and supervisory control are tightly coupled. This tendency is especially visible in intelligent buildings, district-scale energy infrastructure, and industrial energy systems, where digital data streams are used not only for retrospective diagnostics but also for active decision support and closed-loop operation [1,2,3,4,5]. The challenge is no longer simply to predict load or design a controller in isolation. It is to construct a predictive representation of system dynamics that remains informative under admissible future control trajectories and disturbance regimes that differ from the historical archive.
This challenge is particularly acute for thermal systems. Compared with purely electrical processes, thermal-energy objects combine inertia, delayed responses, strong exogenous forcing, intermittent operating modes, and hard technological constraints. A predictive model for this setting must remain accurate on multi-step horizons, numerically tractable under repeated optimizer calls, responsive to candidate control inputs, and resistant to implausible responses that can degrade downstream control quality. Studies on energy forecasting in buildings and engineering systems show that exogenous variables and multi-horizon sequence structure improve forecast quality, but they also show that lower archival error does not automatically translate into better operational usefulness [6,7,8,9,10,11,12].
The practical relevance of this issue becomes even clearer in systems in which energy use is coupled with a quality-of-service variable. In such cases, the operator must not only reduce consumption but also preserve a technological target. For indoor ice rinks, this target is ice temperature. The refrigeration plant, the glycol circuit, and the slab form a thermally inert nonlinear object whose performance depends on resurfacing events, indoor microclimate, outdoor conditions, and equipment limitations. Prior work on ice-rink energy use shows that the physics of the slab and the refrigeration load are strongly coupled and that both annual energy requirements and short-term load profiles are sensitive to operating mode, facility configuration, and environmental conditions [13,14,15,16]. From the control viewpoint, this makes the ice rink a representative case of a wider class of constrained thermal-energy systems rather than an isolated niche application.
At the same time, the existing literature suggests that the predictive and control layers are still often developed with partially different objectives. On the prediction side, the state of the art has moved toward deep sequence models, transformer architectures, and hybrid feature pipelines for multi-building or multi-zone energy forecasting [1,10,11,12,17,18]. On the control side, model predictive control has become one of the dominant formulations for energy-efficient operation of thermal systems, buildings, and refrigeration equipment because it naturally accommodates constraints, disturbances, and multi-criteria objectives [19,20,21,22,23]. Yet many practical studies still connect the two stages in a loose manner: a model is chosen primarily on the basis of forecast metrics, and only afterward is it inserted into a predictive controller. Such a workflow leaves unresolved whether the selected model is structurally appropriate for counterfactual control evaluation.
The gap becomes more evident when considering physically informed modeling. Recent studies on control-oriented thermal modeling emphasize that hybrid and gray-box approaches can improve robustness under noisy measurements and better align a learned model with the dominant physical mechanisms of the object [24,25,26,27]. That matters because, inside nonlinear model predictive control (NMPC), an optimizer will exploit whatever structure the predictive model makes available, including patterns that may be statistically valid on historical data but physically inconsistent under unseen control trajectories. In practice, a weak physical prior may be more useful than an excessively flexible black-box model if it yields smoother and more structurally plausible horizon trajectories inside the controller.
Another active line of research moves from isolated algorithms to executable decision workflows. In the digital-twin and intelligent infrastructure literature, the value of a model is increasingly tied to its role inside an operational computation loop rather than to standalone descriptive power [4,5,28,29,30,31,32,33]. The same shift appears in benchmarking and software-in-the-loop studies, which emphasize reproducibility, runtime feasibility, and the consistent alignment of data preparation, model training, controller evaluation, and diagnostics [34,35]. For work at the intersection of forecasting, optimization, and energy informatics, this computational layer is part of the scientific argument, not an auxiliary engineering detail.
Against this background, the paper examines whether weak physical regularization of a control-oriented state model improves downstream nonlinear model predictive control for an indoor ice-rink refrigeration system. The study develops a controlled multi-step predictor with a normalized thermal-balance residual and evaluates it against both historical operation and an NMPC variant driven by the non-regularized base model in a surrogate-based receding-horizon closed-loop benchmark. This benchmark is informative for model selection, but it does not establish realized field savings.
The paper makes three main contributions. First, it introduces a physically regularized controlled state model in which historical state, control, and exogenous variables are encoded jointly with an admissible future control trajectory, while a normalized aggregated thermal-balance residual is included in the training objective. Second, it formulates a nonlinear model predictive control method around that model with explicit penalties on energy use, terminal ice-temperature deviation, and constraint violations. Third, it implements the full workflow as a reproducible software pipeline with explicit runtime assessment under controlled data access, which is essential for demonstrating operational feasibility in data-centric energy control studies [2,4,5,30,35].
The remainder of the paper is structured as follows.
Section 2 reviews the literature most closely connected to the proposed work and formulates the research gap.
Section 3 describes the data protocol, the controlled state model, the physical regularization term, and the nonlinear model predictive control formulation.
Section 4 reports the forecasting, control, and runtime results.
Section 5 discusses the implications of the findings and their limitations.
Section 6 concludes the paper.
3. Materials and Methods
3.1. System, Variables, and Data Protocol
The study considers a minute-resolution operational archive of an industrial ice-rink refrigeration system. The raw archive contains 306,364 observations collected from 1 September 2024 to 1 April 2025. Because compressor power exhibited an abnormal downward drift in March 2025, only the interval up to 28 February 2025 23:59:59 was retained for model development and control evaluation. The resulting dataset contains 260,285 one-minute observations. The split is chronological: 70% for training, 15% for validation, and 15% for testing.
The data protocol follows the intended receding-horizon use of the model. The chronological split was preserved without random reshuffling, and the March 2025 interval was excluded rather than down-weighted because the compressor-power drift would distort the relation between refrigeration state and the energy term in the control objective. The final cutoff, therefore, keeps model training, hyperparameter selection, and control evaluation on a thermodynamically coherent archive.
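The cutoff and chronological split can be sketched as follows. This is a minimal illustration, assuming that split boundaries are obtained by flooring the 70% and 15% fractions (the rounding rule is not stated in the text); the helper names are hypothetical.

```python
from datetime import datetime

# Archive cutoff stated in the text: March 2025 is excluded.
CUTOFF = datetime(2025, 2, 28, 23, 59, 59)

def apply_cutoff(timestamps):
    """Keep only observations at or before the cutoff."""
    return [ts for ts in timestamps if ts <= CUTOFF]

def chronological_split(n, train=0.70, val=0.15):
    """Index ranges (start, stop) for a chronological train/val/test split.

    No shuffling: each later range lies strictly after the earlier ones.
    Flooring of the boundaries is an assumption of this sketch.
    """
    n_train = int(n * train)
    n_val = int(n * val)
    return (0, n_train), (n_train, n_train + n_val), (n_train + n_val, n)
```

With the 260,285 retained observations, this rule gives 182,199 training, 39,042 validation, and 39,044 test points; a different rounding convention would shift the boundaries by at most one sample.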
The modeled state vector is
$$x_t = \left[T_{\mathrm{ice},t},\; T_{\mathrm{r},t},\; T_{\mathrm{s},t},\; P_{\mathrm{c},t}\right]^{\top},$$
where $T_{\mathrm{ice},t}$ is the ice temperature, $T_{\mathrm{r},t}$ is the return-glycol temperature, $T_{\mathrm{s},t}$ is the supply-glycol temperature, and $P_{\mathrm{c},t}$ is compressor power. The manipulated variable $u_t$ is the admissible return-glycol setpoint. The exogenous vector $d_t$ contains measured indoor and outdoor conditions and engineered features derived from them: indoor and outdoor temperature, indoor and outdoor humidity, motion and illumination indicators, water temperature, heating power, ventilation power, calendar harmonics, short-term derivatives, and moving averages. After feature engineering, the history encoder receives 31 input features per minute. The full feature list is reported in Appendix A.
The chosen state vector is the smallest set of coordinates that still links energy use, slab thermal response, and operator action. Ice temperature is the controlled technological variable. Return- and supply-glycol temperatures summarize the dominant thermal interaction between the slab and the refrigeration circuit. Compressor power serves both as a state coordinate and as a proxy for the energetic consequence of a given control action. The manipulated variable is kept separate because the task is not simple trajectory continuation, but prediction of the future state under admissible setpoint changes. The exogenous variables account for thermal disturbances and occupancy-related effects that are not directly governed by the supervisory controller but still alter heat gains and operating regime.
The historical window length is L = 180 min and the prediction horizon is H = 30 min. The control update period is 5 min; therefore, the control trajectory on the horizon is represented by six piecewise-constant control blocks.
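The relation between the six control blocks and the 30 min horizon can be illustrated with a short sketch. It assumes that each block value is simply held for five consecutive minutes; the function names are hypothetical. The second helper shows the corresponding chain rule: a gradient computed per minute (for example by automatic differentiation through the predictor) is summed within each block to obtain the gradient with respect to the six decision variables.

```python
import numpy as np

H = 30                  # prediction horizon, min
BLOCK = 5               # control update period, min
N_BLOCKS = H // BLOCK   # six piecewise-constant blocks

def expand_blocks(v):
    """Map six blockwise setpoints to a per-minute control trajectory of length H."""
    v = np.asarray(v, dtype=float)
    if v.shape != (N_BLOCKS,):
        raise ValueError(f"expected {N_BLOCKS} blocks, got shape {v.shape}")
    return np.repeat(v, BLOCK)

def block_gradient(g_minute):
    """Chain rule for np.repeat: sum per-minute gradients within each block."""
    g = np.asarray(g_minute, dtype=float)
    return g.reshape(N_BLOCKS, BLOCK).sum(axis=1)
```

For example, the illustrative (not measured) block sequence `[-5, -5.5, -6, -6, -5.5, -5]` expands to a 30-element trajectory whose first five minutes equal -5 °C.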
Table 1 summarizes the resulting dataset and evaluation protocol.
For the NMPC study, representative full-day panels were selected by a diversity-based procedure over daily event count, ice-temperature range, mean compressor power, and mean control level. The resulting validation panel comprised 6 January 2025, 14 January 2025, and 16 January 2025; the final test panel comprised 2 February 2025, 9 February 2025, 14 February 2025, and 21 February 2025. These panel days were fixed before the final policy comparison and without using downstream control outcomes. The focus day used for visualization was 2 February 2025 because, within the selected test panel, it combined the highest resurfacing load with high thermal variability.
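The text does not specify the diversity criterion used to pick the panel days. One common choice, assumed here purely for illustration, is greedy farthest-point (max-min) selection on z-scored daily features such as event count, ice-temperature range, mean compressor power, and mean control level. The function name and the seeding rule are hypothetical.

```python
import numpy as np

def select_diverse_days(features, k, seed_index=None):
    """Greedy max-min selection of k rows from a (days x features) array.

    Assumption: features are z-scored so that each criterion contributes
    comparably; the first day is the one farthest from the feature mean
    unless seed_index is given.
    """
    X = np.asarray(features, dtype=float)
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of days")
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant columns
    Z = (X - X.mean(axis=0)) / std
    if seed_index is None:
        seed_index = int(np.argmax(np.linalg.norm(Z, axis=1)))
    chosen = [seed_index]
    min_dist = np.linalg.norm(Z - Z[seed_index], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))       # day farthest from all chosen days
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(Z - Z[nxt], axis=1))
    return chosen
```

Because selection depends only on archive descriptors, a procedure of this kind can be fixed before any controller is run, consistent with the requirement that panel days not use downstream control outcomes.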
A 180 min historical context is long enough to capture the slow thermal drift of the slab and the delayed response of the glycol loop, while a 30 min forecast horizon remains operationally meaningful for supervisory intervention. The 5 min control step matches the intended update frequency of the upper control layer: short enough to react to disturbances such as resurfacing and regime changes, yet long enough to leave a wide computational margin for repeated nonlinear optimization.
3.2. Controlled Multi-Step State Model
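At each decision time $t$, the model receives the historical tensor of states, controls, and exogenous features
$$\mathbf{Z}_t = \left[x_{\tau},\, u_{\tau},\, d_{\tau}\right]_{\tau=t-L+1}^{t} \in \mathbb{R}^{L \times 31}$$
and an admissible future control candidate
$$\mathbf{u}_{t+1:t+H} = \left[u_{t+1}, \ldots, u_{t+H}\right]^{\top}.$$
The predictive task is to approximate the transition operator
$$\hat{\mathbf{X}}_{t+1:t+H} = f_{\theta}\!\left(\mathbf{Z}_t,\, \mathbf{u}_{t+1:t+H}\right),$$
where $\hat{\mathbf{X}}_{t+1:t+H} \in \mathbb{R}^{H \times 4}$ is the full future trajectory of all state coordinates on the horizon. This formulation is deliberately control-aware: the model is not trained as an autonomous forecaster of observed time series, but as an approximation of the future system response under candidate control actions.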
The chosen architecture is a light-conditioned transformer implemented in PyTorch. The history branch projects the 31 historical features to a latent space of dimension 32, augments them with sinusoidal positional encoding, and processes them with a one-layer transformer encoder with four attention heads and feed-forward width 64. The future control branch embeds the candidate control sequence together with a normalized time coordinate on the horizon. A feed-forward head then decodes the full 30-step trajectory of the four target variables. The architecture is compact by design: it balances explicit control conditioning, differentiability of the horizon mapping, and low inference cost under repeated optimizer calls inside the control loop.
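The following PyTorch sketch mirrors the stated dimensions (31 input features, latent size 32, one encoder layer, four heads, feed-forward width 64, and a 30 x 4 output). The text does not specify how the history and control branches are fused; as an assumption, the last encoder token is concatenated with flattened per-minute control embeddings of hypothetical width `d_ctrl=8`, the normalized time coordinate is a linear ramp from 0 to 1, and the head uses a GELU activation. Class names are hypothetical.

```python
import math
import torch
from torch import nn

class SinusoidalPE(nn.Module):
    """Fixed sinusoidal positional encoding added to the history tokens."""
    def __init__(self, d_model, max_len=512):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x):                      # x: (batch, L, d_model)
        return x + self.pe[: x.size(1)]

class ControlledTransformer(nn.Module):
    def __init__(self, n_hist=31, n_state=4, horizon=30,
                 d_model=32, n_heads=4, d_ff=64, d_ctrl=8):
        super().__init__()
        self.horizon, self.n_state = horizon, n_state
        self.hist_proj = nn.Linear(n_hist, d_model)
        self.pe = SinusoidalPE(d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=d_ff,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        # Control branch: (setpoint, normalized time) per horizon minute.
        self.ctrl_embed = nn.Linear(2, d_ctrl)
        self.head = nn.Sequential(
            nn.Linear(d_model + horizon * d_ctrl, d_ff), nn.GELU(),
            nn.Linear(d_ff, horizon * n_state),
        )

    def forward(self, hist, u_future):
        # hist: (B, L, n_hist); u_future: (B, H) candidate setpoints per minute
        z = self.encoder(self.pe(self.hist_proj(hist)))
        context = z[:, -1]                     # last-token summary (assumption)
        tau = torch.linspace(0.0, 1.0, self.horizon, device=u_future.device)
        tau = tau.expand_as(u_future)
        c = self.ctrl_embed(torch.stack([u_future, tau], dim=-1)).flatten(1)
        out = self.head(torch.cat([context, c], dim=-1))
        return out.view(-1, self.horizon, self.n_state)
```

Because the control sequence enters through differentiable layers only, gradients of any horizon-level cost with respect to the candidate setpoints are available through `backward()`, which is what a gradient-based NMPC solver requires.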
Earlier benchmarking on the same archive had already identified the compact controlled transformer as the strongest practical lightweight baseline for multi-step forecasting. The present comparison therefore focuses on whether weak physical regularization improves that baseline for downstream NMPC.
Appendix B summarizes the earlier benchmark under the same archive cutoff, chronological split, 180 min history, 30 min horizon, and evaluation KPIs.
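The forecast component of the training loss is
$$\mathcal{L}_{\mathrm{fc}} = \frac{1}{H}\sum_{h=1}^{H} w_h \left(\hat{x}_{t+h} - x_{t+h}\right)^{\top} \mathbf{W} \left(\hat{x}_{t+h} - x_{t+h}\right),$$
where $w_h$ are horizon weights increasing from 1.0 to 1.5, and the diagonal matrix $\mathbf{W}$ increases the contribution of ice temperature relative to the remaining state coordinates. The model is trained with AdamW, a learning rate of …, batch size 128, and early stopping with patience 4 over a maximum of 16 epochs.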
The loss weighting reflects the control task. The later optimization problem depends on coordinated prediction of the full state vector, with slightly greater emphasis on the ice-temperature trajectory and the end of the horizon. The diagonal weight matrix therefore emphasizes the state coordinate that appears in the terminal control penalties, while the increasing horizon weights discourage the predictor from concentrating capacity on the earliest and easiest forecast minutes.
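A NumPy sketch of the weighted multi-step loss follows. It assumes a linear ramp of horizon weights from 1.0 to 1.5 (the exact profile is not stated) and averages over samples and horizon steps. The ice-temperature entry of the diagonal state weighting is a placeholder, because its value is not given in the text.

```python
import numpy as np

def horizon_weights(H=30, w_start=1.0, w_end=1.5):
    """Horizon weights rising from w_start to w_end (linear ramp is an assumption)."""
    return np.linspace(w_start, w_end, H)

def forecast_loss(y_hat, y, w_h, state_weights):
    """Horizon- and state-weighted squared error.

    y_hat, y: arrays of shape (batch, H, 4) in scaled units.
    w_h: (H,) horizon weights; state_weights: (4,) diagonal of the state weighting.
    """
    err2 = (np.asarray(y_hat, dtype=float) - np.asarray(y, dtype=float)) ** 2  # (B, H, 4)
    per_step = err2 @ np.asarray(state_weights, dtype=float)  # (B, H): e^T W e for diagonal W
    return float(np.mean(per_step * w_h))                    # mean over batch and horizon
```

For example, with a unit error on every coordinate and hypothetical state weights `[2.0, 1, 1, 1]`, each step contributes 5.0, and the mean horizon weight of 1.25 gives a loss of 6.25.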
3.3. Physical Regularization Through a Normalized Thermal-Balance Residual
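The physical prior is introduced not through a full first-principles model of the slab and the refrigeration plant, but through an aggregated balance on the target coordinate, i.e., ice temperature. For each horizon step, the residual is written as
$$r_{t+h} = \frac{\hat{T}_{\mathrm{ice},t+h} - \hat{T}_{\mathrm{ice},t+h-1}}{\Delta t} - \kappa\left(\bar{T}_{\mathrm{g},t+h} - \hat{T}_{\mathrm{ice},t+h}\right) - q_0,$$
where $\hat{T}_{\mathrm{ice}}$ is the predicted ice temperature, $\bar{T}_{\mathrm{g}}$ is the mean of the predicted supply- and return-glycol temperatures, $\Delta t = 60$ s, $\kappa$ is an aggregated heat-exchange coefficient, and $q_0$ represents unresolved external heat gains.
To make the physical term numerically commensurate with the forecast loss, the residual is normalized by a robust scale $\sigma_r$ identified on calm night-time intervals:
$$\tilde{r}_{t+h} = \frac{r_{t+h}}{\sigma_r}.$$
The physical loss is then
$$\mathcal{L}_{\mathrm{phys}} = \frac{1}{H}\sum_{h=1}^{H} \omega_{t+h}\, \tilde{r}_{t+h}^{2},$$
with event-sensitive weights $\omega_{t+h}$ that amplify resurfacing intervals. The total training objective is
$$\mathcal{L} = \mathcal{L}_{\mathrm{fc}} + \lambda\, \mathcal{L}_{\mathrm{phys}},$$
where $\mathcal{L}_{\mathrm{fc}}$ is the forecast component of the training loss. Normalization makes the physical term dimensionless and comparable across candidate regularization strengths. The robust scale expresses the residual relative to calm-interval variability, so $\lambda$ can be interpreted as a relative weighting coefficient.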
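The residual, its normalization, and the combined objective can be sketched in NumPy. This is an illustration under stated assumptions, not the authors' code: the balance is assumed to couple the finite-difference ice-temperature rate (one-minute step, 60 s) to the difference between the mean predicted glycol temperature and the predicted ice temperature, and the parameter names `kappa`, `q0`, `sigma_r`, and `lam` are hypothetical. The event weights 1.0 and 4.0 are those reported for nominal and resurfacing-labeled steps.

```python
import numpy as np

DT_S = 60.0  # one-minute step expressed in seconds

def balance_residual(t_ice_hat, t_ret_hat, t_sup_hat, t_ice_last, kappa, q0):
    """Aggregated ice-temperature balance residual for each horizon step.

    t_*_hat: predicted trajectories of shape (H,); t_ice_last: last measured ice temperature.
    Assumed balance: dT_ice/dt = kappa * (mean glycol temperature - T_ice) + q0.
    """
    t_ice_hat = np.asarray(t_ice_hat, dtype=float)
    t_ice_prev = np.concatenate(([t_ice_last], t_ice_hat[:-1]))
    dT_dt = (t_ice_hat - t_ice_prev) / DT_S
    t_glycol = 0.5 * (np.asarray(t_ret_hat, dtype=float) + np.asarray(t_sup_hat, dtype=float))
    return dT_dt - kappa * (t_glycol - t_ice_hat) - q0

def physics_loss(residual, sigma_r, resurfacing_mask, w_nominal=1.0, w_event=4.0):
    """Event-weighted mean of the squared, robustly normalized residual."""
    omega = np.where(resurfacing_mask, w_event, w_nominal)
    return float(np.mean(omega * (np.asarray(residual) / sigma_r) ** 2))

def total_loss(l_forecast, l_phys, lam):
    """Forecast loss plus lam times the physical loss."""
    return l_forecast + lam * l_phys
```

A trajectory that satisfies the assumed balance exactly, for instance a constant -4 °C ice temperature above -6 °C glycol with `kappa = 1e-4` and `q0 = 2e-4`, yields a zero residual, so the physical term penalizes only departures from the identified structure.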
Resurfacing starts were marked automatically from the ice-temperature signal. The raw ice-temperature series was first cleaned of values outside the admissible physical range and smoothed with a 60 min exponentially weighted mean. A candidate resurfacing start was registered when the derivative of the smoothed series exceeded 0.03 °C/min. Accepted starts had to be separated by at least 30 min, and each accepted start was then dilated symmetrically by 10 min to form the binary resurfacing mask used in diagnostics and in the physical-loss weighting.
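The labeling rule can be sketched as follows. This is an illustration that assumes the 60 min exponentially weighted mean uses a smoothing factor alpha = 2 / (span + 1) with recursive updates, that a candidate start is the first minute of each upward crossing of the derivative threshold, and that the range-cleaning step has already been applied. Function names are hypothetical.

```python
import numpy as np

def ewm_mean(x, span=60):
    """Recursive exponentially weighted mean with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1.0)
    out = np.empty(len(x), dtype=float)
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1]
    return out

def resurfacing_starts(t_ice, thr=0.03, min_gap=30, span=60):
    """Accepted resurfacing starts (minute indices) from a 1 min ice-temperature series."""
    smooth = ewm_mean(np.asarray(t_ice, dtype=float), span)
    deriv = np.diff(smooth, prepend=smooth[0])   # degC/min at one-minute sampling
    above = deriv > thr
    prev = np.concatenate(([False], above[:-1]))
    starts = []
    for i in np.flatnonzero(above & ~prev):      # rising edges of the threshold test
        if not starts or i - starts[-1] >= min_gap:
            starts.append(int(i))
    return starts

def resurfacing_mask(n, starts, dilate=10):
    """Binary mask with each accepted start dilated by +/- dilate minutes."""
    mask = np.zeros(n, dtype=bool)
    for s in starts:
        mask[max(0, s - dilate): s + dilate + 1] = True
    return mask
```

With this rule, a second warming event 26 min after an accepted start is rejected by the 30 min separation, whereas one 60 min later is accepted. Each accepted start contributes 21 masked minutes (the start plus 10 min on each side).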
Calm identification segments for the physical prior were defined as 1 min points with a valid 59–61 s sampling step, time-of-day between 23:00 and 06:00, outside the dilated resurfacing mask, ice temperature within [−10, 3] °C, absolute return-glycol temperature rate not exceeding 0.05 °C/min, and absolute compressor-power rate not exceeding 5000 W/min. Nights with fewer than 120 valid points were discarded. This protocol yielded 55,519 valid points across 179 nights.
Appendix C reports the retained counts and robust coefficients.
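The calm-segment filter can be sketched as a vectorized mask. The sketch assumes that time is given as integer minutes since midnight of the first archive day, that the night window includes 23:00 and excludes 06:00, and that the rates are precomputed per minute. The function name and argument layout are hypothetical.

```python
import numpy as np

def calm_mask(t_min, step_s, resurf, t_ice, dT_ret, dP, min_points=120):
    """Calm one-minute points used to identify the thermal balance.

    t_min: integer minutes since midnight of the first archive day.
    step_s: sampling step to the previous sample, s.
    dT_ret: return-glycol temperature rate, degC/min; dP: compressor-power rate, W/min.
    Returns the final mask and the night id of every point.
    """
    t_min = np.asarray(t_min, dtype=np.int64)
    step_s, t_ice = np.asarray(step_s), np.asarray(t_ice)
    minute_of_day = t_min % 1440
    night = (minute_of_day >= 23 * 60) | (minute_of_day < 6 * 60)
    ok = (night
          & (step_s >= 59) & (step_s <= 61)
          & ~np.asarray(resurf, dtype=bool)
          & (t_ice >= -10) & (t_ice <= 3)
          & (np.abs(dT_ret) <= 0.05)
          & (np.abs(dP) <= 5000))
    # Shift by one hour so 23:00-23:59 and the following 00:00-05:59 share a night id.
    night_id = (t_min + 60) // 1440
    counts = np.bincount(night_id[ok], minlength=int(night_id.max()) + 1)
    return ok & (counts[night_id] >= min_points), night_id
```

On two synthetic, perfectly calm days, the partial first night (00:00 to 05:59, 360 points) and the full night (420 points) are kept, while the 60-point tail after 23:00 on the last day is discarded by the 120-point rule.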
The thermal-balance parameters were identified from calm night segments and then fixed during model training.
The final coefficients were taken as medians of the nightwise least-squares estimates rather than as a single global fit, which reduced sensitivity to unusually disturbed nights. In the physical loss, nominal horizon steps received weight 1.0, and resurfacing-labeled steps received weight 4.0. Exploratory validation scans over the regularization coefficient $\lambda$ showed that stronger regularization reduced the normalized balance residual but degraded forecast quality too sharply. The final weak-regularization value was therefore selected on validation data as the weakest value of $\lambda$ that materially improved physical consistency while preserving near-base forecast accuracy. The test split is used below only for final out-of-sample reporting, not for selecting $\lambda$.
The balance relation is estimated from quiet intervals where the dominant slab dynamics are easier to isolate, then used as a weak prior during training on the full operational archive, including disturbed periods. The physical term constrains the data-driven model rather than replacing it, which is why the selected regularization coefficient remains weak: larger coefficients imposed too much bias on the predictor and degraded the learned representation of disturbed operating regimes.
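Nightwise identification with median aggregation can be sketched as follows. The sketch assumes that the balance is linear in the glycol-to-ice temperature difference, so each night reduces to an ordinary least-squares fit of the ice-temperature rate on that difference plus an intercept. It also assumes, since the estimator is not named in the text, that the robust scale is the normal-consistent median absolute deviation of the pooled residuals. All names are hypothetical.

```python
import numpy as np

def fit_night(dT_dt, delta_T):
    """Least-squares fit of dT_dt = kappa * delta_T + q0 on one calm night."""
    A = np.column_stack([delta_T, np.ones_like(delta_T)])
    (kappa, q0), *_ = np.linalg.lstsq(A, dT_dt, rcond=None)
    return kappa, q0

def identify_balance(nights):
    """nights: list of (dT_dt, delta_T) array pairs, one pair per calm night.

    Returns the median kappa and q0 over nights and a robust residual scale.
    """
    fits = np.array([fit_night(y, x) for y, x in nights])
    kappa, q0 = np.median(fits, axis=0)
    resid = np.concatenate([y - kappa * x - q0 for y, x in nights])
    # Normal-consistent MAD as the robust scale (assumption of this sketch).
    sigma_r = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    return float(kappa), float(q0), float(sigma_r)
```

Taking medians over nights, rather than pooling all points into one regression, limits the influence of a few disturbed nights on the fixed coefficients. This matches the motivation given in the text.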
3.4. NMPC Formulation
The control problem is solved on a receding horizon with a piecewise-constant control parameterization. Let $\mathbf{v} = [v_1, \ldots, v_6]^{\top}$ denote the six 5 min control blocks over the 30 min prediction horizon. For a fixed model bundle, the objective $J(\mathbf{v})$ combines weighted penalties on ice-temperature tracking, terminal ice-temperature deviation, constraint violations, and energy use, all evaluated on the predicted horizon trajectory.
The ice-temperature reference and its lower and upper bounds are fixed, and the admissible setpoint is constrained to a fixed interval. Additional linear inequality constraints enforce bounds on step-to-step changes of the blockwise input. All objective weights except the energy weight are fixed. The energy weight and the admissible blockwise change are tuned on a three-day validation panel through Pareto filtering and minimum distance to the ideal point over daily energy, terminal violation share, mean absolute terminal ice-temperature deviation, and mean objective value.
The three normalization constants are fixed. Their role is purely dimensional: they keep the tracking, violation, and energy terms numerically commensurate so that the controller weights remain interpretable. Controller tuning used the validation panel comprising 6 January 2025, 14 January 2025, and 16 January 2025.
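The tuning step (Pareto filtering followed by minimum distance to the ideal point) can be sketched as follows. Each row holds one candidate setting's validation-panel criteria, all to be minimized: daily energy, terminal violation share, mean absolute terminal deviation, and mean objective value. Min-max scaling of the criteria over the Pareto set before measuring the Euclidean distance is an assumption of this sketch, as are the function names.

```python
import numpy as np

def pareto_mask(F):
    """True for non-dominated rows of F (all criteria minimized)."""
    F = np.asarray(F, dtype=float)
    keep = np.ones(len(F), dtype=bool)
    for i in range(len(F)):
        dominated_by = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        keep[i] = not dominated_by.any()
    return keep

def select_by_ideal_point(F):
    """Index of the Pareto candidate closest to the ideal point after min-max scaling."""
    F = np.asarray(F, dtype=float)
    front = np.flatnonzero(pareto_mask(F))
    P = F[front]
    lo, hi = P.min(axis=0), P.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)     # avoid division by zero
    Z = (P - lo) / span                        # ideal point maps to the origin
    return int(front[np.argmin(np.linalg.norm(Z, axis=1))])
```

The scaling keeps a criterion with large numeric values, such as daily energy, from dominating the distance; without it, the ideal-point rule would effectively reduce to energy minimization.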
The energy weight and the admissible blockwise change were tuned separately for the base model and for the physically regularized model. The optimization problem is solved with the SLSQP method using exact gradients obtained by PyTorch automatic differentiation and is warm-started from the shifted solution of the previous control step.
The computational pipeline was implemented in Python 3.12.8 using PyTorch 2.6.0 for model training and automatic differentiation, NumPy 1.26.4 and pandas 2.2.3 for data processing, SciPy 1.13.1 for the SLSQP optimization routine, and Matplotlib 3.10.0 for visualization.
The physically regularized predictor tolerated a larger admissible blockwise setpoint change and a stronger weight on the energy term without losing terminal control quality. This suggests that the added physical structure reshaped the optimization landscape rather than merely changing an auxiliary metric. Warm starting serves a complementary, purely numerical purpose: in a receding-horizon setting, the previous optimal block sequence carries useful information about the neighborhood of the next optimum and therefore improves numerical efficiency.
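A single receding-horizon solve with SciPy's SLSQP can be sketched as follows, including box bounds, linear rate-of-change constraints between consecutive blocks, and a shifted warm start. In this sketch a toy quadratic objective with a hand-written gradient stands in for the surrogate-based NMPC cost, whose gradient in the paper comes from PyTorch automatic differentiation. The bound and rate-limit values used in the example are hypothetical, and constraining only consecutive blocks (not the change from the previously applied input) is a simplifying assumption.

```python
import numpy as np
from scipy.optimize import minimize

def shift_warm_start(v_prev):
    """Shift the previous block sequence by one block and repeat its last value."""
    v_prev = np.asarray(v_prev, dtype=float)
    return np.append(v_prev[1:], v_prev[-1])

def rate_constraints(n_blocks, dv_max):
    """|v[k+1] - v[k]| <= dv_max written as two sets of linear 'ineq' constraints."""
    D = np.diff(np.eye(n_blocks), axis=0)      # (n-1, n): row k gives v[k+1] - v[k]
    return [
        {"type": "ineq", "fun": lambda v: dv_max - D @ v, "jac": lambda v: -D},
        {"type": "ineq", "fun": lambda v: dv_max + D @ v, "jac": lambda v: D},
    ]

def solve_nmpc_step(objective, grad, v_prev, u_min, u_max, dv_max):
    """One SLSQP solve over the six blocks, warm-started from the shifted solution."""
    v0 = np.clip(shift_warm_start(v_prev), u_min, u_max)
    res = minimize(objective, v0, jac=grad, method="SLSQP",
                   bounds=[(u_min, u_max)] * len(v0),
                   constraints=rate_constraints(len(v0), dv_max))
    return res.x, res
```

In a closed-loop run, only the first element of the returned block sequence would be applied for the next 5 min. The full sequence is kept so that it can be shifted into the next warm start.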
3.5. Reproducible Software Workflow and Evaluation Protocol
The study is organized as a reproducible computation pipeline linking data preparation, model training, controller tuning, policy evaluation, and runtime diagnostics.
Figure 1 summarizes the workflow. Consistent data alignment, reuse of the identified physical parameters, synchronized model artifacts, and a shared temporal protocol are all necessary for a valid comparison, so the software workflow is treated as part of the method [4,5,31,35]. The present workflow is reproducible in the computational sense under controlled access rather than fully open in the raw-data sense, because the archived measurements originate from an industrial facility and remain confidential.
The workflow also addresses a common reproducibility problem in predictive-control studies. When training, tuning, figure generation, and control evaluation are carried out in loosely connected notebooks or scripts, small mismatches in time alignment, scaling, feature definitions, or model artifacts can contaminate the comparison. Here the exact data cutoff, training outputs, controller parameters, and evaluation summaries are stored as synchronized artifacts, so the comparison between the historical strategy, NMPC with the base model, and NMPC with the physically regularized model can be regenerated without reinterpretation of intermediate processing steps.
Appendix E summarizes the controlled-access artifact inventory.
In this study, surrogate-based closed-loop comparison refers to a receding-horizon policy benchmark. It does not represent plant deployment or a free-running first-principles simulation. At each decision time, each controller is optimized with its designated predictive model, the first 5 min control block is applied, and the resulting policy is then scored under both learned evaluators. The main-text summary reports the arithmetic mean of each controller metric under the base and physically regularized evaluators to avoid a self-favoring comparison.
Appendix F reports the corresponding evaluator-specific aggregate summaries.
The benchmark uses repeated archive-context re-initialization rather than a recursively propagated full-day surrogate rollout. When the next 5 min decision time arrives, the controller is re-initialized from the newly available measured 180 min archive window rather than from surrogate-predicted states carried over from the previous step. Surrogate rollouts are confined to each local 30 min decision horizon and to evaluator-side scoring; between decision steps the history is reset from the measured archive.
No separate future exogenous forecast is supplied on the 30 min horizon. Exogenous influence enters through the measured historical encoder window, while the future trajectory is driven only by the candidate control sequence. This setup isolates the effect of the predictive model used inside NMPC under a common information set instead of assuming perfect disturbance foresight on a short horizon. The measured future archive is used only for diagnostic comparison and model-gap analysis.
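The benchmark protocol (archive re-initialization at every decision time, local surrogate rollouts, and averaging across both learned evaluators) can be sketched with plain callables. All callables and names are hypothetical placeholders: `history_at` stands for reading the measured 180 min window, each controller maps a history window to blockwise setpoints, and each evaluator maps a history window and block sequence to a predicted 30 min trajectory. Averaging each metric over decision steps is a simplification; a daily energy total, for instance, would be summed rather than averaged.

```python
from statistics import mean

def run_benchmark(decision_times, history_at, controllers, evaluators, score):
    """Receding-horizon policy benchmark with archive re-initialization (sketch).

    controllers: {name: optimize(history) -> blockwise setpoints}
    evaluators:  {name: rollout(history, blocks) -> predicted horizon trajectory}
    score(trajectory, blocks) -> {metric_name: value} for one decision step.
    """
    summary = {}
    for c_name, optimize in controllers.items():
        per_eval = {e_name: [] for e_name in evaluators}
        for t in decision_times:
            hist = history_at(t)       # measured context; no carried-over surrogate state
            blocks = optimize(hist)    # only blocks[0] would be applied for the next 5 min
            for e_name, rollout in evaluators.items():
                traj = rollout(hist, blocks)   # local 30 min surrogate rollout
                per_eval[e_name].append(score(traj, blocks))
        metrics = per_eval[next(iter(evaluators))][0].keys()
        # Mean over decision steps per evaluator, then arithmetic mean across evaluators.
        summary[c_name] = {
            m: mean(mean(step[m] for step in per_eval[e]) for e in evaluators)
            for m in metrics
        }
    return summary
```

Averaging across both evaluators, instead of scoring each controller only with its own model, prevents a controller from being rewarded for exploiting the peculiarities of the same surrogate it optimized against.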
3.6. Evaluation Metrics and Ablation Logic
The evaluation protocol addresses two separate questions: how physical regularization changes the predictive behavior of the model, and whether that change helps once the model is embedded in nonlinear model predictive control. The study therefore uses both forecast-level and control-level metrics.
At the forecast level, three indicators are emphasized. The average ice-temperature root-mean-square error (RMSE) over the horizon measures the accuracy of the main technological coordinate. The event-tail state RMSE measures prediction error during disturbed periods and in the late part of the horizon, where accuracy matters more for control than for immediate one-step continuation. The normalized thermal-balance RMSE on the horizon tail measures physical inconsistency relative to the identified balance scale. Together, these metrics describe the trade-off between statistical fit and structural coherence.
At the control level, the daily predicted energy, the terminal violation share, the mean absolute terminal ice-temperature deviation, and the mean objective value are reported on multi-day panels. The terminal metrics matter because, in the proposed NMPC formulation, the end of the horizon summarizes whether the current control sequence drives the system toward a favorable future state rather than only toward a favorable immediate response. A controller that looks good near the current time but systematically deteriorates the horizon end would be unreliable for receding-horizon use.
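The forecast- and control-level indicators can be sketched in a few lines. The tail length of 10 steps, and the reference and bound values used in the example, are hypothetical because the text does not state them. The terminal metrics are computed over the predicted terminal ice temperatures collected at successive decision times.

```python
import numpy as np

def tail_rmse(err, tail=10):
    """RMSE over the last `tail` horizon steps; err has shape (N, H) or (N, H, d)."""
    e = np.asarray(err, dtype=float)[:, -tail:]
    return float(np.sqrt(np.mean(e ** 2)))

def terminal_metrics(t_ice_terminal, t_ref, t_lo, t_hi):
    """Terminal violation share and mean absolute terminal deviation from the reference."""
    T = np.asarray(t_ice_terminal, dtype=float)
    violation_share = float(np.mean((T < t_lo) | (T > t_hi)))
    mae = float(np.mean(np.abs(T - t_ref)))
    return violation_share, mae

def relative_change_pct(new, ref):
    """Signed relative change in percent; negative values indicate a reduction."""
    return 100.0 * (new - ref) / ref
```

For example, terminal temperatures of -5, -3, -4, and -6 °C with a hypothetical reference of -4.5 °C and bounds [-5.5, -3.5] °C give a violation share of 0.5 and a mean absolute deviation of 1.0 °C.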
The model ablation follows three stages. First, exploratory regularization screening and final weak-region selection are performed on validation data rather than the final test split. Second, the selected weak-regularization candidate and the base model are reported on the test split using forecasting and physical-consistency metrics. Third, both models are embedded into their own tuned NMPC configurations and compared against historical admissible operation and against each other on the preselected four-day test panel.
5. Discussion
The results suggest that physical regularization is useful only in a weak regime. Strong regularization damages forecasting quality, whereas weak regularization reduces structural inconsistency while preserving enough flexibility to represent disturbed operating regimes. This is consistent with the broader literature on control-oriented physics-informed learning, which argues for balancing structural priors against data fidelity rather than replacing one with the other [24,25,26,27]. In the present case, the selected coefficient acts as a calibrated bias toward thermally plausible trajectories, not as an attempt to recover a full first-principles simulator.
Archival forecasting accuracy alone is an incomplete criterion for a model that will later be queried under optimization. The base model retained a slight advantage in average ice-temperature RMSE, but the physically regularized model yielded the best policy once embedded in NMPC. This outcome matches the broader argument in data-driven predictive-control research that the predictive model and the optimization layer should be evaluated jointly rather than sequentially [22,23,36,38]. The physical term appears to reshape admissible horizon trajectories in a way that makes the control objective easier to optimize. The gain over NMPC (base) is modest in absolute terms, but it is consistent across all reported control indicators and therefore relevant for model selection.
Indoor ice rinks are difficult control objects because they combine strongly disturbed periods, narrow technological requirements, and expensive refrigeration operation. The present experiments suggest that a state model informed by an aggregated heat balance is useful in this context because the ice slab behaves as an inertial buffer whose response cannot be represented adequately by a pure continuation model. Earlier physical studies of ice-rink energy use and more recent work on predictive environmental control in rinks both point in the same direction: the thermal structure of the object matters for both energy efficiency and service quality [13,14,15,16]. The contribution here is to embed that insight directly into a learned multi-step state model used for receding-horizon optimization.
Reproducibility is part of the same methodological picture. In many applied control studies, it is discussed only briefly, even though the final conclusions depend on a chain of linked steps: data cutoff, feature engineering, model training, controller tuning, and policy evaluation. The digital-twin and benchmarking literature suggests that this chain should be treated as part of the method, not as an implementation appendix [4,5,31,32,33,35]. The present workflow follows that view. The same artifact chain generates the figures, tables, and control summaries reported in the paper, which reduces the risk that the comparison is affected by silent mismatches between experimental stages.
Several limitations remain. First, the control study is counterfactual: the policies are evaluated with learned surrogate models rather than by direct online deployment on the industrial facility. This is a useful intermediate step for method development, but it is not a substitute for full field validation. Accordingly, the reported gains should be interpreted as benchmark evidence for control-oriented model selection rather than as realized operational savings. Second, the present study uses one facility and one manipulated variable, so transferability to other refrigeration plants, other slab constructions, or other supervisory degrees of freedom still has to be established. Third, the physical prior is intentionally aggregated. It captures the dominant thermal structure of the slab-glycol interaction, but it does not represent all heat-gain channels explicitly. Fourth, the short-horizon benchmark does not inject an explicit future exogenous forecast. This isolates the effect of the predictive model used inside NMPC under a common information set, but future studies should test whether explicit disturbance forecasts change the relative ranking of the predictive models. Adding richer disturbance channels, state-estimation layers, or adaptive physical coefficients may improve performance, but such extensions must preserve the computational economy that makes the present NMPC loop feasible.
These limitations point naturally toward future work. The most immediate next step is online or hardware-in-the-loop validation of the controller on the actual supervisory system. A second direction is the extension of the control input space to coordinated management of refrigeration, ventilation, or humidity-related subsystems, which would connect the present slab-focused model to the broader environmental control problem already emerging in the ice-rink literature [13]. A third direction concerns model adaptation: because the present workflow already stores synchronized training and evaluation artifacts, it provides a practical basis for future studies on periodic re-identification, seasonal adaptation, and concept-drift handling under operational constraints.
6. Conclusions
This paper proposed and evaluated a physically regularized control-oriented state model for an ice-rink refrigeration system, together with an NMPC framework and a reproducible software workflow. The model conditions the forecast on an admissible future control trajectory and adds a normalized thermal-balance residual to the training objective. The selected weak-regularization model reduced the normalized thermal-balance error on the horizon tail by 30.29% while increasing the average ice-temperature RMSE by only 1.90%.
Within the surrogate-based counterfactual NMPC evaluation, the physically regularized model outperformed both historical admissible setpoint tracking and NMPC driven by the base model on the four-day aggregate panel. Relative to historical operation, the proposed controller reduced predicted daily energy by 4.84%, terminal violation share by 17.32%, mean absolute terminal ice-temperature deviation by 18.74%, and the mean objective value by 30.82%. Relative to NMPC based on the base model, it produced additional aggregate gains across all four indicators. The implementation remained computationally lightweight, with a mean full control cycle time of 0.0311 s and a maximum of 0.1064 s.
Taken together, these results show that weak physical regularization can improve surrogate-based receding-horizon energy optimization under learned-model evaluation, even when it does not dominate every forecast metric. More broadly, the study indicates that control-oriented model selection should weigh predictive accuracy, structural consistency, and performance inside the optimization loop jointly.
For energy-intensive thermal processes, predictive models should be designed with the control problem in mind from the outset. A model trained only to reproduce archived trajectories may remain suboptimal or even misleading when used for nonlinear predictive control under counterfactual setpoint candidates. A weakly regularized control-oriented model can preserve adequate forecasting quality while supplying the optimizer with horizon trajectories that are more compatible with the dominant thermal structure of the object in a surrogate-based receding-horizon setting.
Digital, data-centric control studies benefit from being reported as executable workflows rather than as disconnected algorithmic blocks. In this paper, the data protocol, predictive model, physical prior, tuning procedure, control evaluation, and runtime diagnostics were treated as one coherent computational experiment. That perspective is especially relevant for journals in the BDCC scope because it aligns the machine-learning, optimization, and software-engineering aspects of the contribution instead of isolating them artificially.
The same combination of inertial dynamics, exogenous disturbances, tight operational limits, and energy-quality trade-offs appears in broader classes of refrigeration, cooling, and thermal-management systems. The proposed framework may therefore also be useful beyond indoor ice rinks. Field validation remains the next necessary step before the reported gains can be interpreted as realized operational savings.