1. Introduction
Accurate estimation of crop evapotranspiration (ETc) is fundamental to precision irrigation management, enabling farmers to match water supply with actual crop demand and thereby improve water-use efficiency, reduce energy costs, and sustain agricultural productivity under increasing water scarcity [1]. In tropical fruit cultivation, where irrigation scheduling is particularly sensitive to diurnal and seasonal fluctuations in atmospheric demand, reliable hourly ETc estimates are important for supporting data-driven irrigation management in drip and micro-irrigation systems [2]. Durian, Thailand’s most economically significant tropical fruit, is cultivated extensively in the Eastern region, where prolonged dry spells and erratic rainfall patterns make data-driven irrigation decisions increasingly critical for maintaining yield and fruit quality [3].
The Food and Agriculture Organization (FAO) Penman–Monteith (PM) equation remains the internationally recommended standard for computing reference evapotranspiration (ETo) [4]. Although physically well-founded, its operational application is impeded by the requirement for a complete set of meteorological inputs (net radiation, air temperature, humidity, and wind speed), which are frequently unavailable or subject to measurement gaps at the farm level, particularly in developing regions [5]. Moreover, ETc estimates rely on empirically derived crop coefficients (Kc) that must be locally calibrated and periodically updated, introducing additional uncertainty in dynamic tropical environments [6]. These limitations have motivated a growing body of research into data-driven approaches capable of learning ETc dynamics directly from meteorological observations, without explicit physical parameterization.
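For orientation, the relationship between the meteorological inputs, ETo, and ETc can be written out explicitly. The block below is a sketch that assumes the standard hourly grass-reference form of the FAO-56 PM equation followed by the single crop-coefficient step; the text does not state which variant or which Kc values were used, and the symbols are the conventional FAO-56 ones rather than notation taken from this paper.

```latex
ET_o = \frac{0.408\,\Delta\,(R_n - G) \;+\; \gamma\,\dfrac{37}{T_{hr} + 273}\,u_2\,\bigl(e^{\circ}(T_{hr}) - e_a\bigr)}
            {\Delta + \gamma\,(1 + 0.34\,u_2)},
\qquad
ET_c = K_c \, ET_o
```

Here ETo is in mm/h, R_n and G are net radiation and soil heat flux (MJ m^-2 h^-1), T_hr is mean hourly air temperature (°C), u_2 is wind speed at 2 m (m/s), e°(T_hr) and e_a are saturation and actual vapour pressure (kPa), Δ is the slope of the saturation vapour pressure curve, and γ is the psychrometric constant (both kPa/°C). Every term except Kc is driven by the four meteorological inputs listed above, which is why gaps in any one of them interrupt the calculation.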
Machine learning (ML) and deep learning (DL) methods have been widely explored as alternatives or complements to physics-based ETc models. Shallow ML approaches including support vector regression, random forests, and extreme gradient boosting have demonstrated competitive accuracy relative to PM estimates when trained on locally collected meteorological data [7, 8]. Among sequential models, Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRU) have been particularly effective for short-term forecasting, owing to their ability to capture nonlinear temporal dependencies across multi-day look-back windows [9, 10, 11, 12]. However, recurrent architectures process sequences step-by-step and may struggle to directly exploit long-range periodic patterns, such as the pronounced 24-h diurnal and 7-day weekly cycles that characterize weather in tropical orchards, because information from distant timesteps must propagate through many sequential hidden states [12, 13].
Statistical time-series models, notably the Seasonal Autoregressive Integrated Moving Average (SARIMA), have also been applied to short-term ETc and reference ET forecasting [14, 15]. SARIMA is interpretable and computationally efficient, but it is inherently univariate, relying solely on the historical ETc record and unable to assimilate concurrent meteorological drivers such as solar radiation and humidity as auxiliary features. Furthermore, SARIMA assumes linear dynamics and a fixed seasonal structure, which can inadequately represent the complex non-stationary behavior of ETc under variable cloud cover and intermittent rainfall in tropical climates [9, 14, 15].
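The linearity and fixed seasonal structure mentioned above are visible in the general form of the model. The following is the textbook SARIMA(p, d, q)(P, D, Q)_s equation in backshift notation, given as a sketch; the orders used in this study are not reproduced here, and a seasonal period of s = 24 for hourly data is an assumption.

```latex
\Phi_P(B^{s})\,\phi_p(B)\,(1 - B)^{d}\,(1 - B^{s})^{D}\, y_t
  \;=\; \Theta_Q(B^{s})\,\theta_q(B)\,\varepsilon_t ,
\qquad B\,y_t = y_{t-1}
```

Here y_t is the ETc series, φ_p and θ_q are the non-seasonal autoregressive and moving-average polynomials, Φ_P and Θ_Q are their seasonal counterparts, and ε_t is white noise. Only past values of y_t and past errors appear, so no meteorological covariate can enter without moving to a SARIMAX formulation.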
The Transformer architecture, originally introduced by Vaswani et al. for natural language processing [16], has recently demonstrated strong performance on multivariate time-series forecasting tasks in agricultural and environmental domains. Its multi-head self-attention mechanism allows the model to simultaneously relate any pair of timesteps within the input window, regardless of their temporal distance, making it well-suited to capturing both short-term diurnal transitions and longer weekly periodicities without the information-decay constraint of recurrent networks [17, 18]. Compared with LSTM-based models, Transformer encoders are also more amenable to parallelization, facilitating training on large meteorological datasets collected over multi-year periods [19, 20].
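The statement that any pair of timesteps can be related directly follows from the form of scaled dot-product attention in the original Transformer formulation. The block below restates that standard definition as a sketch; the embedding width d, the head count h, and the projection matrices are generic symbols, not values reported for this model.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
\qquad Q = X W^{Q},\; K = X W^{K},\; V = X W^{V},\; X \in \mathbb{R}^{T \times d}

\mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}
```

With T timesteps in the input window, the matrix QK^T has shape T × T, so entry (i, j) scores hour i against hour j in a single step regardless of how far apart they are. A recurrent network would need |i − j| state updates to connect the same two hours.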
Despite these promising properties, the application of Transformer models to hourly ETc estimation in tropical horticultural settings remains largely unexplored, and a direct comparison with SARIMA as a statistical baseline under identical data conditions has not been reported. Most existing studies focus on daily and sub-daily reference ET in temperate climates, using meteorological station data rather than location-specific weather API data [21, 22]. The present study addresses this gap through a site-specific proof-of-concept evaluation; given the single-site design, PM-derived target variable, single chronological holdout period, and heuristic simulation component, it is not intended as a validated operational system, and its findings should be interpreted accordingly.
To address these gaps, this study proposes a Transformer encoder model for hourly ETc estimation in a durian orchard in Chanthaburi Province, Eastern Thailand, using 36,528 hourly meteorological observations obtained from the Visual Crossing Weather API for the orchard location over four years, with ETc computed using the FAO-56 Penman–Monteith equation. The model employs a look-back window of 168 h (7 days) and stacked multi-head self-attention blocks to capture both diurnal and weekly ETc dynamics. A SARIMA model trained on the same data serves as the statistical baseline.
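The look-back design described above amounts to pairing every block of 168 consecutive hourly rows with the ETc value of the following hour. A minimal sketch of that windowing step is given below; the function name `make_windows`, the column layout, and the toy data are illustrative assumptions rather than the authors' preprocessing code.

```python
from typing import List, Sequence, Tuple

LOOKBACK = 168  # hours of history per sample, as stated in the text


def make_windows(
    rows: Sequence[Sequence[float]],
    target_col: int,
    lookback: int = LOOKBACK,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Pair each run of `lookback` consecutive rows with the next hour's target."""
    inputs: List[List[List[float]]] = []
    targets: List[float] = []
    for end in range(lookback, len(rows)):
        inputs.append([list(r) for r in rows[end - lookback:end]])  # hours end-168 .. end-1
        targets.append(rows[end][target_col])                       # hour `end`
    return inputs, targets


# Toy series: 200 hourly rows of [placeholder_feature, etc]; values are synthetic.
toy = [[float(t), 0.01 * (t % 24)] for t in range(200)]
X, y = make_windows(toy, target_col=1)
print(len(X), len(X[0]), len(X[0][0]))
```

A series of N rows yields N − 168 samples, so the first 168 hours serve only as context and never as targets.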
To the best of the authors’ knowledge, no previous study has applied a Transformer encoder model to hourly crop ETc estimation in a tropical orchard setting using meteorological data from a weather API. Existing work on daily and sub-daily reference ET in temperate climates with standard weather station data [6, 9] leaves a gap in exploratory evaluations of deep learning approaches for hourly ETc estimation in tropical fruit production systems. The principal contributions of this work are as follows:
A multivariate Transformer encoder architecture is developed and evaluated for one-step-ahead hourly FAO-56 PM-derived ETc estimation at a single tropical orchard location, using meteorological data obtained from a weather API. The results suggest that a 168-h self-attention window can reproduce the 24-h diurnal cycle and 7-day weekly pattern of ETc at this site and test period; however, these findings are specific to the single site and episode evaluated and cannot be generalised without further validation.
An exploratory and structurally unbalanced comparison between the proposed multivariate Transformer and a univariate SARIMA baseline is conducted under identical train-test partitioning. Because the Transformer receives five meteorological input features while SARIMA operates on the ETc series alone, the observed performance difference reflects a combination of architectural, input, and context-length factors and cannot be attributed to the self-attention mechanism alone. This comparison is therefore reported as an initial exploratory evaluation rather than a controlled architectural assessment.
A recursive 168-h heuristic simulation is implemented as a visually assessed feasibility demonstration under approximated meteorological inputs, in which future meteorological variables are approximated using observed values from 72 h prior. This exercise is not a validated forecasting procedure; it illustrates only that the model can maintain physically plausible ETc patterns under a simplified meteorological approximation. Its utility for irrigation planning remains to be established through field-based investigation involving independently measured ETc and real forecast weather inputs.
3. Results
3.1. Model Training
The Transformer model was trained for 24 epochs before early stopping was triggered, indicating that validation loss ceased to improve within the allotted patience of 10 epochs. The best model weights were restored from epoch 14.
The training loss decreased rapidly during the first two epochs, from approximately 0.0555 to 0.0032, and continued to decline gradually thereafter, converging at a training loss of 0.0016 and a validation loss of 0.0017 at the optimal checkpoint. No signs of severe overfitting were observed, as the training and validation losses remained closely aligned throughout the training process (Figure 2). These results confirm that the early stopping strategy with best-weight restoration was effective in obtaining a well-generalized model.
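The stopping behaviour reported above (best weights at epoch 14, training halted at epoch 24 with a patience of 10) can be reproduced with a few lines of bookkeeping. The sketch below is framework-independent; the validation curve is hypothetical and chosen only so that its minimum falls at epoch 14, and `early_stopping_run` is an illustrative name.

```python
from typing import Iterable, Tuple


def early_stopping_run(val_losses: Iterable[float], patience: int = 10) -> Tuple[int, int]:
    """Return (best_epoch, last_epoch_run), both 1-indexed."""
    best_loss = float("inf")
    best_epoch = 0
    last_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        last_epoch = epoch
        if loss < best_loss:
            # A real training loop would save a checkpoint of the weights here.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop and restore the checkpoint.
            break
    return best_epoch, last_epoch


# Hypothetical curve: improves through epoch 14, then sits slightly above the minimum.
curve = [0.0100 - 0.0005 * e for e in range(1, 14)] + [0.0017] + [0.0018] * 30
best, last = early_stopping_run(curve)
print(best, last)  # 14 24
```

The ten epochs after the minimum are spent confirming that no further improvement occurs, which is why the run length exceeds the best epoch by exactly the patience value.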
3.2. Stationarity Analysis
Prior to model comparison, the Augmented Dickey–Fuller (ADF) test was applied to the ETc time series to verify stationarity. The results are summarized in
Table 3. The original ETc series yielded an ADF statistic of
with a
p-value of
, well below the 1% significance threshold (
), providing strong evidence against the null hypothesis of a unit root. Consequently, the series is stationary in its original form, which is consistent with the SARIMA model specification using
purely for capturing autoregressive dynamics rather than for removing a stochastic trend. First-order differencing further reduced the ADF statistic to
(
), confirming robust stationarity.
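For reference, the regression underlying the ADF test can be stated as follows. This is the standard constant-only form, given as a sketch; whether a trend term was included and how the lag order k was selected are not stated here and are left as assumptions.

```latex
\Delta y_t = \alpha + \gamma\, y_{t-1} + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0:\ \gamma = 0 \ \text{(unit root)}, \quad H_1:\ \gamma < 0
```

A strongly negative test statistic for γ leads to rejection of the unit-root null hypothesis, which is the sense in which the ETc series is described as stationary.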
3.3. ETc Estimation Performance on the Test Set
Table 4 presents the ETc estimation performance of the SARIMA baseline and the proposed Transformer model evaluated on the 168-h test set across six complementary metrics. The Transformer model achieved an RMSE of 0.0308 mm/h, MAE of 0.0188 mm/h, and R² of 0.9018, substantially outperforming SARIMA (RMSE = 0.0717 mm/h, MAE = 0.0593 mm/h, R² = 0.4688). The Transformer’s RMSE was approximately 57% lower than that of SARIMA, while its R² was approximately 0.43 units higher, indicating that the model explained over 90% of the variance in the observed ETc values during the test period.
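The two headline comparisons follow directly from the tabulated values, as the short calculation below shows.

```latex
\frac{\mathrm{RMSE}_{\mathrm{SARIMA}} - \mathrm{RMSE}_{\mathrm{Transformer}}}{\mathrm{RMSE}_{\mathrm{SARIMA}}}
  = \frac{0.0717 - 0.0308}{0.0717} \approx 0.570 \;\; (57\%),
\qquad
0.9018 - 0.4688 = 0.4330
```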
It is noted at the outset that this evaluation is based on a single 168-h test episode, without multivariate statistical baselines such as SARIMAX with exogenous inputs or machine learning comparators; the absence of these baselines means that the observed performance gap cannot be attributed solely to the Transformer architecture.
The Transformer outperformed SARIMA on all reported metrics. Three design differences between the two models may contribute to this gap: the Transformer’s multivariate input design incorporating four meteorological features, its 168-h self-attention window enabling direct comparison of any two timesteps within the look-back period, and its three stacked encoder blocks. However, the current experimental design does not isolate these factors, as the Transformer differs from SARIMA simultaneously in input dimensionality, model architecture, and temporal dependency structure. It is therefore not possible to determine from the present results whether the performance gains are attributable to the self-attention architecture, the multivariate input design, or the combination of both. A comparison with a multivariate statistical baseline such as SARIMAX would be required to isolate these contributions and is identified as a direction for future research.
The irrigation-relevant metrics provide evaluation criteria that are more consistent with agronomic contexts than hourly RMSE alone. However, they remain statistical proxies rather than demonstrated irrigation outcomes. The daytime-only RMSE was 0.0414 mm/h for the Transformer versus 0.0791 mm/h for SARIMA, a 47.7% reduction. This indicates more accurate reproduction of ETc during daytime hours, a period that is agronomically relevant for irrigation scheduling, though the present study does not evaluate actual irrigation decisions or outcomes.
The mean daily peak ETc bias was −0.0180 mm/h for the Transformer and −0.0965 mm/h for SARIMA. Both models exhibited a tendency to underestimate daily peak ETc values, but the magnitude of underestimation was substantially smaller for the Transformer, indicating more accurate reproduction of maximum daily water demand.
The daily cumulative ETc MAE was 0.1599 mm/day for the Transformer versus 0.5901 mm/day for SARIMA, a 72.9% reduction, reflecting more accurate statistical proxies of total daily water demand. These metrics are more consistent with irrigation-relevant evaluation criteria than hourly RMSE alone, but do not constitute demonstrated field-scale irrigation outcomes.
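The three irrigation-relevant metrics can be computed from paired hourly series with a few lines of code. The sketch below is one plausible reading of their definitions: the 06:00 to 18:00 daytime window, the assumption that the series starts at midnight and covers whole days, and the function names are all assumptions, since the exact definitions are not restated in this section.

```python
import math
from typing import Dict, Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def irrigation_metrics(
    obs: Sequence[float],
    pred: Sequence[float],
    day_start: int = 6,   # assumed first daytime hour (inclusive)
    day_end: int = 18,    # assumed last daytime hour (exclusive)
) -> Dict[str, float]:
    """Assumes hourly values starting at 00:00 and covering whole days."""
    n_days = len(obs) // 24
    daytime = [i for i in range(n_days * 24) if day_start <= i % 24 < day_end]
    peak_bias = 0.0
    cum_abs_err = 0.0
    for d in range(n_days):
        o = obs[d * 24:(d + 1) * 24]
        p = pred[d * 24:(d + 1) * 24]
        peak_bias += max(p) - max(o)           # negative means peaks are underestimated
        cum_abs_err += abs(sum(p) - sum(o))    # error in the daily total (mm/day)
    return {
        "daytime_rmse": rmse([obs[i] for i in daytime], [pred[i] for i in daytime]),
        "mean_daily_peak_bias": peak_bias / n_days,
        "daily_cumulative_mae": cum_abs_err / n_days,
    }


# Synthetic two-day example: a half-sine daytime curve and a model that predicts 90% of it.
obs = [max(0.0, 0.3 * math.sin(math.pi * ((t % 24) - 6) / 12)) for t in range(48)]
pred = [0.9 * v for v in obs]
metrics = irrigation_metrics(obs, pred)
print({k: round(v, 4) for k, v in metrics.items()})
```

In this toy case the peak bias is negative, mirroring the underestimation of daily peaks reported for both models.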
The in-sample performance of the SARIMA model on the training set is illustrated in Figure 3. The fitted values closely follow the observed diurnal pattern over the last 500 h of the training period, achieving a training RMSE of 0.0392 mm/h and R² of 0.9539, indicating that the model captured the general seasonal structure of the ETc series during fitting. However, the considerably lower test-set performance (RMSE = 0.0717 mm/h, R² = 0.4688) suggests limited generalization to unseen data, particularly during periods of anomalous solar radiation and atypical meteorological conditions. Note that Figure 3 covers the training period (4–23 September 2025), while Figure 4 covers the held-out test period (24 September–1 October 2025); the two figures necessarily cover different time windows because they serve different analytical purposes.
Figure 4 provides a visual illustration of the qualitative difference in forecasting behavior between the two models over the selected test week. The Transformer more closely tracks the diurnal fluctuations of the observed series, whereas the SARIMA forecast exhibits a damped amplitude response and underestimates the magnitude of daytime ETc peaks during high-radiation periods. It should be noted, however, that this figure represents a single 168-h test episode and is intended to illustrate qualitative differences in model behavior rather than serve as evidence of general predictive superiority. Performance over other seasonal periods or meteorological conditions may differ, and rolling-origin evaluation across multiple non-overlapping test windows is identified as an important direction for future research.
The test period (24 September–1 October 2025) falls within Thailand’s rainy season, which is characterized by higher cloud cover, reduced solar radiation, and lower ETc peaks relative to the dry season (November–April). The representativeness of this test week with respect to the full four-year dataset, which spans both rainy and dry season conditions, has not been formally established. Model performance during dry season periods, which typically exhibit higher solar radiation and substantially higher ETc peaks, remains untested. The visual comparison in Figure 4 should therefore be interpreted as an illustration of qualitative model behavior under rainy season conditions only, not as evidence of performance across diverse seasonal contexts.
3.4. Future Forecasting
To demonstrate the model’s potential utility in a proof-of-concept setting, the trained Transformer was deployed in a recursive multi-step heuristic simulation to predict ETc for the subsequent 168 h beyond the end of the available record (Figure 5). At each step, the model predicted one hour ahead; the forecast window was then advanced by one timestep, and the predicted ETc value was inserted into the input sequence. Exogenous feature values (temperature, humidity, solar radiation, and wind speed) for future timesteps were approximated using the corresponding observations from 72 h prior (equivalent to the same hour three days previously). This replication strategy preserves the diurnal cycle structure of the meteorological inputs but does not represent real future meteorological uncertainty, as day-to-day variability in temperature, humidity, solar radiation, and wind speed is not captured. The exercise should therefore be interpreted solely as a heuristic simulation demonstrating that the model can generate physically plausible ETc patterns under a simplified meteorological approximation, and not as evidence of operational multi-step forecasting capability. The 168-h simulation exhibits a physically plausible diurnal pattern, with ETc values rising during daylight hours and approaching zero at night, consistent with the dependence of evapotranspiration on solar radiation. The peak ETc values in the forecast period are comparable in magnitude to those observed in the 168-h history window, suggesting that the 72-h feature replication strategy preserved the general diurnal structure under the approximated meteorological scenario.
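The recursive procedure described in this paragraph can be sketched as a short loop. In the block below, `predict_next` is a hypothetical stand-in for the trained network, the column order of each row is an assumption, and the toy model simply repeats the ETc value from 24 h earlier so that the example runs without any trained weights.

```python
import math
from typing import Callable, List, Sequence

Row = List[float]  # assumed column order: [temperature, humidity, solar, wind, etc]


def recursive_rollout(
    history: Sequence[Row],
    predict_next: Callable[[Sequence[Row]], float],
    horizon: int = 168,
    lookback: int = 168,
    lag: int = 72,
) -> List[float]:
    """Roll a one-step-ahead model forward, copying weather from `lag` hours earlier."""
    if len(history) < max(lookback, lag):
        raise ValueError("history must cover both the look-back window and the lag")
    rows = [list(r) for r in history]
    forecasts: List[float] = []
    for _ in range(horizon):
        etc_hat = predict_next(rows[-lookback:])  # one-step-ahead estimate
        exog = rows[-lag][:4]                     # weather from `lag` hours before the new hour
        rows.append(exog + [etc_hat])             # slide the window forward by one hour
        forecasts.append(etc_hat)
    return forecasts


def toy_model(window: Sequence[Row]) -> float:
    """Stand-in for the trained network: repeat the ETc from 24 h earlier."""
    return window[-24][4]


def toy_row(t: int) -> Row:
    solar = max(0.0, 800.0 * math.sin(math.pi * ((t % 24) - 6) / 12))
    return [28.0, 75.0, solar, 1.5, 0.0004 * solar]


history = [toy_row(t) for t in range(168)]
sim = recursive_rollout(history, toy_model)
print(len(sim), round(max(sim), 3))
```

After the first 72 steps the copied weather rows are themselves copies, so the simulated inputs repeat with a 72-h period for the remainder of the horizon. This is one concrete way to see why the exercise cannot represent day-to-day meteorological variability.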
Figure 5 presents the recursive 168-h ETc simulation generated by the trained Transformer model for the period 1–8 October 2025, immediately following the end of the available observation record. The black line represents the last 48 h of observed ETc, serving as the historical context window, while the green dashed line with markers shows the model’s predictions for the subsequent 168 h.
The simulation exhibits a physically plausible and consistent diurnal pattern throughout the entire 7-day horizon. ETc values rise sharply during daylight hours, reaching daily peaks of approximately 0.28–0.35 mm/h between late morning and early afternoon, and approach near-zero values during night-time hours, consistent with the dependence of evapotranspiration on incoming solar radiation. The amplitude and timing of predicted diurnal peaks remain stable across all seven forecast days, with no evidence of cumulative error propagation or systematic drift.
These results demonstrate that under the 72-h feature replication heuristic, the model generates physically plausible diurnal ETc patterns across the 7-day simulation horizon. However, this exercise constitutes a heuristic simulation rather than a fully rigorous operational forecast, as the exogenous meteorological inputs are approximated rather than independently predicted. The results should therefore be interpreted as a proof-of-concept demonstration of the model’s ability to maintain physically consistent ETc patterns under an approximated meteorological scenario, rather than as evidence of operational forecasting capability without future weather data.
4. Discussion
4.1. Transformer Performance Relative to SARIMA
The Transformer encoder model substantially outperformed the SARIMA baseline on the 168-h test set, achieving an RMSE of 0.0308 mm/h and R² of 0.9018 compared with RMSE = 0.0717 mm/h and R² = 0.4688 for SARIMA. These findings are consistent with the broader literature reporting that deep learning models outperform classical statistical approaches for hourly and sub-daily PM-derived ETc estimation, particularly when multiple meteorological drivers are available as inputs [6]. LSTM-based models applied to daily ETc forecasting in tropical conditions have demonstrated strong predictive accuracy relative to physics-based baselines [9, 12], while GRU-based approaches for sub-daily ET estimation have shown competitive performance under similar meteorological input configurations [10]. The Transformer encoder proposed in this study achieves competitive accuracy at hourly resolution in a tropical orchard setting, suggesting that self-attention mechanisms offer a viable alternative to recurrent architectures for this estimation task.
It should be noted that these metrics are computed over a single 168-h test period, representing one meteorological episode in late September 2025. While the results are consistent with the broader literature on deep learning versus statistical baselines for ETc estimation, they should be interpreted as site-specific and period-specific findings. A rolling-origin evaluation across multiple non-overlapping test windows spanning different seasons would provide a more robust assessment of generalization performance [39, 40].
The comparison between the multivariate Transformer and the univariate SARIMA baseline is structurally unbalanced in three respects: input dimensionality (five meteorological features versus the ETc series alone), temporal context (168-h self-attention window versus autoregressive lags), and model family (nonlinear deep learning versus linear statistical model) [41]. The observed performance gap therefore reflects a combination of these factors and cannot be attributed to the self-attention architecture alone. This is not merely a future direction but a necessary caveat for interpreting the present results: the current design demonstrates that the combined Transformer configuration outperforms a univariate statistical baseline on this dataset and test period, but does not establish which design element drives the improvement. A fair architectural comparison would require at minimum a SARIMAX model with exogenous meteorological inputs, a multivariate machine learning baseline such as XGBoost or Random Forest, and a univariate Transformer operating on the ETc series alone [6, 41].
The relatively modest generalization performance of SARIMA (R² = 0.4688 on the test set versus R² = 0.9539 on the training set) highlights the well-known limitations of univariate linear time-series models when applied to ETc estimation in variable tropical climates. The large gap between in-sample and out-of-sample performance is attributable to two structural constraints: first, SARIMA cannot assimilate concurrent meteorological covariates such as solar radiation and relative humidity, which are the primary drivers of day-to-day ETc variability; second, the fixed seasonal structure assumed by SARIMA may be inadequate to represent non-stationary ETc dynamics arising from irregular cloud cover and rainfall events common in Chanthaburi Province during the study period [14, 26]. These observations are in line with the findings of [6], which reported that multivariate machine learning models consistently outperformed univariate statistical baselines for reference ET estimation under data-sparse and climatically variable conditions.
4.2. Role of Multivariate Inputs and Long Look-Back Window
A key distinguishing feature of the proposed model is its use of five concurrent meteorological features and a 168-h (7-day) look-back window. The ETc values used in this study were computed from the same four meteorological inputs (solar radiation, temperature, relative humidity, and wind speed) using the FAO-56 Penman–Monteith equation. The Transformer encoder therefore learns to approximate this functional relationship from sequential meteorological observations, rather than forecasting an independently measured agronomic variable [42, 43].
This distinction has important implications for interpreting the high R² of 0.9018. The strong performance reflects the model’s ability to approximate a deterministic mathematical relationship between inputs and a derived target, not its ability to predict actual crop water use under variable biophysical conditions. Independently measured ETc, obtained from lysimeter or eddy-covariance systems, reflects actual crop water use including stomatal regulation, soil water limitation, and canopy boundary-layer effects not captured by the PM equation [43, 44]. Model performance against measured ETc is therefore likely to be lower than against PM-derived ETc, particularly under water stress conditions, partial canopy cover, or periods when the FAO-56 crop coefficient does not accurately represent local crop status. This is not a limitation unique to the present study; it is an inherent consequence of using a PM-derived target and must be acknowledged as a boundary of interpretation rather than deferred as future work [42].
Within this context, the inclusion of all four meteorological drivers as input features is consistent with the physical basis of the FAO-56 PM framework [1], in which ETc is determined by the joint action of radiation, temperature, vapour pressure deficit, and aerodynamic resistance. The 168-h look-back window further enables the model to capture temporal patterns in these drivers, particularly the sharp diurnal rise and fall of solar radiation that dominates hourly ETc fluctuations in tropical orchards [29, 30].
The choice of a 168-h look-back window was motivated by the need to encompass both the 24-h diurnal cycle and the 7-day weekly periodicity of ETc, which have been observed to influence irrigation demand patterns in perennial orchard crops [29, 30]. The self-attention mechanism, by directly comparing any two positions within the 168-h window, can in principle capture these periodicities without relying on the sequential state propagation required by LSTM or GRU architectures. However, this interpretation remains inferential. The present study does not include an ablation analysis, a univariate Transformer baseline, or a multivariate statistical comparison such as SARIMAX. It is therefore not possible to determine from the current results whether the performance gains are attributable to the self-attention architecture, the multivariate input design, the 168-h temporal context, or the combination of these factors. The claims in this section should therefore be read as plausible hypotheses consistent with the observed results, rather than as demonstrated causal mechanisms.
The test results show that the Transformer more closely reproduced the timing and amplitude of daytime ETc peaks across all seven test days compared with SARIMA, including days with lower peak ETc values (27–28 September 2025). The lower peaks on these days are consistent with reduced solar radiation input, as solar radiation is the dominant driver of hourly ETc in the FAO-56 PM framework, though the specific meteorological cause cannot be confirmed without additional observational data.
4.3. Operational Applicability of Recursive Forecasting
The heuristic simulation demonstrated that under the 72-h feature replication approximation, the trained model generates physically plausible and stable diurnal ETc patterns across the 7-day simulation horizon, with no evidence of systematic drift or amplitude decay. These results should be interpreted as a proof-of-concept demonstration of the model’s ability to maintain physically consistent ETc patterns under an approximated meteorological scenario, rather than evidence of fully autonomous operational forecasting capability [3, 21].
It should be emphasized that the 72-h feature replication strategy is a heuristic approximation that does not constitute forecasting without future meteorological information in a realistic operational sense, as it assumes that meteorological conditions three days prior are representative of future conditions. This assumption may be reasonable during periods of stable weather but is likely to introduce error during periods of sustained meteorological change, such as the onset of the monsoon season or multi-day cloudy spells.
More fundamentally, the recursive simulation does not test the model’s ability to forecast ETc from independently predicted meteorological inputs; it tests only whether the model can maintain physically plausible ETc patterns when fed a simplified approximation of future weather. The two tasks are structurally different: the first requires a coupled meteorological forecasting system, while the second requires only a self-consistent diurnal pattern. The physically plausible output shown in Figure 5 is therefore a necessary but not sufficient condition for operational multi-step ETc forecasting. True operational deployment would require integration with independently forecast meteorological inputs from a numerical weather prediction (NWP) source, replacing the replication heuristic with data-driven atmospheric projections [19, 22]. Until such integration is demonstrated and validated against independently measured ETc, the recursive simulation exercise should be understood as a feasibility demonstration rather than a validated forecasting procedure.
4.4. Implications for Smart Irrigation in Tropical Horticulture
The results of this study suggest several potential advantages of the proposed approach for data-driven irrigation management, subject to the limitations noted in this study. These should be understood as directions for future applied research rather than demonstrated operational outcomes.
First, the model relies solely on meteorological variables routinely available from low-cost weather monitoring platforms, without requiring the specialised sensors or atmospheric sounding equipment associated with the FAO-56 PM method [1, 4]. If validated with on-site sensor data and independently measured ETc values, this characteristic could make the approach accessible to smallholder tropical orchards with limited instrumentation infrastructure.
Second, once trained, the model performs inference at low computational cost and could in principle be deployed on edge computing devices or cloud-based platforms. Whether this translates to reliable hourly irrigation decisions at the field scale would depend on the model’s performance under real sensor conditions, which has not been evaluated in the present study.
Third, the model’s 168-h look-back window and one-step-ahead forecasting capability suggest potential utility for supporting advance estimates of crop water demand during critical fruit development and maturation stages of durian [12]. However, the present study evaluated the model only on meteorological data from the Visual Crossing Weather API and did not assess its performance using externally supplied weather forecasts, decision-relevant irrigation metrics, or real management outcomes. Claims about weekly irrigation planning or field-scale optimization therefore remain speculative at this stage.
Establishing the irrigation usefulness of the proposed model would require a structured validation program that goes beyond statistical accuracy metrics on a held-out test set. At minimum, four components would be necessary. First, the model’s ETc estimates would need to be compared against a conventional irrigation scheduling approach, such as FAO-56 PM computed from on-site sensor readings or a standard crop-coefficient calendar, to determine whether model-guided scheduling produces materially different irrigation prescriptions under field conditions [45]. Second, water-use efficiency metrics, including total seasonal irrigation volume and irrigation water productivity, would need to be quantified across model-guided and conventional treatments in a replicated field trial [46]. Third, agronomic outcomes, including durian yield, fruit quality, and physiological indicators of water stress, would need to be assessed to determine whether any reduction in estimated water demand translates to maintained or improved crop performance [3]. Fourth, the model’s robustness under real sensor conditions would need to be evaluated, including its behaviour under sensor noise, calibration drift, and missing data typical of low-cost IoT deployments in tropical orchard environments [21]. None of these components has been addressed in the present study, and claims about irrigation benefit therefore remain speculative pending such field-based validation.
It is also worth noting that even if the model performs well against PM-derived ETc under API-sourced meteorological inputs, additional uncertainty would arise when transitioning to on-site IoT sensor data. API-sourced gridded estimates may differ systematically from in situ measurements due to topographic and microclimate effects specific to the orchard canopy environment. The performance gap between API-sourced and sensor-sourced inputs has not been evaluated in the present study and represents an additional validation step required before operational deployment [21, 22].
4.5. Limitations and Future Research Directions
Several limitations of the present study should be acknowledged. First, the meteorological data used in this study were obtained from the Visual Crossing Weather API for the orchard location in Chanthaburi Province, rather than from on-site IoT sensors installed directly in the orchard. Throughout this study, ETc refers exclusively to values computed via the FAO-56 Penman–Monteith equation from API-retrieved meteorological inputs, not to independently measured field evapotranspiration [42, 44]. Accordingly, the model learns to approximate a deterministic mathematical relationship between meteorological variables and ETc, rather than forecasting an independently measured agronomic variable. Field validation using direct on-site sensor measurements and independently observed ETc values remains an essential next step before operational deployment.
Second, the model was trained and evaluated on data representing a single orchard location in Chanthaburi Province. Its generalizability to other durian-growing regions or different crop types has not been established. Transfer learning or domain adaptation strategies may be required to extend the model to new sites with limited historical data.
Third, the train–test split comprised a single chronological hold-out period of 168 h, representing less than 0.5% of the total dataset. While this design preserves temporal integrity and avoids data leakage, a rolling-window cross-validation scheme over multiple test periods would provide a more robust estimate of out-of-sample performance and model generalization across different seasons and meteorological conditions [
39,
40]. However, the training pool comprised approximately 36,192 sliding-window samples spanning four years of continuous hourly observations, which was considered sufficient for training the Transformer architecture employed in this study.
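The rolling-origin scheme mentioned above can be illustrated with a minimal sketch. The function name `rolling_origin_splits`, the expanding training window, and the default fold spacing are illustrative assumptions and do not describe a procedure used in the present study.

```python
def rolling_origin_splits(n_samples, test_size=168, n_folds=4, step=None):
    """Return chronological (train, test) index ranges.

    Each fold trains on all samples before its test block (expanding
    window) and tests on the next `test_size` samples, so no test
    sample ever precedes its training data. Hypothetical helper.
    """
    if step is None:
        step = test_size  # assumed: back-to-back, non-overlapping test blocks
    splits = []
    for k in range(n_folds):
        test_end = n_samples - k * step
        test_start = test_end - test_size
        if test_start <= 0:
            break  # not enough history left for another fold
        splits.append((range(0, test_start), range(test_start, test_end)))
    return splits[::-1]  # earliest fold first


# Example: three 168-h test blocks at the end of a 1000-sample series.
for train_idx, test_idx in rolling_origin_splits(1000, test_size=168, n_folds=3):
    print(len(train_idx), test_idx.start, test_idx.stop)
```

Setting `step` to a larger value (for instance, roughly one quarter of a year in hours) would spread the test blocks across seasons, at the cost of shorter training histories for the earliest folds.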
Several directions are identified for future research. First, validation using data from on-site IoT sensors together with independently measured ETc from lysimeter or eddy-covariance systems would confirm whether the model’s performance holds under real field conditions with sensor noise and data gaps. Second, incorporating additional input features, including vapor pressure deficit [
47], soil moisture from multi-source remote sensing [
48], and land surface temperature [
49], would enrich the model’s representation of crop water demand. Third, a systematic comparison with SARIMAX, Random Forest, and state-of-the-art architectures such as Informer [
19] and TimesNet under identical experimental conditions would clarify the relative contribution of multivariate inputs versus architectural design. Fourth, replacing the 72-h replication heuristic with numerical weather prediction (NWP) outputs would substantially improve the operational relevance of the multi-step ETc simulation [
19]. Fifth, integrating the model into a closed-loop automated irrigation controller and evaluating it against agronomic benchmarks in a replicated field trial are essential steps toward practical deployment. Sixth, rolling-origin evaluation across multiple seasonal test periods, together with a formal sensitivity analysis of individual input features, would provide stronger evidence of model generalizability [
39].
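As an illustration of the vapor pressure deficit feature proposed above, the following sketch derives VPD from air temperature and relative humidity using the FAO-56 saturation vapor pressure expression. It assumes that relative humidity (in percent) is available alongside air temperature; the function names are hypothetical and are not part of the present study's pipeline.

```python
import math


def saturation_vapor_pressure_kpa(temp_c):
    """Saturation vapor pressure (kPa) at air temperature temp_c (deg C),
    following the FAO-56 expression e0(T) = 0.6108 * exp(17.27 T / (T + 237.3))."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vapor_pressure_deficit_kpa(temp_c, rh_percent):
    """VPD (kPa) = e_s - e_a, with e_a = e_s * RH / 100 (assumed inputs)."""
    e_s = saturation_vapor_pressure_kpa(temp_c)
    e_a = e_s * rh_percent / 100.0
    return e_s - e_a


# Example: a warm, moderately humid hour.
print(round(vapor_pressure_deficit_kpa(30.0, 60.0), 3))
```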
5. Conclusions
This study developed and evaluated a Transformer encoder model for one-step-ahead estimation of hourly crop evapotranspiration (ETc), derived from the FAO-56 Penman–Monteith (PM) equation, in a durian orchard in Chanthaburi Province, Eastern Thailand. Meteorological data were obtained from the Visual Crossing Weather API for the orchard location over four years, with ETc computed using the FAO-56 Penman–Monteith equation. The model was benchmarked against a SARIMA statistical baseline under identical data partitioning conditions.
On the 168-h held-out test set, the Transformer achieved an RMSE of 0.0308 mm/h, an MAE of 0.0188 mm/h, and an R² of 0.9018, compared with RMSE = 0.0717 mm/h, MAE = 0.0593 mm/h, and R² = 0.4688 for SARIMA, representing a 57.0% reduction in RMSE, a 68.3% reduction in MAE, and a 92.4% relative improvement in R². Three design differences between the models (multivariate meteorological inputs, the 168-h self-attention window, and stacked encoder blocks) may contribute to this performance gap, though the present study does not isolate these factors, and the gains cannot be causally attributed to any single design choice.
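The relative changes quoted above follow directly from the reported metric values. The short check below reproduces the arithmetic; the helper names are illustrative only.

```python
def pct_reduction(baseline, model):
    """Percentage by which `model` is lower than `baseline` (error metrics)."""
    return (baseline - model) / baseline * 100.0


def pct_improvement(baseline, model):
    """Percentage by which `model` exceeds `baseline` (goodness-of-fit)."""
    return (model - baseline) / baseline * 100.0


# Reported values: SARIMA baseline first, Transformer second.
rmse_drop = pct_reduction(0.0717, 0.0308)
mae_drop = pct_reduction(0.0593, 0.0188)
r2_gain = pct_improvement(0.4688, 0.9018)

print(f"RMSE reduction: {rmse_drop:.1f}%")
print(f"MAE reduction:  {mae_drop:.1f}%")
print(f"R2 improvement: {r2_gain:.1f}%")
```

Note that the R² figure is a relative change (0.9018 versus 0.4688), not a difference in explained variance expressed in percentage points.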
A recursive 168-h heuristic simulation showed that the model generates physically plausible diurnal ETc patterns under a 72-h feature replication approximation. This exercise is a proof-of-concept demonstration rather than a rigorous operational forecast, as future meteorological inputs were approximated rather than independently predicted.
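The general structure of such a recursive simulation can be sketched as follows. This is an assumption-laden illustration, not the study's implementation: it assumes that the model input window holds the meteorological features plus lagged ETc, that future features are approximated by cycling through the last 72 observed hours, and that `predict_one_step` is a stand-in for the trained model.

```python
import numpy as np


def recursive_simulation(predict_one_step, features, etc,
                         horizon=168, window=168, replicate=72):
    """Roll a one-step-ahead model forward for `horizon` hours.

    features: (T, F) observed meteorological inputs, with T >= window
    etc:      (T,)   ETc values aligned with `features`
    predict_one_step: callable taking a (window, F + 1) array and
                      returning the next-hour ETc (hypothetical model).
    """
    features = np.asarray(features, dtype=float)
    etc = np.asarray(etc, dtype=float)
    template = features[-replicate:]      # assumed 72-h replication source
    feat_buf = features[-window:].copy()  # most recent feature window
    etc_buf = etc[-window:].copy()        # most recent ETc window
    preds = np.empty(horizon)
    for step in range(horizon):
        x = np.column_stack([feat_buf, etc_buf])
        y_hat = float(predict_one_step(x))
        preds[step] = y_hat
        # Slide both windows: replicated features in, prediction fed back.
        next_feat = template[step % replicate]
        feat_buf = np.vstack([feat_buf[1:], next_feat])
        etc_buf = np.append(etc_buf[1:], y_hat)
    return preds


# Example with a persistence stand-in for the trained model.
demo_features = np.arange(400, dtype=float).reshape(200, 2)
demo_etc = np.linspace(0.0, 1.0, 200)
demo_preds = recursive_simulation(lambda x: x[-1, -1], demo_features, demo_etc)
print(demo_preds.shape)
```

Because each prediction is fed back as an input, errors can accumulate over the horizon, which is one reason the exercise is framed as a proof of concept rather than an operational forecast.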
The study is best understood as a site-specific proof of concept. It demonstrates that a Transformer encoder can learn to approximate the FAO-56 PM functional relationship from sequential meteorological observations at hourly resolution and that the resulting model outperforms a SARIMA baseline on a single test episode at one orchard location. These results are encouraging but cannot be generalized to other sites, crop types, or seasons without further validation. Future work should prioritize field validation using on-site sensor data and independently measured ETc, ablation analysis to isolate the contribution of individual design choices, comparison with multivariate baselines such as SARIMAX, rolling-origin evaluation across multiple seasonal test periods, and integration with numerical weather prediction outputs for operationally rigorous multi-step forecasting.