Article

Robustness of AIC-Based AR Order Selection in HRV Analysis

1 Innovation Center for Semiconductor and Digital Future, Mie University, Tsu 514-8507, Japan
2 Department of Management Science and Technology, Graduate School of Engineering, Tohoku University, Sendai 980-8579, Japan
3 Department of Medical Radiology Technology, Harada Academy, Kagoshima 891-0133, Japan
4 Graduate School of Medical Sciences, Nagoya City University, Nagoya 467-8601, Japan
* Author to whom correspondence should be addressed.
Electronics 2026, 15(6), 1319; https://doi.org/10.3390/electronics15061319
Submission received: 30 January 2026 / Revised: 3 March 2026 / Accepted: 18 March 2026 / Published: 21 March 2026

Abstract

This study systematically examines the robustness of the Akaike Information Criterion (AIC) in determining the optimal order (p) of an autoregressive (AR) model applied to the RR interval time series of the PhysioNet healthy subject database. The AR approach is widely used to estimate the power spectral density (PSD) of heart rate variability (HRV), and accurate order selection is essential for model stability and reliable spectral estimation. Although the AIC is designed to balance model fit and complexity, its order decisions can become arbitrary in practice. This study provides a quantitative robustness analysis of information-criterion-based AR order selection under controlled expansion of the search space. Specifically, we investigated the behavior of the AIC using the PhysioNet database (N = 1257) under conditions where the maximum search order was set to an excessively high value (p = 50), far exceeding the commonly recommended range. Our analysis suggests that, under this setting, the AR model begins to capture subtle noise and nonstationary components rather than the intrinsic HRV structure; the resulting overfitting and excessive order selection produce false peaks in the PSD and reduce robustness. In conclusion, order decisions based solely on information criteria such as the AIC become unstable when the search range is too large. To ensure robustness, it is recommended to complement the AIC with more stringent criteria such as the Bayesian Information Criterion (BIC) or Final Prediction Error (FPE), in addition to the traditional maximum order restriction.

1. Introduction

As AI-driven scientific research continues to advance and automated analysis of open healthcare data becomes increasingly prevalent, large-scale physiological sensing networks are emerging as essential components of health monitoring, intelligent systems, and connected digital infrastructures. Wearable sensors, cloud-computing platforms, and distributed processing pipelines now enable the continuous acquisition and real-time analysis of cardiovascular signals across wide populations. As the scale and complexity of these networked systems expand, the need for robust signal-processing methodologies capable of reliably extracting physiological information from heterogeneous and often noisy data streams has become more critical than ever. Heart rate variability (HRV), derived from the beat-to-beat (RR) interval time series, is widely recognized as a key biomarker of autonomic regulation and overall physiological state. Among the various spectral estimation methods, autoregressive (AR) modeling is one of the most established approaches for computing the power spectral density (PSD) of HRV. AR models provide high spectral resolution even for short data segments, which is an indispensable advantage for network-based health-monitoring applications where data may be fragmented, intermittently transmitted, or uneven in quality. A fundamental requirement in AR spectral analysis is the appropriate selection of the model order (p), which directly determines spectral smoothness and peak structure and affects model stability, spectral fidelity, and computational efficiency in large-scale distributed environments. Information criteria such as the Akaike Information Criterion (AIC) are frequently employed for automatic order selection in large-scale or automated HRV pipelines.
The AIC is one of the most commonly used methods for AR order selection because it balances model fit against complexity. However, despite its theoretical grounding, the AIC is known to exhibit instability when the candidate search range becomes excessively large, leading to overfitting, inconsistent model selection, and spurious spectral peaks [1,2,3]. This issue is particularly problematic in automated HRV analytics deployed on cloud platforms or embedded devices, where decisions must remain robust despite fluctuations in noise levels, data length, and sampling consistency. Furthermore, AIC-based order selection depends not only on the data but also on the predefined search range of candidate orders (pmax). While this dependency is theoretically acknowledged, its practical impact on HRV spectral estimation has not been systematically evaluated using large open datasets. These limitations of the AIC reflect a deeper challenge: the long-standing dilemma surrounding the “curse of dimensionality” and the often-arbitrary nature of dimensionality reduction in data analysis. Conventional approaches typically assume that reducing the number of model variables is inherently desirable, yet such reduction is ultimately imposed for human or computational convenience rather than dictated by the intrinsic structure of the data. Information criteria such as the AIC provide numerical guidance regarding model dimensionality, but their thresholds remain fundamentally heuristic. Furthermore, traditional model selection frameworks consider only the inclusion or exclusion of variables, and do not account for more nuanced possibilities, such as identifying optimal variable weightings or continuous transformations that preserve all dimensions while equalizing information density across axes.
Recent perspectives in manifold learning and information-theoretic modeling argue that the curse of dimensionality is not determined solely by the number of dimensions but by the distribution of information density within the data space. High-dimensional datasets may in practice lie on low-dimensional manifolds where local neighborhoods admit approximately orthogonal coordinates, making the effective dimensionality much smaller than the nominal dimensionality. Conversely, reducing dimensionality without respecting the underlying information density can introduce distortion or eliminate variables that exert subtle but meaningful influences on the system. While several advanced techniques—such as independent component analysis (ICA), Infomax-based methods, and curvature-based manifold learning—attempt to address these issues, they bring their own challenges, including sensitivity to non-Gaussianity, instability in hyperparameter tuning, or high computational complexity. Motivated by these considerations, the present study systematically evaluates the robustness of AIC-based AR order selection using the PhysioNet healthy subject RR interval database (N = 1257). Specifically, we investigate how the AIC behaves when the maximum permissible AR order is intentionally set far beyond common recommendations (e.g., pmax = 50 instead of pmax = 20). By examining this extreme setting, our goal is to better understand the interplay between information-theoretic model selection, dimensionality, and overfitting in large-scale HRV processing pipelines operating within the future internet ecosystem. Based on this objective, we hypothesize that:
H1. 
The selected order increases when pmax is expanded.
H2. 
The difference between the best and second-best AIC values decreases under large pmax.
H3. 
Excessively high orders increase spectral peak fragmentation.
This study provides a quantitative robustness analysis of information-criterion-based AR order selection under controlled expansion of the search space, aiming to establish more reliable guidelines for automated physiological signal processing.

2. Materials and Methods

2.1. Data Sources

This study employed RR interval (RRI) time series from two publicly available databases hosted on PhysioNet, a large-scale open-access platform providing physiological signal datasets, software, and related documentation for scientific research.
(1)
Healthy Subjects RR Interval Database
This database provides normal sinus rhythm ECG recordings obtained from healthy volunteers, with recording durations ranging from 5 min to 24 h. The RR interval time series extracted from these ECGs are widely used as a healthy control cohort in heart rate variability (HRV) research. Because the subjects exhibit normal cardiac rhythms without pathology, the dataset is particularly well suited for methodological evaluation and baseline variability studies.
In the present study, artifact detection and preprocessing of the RR interval (RRI) time series were conducted according to commonly accepted standards in heart rate variability (HRV) analysis and biological signal processing. First, a physiological range filter was applied to exclude biologically implausible RRI values. Intervals shorter than 300 ms or longer than 1500 ms were removed as outliers. These thresholds correspond approximately to heart rates outside the range of 40–200 beats per minute and are widely used to eliminate obvious detection errors. Second, a percentage-based filter was implemented to identify abrupt beat-to-beat changes. R-R intervals differing by more than 20% from the immediately preceding interval were classified as artifacts. Sudden changes exceeding this threshold are unlikely under normal sinus rhythm conditions and are therefore considered indicative of measurement noise or ectopic events. No additional statistical deviation filtering (e.g., ±3 standard deviations from a moving average) was applied. Furthermore, detected artifacts were not corrected using interpolation methods such as linear or spline interpolation; instead, only the above two standard exclusion criteria were employed to ensure methodological consistency and to avoid introducing smoothing-related bias into the AR spectral analysis. This preprocessing approach aligns with conventional HRV methodological practice and was chosen to maintain transparency and reproducibility in the evaluation of AIC-based AR order selection robustness.
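The two exclusion criteria above can be sketched as a small filter. This is a hypothetical helper, not the authors' code; the range limits of 300–1500 ms are taken from the stated 40–200 bpm correspondence, and the comparison baseline for the 20% rule is assumed to be the raw preceding interval.

```python
import numpy as np

def filter_rri(rri_ms):
    """Apply the two standard exclusion criteria to an RRI series (ms)."""
    rri = np.asarray(rri_ms, dtype=float)
    if rri.size == 0:
        return rri
    # 1. Physiological range filter: drop biologically implausible intervals
    #    (300-1500 ms, i.e., roughly 40-200 bpm; assumed limits).
    rri = rri[(rri >= 300) & (rri <= 1500)]
    # 2. Percentage filter: drop beats differing >20% from the previous beat.
    keep = [True]
    for i in range(1, len(rri)):
        keep.append(abs(rri[i] - rri[i - 1]) / rri[i - 1] <= 0.20)
    return rri[np.array(keep)]
```

Artifacts are excluded rather than interpolated, mirroring the choice above to avoid smoothing-related bias in the AR spectra.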
(2)
Autonomic Aging Database
The Autonomic Aging Database was designed to investigate age-related differences in autonomic nervous system regulation. It contains approximately 120 min of supine resting ECG recordings collected from two groups of healthy individuals:
Young adults (approximately in their 20s)
Older adults (approximately in their 70s)
From these ECG records, RRI time series were derived to enable quantitative HRV analysis. The database has been widely used to characterize age-related reductions in parasympathetic activity and changes in autonomic balance.
Across both datasets, the RRI time series represent the interval between successive R-wave peaks in the ECG and provide the fundamental temporal structure for HRV assessment.
For the healthy subject database, the original recordings are approximately 24 h long; in this study, the first 1000 heartbeats of each record were extracted and used under consistent conditions. For the Autonomic Aging Database, the entire recording, approximately 15 min in length, was used. In both cases, one file is treated as one subject (record).
(3)
Stationarity Assessment
When long-term RR interval recordings such as those contained in the PhysioNet Healthy Subjects RR Interval Database and the Autonomic Aging Database are evaluated using formal stationarity tests (the Augmented Dickey–Fuller (ADF) test), the outcome strongly depends on the segmentation strategy. If the entire long-duration recording (e.g., 24 h) is analyzed as a single time series, the data are typically classified as nonstationary (i.e., unit root present). This is primarily due to circadian variation: baseline heart rate and variance differ substantially between sleep and wakefulness, violating the assumption of constant mean and variance. In addition, slow trends reflecting large-scale autonomic modulation increase the probability that the ADF test identifies a unit root.
In contrast, when short-term segments (approximately 5 min) are analyzed, a substantially larger proportion of segments are statistically classified as stationary. Over short resting intervals, baseline fluctuations are smaller, and the null hypothesis of a unit root can often be rejected. This is consistent with conventional HRV frequency-domain analysis, where short-term (5-min) stationarity is typically assumed.
Database-specific tendencies are also observed. In the Healthy Subjects database, younger individuals often exhibit pronounced respiratory sinus arrhythmia (RSA). These oscillatory dynamics tend to show mean-reverting behavior, which can lead to segments being classified as stationary in short-term analysis, although abrupt state transitions (e.g., arousal) may still produce transient nonstationarity. In the Autonomic Aging Database, age-related reductions in autonomic flexibility result in smaller RRI fluctuations. Statistically, such flatter dynamics are more likely to be classified as stationary by ADF testing, although this reflects reduced physiological complexity rather than enhanced stability.
In the present study, segments were not excluded based on formal stationarity criteria. This decision was made for methodological and conceptual reasons. While strict stationarity screening is traditionally recommended when applying classical frequency-domain techniques (FFT-based LF and HF power estimation), contemporary analytical approaches in nonlinear dynamics and machine learning often intentionally retain nonstationary structure. In physiological signals, nonstationarity itself may encode meaningful biological information, such as stress responses or dynamic autonomic adaptation. Excessively strict exclusion criteria may therefore remove physiologically relevant variability, particularly in publicly available datasets such as those from PhysioNet that reflect natural, dynamic conditions.
For classical spectral estimation, it is well recognized that strong trends can induce spectral leakage, artificially increasing low-frequency power. In such cases, exclusion or explicit detrending procedures may be appropriate. However, because the primary objective of the present study was to evaluate the robustness of AIC-based autoregressive (AR) order selection under realistic conditions—including naturally occurring variability—segments were retained without ADF-based exclusion. To mitigate potential spectral distortion in analyses involving Fourier-based methods, window functions (Hanning window) were applied where appropriate to reduce edge effects and minimize leakage due to mild nonstationarity within segments. This approach reflects a balanced methodological stance: acknowledging classical stationarity assumptions in frequency-domain HRV analysis while recognizing that modern nonlinear and data-driven frameworks increasingly treat nonstationarity not as a nuisance to be eliminated, but as a potentially informative property of biological signals.
In the Healthy Subjects RR Interval Database, the first 1000 consecutive heartbeats were extracted and used for analysis. This segment corresponds approximately to a 5-min resting period under stable conditions and is consistent with standard short-term HRV analysis protocols. In the Autonomic Aging Database, the approximately 15-min continuous recordings were divided into consecutive, non-overlapping 5-min segments to ensure comparability with conventional short-term HRV methodology.
In addition to formal stationarity considerations, we also examined results from nonlinear analysis using Detrended Fluctuation Analysis (DFA) as an alternative perspective on dynamic structure. DFA quantifies the scaling exponent (α), which characterizes long-range correlation properties in RR interval time series. When α approaches 1.5, the signal exhibits Brownian noise–like behavior, indicating strong nonstationarity and random-walk characteristics. In contrast, values between approximately 0.5 and 1.0 are generally interpreted as reflecting relatively stable scaling behavior, with α ≈ 1.0 corresponding to 1/f dynamics and preserved physiological complexity.
The interpretation of the scaling exponent is particularly suitable for evaluating the complexity of biological systems:
* α ≈ 1.0: 1/f fluctuation, typically observed in healthy physiological systems, reflecting balanced correlations and adaptive complexity.
* α ≈ 0.5: White noise-like behavior, indicating loss of correlation and disorganized dynamics, sometimes observed in pathological conditions such as heart failure.
* α ≈ 1.5: Brown noise-like behavior, reflecting strong nonstationarity and a random-walk-like structure.
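A minimal DFA sketch illustrating how the scaling exponent α is obtained; this is a simplified first-order (linear-detrending) implementation with assumed window scales, not the authors' code.

```python
import numpy as np

def dfa_alpha(x, scales=(4, 8, 16, 32, 64)):
    """Estimate the DFA scaling exponent alpha of a 1-D series."""
    x = np.asarray(x, dtype=float)
    y = np.cumsum(x - x.mean())            # integrated (profile) series
    flucts = []
    for n in scales:
        n_win = len(y) // n
        f2 = []
        for w in range(n_win):
            seg = y[w * n:(w + 1) * n]
            t = np.arange(n)
            coef = np.polyfit(t, seg, 1)   # local linear trend (DFA-1)
            f2.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        flucts.append(np.sqrt(np.mean(f2)))
    # alpha is the slope of log F(n) versus log n
    slope, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return slope
```

White noise yields α near 0.5 and an integrated random walk yields α near 1.5, matching the interpretive anchors listed above.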
Within datasets available through PhysioNet, including heart failure (CHF), atrial fibrillation (AF), and aging-related databases, DFA has become an established and widely accepted tool for distinguishing healthy from pathological cardiac dynamics. Since its introduction in the 1990s, DFA-based scaling analysis has been regarded as a gold-standard nonlinear approach in HRV research.
Representative studies include the following: First, Peng C-K et al. (1995) [4] analyzed normal sinus rhythm (NSR) and congestive heart failure (CHF) datasets that later formed the foundation of PhysioNet databases. They demonstrated that healthy hearts exhibit scaling exponents close to α ≈ 1.0, whereas CHF patients show significant deviation from this value, reflecting a breakdown of fractal organization and loss of physiological complexity. Second, Ho K.K. et al. (1997) [5] showed that a reduction in the short-term scaling exponent (α1) is a stronger predictor of mortality risk in heart failure patients than conventional time-domain indices such as SDNN. Lower α1 values were associated with impaired autonomic regulation and increased risk of sudden cardiac death. Third, Iyengar N. et al. (1996) [6] examined age-related alterations in fractal scaling properties. Comparing younger and elderly groups, they reported a decline in α1 with aging, indicating reduced fractal complexity of heartbeat dynamics. This finding suggests that aging, similar to pathological conditions, is associated with loss of physiological complexity. Because the datasets used in the present study include healthy and older individuals, we referred particularly to findings from the NSR–CHF comparison and aging-related investigations.

2.2. Data Processing Environment

All analyses were performed using Python 3.12. The following scientific libraries were used:
wfdb for PhysioNet data access;
NumPy for numerical computation;
pandas for data handling;
statsmodels for autoregressive (AR) model estimation;
matplotlib for visualization;
tqdm for progress monitoring.
The PhysioNet datasets were obtained using automated download scripts executed via wget. RRI signals were read from the downloaded CSV files, and missing values were removed prior to analysis.
Bootstrap Stability Procedure: To evaluate the robustness of the analytical results against small fluctuations in the data, a bootstrap stability procedure was implemented. The bootstrap stability procedure refers to a resampling-based framework used to assess how sensitive model outcomes and derived features are to minor variations in the dataset. In statistical modeling and machine learning, this process is widely regarded as an essential step for ensuring reliability and reproducibility.
In the present study, bootstrap resampling was performed at the level of the RR interval (RRI) sequence. For each eligible 5-min segment, bootstrap samples were generated using sampling with replacement (resampling strategy: nonparametric bootstrap). Each resampled segment had the same length as the original 5-min segment to preserve comparability of spectral and autoregressive (AR) modeling conditions.
A total of N = 1000 bootstrap iterations were conducted for each segment. For every bootstrap sample, the complete analytical pipeline was repeated, including AR model estimation and model order selection based on the Akaike Information Criterion (AIC).
The stability assessment consisted of the following steps:
Sampling: From the original RRI segment, 1000 bootstrap samples were generated by random sampling with replacement.
Model Construction: For each bootstrap sample, AR model estimation and order selection were performed under identical settings.
Aggregation of Metrics: Model order selection frequency: The proportion of bootstrap trials in which a specific AR order was selected by the AIC.
Parameter distribution: The empirical distribution of estimated AR coefficients and related model parameters (details regarding root location relative to the unit circle are described in Section 2.6).
Quantitative Definition of Robustness: Robustness was operationally defined using the variance of the selected AR order across bootstrap samples. A smaller variance indicates higher stability of model order determination. Additionally, a high selection frequency for a particular order suggests structural robustness of the underlying dynamics. This procedure is important in physiological signal analysis. Biological signals obtained from publicly available datasets such as those hosted on PhysioNet are inherently influenced by inter-individual variability, measurement noise, and subtle state transitions. Bootstrap evaluation helps prevent overfitting by distinguishing genuinely stable modeling outcomes from results that may occur by chance in a single dataset realization.

2.3. Autoregressive Modeling and AIC-Based Order Selection

To evaluate the robustness of the Akaike Information Criterion (AIC) for AR model order selection, we estimated the optimal AR order p for each subject under two different constraints on the maximum allowable lag:
Condition 1: maximum lag = 20;
Condition 2: maximum lag = 50.
For a given RRI time series, the AR model of order p was fitted using the AutoReg implementation from statsmodels. For each candidate order p = 1,2,…,max_lag, the corresponding AIC value was computed, and the order yielding the minimum AIC was selected as optimal.
A custom Python function (find_best_ar_order) automated this procedure by iteratively fitting AR models across candidate orders and tracking the minimum AIC. Subjects for which model fitting failed due to numerical instability were skipped.
Model-selection robustness was quantified using three metrics:
(1)
ΔAIC = AIC2 − AIC1 (difference between best and second-best models);
(2)
Boundary selection rate (frequency of popt = pmax);
(3)
Bootstrap stability index (percentage of identical order selection across 100 resamples).
The AutoReg implementation from the statsmodels library estimates parameters by conditional maximum likelihood (CML), which is mathematically equivalent to ordinary least squares (OLS) and is consistent with the likelihood-based definition of the AIC.

2.4. Comparative Analysis of AIC-Selected Orders

For each subject, AIC-optimal AR orders were independently estimated for max_lag = 20 and max_lag = 50. To compare the distributional behavior of the AIC under these two conditions, histograms were generated:
Histogram 1.
distribution of AIC-selected orders with max_lag = 20;
Histogram 2.
distribution with max_lag = 50.
This comparison highlights how expanding the candidate order range influences AIC-driven model selection, particularly with respect to overfitting and the tendency to capture high-frequency noise or nonstationary fluctuations.

2.5. AIC-Based Model Selection Procedure

The Akaike Information Criterion (AIC) was used to determine the optimal order p of the autoregressive (AR) models. The AIC is defined as (1):
AIC = −2 ln(L) + 2k
where:
L: maximum likelihood of the fitted model, representing how well the model explains the observed data.
ln(L): natural logarithm of the maximum likelihood.
k: number of free parameters in the model.
For an AR(p) model,
k typically increases proportionally with p (e.g., k = p + 1 or k = p + 2, depending on whether an intercept and noise variance are included).
The AIC aims to minimize the prediction error for unknown data. The downside is that when the number of parameters (k) is large compared to the number of samples, there is a tendency to select an overly complex model (overfitting).
The AIC balances two opposing effects. Goodness of fit: increasing the order p improves the fit, increasing L and reducing the term −2 ln(L). Model complexity penalty: higher orders introduce more parameters, increasing the penalty term +2k. Thus, model order selection using the AIC involves computing the AIC value for each candidate order (2):
p = 1, 2, …, pmax
fitting the corresponding AR model, and selecting the order with the minimum AIC (3):
p_opt = argmin_p [ −2 ln(L_p) + 2k_p ]
Robustness was operationally defined as (1) ΔAIC = AIC2 − AIC1, (2) boundary selection rate, and (3) bootstrap stability index (percentage of identical order selection across 100 resamples). This procedure operationalizes the trade-off between improved likelihood and increased model complexity, allowing a statistically principled determination of AR model order.
The AR-based power spectral density was computed as:
S(f) = σ² / | 1 − Σ_{k=1}^{p} a_k e^{−j2πfk} |²
where σ2 denotes residual variance and ak are AR coefficients. The spectrum was evaluated over 0–0.5 Hz using 1024 frequency points.
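The spectrum can be evaluated directly from the estimated coefficients; the sketch below assumes a unit sampling rate, matching the 0–0.5 Hz grid with 1024 points stated above.

```python
import numpy as np

def ar_psd(a, sigma2, n_freq=1024):
    """AR PSD: S(f) = sigma2 / |1 - sum_k a_k exp(-j*2*pi*f*k)|^2."""
    freqs = np.linspace(0, 0.5, n_freq)          # 0-0.5 Hz, unit sampling rate
    k = np.arange(1, len(a) + 1)
    # Denominator A(f) = 1 - sum_k a_k exp(-j 2 pi f k), one row per frequency.
    A = 1 - np.exp(-2j * np.pi * np.outer(freqs, k)) @ np.asarray(a)
    return freqs, sigma2 / np.abs(A) ** 2
```

For an AR(1) model with a_1 = 0.9, the PSD peaks at f = 0 (value σ²/0.1² = 100σ²) and falls to σ²/1.9² at f = 0.5, the low-pass shape expected from a positive first-lag coefficient.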
In this study, the BIC (Bayesian Information Criterion), AICc (corrected AIC), and FPE (final prediction error) were also calculated for comparison with the AIC. In the robustness assessment, the AIC captures spectral details, but when the search range (pmax) is wide it picks up noise and is prone to overfitting (H1, H3). The BIC, in contrast, makes more conservative (lower-order) selections and is more robust. The calculation formulas are as follows:
BIC = −2 ln(L) + k ln(n)
n is the sample size. The penalty term is k ln(n), so the more data there is, the more severely complex models are restricted. The BIC is suited to identifying the “true model” and can be used as a comparison when the AIC-selected order becomes too large.
AICc = AIC + 2k(k + 1) / (n − k − 1)
This method corrects for bias when the sample size n is small. It is appropriate when n is small or when the order p of the AR model is non-negligible relative to n.
FPE = σ̂² (n + k + 1) / (n − k − 1)
σ̂² denotes the estimated residual variance. FPE is one of the first indices proposed for AR models. When n is large, minimizing the FPE is asymptotically equivalent to minimizing the AIC.

2.6. AR Model Stability Verification

After selecting the AR model order using information criteria (AIC, BIC, AICc, FPE), model stability was formally verified using a standard time-series approach based on the roots of the characteristic polynomial.
An AR(p) model is defined as:
x_t = ϕ_1 x_{t−1} + ϕ_2 x_{t−2} + ⋯ + ϕ_p x_{t−p} + ε_t
where ϕi are AR coefficients and εt is white noise.
The corresponding characteristic equation is:
1 − ϕ_1 z − ϕ_2 z² − ⋯ − ϕ_p z^p = 0
Stability (and stationarity) of the AR model is guaranteed if all roots z of this equation satisfy the condition that their absolute values are greater than 1. Equivalently, when expressed in reciprocal polynomial form, the model is stable if all roots lie strictly inside the unit circle (i.e., root modulus <1 in the alternative representation).
If all roots are located inside the unit circle in the complex plane, the AR process is stable and causal, meaning that the time series converges to bounded values over time. This condition is essential to ensure that the estimated power spectral density (PSD) is mathematically valid and physiologically interpretable.
Implementation Procedure: The stability verification was implemented in Python 3.12 using NumPy. After estimating the AR coefficients ϕ_1, ϕ_2, …, ϕ_p, the characteristic polynomial coefficients were constructed, and its roots were computed numerically.
The roots were visually inspected by plotting them in the complex plane together with the unit circle (radius = 1). A model was considered stable only if all roots were strictly located within the unit circle. After AR order selection via the AIC (or other criteria), the estimated coefficients were used to compute the characteristic roots, and only models satisfying the stability condition were retained. This procedure ensures both causality and stationarity of the fitted AR model.
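The root check can be sketched as follows. This hypothetical helper uses the reciprocal-form polynomial, whose roots must lie strictly inside the unit circle for a stable, causal model.

```python
import numpy as np

def is_stable(phi):
    """True if the AR model with coefficients phi is stable (causal).

    Uses the reciprocal-form polynomial z^p - phi_1 z^(p-1) - ... - phi_p,
    whose roots are the reciprocals of the roots of
    1 - phi_1 z - ... - phi_p z^p, so stability requires |root| < 1.
    """
    poly = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))
    roots = np.roots(poly)
    return bool(np.all(np.abs(roots) < 1.0))
```

For example, AR(1) with ϕ_1 = 0.9 is stable, ϕ_1 = 1.1 is not, and the complex-root pair of ϕ = (0.5, −0.3) has modulus √0.3 ≈ 0.55 and is stable.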
Practical Implications for HRV Analysis: In HRV frequency-domain analysis (e.g., LF, HF, LF/HF estimation), the stability of the AR model directly affects the reliability of the estimated spectral density. If roots lie near or outside the unit circle, the PSD may exhibit artificial peaks or exaggerated low-frequency power, compromising physiological interpretation.
When applying AR models to long-term PhysioNet RR interval datasets, stability outcomes depend strongly on window length and preprocessing quality. Fitting an AR model to an entire 24-h recording is generally inappropriate due to pronounced nonstationarity. Therefore, stability verification was performed on short-term windows (5-min segments), consistent with standard HRV methodology.
Dataset-Specific Tendencies: In Healthy Subjects, younger individuals typically exhibit rich and dynamic HRV, particularly strong respiratory sinus arrhythmia (RSA). In resting segments, most roots lie within the unit circle, indicating stable models. However, roots often appear close to the boundary (|z| ≈ 0.95–0.99), reflecting strong periodic components. Such near-boundary roots correspond to pronounced spectral peaks, particularly in the high-frequency (HF) band. In the Autonomic Aging Database, age-related reductions in autonomic modulation result in smaller HRV amplitudes. Consequently, estimated AR coefficients tend to be smaller, and roots are often located closer to the center of the unit circle. From a mathematical perspective, these models behave more stably. Physiologically, however, this may reflect reduced signal complexity rather than improved health status.
Overall, verifying that all characteristic roots lie within the unit circle provides a rigorous and principled validation step in AR-based HRV spectral analysis. It ensures that the estimated spectra are mathematically well-defined and that subsequent physiological interpretations are grounded in stable model dynamics.

3. Results

3.1. AR Order Selection and Hypothesis H1

Under pmax = 20, the median selected order was 14 (IQR 11–17), whereas under pmax = 50, it increased to 32 (IQR 25–41; p < 0.001, rank-biserial r = 0.72). The proportion of boundary selections (popt = pmax) increased from 3.1% to 18.4% when the search range was expanded. These findings directly support H1, demonstrating that the AIC-selected AR order systematically increases when the candidate search range is expanded. Importantly, model fitting failures remained extremely rare (<1%; only one record overall), indicating that order inflation was not attributable to numerical instability at high p (Figure 1, Table 1).

3.2. ΔAIC and Hypothesis H2

The median ΔAIC (difference between the best and second-best candidate models) decreased from 5.8 under pmax = 20 to 1.2 under pmax = 50. The proportion of cases with ΔAIC < 2 (commonly considered ambiguous model selection) increased substantially under the expanded search condition. These results support H2, indicating that enlarging the candidate search space reduces the effective separation between competing models and increases model-selection ambiguity (Figure 2, Figure 3 and Figure 4).

3.3. Spectral Effects and Hypothesis H3

High-order models (p > 20) exhibited increased narrowband spectral components, particularly within the high-frequency (HF) range. The median number of local spectral peaks increased from 2.1 (pmax = 20) to 5.7 (pmax = 50; p < 0.001), accompanied by reduced spectral smoothness. Several additional peaks did not correspond to established HRV frequency bands and were absent in lower-order models and Lomb–Scargle reference spectra. These findings support H3, suggesting that excessively high AR orders increase spectral peak fragmentation and may introduce artificial narrowband components (Figure 5).
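The peak-counting outcome in H3 presupposes an AR spectrum. A compact sketch (our helper names, not the study's code) evaluates the standard AR power spectral density and counts strict local maxima as a crude fragmentation proxy:

```python
import numpy as np

def ar_psd(a, sigma2, freqs, fs=1.0):
    """AR power spectral density
    S(f) = sigma^2 / |1 - sum_k a_k * exp(-i*2*pi*f*k/fs)|^2."""
    k = np.arange(1, len(a) + 1)
    denom = 1.0 - np.exp(-2j * np.pi * np.outer(np.asarray(freqs) / fs, k)) @ np.asarray(a)
    return sigma2 / np.abs(denom) ** 2

def count_local_peaks(psd):
    """Count strict interior local maxima of the sampled spectrum."""
    psd = np.asarray(psd)
    return int(np.sum((psd[1:-1] > psd[:-2]) & (psd[1:-1] > psd[2:])))

# AR(2) with a single resonance at 0.25 Hz (a1 = 0, a2 = -0.81): exactly one peak
freqs = np.linspace(0.01, 0.49, 101)
psd = ar_psd([0.0, -0.81], 1.0, freqs)
```

A well-specified low-order model yields one peak per genuine oscillatory component; the excess peaks reported above arise when additional pole pairs fit noise rather than physiology.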

3.4. Record Length and Confounding Assessment

The association between recording length (n) and selected order (popt) was negligible (R2 < 0.01), indicating that record-length heterogeneity did not meaningfully influence order selection. Thus, the observed effects primarily reflect information-criterion sensitivity rather than data-length confounding.

3.5. Comparison with BIC

When the Bayesian Information Criterion (BIC) was applied, selected orders remained stable despite expansion of pmax from 20 to 50. Boundary selections were rare (<1%), and ΔBIC values maintained clear separation between competing models. This contrast suggests that AIC’s weaker penalty for model complexity underlies the sensitivity observed in H1–H3.
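The qualitative contrast between the criteria can be reproduced with a toy calculation: if the residual variance keeps shrinking slightly at high orders (mimicking the flat AIC tail), AIC's per-parameter penalty of 2 is overwhelmed while BIC's ln(n) penalty is not. All numbers below are illustrative, not study data:

```python
import numpy as np

def select_order(n, sigma2_by_p, penalty_per_param):
    """Order minimizing n*ln(sigma^2_p) + penalty_per_param * p, for p = 1..len."""
    crit = [n * np.log(s) + penalty_per_param * p
            for p, s in enumerate(sigma2_by_p, start=1)]
    return int(np.argmin(crit)) + 1

n = 340  # roughly one 5-min RR segment
# Hypothetical variance profile: sharp drops up to order 5, then ~1.2% per order.
sigma2 = [0.5 ** p if p <= 5 else (0.5 ** 5) * 0.988 ** (p - 5)
          for p in range(1, 31)]

p_aic = select_order(n, sigma2, 2.0)        # AIC: penalty 2 per parameter
p_bic = select_order(n, sigma2, np.log(n))  # BIC: penalty ln(340) ~ 5.83
```

With these values, AIC runs to the search boundary (order 30) because each extra parameter "pays for itself" under a penalty of 2, whereas BIC stops at order 5. Since ln(n) > 2 for any n > e², the BIC choice can never exceed the AIC choice, which matches the stability reported here.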

4. Discussion

The main finding of this study is that AIC-based AR order selection becomes increasingly sensitive to search-range expansion, leading to order inflation, reduced ΔAIC margins, and increased bootstrap instability. Our experimental results suggest two distinct and physiologically meaningful findings regarding the behavior of Akaike's Information Criterion (AIC)-based autoregressive (AR) model order selection under different autonomic conditions.
First, in healthy subjects, the estimated optimal AR order showed a pronounced dependence on the maximum search order (pmax). When pmax = 20, the mean optimal order was 6.33 with a median of 5.0. In contrast, when the search range was extended to pmax = 50, the mean optimal order increased markedly to 11.11, and in some cases reached the upper bound of 50; notably, 17.0% of the segments exhibited estimated orders exceeding 20. This tendency suggests that heart rate dynamics in healthy individuals contain rich and complex temporal structures, such that expanding the parameter search space increases the likelihood that higher-order models are favored by likelihood-based criteria. Consequently, in physiologically complex signals, the estimated "optimal" order becomes sensitive to analysis settings, particularly the choice of pmax.
Second, in contrast to healthy subjects, data associated with autonomic aging exhibited markedly greater robustness. When pmax = 20, the mean and median optimal orders were 8.25 and 8.0, respectively. Even when the search range was expanded to pmax = 50, these values changed only marginally (mean 9.07, median 8.0), and the proportion of segments with orders exceeding 20 was limited to 3.8%. This stability indicates that age-related autonomic decline is accompanied by a simplification of heart rate dynamics, such that low-order AR models remain sufficient even when higher-order candidates are permitted.
Taken together, these results support the notion that autonomic aging is associated with a reduction in the effective dimensionality of heart rate dynamics, rendering AR model selection less sensitive to the upper bound of the search range. Conversely, in younger and healthier individuals, expanding pmax increases the risk of capturing increasingly fine-grained—and potentially noise-driven—components, thereby amplifying sensitivity to analytical choices. This contrast highlights an important technical caveat: AR model order estimates in healthy populations may be particularly vulnerable to methodological overfitting when broad search ranges are employed.
In this study, we systematically evaluated the robustness of Akaike’s Information Criterion (AIC) in determining the optimal order (p) of autoregressive (AR) models applied to RR-interval time-series data from the PhysioNet Healthy Subjects Database. Although AIC is designed to balance model fit and model complexity, our analysis revealed that its performance becomes unstable when the maximum search order is set excessively high (p = 50). Under such conditions, AIC tended to overestimate the optimal AR order, particularly in noisy or mildly nonstationary physiological signals, indicating a risk of overfitting driven by the expansion of the parameter search space. These findings suggest that relying solely on AIC for order determination can compromise robustness when the search range is overly broad.
To enhance model stability, restricting the maximum allowable AR order or complementing AIC with stricter information criteria—such as the Bayesian Information Criterion (BIC) or the Final Prediction Error (FPE)—is recommended. The risk of inflated order estimation is especially relevant in biological signals such as HRV, where noise, ectopic beats, and short data segments can bias likelihood-based selection. Our results strengthen the view that AR model selection must consider not only statistical optimality but also physiological interpretability.
Previous studies have shown that 9th- to 25th-order AR methods produce statistically similar normalized spectral parameters and have suggested using AR orders of p = 16 or higher for HRV spectral analysis [1,2]. Several investigations have also discussed optimal order selection in the context of HRV [3,7,8,9,10,11,12,13,14]. However, prior studies have generally used small laboratory datasets. In contrast, the present study is novel in conducting large-scale comparisons using big-data-level RR series, enabling a more comprehensive assessment of AIC behavior across diverse subjects. The motivation for developing robust AR-order optimization methods stems from the need to improve both the accuracy of HRV spectral estimation and the reliability of derived indices such as LF, HF, and the LF/HF ratio [15,16,17,18,19,20,21]. Although spectral distortions were modest in absolute magnitude, increased variability in LF/HF ratios suggests potential downstream impact in automated analyses. AR models predict current RR intervals from a weighted combination of past values, and the order p determines how many past points are included. Underspecification (small p) risks missing physiologically meaningful frequency components, whereas overspecification (large p) may model noise, producing unstable spectral peaks and inflated variability in HRV indices.
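The prediction rule underlying an AR model can be stated in a few lines. The sketch below is a plain illustration with hypothetical RR values and arbitrary weights (our naming, not the study's code):

```python
import numpy as np

def ar_predict_next(history, a):
    """One-step-ahead AR(p) prediction:
    x_hat[t] = a[0]*x[t-1] + a[1]*x[t-2] + ... + a[p-1]*x[t-p].
    The order p = len(a) fixes how many past points are included."""
    p = len(a)
    past = np.asarray(history, dtype=float)[-1:-p - 1:-1]  # most recent value first
    return float(np.dot(a, past))

# Toy RR series (seconds) and arbitrary AR(2) weights:
# prediction = 0.5 * 0.84 + 0.25 * 0.82
next_rr = ar_predict_next([0.80, 0.82, 0.84], [0.5, 0.25])
```

Increasing p adds more weighted past terms, which is precisely why an overly large search range lets the model chase noise in distant lags.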
Existing optimization approaches typically focus on (i) information criteria that balance goodness of fit against model complexity, (ii) residual analysis assessing whether residuals approximate white noise, or (iii) FPE-based selection aimed at minimizing prediction error [22,23,24,25,26]. While these approaches provide valuable insights, our results highlight the need for more robust and adaptive strategies.
Future Directions: To achieve more stable and physiologically meaningful HRV estimation, future algorithms should consider hybrid approaches that integrate multiple criteria rather than relying on a single metric. Moreover, adaptive order-selection methods that account for individual differences (e.g., age, autonomic function, noise characteristics, or data length) may provide improved performance [27,28,29,30,31,32,33,34,35,36,37,38]. Machine learning-based meta-criteria or Bayesian hierarchical modeling may also offer promising avenues for individualized AR model selection [39,40].
This study has several methodological and interpretative limitations that should be acknowledged.
Dataset Dependence and Signal Specificity: Although large-scale RR-interval datasets were analyzed, the conclusions are restricted to heart rate variability (HRV) time series derived from ECG recordings. The findings may not generalize to other physiological signals such as respiration, blood pressure variability, or multimodal autonomic indices, which may exhibit different spectral structures and noise characteristics. Moreover, only specific public databases were examined, and population diversity (e.g., pathological cohorts, stress paradigms, or controlled breathing conditions) was limited.
Nonstationarity Handling: HRV signals are inherently nonstationary, particularly in long-duration recordings. In the present study, we did not explicitly segment or correct for nonstationary epochs prior to autoregressive (AR) modeling. Because information criteria such as AIC assume local stationarity, unresolved nonstationarity may have influenced the behavior of the criterion—especially at higher model orders—potentially contributing to apparent order inflation. Future analyses incorporating formal stationarity testing, adaptive segmentation, or time-varying AR approaches would strengthen interpretability.
Single-Criterion Emphasis: The primary empirical analysis focused on the Akaike Information Criterion (AIC). Although alternative criteria such as BIC, FPE, or cross-validation-based approaches were conceptually discussed, direct comparative validation was not performed. Because different criteria impose distinct complexity penalties, the robustness of the present findings across selection frameworks remains to be empirically established.
Lack of Formal Residual Diagnostics: While model order behavior was evaluated in terms of selection stability and spectral outcomes, formal residual diagnostics were not systematically conducted. In time-series modeling—particularly in ARIMA or SARIMA frameworks—post-estimation validation commonly includes tests such as the Ljung–Box test to verify the absence of residual autocorrelation. Such diagnostics help confirm whether the fitted model adequately captures the temporal dependence structure. Incorporating Ljung–Box residual analysis would allow discrimination between physiologically meaningful complexity and noise-driven overfitting, especially at high AR orders. The absence of this step represents a methodological limitation and an important direction for future refinement.
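The Ljung-Box check described above is straightforward to add to an AR pipeline. The following self-contained sketch (our naming, not the study's implementation) computes the portmanteau statistic from residual autocorrelations:

```python
import numpy as np
from scipy import stats

def ljung_box(residuals, lags=10, fitted_params=0):
    """Ljung-Box portmanteau test for residual autocorrelation.
    Q = n*(n+2) * sum_{k=1..m} rho_k^2 / (n-k), referred to a chi-square
    distribution on (m - fitted_params) degrees of freedom.
    Returns (Q, p_value); a small p-value means the residuals are not white,
    i.e., the fitted AR order has not captured the dependence structure."""
    x = np.asarray(residuals, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = x @ x
    ks = np.arange(1, lags + 1)
    rho = np.array([(x[:n - k] @ x[k:]) / denom for k in ks])
    q = n * (n + 2) * np.sum(rho ** 2 / (n - ks))
    return float(q), float(stats.chi2.sf(q, max(lags - fitted_params, 1)))

# A strongly autocorrelated "residual" series is flagged immediately
q, p = ljung_box(np.sin(0.3 * np.arange(200)), lags=10)
```

In practice, a model whose residuals fail this test is underfitted, while a very high-order model that passes it only marginally better than a low-order one is a candidate for the noise-driven overfitting discussed above.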
Spectrum-Level Impact Not Exhaustively Quantified: Although order overestimation and variability in the LF, HF, and LF/HF indices were demonstrated, the downstream impact on clinical classification accuracy, risk stratification, and machine learning model performance was not comprehensively evaluated. Even moderate spectral shifts may influence threshold-based decision systems; however, the practical magnitude of this effect remains to be systematically quantified.
Despite these limitations, this work represents one of the first large-scale investigations into the robustness of AIC-driven AR order selection in HRV analysis. By highlighting sensitivity to search-range expansion, boundary-selection effects, and instability indicators, the study provides a methodological foundation for developing more reliable and physiologically interpretable AR modeling frameworks. Future work integrating residual diagnostics, cross-criterion validation, and application-level impact assessment will further advance methodological standardization in HRV research.

5. Conclusions

This study investigated the robustness of Akaike’s Information Criterion (AIC) for selecting the autoregressive (AR) model order in heart rate variability (HRV) analysis using large-scale RR-interval data from the PhysioNet Healthy Subjects Database. By systematically examining AIC under conditions where the maximum search order was set excessively high (p = 50), we assessed its stability in comparison to the commonly recommended range (p ≤ 20). Our findings demonstrated that expanding the search space substantially increases the likelihood of order overestimation, driven by the model’s tendency to capture noise and nonstationary components rather than intrinsic HRV dynamics. This resulted in reduced robustness, greater variability across subjects, and diminished reproducibility of AR order selection.
In conclusion, AR order determination based solely on the AIC is vulnerable to instability when the search range is overly broad. To ensure reliable HRV spectral estimation, we recommend restricting the maximum AR order and complementing AIC with stricter criteria such as the Bayesian Information Criterion (BIC) or Final Prediction Error (FPE). These results provide important guidance for developing more robust and physiologically meaningful AR-based HRV analysis pipelines. This research is framed as a quantitative evaluation of information-criterion stability under controlled expansion of model order search space, rather than as a primarily physiological claim.

Author Contributions

Conceptualization, E.Y.; methodology, E.Y.; software, D.H.; validation, D.H.; formal analysis, E.Y.; investigation, E.Y.; resources, D.H.; data curation, I.K. and J.H.; writing—original draft preparation, E.Y.; writing—review and editing, E.Y.; visualization, D.H.; supervision, E.Y.; project administration, E.Y.; funding acquisition, E.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Program for Support of Young Researchers Discovery with Public and Private Collaboration (Japan) and the APC was funded by New Energy and Industrial Technology Development Organization (NEDO).

Data Availability Statement

This study uses open data, which can be downloaded here: https://www.physionet.org/about/database/ (last accessed on 15 December 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AIC: Akaike Information Criterion
AR: Autoregressive
RRI: R-R Interval
HRV: Heart Rate Variability
PSD: Power Spectral Density
BIC: Bayesian Information Criterion
FPE: Final Prediction Error

References

1. Costa, O.; Lago, P.; Rocha, A.P.; Freitas, J.; Puig, J.; Carvalho, M.J.; de Freitas, A.F. The spectral analysis of heart rate variability. A comparative study between nonparametric and parametric spectral analysis in short series. Rev. Port. Cardiol. 1995, 14, 621–626.
2. Boardman, A.; Schlindwein, F.S.; Rocha, A.P.; Leite, A. A study on the optimum order of autoregressive models for heart rate variability. Physiol. Meas. 2002, 23, 325–336.
3. Chai, X.; Wang, B.; Zhang, Z.; Wang, W. Study on the Optimum Order of Autoregressive Models for Heart Rate Variability Analysis. J. Biomed. Eng. 2015, 32, 958–964.
4. Peng, C.K.; Havlin, S.; Stanley, H.E.; Goldberger, A.L. Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. Chaos 1995, 5, 82–87.
5. Ho, K.K.L.; Moody, G.B.; Peng, C.K.; Mietus, J.E.; Larson, M.G.; Levy, D.; Goldberger, A.L. Predicting survival in heart failure case and control subjects by use of fully automated methods for deriving nonlinear and conventional indices of heart rate dynamics. Circulation 1997, 96, 842–848.
6. Iyengar, N.; Peng, C.K.; Morin, R.; Goldberger, A.L.; Lipsitz, L.A. Age-related alterations in the fractal scaling of cardiac interbeat interval dynamics. Am. J. Physiol. Regul. Integr. Comp. Physiol. 1996, 271, R1078–R1084.
7. Dantas, E.M.; Sant'Anna, M.L.; Andreão, R.V.; Gonçalves, C.P.; Morra, E.A.; Baldo, M.P.; Rodrigues, S.L.; Mill, J.G. Spectral analysis of heart rate variability with the autoregressive method: What model order to choose? Comput. Biol. Med. 2012, 42, 164–170.
8. Mainardi, L.T.; Bianchi, A.M.; Baselli, G.; Cerutti, S. Pole-tracking algorithms for the extraction of time-variant heart rate variability spectral parameters. IEEE Trans. Biomed. Eng. 1995, 42, 250–259.
9. Tarvainen, M.P.; Georgiadis, S.D.; Ranta-Aho, P.O.; Karjalainen, P.A. Time-varying analysis of heart rate variability signals with a Kalman smoother algorithm. Physiol. Meas. 2006, 27, 225–239.
10. Fojt, O.; Holcik, J. Applying nonlinear dynamics to ECG signal processing. IEEE Eng. Med. Biol. Mag. 1998, 17, 96–101.
11. Voss, A.; Schroeder, R.; Heitmann, A.; Peters, A.; Perz, S. Short-term heart rate variability—Influence of gender and age in healthy subjects. PLoS ONE 2015, 10, e0118308.
12. Schlindwein, F.S.; Evans, D.H. Selection of the order of autoregressive models for spectral analysis of Doppler ultrasound signals. Ultrasound Med. Biol. 1990, 16, 81–91.
13. Burr, R.L.; Cowan, M.J. Autoregressive spectral models of heart rate variability: Practical issues. J. Electrocardiol. 1992, 25, 224–233.
14. Cerutti, S.; Bianchi, A.M.; Mainardi, L.T. Advanced spectral methods for detecting dynamic behaviour. Auton. Neurosci. 2001, 90, 3–12.
15. Chen, P.C.; Sattari, N.; Whitehurst, L.N.; Mednick, S.C. Age-related losses in cardiac autonomic activity during a daytime nap. Psychophysiology 2021, 58, e13701.
16. Laborde, S.; Mosley, E.; Thayer, J.F. Heart Rate Variability and Cardiac Vagal Tone in Psychophysiological Research—Recommendations for Experiment Planning, Data Analysis, and Data Reporting. Front. Psychol. 2017, 8, 213.
17. Lu, S.; Ju, K.H.; Chon, K.H. A new algorithm for linear and nonlinear ARMA model parameter estimation using affine geometry. IEEE Trans. Biomed. Eng. 2001, 48, 1116–1124.
18. Ludwig, M.; Hoffmann, K.; Endler, S.; Asteroth, A.; Wiemeyer, J. Measurement, Prediction, and Control of Individual Heart Rate Responses to Exercise—Basics and Options for Wearable Devices. Front. Physiol. 2018, 9, 778.
19. Chen, Z.; Purdon, P.L.; Brown, E.N.; Barbieri, R. A unified point process probabilistic framework to assess heartbeat dynamics and autonomic cardiovascular control. Front. Physiol. 2012, 3, 4.
20. Fagard, R.H. A population-based study on the determinants of heart rate and heart rate variability in the frequency domain. Verh. K. Acad. Geneeskd. Belg. 2001, 63, 57–89.
21. Amekran, Y.; Damoun, N.; El Hangouche, A.J. A focus on the assessment of the autonomic function using heart rate variability. Glob. Cardiol. Sci. Pract. 2025, 2025, e202512.
22. Chen, Z.; Brown, E.N.; Barbieri, R. Characterizing nonlinear heartbeat dynamics within a point process framework. In Proceedings of the 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vancouver, BC, Canada, 20–25 August 2008; pp. 2781–2784.
23. Patlar Akbulut, F.; Perros, H.G.; Shahzad, M. Bimodal affect recognition based on autoregressive hidden Markov models from physiological signals. Comput. Methods Programs Biomed. 2020, 195, 105571.
24. Takalo, R.; Hytti, H.; Ihalainen, H. Tutorial on univariate autoregressive spectral analysis. J. Clin. Monit. Comput. 2005, 19, 401–410.
25. Di Virgilio, V.; Barbieri, R.; Mainardi, L.; Strano, S.; Cerutti, S. A multivariate time-variant AR method for the analysis of heart rate and arterial blood pressure. Med. Eng. Phys. 1997, 19, 109–124.
26. Badilini, F.; Maison-Blanche, P.; Coumel, P. Heart rate variability in passive tilt test: Comparative evaluation of autoregressive and FFT spectral analyses. Pacing Clin. Electrophysiol. 1998, 21, 1122–1132.
27. Christini, D.J.; Bennett, F.M.; Lutchen, K.R.; Ahmed, H.M.; Hausdorff, J.M.; Oriol, N. Application of linear and nonlinear time series modeling to heart rate dynamics analysis. IEEE Trans. Biomed. Eng. 1995, 42, 411–415.
28. Choi, H.G.; Mukkamala, R.; Moody, G.B.; Mark, R.G. Do nonlinearities play a significant role in short term, beat-to-beat variability? Comput. Cardiol. 2001, 28, 53–56.
29. Schumacher, A. Linear and non-linear analyses of heart rate variability: A minireview. Biol. Res. Nurs. 2004, 5, 211–221.
30. Pichon, A.; Roulaud, M.; Antoine-Jonville, S.; de Bisschop, C.; Denjean, A. Spectral analysis of heart rate variability: Interchangeability between autoregressive analysis and fast Fourier transform. J. Electrocardiol. 2006, 39, 31–37.
31. Chemla, D.; Young, J.; Badilini, F.; Maison-Blanche, P.; Affres, H.; Lecarpentier, Y.; Chanson, P. Comparison of fast Fourier transform and autoregressive spectral analysis for the study of heart rate variability in diabetic patients. Int. J. Cardiol. 2005, 104, 307–313.
32. Schaffer, T.; Hensel, B.; Weigand, C.; Schüttler, J.; Jeleazcov, C. Evaluation of techniques for estimating the power spectral density of RR-intervals under paced respiration conditions. J. Clin. Monit. Comput. 2014, 28, 481–486.
33. Kristiansen, J.; Korshøj, M.; Skotte, J.H.; Jespersen, T.; Søgaard, K.; Mortensen, O.S.; Holtermann, A. Comparison of two systems for long-term heart rate variability monitoring in free-living conditions—a pilot study. Biomed. Eng. Online 2011, 10, 27.
34. Christini, D.J.; Kulkarni, A.; Rao, S.; Stutman, E.R.; Bennett, F.M.; Hausdorff, J.M.; Oriol, N.; Lutchen, K.R. Influence of autoregressive model parameter uncertainty on spectral estimates of heart rate dynamics. Ann. Biomed. Eng. 1995, 23, 127–134.
35. Goldoozian, L.S.; Zahedi, E.; Zarzoso, V. Time-varying assessment of heart rate variability parameters using respiratory information. Comput. Biol. Med. 2017, 89, 355–367.
36. Mansouri, S.; Farahmand, F.; Vossoughi, G.; Ghavidel, A.A. A Hybrid Algorithm for Prediction of Varying Heart Rate Motion in Computer-Assisted Beating Heart Surgery. J. Med. Syst. 2018, 42, 200.
37. Wu, H.T.; Lewis, G.F.; Davila, M.I.; Daubechies, I.; Porges, S.W. Optimizing Estimates of Instantaneous Heart Rate from Pulse Wave Signals with the Synchrosqueezing Transform. Methods Inf. Med. 2016, 55, 463–472.
38. Hayano, J.; Ohashi, K.; Yoshida, Y.; Yuda, E.; Nakamura, T.; Kiyono, K.; Yamamoto, Y. Increase in random component of heart rate variability coinciding with developmental and degenerative stages of life. Physiol. Meas. 2018, 39, 054004.
39. Taloba, A.I.; Alanazi, R.; Shahin, O.R.; Elhadad, A.; Abozeid, A.; Abd El-Aziz, R.M. Machine Algorithm for Heartbeat Monitoring and Arrhythmia Detection Based on ECG Systems. Comput. Intell. Neurosci. 2021, 2021, 7677568.
40. Chon, K.H.; Cohen, R.J. Linear and nonlinear ARMA model parameter estimation using an artificial neural network. IEEE Trans. Biomed. Eng. 1997, 44, 168–174.
Figure 1. Distribution of AIC-selected autoregressive (AR) model orders under two different maximum search limits (max_lag = 20 vs. max_lag = 50) across two datasets. The histograms illustrate how the choice of the maximum allowed AR order influences AIC-based model selection. For the Healthy Subjects dataset (left), increasing the maximum search limit from 20 to 50 causes the distribution to shift rightward and become markedly wider, indicating substantial overestimation of the optimal order when high-order models are permitted. A similar trend is observed in the much larger Autonomic Aging dataset (right), where most selections remain centered around moderate orders, but a long tail up to p = 50 emerges when the high-order search range is allowed. Overall, the figure demonstrates that allowing excessively high maximum search orders systematically inflates AIC-selected AR orders and reduces robustness across datasets of varying size and characteristics. For 5-min segments (median N ≈ 340), p = 50 corresponds to p/N ≈ 0.15, approaching the region where parameter variance inflation becomes substantial according to classical AR identifiability theory. Thus, pmax = 50 was intentionally chosen to exceed conventional recommendations (p ≤ 20) and probe the sensitivity of AIC under near-boundary conditions.
Figure 2. Record length distribution.
Figure 3. Optimal AR Order (popt) vs. Record Length (n). Analysis indicates that record length (n) has a negligible, statistically insignificant impact on the selection of the optimal order (popt) within the parameters of this study.
Figure 4. Criterion Comparison: AIC/BIC/AICc/FPE. Comparison of information criteria for AR order selection: AIC (Akaike Information Criterion) balances model fit and complexity; BIC (Bayesian Information Criterion) imposes a stricter penalty on the number of parameters based on sample size; AICc (Corrected AIC) adjusts for small sample bias; and FPE (Final Prediction Error) estimates the one-step-ahead prediction error. All criteria were evaluated to determine their sensitivity to the maximum search range (pmax). While the AIC tends to favor higher orders as pmax expands (H1: the selected order increases when pmax is expanded), the BIC and AICc provide more conservative penalties to prevent overfitting in large-scale HRV datasets. As pmax increases, the AIC objective function flattens out at higher orders, significantly reducing the margin between the first- and second-best candidate orders. This leads to the instability predicted in H2 (the difference between the best and second-best AIC values decreases under large pmax), where the model becomes hypersensitive to stochastic noise.
Figure 5. Comparison of Power Spectral Density (PSD) estimations across different subjects. The gray lines represent the PSD estimated by the Welch’s method, while the red lines indicate the AR model-based estimation (order p as specified). The purple and green shaded areas correspond to the Low-Frequency (LF) and High-Frequency (HF) bands, respectively. (a) AR estimation for Healthy_Subject001 with p = 5 and False Peak = 0 (left), and for Healthy_Subject002 with p = 2 and False Peak = 0 (right). (b) AR estimation for Healthy_Subject003 with p = 5 and False Peak = 0 (left), and for Healthy_Subject005 with p = 2 and False Peak = 0 (right). (c) AR estimation for Aging_Subject001 with p = 9 and False Peak = 0 (left), and for Aging_Subject003 with p = 4 and False Peak = 0 (right). (d) AR estimation for Aging_Subject004 with p = 8 and False Peak = 0 (left), and for Aging_Subject005 with p = 8 and False Peak = 0 (right). Visual validation of AR spectral stability under the recommended search range (pmax = 20). This figure demonstrates that restricting the maximum AR order to pmax = 20 effectively prevents the occurrence of false peaks across different age groups. Healthy Subjects: Clear and distinct spectral peaks are observed in both LF and HF bands, representing robust autonomic regulation. For all cases, the False Peak count is zero at low AR orders (p = 2 to 5). Aging Subjects: Compared to healthy subjects, a visible reduction in total power and a flattening of the HF peak are observed, reflecting the characteristic decline in parasympathetic activity associated with aging. Despite this lower signal-to-noise ratio in elderly patient data, the AR model with p ≤ 9 remains stable, yielding False Peak = 0. 
Conclusion: These results provide visual evidence that the traditional pmax constraint is effective in maintaining the robustness of HRV spectral estimation, preventing the fragmented peaks hypothesized in H3 (if the order is too high, spectral peaks split apart, producing false peaks). They also visualize the overfitting that follows from H2 (the difference between the best and second-best AIC values decreases under large pmax): when pmax is widened, the AIC gap between the original optimal order and the higher orders that fit unnecessary noise narrows, so a higher order is selected and the false peaks shown in Figure 5 appear.
Figure 5. Comparison of Power Spectral Density (PSD) estimations across different subjects. The gray lines represent the PSD estimated by the Welch’s method, while the red lines indicate the AR model-based estimation (order p as specified). The purple and green shaded areas correspond to the Low-Frequency (LF) and High-Frequency (HF) bands, respectively. (a) AR estimation for Healthy_Subject001 with p = 5 and False Peak = 0 (left), and for Healthy_Subject002 with p = 2 and False Peak = 0 (right). (b) AR estimation for Healthy_Subject003 with p = 5 and False Peak = 0 (left), and for Healthy_Subject005 with p = 2 and False Peak = 0 (right). (c) AR estimation for Aging_Subject001 with p = 9 and False Peak = 0 (left), and for Aging_Subject003 with p = 4 and False Peak = 0 (right). (d) AR estimation for Aging_Subject004 with p = 8 and False Peak = 0 (left), and for Aging_Subject005 with p = 8 and False Peak = 0 (right). Visual validation of AR spectral stability under the recommended search range (pmax = 20). This figure demonstrates that restricting the maximum AR order to pmax = 20 effectively prevents the occurrence of false peaks across different age groups. Healthy Subjects: Clear and distinct spectral peaks are observed in both LF and HF bands, representing robust autonomic regulation. For all cases, the False Peak count is zero at low AR orders (p = 2 to 5). Aging Subjects: Compared to healthy subjects, a visible reduction in total power and a flattening of the HF peak are observed, reflecting the characteristic decline in parasympathetic activity associated with aging. Despite this lower signal-to-noise ratio in elderly patient data, the AR model with p ≤ 9 remains stable, yielding False Peak = 0. 
Table 1. Comparison of Optimized Model Orders Under Different Maximum Order Constraints (pmax).
| Dataset | Condition | N | Mean ± SD | Median [IQR] | Range (Min–Max) | Orders > 20 (%) | p-Value (Wilcoxon) |
| Healthy | pmax = 20 | 141 | 6.33 ± 5.03 | 5.0 [3.0–7.0] | 1 to 20 | - | <0.001 |
| Healthy | pmax = 50 | 141 | 11.11 ± 13.23 | 5.0 [3.0–12.0] | 1 to 50 | 17.0% | |
| Aging | pmax = 20 | 1116 | 8.25 ± 3.56 | 8.0 [6.0–10.0] | 1 to 20 | - | <0.001 |
| Aging | pmax = 50 | 1116 | 9.07 ± 6.41 | 8.0 [6.0–10.0] | 1 to 50 | 3.8% | |
The table compares how the estimated optimal order changes with the maximum allowable model order (pmax) in AR modeling of heart rate variability.
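The pmax = 20 versus pmax = 50 comparison summarized in Table 1 (medians, share of orders above 20, paired Wilcoxon signed-rank test) can be reproduced from per-subject selected orders. The sketch below is illustrative only: the helper name and the synthetic order arrays are assumptions, and scipy.stats.wilcoxon is used for the paired test.

```python
import numpy as np
from scipy.stats import wilcoxon

def compare_pmax(orders_20, orders_50):
    """Summarize per-subject AR orders selected under two search-range caps:
    medians, share of orders exceeding 20, and a paired Wilcoxon test."""
    o20 = np.asarray(orders_20)
    o50 = np.asarray(orders_50)
    stat, p = wilcoxon(o20, o50)          # paired, per-subject comparison
    return {
        "median_20": float(np.median(o20)),
        "median_50": float(np.median(o50)),
        "pct_over_20": 100.0 * float(np.mean(o50 > 20)),
        "wilcoxon_p": float(p),
    }
```

With synthetic data in which a minority of subjects receive inflated orders once the cap is widened to 50 (mimicking the 17.0% figure for the healthy group), the paired test detects the shift even though the median barely moves, mirroring the pattern in the table.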

Share and Cite

MDPI and ACS Style

Yuda, E.; Kaneko, I.; Hirahara, D.; Hayano, J. Robustness of AIC-Based AR Order Selection in HRV Analysis. Electronics 2026, 15, 1319. https://doi.org/10.3390/electronics15061319
