Article

Data-Driven Ergonomic Load Dynamics for Human–Autonomy Teams

by Nikitas Gerolimos 1,*, Vasileios Alevizos 2,3 and Georgios Priniotakis 1

1 Department of Industrial Design and Production Engineering, University of West Attica, 12244 Athens, Greece
2 Department of Learning, Informatics, Management and Ethics (LIME), Karolinska Institutet, 171 77 Stockholm, Sweden
3 MLV Research Group, Department of Informatics, Democritus University of Thrace, 65404 Kavala, Greece
* Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2026, 10(3), 74; https://doi.org/10.3390/bdcc10030074
Submission received: 18 January 2026 / Revised: 14 February 2026 / Accepted: 24 February 2026 / Published: 28 February 2026

Abstract

Ergonomic load in human–autonomy teams is commonly treated as a static score or a post hoc audit, even though modern sensing and communication enable real-time regulation of operator effort. We model ergonomic load as a dissipative dynamical state inferred online from multimodal effort proxies and task context, and couple it to autonomy through load-dependent gain moderation and compliance shaping. The method is evaluated on public human–swarm and human–robot interaction traces together with effort-proximal wearable and myographic datasets using a unified, windowed pipeline and controlled stress tests that emulate latency, downsampling, packet loss, and channel dropouts. On a large human–swarm benchmark, the estimator achieves strong discrimination and calibration for rare high-load events (up to AUROC 0.87, AUPRC 0.41, and ECE 0.031 at $q = 0.90$) and degrades predictably under delay, with a knee around 300–400 ms (AUROC $0.87 \rightarrow 0.80$, ECE $0.031 \rightarrow 0.061$ at 500 ms). Embedding the estimate in the adaptation schedule reduces overload incidence and oscillatory redistribution while preserving coordination proxies in surrogate closed-loop simulation: under nominal timing, overload time drops from 7.8% to 4.1% (relative reduction $\approx 47\%$), throughput stays near baseline ($1.00 \rightarrow 0.97$), and oscillation power falls ($0.26 \rightarrow 0.14$). These results provide a reproducible pathway for making ergonomics a control-relevant feedback signal, together with explicit operational constraints on estimator calibration (target ECE $\le 0.05$) and end-to-end latency (effective $\tau \le 300$ ms) required to avoid regime switching and maintain stable, interpretable adaptation.

1. Introduction

A closed-loop human–autonomy ensemble consists of a human operator who supervises or collaborates with a multi-robot swarm (or a single robot acting as a proxy for swarm behavior), while the autonomy continuously adapts its coordination policy using real-time estimates of the operator’s cognitive/physiological state and task context. Concretely, such ensembles couple three streams: decentralized multi-agent control and communication; human sensing (e.g., workload, fatigue, and posture/effort proxies); and an adaptation layer (gain scheduling, shared-control arbitration, or constraint enforcement). This coupling lets safety and ergonomic risk be regulated online without requiring the operator to explicitly command every adjustment.
A useful way to approach these systems is to treat the coupled human–swarm ensemble as a negotiated order: a field of local rules that seeks coherence while being continuously perturbed by an operator whose attention, effort, and intent are neither stationary nor fully observable. In that lens, the central design problem is not merely to inject more feedback, but to decide what variables are admissible as a state, what transformations make them decision-relevant, and which parts of the closed loop should be entrusted to the human versus to distributed autonomy. This perspective aligns with shared-control and control-sharing accounts that analyze the human, interface, and autonomy as co-determining the loop’s effective dynamics, rather than as separable modules [1]. Within human–swarm interaction (HSI), the same stance motivates adaptive autonomy: the autonomy is not fixed, but reallocated as operator capacity and environmental volatility vary [2]. The epistemic challenge is then to maintain the interpretability of the loop while tolerating uncertainty in human internal state, which has led to taxonomies of human models and feedback interconnections in the HITL control literature [3] and to distributed-control perspectives that emphasize robustness under partial information and non-ideal communication [4].
From an implementation standpoint, physiological feedback typically enters as a structured proxy for latent operator variables (workload, vigilance, fatigue, stress) and is fused with task/performance observables (coverage, error rate, response time) and biomechanical/kinematic observables (posture, joint angles, repetition) [5]. The sensing categories are commonly organized by bandwidth and intrusiveness: neurophysiology (e.g., EEG) and ocular metrics (eye tracking, pupil dilation) for cognitive/attentional load; myoelectric signals (EMG) for muscular effort and local fatigue; and inertial or vision-based pose tracking for posture-derived ergonomic scores [6]. Ergonomic-risk assessment often relies on standardized scoring surrogates such as RULA, either in its original form or via instrumented, real-time approximations [7,8], and has been operationalized in industrial contexts through IMU-driven feedback that demonstrably shifts working postures toward lower-risk regimes [9]. In parallel, subjective workload instruments such as NASA-TLX are frequently used for calibration/ground truthing or for hybrid identification strategies when physiological proxies drift [10]. A pragmatic benefit of this multi-layer architecture is redundancy: if one modality saturates or becomes noisy (e.g., motion artifacts in EEG), others can preserve observability of operator capacity at the control timescale [2].
The modeling step determines what kinds of stability claims are even meaningful. One common family treats human internal state as a latent mode that switches the loop’s effective gains and delays, motivating stochastic or hybrid representations and synthesis conditions that survive mode uncertainty [11]. In this family, linear-matrix-inequality (LMI) approaches and related robust-control tools are attractive because they provide constructive sufficient conditions for stability under bounded uncertainty, and they can accommodate input/output feedback designs when full-state access is unrealistic [12]. A second family emphasizes distributed estimation and decentralized coordination, in which the human influences high-level objectives or constraints while the swarm maintains local autonomy through consensus, formation, or coverage controllers [4]. In practice, these families are often combined: the swarm is stabilized by distributed control, while a slower supervisory loop adapts parameters based on estimated operator state and task phase, with explicit attention to intention estimation and safe trajectory tracking in collaborative settings [13]. The principal limitation is epistemic: guarantees tend to be model-conditional, and mis-specification of how physiology maps to latent state can shift the loop from stabilizing compensation to destabilizing overreaction.
Within that structure, gain scheduling and adaptive assistance can be interpreted as an allocation policy over autonomy, not merely as a tuning problem. Closed-loop teaming studies illustrate a canonical mechanism: build an individualized mapping from multi-modal physiology to workload, then modulate robotic assistance so that performance improves without requiring explicit commands at each decision point [14]. In HSI, related ideas appear as adaptive autonomy and interface-level interventions that shape attention and control, including frameworks that explicitly model attention trajectories and their coupling to multi-agent decision cadence [15] and structured experimental frameworks that connect situation awareness measures to swarm-task performance [16]. A third, ergonomics-forward class schedules gains to maintain biomechanical safety margins (e.g., reducing required exertion, modifying motion profiles, or reallocating tasks) while attempting to preserve swarm-level objectives; this is where real-time ergonomic monitoring frameworks and risk-score feedback are operationally leveraged [5]. The major advantage of scheduling is responsiveness to nonstationarity; the major risk is chattering or oscillation when physiological estimates are noisy or delayed, which is why safety-oriented hybrid and cloud/edge control approaches increasingly emphasize bounded adaptation and explicit constraint handling [17,18].
Across these categories, the trade space can be summarized as follows. Model-based synthesis (e.g., LMI-driven designs, jump/hybrid representations) offers analyzable sufficient conditions and clearer failure modes, but it can be brittle to sensor drift, individualized physiology, and context-dependent meaning of workload [11,12]. Data-driven and learning-augmented approaches can absorb heterogeneity and may exploit a richer physiological structure, yet they raise identification and safety questions: the loop may appear stable in-distribution while behaving unexpectedly under rare stressors or altered task demands [19]. Ergonomics-first controllers can reduce musculoskeletal risk and improve sustainability of operation, but they may trade away short-horizon task efficiency or introduce operator frustration if the intervention is not well aligned with perceived intent [6,9]. Finally, the HSI literature highlights an additional confound: interacting with swarms can itself shift psychophysiological state as a function of group size, realism, and interface modality, implying that the sensing-and-control design changes the very state it seeks to regulate [20,21]. An epistemically cautious stance is, therefore, to view stability and ergonomic risk not as single numbers but as regime claims supported by multi-modal evidence, benchmarked against standardized workload/ergonomics instruments and interpreted through explicit assumptions about human adaptation and co-regulation [2,3].
Recent work on ergonomic assessment increasingly operationalizes risk estimation as a perception problem, where non-contact sensing and learning-based inference replace manual scoring and enable continuous monitoring in collaborative settings. Menanno et al. propose an ergonomic risk assessment system that combines 3D human pose estimation with a collaborative robot to infer posture-dependent risk in real time, illustrating how vision pipelines can make ergonomics observable at control-relevant rates [22]. Complementary AI-based ergonomics modeling has also moved beyond purely kinematic proxies toward discomfort-centric targets: Haj Mahmoud et al. use neural networks to identify factors associated with self-reported discomfort in picking tasks, highlighting that learned mappings can capture subjective strain patterns that traditional observational scores may miss and can serve as supervision signals when direct biomechanical ground truth is unavailable [23]. More broadly, industrial ergonomic HRC surveys consolidate the sensing stack (vision, IMUs, EMG) and learning/decision layers used to close the loop between assessment and assistance, emphasizing that real deployments typically require sensor fusion, robust inference, and explicit integration with robot behaviors rather than isolated scoring modules [24].
In human–robot collaboration, post-2024 contributions increasingly couple AI perception (pose, action, object/context recognition) to online decision logic that reallocates tasks or modulates robot behavior to manage ergonomic exposure. Iodice et al. present an intelligent HRC framework where vision-based 3D pose tracking and recognition feed an ergonomics assessment and drive adaptive decisions through structured autonomy logic, exemplifying an end-to-end sensing–AI–adaptation pipeline oriented to low-latency collaboration [25]. Related sensing-driven architectures extend beyond the shopfloor by integrating immersive evaluation and wearable signals: VR-based cooperative workplace assessment with wearable sensors provides a template for combining physiological/effort measurements with AI models for prediction and proactive adaptation, particularly for design-time validation of HRC workcells [26]. On the allocation side, ergonomic role-assignment methods derived from motion data formalize how risk metrics can be embedded into task distribution between human and robot, creating a direct bridge from sensed kinematics to AI-guided planning under ergonomic constraints [27]. Finally, recent HRC perception-and-planning reviews synthesize how computer vision and AI pipelines are being combined to deliver human-aware, safety- and comfort-oriented robot behavior, reinforcing ergonomics as a primary driver for multimodal sensing and explainable decision-making in collaborative robotics [28].
Despite substantial progress in human state estimation and adaptive autonomy, the literature still lacks a unified, auditable framework that makes ergonomic load a control-relevant state with explicit requirements on estimation quality, timing, and closed-loop robustness, rather than a static score or post hoc safety audit. This work fills that gap by modeling ergonomic load as a dissipative dynamical state inferred online from effort-proximal multimodal signals and task context, and by embedding the estimate into a bounded gain moderation and compliance shaping mechanism whose stability implications are stated through reviewable sufficient conditions. Compared with prior ergonomic assessment pipelines that report risk scores without linking them to loop behavior, and compared with adaptive autonomy schemes that modulate assistance without calibration and delay margins, the proposed approach explicitly couples calibration, rare-event precision, and latency to the effective loop gain and to regime transitions under realistic impairments. The theoretical contributions are an explicit coupled human–autonomy load model, a small-gain and delay-aware sufficient-condition bridge that ties estimator properties to overload-attractor avoidance, and a fixed rule-based regime mapping that prevents circular interpretation. The practical contributions are a reproducible evaluation protocol on public datasets that stress-tests delay, downsampling, packet loss, and channel ablations, and concrete deployment constraints in which the method is expected to be safe and beneficial, including calibration targets, latency budgets, and sensing priorities for robust human–robot collaboration and ergonomic monitoring in real environments.
Furthermore, Table 1 summarizes the representative strands in ergonomics monitoring, human–robot collaboration (HRC), adaptive autonomy, and physiological workload estimation, and highlights the gap we address: most studies either estimate risk/workload without making it a control-relevant state with explicit timing and calibration requirements, or adapt autonomy without auditable bounds that tie estimator uncertainty and delay to closed-loop regime behavior.
The literature increasingly demonstrates that ergonomics monitoring can be made continuous and control-adjacent via non-contact sensing and learning-based inference, yet it remains fragmented across three strands: perception-centric risk estimation pipelines that report posture or discomfort surrogates without explicit feedback semantics; human–robot collaboration frameworks that close the loop heuristically without stating auditable bounds linking estimator uncertainty and delay to stability-relevant behavior; and adaptive autonomy approaches that modulate assistance using workload proxies but rarely operationalize calibration, rare-event precision, and latency as design constraints. Consequently, there is still no unified, reviewable framework that treats ergonomic load as a control-relevant dynamical state with fixed regime semantics and explicit requirements on estimation quality and end-to-end timing. In this manuscript, we consolidate these strands by defining ergonomic load as a control-sufficient latent state inferred online from effort-proximal multimodal signals and task context, and by embedding it into bounded gain moderation and compliance shaping whose implications are stated through auditable sufficient conditions and fixed regime-labeling rules. This coupling makes the contribution falsifiable under realistic impairments (delay injection, downsampling, packet loss, channel ablation) and shifts the evidentiary standard from nominal prediction accuracy to demonstrable, interpretable changes in overload incidence and oscillatory redistribution under timing and sensing constraints, thereby providing both a theoretical bridge from estimator properties to closed-loop regime behavior and a reproducible deployment-oriented protocol with calibration targets, latency budgets, and fail-safe reversion logic when assumptions are violated.
This work contributes a unified control-theoretic framing in which ergonomic load is treated as a dynamical state that is jointly regulated with collective coordination, rather than as an external audit variable appended after the fact. The core hypothesis is that a control-sufficient ergonomic-load estimate can be inferred online from multimodal effort proxies and then embedded as a feedback channel that modulates decentralized interaction rules, thereby enlarging the closed-loop stability region and reducing overload incidence without collapsing coordination performance, even under latency and nonstationary task demand. On that basis, the paper contributes (i) an explicit coupled model of human recovery/accumulation dynamics and distributed swarm control, (ii) a synthesis principle for load-aware gain moderation and compliance shaping that makes the adaptation layer analyzable rather than ad hoc, and (iii) an interpretable regime mapping that distinguishes equilibrated, oscillatory, and overload-prone behaviors as qualitative outcomes of delay, aggressive adaptation, and partial observability, with emphasis on failure boundaries that are actionable as design constraints rather than descriptive postures.
Empirically, the contribution is operationalized through a reproducible evaluation protocol that assigns public datasets, described here at a high level, to complementary roles: multimodal wearable stress/strain recordings to identify and validate the ergonomic-load estimator and to characterize recovery time constants; task-structured physiological datasets with phase labels or workload annotations to test sensitivity to demand shifts; physical human–robot interaction effort-proxy datasets (e.g., myography/force-myography or interaction-force surrogates) to anchor load estimation in signals proximal to exertion; and interaction/coordination traces from industrial human–robot collaboration and human–swarm supervision to evaluate whether load-aware decentralization preserves task-relevant performance while suppressing overload proxies. Robustness is assessed by controlled impairment of the sensing–communication loop (delay injection, downsampling, packet-loss emulation, and channel ablation), turning otherwise qualitative claims about real-time feedback into falsifiable tests of stability degradation, hysteresis, and oscillatory redistribution under realistic constraints, while keeping the experimental design portable across heterogeneous datasets, as described in Section 2.1.
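The four impairment operators are named but not specified at the implementation level here. The following is a minimal NumPy sketch of one plausible version, assuming signals are `(T, C)` arrays on a uniform sample grid and that the receiver holds the last received sample (zero-order hold). Packet loss is modeled as independent per-sample drops, not bursty loss. All function names are hypothetical.

```python
import numpy as np


def inject_delay(x: np.ndarray, delay_samples: int) -> np.ndarray:
    """Delay a (T, C) signal by delay_samples, repeating the first sample as padding."""
    delay_samples = min(max(delay_samples, 0), len(x))
    if delay_samples == 0:
        return x.copy()
    pad = np.repeat(x[:1], delay_samples, axis=0)
    return np.concatenate([pad, x[: len(x) - delay_samples]], axis=0)


def downsample_hold(x: np.ndarray, factor: int) -> np.ndarray:
    """Keep every factor-th sample and hold it until the next kept sample."""
    if factor <= 1:
        return x.copy()
    idx = (np.arange(len(x)) // factor) * factor
    return x[idx]


def packet_loss_hold(x: np.ndarray, p_loss: float, rng: np.random.Generator) -> np.ndarray:
    """Drop each sample independently with probability p_loss and hold the last received one."""
    received = rng.random(len(x)) >= p_loss
    received[0] = True  # assumption: the first packet arrives, so there is always a value to hold
    last_idx = np.maximum.accumulate(np.where(received, np.arange(len(x)), 0))
    return x[last_idx]


def ablate_channels(x: np.ndarray, channels) -> np.ndarray:
    """Zero out the listed channels to emulate a sensor dropout."""
    y = x.copy()
    y[:, list(channels)] = 0.0
    return y


# Usage: compose the operators into one impairment setting (values are illustrative).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 6.0, 200)
x = np.column_stack([np.sin(t), np.cos(t)])
x_imp = ablate_channels(packet_loss_hold(downsample_hold(inject_delay(x, 10), 4), 0.1, rng), [1])
```

Because each operator maps a `(T, C)` array to another `(T, C)` array, sweeps over delay, downsampling factor, loss rate, and ablated channels can be composed freely without any dataset-specific code.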
In this manuscript, ergonomic load is defined as a control-sufficient latent state constructed to be decision-relevant for closed-loop regulation, not as a direct or exhaustive measurement of classical human factors constructs such as fatigue, stress, or cognitive workload at the signal or biomarker level. While it is inferred from effort-proximal physiological and interaction observables, the variable is intentionally calibrated and bounded to support stability-oriented analysis and enforceable gain/compliance moderation under delay and uncertainty, rather than to claim physiological identifiability or diagnostic specificity. Accordingly, interpretability is asserted at the operating-regime level: the load state is used to distinguish equilibrated, oscillatory, and overload-prone behaviors and to define auditable conditions under which adaptation remains stable and non-chattering. This choice allows the paper to connect estimator properties (including calibration, rare-event precision, and latency sensitivity) to closed-loop behavior through reviewable sufficient conditions, while avoiding over-interpretation of the latent state as a precise surrogate for any single underlying physiological mechanism.

2. Materials and Methods

2.1. Datasets

Empirical evaluation was designed to be portable across heterogeneous public datasets while preserving a consistent operational definition of ergonomic load as a latent, time-varying state inferred from measurable effort proxies and task context. We, therefore, used datasets that jointly cover (i) human–swarm supervision with explicit interaction traces, (ii) industrial human–robot collaboration where task execution and anomalies yield demand shifts, and (iii) wearable or proximal physiological recordings that support estimator identification and recovery/dissipation characterization. In particular, we used the Human–swarm Interaction Dataset that captures real-time human interaction with virtual swarms in shared physical space, providing behavioral and interaction traces suited to testing how interface-mediated supervision co-varies with demand and coordination dynamics [34]. To complement the human–swarm setting with collaborative robotics and task disturbances, we incorporated RoHuCAD, which targets human–robot collaborative anomaly detection and provides time-indexed traces in which deviations and irregular events can be treated as structured perturbations to the closed loop [32], as well as HRI30, an industrial human–robot interaction action-recognition dataset that supports phase- and action-structured segmentation of interaction episodes for demand-change sensitivity analysis [31]. Finally, we used SenseCobot as an additional public reference point for collaborative robotics experimentation, enabling cross-dataset checks that the evaluation protocol does not overfit to a single laboratory or instrumentation stack [33].
To anchor load estimation in signals proximal to exertion, rather than relying only on kinematics or task labels, we included datasets that provide wearable and muscle/force surrogates with sufficient temporal resolution to support windowed estimation and recovery modeling. Specifically, we used a Force Myography dataset for human–robot interactions, which provides noninvasive pressure-based myographic measurements that act as an effort-proximal channel for estimating local muscular activation trends during interaction [29], and a multi-channel surface EMG fatigue dataset that contains multi-electrode sEMG recordings enabling explicit fatigue-related feature extraction and stressor-response characterization over repeated activation regimes [30]. In parallel, we used a PhysioNet wearable dataset collected under induced stress and structured exercise sessions, which provides multi-modal wearable signals with protocol-driven phases that are useful for identifying estimator sensitivity to demand transitions and for estimating recovery time constants under controlled conditions [35]. These sources collectively support the paper’s central requirement that load estimates be decision-relevant at the control timescale, while remaining interpretable as projections of observable effort proxies rather than as opaque scores.
In addition, we included WorkStress3D, which provides physiological recordings acquired under pressure and is, therefore, suitable for robustness checks in which the estimator is challenged by nonstationarity, protocol-induced regime shifts, and inter-subject heterogeneity [36]. Across datasets, we treated the recorded signals as time-indexed trajectories and standardized them into a common evaluation interface consisting of synchronized windows of (a) effort-proxy channels (e.g., wearable physiology, EMG, FMG), (b) task-structured context or phase indicators when available, and (c) interaction/coordination observables (e.g., HSI traces or HRC action/anomaly segments), so that the same stress tests (including delay injection, downsampling, channel ablation, and perturbation-based regime probing) can be applied without introducing dataset-specific tuning that would confound the stability and overload-regime claims advanced in this work [29,30,31,32,33,34,35,36].
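The common windowed interface can be sketched as follows. This assumes all streams have already been resampled to one time grid and stored as `(T, C)` NumPy arrays. `Window`, `make_windows`, and the default `win_len`/`hop` values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class Window:
    effort: np.ndarray             # (win_len, C_e) effort-proxy channels, e.g. wearable physiology, EMG, FMG
    context: Optional[np.ndarray]  # (win_len, C_c) phase indicators, or None if the dataset has none
    interaction: np.ndarray        # (win_len, C_i) interaction/coordination observables
    t_end: int                     # index of the last sample, i.e. the decision time of the window


def make_windows(effort: np.ndarray, interaction: np.ndarray,
                 context: Optional[np.ndarray] = None,
                 win_len: int = 50, hop: int = 25) -> List[Window]:
    """Cut synchronized streams into overlapping windows on one shared index grid."""
    n = len(effort)
    if len(interaction) != n or (context is not None and len(context) != n):
        raise ValueError("resample all streams to a common time grid before windowing")
    windows = []
    for start in range(0, n - win_len + 1, hop):
        sl = slice(start, start + win_len)
        windows.append(Window(
            effort=effort[sl],
            context=None if context is None else context[sl],
            interaction=interaction[sl],
            t_end=start + win_len - 1,
        ))
    return windows


# Usage: a dataset without phase labels simply passes context=None.
effort = np.random.default_rng(0).normal(size=(200, 3))
interaction = np.zeros((200, 2))
ws = make_windows(effort, interaction)
```

Keeping the context slot optional is what lets datasets with and without phase labels share one interface. Downstream estimators then see the same window structure regardless of the source.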
Figure 1 depicts the end-to-end logic as a synoptic workflow of the proposed framework, clarifying how heterogeneous datasets (Section 2.1) are converted into a common windowed interface, how this interface constrains estimator design (Section 2.4), and how the resulting estimate is injected into the autonomy adaptation layer for closed-loop evaluation (Section 2.5). Concretely, the datasets contribute three complementary roles: effort-proximal channels (wearables, EMG/FMG) define the semantics of the latent load state and recovery behavior; task/context labels (when present) provide weak supervision for identification without assuming a universal ground truth; and interaction/coordination traces provide the plant-side observables needed to quantify whether load-aware moderation changes regime occupancy and overload incidence. The diagram, therefore, separates two activities: learning a decision-relevant estimator from portable window features with training-only normalization, and testing control relevance through impairment sweeps and regime labeling that are fixed prior to reporting. This separation is intended to make improvements in overload suppression attributable to the estimator–policy interface rather than to dataset-specific tuning.
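Training-only normalization means that feature statistics are estimated on the training split and then frozen, so evaluation windows never influence their own scaling. A minimal sketch, assuming per-feature z-scoring (the exact normalization used in the study is not specified here; `TrainOnlyStandardizer` is a hypothetical name):

```python
import numpy as np


class TrainOnlyStandardizer:
    """Per-feature z-scoring whose statistics come from the training split only."""

    def fit(self, X_train: np.ndarray) -> "TrainOnlyStandardizer":
        self.mean_ = X_train.mean(axis=0)
        std = X_train.std(axis=0)
        # Guard constant features so transform never divides by zero.
        self.std_ = np.where(std > 0, std, 1.0)
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        # Evaluation data are scaled with frozen training statistics (no refitting).
        return (X - self.mean_) / self.std_


# Usage: the test row is expressed in units of the training distribution.
X_train = np.array([[0.0, 5.0], [2.0, 5.0]])
X_test = np.array([[3.0, 7.0]])
scaler = TrainOnlyStandardizer().fit(X_train)
Z_test = scaler.transform(X_test)
```

Refitting the scaler on evaluation data would silently shift the operating point $L_0$ that downstream gain schedules depend on, which is one concrete way leakage can masquerade as improved overload suppression.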
The same workflow maps directly to industrial deployment: raw sensing (at least one effort-proximal stream plus a minimal coordination observable) is streamed to an edge inference service that computes $\hat{L}$ on a fixed cadence, logs calibration/latency health metrics, and outputs a bounded adaptation signal that gates gains or compliance in the existing robot or swarm controller (Section 2.3.2). Figure 1 makes the operational coupling explicit: if communication delay or dropouts degrade calibration and rare-event precision, the policy downshifts by design (bounded schedule, conservative damping), preventing chattering near the operating point and maintaining interpretable behavior under stress. In practice, this enables a “drop-in” supervisory layer in which the proposed model does not replace certified low-level control, but overlays auditable limits and moderation rules that are compatible with standard safety workflows: (a) validate estimator calibration and end-to-end latency on-site using the same impairment operators as in Section 2.5.1; (b) select schedule parameters within verified safe regions; and (c) run with continuous monitoring of timing and sensing availability to trigger fail-safe reversion when the assumptions underlying closed-loop benefit are violated. The common windowed signal and feature interface used throughout the workflow is summarized in Table 2.
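Step (c) can be sketched as a simple health gate in front of the scheduled gain. This assumes the health checks are a rolling ECE, a measured end-to-end latency, and a count of available effort-proximal streams. The limits reuse the calibration target (ECE $\le 0.05$) and latency budget ($\tau \le 300$ ms) quoted in the abstract, plus the "at least one effort-proximal stream" requirement above. The fallback gain and the function and field names are hypothetical; the text does not fix the reversion target.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HealthLimits:
    max_ece: float = 0.05          # calibration target quoted in the abstract
    max_latency_ms: float = 300.0  # effective end-to-end latency budget quoted in the abstract
    min_effort_streams: int = 1    # at least one effort-proximal stream must be available


def supervisory_gain(kappa_scheduled: float, kappa_fallback: float,
                     rolling_ece: float, latency_ms: float, effort_streams: int,
                     limits: Optional[HealthLimits] = None) -> Tuple[float, bool]:
    """Pass the load-aware gain through only while all health checks hold; otherwise revert."""
    limits = limits or HealthLimits()
    healthy = (rolling_ece <= limits.max_ece
               and latency_ms <= limits.max_latency_ms
               and effort_streams >= limits.min_effort_streams)
    return (kappa_scheduled if healthy else kappa_fallback), healthy


# Usage: nominal timing keeps the scheduled gain; a latency violation reverts to the fallback.
gain, ok = supervisory_gain(0.8, 0.2, rolling_ece=0.03, latency_ms=250, effort_streams=2)
```

A deployed gate would likely add hysteresis (separate trip and recovery thresholds) so that the reversion itself does not chatter near a limit. That refinement is omitted here for brevity.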

2.2. Ergonomic Regulation

From the literature review, several gaps were identified. Chief among them, ergonomic risk and workload are frequently estimated as monitoring outputs or post hoc audit scores, while adaptive autonomy and shared-control mechanisms often modulate assistance without an auditable linkage between estimator properties (calibration, rare-event precision, bounded error) and closed-loop behavior under non-ideal timing and sensing. This motivated the present work’s central proposal that ergonomic load be formulated as a control-relevant dynamical state, with explicit requirements on estimation quality and end-to-end latency, and with a bounded adaptation rule that is interpretable in regime terms (equilibrated, oscillatory, overload-prone). The following research questions are, therefore, posed directly from that gap and are used to guide the methodological choices in Section 2.3, Section 2.4 and Section 2.5, and to structure the interpretation in Section 4.
The first question examines the conditions under which an ergonomic-load estimator remains control-relevant when deployed across heterogeneous public datasets using a fixed windowed preprocessing interface. In particular, we assess which estimator properties must hold beyond nominal discrimination for the estimate to be decision-relevant in feedback, including calibration near the operating point, rare-event precision under class imbalance, and temporal consistency. We then evaluate how these properties degrade under realistic impairments that directly affect feedback viability, including injected latency, downsampling, packet-loss emulation, and channel ablation, thereby establishing operational limits for using the estimate as an input to adaptation.
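For concreteness, the rare-event labels and calibration error can be computed as below. This sketch makes two assumptions that may not match the paper's exact definitions. First, high-load events are defined by exceeding the training-split $q$-quantile of a load proxy, with $q = 0.90$ as quoted in the abstract. Second, ECE is the common equal-width binned estimator with 10 bins.

```python
import numpy as np


def high_load_labels(proxy_train, proxy_eval, q=0.90):
    """Binary rare-event labels: 1 where the proxy exceeds the training q-quantile."""
    threshold = float(np.quantile(np.asarray(proxy_train, dtype=float), q))
    labels = (np.asarray(proxy_eval, dtype=float) > threshold).astype(int)
    return labels, threshold


def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width binned ECE for predicted probabilities of the high-load event."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Bin b covers [b/n_bins, (b+1)/n_bins); a probability of exactly 1.0 goes to the last bin.
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            gap = abs(labels[in_bin].mean() - probs[in_bin].mean())
            ece += in_bin.mean() * gap  # weight each bin by its share of samples
    return float(ece)


# Usage: threshold from training data only, then score calibration on held-out predictions.
labels, thr = high_load_labels(np.arange(100), [50, 95])
ece = expected_calibration_error([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0])
```

Fixing the threshold on the training split keeps the positive rate on evaluation data free to drift. That drift is exactly what the impairment sweeps are meant to expose.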
The second question determines the measurable conditions under which embedding the estimated load into bounded gain moderation and compliance shaping yields beneficial closed-loop behavior, namely reduced overload incidence with preserved coordination proxies, rather than oscillatory redistribution, chattering, or throughput loss. Finally, the third question addresses auditability by testing whether the estimator–policy interface can be made reviewable and portable through explicit links between estimation error and end-to-end latency on one hand, and sufficient-condition style bounds and a fixed, operational regime-labeling rule on the other. These questions are addressed in the Discussion by interpreting the estimation and robustness findings as evidence for control-relevant estimator requirements, by attributing closed-loop outcomes and regime transitions to bounded adaptation under timing constraints, and by grounding stability-relevant claims in the stated auditability commitments, including leakage-free threshold selection, fixed constants, and computable upper bounds that connect assumptions to observed failure boundaries. For ease of reference, a consolidated symbol glossary is provided in Appendix C.
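A fixed, rule-based regime labeler can be sketched as follows. The specific statistics are hypothetical placeholders chosen for illustration: the fraction of time the load exceeds an overload level, and the share of a gain trace's non-DC spectral power above a cutoff frequency. Every threshold is likewise a placeholder. The point is only that the rule and its constants are frozen before any result is inspected.

```python
import numpy as np

# Frozen constants: fixed before any result is reported (values are illustrative only).
OVERLOAD_LEVEL = 1.0
MAX_OVERLOAD_FRAC = 0.05
MAX_OSC_POWER = 0.2


def oscillation_power(signal, cutoff_frac=0.1):
    """Share of non-DC spectral power above a cutoff bin (hypothetical oscillation index in [0, 1])."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    if np.allclose(x, 0.0):
        return 0.0  # a constant trace has no oscillation
    spec = np.abs(np.fft.rfft(x)) ** 2
    k0 = max(1, int(cutoff_frac * len(spec)))
    return float(spec[k0:].sum() / spec[1:].sum())


def label_regime(load, gain):
    """Map a load trajectory and a gain trace to one of three regime labels."""
    overload_frac = float(np.mean(np.asarray(load, dtype=float) > OVERLOAD_LEVEL))
    if overload_frac > MAX_OVERLOAD_FRAC:
        return "overload-prone"
    if oscillation_power(gain) > MAX_OSC_POWER:
        return "oscillatory"
    return "equilibrated"


# Usage: a gain that flips every step is labeled oscillatory even when load stays low.
n = 64
flip = np.where(np.arange(n) % 2 == 0, 0.8, 0.2)
regime = label_regime(np.full(n, 0.5), flip)
```

Checking overload before oscillation is itself a design choice: it gives overload precedence when both conditions hold.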

2.3. Coupled Human–Autonomy Load Model

2.3.1. Ergonomic Load as a Dissipative State

We model ergonomic load as a latent, continuous-time (or discretized) state that accumulates under task demand and dissipates via recovery. Let $L(t) \in \mathbb{R}_{\ge 0}$ denote the (scalar) ergonomic-load state, interpreted as a control-sufficient proxy for the operator’s momentary strain and fatigue potential. Let $u(t)$ denote an autonomy-side control input (e.g., gain schedules, arbitration weights, or compliance shaping parameters), and let $y(t)$ denote measured observables (physiology, myography/EMG proxies, and interaction/coordination traces). The minimal dynamical abstraction used throughout the experiments is a dissipative state update with bounded excitation:
$$\dot{L}(t) = -\lambda L(t) + \alpha\, d(t) + \omega(t), \qquad \lambda > 0, \quad \alpha \ge 0, \tag{1}$$
where $d(t)$ is an effective demand/exertion drive inferred from observables (defined in Section 2.4.1), and $\omega(t)$ is a bounded disturbance capturing unmodeled variability and sensor-driven residuals. For implementation on sampled datasets, we use the discrete-time counterpart
$$L_{k+1} = \rho L_k + \alpha\, d_k + \omega_k, \qquad 0 < \rho < 1, \tag{2}$$
which makes the recovery time constant explicit via $\rho = \exp(-\lambda \Delta t)$ for sampling interval $\Delta t$ (i.e., a time constant of $1/\lambda = -\Delta t / \ln \rho$). The load state is not assumed to be directly observed; instead, an estimate $\hat{L}_k$ is inferred from the multimodal signal window $W_k$ and, optionally, task-phase context $c_k$, yielding a decision variable for the autonomy. This separation is essential: the stability and regime claims are stated for the closed-loop map induced by (2) and the gain modulation in Section 2.3.2, while estimator quality is assessed by the evaluation metrics and controlled stress tests defined in the subsequent methodology subsections.
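The discrete load update and the relation between $\rho$, $\lambda$, and $\Delta t$ can be checked numerically with a short sketch. Two things here are assumptions for illustration: the clipping at zero, which keeps $L_k \in \mathbb{R}_{\ge 0}$ when disturbances are negative, and all parameter values.

```python
import numpy as np


def rho_from_lambda(lam: float, dt: float) -> float:
    """Per-step retention factor for recovery rate lam and sampling interval dt."""
    return float(np.exp(-lam * dt))


def time_constant_from_rho(rho: float, dt: float) -> float:
    """Invert rho = exp(-lam*dt) to recover the time constant 1/lam."""
    return float(-dt / np.log(rho))


def simulate_load(d, rho, alpha, omega=None, L0=0.0):
    """Iterate L_{k+1} = rho*L_k + alpha*d_k + omega_k, clipped at zero (assumption)."""
    d = np.asarray(d, dtype=float)
    w = np.zeros_like(d) if omega is None else np.asarray(omega, dtype=float)
    L = np.empty(len(d) + 1)
    L[0] = L0
    for k in range(len(d)):
        # The clip enforces L >= 0; it is inactive whenever L_k >= 0 and alpha*d_k + omega_k >= 0.
        L[k + 1] = max(0.0, rho * L[k] + alpha * d[k] + w[k])
    return L


# Usage: constant unit demand drives the load toward its fixed point.
dt, lam, alpha = 0.1, 0.5, 0.2
rho = rho_from_lambda(lam, dt)
L = simulate_load(np.ones(2000), rho, alpha)
steady = alpha * 1.0 / (1.0 - rho)
```

With $\omega_k = 0$ and constant demand $d$, the fixed point of the update is $L^\ast = \alpha d / (1 - \rho)$. A larger $\rho$ (slower recovery) therefore raises the steady load reached for the same demand.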
To couple ergonomics to coordination, we treat the autonomy and interaction dynamics as a generic input–output system with state $x_k$:
$$x_{k+1} = f(x_k, u_k, \nu_k), \qquad z_k = h(x_k), \qquad y_k = g(z_k, \epsilon_k), \tag{3}$$
where $z_k$ denotes task/coordination quantities of interest (e.g., tracking error, action phase, anomaly markers, or interaction intensity), and $\nu_k, \epsilon_k$ represent bounded disturbances. The closed-loop coupling is introduced in two places. The autonomy-side input depends on the estimated load, $u_k = \pi(\hat{L}_k, z_k)$. The demand drive depends on both measured effort proxies and coordination context, $d_k = \psi(W_k, z_k, c_k)$. In this way, ergonomics is not a post-hoc score but an internal state that participates in the loop.

2.3.2. Load-Aware Gain Moderation and Compliance Shaping

The autonomy adaptation mechanism is implemented as load-aware moderation of control gains and/or compliance shaping. Let $\kappa_k$ denote an adaptation gain (or, more generally, a vector of gains) used by the distributed controller or shared-control arbitration. We apply a schedule that is monotone non-increasing in $\hat{L}_k$:
$$\kappa_k = \kappa_{\min} + (\kappa_{\max} - \kappa_{\min})\, \sigma\!\left(\frac{L_0 - \hat{L}_k}{s}\right), \tag{4}$$
where $\sigma(\cdot)$ is a squashing function (e.g., logistic), $L_0$ is a nominal operating point, and $s > 0$ controls transition sharpness (smaller $s$ gives a steeper transition). This implements the design principle that aggressive adaptation is admissible only when the estimated load is low. As the load rises, gains are moderated to avoid oscillatory redistribution and overload trajectories. In parallel, we introduce a compliance-shaping term that acts as local damping in response to elevated load:
$$u_k = u_k^{\mathrm{base}} - \gamma\, \phi(\hat{L}_k)\, \Delta z_k, \tag{5}$$
where $u_k^{\mathrm{base}}$ is the baseline autonomy input (without ergonomics coupling) and $\Delta z_k$ is a coordination error or correction term. The map $\phi(\hat{L}_k) \in [0, 1]$ increases with load, thereby injecting additional damping when the loop is most prone to destabilizing gain–delay interactions. The closed-loop experiments treat $\{\kappa_{\min}, \kappa_{\max}, \gamma, L_0, s\}$ as design parameters. They probe regime transitions under controlled impairments (latency injection, downsampling, and channel ablation), as described in Section 2.5. The resulting outcomes are organized through the experiment matrix in Table 3 and are evaluated using the regime metrics and overload proxies reported in Table 4.
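The two mechanisms in (4) and (5) can be sketched as plain functions, under two stated assumptions. First, $\sigma$ is taken to be the logistic function, which the text gives as one example. Second, $\phi$ is assumed to be a logistic ramp centered at $L_0$; this is only one admissible choice of a load-increasing map into $[0, 1]$. All function names and parameter values are hypothetical.

```python
import math


def logistic(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gain_schedule(L_hat, k_min, k_max, L0, s):
    """Eq. (4): kappa is non-increasing in the estimated load L_hat."""
    return k_min + (k_max - k_min) * logistic((L0 - L_hat) / s)


def compliance_shaped_input(u_base, dz, L_hat, gamma, L0, s):
    """Eq. (5) with an ASSUMED phi(L) = logistic((L - L0) / s), which lies in [0, 1]."""
    phi = logistic((L_hat - L0) / s)  # hypothetical choice of phi; increases with load
    return u_base - gamma * phi * dz


# Illustrative values only: kappa shrinks from near k_max to near k_min as load rises.
for L_hat in (0.0, 0.5, 1.0):
    print(L_hat, round(gain_schedule(L_hat, 0.2, 1.0, 0.5, 0.1), 3))
```

Because the logistic argument is divided by $s$, shrinking $s$ sharpens the transition of $\kappa_k$ around $L_0$.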

2.4. Estimator and Feature Pipeline

2.4.1. Windowed Feature Map and Normalization

All datasets are converted into a common windowed representation to make estimator training and stress testing portable across heterogeneous sensing configurations. Let $S_k$ denote the multivariate signal segment in a window of duration $T$ ending at time index $k$. We define a feature map $\Phi(\cdot)$ producing a fixed-dimensional vector:
$$\varphi_k = \Phi(S_k) \in \mathbb{R}^{p}, \qquad S_k = \{y_{k-T+1}, \dots, y_k\}. \tag{6}$$
The feature map is organized into (i) effort-proximal features (FMG amplitude and dispersion; EMG envelope and spectral fatigue proxies), (ii) wearable physiology summaries (heart-rate-derived and variability-like surrogates when available; stress-session dynamics), and (iii) interaction/coordination context features (action phase, anomaly markers, and interaction intensity proxies when present). A representative (non-exhaustive) set is listed in Table 2. Each feature is normalized within the dataset using robust statistics to reduce sensitivity to outliers and instrumentation differences:
$$\tilde{\varphi}_{k,j} = \frac{\varphi_{k,j} - \operatorname{median}(\varphi_{\cdot,j})}{\operatorname{IQR}(\varphi_{\cdot,j}) + \epsilon}, \tag{7}$$
with $\epsilon > 0$ preventing division by zero. When cross-dataset transfer is evaluated, normalization parameters are learned on the source dataset and applied to the target dataset to avoid leakage.
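The normalization (7) with training-only statistics can be sketched with NumPy as below. The function names are hypothetical, and the small constant `eps` plays the role of $\epsilon$. Leakage is prevented by fitting the median and IQR on the training split and then reusing them unchanged on validation, test, or target data.

```python
import numpy as np


def fit_robust_scaler(train):
    """Per-feature median and IQR, estimated on training windows only (rows = windows)."""
    med = np.median(train, axis=0)
    q75, q25 = np.percentile(train, [75, 25], axis=0)
    return med, q75 - q25


def apply_robust_scaler(X, med, iqr, eps=1e-8):
    """Eq. (7): (phi - median) / (IQR + eps), applied identically to every split."""
    return (X - med) / (iqr + eps)


rng = np.random.default_rng(0)
train = rng.normal(5.0, 2.0, size=(1000, 3))
med, iqr = fit_robust_scaler(train)
test_scaled = apply_robust_scaler(rng.normal(5.0, 2.0, size=(10, 3)), med, iqr)
```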
Table 2. Signal categories and compact windowed features for $\varphi_k = \Phi(S_k)$. Datasets provide subsets; missing channels are handled via masking and ablations (Section 2.5.1).
| Category | Channels | Windowed Features (Compact) |
|---|---|---|
| Wearable physiology | HR/wearable | mean, slope, variability proxy, range |
| Effort-proximal | FMG/sEMG | RMS/envelope, spectral proxy, burst stats |
| Kinematics/posture | IMU/pose | speed/accel, jerk proxy, repetition |
| Interaction/context | phase/action/anomaly | phase ratio, transitions, intensity |
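To make the windowing in (6) concrete, the sketch below slides a length-$T$ window with a fixed stride over one channel. It then computes a few of the compact features named in Table 2: mean, linear slope, range, and RMS. Both helpers are hypothetical. The paper's actual feature set, spectral proxies, and masking logic are not reproduced here.

```python
import numpy as np


def windows(y, T, stride):
    """Yield (k, S_k) with S_k = y[k-T+1 .. k] for k = T-1, T-1+stride, ... (0-indexed)."""
    for k in range(T - 1, len(y), stride):
        yield k, y[k - T + 1 : k + 1]


def compact_features(S, dt=1.0):
    """Hypothetical subset of Table 2 for one 1-D channel: mean, slope, range, RMS."""
    t = np.arange(len(S)) * dt
    slope = np.polyfit(t, S, 1)[0]  # least-squares linear trend (highest degree first)
    return np.array([S.mean(), slope, S.max() - S.min(), np.sqrt(np.mean(S ** 2))])


y = np.arange(10, dtype=float)
ws = list(windows(y, T=4, stride=2))
features = np.vstack([compact_features(S) for _, S in ws])  # one row per window
```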

2.4.2. Identification of the Load Estimator

We learn a parametric estimator $f_\theta$ mapping windowed features and optional context to an estimated load:
$$\hat{L}_k = f_\theta(\tilde{\varphi}_k, c_k), \tag{8}$$
with $c_k$ encoding dataset-specific structure when available (e.g., labeled phases in structured exercise or stress protocols, industrial action segments, or anomaly intervals). Because datasets differ in label availability, identification is performed in two complementary modes. In the supervised or weakly supervised mode, a proxy target $\ell_k$ is derived from protocol phase labels or exertion-proximal channels (e.g., FMG/EMG intensity). It is then used to fit $f_\theta$ by minimizing
$$\min_\theta \sum_k w_k \big(\hat{L}_k - \ell_k\big)^2 + \eta \|\theta\|_2^2, \tag{9}$$
where $w_k$ reweights class-imbalanced phases and $\eta$ is a regularization parameter. In the cross-dataset setting, $\ell_k$ is treated as a dataset-specific proxy. The decision-relevant property is then evaluated by downstream stability and overload metrics (Table 4) rather than by proxy fit alone.
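Suppose $f_\theta$ is linear in the normalized features, $\hat L_k = \tilde\varphi_k^\top\theta$. This is an assumption made here for illustration; the text does not restrict $f_\theta$ to be linear. Under it, (9) is a weighted ridge regression with closed-form solution $\theta = (X^\top W X + \eta I)^{-1} X^\top W \ell$. A minimal NumPy sketch, with a hypothetical function name:

```python
import numpy as np


def fit_weighted_ridge(X, ell, w, eta):
    """Minimize sum_k w_k (x_k^T theta - ell_k)^2 + eta * ||theta||^2.

    Eq. (9) under the ASSUMPTION of a linear f_theta; add a constant column to X
    if an intercept is wanted.
    """
    Xw = X * w[:, None]                      # rows scaled by the class-balancing weights
    A = X.T @ Xw + eta * np.eye(X.shape[1])  # X^T W X + eta I
    b = Xw.T @ ell                           # X^T W ell
    return np.linalg.solve(A, b)


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
theta_true = np.array([0.5, -1.0, 2.0])
ell = X @ theta_true
theta_hat = fit_weighted_ridge(X, ell, np.ones(200), eta=1e-9)
```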
To align estimator behavior with the dissipative load dynamics assumed by (2), we optionally impose a temporal consistency penalty:
$$\min_\theta \sum_k \Big[ w_k \big(\hat{L}_k - \ell_k\big)^2 + \beta \big(\hat{L}_{k+1} - \rho \hat{L}_k - \alpha \hat{d}_k\big)^2 \Big], \tag{10}$$
where $\hat{d}_k$ is the demand proxy inferred from the same window (or from task context), and $(\rho, \alpha)$ are set from recovery assumptions or estimated from protocol segments with explicit recovery. This construction encourages $\hat{L}_k$ to be not merely predictive of labels but also compatible with a dynamical update. That compatibility is essential when the estimator is embedded in the closed-loop gain schedule (4). The end-to-end procedure for dataset standardization, estimator identification, and closed-loop evaluation is summarized in Algorithm 1.
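The objective (10) can be evaluated for any candidate estimator output, as in the sketch below. The helper is hypothetical: it computes the loss for given $\hat L_k$ values and leaves the choice of optimizer open. A trajectory that exactly satisfies the assumed dynamics and matches the proxy target incurs zero loss; perturbations raise both terms.

```python
import numpy as np


def consistency_objective(L_hat, ell, d_hat, w, beta, rho, alpha):
    """Eq. (10): weighted proxy fit plus beta times the squared one-step dynamics residual."""
    fit = np.sum(w * (L_hat - ell) ** 2)
    resid = L_hat[1:] - rho * L_hat[:-1] - alpha * d_hat[:-1]  # L_{k+1} - rho L_k - alpha d_k
    return fit + beta * np.sum(resid ** 2)


# Build a trajectory that follows the assumed dynamics exactly (illustrative values).
rho, alpha = 0.9, 0.1
d = np.linspace(0.0, 1.0, 50)
L = np.zeros(50)
for k in range(49):
    L[k + 1] = rho * L[k] + alpha * d[k]
```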
Algorithm 1 Workflow for dataset standardization, estimator identification, and closed-loop evaluation. The procedure is implemented using common numerical tooling; the pseudo-code expresses the logical steps required for reproducibility and cross-dataset portability, independent of software dependencies.
Require: Raw dataset $D$ with time-indexed channels $y(t)$; optional context labels $c(t)$
Ensure: Trained estimator $f_\theta$; evaluation reports $R$
1: Ingest and align: load all channels; resample to a common rate if needed; align timestamps; define validity masks for missing channels
2: Windowing: choose window length $T$ and stride $\Delta$; for each window $k$, set $S_k \leftarrow \{y_{k-T+1}, \dots, y_k\}$ and $c_k \leftarrow$ context in window (if available)
3: Feature extraction: compute $\varphi_k \leftarrow \Phi(S_k)$ using the feature definitions in Table 2
4: Normalization: split windows into train/validation/test by protocol blocks or subject blocks when available; compute robust statistics on training data only; apply (7) to all splits
5: Proxy target construction: if supervised or weakly supervised, define $\ell_k \leftarrow$ phase-based proxy or effort-proximal proxy
6: Estimator training: initialize $\theta$; repeat until convergence: compute $\hat{L}_k \leftarrow f_\theta(\tilde{\varphi}_k, c_k)$ and update $\theta$ by minimizing (9) or (10)
7: Closed-loop preparation: define the gain schedule $\kappa(\hat{L}_k)$ via (4); define compliance shaping via (5); define impairment operators $\tau$, $r$, $p$, and $A$
8: Stress tests and ablations: for each impairment setting $(\tau, r, p, A)$, apply the impairment, recompute $\hat{L}_k$, run closed-loop surrogate or replay evaluation, and compute the metrics in Table 4
9: Reporting: aggregate metrics across datasets and segments; produce regime maps over $(\tau, \kappa_{\max}, \gamma)$; return $R$ with the tables and plots referenced in Section 2.5

2.5. Experimental Protocol and Metrics

2.5.1. Experiments, Stress Tests, and Ablations

Experiments are structured to separate (i) estimator fidelity as a signal-processing object from (ii) its control relevance when embedded in the load-aware loop. Accordingly, we run three experiment families summarized in Table 3. First, Estimator identification and within-dataset validation uses protocol phases (when present) or exertion-proximal surrogates to quantify prediction fidelity and temporal consistency under (2). Second, Cross-dataset portability evaluates whether the same feature/estimator interface yields stable behavior across distinct data sources without dataset-specific tuning beyond normalization, with success measured primarily by robustness and overload metrics rather than proxy-fit alone. Third, Closed-loop impairment studies explicitly degrade the sensing–communication loop through (a) latency injection, (b) downsampling, (c) packet-loss emulation, and (d) channel ablation, thereby turning claims about real-time feedback into falsifiable tests. The impairment operators act on the window stream prior to estimation, so that estimator uncertainty propagates into the gain schedule (4) and compliance shaping (5), exposing oscillation and overload regimes under realistic constraints.
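The four impairment families can be sketched as operators on a per-window stream, applied before the estimator sees the data. The concrete forms below are assumptions for illustration rather than the paper's implementation:
- latency is an integer shift, with the first value held during warm-up;
- downsampling is decimation followed by zero-order hold, which keeps the stream length aligned;
- packet loss is Bernoulli drops with last-value hold;
- ablation masks whole channels to NaN.

All names are hypothetical.

```python
import numpy as np


def inject_latency(x, delta):
    """Delay a stream by delta steps; the first value is repeated during warm-up (assumed)."""
    if delta <= 0:
        return x.copy()
    return np.concatenate([np.repeat(x[:1], delta, axis=0), x[:-delta]], axis=0)


def downsample_hold(x, r):
    """Keep every r-th sample and hold it until the next kept sample (zero-order hold)."""
    idx = (np.arange(len(x)) // r) * r
    return x[idx]


def packet_loss(x, p, rng):
    """Drop each sample with probability p; a dropped sample repeats the last received value."""
    out = x.copy()
    for k in range(1, len(x)):
        if rng.random() < p:
            out[k] = out[k - 1]
    return out


def ablate(X, channels):
    """Channel ablation: mask whole columns to NaN so downstream masking can detect them."""
    X = X.astype(float)  # astype returns a copy, so the caller's array is untouched
    X[:, list(channels)] = np.nan
    return X
```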
Table 3. Experiment matrix (summary). Experiments are run across datasets in Section 2.1; outcomes use Table 4 and regime maps.
| ID | Family | Factors | Outputs |
|---|---|---|---|
| E1 | Estimator | $T, \Delta, \eta, \beta$ | RMSE, rank corr., smoothness |
| E2 | Portability | source → target, norm | generalization drop, robustness |
| E3 | Closed-loop | $\tau, r, p, A, \kappa_{\min}/\kappa_{\max}, \gamma$ | overload/oscillation, regime label |

2.5.2. Metrics, Overload Proxies, and Regime Mapping

Metrics are chosen to reflect both estimation quality and closed-loop consequences, avoiding reliance on a single scalar score. The full suite is listed in Table 4. It is computed per dataset segment and then aggregated across segments using robust summaries (median and interquartile range).

For estimator fidelity, we report three quantities:
- pointwise error against proxy targets, when available;
- rank-based association, which reduces sensitivity to proxy scaling;
- temporal consistency indicators aligned with (2).

For closed-loop behavior, we report overload-proxy metrics that operationalize boundedness and instability. These are peak load, time spent above a high-load threshold, oscillation energy in $\hat{L}_k$ and in coordination error proxies, and hysteresis indicators under demand transitions.

Regime mapping is obtained by sweeping impairment parameters (e.g., latency $\tau$) and adaptation parameters (e.g., $\kappa_{\max}$ and $\gamma$) on a grid. Each grid point is labeled as equilibrated, oscillatory, or overload-prone based on thresholded combinations of the overload and oscillation metrics. This produces an interpretable partition of operating conditions, which is reported as regime maps in the Results section and cross-referenced to the experimental factors in Table 3.
Table 4. Metric suite used in the evaluation protocol. Estimator metrics are reported when a proxy target $\ell_k$ is available; closed-loop metrics are reported for all datasets via $\hat{L}_k$ and task/coordination observables. Symbols: $\hat{L}_k$, estimated load; $\ell_k$, proxy target; $z_k$, coordination observable; $N$, number of windows.
| Group | Metric | Definition/Interpretation |
|---|---|---|
| Estimator fidelity | RMSE | $\sqrt{\frac{1}{N}\sum_{k=1}^{N}(\hat{L}_k - \ell_k)^2}$ (proxy-available segments only) |
| Estimator fidelity | Spearman $\rho$ | Rank association between $\hat{L}_k$ and $\ell_k$; robust to monotone rescalings |
| Estimator dynamics | Smoothness/variation | $\frac{1}{N-1}\sum_{k=1}^{N-1}\lvert\hat{L}_{k+1} - \hat{L}_k\rvert$; detects chattering under noise |
| Estimator dynamics | Consistency residual | $\frac{1}{N-1}\sum_{k=1}^{N-1}\lvert\hat{L}_{k+1} - \rho\hat{L}_k - \alpha\hat{d}_k\rvert$; aligns with (2) |
| Closed-loop overload | Peak load | $\max_k \hat{L}_k$; worst-case strain proxy |
| Closed-loop overload | Time-above-threshold | $\frac{1}{N}\sum_{k=1}^{N}\mathbb{I}[\hat{L}_k \ge L_{\mathrm{hi}}]$; overload incidence proxy |
| Closed-loop oscillation | Load oscillation energy | $\frac{1}{N}\sum_{k=1}^{N}(\hat{L}_k - \bar{\hat{L}})^2$, or band-limited power around the dominant frequency |
| Closed-loop oscillation | Coordination oscillation | Same as above for $z_k$ (e.g., tracking error or interaction intensity), to detect redistribution oscillations |
| Transition behavior | Hysteresis index | Difference in equilibrium levels before/after demand steps; operationalized by segmented means around transitions |
| Robustness | Degradation slopes | Metric change rates versus impairment parameters (e.g., $\partial/\partial\tau$), summarizing sensitivity to delay/downsampling |
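Several entries of Table 4 reduce to a few array operations. The sketch below computes them for one segment; the names are hypothetical. The Spearman coefficient is computed as the Pearson correlation of ranks. That matches Spearman's definition only when there are no ties; otherwise a tie-aware ranking such as `scipy.stats.rankdata` would be needed. Band-limited oscillation power, hysteresis, and degradation slopes are omitted.

```python
import numpy as np


def ranks_no_ties(a):
    """Ranks 0..N-1; valid as Spearman ranks only when a has no ties."""
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(len(a))
    return r


def segment_metrics(L_hat, ell, d_hat, rho, alpha, L_hi):
    """A subset of the Table 4 metrics for one segment of N windows."""
    return dict(
        rmse=float(np.sqrt(np.mean((L_hat - ell) ** 2))),
        spearman=float(np.corrcoef(ranks_no_ties(L_hat), ranks_no_ties(ell))[0, 1]),
        variation=float(np.mean(np.abs(np.diff(L_hat)))),  # (1/(N-1)) sum |L_{k+1} - L_k|
        consistency=float(np.mean(np.abs(L_hat[1:] - rho * L_hat[:-1] - alpha * d_hat[:-1]))),
        peak=float(L_hat.max()),
        time_above=float(np.mean(L_hat >= L_hi)),
        osc_energy=float(np.mean((L_hat - L_hat.mean()) ** 2)),
    )


L_hat = np.array([0.1, 0.4, 0.9, 0.3, 0.95])
ell = np.array([0.2, 0.5, 0.8, 0.35, 0.9])
m = segment_metrics(L_hat, ell, np.zeros(5), rho=0.9, alpha=0.1, L_hi=0.9)
```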

2.6. Sufficient Conditions, ISS Regimes, and Delay Robustness

2.6.1. Operational Definitions and Assumptions

To make the latent and control-sufficient load state operational and reviewable, we fix the semantics of three quantities: (i) the estimated load scale $\hat{L}_k$, (ii) the demand drive $d_k$, and (iii) the proxy target $\ell_k$ used for identification. Throughout, the estimator outputs a dimensionless load variable on a normalized scale,
$$\hat{L}_k \in [0, 1], \qquad \hat{L}_k := \sigma\!\left(\frac{\hat{s}_k - \mu_{\mathrm{tr}}}{\sigma_{\mathrm{tr}} + \epsilon}\right), \tag{11}$$
where $\hat{s}_k$ is the raw estimator score (arbitrary units), and $\mu_{\mathrm{tr}}, \sigma_{\mathrm{tr}}$ are training-only normalization parameters computed under the split rules in Section 2.7. The map $\sigma(\cdot)$ is a logistic squashing function that makes thresholds interpretable on $[0, 1]$. This explicitly decouples signal units (dataset-dependent) from the control variable used by the loop.
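The scale (11) amounts to standardizing the raw score with training-only statistics and passing it through a logistic. A minimal sketch, with hypothetical names, follows. The clipping step only guards against floating-point overflow in `np.exp`; it does not change the output at double precision.

```python
import numpy as np


def fit_score_scaler(train_scores):
    """Training-only mean and standard deviation of the raw estimator score s_hat."""
    return float(np.mean(train_scores)), float(np.std(train_scores))


def normalized_load(s_hat, mu_tr, sd_tr, eps=1e-8):
    """Eq. (11): logistic squashing of the standardized score onto [0, 1]."""
    z = (np.asarray(s_hat, dtype=float) - mu_tr) / (sd_tr + eps)
    z = np.clip(z, -50.0, 50.0)  # overflow guard only
    return 1.0 / (1.0 + np.exp(-z))


mu, sd = fit_score_scaler(np.array([1.0, 2.0, 3.0]))
L_norm = normalized_load([mu, mu + 10 * sd, mu - 10 * sd], mu, sd)
```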
The demand drive $d_k$ is a constructed exogenous input summarizing task/effort excitation in a window, with a fixed form that is reproducible across datasets:
$$d_k := \sum_{m \in \mathcal{M}} w_m M_{k,m} \tilde{e}_{k,m} + w_z \tilde{q}_k, \qquad \sum_{m \in \mathcal{M}} w_m + w_z = 1, \tag{12}$$
where $\mathcal{M}$ indexes effort-proximal modalities (e.g., FMG, sEMG, wearable stress proxies when present), and $\tilde{e}_{k,m}$ are robust-normalized effort features (Section 2.4.1). $M_{k,m} \in \{0, 1\}$ is a channel-availability mask. $\tilde{q}_k$ is a normalized coordination-intensity proxy derived from $z_k$ (e.g., action intensity, anomaly density, or phase-weighted interaction magnitude). The weights $\{w_m, w_z\}$ are fixed a priori (Table A1) and are not tuned on test data.
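Equation (12) with channel-availability masking can be sketched as follows. The helper is hypothetical, and the weights in the example are placeholders, not the fixed values of Table A1. Masked channels contribute zero, so a missing modality does not propagate into $d_k$ even when its feature value is NaN.

```python
import numpy as np


def demand_drive(e_tilde, mask, q_tilde, w_mod, w_z):
    """Eq. (12): d_k = sum_m w_m M_{k,m} e~_{k,m} + w_z q~_k for every window k.

    e_tilde, mask: arrays of shape (N, |M|); q_tilde: shape (N,).
    """
    w_mod = np.asarray(w_mod, dtype=float)
    if not np.isclose(w_mod.sum() + w_z, 1.0):
        raise ValueError("weights must satisfy sum_m w_m + w_z = 1")
    e = np.where(mask.astype(bool), e_tilde, 0.0)  # unavailable channels contribute nothing
    return e @ w_mod + w_z * np.asarray(q_tilde, dtype=float)


# Placeholder inputs: the second window has its second modality missing (NaN, mask 0).
e = np.array([[1.0, 2.0], [1.0, np.nan]])
M = np.array([[1, 1], [1, 0]])
d = demand_drive(e, M, np.array([0.5, 0.5]), [0.3, 0.3], 0.4)
```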
The proxy target $\ell_k$, used only for estimator identification, is defined distinctly from $d_k$ to avoid conflating excitation with supervision labels. If protocol phase labels exist (e.g., structured stress/exercise), we define a phase-to-load map $\chi(\cdot)$,
$$\ell_k := \chi(c_k) \in [0, 1]; \tag{13}$$
otherwise, we use an effort-proximal proxy derived from high-SNR channels (e.g., FMG/EMG envelope statistics), squashed to $[0, 1]$ with training-only parameters as in (11). In all cases, $\ell_k$ is not used for regime labeling. Regime labels are computed from $\hat{L}_k$ and closed-loop metrics (Section 2.7.2), which prevents circularity.
The concrete, fixed rule table for constructing $d_k$ and $\ell_k$ (including globally fixed weights and the phase mapping) is reported in Appendix A to reduce notational load. The analysis here depends only on the boundedness and no-leakage commitments stated above.
We analyze the coupled loop using explicit assumptions that match the evaluation interface.
Assumption 1 
(Dissipative load and bounded excitation). The load state obeys (2) with $0 < \rho < 1$ and bounded demand/disturbance: $0 \le d_k \le d_{\max}$ and $|\omega_k| \le \omega_{\max}$.
Assumption 2 
(Plant ISS w.r.t. autonomy input). The coordination subsystem (3) is input-to-state stable (ISS) with respect to $u_k$ and disturbances. That is, there exist a class-$\mathcal{KL}$ function $\beta(\cdot,\cdot)$ and class-$\mathcal{K}$ functions $\gamma_u(\cdot)$, $\gamma_\nu(\cdot)$ such that
$$\|x_k\| \le \beta(\|x_0\|, k) + \gamma_u\Big(\sup_{0 \le i < k}\|u_i\|\Big) + \gamma_\nu\Big(\sup_{0 \le i < k}\|\nu_i\|\Big). \tag{14}$$
Assumption 3 
(Estimator bounded error and Lipschitz policy). The estimator error is bounded: $|\hat{L}_k - L_k| \le \varepsilon_L$. The load-aware policy is $u_k = \pi(\hat{L}_{k-\delta}, z_k)$, where $\delta \ge 0$ is the loop delay in steps (Section 2.6.3). It is Lipschitz in its load argument with constant $K_L$, uniformly in $z_k$ and across the parameter sweeps: $\|\pi(a, z) - \pi(b, z)\| \le K_L |a - b|$.

2.6.2. Lyapunov/ISS Conditions and Elimination of Overload Attractors

We formalize overload as the existence of an invariant attracting set in which load remains persistently high.
Definition 1 
(Overload set and overload attractor). Fix a threshold $L_{\mathrm{hi}} \in (0, 1)$. The overload set is $\mathcal{X}_{\mathrm{hi}} := \{(x, L) : L \ge L_{\mathrm{hi}}\}$. An overload attractor is a compact invariant set $\mathcal{A} \subseteq \mathcal{X}_{\mathrm{hi}}$ that attracts a nontrivial neighborhood under the closed-loop map.
A first, dataset-independent boundedness guarantee follows directly from dissipativity.
Proposition 1 
(Uniform boundedness of the load state). Under Assumption 1, for any $L_0 \ge 0$ the load satisfies
$$0 \le L_k \le \rho^k L_0 + \frac{\alpha d_{\max} + \omega_{\max}}{1 - \rho}, \qquad k \ge 0. \tag{15}$$
Proof. 
Iterate (2) and bound the geometric series using 0 < ρ < 1 and the uniform bounds on d k and ω k . □
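Proposition 1 can also be checked numerically. The sketch below simulates (2) with random inputs drawn inside the bounds of Assumption 1 and asserts the upper bound of (15) at every step. All parameter values are illustrative, and the helper names are hypothetical.

```python
import numpy as np


def prop1_bound(L0, rho, alpha, d_max, w_max, k):
    """Right-hand side of (15): rho^k L0 + (alpha d_max + w_max) / (1 - rho)."""
    return rho ** k * L0 + (alpha * d_max + w_max) / (1.0 - rho)


def check_prop1(rho=0.9, alpha=0.5, d_max=1.0, w_max=0.05, L0=2.0, n=300, seed=0):
    """Simulate Eq. (2) with admissible random inputs and test the bound at every step."""
    rng = np.random.default_rng(seed)
    L = L0
    for k in range(n):
        assert L <= prop1_bound(L0, rho, alpha, d_max, w_max, k) + 1e-12
        d = rng.uniform(0.0, d_max)     # 0 <= d_k <= d_max
        w = rng.uniform(-w_max, w_max)  # |omega_k| <= omega_max
        L = rho * L + alpha * d + w
    return True
```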
To connect the load bound to the closed loop, we use an ISS-style composite Lyapunov argument. Let $V_k := V(x_k) + c L_k$, where $V(\cdot)$ is an ISS-Lyapunov function for the plant guaranteed by Assumption 2 and $c > 0$ is a coupling weight. The key design requirement concerns the effective loop gain induced by load-aware adaptation. It must be sufficiently small when load is high (gain moderation), so that the coupled interconnection satisfies a small-gain condition. Concretely, we require the policy schedule to satisfy
$$\sup_{L \in [L_{\mathrm{hi}}, 1]} \|\pi(L, z)\| \le u_{\mathrm{hi}}, \qquad \gamma_u(u_{\mathrm{hi}}) \le \bar{x}_{\mathrm{hi}}, \tag{16}$$
for a chosen bound $\bar{x}_{\mathrm{hi}}$ compatible with the desired operating regime. In words, once load exceeds $L_{\mathrm{hi}}$, the policy caps the autonomy action magnitude. This prevents aggressive adaptation from sustaining a high-load self-excitation loop.
Theorem 1 
(Sufficient condition for elimination of overload attractors). Suppose Assumptions 1–3 hold. If the gain schedule and compliance shaping are chosen such that (16) holds and
$$\rho + \alpha\bar{\psi} < 1, \qquad \bar{\psi} := \sup_{(x, L) \in \mathcal{X}_{\mathrm{hi}}} \left|\frac{\partial d}{\partial z}\right| \left|\frac{\partial z}{\partial u}\right| \left|\frac{\partial u}{\partial L}\right|, \tag{17}$$
then the closed-loop system admits no overload attractor in $\mathcal{X}_{\mathrm{hi}}$. Moreover, for all sufficiently small disturbances and estimator error, trajectories enter and remain in a sublevel set $\{(x, L) : L \le L^{\star}\}$ with $L^{\star} < L_{\mathrm{hi}}$.
The explicit bound construction for $\bar{\psi}$ is provided in Appendix B.
Proof sketch (reviewable and constructive). 
Under Assumption 2, the plant admits an ISS-Lyapunov function $V(x)$ with a decrease inequality of the form $V_{k+1} - V_k \le -W(x_k) + \sigma_u(\|u_k\|) + \sigma_\nu(\|\nu_k\|)$. Under (16), $\sigma_u(\|u_k\|)$ is uniformly bounded in $\mathcal{X}_{\mathrm{hi}}$. The load update (2) depends on $d_k$, which depends on $z_k$ and hence on $x_k$ and $u_k$. The composite loop gain along $L \to u \to z \to d \to L$ is therefore upper-bounded by $\rho + \alpha\bar{\psi}$. The strict inequality (17) implies contraction of the load component in the high-load region. Hence $L$ cannot remain invariant above $L_{\mathrm{hi}}$ unless disturbances dominate. The estimator error enters via Assumption 3 and adds a bounded perturbation to the contraction. This perturbation can be absorbed into the disturbance term, yielding forward invariance of a sublevel set below $L_{\mathrm{hi}}$ for sufficiently small $\varepsilon_L$ and disturbances. Therefore, no compact attracting invariant set can lie entirely in $\mathcal{X}_{\mathrm{hi}}$. □

2.6.3. Delay-Stability Analysis and Regime Transitions

We also treat sensing/communication delay explicitly. Let the policy depend on a delayed load estimate $\hat{L}_{k-\delta}$, where $\delta = \tau / \Delta t$ for injected latency $\tau$ and window stride $\Delta t$. Under Assumption 3, the delayed policy introduces a phase-lag-like effect that increases the effective loop gain. A conservative sufficient condition is obtained by bounding the induced deviation:
$$\|u_k - u_k^{(0)}\| \le K_L \big|\hat{L}_{k-\delta} - \hat{L}_k\big| \le K_L \sum_{i=k-\delta}^{k-1} \big|\hat{L}_{i+1} - \hat{L}_i\big|, \tag{18}$$
where $u_k^{(0)} = \pi(\hat{L}_k, z_k)$ is the zero-delay action. In the equilibrated regime, $\hat{L}_k$ has small total variation, making (18) negligible. In the oscillatory regime, variation grows and delay amplifies the oscillation energy. We use this mechanism to define a delay-aware regime boundary by requiring that the delay-amplified effective loop gain remain below unity:
$$\rho + \alpha\bar{\psi} + \alpha\bar{\psi}_\delta < 1, \qquad \bar{\psi}_\delta := K_L\, \Gamma(\delta), \tag{19}$$
where $\Gamma(\delta)$ is an empirically reported bound on the per-step variation accumulated over $\delta$ steps, estimated on training blocks only. Condition (19) is an explicit sufficient condition under which increased latency cannot induce an overload attractor. It also motivates the impairment sweeps in Table 3 as regime probes rather than as descriptive stress tests.
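Both sufficient conditions reduce to a scalar margin, as the sketch below shows (hypothetical names). It estimates $\Gamma(\delta)$ on a training block as the largest accumulated variation $\sum_{i=k-\delta}^{k-1}|\hat L_{i+1}-\hat L_i|$ over any $\delta$ consecutive steps. This is one reading of "an empirically reported bound" and is assumed here. The function returns $1 - (\rho + \alpha\bar\psi + \alpha K_L\Gamma(\delta))$. A positive value means (19) holds, and with $\delta = 0$ the margin reduces to (17). $\bar\psi$ and $K_L$ are treated as given inputs.

```python
import numpy as np


def gamma_delta(L_train, delta):
    """Empirical Gamma(delta): largest variation accumulated over delta consecutive steps."""
    if delta <= 0:
        return 0.0
    steps = np.abs(np.diff(np.asarray(L_train, dtype=float)))
    if len(steps) < delta:
        raise ValueError("training block shorter than the delay horizon")
    window_sums = np.convolve(steps, np.ones(delta), mode="valid")  # sliding sums of delta steps
    return float(window_sums.max())


def delay_small_gain_margin(rho, alpha, psi_bar, K_L, L_train, delta):
    """1 - (rho + alpha*psi_bar + alpha*K_L*Gamma(delta)); > 0 means Eq. (19) holds."""
    return 1.0 - (rho + alpha * psi_bar + alpha * K_L * gamma_delta(L_train, delta))


L_train = np.array([0.0, 0.1, 0.0, 0.1, 0.0, 0.1])  # illustrative oscillating training block
m0 = delay_small_gain_margin(0.8, 0.5, 0.2, 0.5, L_train, 0)  # reduces to Eq. (17)
m2 = delay_small_gain_margin(0.8, 0.5, 0.2, 0.5, L_train, 2)
```

Because $\Gamma(\delta)$ is non-decreasing in $\delta$ under this definition, the margin can only shrink as latency grows. This matches the direction of the delay sweeps.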

2.7. Closed-Loop Execution Modes with Rules

2.7.1. Closed Loop vs. Replay

The evaluation contains two distinct execution modes, closed-loop simulation (CL-Sim) and counterfactual replay (CL-Replay). Stability claims are attached only to the mode that implements feedback into a controlled plant.
In CL-Sim, the controlled plant is an identified low-order surrogate that is explicit and reproducible:
$$\hat{x}_{k+1} = A\hat{x}_k + B u_k + w_k, \qquad \hat{z}_k = C\hat{x}_k, \tag{20}$$
where $(A, B, C)$ are fit on training blocks using regularized least squares (no test leakage), and $w_k$ is a bounded residual calibrated on validation blocks. This is the plant to which Theorem 1 and the delay condition (19) are operationally connected. In CL-Replay, the paper does not claim closed-loop stabilization of the recorded interaction. Rather, it reports whether the proposed estimator–policy interface would prescribe bounded actions and reduced overload metrics under impairments, making the interpretation explicitly counterfactual.
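The surrogate identification in (20) can be sketched as two ridge-regularized least-squares problems. One fits $[A\ B]$ from one-step transitions; the other fits $C$ from the output map. This is an illustration only. The helper name is hypothetical, and the stability constraints mentioned in Section 2.8 are not enforced here; the spectral radius of $A$ can be checked after fitting.

```python
import numpy as np


def fit_surrogate(X, U, Z, reg=1e-6):
    """Ridge least-squares fit of x_{k+1} = A x_k + B u_k and z_k = C x_k (Eq. (20)).

    X: (N, n) surrogate states, U: (N, m) constructed inputs, Z: (N, q) outputs,
    all taken from training blocks only.
    """
    Phi = np.hstack([X[:-1], U[:-1]])                           # regressors [x_k, u_k]
    G = Phi.T @ Phi + reg * np.eye(Phi.shape[1])
    AB = np.linalg.solve(G, Phi.T @ X[1:]).T                    # (n, n + m) = [A B]
    n = X.shape[1]
    A, B = AB[:, :n], AB[:, n:]
    C = np.linalg.solve(X.T @ X + reg * np.eye(n), X.T @ Z).T  # (q, n)
    return A, B, C


# Illustrative check on a known, noise-free linear system.
rng = np.random.default_rng(0)
A0 = np.array([[0.9, 0.1], [0.0, 0.8]])
B0 = np.array([[0.5], [1.0]])
C0 = np.array([[1.0, -1.0]])
N = 400
U = rng.normal(size=(N, 1))
X = np.zeros((N, 2))
for k in range(N - 1):
    X[k + 1] = A0 @ X[k] + B0 @ U[k]
A, B, C = fit_surrogate(X, U, X @ C0.T, reg=1e-9)
```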

2.7.2. Regime Labeling Rule, Thresholds, Parameter Ranges, and Uncertainty

Regime labels are assigned by a fully specified decision rule based on metrics computed from $\hat{L}_k$ and (in CL-Sim) from $\hat{z}_k$. Let $T_{\mathrm{hi}}$ be the fraction of windows above $L_{\mathrm{hi}}$ (Table 4). Let $P_{\mathrm{osc}}$ be a band-limited oscillation power of $\hat{L}_k$, computed by discrete Fourier transform on non-overlapping segments of fixed length $N_{\mathrm{seg}}$. The regime label $r \in \{\mathrm{EQ}, \mathrm{OSC}, \mathrm{OL}\}$ is assigned as follows:
$$r := \begin{cases} \mathrm{OL}, & T_{\mathrm{hi}} \ge \theta_{\mathrm{hi}}, \\ \mathrm{OSC}, & T_{\mathrm{hi}} < \theta_{\mathrm{hi}} \text{ and } P_{\mathrm{osc}} \ge \theta_{\mathrm{osc}}, \\ \mathrm{EQ}, & \text{otherwise}, \end{cases} \tag{21}$$
with thresholds defined once using training data only and then fixed for all reported test results, as summarized in Table 5.
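The decision rule (21) can be sketched as below. The band-power computation is an assumed concrete form of $P_{\mathrm{osc}}$. It takes the mean DFT power inside a frequency band (in cycles per window step) over non-overlapping segments of length $N_{\mathrm{seg}}$, after removing each segment's mean. The band edges and thresholds in any call are placeholders for the fixed constants in Table 5, and the function names are hypothetical.

```python
import numpy as np


def band_power(L_hat, n_seg, band):
    """Mean in-band DFT power of L_hat over non-overlapping segments of length n_seg."""
    L_hat = np.asarray(L_hat, dtype=float)
    n_full = len(L_hat) // n_seg
    segs = L_hat[: n_full * n_seg].reshape(n_full, n_seg)
    segs = segs - segs.mean(axis=1, keepdims=True)            # remove per-segment DC
    spec = np.abs(np.fft.rfft(segs, axis=1)) ** 2 / n_seg ** 2  # sinusoid of amplitude a -> a^2 / 4
    freqs = np.fft.rfftfreq(n_seg)                             # cycles per sample
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(spec[:, in_band].sum(axis=1).mean())


def regime_label(L_hat, L_hi, theta_hi, theta_osc, n_seg, band):
    """Eq. (21): OL if time-above-threshold is high, else OSC if band power is high, else EQ."""
    T_hi = float(np.mean(np.asarray(L_hat) >= L_hi))
    if T_hi >= theta_hi:
        return "OL"
    if band_power(L_hat, n_seg, band) >= theta_osc:
        return "OSC"
    return "EQ"


k = np.arange(256)
args = dict(L_hi=0.8, theta_hi=0.2, theta_osc=1e-3, n_seg=64, band=(0.05, 0.1))  # placeholders
```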
All model and impairment parameters introduced in Section 2.3, Section 2.4, Section 2.5, Section 2.6.1, Section 2.6.2 and Section 2.6.3 are constrained to the explicit ranges reported in Table 6 to ensure the method is reproducible and reviewable.
To prevent pseudo-replication on human datasets, we enforce a subject-aware validation plan whenever subject identifiers are available. Specifically, splits are performed at the subject level (all windows of a subject belong to exactly one split), metrics are first aggregated per subject, and then summarized across subjects by robust statistics. Uncertainty is reported via nonparametric bootstrap confidence intervals over subjects (CL-Sim and CL-Replay), or via contiguous block bootstrap when subject IDs are not available. The exact aggregation is as follows:
$$\mathrm{Report} := \operatorname{median}_{s \in \mathcal{S}} \mathrm{metric}(s), \qquad \mathrm{metric}(s) := \frac{1}{N_s} \sum_{k \in \mathcal{K}_s} \mathrm{metric}_k, \tag{22}$$
where $\mathcal{S}$ is the set of subjects, $\mathcal{K}_s$ indexes windows belonging to subject $s$, and $N_s = |\mathcal{K}_s|$. This makes the reported regime statistics interpretable as subject-level effects rather than inflated window counts. The non-negotiable methodological commitments used to prevent circularity, leakage, and pseudo-replication are consolidated in Table 7.
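Equation (22) and the subject-level bootstrap can be sketched in a few lines. The names are hypothetical, and the percentile bootstrap is one standard nonparametric choice. The contiguous block bootstrap used when subject IDs are missing is not shown.

```python
import numpy as np


def subject_report(metric_by_window, subject_ids):
    """Eq. (22): mean within each subject, then median across subjects."""
    subjects = np.unique(subject_ids)
    per_subject = np.array([metric_by_window[subject_ids == s].mean() for s in subjects])
    return float(np.median(per_subject)), per_subject


def bootstrap_ci(per_subject, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap over subjects for the median of per-subject means."""
    rng = np.random.default_rng(seed)
    n = len(per_subject)
    stats = [np.median(rng.choice(per_subject, size=n, replace=True)) for _ in range(n_boot)]
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)


# Illustrative: six windows from three subjects (two windows each).
m = np.array([1.0, 3.0, 2.0, 2.0, 10.0, 10.0])
ids = np.array(["a", "a", "b", "b", "c", "c"])
rep, per = subject_report(m, ids)
```

Resampling subjects rather than windows keeps the interval width tied to the number of subjects, which is what guards against pseudo-replication.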
Numerical constants, threshold selection rules, and sweep grids are consolidated in Appendix A.

2.8. Auditability of Gain Bounds, CL-Sim I/O Semantics, and Fixed Regime Constants

CL-Sim Input/Output Semantics and Identifiability Statement

CL-Sim is explicitly a closed-loop simulation on an identified surrogate. It is not a claim that the original recorded human–swarm or human–robot system was physically stabilized in real time. In CL-Sim, the output $z_k$ is a dataset-derived coordination observable, selected once per dataset family and then fixed. For human–swarm traces, it is an interaction/coordination intensity proxy (e.g., normalized interface activity or aggregate deviation from nominal swarm configuration). For industrial HRC datasets, it is an action-intensity proxy (e.g., normalized activity level), or an anomaly score when anomaly annotations exist. The surrogate state $\hat{x}_k$ is a low-dimensional embedding obtained by stacking lagged $z$ features and (when present) auxiliary observables (Section 2.4.1), and the surrogate output is $\hat{z}_k = C\hat{x}_k$.
The surrogate input $u_k$ is not directly observed in the public datasets. It is a constructed autonomy action defined by the load-aware gain schedule and compliance-shaping policy in Equations (4) and (5), acting on $\hat{L}_{k-\delta}$ and the current surrogate output $z_k$. Therefore, the tuple $(A, B, C)$ is identified by fitting (20) on training blocks under a known excitation policy. Concretely, the constructed $u_k$ is rolled out on the training traces to generate an input sequence, and least squares is then solved for $(A, B, C)$ subject to stability/regularization constraints. CL-Sim demonstrates stability properties of the surrogate closed-loop interconnection induced by the proposed estimator–policy interface. CL-Replay evaluates counterfactual risk/robustness metrics on recorded traces without altering them (Table 8).
The explicit computation of the small-gain constant $\bar{\psi}$, including the CL-Sim sensitivities and the schedule derivative bounds, is detailed in Appendix B.
Time-scale consistency details for window cadence, delay discretization, and oscillation-band conversion are provided in Appendix A; the Results use the same fixed cadence mapping for all datasets.

3. Results

All results are reported for the windowed evaluation interface defined in Section 2.1 and the estimator/controller specifications in Section 2.3, Section 2.4, Section 2.5, Section 2.6, Section 2.7 and Section 2.8. The primary benchmark uses $N = 50{,}977$ CoBeXR windows with $p = 82$ numeric features (post-imputation). It has three binary targets (high-load quantiles $q \in \{0.80, 0.85, 0.90\}$) plus two continuous indices (load_index_v1, load_index_v2). For classification, we emphasize three views:
- AUROC/AUPRC for discrimination;
- ECE/Brier for calibration;
- thresholded F1/Balanced Accuracy under the fixed regime constants in Table A4.

For regression, we report $R^2$, RMSE, Spearman $\rho$, and coverage under residual-based uncertainty proxies.

3.1. Load Estimation: Discrimination, Calibration, and Cross-Target Consistency

Across high-load targets, performance was heterogeneous in a way that is consistent with the intended semantics of $q$-thresholding. The $q = 0.90$ task has a lower base rate and therefore places greater emphasis on calibration and precision under class imbalance. On $q = 0.90$, the best-performing estimator achieved AUROC 0.87 and AUPRC 0.41 (Table 9), while the weakest baseline achieved AUROC 0.71 and AUPRC 0.18. This gap ($\Delta$AUROC $= 0.16$, $\Delta$AUPRC $= 0.23$) is practically material because it changes the feasible operating point of the regime rule in Equation (21). At the fixed decision threshold, the top model maintained F1 0.49 with Balanced Accuracy 0.74. The weakest model collapsed to F1 0.22 and Balanced Accuracy 0.58, implying systematic under-detection of overload-prone windows.
Calibration was the principal differentiator between models with similar AUROC. The best-calibrated model attained ECE 0.031 and Brier 0.089, while a higher-capacity ensemble with comparable AUROC exhibited ECE 0.072 and Brier 0.112 (Table 9). This matters for closed-loop interpretation because $\hat{L}_k$ enters the gain schedule via $\kappa(\hat{L}_{k-\delta})$ in Equation (4). A calibration drift of 0.04–0.05 in probability space corresponds to materially different $\kappa$-attenuation near $L_0$ when $s$ is small. That increases the risk of premature suppression (conservatism) or delayed suppression (overload).
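For reference, the calibration metrics quoted here can be computed as in the sketch below (hypothetical names). The number of bins used for ECE is not stated in this section, so the equal-width, 10-bin variant is an assumption; other binning schemes give somewhat different values.

```python
import numpy as np


def expected_calibration_error(p, y, n_bins=10):
    """Equal-width-bin ECE: sum over bins of (bin fraction) * |mean(y) - mean(p)| in that bin."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)  # p = 1.0 falls in the last bin
    ece = 0.0
    for b in range(n_bins):
        sel = bins == b
        if sel.any():
            ece += sel.mean() * abs(y[sel].mean() - p[sel].mean())
    return float(ece)


def brier_score(p, y):
    """Mean squared difference between predicted probability and binary outcome."""
    return float(np.mean((np.asarray(p, dtype=float) - np.asarray(y, dtype=float)) ** 2))
```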
Continuous-index prediction further supported cross-target consistency. For load_index_v2, the best regressor achieved $R^2 = 0.42$ with RMSE 0.19 and Spearman $\rho = 0.68$ (Table 10), whereas the weakest achieved $R^2 = 0.11$ with RMSE 0.27 and $\rho = 0.39$. The ordering largely matched the $q = 0.85$ and $q = 0.90$ classification leaderboards. This indicates that the learned mapping is not merely a thresholding artifact but tracks monotone structure in the effort-proxy space.

3.2. Robustness: Delay, Noise, Downsampling, and Channel Ablation

Robustness was evaluated by controlled impairment of the estimator-to-policy path and of the observable channels, matching the implementation commitments in Section 2.7.2. The principal result is that delay degrades both discrimination and calibration nonlinearly. There is a clear knee around $\tau = 300$–$400$ ms at the window cadence $\Delta t_{\mathrm{win}}$ used in the benchmark. At $\tau = 0$ ms, the best classifier achieved AUROC 0.87 and ECE 0.031; at $\tau = 500$ ms, AUROC decreased to 0.80 and ECE increased to 0.061 (Table 11). This pattern is consistent with the delay-sensitive coupling in Equation (4) through $\hat{L}_{k-\delta}$. It provides an empirical rationale for reporting a delay margin in the theory section (Equation (19)).
Noise injection into $\hat{L}$ (additive, zero-mean) produced a smaller AUROC decrease (from 0.87 to 0.84 at $\sigma_L = 0.10$) but a larger calibration penalty (ECE from 0.031 to 0.076). This indicates that downstream control actions are likely to be affected more by threshold-crossing volatility than by rank-order disruption. Downsampling (reducing the effective sampling rate of input channels prior to windowing) showed an asymmetric effect. Discrimination degraded modestly (AUROC 0.87 → 0.83 at ×4 downsampling), while AUPRC degraded more (0.41 → 0.32). This reflects the loss of transient signatures that are disproportionately informative for rare high-load windows.
Channel ablation identified two practically important failure modes. Removing effort-proximal channels (EMG/FMG-like proxies when present) reduced AUPRC by 0.07–0.10. Removing kinematic/context channels reduced AUROC by 0.03–0.05 but had a smaller AUPRC impact. The most realistic real-world constraint is not full ablation but intermittent dropouts. The observed sensitivity suggests that a feasible deployment must prioritize continuous acquisition of at least one effort-proximal stream, even at reduced quality, rather than relying solely on posture/kinematics for overload detection.

3.3. Closed-Loop Effects: Overload Suppression, Performance Preservation, and Realism

Closed-loop conclusions are partitioned by execution mode as defined in Section 2.8. CL-Replay evaluates counterfactual load-aware scheduling on recorded traces without altering the plant. CL-Sim executes the estimator–policy loop on an identified surrogate plant with explicit $u_k$ and $z_k$. The central quantitative result is that load-aware gain shaping consistently reduced overload incidence while keeping a throughput proxy within a narrow band. The size of the reduction, however, depended strongly on delay and on the aggressiveness of the schedule slope $s$.
In CL-Sim under nominal timing ($\tau = 0$ ms), baseline control produced an overload incidence of 7.8% (fraction of windows with $\hat{L}_k \ge L_{\mathrm{hi}}$). Load-aware scheduling reduced it to 4.1% (absolute reduction of 3.7 points; relative reduction of 47%), while the throughput proxy decreased only from 1.00 to 0.97 (Table 12). This is the regime in which the theoretical elimination-of-overload-attractors claim is most plausible. The observed reduction coincides with a reduction in oscillation power (band-limited $P_{\mathrm{osc}}$ decreased from 0.26 to 0.14). It also coincides with a larger stability margin under the computed small-gain bound (margin increased from 0.08 to 0.21). Under increased delay ($\tau = 500$ ms), the same schedule still reduced overload but with diminished effect (baseline 9.6% to load-aware 7.9%), and with a measurable oscillation rebound ($P_{\mathrm{osc}} = 0.22$). This supports the point that delay stability must be treated explicitly and that real-world feasibility is bounded by latency budgets.
In CL-Replay, the reductions are necessarily smaller because the action does not alter the recorded plant; nevertheless, the scheduling logic reduced time-in-high-load from 6.9% to 5.8% by suppressing high-amplitude excursions in the reconstructed $\hat{L}$ channel. This distinction is critical: CL-Replay supports plausibility of the mapping and identifies sensitivity, while CL-Sim provides the only evidence for closed-loop stability claims, albeit on an identified surrogate rather than on the original physical system.
Top and worst configurations were identified by ranking the overload reduction $\Delta p_{\text{over}}$ subject to a throughput constraint (throughput $\ge 0.95$). The top-5 achieved reductions of 49%, 46%, 44%, 41%, and 39% under $\tau \le 300$ ms with moderate slopes $s \in [0.08, 0.15]$. The worst-5 failed in one of two ways. Some violated the throughput constraint, dropping below 0.90 because of overly aggressive attenuation. Others induced oscillatory switching, increasing $P_{\text{osc}}$ by 0.10–0.14. Both typically occurred at steep slopes ($s \le 0.05$) combined with higher delay. This matches the mechanism emphasized in the Introduction: noisy or delayed physiology can drive chattering unless bounded adaptation is enforced.
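The constrained ranking described above (filter by throughput, then sort by overload reduction) can be sketched as follows. The `SweepResult` record, its field names, and the example outcomes are hypothetical illustrations, not the authors' sweep output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SweepResult:
    name: str
    overload_reduction: float  # relative reduction of overload incidence (0.47 = 47%)
    throughput: float          # throughput proxy normalized so that baseline = 1.00


def top_k_feasible(results, min_throughput=0.95, k=5):
    """Rank configurations by overload reduction among those meeting the throughput floor."""
    feasible = [r for r in results if r.throughput >= min_throughput]
    return sorted(feasible, key=lambda r: r.overload_reduction, reverse=True)[:k]


# Hypothetical sweep outcomes; "steep" reduces overload most but violates the floor.
results = [
    SweepResult("moderate-a", 0.49, 0.97),
    SweepResult("steep", 0.60, 0.88),
    SweepResult("moderate-b", 0.41, 0.96),
    SweepResult("gentle", 0.30, 0.99),
]
print([r.name for r in top_k_feasible(results, k=2)])  # ['moderate-a', 'moderate-b']
```

Filtering before sorting makes the throughput constraint a hard gate. A configuration with the largest raw reduction but infeasible throughput, like "steep" here, never enters the ranking.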

3.4. Regime Maps, Best/Worst Cases, and Mechanistic Interpretation

Using the fixed regime constants in Table A4, we computed regime occupancy over parameter/delay sweeps and summarized transitions between equilibrated, oscillatory, and overload-prone behaviors. Under nominal timing, the load-aware schedule shifted the operating distribution toward equilibrated behavior: equilibrated occupancy increased from 62% (baseline) to 78% (load-aware), overload-prone occupancy decreased from 12% to 6%, and oscillatory occupancy decreased from 26% to 16%. Under $\tau = 500$ ms, oscillatory occupancy increased (baseline 29%, load-aware 33%), which is the empirical signature that delay can convert otherwise stabilizing moderation into phase-lag-driven oscillation, consistent with the delay-sensitive small-gain discussion in Section 2.6.3.
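Regime occupancy can be sketched by labeling each spectral segment and counting labels. The sketch follows the Table A2 rules: a segment is overload-prone if at least a fraction $\theta_{\text{hi}}$ of its windows exceed $L_{\text{hi}}$, and oscillatory if $P_{\text{osc}}$ exceeds $\theta_{\text{osc}}$. The fixed rule itself (Equation (21)) is not reproduced here, so several choices are assumptions:
- the precedence order (overload-prone first);
- $P_{\text{osc}}$ computed as the share of non-DC spectral power inside the band;
- the band passed in cycles per window, because the value of $B_{\text{osc}}$ is not given here.

```python
from collections import Counter

import numpy as np

REGIMES = ("equilibrated", "oscillatory", "overload-prone")


def band_power_fraction(segment, band):
    """Assumed P_osc: share of non-DC spectral power inside `band` (cycles per window)."""
    x = np.asarray(segment, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0)  # d=1.0: one sample per window
    total = power[1:].sum()
    if total < 1e-12:  # numerically flat segment: treat as no oscillation
        return 0.0
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    in_band[0] = False  # exclude the DC bin
    return float(power[in_band].sum() / total)


def label_segment(load_hat, l_hi, theta_hi, theta_osc, band):
    """Assumed precedence: overload-prone, then oscillatory, else equilibrated."""
    if np.mean(np.asarray(load_hat) >= l_hi) >= theta_hi:
        return "overload-prone"
    if band_power_fraction(load_hat, band) > theta_osc:
        return "oscillatory"
    return "equilibrated"


def occupancy(labels):
    """Fraction of segments assigned to each regime."""
    counts = Counter(labels)
    return {r: counts[r] / len(labels) for r in REGIMES}
```

Comparing `occupancy` outputs for baseline and load-aware runs gives shifts of the kind reported above, for example 62% versus 78% equilibrated occupancy.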
The top-5 configurations (ranked by overload reduction with throughput $\ge 0.95$) consistently exhibited three properties:
- a moderate schedule slope $s$, typically 0.08–0.15;
- bounded delay, $\tau \le 300$ ms;
- low calibration error in the estimator stage, ECE $\le 0.04$.

The worst-5 configurations fall into two classes:
- **Over-suppression.** Steep schedules reduce overload but also push throughput below 0.90. This is operationally infeasible in real deployments that require sustained task completion.
- **Chattering/oscillation.** Throughput stays intermediate (0.93–0.97), but $P_{\text{osc}}$ is elevated ($\ge 0.28$) and regime switching is more frequent: a mean of 2.6 switches per minute, versus 1.1 in the top configurations. This is ergonomically undesirable because it corresponds to repeated alternation between assistance and autonomy, which can increase cognitive strain.
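The switching-frequency metric can be sketched as the number of label changes in a per-window regime sequence divided by the covered duration in minutes. This assumes labels are emitted once per window at stride $\Delta t_{\text{win}}$ (here `dt_win_s`, in seconds). The function name and the example trace are hypothetical.

```python
def switches_per_minute(labels, dt_win_s):
    """Mean number of regime changes per minute in a per-window label sequence."""
    if len(labels) < 2:
        return 0.0
    n_switches = sum(a != b for a, b in zip(labels, labels[1:]))
    minutes = len(labels) * dt_win_s / 60.0
    return n_switches / minutes


# Hypothetical 1-minute trace at a 10 s stride: two switches (eq -> osc -> eq).
trace = ["eq", "eq", "osc", "eq", "eq", "eq"]
print(switches_per_minute(trace, dt_win_s=10.0))  # 2.0
```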
Overall, the best-versus-worst split aligns with the coupling hypothesis. When estimator calibration is strong (ECE 0.03–0.04), $\hat{L}$ behaves as a control-sufficient scalar in the limited sense required here. It produces consistent gain modulation around $L_0$ and avoids systematic bias that would either desensitize overload detection or trigger chronic attenuation. When calibration weakens (ECE $\ge 0.07$) or delay increases, the same schedule becomes effectively more nonlinear at the wrong timescale. This shows up as increased oscillatory occupancy and reduced overload suppression.

The practical implication is that real-world feasibility is bounded by two measurable design constraints that can be validated before deployment:
- an estimator calibration target of ECE $\le 0.05$ on subject-blocked splits;
- an end-to-end latency budget of effective $\tau \le 300$ ms at the chosen $\Delta t_{\text{win}}$.

Within these bounds, the results support the claim that load-aware moderation can reduce overload incidence by approximately 40–50% while keeping throughput within 3–5% of baseline. Outside these bounds, the most likely failure mode is not immediate overload but oscillatory redistribution and unstable regime switching, which undermine interpretability and operator comfort.
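The two pre-deployment constraints can be encoded as a simple check. The sketch assumes ECE has already been measured on subject-blocked splits and that latency and window stride are given in milliseconds. `DesignBudget` and `check_budget` are hypothetical names, and the 500 ms stride in the example is illustrative only. The check also reports the delay in windows, $\delta = \tau/\Delta t_{\text{win}}$, used elsewhere in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignBudget:
    max_ece: float = 0.05          # calibration target on subject-blocked splits
    max_latency_ms: float = 300.0  # effective end-to-end latency budget


def check_budget(ece, latency_ms, dt_win_ms, budget=DesignBudget()):
    """Pre-deployment check against both constraints; also reports delta = tau / dt_win."""
    return {
        "ece_ok": ece <= budget.max_ece,
        "latency_ok": latency_ms <= budget.max_latency_ms,
        "delay_windows": latency_ms / dt_win_ms,
    }


print(check_budget(ece=0.031, latency_ms=250.0, dt_win_ms=500.0))
# {'ece_ok': True, 'latency_ok': True, 'delay_windows': 0.5}
```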

4. Discussion

4.1. Interpretation of Results

The results in Section 3.1 show that discrimination alone is not the bottleneck for load-aware control. Once the estimate is injected into a gain schedule, calibration and rare-event precision become the limiting factors. At $q = 0.90$, the best-performing high-load detector achieved AUROC 0.87 and AUPRC 0.41 with low ECE (0.031). A similarly strong discriminative model reached comparable AUROC but substantially worse calibration (ECE 0.072) (Table 9).

This distinction is not cosmetic. Under Equation (4) and the logistic slope bound in Equation (A14), a modest probability-scale bias or miscalibration around the operating point $L_0$ changes $\partial u/\partial L$ and, therefore, the effective loop gain in Theorem 1. In practice, the observed 0.04–0.05 calibration spread between models of similar AUROC separates two behaviors. One schedule attenuates early enough to prevent overload escalation; the other reacts late and amplifies oscillatory redistribution under delay.

The regression results (Table 10) reinforce this interpretation. The top regressor ($R^2 = 0.42$, $\rho = 0.68$) preserves the monotonic structure that supports consistent thresholding and regime labeling. Weaker regressors ($R^2 = 0.11$, $\rho = 0.39$) are likely to produce regime-map artifacts driven by estimator noise rather than genuine demand transitions.
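Because ECE carries much of this argument, a minimal binned estimator may help fix ideas. The sketch assumes the common equal-width binning with 10 bins; the binning scheme behind the reported ECE values is not stated in this section, so treat it as an assumption. The function name `expected_calibration_error` is hypothetical.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width binned ECE: sum over bins of (bin share) * |observed rate - mean confidence|."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    # With n_bins = 10, bins are [0, 0.1], (0.1, 0.2], ..., (0.9, 1.0].
    bin_idx = np.digitize(probs, inner_edges, right=True)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():
            gap = abs(labels[in_bin].mean() - probs[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)


# Overconfident toy detector: confidence 0.9, observed high-load rate 0.5.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0]))  # about 0.4
```

Two detectors can rank windows identically, and so share the same AUROC, yet differ sharply in this quantity. That difference is exactly what shifts the schedule's transition relative to $L_0$.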
These findings answer the first research question by establishing which estimator properties are control-relevant under the proposed coupling. The results indicate that an estimator is suitable for feedback not when it maximizes AUROC, but when it preserves probability meaning near $L_0$ so that the schedule in Equation (4) changes smoothly and at the intended operating point. This directly links the estimation stage to the design objective of reducing time in $X_{\text{hi}}$ without inducing regime switching driven by threshold noise.
The rare-event setting at $q = 0.90$ clarifies why AUPRC and calibration are more diagnostic than global discrimination. High-load windows are sparse, so false positives translate into chronic gain attenuation, while false negatives translate into delayed moderation and longer residence above $L_{\text{hi}}$. The observed performance spread therefore maps to an operational trade-off between unnecessary suppression and missed overload prevention, which is exactly the trade-off that the loop must resolve at the window cadence.
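A short sketch shows why AUPRC is read against prevalence in this rare-event setting. Suppose, as an assumption, that high-load labels are the windows above the 0.90-quantile of a reference load. Then about 10% of windows are positive, and an uninformative scorer's average precision sits near 0.10. On that reading, an AUPRC of 0.41 is roughly four times chance. The helper names and the synthetic data are hypothetical. Average precision is computed here in its non-interpolated form.

```python
import numpy as np


def rare_event_labels(reference_load, q=0.90):
    """Mark windows above the q-quantile of a reference load as high-load (about 1 - q of them)."""
    x = np.asarray(reference_load, dtype=float)
    return (x > np.quantile(x, q)).astype(int)


def average_precision(scores, labels):
    """Non-interpolated area under the precision-recall curve (ties broken by input order)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(labels, dtype=int)[order]
    precision_at_k = np.cumsum(y) / np.arange(1, y.size + 1)
    return float((precision_at_k * y).sum() / y.sum())


rng = np.random.default_rng(0)
load = rng.random(10_000)                  # hypothetical reference load
labels = rare_event_labels(load, q=0.90)
print(labels.mean())                       # prevalence, about 0.10
print(average_precision(rng.random(load.size), labels))  # uninformative scores: near 0.10
print(average_precision(load, labels))     # perfect ranking: 1.0
```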
Relative to existing literature that treats workload or ergonomic scores as static annotations or post hoc summaries, the measurable progress here is that estimator quality is evaluated through its downstream consequences under explicit impairment operators. The robustness sweeps demonstrate that the estimator is not only accurate in nominal conditions but also exposes a predictable degradation pattern under delay and channel loss, which is a prerequisite for making any closed-loop claim auditable. This connects the results to the second research question by showing that the estimation interface can be stressed in a way that preserves interpretability of failure modes, rather than relying on in-distribution accuracy alone.

4.2. Closed-Loop

The core result is improved stability under load-aware closed-loop regulation. The effect persists across delay, noise, downsampling, and ablation sweeps, although it weakens as delay grows. Regime-level outcomes matter more than model-to-model leaderboard gaps.

The mechanism and its evidence can be summarized as follows:
- Overload incidence drops because moderation contracts the high-load dynamics; the schedule reduces the effective loop gain in $X_{\text{hi}}$.
- Oscillation energy decreases under bounded gain and added damping.
- Throughput remains near baseline under admissible timing and calibration.
- Delay knees still appear, but failure modes become more interpretable.
- Robustness shows up as shifted regime occupancy, not as architecture rankings.

The key evidence is that overload suppression persists across conditions. The findings therefore emphasize the mechanism rather than a preferred learner family, and the analysis ties estimator calibration and latency to stability margins. Across settings, the regulator improves the resilience of the coupled loop.
Put differently, the main claim is a regime shift toward equilibrated operating conditions. Under nominal timing, load-aware feedback reduces time above $L_{\text{hi}}$. It limits chattering by enforcing bounded, monotone adaptation, and it preserves coordination proxies while suppressing overload attractors. This pattern repeats across heterogeneous operating disturbances.

The stability benefits arise from the closed-loop structure, not from model identity. Two variables decide the outcome:
- Control-relevant calibration near $L_0$. AUPRC and ECE matter because they shape threshold-crossing dynamics.
- Latency budgets, which govern when moderation remains stabilizing.

Architecture comparisons should therefore not be read as a contribution claim. The contribution is robust stabilization with explicit operating constraints, and regime maps summarize this effect more faithfully than leaderboards.
The central contribution is not that a load estimate can be computed from multimodal data. It is that embedding the estimate into a bounded gain-moderation and compliance-shaping mechanism measurably reduces overload incidence while preserving coordination proxies, and that these effects align with auditable sufficient conditions.

In CL-Sim at $\tau = 0$ ms, load-aware scheduling:
- reduced overload incidence from 7.8% to 4.1%;
- maintained throughput at 0.97 (Table 12);
- reduced oscillation power from 0.26 to 0.14.

These changes are consistent with the mechanism captured by Equation (17). Moderation reduces $\partial u/\partial L$ (Equation (A17)) and thereby contracts the high-load dynamics, so that persistent residence in $X_{\text{hi}}$ becomes unsustainable except under dominant disturbances.

The same table also shows the limits of the claim. At $\tau = 500$ ms, overload reduction persists but weakens (from 9.6% to 7.9%), and oscillatory behavior rebounds ($P_{\text{osc}} = 0.22$). This is precisely the regime in which delay must be incorporated into the sufficient condition. Equation (19) and the fixed oscillation-band definition in Appendix A.4 therefore bridge the theory and the observed delay knee.
These outcomes answer the third research question by isolating what the closed-loop mechanism changes and how it changes it. The reduction in overload incidence co-occurs with reduced oscillation energy, indicating that moderation is not simply lowering the reported load but reshaping the interaction between gain, delay, and demand transitions. The results, therefore, support a mechanistic interpretation where stability improvement is achieved through bounded adaptation in the high-load region rather than through increased authority or more aggressive control.
Compared with existing adaptive autonomy approaches that modulate assistance based on workload proxies without explicit bounds, the progress here is the explicit linkage between estimator calibration, schedule slope, and observed chattering or oscillation. The best configurations align with moderate slopes and low ECE, while poor configurations show either throughput loss or oscillatory switching, which provides a concrete explanation for why naive load gating can degrade teaming quality. This shifts the discussion from whether adaptation helps to when it helps and under which measurable conditions it remains interpretable.
For practical human–robot collaboration, the loop should be interpreted as a safety-oriented allocator rather than a performance maximizer. When load rises, the schedule moderates adaptation and increases damping, which reduces the probability of rapid policy changes that force the operator to continually re-plan or re-correct. This implies that the expected benefit in real deployments is not only fewer overload excursions but also more predictable autonomy behavior during high-demand episodes, which is a key prerequisite for trust and sustained collaboration.
The distinction between CL-Sim and CL-Replay also has a practical implication for deployment planning. CL-Replay indicates that the estimator and policy interface produces bounded actions and reduces an overload proxy under impairment, but only CL-Sim supports the claim that the loop reshapes trajectories through feedback. A real system evaluation should, therefore, replicate the CL-Sim semantics by instrumenting actuation and measuring whether the same reductions persist when the human adapts to the intervention, because that adaptation can be the dominant source of nonstationarity in collaborative settings.
Finally, the results suggest a concrete operational interpretation of the delay knee for fielded systems. When latency approaches the knee region, moderation can become phase-lagged relative to demand changes, which increases oscillatory redistribution and degrades the interpretability of regime labels. In real environments, this translates into a requirement that sensing, inference, and actuation be co-designed to maintain a stable timing budget at the chosen window stride, rather than optimizing the estimator in isolation.

4.3. Limitations

A primary limitation is that the closed-loop evidence comes from an identified surrogate interconnection in CL-Sim. It does not come from a physical human–robot system with measured bidirectional adaptation. The reported regime shifts and overload suppression therefore remain model-conditional, even though the sufficient conditions are auditable.

A second limitation is that the estimator semantics depend on proxy construction and fixed thresholds. These can drift across individuals and contexts despite subject-wise splitting, and such drift can change where the gain schedule transitions relative to true operator strain.

Finally, delay and channel reliability are not incidental nuisances but coupled failure drivers. Timing jitter and effort-proxy dropouts primarily degrade calibration and rare-event precision. These are exactly the properties that determine whether the loop moderates smoothly or chatters near the operating point.

5. Conclusions

This study advances a control-relevant treatment of ergonomic load by providing an auditable estimator-to-policy interface and demonstrating, under controlled impairments, when feedback improves closed-loop behavior rather than destabilizing it. The scientific contribution is a unified pipeline that links estimator calibration and latency directly to regime transitions, enabling reproducible stress testing and interpretable operating boundaries across heterogeneous public datasets.
Practically, the results define deployment-relevant acceptance criteria for real-time load-aware adaptation, including timing and calibration budgets that can be verified before field use. The evidence indicates that when these criteria are met, bounded gain moderation and compliance shaping can reduce overload exposure while retaining coordination proxies, whereas violations primarily manifest as oscillatory redistribution and unstable regime switching.
The study is limited by its reliance on public datasets with heterogeneous semantics and by the use of an identified surrogate for closed-loop simulation, which cannot capture all strategic and nonstationary effects present in real operators and real autonomy stacks. In addition, proxy targets and fixed regime thresholds constrain how directly the reported load scale can be interpreted as subjective or biomechanical strain, and performance under long-term sensor drift or systematic domain shift was not validated in longitudinal deployments.
Theorem 1 holds only under its stated assumptions and bounds. Overload-attractor elimination is therefore a regime claim, not a promise. The main caveats are:
- Parameter choices outside Table 6 can violate the contraction.
- Delay margins depend on $\Delta t_{\text{win}}$, $\tau$, and estimator variation.
- An increased $\varepsilon_L$ or miscalibration materially changes $\partial u/\partial L$.
- CL-Replay cannot substantiate stability, because $\partial z/\partial u = 0$ there.

The small-gain inequalities summarize implementable design constraints for deployment. They guide tuning, monitoring, and fail-safe reversion under sensor drift, but they do not certify unconditional safety across all teaming contexts.
Interpret conclusions through operating regimes, not universal guarantees. The reported reductions apply within validated latency and calibration budgets. Outside them:
- if latency exceeds the knee, oscillatory redistribution can reappear;
- if effort-proximal sensing drops out, overload detection becomes systematically biased.

Regime maps should be recalculated whenever the window stride or instrumentation changes. Field use requires auditing ECE, $\tau$, and channel availability online, and outside admissible regions the policy should downshift to conservative damping. These constraints are intended to support implementation and certification workflows. They complement, rather than replace, system-level hazard analysis and testing.
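The online audit described above can be sketched as a per-window mode selector. It returns "nominal" inside the admissible envelope and "conservative" outside it, together with the reasons. Several pieces are assumptions:
- how the running ECE is obtained online, which requires delayed ground truth and is left to the caller;
- the effort-availability floor of 0.5;
- what "conservative damping" means concretely for a given controller, which is also left to the caller.

`Envelope` and `select_mode` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    max_ece: float = 0.05
    max_latency_ms: float = 300.0
    min_effort_availability: float = 0.5  # hypothetical share of recent windows with effort data


def select_mode(running_ece, latency_ms, effort_mask, env=Envelope()):
    """Return ('nominal' | 'conservative', reasons) for the current monitoring window."""
    availability = sum(effort_mask) / len(effort_mask) if effort_mask else 0.0
    reasons = []
    if running_ece > env.max_ece:
        reasons.append("calibration")
    if latency_ms > env.max_latency_ms:
        reasons.append("latency")
    if availability < env.min_effort_availability:
        reasons.append("effort-channel dropout")
    return ("conservative" if reasons else "nominal"), reasons


print(select_mode(0.03, 220.0, [1, 1, 1, 0]))  # ('nominal', [])
print(select_mode(0.06, 450.0, [0, 0, 1, 0]))  # ('conservative', ['calibration', 'latency', 'effort-channel dropout'])
```

Returning the reasons, not just the mode, keeps each downshift auditable in logs.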

5.1. Future Work

Future work should move from surrogate closed-loop evidence to hardware- and interface-level validation with explicit actuation and measured operator responses, including end-to-end latency measurements and dropout patterns under real wireless conditions.

Methodologically, two extensions are high priority:
- a multi-objective formulation that reports Pareto fronts over overload reduction, oscillation suppression, and throughput, rather than ranking single schedules;
- a delay-robust synthesis step (e.g., LMI- or Lyapunov–Krasovskii-based conditions) that produces certified parameter regions for $\{\kappa_{\min}, \kappa_{\max}, \gamma, s\}$ under bounded $\tau$ and bounded estimator error.

On the learning side, the estimator should incorporate explicit drift detection and recalibration triggers to keep ECE below the control-relevant threshold in longitudinal deployments. It should also quantify uncertainty in $\hat{L}$, so that the policy can reduce aggressiveness when the estimate is unreliable rather than reacting to noise.
Future research should also validate the full loop in hardware with measured actuation and operator response, using instrumented end-to-end timing and dropout characterization under realistic networking conditions. It should extend the synthesis step to explicitly certify tolerance to delay and estimation error over controller parameters. It should also integrate uncertainty-aware adaptation, so that the policy downshifts when the estimate becomes unreliable.

Finally, the evaluation should expand to longitudinal and cross-context studies that quantify recalibration frequency, drift detection accuracy, and human acceptance outcomes under sustained operation. These studies should preserve the same auditability commitments for threshold selection and subject-level reporting.
Further developments should explicitly connect the proposed estimator–policy interface to deployment constraints that public traces and surrogate plants do not represent. Three steps follow:
- **Edge implementation.** A practical next step is an edge-oriented real-time implementation. It should report measured end-to-end latency and jitter under realistic wireless conditions and compute load and control updates within a fixed time budget aligned to the chosen window cadence.
- **Physical experiments.** Experimental validation should then use physical human–autonomy teaming tasks in which the adaptation policy has authority over measurable autonomy parameters. Operator response should be captured through synchronized effort-proximal sensing and task outcome logs. This enables causal attribution of overload reduction rather than counterfactual inference.
- **Industrial integration.** Integration with industrial systems should address interoperability with existing control stacks and safety workflows. This means defining a minimal interface for load-conditioned gain limits, logging and audit requirements, and fail-safe behavior under dropout and sensor drift, so that the method can be evaluated within production-grade monitoring and change-management processes.

5.2. Take-Home Messages

The main takeaway is that ergonomics becomes control-relevant only when the load estimate is both calibrated and timely. The estimate directly shapes the effective loop gain and, therefore, the regime of the coupled human–autonomy dynamics. When those conditions hold, moderate load-aware scheduling can deliver large reductions in overload incidence with throughput preserved. Aggressive schedules under delay or miscalibration are the dominant failure mode and can degrade throughput or increase switching and oscillation.

Practically, the results translate into concrete design constraints for real deployments:
- keep end-to-end latency below roughly 300 ms at the operating window cadence;
- keep estimator calibration at ECE $\le 0.05$ under subject-wise evaluation;
- prioritize reliable effort-proximal sensing, because rare high-load detection is otherwise substantially impaired.

Author Contributions

Conceptualization, V.A.; methodology, V.A.; software, V.A. and N.G.; validation, V.A. and N.G.; formal analysis, V.A.; investigation, V.A.; resources, V.A.; data curation, V.A.; writing—original draft preparation, V.A. and N.G.; writing—review and editing, G.P.; visualization, V.A.; supervision, V.A.; project administration, V.A.; funding acquisition, V.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. The Human–swarm Interaction Dataset: Real-Time Human Interaction with Virtual Swarms in Shared Physical Space is available via Zenodo [34]. SenseCobot (v2) is available via Zenodo [33]. RoHuCAD: Robots and Humans Collaborative Anomaly Detection (v1.0) is available via Zenodo [32]. HRI30: An Action Recognition Dataset for Industrial Human–Robot Interaction (v1.0) is available via Zenodo [31]. The Force Myography dataset for Human–Robot Interactions is available via Zenodo [29]. The Multi-channel Surface EMG Dataset for Fatigue analysis is available via Zenodo [30]. The Wearable Device Dataset from Induced Stress and Structured Exercise Sessions (v1.0.1) is available via PhysioNet [35]. The WorkStress3D dataset (Version 11) is available via Mendeley Data [36]. No new data were created in this study.

Acknowledgments

The authors acknowledge the creators and maintainers of the publicly available datasets used in this study, and the infrastructure providers that host them (Zenodo, PhysioNet, and Mendeley Data).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AUROC | Area Under the Receiver Operating Characteristic Curve
AUPRC | Area Under the Precision–Recall Curve
BAcc | Balanced Accuracy
ECE | Expected Calibration Error
EMG | Electromyography
FMG | Force Myography
HITL | Human-in-the-Loop
HRC | Human–Robot Collaboration
HSI | Human–Swarm Interaction
HR | Heart Rate
HRV | Heart Rate Variability
IMU | Inertial Measurement Unit
ISS | Input-to-State Stability
IQR | Interquartile Range
LMI | Linear Matrix Inequality
MSE | Mean Squared Error
NASA-TLX | National Aeronautics and Space Administration Task Load Index
PSO | Particle Swarm Optimization
RMSE | Root Mean Squared Error
RULA | Rapid Upper Limb Assessment
sEMG | Surface Electromyography

Appendix A. Fixed Constants, Thresholds, and Sweep Ranges

This appendix collects fixed numerical specifications and sweep grids referenced by the main text to reduce cognitive load in Section 2.6 and Section 2.7. All values are defined on training blocks only unless explicitly stated otherwise.

Appendix A.1. Reproducible Construction Rules for $d_k$ and $\ell_k$

Table A1. Reproducible rules for constructing the demand drive $d_k$ and the identification proxy $\ell_k$. The rules are fixed prior to testing; dataset-specific instantiations occur only through channel availability masks $M_{k,m}$ and the presence/absence of context labels $c_k$.
Symbol | Rule | Notes
$d_k$ | (12) | Weighted mixture of normalized effort features and coordination-intensity proxy; weights fixed globally
$\ell_k$ (phase) | (13) | $\chi(\cdot)$ maps protocol phases to $\{0, 0.5, 1\}$ (rest/moderate/high) on training only
$\ell_k$ (effort) | $\ell_k = \sigma\big((\tilde{e}_{k,m} - \mu_{\text{tr}})/(\sigma_{\text{tr}} + \epsilon)\big)$ | $m$ is a designated effort-proximal modality (FMG/EMG) when phase labels are unavailable
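The effort-based rule in Table A1 can be sketched with training-only normalization. The sketch makes three assumptions: $\mu_{\text{tr}}$ and $\sigma_{\text{tr}}$ are the mean and standard deviation of the designated effort feature on training blocks, the outer $\sigma(\cdot)$ is the logistic function, and $\epsilon = 10^{-8}$. The function names are hypothetical. The key point is that the statistics are fitted once on training data and then frozen for test windows.

```python
import numpy as np


def fit_effort_normalizer(train_effort):
    """Training-only location/scale (mu_tr, sigma_tr) of the effort-proximal feature."""
    x = np.asarray(train_effort, dtype=float)
    return float(x.mean()), float(x.std())


def effort_proxy(effort, mu_tr, sigma_tr, eps=1e-8):
    """ell_k = logistic((e_k - mu_tr) / (sigma_tr + eps)); values lie in (0, 1)."""
    z = (np.asarray(effort, dtype=float) - mu_tr) / (sigma_tr + eps)
    return 1.0 / (1.0 + np.exp(-z))


mu_tr, sigma_tr = fit_effort_normalizer([1.0, 2.0, 3.0])  # training block only
print(effort_proxy([2.0, 5.0], mu_tr, sigma_tr))          # first value is 0.5
```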

Appendix A.2. Regime Labeling Thresholds (Training-Only Selection)

Table A2. Appendix summary of regime-labeling thresholds and training-only selection rules. Thresholds are fixed prior to test reporting to avoid circularity.
Quantity | Definition | Selection Rule (Training Only)
$L_{\text{hi}}$ | overload threshold on $[0, 1]$ | set to the $q_{\text{hi}}$-quantile of $\hat{L}_k$ during high-demand phases (if labeled) or to an upper quantile of the effort proxy otherwise
$\theta_{\text{hi}}$ | time-above-threshold cutoff | fixed to 0.20 (overload if ≥20% of windows exceed $L_{\text{hi}}$)
$\theta_{\text{osc}}$ | oscillation power cutoff | set to the $q_{\text{osc}}$-quantile of $P_{\text{osc}}$ in training equilibrated blocks; used unchanged on test
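The training-only selection rules in Table A2 can be sketched in a few lines. The sketch uses the Table A4 defaults $q_{\text{hi}} = 0.80$ and $q_{\text{osc}} = 0.90$ and NumPy's default linear-interpolation quantile; that quantile convention is an assumption, as are the function name and the synthetic inputs. The returned dictionary would be computed once and frozen before any test reporting.

```python
import numpy as np


def fit_regime_thresholds(train_high_demand_load, train_equilibrated_posc, q_hi=0.80, q_osc=0.90):
    """Training-only thresholds; the returned values are then frozen for test reporting."""
    return {
        "L_hi": float(np.quantile(train_high_demand_load, q_hi)),
        "theta_hi": 0.20,  # fixed: overload if >= 20% of windows exceed L_hi
        "theta_osc": float(np.quantile(train_equilibrated_posc, q_osc)),
    }


thresholds = fit_regime_thresholds(np.linspace(0.0, 1.0, 11), np.linspace(0.0, 0.5, 11))
print(thresholds)  # L_hi about 0.8, theta_osc about 0.45
```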

Appendix A.3. Sweep Parameters and Ranges

Table A3. Appendix summary of sweep parameters and ranges. G = global fixed/swept; F = fit on training blocks only.
Param. | Range/Grid | Set
$\rho$ | $[0.90, 0.99]$ | F if recovery labels are available; else G sweep
$\alpha$ | $[0.01, 0.50]$ | G sweep (no test tuning)
$\kappa_{\min}$ | $[0, 0.3]$ | G sweep; see (4)
$\kappa_{\max}$ | $[0.3, 1]$ | G sweep; see (4)
$\gamma$ | $[0, 1]$ | G sweep; see (5)
$\tau$ | $\{0, 50, 100, 200, 400\}$ ms | grid; $\delta = \tau/\Delta t$
$r$ | $\{1, 2, 4\}$ | grid
$p$ | $\{0, 0.05, 0.10\}$ | grid

Appendix A.4. Fixed Constants for Regime Mapping and Time-Scale Conversion

Table A4. Fixed constants for regime mapping and spectral power computation. These values are not tuned on test data and are reported for auditability.
Knob | Fixed Value | Role
$q_{\text{hi}}$ | 0.80 | Quantile used to set $L_{\text{hi}}$ from training high-demand blocks (or effort-proxy high quantile)
$q_{\text{osc}}$ | 0.90 | Quantile used to set $\theta_{\text{osc}}$ from training equilibrated blocks
$N_{\text{seg}}$ | 128 windows | Segment length for spectral power; maps to a time length of $128\,\Delta t_{\text{win}}$
$B_{\text{osc}}$ | Appendix A.4 | Oscillation band in Hz; converted per dataset via $\Delta t_{\text{win}}$
$H$ (finite-horizon gain) | 10 steps | Optional bound for $\max_{\ell \le H} \|C A^{\ell-1} B\|$ in (A10)
For completeness, the time-scale mapping used throughout is as follows:
- (i) $\delta = \tau/\Delta t_{\text{win}}$ for latency injection;
- (ii) oscillation-band conversion between Hz and cycles per window uses $f = \nu/\Delta t_{\text{win}}$, where $f$ is in Hz and $\nu$ is in cycles per window;
- (iii) all regime metrics are computed on the windowed cadence.
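The cadence mapping can be sketched as three small helpers. The sketch makes two assumptions: the fractional delay $\delta$ is rounded up to whole windows, and latency is injected by shifting the windowed signal with a zero-order hold of its first value. The paper states only $\delta = \tau/\Delta t_{\text{win}}$, not the rounding or padding convention. The Hz conversion inverts $f = \nu/\Delta t_{\text{win}}$. All names are hypothetical.

```python
import math

import numpy as np


def delay_in_windows(tau, dt_win):
    """delta = tau / dt_win (same time unit for both), rounded up to whole windows."""
    return max(0, math.ceil(tau / dt_win - 1e-9))  # tolerance absorbs float error


def inject_latency(signal, delta):
    """Delay a windowed signal by `delta` windows, holding the first value (zero-order hold)."""
    x = np.asarray(signal, dtype=float)
    d = min(delta, x.size)
    if d == 0:
        return x.copy()
    return np.concatenate([np.full(d, x[0]), x[: x.size - d]])


def hz_to_cycles_per_window(f_hz, dt_win_s):
    """Invert f = nu / dt_win: nu = f * dt_win (cycles per window)."""
    return f_hz * dt_win_s


delta = delay_in_windows(tau=300.0, dt_win=100.0)        # 3 windows
print(inject_latency([0.1, 0.2, 0.3, 0.4, 0.5], delta))  # [0.1 0.1 0.1 0.1 0.2]
```

Rounding up is the conservative choice: it never under-represents the delay the loop actually experiences.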

Appendix B. Explicit Computation of the Small-Gain Constant ψ

This appendix provides the detailed, auditable upper bounds for the factors in Equation (17), separated by execution mode (CL-Sim vs. CL-Replay). The main text uses only the existence and computability of these bounds.

Appendix B.1. Bound on |∂D/∂Z|

Recall the demand drive definition (Equation (12)):
$$d_k := \sum_{m \in \mathcal{M}} w_m M_{k,m} \tilde{e}_{k,m} + w_z \tilde{q}_k, \tag{A1}$$
where $\tilde{q}_k$ is a normalized coordination-intensity proxy derived from $z_k$. In the evaluation interface, $\tilde{q}_k$ is constructed by a robust affine normalization on training blocks,
$$\tilde{q}_k := \frac{q_k - \operatorname{median}(q_\cdot)}{\operatorname{IQR}(q_\cdot) + \epsilon}, \qquad q_k := Q(z_k), \tag{A2}$$
with a fixed mapping $Q(\cdot)$ chosen once per dataset family (Section 2.8) and then held fixed. We assume $Q$ is Lipschitz on the bounded range of $z$ encountered in training blocks, with constant $K_Q$:
$$|q(z_1) - q(z_2)| \le K_Q\,|z_1 - z_2|. \tag{A3}$$
Then, using (A2),
$$\left|\frac{\partial \tilde{q}}{\partial z}\right| \le \frac{K_Q}{\operatorname{IQR}(q_\cdot) + \epsilon}. \tag{A4}$$
Because only the $w_z \tilde{q}_k$ term depends on $z_k$, we obtain the conservative bound
$$\bar{\psi}_{dz} := \left|\frac{\partial d}{\partial z}\right| \le \frac{w_z K_Q}{\operatorname{IQR}(q_\cdot) + \epsilon}. \tag{A5}$$
In practice, we set $K_Q = 1$ when $Q$ is the identity or a monotone scaling of $z$ (the default in CL-Sim, where $z$ is already a normalized intensity proxy); otherwise, we estimate $K_Q$ on training blocks by finite differences:
$$K_Q \approx \max_{k \in \text{train}} \frac{|q_{k+1} - q_k|}{|z_{k+1} - z_k| + \epsilon}. \tag{A6}$$
All quantities in (A5) are computed on training blocks only and then fixed.
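Equations (A5) and (A6) can be evaluated directly from training traces. The sketch below assumes scalar $z$ and $q$ sequences ordered by window and $\epsilon = 10^{-8}$. It uses `abs(w_z)` so that the bound stays valid even if the weight were negative. The function names and the linear example $Q(z) = 2z$ are hypothetical.

```python
import numpy as np


def estimate_k_q(q_train, z_train, eps=1e-8):
    """Finite-difference Lipschitz estimate (A6) over consecutive training windows."""
    dq = np.abs(np.diff(np.asarray(q_train, dtype=float)))
    dz = np.abs(np.diff(np.asarray(z_train, dtype=float)))
    return float(np.max(dq / (dz + eps)))


def psi_dz_bound(w_z, k_q, q_train, eps=1e-8):
    """Bound (A5): w_z * K_Q / (IQR(q) + eps), with the IQR taken on training blocks."""
    q75, q25 = np.percentile(np.asarray(q_train, dtype=float), [75, 25])
    return abs(w_z) * k_q / ((q75 - q25) + eps)


z_train = np.array([0.0, 1.0, 2.0, 3.0])
q_train = 2.0 * z_train                     # hypothetical Q(z) = 2z, so K_Q = 2
k_q = estimate_k_q(q_train, z_train)         # about 2.0
print(psi_dz_bound(0.5, k_q, q_train))       # about 0.5 * 2 / 3
```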

Appendix B.2. Bound on |∂Z/∂U| in CL-Sim

In CL-Sim, $z_k$ is generated by the identified surrogate plant (Equation (20)):
$$\hat{x}_{k+1} = A\hat{x}_k + B u_k + w_k, \qquad \hat{z}_k = C\hat{x}_k, \tag{A7}$$
with $(A, B, C)$ fit on training blocks only and with bounded residual $w_k$. For a one-step sensitivity, ignoring $w_k$ and holding $\hat{x}_k$ fixed,
$$\frac{\partial \hat{z}_{k+1}}{\partial u_k} = CB. \tag{A8}$$
Thus, a conservative one-step bound is
$$\bar{\psi}^{(1)}_{zu} := \|CB\|. \tag{A9}$$
Because the policy can influence $z$ over multiple steps, we also report a finite-horizon bound over $H$ steps (Appendix A fixes $H$):
$$\bar{\psi}^{(H)}_{zu} := \max_{1 \le \ell \le H} \|C A^{\ell-1} B\|. \tag{A10}$$
We then set
$$\bar{\psi}_{zu} := \bar{\psi}^{(H)}_{zu}, \tag{A11}$$
which is computable once $(A, B, C)$ are identified on training blocks. Any matrix norm consistent with the vector norm used in the ISS statement is acceptable; in implementation, we use the induced 2-norm.
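The finite-horizon bound (A10) with the induced 2-norm takes only a few lines of NumPy. The sketch iterates the powers of $A$ so that $\ell = 1$ recovers the one-step bound (A9). It assumes 2-D arrays shaped $(n, n)$, $(n, m)$ and $(p, n)$. The scalar surrogates in the example are hypothetical.

```python
import numpy as np


def finite_horizon_gain(A, B, C, H=10):
    """(A10): max over l = 1..H of the induced 2-norm ||C A^(l-1) B||; pass 2-D arrays."""
    A, B, C = (np.atleast_2d(np.asarray(M, dtype=float)) for M in (A, B, C))
    best = 0.0
    A_pow = np.eye(A.shape[0])  # A^0, so the first term is ||C B|| as in (A9)
    for _ in range(H):
        best = max(best, float(np.linalg.norm(C @ A_pow @ B, ord=2)))
        A_pow = A_pow @ A
    return best


# Hypothetical scalar surrogates: stable (a = 0.5) versus slightly unstable (a = 1.2).
print(finite_horizon_gain([[0.5]], [[2.0]], [[1.0]], H=10))  # 2.0 (attained at l = 1)
print(finite_horizon_gain([[1.2]], [[2.0]], [[1.0]], H=3))   # about 2.88 (= 2 * 1.2**2)
```

For a stable surrogate the maximum is often reached early. For a marginally unstable identified $A$, the fixed horizon $H$ is what keeps the bound finite, which is one reason $H$ is fixed globally rather than tuned.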

Appendix B.3. Bound on |∂Z/∂U| in CL-Replay

In CL-Replay, $z_k$ is exogenous because recorded traces are not altered by the computed $u_k$ (Table 8). Therefore,
$$\frac{\partial z}{\partial u} = 0 \;\Rightarrow\; \bar{\psi}_{zu} = 0, \tag{A12}$$
and the small-gain loop product defining $\bar{\psi}$ collapses. This is the formal reason why CL-Replay cannot support stability or attractor-elimination claims: the loop is open with respect to $u$.

Appendix B.4. Bound on |∂U/∂L| from the Schedule

The policy combines gain moderation (Equation (4)) and compliance shaping (Equation (5)). To obtain an auditable bound, we consider the dependence of the policy on the load argument only and treat $z$-dependent terms as bounded signals (consistent with Assumption 3). Let the schedule be
$$\kappa(L) = \kappa_{\min} + (\kappa_{\max} - \kappa_{\min})\,\sigma\!\left(\frac{L_0 - L}{s}\right), \tag{A13}$$
where $\sigma(\cdot)$ is the logistic function. The logistic derivative satisfies $\sup_x \sigma'(x) = 1/4$; hence,
$$\left|\frac{d\kappa}{dL}\right| \le \frac{\kappa_{\max} - \kappa_{\min}}{4s}. \tag{A14}$$
For compliance shaping,
$$u = u^{\text{base}} - \gamma\,\phi(L)\,\Delta z, \tag{A15}$$
with $\phi(\cdot) \in [0, 1]$ monotone. If $\phi$ is chosen as a squashing map with bounded slope, we can take
$$\left|\frac{d\phi}{dL}\right| \le K_\phi, \tag{A16}$$
where $K_\phi$ is fixed by design (e.g., $K_\phi = 1/(4 s_\phi)$ for a logistic with slope $s_\phi$). If we additionally bound $|\Delta z| \le \Delta z_{\max}$ on the relevant operating region (estimated on training blocks in CL-Sim, or fixed from normalized scaling conventions), then
$$\left|\frac{\partial u}{\partial L}\right| \le \underbrace{K_{u\kappa}}_{\text{policy scaling}} \left|\frac{d\kappa}{dL}\right| + \gamma K_\phi \Delta z_{\max}, \tag{A17}$$
where $K_{u\kappa}$ captures how $\kappa$ enters the autonomy action magnitude (a known linear scaling in the surrogate controller implementation). Substituting (A14) gives the explicit auditable bound:
$$\bar{\psi}_{uL} := \sup_{L \in [L_{\text{hi}}, 1]} \left|\frac{\partial u}{\partial L}\right| \le K_{u\kappa}\,\frac{\kappa_{\max} - \kappa_{\min}}{4s} + \gamma K_\phi \Delta z_{\max}. \tag{A18}$$
All quantities in (A18) are either sweep parameters ($\kappa_{\min}, \kappa_{\max}, \gamma, s$), fixed design constants ($K_\phi, K_{u\kappa}$), or training-only empirical bounds ($\Delta z_{\max}$).
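The slope bound (A14) and the resulting $\bar{\psi}_{uL}$ in (A18) can be checked numerically. The sketch evaluates the schedule (A13) on a dense grid and confirms that the finite-difference slope stays just below $(\kappa_{\max} - \kappa_{\min})/(4s)$. It then evaluates (A18). All parameter values are hypothetical, including `k_u_kappa = 1.0` and $K_\phi = 1/(4 s_\phi)$ with $s_\phi = 0.1$.

```python
import numpy as np


def kappa_schedule(L, kappa_min, kappa_max, L0, s):
    """(A13): kappa_min + (kappa_max - kappa_min) * logistic((L0 - L) / s)."""
    return kappa_min + (kappa_max - kappa_min) / (1.0 + np.exp(-(L0 - L) / s))


def psi_uL_bound(kappa_min, kappa_max, s, gamma, k_phi, dz_max, k_u_kappa=1.0):
    """(A18): K_u_kappa * (kappa_max - kappa_min) / (4 s) + gamma * K_phi * dz_max."""
    if s <= 0:
        raise ValueError("schedule slope s must be positive")
    return k_u_kappa * (kappa_max - kappa_min) / (4.0 * s) + gamma * k_phi * dz_max


# Numerical check of (A14) on a dense grid (hypothetical parameter values).
L = np.linspace(0.0, 1.0, 20001)
k = kappa_schedule(L, kappa_min=0.1, kappa_max=0.9, L0=0.6, s=0.1)
max_slope = np.max(np.abs(np.gradient(k, L)))
print(max_slope, (0.9 - 0.1) / (4 * 0.1))  # numerical slope just below the bound 2.0
print(psi_uL_bound(0.1, 0.9, 0.1, gamma=0.5, k_phi=1 / (4 * 0.1), dz_max=0.2))  # about 2.25
```

The check also illustrates the design trade-off discussed in Section 3: halving $s$ doubles the first term of the bound, which is why steep schedules ($s \le 0.05$) inflate the loop product.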

Appendix B.5. Final Computable Expression for ψ

Combining (A5), (A11), and (A18), we report for CL-Sim:
$$\bar{\psi}_{\text{CL-Sim}} \le \frac{w_z K_Q}{\operatorname{IQR}(q_\cdot) + \epsilon} \cdot \max_{1 \le \ell \le H} \|C A^{\ell-1} B\| \cdot \left( K_{u\kappa}\,\frac{\kappa_{\max} - \kappa_{\min}}{4s} + \gamma K_\phi \Delta z_{\max} \right). \tag{A19}$$
For CL-Replay, $\bar{\psi}_{\text{CL-Replay}} = 0$ by (A12). We emphasize that the CL-Sim bound is intentionally conservative; it is used to justify the directionality of the sufficient-condition inequalities and to support auditability of the design constraints, not to provide a tight stability margin estimate.
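The mode-dependent combination in (A19) and (A12) reduces to a product that vanishes in CL-Replay. The sketch takes precomputed factor bounds and only multiplies them. It deliberately does not encode the small-gain inequality: a standard small-gain argument compares such a product with 1, but the precise condition used here is Equation (17) in the main text. The function name and the factor values are hypothetical.

```python
def psi_bar(mode, psi_dz, psi_zu, psi_uL):
    """(A19) loop product for CL-Sim; zero for CL-Replay by (A12), where dz/du = 0."""
    if mode == "CL-Replay":
        return 0.0
    if mode != "CL-Sim":
        raise ValueError(f"unknown execution mode: {mode!r}")
    return psi_dz * psi_zu * psi_uL


# Hypothetical factor values for illustration only.
print(psi_bar("CL-Sim", psi_dz=0.5, psi_zu=2.0, psi_uL=0.4))     # 0.4
print(psi_bar("CL-Replay", psi_dz=0.5, psi_zu=2.0, psi_uL=0.4))  # 0.0
```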

Appendix B.6. No-Leakage Commitments

All empirical quantities entering (A19) are computed on training blocks only. In particular, $(A, B, C)$ are fit on training blocks; $\operatorname{IQR}(q_\cdot)$ and the normalization used to define $\tilde{q}$ are computed on training blocks; $\Delta z_{\max}$ is computed as a robust upper quantile on training blocks; and the horizon $H$ is fixed globally (Appendix A). The sweep parameters ($\kappa_{\min}$, $\kappa_{\max}$, $\gamma$, $s$) are never chosen by optimizing test outcomes; they are only swept over explicit grids, and any reported regime boundaries are computed under the fixed rule in Equation (21) with thresholds fixed prior to test reporting (Appendix A).
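The training-only rule can be made explicit by fitting every statistic once on training blocks and freezing it before test data are touched. The sketch below assumes median/IQR scaling for $\tilde{q}$ and a 0.99 quantile for $\Delta z_{\max}$; the quantile level, the synthetic data, and the function names are hypothetical, since the text specifies only "a robust upper quantile".

```python
import numpy as np


def fit_training_stats(train_q, train_dz, dz_quantile=0.99):
    """Fit robust normalization and the |dz| bound on training blocks only."""
    median = np.median(train_q, axis=0)
    q75, q25 = np.percentile(train_q, [75, 25], axis=0)
    iqr = np.maximum(q75 - q25, 1e-12)  # guard against zero spread
    dz_max = float(np.quantile(np.abs(train_dz), dz_quantile))
    return {"median": median, "iqr": iqr, "dz_max": dz_max}


def normalize(q, stats):
    """Apply the frozen training statistics; nothing is re-estimated on new data."""
    return (np.asarray(q, dtype=float) - stats["median"]) / stats["iqr"]


rng = np.random.default_rng(0)
train_q = rng.normal(2.0, 3.0, size=(500, 4))
train_dz = rng.normal(0.0, 0.2, size=500)
test_q = rng.normal(2.5, 3.0, size=(200, 4))  # shifted on purpose

stats = fit_training_stats(train_q, train_dz)
q_tilde_train = normalize(train_q, stats)
q_tilde_test = normalize(test_q, stats)  # the test shift stays visible instead of being normalized away
```

Because the test split is transformed with training statistics, a distribution shift remains visible in $\tilde{q}$ rather than being silently absorbed, which is what the no-leakage commitment is meant to guarantee.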
Finally, to keep the bound aligned with the delay-aware condition (19), the same cadence mapping $\delta = \tau / \Delta t_{\text{win}}$ is used when reporting delay-induced variation terms; the time-scale conversion conventions are consolidated in Appendix A.
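The cadence mapping can be sketched as follows, assuming a hypothetical window cadence $\Delta t_{\text{win}} = 100$ ms and rounding $\delta$ up to a whole number of windows (the text does not state a rounding rule). The delayed stream presents $\hat{L}_{k-\delta}$ at window $k$ and holds the first value until the delay has elapsed; the function names are illustrative.

```python
import math

import numpy as np


def delay_steps(tau_ms, dt_win_ms):
    """delta = tau / dt_win, rounded up so the injected delay is never shorter than tau (assumed rule)."""
    return math.ceil(tau_ms / dt_win_ms)


def apply_delay(stream, delta):
    """Value available at window k is stream[k - delta]; the first sample is held for k < delta."""
    stream = np.asarray(stream, dtype=float)
    if delta <= 0:
        return stream.copy()
    delayed = np.empty_like(stream)
    delayed[:delta] = stream[0]
    delayed[delta:] = stream[:-delta]
    return delayed


dt_win_ms = 100.0  # hypothetical cadence
for tau in (0, 50, 100, 200, 400):  # latency grid from Table 6
    print(tau, "ms ->", delay_steps(tau, dt_win_ms), "windows")
```

Under this assumed cadence, the 300 ms and 500 ms settings reported in Tables 11 and 12 would correspond to 3 and 5 windows, respectively.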

Appendix C. Consolidated Symbol Glossary

Table A5. Symbol glossary for principal variables used across the manuscript.
Symbol | Type | Control-Theoretic or Physical Interpretation
$L(t)$ | state | Latent ergonomic-load state.
$\hat{L}_k$ | estimate | Estimated load used for feedback scheduling.
k | proxy | Identification proxy target for training only.
$L_{\text{hi}}$ | threshold | High-load threshold defining overload set.
$L_0$ | reference | Nominal operating point for schedule transition.
$d(t)$ | input | Demand or exertion drive exciting the load state.
$d_k$ | input | Windowed demand drive used in discrete-time model.
$\hat{d}_k$ | estimate | Estimated demand proxy used in consistency loss.
$\omega(t)$ | disturbance | Unmodeled load disturbance and residual variability.
$\omega_k$ | disturbance | Discrete disturbance in dissipative update.
$\lambda$ | rate | Recovery rate in continuous-time dissipation.
$\rho$ | factor | Discrete recovery factor, $0 < \rho < 1$.
$\alpha$ | gain | Demand-to-load coupling gain.
$u(t)$ | input | Autonomy-side control or adaptation input.
$u_k$ | input | Discrete autonomy input produced by policy.
$u_k^{\text{base}}$ | baseline | Baseline autonomy action without ergonomics coupling.
$\kappa_k$ | gain | Adaptation gain scheduled by estimated load.
$\kappa_{\min}$ | bound | Minimum scheduled gain under high load.
$\kappa_{\max}$ | bound | Maximum scheduled gain under low load.
$\gamma$ | gain | Compliance-shaping strength, acts as damping.
$s$ | slope | Schedule sharpness parameter.
$\sigma(\cdot)$ | map | Squashing map for bounded schedules and scaling.
$\phi(\cdot)$ | map | Monotone load-to-damping map.
$\Delta z_k$ | signal | Coordination error or correction signal in shaping.
$x_k$ | state | Plant or coordination subsystem state.
$\hat{x}_k$ | state | Surrogate plant state in CL-Sim.
$z_k$ | output | Coordination observable, task-relevant output.
$\hat{z}_k$ | output | Surrogate output used in CL-Sim evaluation.
$y_k$ | measurement | Measured multistream observables.
$\nu_k$ | disturbance | Plant-side disturbance in state update.
$\epsilon_k$ | noise | Measurement noise corrupting observables.
$f(\cdot)$ | dynamics | Plant state transition map.
$h(\cdot)$ | output map | Output map from state to coordination variable.
$g(\cdot)$ | sensor map | Measurement map from coordination to observables.
$\pi(\cdot)$ | policy | Load-aware policy mapping $\hat{L}$ to $u$.
$K_L$ | Lipschitz | Policy Lipschitz constant w.r.t. load argument.
$\varepsilon_L$ | error bound | Uniform estimator error bound.
$W_k$ | window | Time window of raw signals used for features.
$S_k$ | segment | Multivariate segment inside a window.
$\Phi(\cdot)$ | feature map | Window-to-feature operator.
$\varphi_k$ | feature | Feature vector extracted from window.
$\tilde{\varphi}_k$ | normalized | Robust-normalized feature vector.
$c_k$ | context | Task phase or context label in a window.
$T$ | length | Window duration for feature extraction.
$\Delta$ | stride | Window stride controlling update cadence.
$\Delta t$ | step | Sampling interval for discrete-time mapping.
$\Delta t_{\text{win}}$ | cadence | Window update cadence in experiments.
$\tau$ | latency | End-to-end latency used for impairment injection.
$\delta$ | steps | Discrete delay steps, $\delta = \tau / \Delta t$.
$r$ | factor | Downsampling factor used in impairments.
$p$ | probability | Packet-loss probability in impairment operator.
$V(\cdot)$ | Lyapunov | ISS Lyapunov function for plant stability.
$V_k$ | composite | Composite Lyapunov candidate including load.
$\gamma_u(\cdot)$ | gain | ISS gain from autonomy input to state.
$\gamma_\nu(\cdot)$ | gain | ISS gain from plant disturbance to state.
$\bar{\psi}$ | constant | Upper bound on closed-loop sensitivity product.
$\bar{\psi}_\delta$ | constant | Delay-induced sensitivity inflation bound.
$\Gamma(\delta)$ | bound | Variation accumulation bound over $\delta$ steps.
$T_{\text{hi}}$ | metric | Fraction of windows above $L_{\text{hi}}$.
$P_{\text{osc}}$ | metric | Band-limited oscillation power proxy.
$\theta_{\text{hi}}$ | cutoff | Overload occupancy cutoff.
$\theta_{\text{osc}}$ | cutoff | Oscillation power cutoff.
$r \in \{\text{EQ}, \text{OSC}, \text{OL}\}$ | label | Regime label for regime maps.
$N$ | count | Number of windows in a segment.
$N_{\text{seg}}$ | count | Segment length for spectral power estimates.

References

  1. Musić, S.; Hirche, S. Control sharing in human-robot team interaction. Annu. Rev. Control 2017, 44, 342–354.
  2. Hussein, A.; Ghignone, L.; Nguyen, T.; Salimi, N.; Nguyen, H.; Wang, M.; Abbass, H.A. Characterization of indicators for adaptive human-swarm teaming. Front. Robot. AI 2022, 9, 745958.
  3. Zhang, Y.; Wei, Y.; Ye, Z.; Liu, S.; Chen, H.; Yan, Y.; Chen, J. Robust Closed–Open Loop Iterative Learning Control for MIMO Discrete-Time Linear Systems with Dual-Varying Dynamics and Nonrepetitive Uncertainties. Mathematics 2025, 13, 1675.
  4. Pérez-Ibacache, R.; Castro, R.S.; Pimentel, G.A.; Bizzo, B.S. Dynamic output-feedback decentralized control synthesis for integration of distributed energy resources in AC microgrids. IEEE Trans. Smart Grid 2021, 13, 1225–1237.
  5. Fortini, L.; Lorenzini, M.; Kim, W.; De Momi, E.; Ajoudani, A. A framework for real-time and personalisable human ergonomics monitoring. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; IEEE: New York, NY, USA, 2020; pp. 11101–11107.
  6. Hwang, S.; Agada, P.; Kiemel, T.; Jeka, J.J. Identification of the unstable human postural control system. Front. Syst. Neurosci. 2016, 10, 22.
  7. Krishnan, A.; Yang, X.; Seth, U.; Jeyachandran, J.M.; Ahn, J.Y.; Gardner, R.; Pedigo, S.F.; Blom-Schieber, A.W.; Banerjee, A.G.; Manohar, K. Data-driven ergonomic risk assessment of complex hand-intensive manufacturing processes. Commun. Eng. 2025, 4, 45.
  8. Yang, Z.; Song, D.; Ning, J.; Wu, Z. A systematic review: Advancing ergonomic posture risk assessment through the integration of computer vision and machine learning techniques. IEEE Access 2024, 12, 180481–180519.
  9. Zheng, S.; Ren, M.; Luo, X.; Zhang, H.; Feng, G. The Third Closed-Loop Control for Compensating Light Power Fluctuations in the Interferometric Fiber-Optic Gyroscope. J. Russ. Laser Res. 2023, 44, 247–255.
  10. Hart, S.G. NASA-task load index (NASA-TLX); 20 years later. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, San Francisco, CA, USA, 16-20 October 2006; Sage Publications Sage: Los Angeles, CA, USA, 2006; Volume 50, pp. 904–908.
  11. Park, J.H.; Kim, S.H. Design of a Stable Discrete-time Output-feedback Decentralized Controller for Uncertain Continuous-time Large-scale Nonlinear Systems. Int. J. Control Autom. Syst. 2025, 23, 1402–1410.
  12. Chen, P.; Liu, S.; Zhang, D.; Yu, L. Adaptive event-triggered decentralized dynamic output feedback control for load frequency regulation of power systems with communication delays. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 5949–5961.
  13. Dani, A.P.; Salehi, I.; Rotithor, G.; Trombetta, D.; Ravichandar, H. Human-in-the-loop robot control for human-robot collaboration. IEEE Control Syst. Mag. 2024, 40, 29–56.
  14. Teo, G.; Reinerman-Jones, L.; Matthews, G.; Szalma, J.; Jentsch, F.; Hancock, P. Enhancing the effectiveness of human-robot teaming with a closed-loop system. Appl. Ergon. 2018, 67, 91–103.
  15. Bjurling, O.; Arvola, M.; Alfredson, J.; Prytz, E.; Ziemke, T. Trajectories of attention and control in human-machine interactions: The case of swarms in maritime search and rescue. Theor. Issues Ergon. Sci. 2025, 26, 816–837.
  16. Wattearachchi, W.D.; Lakshika, E.; Kasmarik, K.; Barlow, M. A Study on Human-Swarm Interaction: A Framework for Assessing Situation Awareness and Task Performance. arXiv 2025, arXiv:2503.14810.
  17. Kathiravelu, P.; Arnold, M.G.; Vijay, S.; Jagwani, R.; Goyal, P.; Goel, A.K.; Li, N.; Horn, C.; Pan, T.; Kothare, M.V.; et al. Distributed executions with CONTROL-CORE: Integrated development environment (IDE) for closed-loop neuromodulation control systems. Clust. Comput. 2025, 28, 697.
  18. Kim, S.K.; Lim, S.; Ahn, C.K. Decentralized critical damping position synchronizer for multiservo drives via Feedback-Loop intelligentization approach. IEEE Trans. Ind. Electron. 2024, 72, 1368–1378.
  19. Tchimino, J.; Dideriksen, J.L.; Dosen, S. EMG feedback improves grasping of compliant objects using a myoelectric prosthesis. J. Neuroeng. Rehabil. 2023, 20, 119.
  20. Fam, I.; Soubra, H.; Gamal, N. Human-Swarm Interaction Methods’ Effect on Human Psychophysiology. In Proceedings of the 2023 Eleventh International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, Egypt, 21-23 November 2023; IEEE: New York, NY, USA, 2023; pp. 184–189.
  21. Distefano, J.P.; Chowdhury, S.; Esfahani, E. Exploring human-swarm interaction dynamics in cyber-physical systems: A physiological approach. J. Integr. Des. Process Sci. 2024, 27, 200–210.
  22. Menanno, M.; Riccio, C.; Benedetto, V.; Gissi, F.; Savino, M.M.; Troiano, L. An ergonomic risk assessment system based on 3D human pose estimation and collaborative robot. Appl. Sci. 2024, 14, 4823.
  23. Haj Mahmoud, O.; Pontonnier, C.; Dumont, G.; Poli, S.; Multon, F. A neural networks approach to determine factors associated with self-reported discomfort in picking tasks. Hum. Factors 2023, 65, 1381–1393.
  24. Lorenzini, M.; Lagomarsino, M.; Fortini, L.; Gholami, S.; Ajoudani, A. Ergonomic human-robot collaboration in industry: A review. Front. Robot. AI 2023, 9, 813907.
  25. Iodice, F.; De Momi, E.; Ajoudani, A. Intelligent Framework for Human-Robot Collaboration: Dynamic Ergonomics and Adaptive Decision-Making. J. Intell. Robot. Syst. 2025, 112, 5.
  26. Caporaso, T.; Grazioso, S.; Di Gironimo, G. Development of an integrated virtual reality system with wearable sensors for ergonomic evaluation of human–robot cooperative workplaces. Sensors 2022, 22, 2413.
  27. Merlo, E.; Lamon, E.; Fusaro, F.; Lorenzini, M.; Carfì, A.; Mastrogiovanni, F.; Ajoudani, A. An ergonomic role allocation framework for dynamic human–robot collaborative tasks. J. Manuf. Syst. 2023, 67, 111–121.
  28. Cohen, Y.; Biton, A.; Shoval, S. Fusion of computer vision and AI in collaborative robotics: A review and future prospects. Appl. Sci. 2025, 15, 7905.
  29. Zakia, U.; Menon, C. Dataset on Force Myography for Human Robot Interactions, Version 1. Zenodo 2022.
  30. Ebied, A.; Awadallah, A.M.; Abbass, M.A.; El-Sharkawy, Y. Multi-channel Surface EMG Dataset for Fatigue analysis, Version 1. Zenodo 2021.
  31. Iodice, F.; De Momi, E.; Ajoudani, A. HRI30: An Action Recognition Dataset for Industrial Human-Robot Interaction, Version 1.0. Zenodo 2022.
  32. Buś, S.; Kaniuka, J.; Świtlik, D.; Główka, J.; Kozik, R. RoHuCAD: Robots and Humans Collaborative Anomaly Detection, Version 1.0. Zenodo 2024.
  33. Borghi, S.; Zucchi, F.; Prati, E.; Ruo, A.; Villani, V.; Sabattini, L.; Peruzzini, M. SenseCobot, Version 2. Zenodo 2024.
  34. Mezey, D.; Bartashevich, P.; Hasbani, G.E.; Romanczuk, P.; Hamann, H.; Deffner, D.; James, D. Human-swarm Interaction Dataset: Real-Time Human Interaction with Virtual Swarms in Shared Physical Space, Version 1. Zenodo 2025.
  35. Hongn, A.; Bosch, F.; Prado, L.; Bonomini, P. Wearable Device Dataset from Induced Stress and Structured Exercise Sessions, Version 1.0.1. PhysioNet 2025, 101, e215–e220.
  36. Dogan, G.; Patlar Akbulut, F. Stress analysis from physiological data under pressure: WorkStress3D Dataset, Version 11. Mendeley Data 2024.
Figure 1. Synoptic workflow of the proposed framework, clarifying the interaction between datasets, estimator design, and closed-loop evaluation, and indicating how the same interface translates to an industrial deployment with edge inference and bounded adaptation; see also Table 1.
Table 1. Summary of representative prior approaches and the gap addressed by this work.
Theme | Representative Works | Contribution and Gap (Compressed)
Ergonomics monitoring | [5,7,8,9] | Real-time posture/risk scoring and feedback; typically not linked to a control-relevant state with explicit calibration/latency budgets and auditable closed-loop regime claims.
Physiological workload estimation | [6,10,19,29,30] | Workload/fatigue inference from wearables and EMG/FMG proxies; usually evaluated as prediction/fit, without tracing calibration, rare-event precision, and delay into stability-relevant adaptation behavior.
HRC interaction modeling and monitoring | [13,31,32,33] | Action/phase modeling and anomaly-aware collaboration pipelines; often lacks a unified ergonomics-as-state coupling to bounded gain/compliance adaptation under timing and dropout constraints.
Adaptive autonomy/HITL control | [2,3,11,12,15,16] | Workload/attention-aware assistance and robust/hybrid framings; adaptation is frequently heuristic or model-conditional, with limited auditable linkage from estimator properties (ECE, latency, bounded error) to regime transitions and overload avoidance.
Table 5. Regime labeling thresholds and selection rule (training-only calibration). Thresholds are fixed prior to test reporting to avoid circularity.
Quantity | Definition | Selection Rule (Training Only)
$L_{\text{hi}}$ | overload threshold on $[0, 1]$ | set to the $q_{\text{hi}}$-quantile of $\hat{L}_k$ during high-demand phases (if labeled), or to an upper quantile of the effort proxy otherwise
$\theta_{\text{hi}}$ | time-above-threshold cutoff | fixed to 0.20 (overload if ≥20% of windows exceed $L_{\text{hi}}$)
$\theta_{\text{osc}}$ | oscillation power cutoff | set to the $q_{\text{osc}}$-quantile of $P_{\text{osc}}$ in training equilibrated blocks; used unchanged on test
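A minimal sketch of the Table 5 procedure follows. Only $\theta_{\text{hi}} = 0.20$ comes from the table; the quantile levels $q_{\text{hi}} = 0.90$ and $q_{\text{osc}} = 0.95$, the tie convention (at or above $L_{\text{hi}}$), and the precedence OL over OSC over EQ are assumptions standing in for the fixed rule of Equation (21), which is not reproduced in this section. The synthetic training draws are placeholders.

```python
import numpy as np

THETA_HI = 0.20  # Table 5: overload if at least 20% of windows exceed L_hi


def calibrate_thresholds(train_load_high_demand, train_posc_equilibrated, q_hi=0.90, q_osc=0.95):
    """Training-only calibration of L_hi and theta_osc (quantile levels are placeholders)."""
    L_hi = float(np.quantile(train_load_high_demand, q_hi))
    theta_osc = float(np.quantile(train_posc_equilibrated, q_osc))
    return L_hi, theta_osc


def label_regime(L_hat_segment, P_osc, L_hi, theta_osc, theta_hi=THETA_HI):
    """Assumed precedence: OL, then OSC, else EQ."""
    T_hi = float(np.mean(np.asarray(L_hat_segment) >= L_hi))  # fraction of windows at or above L_hi
    if T_hi >= theta_hi:
        return "OL"
    if P_osc >= theta_osc:
        return "OSC"
    return "EQ"


rng = np.random.default_rng(1)
L_hi, theta_osc = calibrate_thresholds(rng.beta(2, 5, size=2000), rng.gamma(2.0, 0.05, size=2000))
# The thresholds are now frozen and reused unchanged on every test segment.
print(label_regime(rng.beta(2, 5, size=50), P_osc=0.05, L_hi=L_hi, theta_osc=theta_osc))
```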
Table 6. Sweep parameters and ranges. G = global fixed/swept; F = fit on training blocks only.
Param. | Range/Grid | Set
$\rho$ | $[0.90, 0.99]$ | F if recovery labels; else G sweep
$\alpha$ | $[0.01, 0.50]$ | G sweep (no test tuning)
$\kappa_{\min}$ | $[0, 0.3]$ | G sweep; see (4)
$\kappa_{\max}$ | $[0.3, 1]$ | G sweep; see (4)
$\gamma$ | $[0, 1]$ | G sweep; see (5)
$\tau$ | $\{0, 50, 100, 200, 400\}$ ms | grid; $\delta = \tau / \Delta t$
$r$ | $\{1, 2, 4\}$ | grid
$p$ | $\{0, 0.05, 0.10\}$ | grid
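The downsampling and packet-loss impairments in Table 6 can be emulated with simple hold operators. In the sketch below, both operators repeat the last delivered value (zero-order hold), which is one plausible reading; the paper's exact impairment operator is not reproduced in this section, and the function names are hypothetical. The full impairment grid is the Cartesian product of the $\tau$, $r$, and $p$ rows.

```python
import itertools

import numpy as np

# Impairment grid from Table 6: latency (ms), downsampling factor r, packet-loss probability p.
GRID = list(itertools.product((0, 50, 100, 200, 400), (1, 2, 4), (0.0, 0.05, 0.10)))


def downsample_hold(x, r):
    """Keep every r-th sample and hold it until the next kept sample."""
    x = np.asarray(x, dtype=float)
    idx = (np.arange(len(x)) // r) * r
    return x[idx]


def packet_loss_hold(x, p, rng):
    """Drop each sample independently with probability p; a dropped sample repeats the last delivered one."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    lost = rng.random(len(x)) < p
    lost[0] = False  # assume the first packet always arrives
    for k in range(1, len(x)):
        if lost[k]:
            out[k] = out[k - 1]
    return out


rng = np.random.default_rng(3)
signal = np.sin(np.linspace(0.0, 6.0, 60))
impaired = packet_loss_hold(downsample_hold(signal, 4), 0.10, rng)
print(len(GRID), "impairment settings")
```

Applying downsampling before packet loss, as in the last lines, is likewise an assumption about operator order.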
Table 7. Non-negotiable methodological commitments for closed-loop interpretation, training-only threshold selection, and subject-level validation.
Issue | Non-Negotiable Methodological Commitment
Closed-loop vs. replay | Stability/attractor claims are tied to CL-Sim only; CL-Replay is reported as offline counterfactual
Threshold circularity | $L_{\text{hi}}$, $\theta_{\text{hi}}$, $\theta_{\text{osc}}$ calibrated on training only and fixed on test (Table 5)
Multi-subject leakage | Subject-level splits whenever IDs exist; per-subject aggregation and bootstrap CIs (Equation (22))
Parameter underspecification | Explicit sweep ranges and fit-vs-global rules (Table 6)
Table 8. Execution modes. CL-Replay is offline counterfactual evaluation.
Mode | Plant | Feedback/Control
CL-Sim | surrogate $\hat{f}$ (trained) | $\hat{L}_k \to u_k$ via (4) and (5); $u_k$ updates $(x_k, z_k)$ through $\hat{f}$
CL-Replay | recorded traces (exogenous) | compute $\hat{L}_k$, $u_k$ only; $z_k$ unchanged; report offline risk/robustness metrics
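The difference between the two execution modes can be seen in a toy loop. The sketch assumes a hypothetical scalar surrogate $z_{k+1} = a z_k + b u_k + w_k$, a hypothetical load estimate $\hat{L}_k = \min(|z_k|, 1)$, and the policy $u_k = -\kappa(\hat{L}_k) z_k$ with a logistic schedule; none of these is the paper's trained $\hat{f}$ or estimator. Disturbances are taken as the recorded innovations, so that with $u \equiv 0$ the simulation would reproduce the recorded trace. In replay mode the computed $u_k$ never reaches $z_k$, which is why the loop is open with respect to $u$ and the sensitivity product vanishes.

```python
import numpy as np


def kappa_of_load(L_hat, kappa_min=0.1, kappa_max=0.8, L0=0.6, s=0.05):
    """Logistic gain schedule; parameter values are illustrative."""
    return kappa_min + (kappa_max - kappa_min) / (1.0 + np.exp(-(L0 - L_hat) / s))


def run_mode(mode, z_recorded, a=0.95, b=0.5):
    """mode='sim': u_k drives a toy surrogate; any other mode replays: u_k is computed, z_k stays recorded."""
    z_recorded = np.asarray(z_recorded, dtype=float)
    n = len(z_recorded)
    z = np.empty(n)
    u = np.empty(n)
    z[0] = z_recorded[0]
    for k in range(n):
        L_hat = min(abs(z[k]), 1.0)  # hypothetical load estimate
        u[k] = -kappa_of_load(L_hat) * z[k]
        if k + 1 < n:
            if mode == "sim":
                w_k = z_recorded[k + 1] - a * z_recorded[k]  # recorded innovation as disturbance
                z[k + 1] = a * z[k] + b * u[k] + w_k
            else:
                z[k + 1] = z_recorded[k + 1]  # exogenous trace: u_k has no path back to z
    return z, u


rng = np.random.default_rng(2)
z_rec = np.cumsum(rng.normal(0.0, 0.05, size=200))
z_sim, u_sim = run_mode("sim", z_rec)
z_rep, u_rep = run_mode("replay", z_rec)
```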
Table 9. Top-5 and bottom-5 estimator performance for high-load detection at q = 0.90 (CoBeXR windows). Higher is better for AUROC/AUPRC/F1/BAcc; lower is better for ECE/Brier.
Model | AUROC | AUPRC | F1 | BAcc | ECE/Brier
GradientBoostingClassifier | 0.87 | 0.41 | 0.49 | 0.74 | 0.031/0.089
RandomForestClassifier | 0.85 | 0.38 | 0.46 | 0.72 | 0.072/0.112
LogisticRegression | 0.83 | 0.34 | 0.43 | 0.70 | 0.039/0.096
(alt.) Calibrated LR | 0.82 | 0.33 | 0.41 | 0.69 | 0.028/0.091
GradientBoosting (shallow) | 0.81 | 0.31 | 0.39 | 0.68 | 0.044/0.103
Ridge (as classifier proxy) | 0.74 | 0.22 | 0.28 | 0.61 | 0.081/0.125
RF (low trees) | 0.73 | 0.21 | 0.26 | 0.60 | 0.095/0.131
LR (no scaling) | 0.72 | 0.19 | 0.24 | 0.59 | 0.104/0.138
Constant predictor | 0.50 | 0.10 | 0.00 | 0.50 | 0.000/0.100
Weak baseline | 0.71 | 0.18 | 0.22 | 0.58 | 0.119/0.142
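The ECE and Brier columns in Table 9 can be computed from predicted high-load probabilities as sketched below. Equal-width binning with 10 bins is an assumption, since the binning scheme is not specified in this section; ECE here compares the mean predicted probability with the observed high-load frequency in each bin, weighted by bin occupancy. The synthetic data are placeholders.

```python
import numpy as np


def expected_calibration_error(p, y, n_bins=10):
    """Occupancy-weighted |observed frequency - mean predicted probability| over equal-width bins."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_id = np.digitize(p, inner_edges, right=True)  # bin indices 0 .. n_bins - 1
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_id == b
        if in_bin.any():
            ece += in_bin.mean() * abs(y[in_bin].mean() - p[in_bin].mean())
    return float(ece)


def brier_score(p, y):
    """Mean squared error between predicted probability and the binary label."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))


rng = np.random.default_rng(4)
p_hat = rng.uniform(0.0, 1.0, size=5000)
y_true = (rng.uniform(size=5000) < p_hat).astype(int)  # labels drawn from p_hat: well calibrated
print(expected_calibration_error(p_hat, y_true), brier_score(p_hat, y_true))
```

A constant predictor that always outputs the base rate has zero ECE by construction, which is why ECE must be read together with AUROC/AUPRC, as the "Constant predictor" row illustrates.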
Table 10. Regression performance for load_index_v2 (CoBeXR windows). Higher is better for $R^2$ and Spearman $\rho$; lower is better for RMSE.
Model | $R^2$ | RMSE | Spearman $\rho$
GradientBoostingRegressor | 0.42 | 0.19 | 0.68
RandomForestRegressor | 0.38 | 0.20 | 0.64
Ridge | 0.31 | 0.22 | 0.57
Weak baseline | 0.11 | 0.27 | 0.39
Table 11. Robustness summary for the best-performing classifier on q = 0.90 under impairments. Higher is better for AUROC/AUPRC; lower is better for ECE.
Impairment | Setting | AUROC | AUPRC | ECE | Interpretation
Latency $\tau$ | 0 ms | 0.87 | 0.41 | 0.031 | Reference closed-loop estimate timing
Latency $\tau$ | 300 ms | 0.83 | 0.36 | 0.049 | Knee region; regime labeling begins to drift
Latency $\tau$ | 500 ms | 0.80 | 0.33 | 0.061 | Delay dominates; risk of oscillatory suppression
Noise on $\hat{L}$ | $\sigma_L = 0.10$ | 0.84 | 0.38 | 0.076 | Rank preserved, calibration degrades markedly
Downsampling | ×4 | 0.83 | 0.32 | 0.052 | Rare-event precision degrades; high-load recall drops
Ablation | remove effort proxy | 0.84 | 0.31 | 0.058 | Loss of exertion semantics; overload windows under-detected
Ablation | remove context/kinematics | 0.82 | 0.36 | 0.045 | Loses phase structure; more false positives
Table 12. Closed-loop summary (CL-Sim unless noted). Overload incidence is the fraction of windows with $\hat{L}_k \ge L_{\text{hi}}$; $P_{\text{osc}}$ is band-limited power over the fixed oscillation band defined in Appendix A.4; throughput is a normalized task proxy.
Controller/Mode | Latency $\tau$ | Overload (%) | $P_{\text{osc}}$ | Throughput
Baseline (CL-Sim) | 0 ms | 7.8 | 0.26 | 1.00
Load-aware schedule (CL-Sim) | 0 ms | 4.1 | 0.14 | 0.97
Threshold attenuator (CL-Sim) | 0 ms | 5.0 | 0.18 | 0.96
Centralized proxy (CL-Sim) | 0 ms | 3.9 | 0.16 | 0.92
Baseline (CL-Sim) | 500 ms | 9.6 | 0.20 | 1.00
Load-aware schedule (CL-Sim) | 500 ms | 7.9 | 0.22 | 0.96
Baseline (CL-Replay) |  | 6.9 | 0.19 | 1.00
Load-aware (CL-Replay) |  | 5.8 | 0.16 | 1.00
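The two closed-loop metrics in Table 12 can be sketched as follows. Overload incidence follows the caption directly. For $P_{\text{osc}}$, the oscillation band of Appendix A.4 is not reproduced here, so the band edges, the window cadence, and the one-sided periodogram scaling (which yields $A^2/2$ for an on-bin sinusoid of amplitude $A$) are assumptions. The last helper reproduces the relative overload reduction for the 0 ms CL-Sim rows.

```python
import numpy as np


def overload_incidence(L_hat, L_hi):
    """Percentage of windows with L_hat_k >= L_hi."""
    return 100.0 * float(np.mean(np.asarray(L_hat, dtype=float) >= L_hi))


def band_power(x, fs_hz, band_hz):
    """One-sided periodogram power of the mean-removed series inside band_hz = (f_lo, f_hi)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_hz)
    # Exclude DC and Nyquist so the factor 2 (one-sided spectrum) is valid for every kept bin.
    keep = (freqs >= band_hz[0]) & (freqs <= band_hz[1]) & (freqs > 0) & (freqs < fs_hz / 2)
    return float(2.0 * spectrum[keep].sum() / n**2)


def relative_reduction(before, after):
    return (before - after) / before


fs_hz = 10.0          # hypothetical 100 ms window cadence
band_hz = (0.3, 1.0)  # hypothetical oscillation band
k = np.arange(256)
x = 0.2 * np.sin(2 * np.pi * 0.625 * k / fs_hz)  # 0.625 Hz lies exactly on an FFT bin
print(band_power(x, fs_hz, band_hz))              # approximately A^2 / 2 = 0.02
print(f"{relative_reduction(7.8, 4.1):.1%}")       # 47.4%
```

The 47.4% figure matches the roughly 47% relative overload reduction stated for nominal timing.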

MDPI and ACS Style

Gerolimos, N.; Alevizos, V.; Priniotakis, G. Data-Driven Ergonomic Load Dynamics for Human–Autonomy Teams. Big Data Cogn. Comput. 2026, 10, 74. https://doi.org/10.3390/bdcc10030074
