1. Introduction
Consolidated bioprocessing (CBP) is a process-intensified route for producing ethanol and other biochemical products through the simultaneous biological integration of enzyme production, biomass deconstruction, and fermentation [1,2]. Its appeal lies in the potential to reduce dependence on externally supplied cellulases and to simplify the biomass-to-products chain compared with separated enzyme production, hydrolysis, and fermentation schemes. However, CBP remains technically challenging because its performance depends on host engineering, cellulosome biosynthesis, microbial consortia design, enzyme delivery, substrate accessibility, and feedstock deconstruction. Recent studies have therefore focused on CBP strains and genetic manipulation, synthetic cellulosomes and extracellular polymeric substances, microbial communities and co-cultures, enzyme delivery strategies, substrate accessibility, and process intensification [3,4,5,6,7]. From a monitoring perspective, CBP is not only a biochemical conversion process whose product yield should be maximized; it is also a nonlinear, partially observed dynamic system in which growth, enzyme synthesis, insoluble-substrate deconstruction, sugar release, sugar consumption, fermentation, and inhibition-related effects evolve on different time scales.
A central difficulty in developing digital-twin-assisted CBP is the limited availability of informative online measurements. Ethanol concentration is often one of the most accessible measurements, but it is a delayed product signal and cannot directly reveal whether poor batch performance originates from weak biomass growth, insufficient enzyme production, poor substrate accessibility, slow hydrolysis, sugar limitation, or product inhibition. In contrast, states that are more useful for process decision-making, such as living biomass concentration, active enzyme concentration, residual insoluble substrate, and soluble sugar concentration, are difficult to measure directly, continuously, or non-invasively. Similar measurement limitations have motivated the use of hybrid models, soft sensors, and online state-estimation methods in bioprocess monitoring and control [
8,
9,
10,
11]. Recent studies have also shown the value of soft-sensor recalibration, metabolic-heat-based soft sensing, spectroscopic monitoring, real-time biomass estimation, and Kalman-filter-based state–parameter estimation for improving the information content of bioprocess measurements [
12,
13,
14,
15,
16,
17,
18]. In parallel, digital bioprocessing and digital chemical engineering studies emphasize accurate process measurements, model integration, predictive modeling, enabling digital technologies, and the progressive introduction of process analytical technology tools during process development [
19,
20,
21,
22,
23,
24,
25]. For nonlinear systems, unscented Kalman filtering provides a convenient way to propagate uncertainty through nonlinear dynamics without local linearization [26,27]. Even so, no estimator can recover information that the measurements do not carry, so the quality of soft sensing depends strongly on the informativeness of the available measurements.
The problem of choosing measurements for CBP cannot be reduced to the selection of a state-estimation algorithm. It is also necessary to determine whether the available measurements contain enough information for state reconstruction and parameter learning. State observability describes the extent to which hidden process states can be reconstructed from measured outputs, whereas parameter identifiability describes the extent to which model parameters can be estimated from available data. These concepts are especially important in partially observed biochemical systems, where unmeasured states and uncertain parameters can compensate for one another and produce similar measured trajectories [
28,
29]. Fisher-information- and sensitivity-based metrics provide practical tools for quantifying measurement informativeness, observability, and identifiability. This issue is particularly important for CBP because candidate measurements differ substantially in measurement burden, cost, delay, and online implementation difficulty. For example, ethanol, soluble sugar, biomass proxies, enzyme-activity proxies, and residual-substrate proxies are not equally easy to obtain; therefore, their value for digital-twin deployment should be evaluated before laboratory or pilot-scale implementation. Recent literature-derived CBP modeling has also shown that product prediction is strongly affected by heterogeneous feedstock–pretreatment–microbial descriptors, sparse product reporting, and missing-label structure [
30].
Although CBP has been widely studied from biological, biochemical, and process-intensification perspectives [
3,
6,
7,
31,
32], systematic evaluation of measurement sets for CBP digital twins remains limited. Existing soft-sensing and bioprocess-control literature demonstrates that software sensors and nonlinear state-estimation methods can improve process monitoring [
8,
9,
10,
11]. More recent work has further highlighted the importance of soft-sensor generalizability, sensor recalibration, biomass monitoring, spectroscopic data streams, and joint state–parameter estimation for reliable model-assisted monitoring [
12,
13,
14,
16,
17,
18]. However, there is still no systematic comparison of CBP measurement packages with respect to state observability, parameter identifiability, soft-sensor reconstruction performance, measurement burden, and robustness to measurement uncertainty and alternative scoring priorities. In particular, it remains unclear whether ethanol-only sensing is sufficient for constructing a state-aware CBP digital twin, whether ethanol and sugar measurements provide an adequate minimal package, or whether additional biomass, enzyme, and substrate proxies are required.
To address this gap, this study proposes a computational framework for evaluating CBP measurement packages according to state observability, parameter identifiability, soft-sensor reconstruction performance, measurement burden, and robustness to practical measurement imperfections. A compact hybrid gray-box CBP model was used as a virtual plant to generate finite-difference output sensitivities, Fisher-information-based observability and identifiability metrics, parameter-correlation diagnostics, eigenvalue spectra, condition-number diagnostics, and approximate uncertainty measures. Rather than restricting the analysis to a small preselected list, the workflow evaluated all ethanol-mandatory combinations of the five modeled measurement channels: ethanol, soluble sugar, biomass proxy, enzyme-activity proxy, and residual-substrate proxy. These candidate sensor sets were then tested using a Monte Carlo unscented Kalman filter reconstruction experiment under model–plant mismatch and measurement noise conditions, with common plant-mismatch and initial-estimate realizations paired across sensor sets within each replicate. Additional analyses assessed the sensitivity of the sensor-set ranking to uniform and sensor-specific noise changes, alternative operating trajectories, missing measurements, assay delay, systematic measurement bias, alternative scoring weights, and alternative sensor-specific measurement-burden scenarios.
The contribution of this paper is threefold. First, it provides a pre-experimental computational pipeline for ranking CBP measurement candidates before wet-lab or pilot-plant implementation while making the candidate-set dependence of the ranking explicit. Second, it combines state observability, parameter identifiability, and nonlinear UKF reconstruction performance rather than relying only on endpoint prediction or product monitoring. Third, it evaluates whether the resulting ranking remains defensible under common measurement and implementation stress conditions, including sensor-specific noise, missingness, delay, bias, operating-trajectory variation, and measurement-burden assumptions. The results are intended to support digital-twin readiness assessment and experimental planning for CBP, not to claim experimental validation of a specific organism, sensor platform, or pilot-scale process.
3. Soft-Sensor Evaluation, Sensor Ranking, and Robustness Assessment
3.1. Soft-Sensor Reconstruction Test
After the observability and identifiability analyses, each candidate sensor set was evaluated for nonlinear soft-sensing reconstruction under model–plant mismatch, initial-state uncertainty, and measurement noise conditions. An unscented Kalman filter (UKF) was used because it propagates mean and covariance information through nonlinear and phase-dependent dynamics without local linearization, which is important when some CBP states are only partially measured. Recent bioprocess monitoring studies emphasize that soft sensors, hybrid models, and model-based state-estimation methods are essential for digital bioprocessing because key physiological states are often unavailable from direct online measurements [
11,
40,
41,
43]. Related work on sensor-assisted bioprocess monitoring has also demonstrated the value of metabolic-heat-based soft sensing, spectroscopic monitoring, biomass estimation, soft-sensor recalibration, and joint state–parameter estimation for improving process-state reconstruction [
12,
13,
14,
15,
16,
17,
18].
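For the five-state CBP model, the UKF state estimate and covariance matrix at time $t_k$ are denoted by $\hat{x}_k$ and $P_k$, respectively. The sigma points were generated as

$$\chi_k^{(0)} = \hat{x}_k, \qquad \chi_k^{(i)} = \hat{x}_k + \left[\sqrt{(n+\lambda) P_k}\right]_i, \qquad \chi_k^{(n+i)} = \hat{x}_k - \left[\sqrt{(n+\lambda) P_k}\right]_i, \qquad i = 1, \dots, n,$$

with $n = 5$ and $\lambda = \alpha^2 (n + \kappa) - n$, where $[\cdot]_i$ denotes the $i$-th column of the matrix square root. The UKF tuning parameters were $\alpha$, $\beta$, and $\kappa$. Each sigma point was propagated over one plant step using the same fourth-order Runge–Kutta integration scheme as the virtual plant:

$$\chi_{k+1|k}^{(i)} = \Phi\left(\chi_k^{(i)}\right),$$

where $\Phi$ denotes the CBP model integrated over one internal time step. The predicted mean and covariance were calculated as

$$\hat{x}_{k+1|k} = \sum_{i=0}^{2n} W_m^{(i)} \chi_{k+1|k}^{(i)}, \qquad P_{k+1|k} = \sum_{i=0}^{2n} W_c^{(i)} \left(\chi_{k+1|k}^{(i)} - \hat{x}_{k+1|k}\right)\left(\chi_{k+1|k}^{(i)} - \hat{x}_{k+1|k}\right)^{\top} + Q,$$

where $Q$ is the process-noise covariance matrix, and $W_m^{(i)}$ and $W_c^{(i)}$ are the conventional UKF mean and covariance weights.
At sampling instants, the measurement equation for sensor set $s$ was

$$y_{k+1}^{s} = h_s\left(x_{k+1}\right) + v_{k+1}, \qquad v_{k+1} \sim \mathcal{N}\left(0, R_s\right),$$

where $R_s$ is the diagonal measurement-noise covariance matrix for the channels included in $s$. The predicted measurement sigma points and predicted measurement mean were calculated as

$$\mathcal{Y}_{k+1|k}^{(i)} = h_s\left(\chi_{k+1|k}^{(i)}\right), \qquad \hat{y}_{k+1|k} = \sum_{i=0}^{2n} W_m^{(i)} \mathcal{Y}_{k+1|k}^{(i)}.$$

The innovation covariance matrix and state–measurement cross-covariance matrix were calculated as

$$S_{k+1} = \sum_{i=0}^{2n} W_c^{(i)} \left(\mathcal{Y}_{k+1|k}^{(i)} - \hat{y}_{k+1|k}\right)\left(\mathcal{Y}_{k+1|k}^{(i)} - \hat{y}_{k+1|k}\right)^{\top} + R_s, \qquad P_{xy,k+1} = \sum_{i=0}^{2n} W_c^{(i)} \left(\chi_{k+1|k}^{(i)} - \hat{x}_{k+1|k}\right)\left(\mathcal{Y}_{k+1|k}^{(i)} - \hat{y}_{k+1|k}\right)^{\top}.$$

The Kalman gain and measurement-update step were computed as

$$K_{k+1} = P_{xy,k+1} S_{k+1}^{+}, \qquad \hat{x}_{k+1} = \hat{x}_{k+1|k} + K_{k+1}\left(y_{k+1}^{s} - \hat{y}_{k+1|k}\right), \qquad P_{k+1} = P_{k+1|k} - K_{k+1} S_{k+1} K_{k+1}^{\top},$$

where $S_{k+1}^{+}$ is the Moore–Penrose pseudoinverse. The pseudoinverse was used to improve numerical stability when innovation covariance matrices were close to singular.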
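The prediction and update steps described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the default values of `alpha`, `beta`, and `kappa`, and the choice to redraw sigma points from the predicted mean and covariance before the measurement update are all assumptions made here (the source does not state which sigma-point variant was used).

```python
import numpy as np

def sigma_points(x, P, alpha=1.0, beta=2.0, kappa=0.0):
    """Return (2n+1) scaled sigma points with mean and covariance weights.
    The default alpha, beta, kappa are placeholders, not the paper's values."""
    n = x.size
    lam = alpha**2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * P)   # L @ L.T == (n + lam) * P
    pts = np.vstack([x, x + L.T, x - L.T])  # rows: centre, plus, minus
    Wm = np.full(2 * n + 1, 0.5 / (n + lam))
    Wc = Wm.copy()
    Wm[0] = lam / (n + lam)
    Wc[0] = Wm[0] + (1.0 - alpha**2 + beta)
    return pts, Wm, Wc

def weighted_moments(Z, Wm, Wc):
    """Weighted mean, deviations, and covariance of transformed sigma points."""
    mean = Wm @ Z
    dZ = Z - mean
    return mean, dZ, dZ.T @ (Wc[:, None] * dZ)

def ukf_step(x, P, f, h, Q, R, y=None, **params):
    """One predict (and optional update) step with additive noise."""
    pts, Wm, Wc = sigma_points(x, P, **params)
    X = np.array([f(p) for p in pts])        # propagate through the model
    x_pred, _, P_pred = weighted_moments(X, Wm, Wc)
    P_pred = P_pred + Q
    if y is None:                            # no sample available: predict only
        return x_pred, P_pred
    X2, Wm, Wc = sigma_points(x_pred, P_pred, **params)  # redraw (assumption)
    Y = np.array([h(p) for p in X2])
    y_pred, dY, S = weighted_moments(Y, Wm, Wc)
    S = S + R
    Pxy = (X2 - x_pred).T @ (Wc[:, None] * dY)
    K = Pxy @ np.linalg.pinv(S)              # pseudoinverse, as in the text
    return x_pred + K @ (y - y_pred), P_pred - K @ S @ K.T
```

For a linear model and linear measurement, this step reproduces the ordinary Kalman filter, which is a convenient sanity check before substituting the nonlinear CBP model for `f` and a sensor-set-specific `h`.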
Estimation accuracy was evaluated using a Monte Carlo simulation experiment. For each of the 16 ethanol-mandatory sensor-set configurations, 100 simulation replicates were performed. In each replicate, multiplicative plant–model mismatch was imposed on the growth, enzyme-yield, hydrolysis-capacity, ethanol-yield, decay, inhibition, and feedstock-accessibility factors. The UKF used the nominal model and therefore did not have access to the replicate-specific plant perturbation. Initial-state uncertainty was introduced by perturbing the initial state used by the estimator relative to the true plant initial state while enforcing physical nonnegativity bounds.
To ensure a fair paired comparison among sensor sets, the Monte Carlo design uses common plant and estimator realizations across all sensor packages. For a given replicate r, the same plant-parameter mismatch and the same initial-estimate perturbation were used for every candidate sensor set. Thus, differences in reconstruction error between two sensor sets are attributable to the measurement package rather than to different simulated plants. Measurement noise was generated consistently by channel: for each replicate, time point, and measurement channel, a channel-specific random-noise stream was used, so sensor sets sharing a channel received the same noise realization for that channel. Additional sensors introduced additional channel-specific noise streams without changing the underlying plant or initial-condition realization. This common-random-number design provides paired replicate-wise RMSE differences for the statistical comparisons.
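One way to realize the channel-consistent noise streams described above is to derive each stream's seed from the base seed, the replicate index, and the channel index. The sketch below is an assumption about how this could be done with NumPy generators; the helper names, the channel ordering, and the noise standard deviations are hypothetical, and only the base seed of 42 comes from the text.

```python
import numpy as np

# Channel order is an illustrative assumption; it only needs to be fixed.
CHANNELS = ["ethanol", "sugar", "biomass", "enzyme", "substrate"]

def channel_noise(base_seed, replicate, channel, n_samples, sigma):
    """Noise for one channel in one replicate, independent of the sensor set."""
    idx = CHANNELS.index(channel)
    rng = np.random.default_rng([base_seed, replicate, idx])
    return sigma * rng.standard_normal(n_samples)

def noise_for_set(sensor_set, replicate, n_samples, sigmas, base_seed=42):
    """Collect the per-channel noise realizations used by one sensor set."""
    return {ch: channel_noise(base_seed, replicate, ch, n_samples, sigmas[ch])
            for ch in sensor_set}

# Hypothetical standard deviations, for illustration only.
sigmas = {ch: 0.1 for ch in CHANNELS}
only_ethanol = noise_for_set(("ethanol",), replicate=3, n_samples=5, sigmas=sigmas)
with_sugar = noise_for_set(("ethanol", "sugar"), replicate=3, n_samples=5, sigmas=sigmas)
```

Because the ethanol stream depends only on the replicate and the channel, `only_ethanol["ethanol"]` and `with_sugar["ethanol"]` are identical arrays, so adding the sugar channel changes the information available to the filter without changing the ethanol noise realization.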
For replicate $r$, the reconstruction error of state $j$ is defined as

$$\mathrm{RMSE}_{j}^{(r)} = \sqrt{\frac{1}{N_t} \sum_{k=1}^{N_t} \left(\hat{x}_{j,k}^{(r)} - x_{j,k}^{(r)}\right)^2},$$

where $N_t$ is the number of simulated time points. The latent-state RMSE was computed across the four non-product states:

$$\mathrm{RMSE}_{\mathrm{latent}}^{(r)} = \frac{1}{4} \sum_{j \in \mathcal{L}} \mathrm{RMSE}_{j}^{(r)},$$

where $\mathcal{L}$ indexes the four non-product states. Ethanol is excluded from this latent-state average because ethanol is measured in every ethanol-mandatory candidate set and therefore does not represent a hidden-state reconstruction challenge. Additional reported statistics include the final absolute estimation error:

$$e_{j,\mathrm{final}}^{(r)} = \left|\hat{x}_{j,N_t}^{(r)} - x_{j,N_t}^{(r)}\right|,$$

the mean absolute error, and the final covariance trace:

$$\mathrm{tr}\left(P_{N_t}^{(r)}\right).$$

Each non-baseline sensor set was compared with ethanol-only monitoring using the paired replicate-specific latent-state RMSE differences:

$$\Delta_{s}^{(r)} = \mathrm{RMSE}_{\mathrm{latent}}^{(r)}\left(s_{\mathrm{E}}\right) - \mathrm{RMSE}_{\mathrm{latent}}^{(r)}\left(s\right),$$

where $s_{\mathrm{E}}$ denotes the ethanol-only set. A positive value of $\Delta_{s}^{(r)}$ indicates that sensor set $s$ reduces latent-state reconstruction error relative to ethanol-only monitoring for the same replicate. Because the candidate space contains 16 ethanol-mandatory packages, there are 15 ethanol-only contrasts. The paired Wilcoxon signed-rank test was used as a secondary nonparametric comparison [44]. Bootstrap confidence intervals were also computed for the paired RMSE reduction, and false-discovery-rate-adjusted $p$-values are reported to account for the multiple ethanol-only contrasts. These statistical tests are not used to define the sensor ranking; instead, ranking is based on the combined observability, identifiability, UKF reconstruction, and measurement-burden scoring framework described below.
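The error metrics and the paired comparison can be illustrated with a short NumPy sketch. Several details are assumptions made for this example rather than statements about the study: the latent-state RMSE is taken as the mean of the per-state RMSE values, the bootstrap uses a percentile interval with 2000 resamples, and the false-discovery-rate adjustment is the Benjamini–Hochberg procedure. The signed-rank test itself is omitted; in practice it would typically come from `scipy.stats.wilcoxon` applied to the paired differences.

```python
import numpy as np

def latent_rmse(x_true, x_hat, latent_idx):
    """Mean of per-state RMSE over the latent (non-product) state columns.
    x_true, x_hat: arrays of shape (n_times, n_states)."""
    rmse = np.sqrt(np.mean((x_hat - x_true) ** 2, axis=0))
    return float(rmse[latent_idx].mean())

def bootstrap_mean_ci(diffs, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean paired RMSE reduction."""
    rng = np.random.default_rng(seed)
    d = np.asarray(diffs, dtype=float)
    idx = rng.integers(0, d.size, size=(n_boot, d.size))
    means = d[idx].mean(axis=1)
    lo, hi = np.quantile(means, [(1 - level) / 2, (1 + level) / 2])
    return float(d.mean()), float(lo), float(hi)

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    adj = np.empty(m)
    adj[order] = np.minimum(adj_sorted, 1.0)
    return adj
```

With 15 ethanol-only contrasts, `bh_adjust` would receive the 15 raw signed-rank p-values, and `bootstrap_mean_ci` would be called once per contrast on the 100 replicate-wise differences.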
3.2. Scoring Sensor Values and Rankings
The final ranking step combines four criteria: state observability, parameter identifiability, UKF reconstruction accuracy, and measurement burden. Because these quantities have different units and numerical ranges, each metric is normalized to the interval $[0, 1]$ across the evaluated ethanol-mandatory candidate set. For a metric $m_s$ for which larger values are desirable, the normalized score is defined as

$$\tilde{m}_s = \frac{m_s - \min_{s'} m_{s'}}{\max_{s'} m_{s'} - \min_{s'} m_{s'}}.$$

For a metric for which smaller values are desirable, the normalized score is defined as

$$\tilde{m}_s = \frac{\max_{s'} m_{s'} - m_s}{\max_{s'} m_{s'} - \min_{s'} m_{s'}}.$$

If the metric range is zero, all normalized scores for that metric are set to 0.5. This min–max normalization places information value, reconstruction accuracy, and measurement burden on a common scale. However, because min–max normalization is candidate-list-dependent, the ranking should be interpreted as a ranking within the explicitly evaluated candidate space rather than as an absolute sensor value. The primary analysis therefore evaluated all 16 ethanol-mandatory combinations, and additional candidate-list, Pareto, and measurement-burden sensitivity outputs were generated to assess whether the recommendation depends on the set of candidates included in the comparison.
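The normalization rule, including the zero-range convention, can be written as a small helper. The function name and argument are illustrative; the numbers in the usage lines are the state-observability log-pseudodeterminants and mean latent-state RMSE values quoted in the Results section and are used here only to show the candidate-list dependence.

```python
def minmax_score(values, larger_is_better=True):
    """Min-max normalize a list of metric values to [0, 1].
    Returns 0.5 for every candidate when the metric range is zero."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    if larger_is_better:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]

# Two candidates: ethanol only and ethanol plus sugar.
two = minmax_score([4.18, 8.56])
# Adding the full-proxy value changes the score of the unchanged middle candidate.
three = minmax_score([4.18, 8.56, 16.42])
# Smaller-is-better metric (RMSE): the lowest RMSE receives score 1.
rmse_scores = minmax_score([1.1899, 0.3756], larger_is_better=False)
```

In the two-candidate list the ethanol–sugar value receives a score of 1, whereas in the three-candidate list the same value scores well below 1. This is the candidate-list dependence noted above and the reason the ranking is interpreted only within the evaluated candidate space.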
To score state observability, both total information volume and the weakest resolved active state direction were used:
where
is the minimum eigenvalue of
above the numerical threshold defined in
Section 2.3. Parameter identifiability is scored in the same way:
The pseudodeterminant term rewards total active information volume, whereas the minimum-eigenvalue term penalizes sensor sets that leave at least one active state or parameter direction weakly informed. This avoids assigning a high score to a configuration that performs well only in a small number of dominant directions.
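The two ingredients of these scores, the active log-pseudodeterminant and the smallest active eigenvalue, can be computed from an information matrix as sketched below. The threshold value `tol`, its treatment as an absolute cutoff, and the use of natural logarithms are assumptions for illustration; the threshold actually used is the one defined in Section 2.3.

```python
import numpy as np

def active_spectrum_metrics(F, tol=1e-8):
    """Numerical rank, log-pseudodeterminant, and minimum active eigenvalue
    of a symmetric information matrix F. `tol` is a placeholder threshold."""
    eig = np.linalg.eigvalsh((F + F.T) / 2.0)   # symmetrize against round-off
    active = eig[eig > tol]
    if active.size == 0:
        return {"rank": 0, "log_pdet": float("-inf"), "lambda_min": 0.0}
    return {
        "rank": int(active.size),
        "log_pdet": float(np.sum(np.log(active))),  # log of product of active eigenvalues
        "lambda_min": float(active.min()),
    }

# A rank-deficient example: one direction carries no information.
metrics = active_spectrum_metrics(np.diag([4.0, 1.0, 0.0]))
```

For this example the numerical rank is 2, the log-pseudodeterminant equals ln 4, and the minimum active eigenvalue is 1. The zero eigenvalue is excluded rather than driving the determinant to zero.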
The UKF reconstruction score is derived from the Monte Carlo mean latent-state RMSE:
where
is the mean latent-state RMSE across the 100 paired Monte Carlo replicates. The measurement-burden score is defined as
where
is the total dimensionless measurement-burden index of sensor set
.
The primary aggregate sensor-value score is then calculated as
The identifiability term is assigned a slightly larger weight than the observability term because the envisioned digital-twin use case includes both state reconstruction and model learning through parameter refinement. The burden term is included to discourage automatically selecting the most measurement-intensive package when a reduced package gives comparable information and reconstruction value.
A secondary value-per-burden diagnostic is also calculated as
This quantity is not used as the primary ranking criterion. Instead, it is used to interpret trade-offs between information gain and implementation burden. In addition, Pareto screening was performed using information, reconstruction accuracy, and measurement burden so that candidate packages could be identified as dominated or non-dominated. A sensor set is considered dominated if another evaluated package provides no worse information and reconstruction performance with no greater burden and is strictly better on at least one of these criteria. These diagnostics help distinguish the best aggregate package from lower-burden alternatives and from full-proxy monitoring, which represents the upper measurement-completeness benchmark.
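The Pareto screening can be sketched as a pairwise dominance check. The tuple layout, the package labels, and the numerical values below are hypothetical and are not results from the study; the strict-improvement condition follows the usual definition of Pareto dominance.

```python
def dominates(a, b):
    """True if candidate a dominates candidate b.
    Each candidate is (information, accuracy, burden): higher information and
    accuracy are better, lower burden is better."""
    no_worse = a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] > b[1] or a[2] < b[2]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Names of candidates that no other candidate dominates."""
    return [name for name, v in candidates.items()
            if not any(dominates(w, v)
                       for other, w in candidates.items() if other != name)]

# Hypothetical normalized scores and burden indices, for illustration only.
candidates = {
    "ethanol": (0.10, 0.10, 1.0),
    "ethanol+sugar": (0.30, 0.15, 2.0),
    "weaker three-channel set": (0.25, 0.12, 2.5),
    "full proxy": (1.00, 1.00, 5.0),
}
front = pareto_front(candidates)
```

Here the weaker three-channel set is dominated by the ethanol plus sugar set (less information, lower accuracy, higher burden), while ethanol-only and full-proxy monitoring both remain non-dominated because each is best on one criterion.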
3.3. Noise, Operating-Trajectory, and Scoring Robustness Analyses
Robustness analyses were performed to assess whether the sensor-set hierarchy depends strongly on measurement-quality assumptions, operating trajectory, measurement imperfections, scoring weights, or measurement-burden assumptions. This step is important because laboratories may differ in sensor calibration, assay availability, online implementation difficulty, and the relative priority placed on observability, identifiability, reconstruction accuracy, and burden.
First, a uniform measurement-noise sensitivity analysis was performed using three noise multipliers as
In this analysis, all nominal sensor standard deviations are scaled as
The weighted sensitivity matrices, Fisher-information metrics, and aggregate scores were then recomputed. Because Fisher-information matrices depend on the inverse measurement variance, this analysis tested whether the ranking was preserved when all measurements were assumed to be uniformly more accurate or less accurate.
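The inverse-variance dependence can be made concrete with a noise-weighted information matrix built from output sensitivities. The array layout, function name, and random test sensitivities below are assumptions for illustration; the study's sensitivities come from finite differences on the CBP model.

```python
import numpy as np

def fisher_information(S, sigma):
    """Noise-weighted information matrix F = sum_k S_k^T R^{-1} S_k.
    S: sensitivities of shape (n_samples, n_channels, n_directions);
    sigma: per-channel measurement standard deviations (R = diag(sigma**2))."""
    W = S / np.asarray(sigma, dtype=float)[None, :, None]  # scale each channel
    W2 = W.reshape(-1, S.shape[-1])                         # stack time and channel
    return W2.T @ W2

# Illustrative sensitivities: 10 samples, 2 channels, 3 directions.
rng = np.random.default_rng(0)
S_demo = rng.standard_normal((10, 2, 3))
sigma_demo = np.array([0.1, 0.5])

F_nominal = fisher_information(S_demo, sigma_demo)
F_doubled = fisher_information(S_demo, 2.0 * sigma_demo)   # noise multiplier c = 2
```

Scaling every standard deviation by a common multiplier c divides the whole matrix by c squared, so each active eigenvalue shifts by the same factor and the log-pseudodeterminant changes by minus 2 r ln c, where r is the number of active eigenvalues. Under this sketch the ordering can therefore change only through differences in r between sensor sets, eigenvalues crossing the numerical threshold, or the subsequent normalization.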
Second, an independent sensor-specific noise analysis was performed so that sensor channels could improve or degrade independently. For each scenario, the standard deviation of each sensor
was multiplied by an independently sampled factor as
A total of 200 independent sensor-noise scenarios were evaluated. For each scenario, the observability and identifiability matrices were reweighted, the aggregate score was recomputed across the 16 ethanol-mandatory candidates, and the rank distribution was recorded. This analysis tested whether the ranking was stable when one sensor channel became noisier or more accurate relative to the others.
Third, the effect of operating trajectory was examined because sensitivity-based observability and identifiability metrics are trajectory-dependent. In addition to the nominal temperature–pH schedule, four alternative feasible schedules were tested: a milder excitation profile, an extended hydrolysis profile, an earlier fermentation profile, and a shifted feasible profile in which temperature–pH levels were moved away from the nominal values while keeping all process phases active. For each trajectory, the observability, identifiability, scoring, and ranking calculations were repeated for all candidate sensor sets and sampling intervals. The resulting rankings were compared with the nominal ranking using rank-correlation and maximum-rank-shift diagnostics.
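The two ranking diagnostics can be computed directly from a pair of rank assignments. The sketch assumes Spearman's rank correlation (the text says only "rank-correlation") and rankings without ties; the package labels and ranks are hypothetical.

```python
def rank_shift_diagnostics(nominal, alternative):
    """Spearman rank correlation (no ties assumed) and maximum absolute rank
    shift between two rankings given as {name: rank} dictionaries."""
    names = list(nominal)
    d = [alternative[name] - nominal[name] for name in names]
    n = len(names)
    rho = 1.0 - 6.0 * sum(x * x for x in d) / (n * (n * n - 1))
    return rho, max(abs(x) for x in d)

# Hypothetical ranks for four packages under two operating trajectories.
nominal_rank = {"A": 1, "B": 2, "C": 3, "D": 4}
alt_rank = {"A": 2, "B": 1, "C": 3, "D": 4}
rho, max_shift = rank_shift_diagnostics(nominal_rank, alt_rank)
```

Swapping the top two packages gives a correlation of 0.8 and a maximum shift of one position, which illustrates why both numbers are reported: the correlation summarizes overall agreement, while the maximum shift flags whether any single package moved substantially.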
Fourth, full UKF stress tests were performed for practical measurement imperfections. Unlike an information-only approximation, these tests reran the complete reconstruction workflow under stressed measurement conditions and then recomputed the corresponding ranking. Random missingness is represented by dropping measurement updates according to specified channel-availability probabilities. Two missingness cases were considered: 20% missing observations for all sensors and higher missingness for biomass, enzyme, and substrate proxy channels. Assay delay was tested by delaying measurement availability by before UKF correction. Systematic measurement bias was injected directly as an additive offset in the affected measurement channels rather than being treated only as zero-mean variance. Two bias cases are considered: a moderate all-sensor bias of and a proxy-bias case in which biomass, enzyme, and substrate proxy channels carry larger bias than ethanol and sugar measurements.
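A simple way to emulate these imperfections is to transform each channel's measurement series before it reaches the filter. The helper below is a hypothetical sketch: the function name, the use of NaN to mark an unavailable update, and all parameter values are assumptions, and the delay is modeled only as a shift in availability rather than as a full delayed-measurement update.

```python
import numpy as np

def stress_measurements(y, p_available=1.0, delay_steps=0, bias=0.0, seed=0):
    """Apply additive bias, an availability delay, and random missingness to
    one channel. NaN marks samples for which no UKF correction is performed."""
    rng = np.random.default_rng(seed)
    out = np.asarray(y, dtype=float) + bias       # systematic additive offset
    d = min(int(delay_steps), out.size)
    if d > 0:                                     # value sampled at k arrives at k + d
        out = np.concatenate([np.full(d, np.nan), out[:out.size - d]])
    keep = rng.random(out.size) < p_available     # random channel availability
    out[~keep] = np.nan
    return out

# Illustrative values only: two-step delay and a 0.1 additive bias.
delayed = stress_measurements([1.0, 2.0, 3.0, 4.0], delay_steps=2, bias=0.1)
dropped = stress_measurements([1.0, 2.0, 3.0, 4.0], p_available=0.0)
```

A filter loop would then skip the measurement update for any channel whose entry is NaN at the current sampling instant, so the same reconstruction code can be rerun under each stress condition.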
Fifth, the aggregate score was recalculated using alternative weighting schemes. For each weighting scheme,
and
The following weight vectors were evaluated as
These cases represent balanced performance, observation-oriented design, identification-oriented design, reconstruction-oriented design, burden-sensitive preference, and strongly burden-averse preference.
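The reweighting step can be illustrated as a weighted sum of the four normalized scores followed by sorting. The weighted-sum form, the weight vectors, and the scores in this sketch are assumptions chosen for illustration; they are not the weights or results of the study.

```python
def aggregate_rank(scores, weights):
    """Rank packages by a weighted sum of normalized scores.
    scores: {name: (observability, identifiability, reconstruction, burden)},
    each already normalized so that larger is better; weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = {name: sum(w * s for w, s in zip(weights, vals))
             for name, vals in scores.items()}
    return sorted(total, key=total.get, reverse=True)

# Hypothetical normalized scores for three packages.
scores = {
    "ethanol only": (0.00, 0.00, 0.00, 1.00),
    "reduced four-channel": (0.90, 0.70, 0.95, 0.30),
    "full proxy": (1.00, 0.80, 1.00, 0.00),
}
balanced = aggregate_rank(scores, (0.25, 0.25, 0.25, 0.25))
burden_averse = aggregate_rank(scores, (0.10, 0.10, 0.10, 0.70))
```

With these illustrative numbers, the balanced weights place the reduced four-channel package first, whereas the strongly burden-averse weights place ethanol-only monitoring first. A recommendation that appears under only one weight vector is therefore objective-dependent.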
Finally, alternative sensor-specific measurement-burden scenarios were tested. These scenarios changed the channel-specific burden indices to represent different practical workflows, such as a spectroscopy-assisted workflow with lower burden for calibrated optical measurements, an offline-assay workflow with higher burden for enzyme and residual-substrate measurements, and a solids-intensive workflow with elevated burden for residual insoluble-substrate monitoring. The aggregate rankings were recomputed in each workflow-specific burden scenario. This analysis separated the effect of changing score weights from the effect of changing the assumed practical cost of individual sensor channels.
Together, these robustness analyses tested whether a sensor package remained attractive when measurement noise, operating policy, data missingness, assay delay, systematic bias, scoring priorities, and measurement-burden assumptions were varied. A package that maintains a high rank across these cases is more defensible for pre-experimental CBP digital-twin planning, whereas a package preferred only in one weighting or workflow scenario should be interpreted as objective-specific rather than universally optimal.
The workflow used for pre-ranking CBP sensor packages before detailed soft-sensor evaluation and digital-twin deployment is shown in Figure 1.
3.4. Computational Reproducibility
All simulations, state-estimation routines, sensitivity analyses, statistical comparisons, tables, and figures were implemented in Python 3.13.5. The final production run was executed with the base random seed 42. The production run used the ethanol-mandatory candidate-set mode, giving 16 candidate sensor packages, and used 100 Monte Carlo replicates for each candidate sensor set.
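The count of 16 candidate packages follows from keeping ethanol in every set and toggling the four remaining channels, which gives 2^4 combinations. A short enumeration, with channel labels chosen here for illustration, makes this explicit:

```python
from itertools import combinations

# Channels that may be added to the mandatory ethanol measurement.
OPTIONAL = ("sugar", "biomass", "enzyme", "substrate")

def ethanol_mandatory_sets():
    """All sensor sets that contain ethanol plus any subset of OPTIONAL."""
    sets = []
    for k in range(len(OPTIONAL) + 1):
        for combo in combinations(OPTIONAL, k):
            sets.append(("ethanol",) + combo)
    return sets

candidate_sets = ethanol_mandatory_sets()
baseline = candidate_sets[0]          # ethanol only
contrasts = candidate_sets[1:]        # every set compared against the baseline
```

The list contains 16 sets, the first being ethanol-only and the last being the full five-channel package, which leaves 15 sets to contrast with the ethanol-only baseline.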
The hybrid CBP virtual plant was simulated using a fixed-step fourth-order Runge–Kutta scheme. The same numerical integration approach was used for nominal trajectory simulation, finite-difference sensitivity analysis, and UKF prediction. The production run used GPU acceleration for vectorized finite-difference sensitivity batches when available, with automatic CPU fallback. The final run used an NVIDIA GeForce RTX 2070 for the GPU-enabled sensitivity calculations. The model equations, sensor definitions, observability and identifiability metrics, UKF update equations, scoring procedure, and robustness-test definitions are provided in
Section 2 and
Section 3. Additional UKF Monte Carlo settings required to reproduce the soft-sensing RMSE values and pairwise statistical comparisons are summarized in
Table 4.
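For reference, a fixed-step fourth-order Runge–Kutta integrator of the kind described above can be written as follows. The function names and the scalar decay test problem are illustrative assumptions and do not represent the CBP model equations.

```python
import numpy as np

def rk4_step(f, t, x, h):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + h / 2.0, x + h / 2.0 * k1)
    k3 = f(t + h / 2.0, x + h / 2.0 * k2)
    k4 = f(t + h, x + h * k3)
    return x + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(f, x0, t0, t1, n_steps):
    """Integrate with a fixed step and return the trajectory, including x0."""
    h = (t1 - t0) / n_steps
    x = np.asarray(x0, dtype=float)
    t = t0
    traj = [x.copy()]
    for _ in range(n_steps):
        x = rk4_step(f, t, x, h)
        t += h
        traj.append(x.copy())
    return np.array(traj)

# Test problem with a known solution: dx/dt = -x, x(0) = 1, so x(1) = exp(-1).
traj = integrate(lambda t, x: -x, [1.0], 0.0, 1.0, 10)
```

Using one integrator for the nominal trajectory, the perturbed trajectories in the finite-difference sensitivities, and the sigma-point propagation keeps discretization error common to all three, so differences between them reflect the perturbations rather than the numerical scheme.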
The numerical values in this study were treated as computational design assumptions for comparing candidate sensor packages, not as platform-calibrated experimental constants. The assumptions are consistent with the use of Fisher-information-based experimental design, nonlinear state estimation, and soft-sensing analysis in partially observed bioprocess systems [
9,
11,
27,
28,
37,
38]. The main assumptions are summarized in
Table 5.
4. Results and Discussion
4.1. Nominal CBP Trajectory with the Excitation Schedule
The nominal CBP trajectory with the temperature–pH excitation profile showed the expected phase-dependent behavior. Biomass and enzyme activity increased mainly during the early phase, hydrolysis increased soluble sugar during the intermediate phase, and ethanol accumulation became dominant during the later phase, as shown in
Figure 2. This behavior is consistent with CBP as a coupled process involving growth, enzyme production, substrate deconstruction, sugar release, and product formation on different time scales [
6,
7,
31,
32]. The delayed ethanol response also illustrates why ethanol-only monitoring is insufficient for diagnosing earlier causes of poor conversion, such as weak growth, insufficient enzyme activity, limited hydrolysis, or sugar limitation.
4.2. State-Observability Enhancement with Increasingly Informative Sensors
State-observability information increased substantially when additional measurements were added to ethanol-only monitoring. Across the 16 ethanol-mandatory candidate packages, ethanol-only monitoring provided the lowest state-information content, as shown in
Figure 3. At a
sampling interval, the state-observability log-pseudodeterminant increased from 4.18 with ethanol-only monitoring to 8.56 after soluble sugar was added. Full-proxy monitoring gave the largest state-observability information, with values of 16.42, 15.06, and 13.81 at sampling intervals of 6, 12, and
, respectively.
The all-combination analysis also identified strong reduced four-channel packages. The ethanol–sugar–biomass–substrate package reached state-observability log-pseudodeterminants of 15.12, 13.76, and 12.51 at 6, 12, and , respectively, while the ethanol–sugar–enzyme–substrate package gave very similar values of 14.94, 13.65, and 12.51. These results show that residual-substrate information can provide a major state-observability gain when combined with ethanol, sugar, and a biological or enzymatic proxy. The previously emphasized ethanol–sugar–biomass–enzyme package also remained informative, with values of 14.39, 13.30, and 12.31, but it no longer represented the strongest reduced observability option once all ethanol-mandatory combinations were included.
Overall, the observability analysis confirms that product-only sensing is weak for state-aware CBP digital twins. Intermediate and latent-state proxy measurements provide much stronger information about the initial-state directions that drive the batch trajectory. This finding agrees with broader bioprocess-monitoring experience, where endpoint or product signals alone are often insufficient for reconstructing hidden physiological states, whereas intermediate measurements and proxy variables can substantially improve soft-sensor performance [
8,
9,
10,
11]. Within the tested candidate set, full-proxy monitoring remained the maximum state-observability benchmark, while ethanol–sugar–biomass–substrate and ethanol–sugar–enzyme–substrate provided strong reduced alternatives.
4.3. Parameter-Identifiability Improvement with Biomass and Enzyme Sensors
The parameter-identifiability results were more nuanced than the state-observability results because different metrics emphasize different properties of the Fisher information matrix. For the active log-pseudodeterminant criterion, the ethanol–sugar–biomass–enzyme package gave the strongest parameter-identifiability performance among the evaluated packages, as shown in
Figure 4. Its log-pseudodeterminants were 10.82, 9.06, and 6.67 at sampling intervals of 6, 12, and
, respectively. The ethanol–biomass–enzyme–substrate package was close behind, with values of 10.68, 8.93, and 6.59. These results show that biomass and enzyme proxies are especially informative for separating growth, enzyme-production, and hydrolysis-related parameter effects.
For comparison, full-proxy monitoring gave active parameter log-pseudodeterminants of 8.69, 6.63, and 3.94 at the same three sampling intervals. The lower active pseudodeterminant for the full-proxy set does not mean that the full-proxy set is less informative in an absolute sense. Rather, it reflects the behavior of the active pseudodeterminant when additional weak eigenvalue directions are retained. The pseudodeterminant is the product of only those eigenvalues that lie above the numerical threshold, so its logarithm is the sum of their logarithms. A retained eigenvalue smaller than one therefore contributes a negative term, and the criterion can favor a reduced package whose information is concentrated in fewer active directions.
The numerical-rank results clarify this interpretation, as shown in
Figure 5. Although the ethanol–sugar–biomass–enzyme package had the largest active-information volume, its parameter-information matrix had numerical rank 6 in the tested sampling cases. In contrast, full-proxy monitoring retained full numerical rank 7 across the tested sampling intervals. Therefore, the ethanol–sugar–biomass–enzyme package is best viewed as the strongest active-information package for parameter learning, whereas full-proxy monitoring provides the most complete coverage of all seven parameter directions.
The eigenvalue spectra further illustrate the distinction between active information volume and full-dimensional coverage, as shown in
Figure 6. In the
case, the ethanol–sugar–biomass–enzyme package produced a larger active pseudodeterminant than full-proxy monitoring, but the full-proxy set preserved an additional weak parameter direction and gave the larger fixed-dimension regularized determinant. The fixed-dimension regularized log determinant was 6.94 for full-proxy monitoring and 6.36 for the ethanol–sugar–biomass–enzyme package. Thus, full-proxy monitoring remains the most complete parameter-information configuration, while ethanol–sugar–biomass–enzyme is the strongest reduced package for the active-information criterion.
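The contrast between the two criteria can be reproduced with a small numerical example. The eigenvalue spectra, the threshold `tol`, and the regularization constant `eps` below are hypothetical, and the fixed-dimension regularized log determinant is assumed here to take the form log det(F + eps I) over all seven parameter directions.

```python
import math

def log_pdet(eigs, tol=1e-8):
    """Sum of log eigenvalues above the threshold (active directions only)."""
    return sum(math.log(e) for e in eigs if e > tol)

def regularized_logdet(eigs, eps=1e-6):
    """log det(F + eps * I) over all directions (assumed regularized form)."""
    return sum(math.log(e + eps) for e in eigs)

# Hypothetical spectra: identical except for the seventh direction.
reduced = [10.0, 5.0, 2.0, 1.0, 0.5, 0.2, 0.0]    # numerical rank 6
full = [10.0, 5.0, 2.0, 1.0, 0.5, 0.2, 1e-3]      # numerical rank 7, one weak direction

pdet_reduced, pdet_full = log_pdet(reduced), log_pdet(full)
reg_reduced, reg_full = regularized_logdet(reduced), regularized_logdet(full)
```

The weak seventh eigenvalue lowers the active log-pseudodeterminant of the full-rank spectrum because its logarithm is negative, yet the same eigenvalue raises the fixed-dimension criterion relative to the rank-deficient spectrum, whose missing direction is penalized through the regularization term. The two criteria therefore order the spectra in opposite ways, mirroring the pattern reported for the two packages.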
The difference between observability and identifiability is therefore important. A sensor configuration that is strong for state reconstruction is not necessarily the same configuration that is most efficient for parameter identification. Identifiability depends on whether parameter perturbations produce sufficiently distinct output responses. In the present model, biomass and enzyme measurements complement ethanol and sugar measurements by helping to separate growth, enzyme-yield, and hydrolysis effects from ethanol-yield, inhibition, and feedstock-accessibility effects. Residual-substrate information, by contrast, was especially valuable for state observability and the aggregate ranking, but it did not replace the parameter-learning value of the enzyme proxy.
The inclusion of biomass and enzyme surrogates lowered the uncertainty of parameters associated with growth, enzyme production, and hydrolysis, as shown in
Figure 7. However, strong correlations were still observed for selected parameter pairs, especially growth and decay, ethanol yield and inhibition, and hydrolysis capacity and feedstock accessibility, as shown in
Figure 8. These correlations show that some biological mechanisms can still produce similar measured trajectories even when additional proxy measurements are available. Therefore, Fisher-information metrics, eigenvalue spectra, fixed-dimension regularized determinants, and parameter-correlation diagnostics should be used together when judging practical identifiability and selecting sensor packages before experimental implementation [
28,
37,
38].
4.4. Impact of the Sensor Set on UKF Reconstruction Quality
More informative sensor sets improved latent-state reconstruction under model–plant mismatch conditions. Ethanol-only monitoring gave the weakest reconstruction performance, with a mean latent-state RMSE of 1.1899. Adding soluble sugar alone reduced the mean latent-state RMSE only slightly, to 1.1398 (a reduction of about 4%), confirming that a minimal product–sugar package is still insufficient for reconstructing hidden biomass, enzyme, and residual-substrate dynamics. In contrast, packages that included residual-substrate measurements strongly reduced substrate RMSE, while packages that included biomass and enzyme proxies improved the corresponding biological and enzymatic state estimates. These trends are consistent with the observability results and with previous bioprocess soft-sensing studies, where estimator performance depends strongly on whether the available measurements are sensitive to the dominant latent-state directions [8,9,10,11].
Among all 16 ethanol-mandatory packages, full-proxy monitoring gave the lowest mean latent-state RMSE, 0.3756. The ethanol–biomass–enzyme–substrate package was close behind with 0.3843, followed by the top aggregate-scoring ethanol–sugar–biomass–substrate package with 0.4121. Thus, full-proxy monitoring remained the best reconstruction benchmark, but several four-channel packages achieved similar UKF accuracy with lower measurement burden. The Monte Carlo distributions of latent-state RMSE and representative state-wise errors are shown in
Figure 9 and
Table 6.
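The count of 16 ethanol-mandatory packages follows from keeping ethanol fixed and switching each of the four remaining channels on or off. A short sketch of the enumeration is given below; the channel labels are shorthand introduced here for illustration.

```python
from itertools import combinations

# Shorthand labels for the four optional channels (naming is an assumption).
OPTIONAL_CHANNELS = ["sugar", "biomass", "enzyme", "substrate"]

def ethanol_mandatory_packages():
    """Return every sensor package that contains ethanol plus any subset
    of the optional channels."""
    packages = []
    for size in range(len(OPTIONAL_CHANNELS) + 1):
        for subset in combinations(OPTIONAL_CHANNELS, size):
            packages.append(("ethanol",) + subset)
    return packages

packages = ethanol_mandatory_packages()
four_channel = [p for p in packages if len(p) == 4]
print(len(packages), len(four_channel))
```

The enumeration contains one ethanol-only baseline, one full-proxy package, and four four-channel packages, which is the group that the reduced recommendations are drawn from.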
Paired Wilcoxon tests were used to compare each non-baseline sensor set with ethanol-only monitoring because the replicate-wise RMSE differences were not assumed to be normally distributed [
44]. The revised Monte Carlo design used common plant mismatch and initial-estimate perturbations across all sensor sets, so the replicate-wise RMSE differences were paired observations. In addition to the raw test statistics, bootstrap confidence intervals were calculated for mean RMSE reductions, and the Wilcoxon
p-values were adjusted across the 15 ethanol-only contrasts.
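The paired-comparison workflow can be sketched as follows, assuming a percentile bootstrap for the mean RMSE reduction and a Holm step-down adjustment for the multiple contrasts. Both choices are assumptions, since the text does not name the bootstrap variant or the adjustment method. The Wilcoxon p-values themselves are taken as inputs here and would normally come from a statistics library such as `scipy.stats.wilcoxon`.

```python
import random

def bootstrap_mean_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of paired differences."""
    rng = random.Random(seed)
    n = len(diffs)
    means = []
    for _ in range(n_boot):
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted

# Hypothetical inputs: replicate-wise RMSE reductions and raw p-values.
example_diffs = [0.81, 0.79, 0.84, 0.77, 0.83, 0.80]
print(bootstrap_mean_ci(example_diffs))
print(holm_adjust([0.01, 0.04, 0.03]))
```

Because every sensor set shares the same plant-mismatch and initial-estimate perturbations, the differences are formed replicate by replicate before resampling, which preserves the pairing.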
All non-baseline sensor sets reduced mean latent-state RMSE relative to ethanol-only monitoring. The largest reconstruction improvement was obtained with full-proxy monitoring, which reduced mean latent-state RMSE from 1.1899 to 0.3756. The corresponding mean reduction was 0.8143, with a bootstrap 95% confidence interval of 0.7398–0.8894. The ethanol–biomass–enzyme–substrate package provided a nearly equivalent RMSE reduction of 0.8056, while the ethanol–sugar–biomass–substrate package reduced RMSE by 0.7778. These results show that residual-substrate information is especially valuable for UKF state reconstruction when combined with biomass or enzyme proxies. The paired comparison results are summarized in
Table 7.
4.5. Recommended Sensor-Set Ranking and Measurement-Burden Trade-Off
The aggregate sensor-value ranking changed when all 16 ethanol-mandatory combinations were evaluated instead of only the original seven preselected packages. The ethanol–sugar–biomass–substrate package achieved the highest overall score, followed by full-proxy monitoring and the ethanol–biomass–enzyme–substrate package, as shown in
Table 8. Ethanol-only monitoring remained the least effective option, confirming that product-only sensing is inadequate when state observability, parameter identifiability, nonlinear reconstruction accuracy, and measurement burden are considered together.
This ordering clarifies the trade-off among information value, reconstruction accuracy, and measurement burden. Full-proxy monitoring gave the best UKF reconstruction accuracy and the most complete measurement coverage, but it also had the highest burden index. The ethanol–sugar–biomass–substrate package ranked first overall because it combined strong state observability, high UKF reconstruction accuracy, competitive identifiability, and lower burden than the full-proxy package. The ethanol–biomass–enzyme–substrate package gave nearly full-proxy reconstruction performance and a strong parameter-identifiability score, but its higher burden and lack of a sugar measurement placed it below the top-ranked ethanol–sugar–biomass–substrate package in the aggregate ranking.
The previously emphasized ethanol–sugar–biomass–enzyme package remained important for parameter learning and ranked sixth overall, but it was no longer the best reduced aggregate package once residual-substrate-containing combinations were included. Therefore, the practical recommendation is conditional on the design objective: full-proxy monitoring is preferred when maximum reconstruction and completeness are required; ethanol–sugar–biomass–substrate is preferred for the primary aggregate score; and ethanol–sugar–biomass–enzyme remains a strong reduced option when parameter identifiability is prioritized. This supports the broader digital bioprocessing view that process analytical measurements, state-estimation methods, and uncertainty-aware decision support should be integrated before closed-loop digital-twin deployment [
11,
19,
20,
24].
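The trade-off among information value, reconstruction accuracy, and burden can be illustrated with a generic normalized weighted score. The sketch below uses min-max normalization, with RMSE and burden inverted so that lower values score higher. The normalization, the weights, and the three placeholder packages are assumptions and do not reproduce the scoring scheme or the values of this study.

```python
def min_max(values, higher_is_better=True):
    """Scale values to [0, 1]; invert when lower raw values are preferable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

# Criterion name -> whether a higher raw value is better.
DIRECTIONS = {"observability": True, "identifiability": True,
              "rmse": False, "burden": False}

def aggregate_scores(table, weights):
    """Weighted mean of normalized criteria for each package."""
    names = list(table)
    normalized = {c: min_max([table[n][c] for n in names], d)
                  for c, d in DIRECTIONS.items()}
    total = sum(weights[c] for c in DIRECTIONS)
    return {n: sum(weights[c] * normalized[c][i] for c in DIRECTIONS) / total
            for i, n in enumerate(names)}

# Placeholder packages with hypothetical criterion values.
table = {
    "A": {"observability": 1.0, "identifiability": 1.0, "rmse": 1.0, "burden": 1.0},
    "B": {"observability": 3.0, "identifiability": 2.0, "rmse": 0.5, "burden": 2.0},
    "C": {"observability": 4.0, "identifiability": 3.0, "rmse": 0.4, "burden": 4.0},
}
equal = {"observability": 0.25, "identifiability": 0.25, "rmse": 0.25, "burden": 0.25}
burden_heavy = {"observability": 0.2, "identifiability": 0.2, "rmse": 0.2, "burden": 0.4}

equal_scores = aggregate_scores(table, equal)
heavy_scores = aggregate_scores(table, burden_heavy)
print(max(equal_scores, key=equal_scores.get), max(heavy_scores, key=heavy_scores.get))
```

In this toy example the most complete package wins under equal weights, whereas a heavier burden weight moves a reduced package to the top, which mirrors the conditional nature of the recommendation.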
4.6. Robustness to Measurement Noise, Operating Trajectories, Measurement Imperfections, and Scoring Weights
The robustness analyses showed that the main recommendation was generally stable under changes in measurement quality, operating trajectory, measurement imperfections, scoring weights, and sensor-specific burden assumptions. With uniform measurement-noise scaling, the ethanol–sugar–biomass–substrate package remained top-ranked in the low-noise, nominal-noise, and high-noise cases, as shown in
Figure 10 and
Table 9. Spearman rank correlations relative to the nominal-noise ranking were close to one, and no package shifted by more than two rank positions. This indicates that the primary ranking was not an artifact of a single assumed global measurement-noise level. Such robustness is important because sensor quality, assay uncertainty, and data availability can differ substantially across laboratories and development stages [
11,
19,
24].
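The two rank-stability metrics used above can be computed directly from a pair of rankings. The sketch below assumes rankings without ties, for which the Spearman correlation reduces to the standard rank-difference formula. The package labels and rank values are hypothetical.

```python
def spearman_from_ranks(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    n = len(rank_a)
    if n < 2 or set(rank_a) != set(rank_b):
        raise ValueError("rankings must cover the same items (at least two)")
    d_squared = sum((rank_a[k] - rank_b[k]) ** 2 for k in rank_a)
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

def max_rank_shift(rank_a, rank_b):
    """Largest absolute change in rank position of any item."""
    return max(abs(rank_a[k] - rank_b[k]) for k in rank_a)

# Hypothetical rankings: the top two packages exchange positions.
nominal = {"P1": 1, "P2": 2, "P3": 3, "P4": 4}
perturbed = {"P1": 2, "P2": 1, "P3": 3, "P4": 4}
print(spearman_from_ranks(nominal, perturbed), max_rank_shift(nominal, perturbed))
```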
When individual sensor-noise levels were varied independently, the competition between the top reduced package and full-proxy monitoring became clearer, as shown in
Figure 11. Across 200 sensor-specific noise scenarios, full-proxy monitoring had the best mean rank, 1.57, and was ranked first in 99 scenarios. The ethanol–sugar–biomass–substrate package had a mean rank of 1.83 and was ranked first in 96 scenarios. The ethanol–biomass–enzyme–substrate package ranked first in five scenarios. Thus, the full-proxy package is slightly more robust when individual sensor-noise assumptions vary, whereas ethanol–sugar–biomass–substrate remains the strongest lower-burden aggregate recommendation.
The operating-trajectory analysis confirmed that the ranking was not driven only by the nominal temperature–pH schedule, as shown in
Figure 12 and
Table 10. Full-proxy monitoring provided the highest state-observability score for all tested trajectories. The top aggregate package was ethanol–sugar–biomass–substrate with the nominal and hydrolysis-extended trajectories, while full-proxy monitoring became the top aggregate package for the mild, fermentation-early, and shifted-feasible trajectories. The Spearman correlation relative to the nominal trajectory ranged from 0.9647 to 1.0000, with a maximum rank shift of three. Therefore, the reduced-set recommendation should be interpreted in relation to the operating regime, whereas full-proxy monitoring is the most trajectory-robust information-complete configuration.
Full UKF stress tests were then used to examine practical measurement imperfections, including missing observations, assay delay, and systematic measurement bias. In these tests, the complete UKF reconstruction and ranking workflow was rerun under each stress condition. The ethanol–sugar–biomass–substrate package remained top-ranked in all stress scenarios, as shown in
Figure 13 and
Table 11. The largest degradation in the top-package reconstruction error occurred in the assay-delay case, where the mean latent-state RMSE of the top package increased to 0.5953. Systematic bias was injected as an additive measurement offset rather than being treated only as additional zero-mean noise variance. These results show that the main aggregate recommendation remained stable when measurements were missing, delayed, or biased.
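The three measurement imperfections can be sketched as simple transformations of a sampled measurement series. In the code below, a delayed assay reports the value sampled a fixed number of steps earlier, missing observations are dropped at random, and bias is a constant additive offset. The function name, the delay representation, and the parameter values are illustrative assumptions rather than the settings used in the stress tests.

```python
import random

def apply_imperfections(series, missing_prob=0.0, delay_steps=0, bias=0.0, seed=0):
    """Return the series as it would reach the estimator.

    None marks a time step at which no measurement is available, either
    because the delayed assay result has not arrived yet or because the
    observation was dropped.
    """
    rng = random.Random(seed)
    observed = []
    for k in range(len(series)):
        source = k - delay_steps  # index of the sample reported at step k
        if source < 0 or rng.random() < missing_prob:
            observed.append(None)
        else:
            observed.append(series[source] + bias)  # additive systematic offset
    return observed

# Hypothetical ethanol readings sampled at regular intervals.
readings = [1.0, 2.0, 3.0, 4.0]
print(apply_imperfections(readings, delay_steps=2, bias=0.5))
```

An estimator such as the UKF would typically skip the measurement update at steps marked `None` and continue with the model prediction alone.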
The weight-sensitivity analysis showed that the preferred package was also stable across most scoring priorities, as shown in
Figure 14 and
Table 12. The ethanol–sugar–biomass–substrate package remained top-ranked with the primary, equal-weight, identifiability-focused, reconstruction-focused, burden-sensitive, and burden-averse formulations. Full-proxy monitoring became top-ranked only when the observability term was given dominant weight. This confirms that the top-ranked reduced package is not merely a consequence of a single arbitrary weight choice, although full-proxy monitoring remains the preferred option when the main objective is maximum state-observability coverage.
Finally, alternative sensor-specific measurement-burden scenarios were tested to distinguish weight sensitivity from changes in the assumed practical workflow. The ethanol–sugar–biomass–substrate package remained top-ranked in all tested burden workflows, as summarized in
Table 13. This suggests that the primary recommendation is not solely caused by the nominal burden values, although the relative ranks of nearby packages still changed when biomass, enzyme, spectroscopy, or solids-measurement assumptions were altered.
4.7. Implications for Practical CBP Digital-Twin Development
Within the tested five-state virtual-plant benchmark, the results support a hierarchical approach to measurement selection for CBP digital-twin development. Ethanol-only monitoring was consistently the weakest configuration because ethanol is a delayed product signal and cannot indicate whether limited batch performance originates from biomass limitation, enzyme insufficiency, substrate scarcity, slow hydrolysis, sugar accumulation, or product inhibition. This interpretation is consistent with bioprocess soft-sensing studies showing that delayed quality indicators and hidden physiological states can limit real-time monitoring and control [
8,
9,
10,
11,
42].
Adding soluble sugar provided a small, low-burden improvement because sugar links substrate hydrolysis to ethanol formation. However, the ethanol–sugar package should be viewed as a minimal measurement configuration rather than a complete digital-twin sensor package. It does not directly capture biomass growth, enzyme activity, or residual substrate availability. This agrees with process analytical technology and digital-bioprocessing approaches, where intermediate measurements are informative but often need to be combined with soft sensors, model-based estimation, and digital-twin architectures [
11,
19,
22,
24,
45,
46].
The all-combination analysis changed the practical recommendation relative to the original restricted seven-package comparison. Across all 16 ethanol-mandatory combinations, ethanol–sugar–biomass–substrate achieved the highest primary aggregate score. This package combines a product signal, a fermentable-intermediate signal, a biological-state proxy, and a hydrolysis or solids-state proxy. It therefore provides strong state observability and UKF reconstruction while avoiding the highest burden of full-proxy monitoring. Full-proxy monitoring remains preferred when maximum reconstruction accuracy, state-observability coverage, and information completeness are required. The ethanol–sugar–biomass–enzyme package remains important for parameter learning because biomass and enzyme measurements provide strong information about growth, enzyme-production, and hydrolysis-related parameter directions, but it is no longer the strongest aggregate reduced package once residual-substrate-containing combinations are included.
In practice, low-burden screening experiments could begin with ethanol, sugar, and biomass measurements. Experiments focused on overall digital-twin readiness should add a residual-substrate or solids-related proxy, giving the ethanol–sugar–biomass–substrate package. Model-refinement experiments that prioritize kinetic identifiability should include enzyme activity, especially when growth and hydrolysis parameters must be separated. High-quality benchmark experiments should use full-proxy monitoring when the measurement burden is acceptable. This staged development route is consistent with digital bioprocessing and digital chemical engineering roadmaps, in which digital twins evolve through improved modeling, state estimation, process analytics, enabling digital technologies, and control-oriented decision support [
19,
20,
21,
23,
24,
25,
45,
46].
These practical implications should be interpreted as design guidance for the tested virtual-plant benchmark, not as a universal sensor prescription. Platform-specific sensor errors, costs, delays, organisms, feedstocks, product portfolios, missing-data patterns, and operating regimes should be incorporated before implementation. This is especially important for literature-derived CBP datasets, where product prediction can be affected by heterogeneous feedstock–pretreatment–microbial descriptors, sparse reporting, and missing-label structure [
30].
4.8. Limitations
Several limitations apply to this study. First, a computational virtual plant was used to generate observations and evaluate sensor-set performance. The results therefore provide guidance for sensor prioritization and digital-twin readiness assessment, but they should not be interpreted as experimental validation. Real CBP systems may include organism-specific regulation, feedstock heterogeneity, mass-transfer limitations, inhibitor formation, contamination, evaporation, sensor drift, and assay-specific bias. Future work should test the proposed hierarchy using synchronized experimental measurements of ethanol, soluble sugars, biomass proxies, enzyme activity, and residual solids.
Second, the five-state hybrid model is a compact representation of lignocellulosic CBP. It captures the main information pathways needed to study state observability, parameter identifiability, and soft-sensor reconstruction, but it does not resolve all biochemical details. More detailed models could separate cellulose, hemicellulose, glucose, xylose, cellobiose, individual enzyme classes, inhibitor species, co-products, and organism-specific metabolic states. This extension is relevant because literature-derived CBP datasets show uneven product support and missing-label structure across ethanol and co-products [
30]. Such refinements may change the relative importance of candidate sensors and may introduce additional identifiability challenges unless richer measurements are available [
28,
29].
Third, the observability and identifiability criteria are local and depend on the operating trajectory around which sensitivities are evaluated. This study examined nominal, mild, hydrolysis-extended, fermentation-early, and shifted-feasible temperature–pH trajectories. The overall ranking was highly correlated across these cases, but the top aggregate package changed for some trajectories. The ethanol–sugar–biomass–substrate package was preferred in the nominal and hydrolysis-extended cases, whereas full-proxy monitoring became preferred for the mild, fermentation-early, and shifted-feasible profiles. Therefore, the recommendation should be interpreted as conditional on the explored operating region. Other feedstocks, organisms, pretreatment severities, solids loadings, batch durations, or control policies may generate different sensitivity patterns. This is a general limitation of sensitivity-based experimental design and practical identifiability analysis [
37,
38].
Fourth, the measurement-noise levels, missingness assumptions, bias levels, assay-delay approximation, and measurement-burden indices were chosen as computational design scenarios rather than calibrated values for a specific experimental platform. The ranking was stable with uniform noise scaling, missing observations, assay delay, systematic bias, alternative scoring weights, and alternative burden workflows. However, the sensor-specific noise Monte Carlo analysis showed that full-proxy monitoring and the ethanol–sugar–biomass–substrate package can exchange the top rank when individual sensor uncertainties vary independently. Actual measurements can also have platform-specific error structures, detection limits, sampling losses, maintenance requirements, operator-time costs, and latency constraints. Future studies should replace the abstract burden and error assumptions with experimentally measured values for the intended laboratory or pilot-plant platform [
21,
23,
42].
Fifth, the expanded analysis evaluated all ethanol-mandatory combinations of the five modeled measurement channels, but it did not evaluate sensor packages that omit ethanol. Ethanol was treated as mandatory because it is the direct product signal and the practical product-monitoring baseline. Therefore, the ranking should be interpreted within the ethanol-mandatory design space. Other objectives, such as early-stage hydrolysis diagnostics before ethanol formation, could motivate a different candidate space.
Sixth, the UKF case study evaluated latent-state reconstruction under model–plant mismatch conditions, but it did not test closed-loop control performance in an experimental process. Good estimator performance is only one requirement for digital-twin deployment. Practical implementation also requires suitable actuators, acceptable measurement latency, robust controller design, reliable data transfer, and safe interaction among the physical process, model updates, operators, and controllers. The present framework should therefore be viewed as a sensor-prioritization and soft-sensing readiness tool, not as complete closed-loop digital-twin validation [
45,
46].
Finally, the aggregate ranking depends on the normalized multi-criteria scoring scheme. Although the weight-sensitivity analysis showed that the ethanol–sugar–biomass–substrate package remained preferred under most weighting schemes, full-proxy monitoring became preferred when state observability was given dominant weight. The preferred package therefore depends on whether the digital twin is intended for maximum state-information coverage, parameter learning, nonlinear reconstruction, or lower-burden screening. The ranking should be used as a decision-support tool rather than a universal sensor prescription.
5. Summary and Conclusions
This paper presented a computational methodology for selecting informative measurement packages for digital-twin-assisted consolidated bioprocessing (CBP). The framework combines state-observability analysis, parameter-identifiability analysis, UKF-based soft-sensor reconstruction, measurement-burden assessment, and robustness testing against changes in measurement noise, operating trajectory, measurement imperfections, scoring weights, and sensor-specific burden assumptions. The objective was to support pre-experimental sensor-set design before laboratory or pilot-scale digital-twin validation.
Within the tested five-state virtual-plant benchmark, ethanol-only sensing was inadequate for state-aware CBP digital-twin reconstruction because ethanol is a delayed product signal. At a sampling interval, the state-observability log-pseudodeterminant increased from 4.18 with ethanol-only sensing to 8.56 after adding soluble sugar and to 16.42 with full-proxy monitoring. The ethanol–sugar–biomass–substrate package also provided strong reduced state-observability performance, with log-pseudodeterminants of 15.12, 13.76, and 12.51 at 6, 12, and , respectively. Parameter-identifiability analysis showed that biomass and enzyme proxies were especially valuable for model learning: the ethanol–sugar–biomass–enzyme package gave the strongest active-information performance, with log-pseudodeterminants of 10.82, 9.06, and 6.67 at 6, 12, and , respectively. Full-proxy monitoring provided the most complete all-parameter information coverage.
The paired UKF Monte Carlo reconstruction test showed that additional measurements substantially improved latent-state estimation under model–plant mismatch conditions. Ethanol-only monitoring gave a mean latent-state RMSE of 1.1899, whereas full-proxy monitoring gave the lowest RMSE, 0.3756, followed by ethanol–biomass–enzyme–substrate at 0.3843 and ethanol–sugar–biomass–substrate at 0.4121. After evaluating all 16 ethanol-mandatory candidate packages, the aggregate ranking changed relative to the original seven-package comparison. Ethanol–sugar–biomass–substrate achieved the highest primary aggregate sensor-value score, 0.8432, with a burden index of 7.0. Full-proxy monitoring ranked second, with a score of 0.8173 and a burden index of 10.0, while ethanol–biomass–enzyme–substrate ranked third, with a score of 0.8086. The previously emphasized ethanol–sugar–biomass–enzyme package remained important for parameter learning but ranked sixth overall once residual-substrate-containing combinations were included.
The robustness analyses supported the main recommendation while clarifying its conditional nature. Ethanol–sugar–biomass–substrate remained top-ranked under uniform noise scaling, in the full UKF stress tests for missing observations, assay delay, and systematic bias, in most scoring-weight scenarios, and in all tested sensor-specific burden workflows. For independent sensor-specific noise variation, full-proxy monitoring and ethanol–sugar–biomass–substrate had similar top-rank frequencies, and for some alternative operating trajectories full-proxy monitoring became top-ranked. Overall, ethanol-only monitoring is suitable only as a minimal product baseline; ethanol–sugar–biomass sensing can support lower-burden screening; ethanol–sugar–biomass–substrate sensing is recommended for the primary aggregate digital-twin readiness score; ethanol–sugar–biomass–enzyme sensing remains attractive for parameter learning; and full-proxy monitoring is recommended for benchmark experiments when maximum reconstruction accuracy and information completeness are required. Because the results were obtained from a computational benchmark, the hierarchy should be validated with platform-specific experimental measurements, sensor errors, delays, costs, organisms, feedstocks, and operating regimes before practical deployment.