1. Introduction
With the acceleration of global population aging and the increasing burden of chronic diseases, the demand for long-term and continuous physiological monitoring in community and home environments has become increasingly prominent [1,2]. In real-world care scenarios, limitations in cognitive function, communication ability, and caregiving resources often hinder timely and accurate reporting of physiological discomfort. Traditional care models relying on manual observation and subjective judgment are insufficient for continuous and standardized monitoring, potentially amplifying health risks due to delayed responses [3,4]. Driven by the concept of “Active Health,” portable and non-invasive monitoring technologies with automated analysis capabilities that can operate under resource-constrained conditions are increasingly required in primary and community healthcare systems [5].
Among various vital signs, continuous monitoring of cardiac activity plays a central role. Cardiovascular diseases remain one of the leading causes of mortality worldwide [6], and their progression is often accompanied by changes in the coordination between cardiac electrical and mechanical activities [7]. Accordingly, in non-clinical environments, cardiac monitoring systems are expected to provide stable operation, automated analysis, and interpretable feedback, with an emphasis on continuous physiological state perception rather than direct diagnostic decision-making.
The electrocardiogram (ECG) reflects cardiac electrical activity [8,9], while the phonocardiogram (PCG) captures valve motion and related hemodynamic characteristics. Synchronous ECG–PCG acquisition enables comprehensive characterization of cardiac electromechanical processes, allowing quantitative assessment of key mechanical phases such as the pre-ejection period, systole, and diastole. Compared with single-modality analysis, ECG–PCG fusion provides improved information completeness for cardiac state monitoring [10]. However, most existing ECG–PCG studies [11,12,13,14,15] focus on disease classification or diagnostic performance enhancement, whereas systematic frameworks addressing long-term monitoring under nonstationary and motion-corrupted conditions remain limited. In contrast to accuracy-driven fusion paradigms, this study adopts a stability-oriented algorithmic design, emphasizing physiologically consistent coupling and temporal robustness rather than optimizing standalone detection or classification performance.
Accordingly, the technical contribution of this study lies not in improving the precision of individual detection modules, but in establishing an ECG–PCG electromechanical coupling feature construction paradigm tailored for long-term monitoring. By incorporating rule-driven event consistency constraints and multi-stage correction mechanisms, the proposed approach preserves the temporal logic and cross-state stability of electromechanical events under nonstationary and noisy conditions.
Based on this perspective, we propose an automated ECG–PCG coupling analysis and semantic reporting framework designed for community and home environments under resource constraints. The primary objective is to validate its engineering feasibility and stability rather than clinical diagnostic performance. A structured multimodal feature system is constructed to support a fully automated pipeline, including signal preprocessing, event localization, ECG-guided mechanical event correction, and electromechanical parameter extraction. A large language model (LLM) is further introduced as a strictly constrained semantic interface to transform rule-based structured features into standardized and readable textual reports. This hybrid architecture—combining rule-driven analysis with LLM-assisted expression—provides a deployable and physiologically interpretable ECG–PCG coupling analysis paradigm.
2. Materials and Methods
2.1. Dataset Introduction
This study employs two synchronized ECG–PCG datasets to construct and validate the proposed coupling analysis system: the publicly accessible EPHNOGRAM database and a self-built multi-state physiological signal database. Public data were used for model development and algorithm performance comparison, while self-built data were used to assess the system’s robustness and transferability under dynamic physiological and emotional variations. Both datasets comprise synchronously acquired signals, enabling temporal coupling analysis and feature modeling of electro-mechanical activities.
(1) EPHNOGRAM Database
The EPHNOGRAM database was recorded using a portable, low-power acquisition system (hardware version 2.1) developed by an international research team [16], designed to provide high-quality synchronized ECG–PCG data to support cardiac electro-mechanical coupling research. This system simultaneously records three-lead ECG and single-channel PCG. The auscultation position was selected between the tricuspid and mitral valve auscultation areas to simultaneously obtain clear first and second heart sounds.
The database contains ECG-PCG synchronized recordings from 24 healthy adults (aged 23–29 years, mean 25.4 ± 1.9 years). All signals were recorded at an 8 kHz sampling rate with 12-bit quantization precision (approximately 10.5 effective bits). Experimental tasks encompassed multiple typical physiological activity states, including resting, walking, running, and cycling. This database provides a reliable baseline data resource for multimodal cardiac signal research, offering crucial support for algorithm performance evaluation and validation of noise-reduction methods.
(2) Self-Built ECG–PCG Dataset with Physiological Perturbations
To evaluate the robustness of the proposed ECG–PCG coupling analysis algorithm under physiological perturbations and nonstationary conditions, a self-built synchronous ECG–PCG dataset was constructed. This dataset introduces controlled variations in heart rate, respiratory modulation, and motion-related noise so that algorithmic stability under distribution shifts can be evaluated systematically. It is not intended to represent specific populations, age groups, or clinical disease conditions, nor to support diagnostic or prognostic clinical conclusions.
Data acquisition was conducted using a PowerLab/16sp system (AD Instruments, Dunedin, New Zealand). Five healthy young volunteers aged 22–26 years were recruited, and all experiments were performed in a quiet environment. The study protocol was approved by the Institutional Review Board of the University of Shanghai for Science and Technology (IRB-AF98-V1.0). Written informed consent was obtained from all participants prior to data acquisition, and all procedures were conducted in accordance with the principles of the Declaration of Helsinki. ECG signals were recorded using a standard Lead I configuration (left arm positive, right arm negative, right leg ground). PCG signals were collected using an MLT209 electronic stethoscope (AD Instruments, Dunedin, New Zealand) positioned between the tricuspid and mitral auscultation areas. Both ECG and PCG signals were sampled at 8 kHz to ensure compatibility with publicly available datasets.
To introduce different types of physiological perturbations, each participant completed three experimental tasks: approximately 5 min of resting recording, 5 min of post-exercise recovery, and approximately 10 min of emotion-induction tasks. The post-exercise recovery task was designed to induce heart rate acceleration, increased respiratory activity, and chest-wall vibration–related noise [17,18,19]. The emotion-induction task, based on standardized video stimuli, elicited mild and transient emotional responses (e.g., stress or positive affect), resulting in nonstationary heart rate dynamics and occasional motion artifacts [20,21,22]. These conditions were intentionally incorporated to evaluate algorithmic robustness under non-ideal and dynamically varying signal conditions.
2.2. Overall System Framework
This study proposes an automated ECG-PCG coupling analysis system tailored for community and home settings, aiming to achieve a complete closed-loop process from multimodal physiological signal acquisition to semantic interpretation of cardiac function. The system adopts a modular pipeline design, sequentially performing signal preprocessing, event detection and calibration, multimodal feature construction, and LLM-driven intelligent report generation.
In its overall design, the system adheres to three core principles:
(1) Algorithmic robustness: Improving reliability under noisy and motion-intensive conditions through multi-stage processing and redundant verification strategies;
(2) Cross-modal consistency: Fully leveraging the temporal correlation between ECG and PCG signals, using electrical events as a global reference for mechanical event detection;
(3) Intelligent interpretability: Achieving a natural transition from quantitative analysis to semantic interpretation through a structured feature system and large language models.
The overall system architecture is shown in Figure 1. The proposed system establishes an end-to-end processing pipeline from low-level signal acquisition to high-level semantic representation, with an architectural design that explicitly considers scalability, deployability, and physiological interpretability at the system and implementation levels. Rather than claiming validated real-world deployment, the proposed framework is intended as a methodologically feasible and engineering-oriented technical reference for cardiac physiological state monitoring under resource-constrained and uncontrolled conditions, such as those commonly encountered in community and home scenarios. Further details are presented in Section 2.3, Section 2.4, Section 2.5, Section 2.6 and Section 2.7.
2.3. ECG Signal Processing
The ECG signal reflects the electrical activity of the heart during depolarization and repolarization. The R wave, as the most prominent feature within the QRS complex, serves as the primary indicator of ventricular depolarization. Accurate detection of the R wave is critical for heart rate analysis, heart rate variability (HRV) calculation, and multimodal signal synchronization. To achieve robust R-wave identification, this study developed a processing workflow incorporating multi-stage filtering, feature enhancement, and adaptive threshold estimation.
First, the raw ECG signal undergoes preprocessing to suppress noise. Specifically, a 0.5 Hz high-pass filter is first applied to eliminate baseline drift, followed by a 49–51 Hz notch filter to suppress power-line interference, and finally a 40 Hz low-pass filter to remove high-frequency electromyographic noise. Subsequently, amplitude normalization and 20-millisecond moving-average smoothing are applied to standardize the signal’s dynamic range and smooth waveform transitions, laying the groundwork for subsequent feature enhancement.
During the R-wave detection phase, this study established a multi-stage recognition framework based on the principles of the Pan–Tompkins algorithm, comprising frequency-domain filtering, feature enhancement, and adaptive threshold detection. First, the preprocessed signal undergoes band-pass filtering in the 5–15 Hz range to enhance the dominant frequency components of the QRS complex. Subsequently, first-order differentiation and squaring are applied to the filtered result to further sharpen the steep rising edge of the R wave. Then, a 150-millisecond sliding window is used to compute the signal’s energy envelope, smoothing high-frequency fluctuations and highlighting the overall QRS energy profile.
Based on the statistical properties of this envelope signal, the adaptive threshold is defined from $\mu$ and $\sigma$, the mean and standard deviation of the signal envelope, respectively. This threshold is used for preliminary detection of candidate peaks, with a minimum inter-peak interval of 300 ms imposed to suppress false detections. Finally, the original smoothed signal is re-examined within ±100 ms of each candidate peak to identify the local maximum as the definitive R-wave location. This process combines physiological features with spectral characteristics, forming a single processing pipeline from noise suppression to precise R-wave localization.
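The detection stages described above (differentiation, squaring, 150 ms energy envelope, statistical threshold, 300 ms minimum interval, ±100 ms refinement) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. The threshold form `mean + k * std` and the multiplier `k = 0.5` are assumptions, since the exact expression is not reproduced in the text. The 5–15 Hz band-pass stage is omitted, and a 250 Hz synthetic signal is used instead of the 8 kHz recordings to keep the example light. The names `moving_average` and `detect_r_peaks` are hypothetical.

```python
import math
from statistics import mean, pstdev


def moving_average(x, width):
    """Centered moving average; the window is truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def detect_r_peaks(ecg, fs, k=0.5):
    """Return R-peak sample indices. `k` is an assumed threshold multiplier."""
    # Feature enhancement: first-order difference, then squaring.
    enhanced = [(b - a) ** 2 for a, b in zip(ecg, ecg[1:])]
    # Energy envelope over a 150 ms sliding window.
    envelope = moving_average(enhanced, int(0.150 * fs))
    # Adaptive threshold from envelope statistics (assumed form: mean + k * std).
    threshold = mean(envelope) + k * pstdev(envelope)

    refractory = int(0.300 * fs)  # minimum inter-peak interval of 300 ms
    candidates = []
    for i in range(1, len(envelope) - 1):
        is_local_max = envelope[i - 1] <= envelope[i] > envelope[i + 1]
        if envelope[i] > threshold and is_local_max:
            if candidates and i - candidates[-1] < refractory:
                if envelope[i] > envelope[candidates[-1]]:
                    candidates[-1] = i  # keep the stronger of two close peaks
            else:
                candidates.append(i)

    # Refinement: local maximum of the ECG within +/-100 ms of each candidate.
    half = int(0.100 * fs)
    peaks = []
    for c in candidates:
        lo, hi = max(0, c - half), min(len(ecg), c + half + 1)
        peaks.append(max(range(lo, hi), key=lambda j: ecg[j]))
    return peaks


# Synthetic signal: narrow Gaussian "R waves" every 0.8 s at fs = 250 Hz.
FS = 250
TRUE_PEAKS = [125, 325, 525, 725, 925, 1125]
ecg = [sum(math.exp(-0.5 * ((i - c) / 2.5) ** 2) for c in TRUE_PEAKS)
       for i in range(1250)]
r_peaks = detect_r_peaks(ecg, FS)
```

On real recordings the multiplier `k` would need tuning, and the filtering stages from the preprocessing step would run before this function.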
2.4. PCG Signal Processing
The PCG signal captures the mechanical-acoustic information generated by valve motion and hemodynamic changes, serving as a key modality for characterizing mechanical event timing in electro-mechanical coupling analysis. The S1/S2 identification strategy adopted in this study follows a rule-based pipeline comprising “frequency-domain enhancement–envelope extraction–heuristic temporal discrimination.” This design targets a computationally efficient and interpretable baseline that remains stable under typical and moderately noisy conditions, particularly for deployment in low-power and low-computation environments.
First, the original PCG signal undergoes DC-offset correction to eliminate the constant baseline component, facilitating subsequent processing. Because the primary spectral energy of PCG signals is concentrated in the low- to mid-frequency range, and because low-frequency drift caused by respiration and high-frequency electromyographic and quantization noise must be suppressed, this study employs a 20–150 Hz band-pass filter to preserve key frequency components related to the cardiac cycle. This frequency band has been demonstrated in prior studies to effectively cover the primary acoustic energy bands of S1 and S2, while maintaining good discrimination capability even in dynamic or noisy environments. Following filtering, the signal undergoes mild smoothing using a 10 ms moving-average window to suppress spike noise and enhance the stability of subsequent envelope estimation.
To extract the instantaneous energy profile of the PCG signal, this study employs the Hilbert transform to compute the instantaneous envelope:
$$E(t) = \left| x(t) + j\,\mathcal{H}[x(t)] \right| = \sqrt{x(t)^{2} + \mathcal{H}[x(t)]^{2}}$$
Here, $x(t)$ denotes the smoothed bandpass PCG, $\mathcal{H}[\cdot]$ represents the Hilbert transform, and $E(t)$ is the envelope signal. The Hilbert envelope preserves transient energy details while avoiding the excessive amplification of extreme values caused by squaring operations.
In the envelope domain, a peak-detection algorithm identifies candidate PCG events, setting the minimum peak height to 30% of the envelope maximum and the minimum inter-peak interval to 0.2 s to filter out adjacent false peaks caused by noise. Candidate peaks are then classified as S1 or S2 using the physiological pattern of PCG cycles, in which systole is typically shorter than diastole: if the interval between a pair of adjacent peaks is shorter than the following interval, the earlier peak is assigned as S1 and the later peak as S2, and this rule is applied iteratively to reconstruct the complete PCG cycle sequence. This method is computationally efficient and, within the design target stated above, robust to moderate noise.
It should be noted that this rule-based approach represents an engineering simplification by design. Its robustness may be limited under extreme heart rate conditions (pronounced tachycardia or bradycardia), complex arrhythmias, pronounced split heart sounds, or recordings with strong motion artifacts. Addressing such scenarios by incorporating adaptive thresholds or lightweight machine learning models as complementary or fallback solutions constitutes an important direction for future work.
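The peak-picking thresholds and the systole-shorter-than-diastole rule described in this section can be sketched as below. This is a hedged illustration in plain Python. The function names `pick_envelope_peaks` and `label_s1_s2` are hypothetical, and the tie-breaking choice (keeping the taller of two peaks closer than 0.2 s) is an assumption rather than something the text specifies.

```python
def pick_envelope_peaks(envelope, fs, rel_height=0.30, min_gap_s=0.2):
    """Local maxima at or above 30% of the envelope maximum, at least 0.2 s apart."""
    floor = rel_height * max(envelope)
    gap = int(min_gap_s * fs)
    peaks = []
    for i in range(1, len(envelope) - 1):
        is_local_max = envelope[i - 1] < envelope[i] >= envelope[i + 1]
        if envelope[i] >= floor and is_local_max:
            if peaks and i - peaks[-1] < gap:
                if envelope[i] > envelope[peaks[-1]]:
                    peaks[-1] = i  # assumed tie-break: keep the taller peak
            else:
                peaks.append(i)
    return peaks


def label_s1_s2(peaks):
    """Pair peaks as (S1, S2) when a gap is shorter than the gap that follows it."""
    pairs = []
    i = 0
    while i + 2 < len(peaks):
        gap = peaks[i + 1] - peaks[i]
        next_gap = peaks[i + 2] - peaks[i + 1]
        if gap < next_gap:
            # Short interval followed by a long one: systole, then diastole.
            pairs.append((peaks[i], peaks[i + 1]))
            i += 2
        else:
            # peaks[i] is probably an S2 left over from the previous beat.
            i += 1
    return pairs


# Peak times in ms with a 300 ms systole and a 500 ms diastole.
pairs = label_s1_s2([100, 400, 900, 1200, 1700, 2000, 2500])
```

Because the rule compares each interval with the next one, the final two peaks of a recording cannot be labeled by this sketch. Under extreme heart rates the two intervals converge, which is the limitation noted above.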
2.5. ECG-Based S1/S2 Localization Assistance and Correction
To improve the consistency and physiological plausibility of PCG event localization, this study introduces the ECG R-wave as a cross-modal temporal reference for assisting the correction of preliminarily detected S1/S2 events. This strategy leverages the relatively stable electromechanical coupling between ECG and PCG to explicitly constrain event timing and compensate for the inherent uncertainty of PCG under low signal-to-noise conditions.
In the current implementation, fixed search windows derived from typical physiological ranges (e.g., 30–150 ms after the R-wave for S1 and 250–400 ms for S2) are employed to constrain candidate PCG events. Importantly, these windows are not treated as rigid physiological constants, but rather as initial search constraints that balance algorithmic stability, interpretability, and deterministic execution in embedded systems.
We acknowledge that fixed windows may underfit inter-subject variability and state-dependent changes, such as differences in resting heart rate, physical activity level, or autonomic modulation. A more generalizable solution would involve adaptive windowing strategies, for example, by parameterizing search ranges as functions of individual baseline heart rate. The current system design leaves explicit interfaces for such personalization, which is identified as a key direction for future work.
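The fixed-window constraint can be sketched as follows, using the example windows quoted above (30–150 ms after the R-wave for S1, 250–400 ms for S2). This is a minimal sketch under stated assumptions. Candidates are represented as hypothetical `(time_ms, amplitude)` tuples, and selecting the highest-amplitude candidate inside each window is an illustrative choice. A beat with no candidate in a window is returned as `None` so that a later compensation step could handle it.

```python
S1_WINDOW_MS = (30, 150)    # search window after each R peak for S1
S2_WINDOW_MS = (250, 400)   # search window after each R peak for S2


def strongest_in_window(r_ms, candidates, window):
    """Return the time of the highest-amplitude candidate inside the window."""
    lo, hi = r_ms + window[0], r_ms + window[1]
    inside = [c for c in candidates if lo <= c[0] <= hi]
    return max(inside, key=lambda c: c[1])[0] if inside else None


def correct_with_r_peaks(r_peaks_ms, candidates):
    """Anchor S1/S2 to each R peak; None marks a missed detection."""
    return [
        {
            "r": r,
            "s1": strongest_in_window(r, candidates, S1_WINDOW_MS),
            "s2": strongest_in_window(r, candidates, S2_WINDOW_MS),
        }
        for r in r_peaks_ms
    ]


# Two beats; the candidate at 200 ms falls outside both windows and is ignored.
candidates = [(60, 0.9), (120, 0.3), (200, 0.5), (320, 0.7), (870, 0.8), (1500, 0.4)]
beats = correct_with_r_peaks([0, 800], candidates)
```

Making the two window constants functions of baseline heart rate would be one way to implement the adaptive windowing mentioned above.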
2.6. Multimodal Coupling Features and Other Feature Extraction
To characterize the associative patterns between cardiac electrical activity and mechanical acoustic responses, this study constructed a multimodal feature system based on synchronously acquired ECG and PCG signals. This system encompasses temporal, biomechanical amplitude, statistical trend, and coupling synchrony features. It not only incorporates independent information from ECG and PCG signals but also characterizes their cross-modal dynamic interactions, namely the multimodal electro-mechanical coupling features that constitute a core focus of this methodology. All features were extracted from the R-wave peaks and S1/S2 heart sound localization results corrected by the aforementioned algorithms, ultimately forming a 60-dimensional structured feature matrix (Table 1). This provides a unified data foundation for subsequent cardiac state assessment and automated interpretation by large language models.
First, in the temporal dimension, the study examined the relative temporal relationship between ECG events (R waves) and PCG events (S1/S2), extracting multiple indicators characterizing electro-mechanical conduction efficiency. To enhance the quantification of temporal stability, this study further calculated the RS1 delay coefficient of variation, S1–S1 period variability, and time-domain heart rate variability (HRV) metrics such as SDNN, RMSSD, and pNN50. This approach provides a multi-layered characterization of autonomic nervous system regulation and periodic stability.
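The time-domain metrics named above follow standard definitions, sketched here in plain Python. The function names are hypothetical. Using the sample standard deviation (n − 1 denominator) for SDNN and the coefficient of variation is an assumption, as the text does not state which estimator is used.

```python
import math
from statistics import mean, stdev


def hrv_time_domain(rr_ms):
    """SDNN, RMSSD and pNN50 from a sequence of RR intervals in milliseconds."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "sdnn": stdev(rr_ms),                                   # overall variability
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "pnn50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }


def coefficient_of_variation(values):
    """Standard deviation divided by the mean, e.g. for RS1 delays."""
    return stdev(values) / mean(values)


hrv = hrv_time_domain([800, 810, 790, 860, 800])
rs1_cv = coefficient_of_variation([60, 62, 58, 60])
```

The same `coefficient_of_variation` helper applies to the S1–S1 period sequence.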
Second, in the amplitude and energy dimensions, a fixed 100-millisecond window was designed for each PCG event to extract key acoustic parameters. The coefficient of variation (CV) of S1 and S2, along with morphological features such as skewness and kurtosis, was also extracted to characterize stability and morphological structural differences within PCG cycles.
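A fixed 100 ms window around each heart sound event can be summarized as sketched below. The text does not list the exact acoustic parameters, so RMS amplitude is assumed here alongside the skewness and kurtosis it names. Both moments use population (biased) estimators, which is also an assumption. `window_features` is a hypothetical name.

```python
import math


def window_features(pcg, center, fs, width_s=0.100):
    """RMS, skewness and kurtosis of the samples in a window centered on an event."""
    half = round(width_s * fs / 2)
    segment = pcg[max(0, center - half):center + half]
    n = len(segment)
    m = sum(segment) / n
    var = sum((v - m) ** 2 for v in segment) / n
    return {
        "rms": math.sqrt(sum(v * v for v in segment) / n),
        "skewness": (sum((v - m) ** 3 for v in segment) / n) / var ** 1.5,
        "kurtosis": (sum((v - m) ** 4 for v in segment) / n) / var ** 2,
    }


# Toy signal at fs = 40 Hz, so a 100 ms window spans four samples.
features = window_features([0, 0, 1, -1, 1, -1, 0, 0], center=4, fs=40)
```

A constant segment has zero variance and would raise a division error, so a production version would need to guard that case.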
Third, in the statistical trend dimension, this study focused on the dynamic changes in PCG intensity over time. By performing linear regression on the S1 RMS sequence of consecutive heartbeats, the slope of the S1 intensity trend was obtained to quantify whether myocardial contraction exhibits an enhancing or attenuating acoustic trend. Additionally, the S1/S2 amplitude ratio and its variability were extracted. This ratio serves as a crucial indicator of the coordination between ventricular systolic and diastolic mechanical responses, reflecting the impact of valve state changes or load alterations on PCG.
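The trend slope and amplitude ratio can be computed as in the following sketch. It fits an ordinary least-squares line to the per-beat S1 RMS values against the beat index. Expressing ratio variability as a coefficient of variation is an assumption, and the function names are hypothetical.

```python
from statistics import mean, stdev


def trend_slope(values):
    """Least-squares slope of values against their index (change per beat)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def s1_s2_ratio_stats(s1_amplitudes, s2_amplitudes):
    """Mean S1/S2 amplitude ratio and its coefficient of variation."""
    ratios = [a / b for a, b in zip(s1_amplitudes, s2_amplitudes)]
    return mean(ratios), stdev(ratios) / mean(ratios)


slope = trend_slope([1.0, 1.5, 2.0, 2.5])           # S1 RMS rising by 0.5 per beat
ratio_mean, ratio_cv = s1_s2_ratio_stats([2.0, 3.0], [1.0, 2.0])
```

A positive slope indicates an increasing S1 intensity over the analyzed beats, and a negative slope a decreasing one.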
Finally, this study focuses on constructing a set of multimodal coupling features to quantify the electro-mechanical synchrony between ECG and PCG. For instance, calculating the Pearson correlation coefficient between RR intervals and S1–S1 periods can assess synchrony between cardiac electrical and mechanical rhythms. Metrics such as cycle-difference distributions, RS1 delay variability, and S1/S2 dynamic ratios can reveal potential electro-mechanical decoupling, valve delay, or abnormal myocardial mechanical responses. Furthermore, this study incorporates simplified frequency domain features based on RR intervals (LF, HF, and their ratio) alongside a series of clinical threshold-based anomaly markers—such as bradycardia, tachycardia, abnormal RS1 delay, and abnormal S1/S2 ratio—to enable fundamental automated screening capabilities.
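The rhythm-synchrony correlation and threshold-based markers can be sketched as follows. The Pearson coefficient is standard. The limits in `ILLUSTRATIVE_LIMITS` are placeholders only: 60 and 100 bpm are the conventional adult resting bounds for bradycardia and tachycardia, and the RS1 bounds simply reuse the search window from Section 2.5, because the thresholds actually used are not given in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def flag_anomalies(features, limits):
    """Label each feature as 'low', 'high' or 'normal' against (low, high) limits."""
    flags = {}
    for name, (low, high) in limits.items():
        value = features[name]
        flags[name] = "low" if value < low else "high" if value > high else "normal"
    return flags


# Placeholder limits for illustration; not the thresholds used in the study.
ILLUSTRATIVE_LIMITS = {"hr_bpm": (60, 100), "rs1_ms": (30, 150)}

sync = pearson([800, 820, 780, 810], [802, 818, 781, 811])   # RR vs S1-S1, in ms
flags = flag_anomalies({"hr_bpm": 52, "rs1_ms": 70}, ILLUSTRATIVE_LIMITS)
```

A correlation close to 1 means the mechanical rhythm tracks the electrical rhythm beat by beat, while a drop would point to the decoupling or detection errors discussed above.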
Overall, the 60 multimodal features extracted in this section form the foundational data structure for subsequent large language model interpretation, automated summarization, and personalized cardiac state assessment. Among these, cross-modal electro-mechanical coupling features are particularly crucial, as they effectively integrate complementary information from ECG and PCG signals, thereby establishing the core technological foundation for deploying intelligent cardiac health monitoring systems in everyday settings such as communities and homes.
2.7. LLM-Based Automated Data Integration and Intelligent Interpretation System
Building upon the aforementioned multimodal ECG–PCG coupling features, this study developed an LLM-driven automated interpretation module for ECG-PCG signal analysis. This module enables efficient bridging from structured features to clinical semantic outputs. Operating under unified input specifications, it achieves hierarchical semantic mapping of multidimensional physiological indicators through three steps: feature summarization, prompt construction, and structured text generation.
First, the system automatically computes and compresses key temporal, energy, and electromechanical coupling features, generating structured feature entries containing range information and anomaly markers. This ensures the large language model receives inputs with clear boundaries, stability, and medical significance. Subsequently, prompt templates constructed based on cardiac physiology and clinical interpretation logic impose explicit constraints on the output format, restricting the generated content to three sections: “Comprehensive Assessment,” “Current Status,” and “Abnormal Indicators.” Minimal physiological background information is embedded within the prompts to guarantee semantic consistency and medical controllability of the generated results. During inference, the LLM generates standardized report text based on structured features. A hierarchical parsing module then extracts the structure and standardizes the format to support subsequent clinical review and system-level invocation. Additionally, the module automatically generates a simplified summary for non-specialist users, enhancing readability and practicality in home and community health monitoring scenarios.
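The prompt-construction and parsing steps can be sketched as below. This is a hypothetical illustration of the constrained-interface idea, not the prompt used in the study. The entry fields (`name`, `value`, `unit`, `low`, `high`, `flag`), the instruction wording, and both function names are assumptions. The call to the language model is deliberately left out.

```python
SECTIONS = ("Comprehensive Assessment", "Current Status", "Abnormal Indicators")


def build_prompt(entries):
    """Render structured feature entries into a format-constrained prompt."""
    feature_lines = [
        f"- {e['name']}: {e['value']} {e['unit']} "
        f"(reference {e['low']}-{e['high']}; flag: {e['flag']})"
        for e in entries
    ]
    return "\n".join([
        "You are given precomputed ECG-PCG features. Do not diagnose.",
        "Features:",
        *feature_lines,
        "Write exactly three sections with these headings, in this order:",
        *[f"{name}:" for name in SECTIONS],
    ])


def parse_report(text):
    """Split generated text into the three required sections."""
    sections, current = {}, None
    for line in text.splitlines():
        heading = line.strip().rstrip(":")
        if heading in SECTIONS:
            current = heading
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    missing = [name for name in SECTIONS if name not in sections]
    if missing:
        raise ValueError(f"report is missing sections: {missing}")
    return {name: " ".join(body) for name, body in sections.items()}


prompt = build_prompt([
    {"name": "heart_rate", "value": 72, "unit": "bpm",
     "low": 60, "high": 100, "flag": "normal"},
])
parsed = parse_report(
    "Comprehensive Assessment:\nStable rhythm.\n"
    "Current Status:\nHeart rate 72 bpm.\n"
    "Abnormal Indicators:\nNone."
)
```

Rejecting any output that lacks one of the three headings gives the system a simple structural check before a report is shown to a user.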
This interpretation module achieves an automated closed-loop process from “feature computation” to “semantic interpretation” by deeply integrating the language generation capabilities of LLMs with structured physiological features. Its design emphasizes structural constraints, semantic consistency, and medical controllability, providing a scalable and deployable technical pathway for the intelligent interpretation of complex multimodal physiological signals.
4. Discussion
4.1. Overall System Performance and Methodological Positioning
This study presents an automated ECG–PCG coupling analysis system organized along a unified processing pipeline comprising signal-level denoising, event-level calibration, feature-level fusion, and semantic-level expression. It should be explicitly noted that the primary objective of this work is to validate the methodological feasibility and engineering robustness of such a system under resource-constrained and uncontrolled community or home environments, rather than to assess its effectiveness for disease diagnosis or clinical decision-making. In contrast to existing ECG–PCG studies that predominantly pursue improved diagnostic accuracy or disease classification performance, this work deliberately adopts a monitoring-oriented paradigm, focusing on stable physiological state perception, cross-condition robustness, and interpretable feedback suitable for long-term deployment.
The system design leverages the complementary characteristics of ECG and PCG signals, combining the high temporal precision of ECG with the mechanical information captured by PCG. By introducing cross-modal temporal constraints, the proposed framework improves the stability and consistency of heart sound event localization. Based on this foundation, a structured multimodal feature set is constructed to describe cardiac rhythm characteristics, acoustic energy patterns, and electro-mechanical coupling relationships, providing reliable inputs for subsequent state-level analysis and semantic representation.
The robustness of the overall system is reflected at multiple levels: (1) the preprocessing module effectively mitigates common nonstationary noise encountered in home and community settings; (2) the R-wave–guided heart sound calibration strategy substantially enhances S1/S2 temporal alignment under conditions of weak heart sounds or motion artifacts, shifting the system from acoustics-dependent detection to a cross-modal constraint–driven approach; (3) the multimodal feature set spans rhythm, energy, and electro-mechanical timing dimensions, enabling the characterization of global physiological state variations across different conditions; and (4) the system integrates real-time evaluation and failure protection mechanisms based on signal quality, so that when signal quality is too low, it prompts users to retake measurements rather than outputting unreliable results, thereby ensuring safety and practicality in actual deployment.
On top of the structured feature computation, a semantic expression module is incorporated solely as an interface for translating predefined features into standardized textual descriptions, thereby improving output interpretability across different user groups. This module does not perform medical reasoning or diagnosis; instead, it serves to reduce the cognitive burden associated with understanding complex multimodal physiological information.
In summary, this work demonstrates a multi-level collaborative analysis framework tailored for community and home environments, with its main contribution lying in the validation of ECG–PCG coupling analysis and structured expression under constrained conditions. The proposed system is intended as a physiological state awareness and decision-support tool, providing early and understandable indications of potential changes and encouraging users to seek professional medical evaluation, when necessary, rather than replacing clinical diagnosis or treatment.
4.2. Contribution of R-Wave-Assisted Correction to PCG Analysis
Introducing the R-wave as a temporal prior for auxiliary correction of heart sound events (S1/S2) is a critical step in achieving stable PCG analysis under complex conditions. Under pure PCG segmentation, heart sound localization is significantly impacted by reduced signal-to-noise ratio, envelope distortion, and motion artifacts—whether at rest or during dynamic movement—resulting in PCG counts that deviate markedly from actual beat counts. Particularly in high-interference scenarios like walking or cycling, baseline methods become nearly ineffective, with some recordings yielding single-digit or significantly underestimated PCG counts. This further highlights the structural limitations of unimodal acoustic strategies in non-stationary environments.
In contrast, the R-wave-assisted correction strategy leverages the high temporal precision of ECG signals and the stability of electro-mechanical coupling to provide reliable cross-modal temporal anchors for PCG events, thereby mechanistically addressing these shortcomings. Based on the ablation experiment results (Section 3.1.1), R-wave correction either compensates for missed detections caused by weak PCG, restoring counts to levels consistent with the physiological heart rate, or suppresses false detections. This demonstrates that introducing the R-wave prior establishes a robust “structural supervision channel,” significantly enhancing system stability in non-resting environments.
From a system workflow perspective, this correction strategy not only enhances the accuracy of PCG localization but also provides high-quality foundational data support for subsequent PCG classification, temporal feature calculation, and multimodal coupling index analysis. The stability of S1/S2 locking directly impacts the reliability of key temporal metrics such as RR–S1, S1–S2, and mechanical delay. These metrics serve as crucial inputs for automated interpretation and ECG–PCG coupling assessment in this study. Therefore, R-wave-assisted correction functions not merely as a localized optimization module but as a foundational safeguard throughout the entire system analysis chain.
At the methodological level, the strategy employed in this study demonstrates distinct advantages in robustness and scene adaptability. Traditional PCG segmentation methods predominantly rely on envelope detection [27], Hilbert transform [28], wavelet analysis [29], or statistical learning models [30], which perform well under quiet conditions but exhibit a sharp decline in performance in dynamic scenarios such as low signal-to-noise ratios or motion. While prior studies [31,32,33] have demonstrated the auxiliary value of ECG for PCG segmentation, most follow a data-driven paradigm. Their performance heavily depends on the scale and quality of training data, limiting their generalization capability and interpretability in unseen noisy scenarios. To address this, this study proposes a mechanism-first regularization framework centered on a systematic cross-modal temporal coupling mechanism: “window constraint, candidate matching, physiological tolerance, missed detection compensation.” This approach drives the segmentation process by explicitly encoding the physiological relationship between PCG and ECG, rather than implicitly learning it from data. This design significantly reduces reliance on large-scale annotated datasets and supports stable and consistent PCG localization under complex conditions such as motion and long-term recordings, which conventional methods have found difficult.
It should be emphasized that the R-wave-assisted correction strategy is not intended to rigidly enforce fixed physiological windows across all subjects and states. Rather, it functions as a structural constraint layer that mediates between rule-based PCG segmentation and complex physiological variability. Its core value lies in explicitly encoding the temporal relationship between electrical and mechanical events, thereby substantially mitigating the disruptive impact of false positives and false negatives without relying on large-scale training data. When signal quality deteriorates substantially or the rule-based pipeline fails, incorporating modern PCG segmentation approaches—such as lightweight learning models or template-based methods—as secondary checks or fallback modules represents a reasonable and extensible system evolution path. Within this framework, the lightweight rule-based pipeline proposed in this study should be regarded as a stable and interpretable baseline layer rather than an exclusive final solution.
In summary, R-wave-assisted correction effectively mitigates the acoustic sensitivity issues of PCG in dynamic environments by introducing cross-modal temporal constraints, providing a fundamental mechanism for reliable localization of core cardiac sound events. Results demonstrate that this strategy constitutes an effective approach for achieving stable cardiac sound analysis across diverse scenarios. Its rule-based, low-data-dependency design also lays a crucial foundation for constructing more robust and interpretable multimodal physiological signal analysis systems.
4.3. Physiological Interpretability and Application Potential of Multimodal Coupling Features
The ECG–PCG multimodal coupling feature framework developed in this study is designed to provide a structured representation of physiological state variations by integrating cardiac electrical activity, mechanical acoustic responses, and their temporal relationships. It should be emphasized that the following discussion focuses on the potential of these features to reflect physiological state changes and regulatory trends, rather than their use for diagnosing specific pathological conditions.
Systematic evaluation results indicate that multiple coupling features exhibit good short-term stability and cross-subject consistency, supporting their methodological reliability as descriptors of physiological states. For instance, RR interval–related temporal features demonstrate low variability and high agreement across different experimental conditions, suggesting their robustness in capturing heart rate regulation and rhythm dynamics. Such indicators have been widely used in prior studies to characterize autonomic modulation and load responses [
34], and their stability forms a necessary basis for continuous monitoring in community and home environments.
In addition, heart sound energy- and morphology-related features show favorable inter-subject consistency in resting and certain dynamic conditions, indicating their ability to consistently reflect variations in cardiac mechanical activity [
35]. In the context of this study, these features are primarily interpreted as physiological descriptors of mechanical response changes, rather than indicators of specific valvular abnormalities or disease states.
More notably, cross-modal coupling features (e.g., RS1, S1–S2 intervals) integrate the temporal relationship between electrical and mechanical cardiac events, offering a quantitative means to analyze electro-mechanical synchrony under different physiological conditions. Experimental results demonstrate that these features maintain good measurement stability across multiple states and exhibit consistent trends in response to perturbations such as exercise, respiration, and emotion induction, highlighting their capacity to reflect dynamic cardiac adaptations to physiological load.
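As a hedged illustration of how such cross-modal intervals and their stability could be quantified, the following Python sketch computes per-beat RS1 and S1–S2 intervals and a coefficient of variation. It assumes that RS1 denotes the delay from the R peak to S1 within the same beat and that the three event lists are already aligned beat by beat; the function names are hypothetical and the exact definitions used in this study may differ.

```python
from statistics import mean, stdev
from typing import List, Tuple


def coupling_intervals(
    r_peaks: List[float], s1_times: List[float], s2_times: List[float]
) -> Tuple[List[float], List[float]]:
    """Per-beat RS1 and S1-S2 intervals in milliseconds (inputs in seconds)."""
    if not (len(r_peaks) == len(s1_times) == len(s2_times)):
        raise ValueError("event lists must be aligned beat by beat")
    rs1 = [(s1 - r) * 1000.0 for r, s1 in zip(r_peaks, s1_times)]
    s1s2 = [(s2 - s1) * 1000.0 for s1, s2 in zip(s1_times, s2_times)]
    return rs1, s1s2


def coefficient_of_variation(values: List[float]) -> float:
    """Sample standard deviation divided by the mean (needs >= 2 values)."""
    return stdev(values) / mean(values)


if __name__ == "__main__":
    r = [0.00, 0.80, 1.62]
    s1 = [0.05, 0.86, 1.67]
    s2 = [0.35, 1.15, 1.98]
    rs1, s1s2 = coupling_intervals(r, s1, s2)
    print("RS1 CV:", coefficient_of_variation(rs1))
    print("S1-S2 CV:", coefficient_of_variation(s1s2))
```

A lower coefficient of variation within a given state indicates a more stable feature, which is one simple way to compare measurement stability across resting, exercise, and other conditions.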
Further analysis reveals that feature stability is significantly modulated by physiological state. Resting and low-interference conditions are more suitable for extracting baseline-related characteristics, whereas rhythmic movement conditions may enhance the consistency of certain temporal and coupling features. These findings suggest that feature selection should be adapted to specific usage scenarios to maximize reliability in practical deployments. At this stage, the feature set was intentionally kept comprehensive to enable a systematic evaluation of ECG–PCG coupling information under diverse physiological conditions. Feature selection or dimensionality reduction was therefore not performed in this study, as the primary goal was methodological feasibility assessment rather than application-specific model optimization. Future work will focus on task-oriented feature selection to reduce redundancy and improve computational efficiency.
Compared with previous studies relying primarily on single-modality ECG or PCG features [
36,
37], this work systematically evaluates the stability and state responsiveness of multimodal coupling features, demonstrating their potential advantages in capturing physiological state variations. Importantly, these features are not intended to directly indicate specific diseases but rather to provide richer information for long-term monitoring, physiological trend analysis, and individual state awareness.
In conclusion, the proposed multimodal coupling feature framework exhibits favorable stability and physiological consistency at the methodological level, making it suitable as a foundational tool for physiological state monitoring in community and home settings. Its value lies in supporting longitudinal observation of cardiac regulatory dynamics, rather than replacing standard clinical assessment or disease diagnosis.
4.4. Potential and Areas for Improvement of LLMs in Heart Health Monitoring
Based on a multimodal ECG–PCG coupling feature framework, this study developed an embedded lightweight LLM-based intelligent interpretation module, achieving a fully automated closed-loop pipeline from feature extraction to structured semantic report generation. It should be clarified that the primary objective of this study is not to validate independent medical reasoning by LLMs, but rather to investigate whether lightweight models can still accomplish structured feature integration and dual-mode semantic expression under extreme resource constraints. Experimental results demonstrate that the module effectively integrates multidimensional ECG and PCG indicators across resting and exercise scenarios, generating context-sensitive and semantically consistent dual-layer explanatory texts for different user groups. This data-driven semantic abstraction approach significantly lowers the expertise barrier for interpreting complex physiological signals and provides a feasible technical pathway for long-term cardiac monitoring in community and home settings. Sample reports show that the system delivers coherent interpretations of rhythm stability, heart rate variability, and autonomic balance, while distinguishing physiological load responses from potential abnormal fluctuations during exercise—bridging the gap between “metric output” and “understandable health interpretation.”
The effectiveness of the LLM module stems not only from its language generation capability but also from the highly stable ECG–PCG electromechanical coupling feature system developed in this study. Compared with prior approaches relying primarily on single heart rate or HRV metrics [
38], the proposed coupling features exhibit improved physiological consistency across states, providing reliable and verifiable structured inputs for semantic integration. By extending monitoring from the “rhythmic level” to the “electromechanical dynamics level,” the system increases the informational depth of home-based monitoring and brings it closer to that of clinical equipment. Leveraging the organizational and expressive capabilities of LLMs, complex cardiac dynamics can thus be transformed into medically coherent, follow-up-oriented textual descriptions that support long-term monitoring, load adjustment analysis, and individualized risk awareness.
At the same time, systematic evaluation reveals that lightweight LLMs struggle to independently generate medical reports without stringent constraints. Comparative experiments with DeepSeek-R1:1.5B and Qwen2.5:1.5B demonstrate notable limitations in medical terminology comprehension and numerical reasoning. Accordingly, this study adopts a rule-driven and template-constrained hybrid generation architecture, in which all numerical computations and state determinations are completed and validated by rule-based modules, while the LLM is restricted to generating natural language expressions within predefined semantic boundaries. This strategy enhances interpretive safety and consistency while preserving readability. Notably, the selection of 1.5B-scale models was intentional, aiming to evaluate feasibility under extreme constraints on computation, memory, and energy consumption.
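The division of labor in this hybrid architecture can be sketched as follows. This is a simplified illustration under stated assumptions, not the system's actual code: the 60–100 bpm thresholds, the template wording, the function names, and the `llm` callable (a stand-in for any local model interface) are all hypothetical.

```python
import re
from typing import Callable, Dict, Set

TEMPLATE = "Mean heart rate was {hr} bpm, which is {hr_state}."


def classify_heart_rate(hr_bpm: float) -> str:
    """Rule-based state determination (illustrative thresholds only)."""
    if hr_bpm < 60:
        return "below the typical resting range"
    if hr_bpm <= 100:
        return "within the typical resting range"
    return "above the typical resting range"


def build_facts(metrics: Dict[str, float]) -> Dict[str, str]:
    """All numbers and states are fixed here, before the LLM is involved."""
    hr = round(metrics["mean_hr_bpm"])
    return {"hr": str(hr), "hr_state": classify_heart_rate(hr)}


def numbers_in(text: str) -> Set[str]:
    """Extract every numeric token so that drafts can be checked."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def generate_report(metrics: Dict[str, float], llm: Callable[[str], str]) -> str:
    facts = build_facts(metrics)
    baseline = TEMPLATE.format(**facts)
    draft = llm(
        "Rephrase for a lay reader without changing any number or state: "
        + baseline
    )
    # Accept the draft only if it keeps exactly the validated numbers
    # and the rule-determined state wording; otherwise use the template.
    if numbers_in(draft) == numbers_in(baseline) and facts["hr_state"] in draft:
        return draft
    return baseline


if __name__ == "__main__":
    stub = lambda prompt: (
        "Your average heart rate was 72 bpm, which is within the typical resting range."
    )
    print(generate_report({"mean_hr_bpm": 72.2}, stub))
```

Because the model only rephrases text whose numbers and state labels were fixed beforehand, a draft that alters or invents a value is discarded in favor of the deterministic template sentence.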
Although this study adopts a cautious stance regarding independent medical reasoning by lightweight LLMs, it does not preclude the potential of larger models under richer resource conditions [
39]. The structured feature tables, interpretation templates, and semantic interfaces produced by the system are designed to be interoperable with external interpretation tools, allowing users to submit results to trusted, networked large-scale models or professional intelligent systems for further analysis. Recent advances in traceable LLM-driven medical reasoning agents demonstrate that, with appropriate model scale and system design, language models may play a more central role in future decision-support paradigms.
Overall, rather than positioning LLMs as medical reasoning engines, this study validates a practical pathway for embedding lightweight LLMs as constrained semantic integration and expression interfaces within cardiac monitoring systems under resource-limited conditions. This work provides an engineering reference for flexible deployment of language models across varying resource environments and for the gradual integration of higher-level intelligent interpretation capabilities.
5. Conclusions
This study proposes an automated ECG–PCG coupling analysis system for community and home use, implementing a full workflow including signal preprocessing, event detection and calibration, multimodal feature construction, and rule-constrained LLM-assisted semantic report generation. Results show that the system exhibits robust performance and physiological consistency across multiple states, outputs logically coherent PCG sequences, and maintains accurate coupling parameter extraction. The introduced LLM module integrates pre-computed multimodal metrics under rule-based constraints, focusing on improving report readability and consistency rather than autonomous medical interpretation or diagnosis. Thus, it serves as an interpretation-friendly interface for non-professional settings, not as a clinical decision-support tool.
Leveraging its automation, lightweight design, and deployability, the system shows potential for cardiac monitoring in community and home environments. It supports long-term monitoring without specialized operation, aiding the observation and tracking of heart rate variations, load fluctuations, and changes in electromechanical coupling in primary care contexts. However, it must be emphasized that the system remains a research prototype, lacking the clinical validation and regulatory certification required for medical devices, and it should not be used for medical decision-making. Future product development would require substantial work in clinical validation, quality management, data security, and privacy protection.
This study provides a systematic methodological validation and a demonstration of engineering feasibility. Its core value lies in providing a proof-of-concept for subsequent long-term scenario research, rather than direct application in clinical or daily health decision-making. From an algorithmic perspective, this work exemplifies a stability-oriented multimodal system design, in which physiologically constrained coupling is used to enhance temporal robustness and system consistency under nonstationary, low-quality signal conditions, rather than pursuing accuracy-driven standalone detection or classification optimization. The current ECG–PCG coupling feature set is intended for systematic exploration and has not yet been optimized for specific applications. Future research may target specific monitoring tasks (e.g., home follow-up for heart failure patients) through feature selection and lightweight model design to improve computational efficiency, system stability, and practical deployment feasibility. Overall, this study provides a methodologically viable technical framework for daily cardiac monitoring in low-resource, uncontrolled environments, laying the foundation for further engineering and standardized development of multimodal cardiac assessment systems.