1. Introduction
Lower-limb loss affects millions of individuals worldwide, primarily due to vascular diseases and diabetes [1]. The prevalence of these conditions is rising in low- and middle-income countries, further increasing amputation rates [2]. In Brazil, for example, 31,190 lower-limb amputations were performed in 2022 through the public health system [3]. Yet an active knee prosthesis can cost around R$100,000 [4], while the average monthly per-capita income in the North region is only R$1107 [5], roughly one-ninetieth of that price, making such devices inaccessible to most individuals. This disparity underscores the urgent need for affordable and scalable prosthetic technologies, particularly in resource-constrained settings.
Beyond economic barriers, limb loss severely impacts mobility, employability, and overall quality of life [6]. Limited access to prosthetic devices and rehabilitation services further restricts social participation, reinforcing cycles of exclusion [2]. Expanding access to assistive technologies is therefore critical to improving independence and social integration [7].
Building on this need, recent advances in active prostheses with powered joints and control systems have offered substantial improvements over traditional passive devices. By using actuators to generate movement, they closely replicate natural limb function [8], resulting in improved gait symmetry, walking speed, and metabolic efficiency [9]. These devices also offer superior adaptability to diverse terrains and activities, enhancing user mobility and quality of life [10,11]. The effectiveness of such prostheses depends on their ability to predict and adapt to movement intent in real time. Predictive control systems interpret signals from sensors such as EMG or IMUs to dynamically adjust actuators, ensuring smooth transitions between gait phases and adapting to different walking conditions [12,13]. This approach improves safety, reliability, and user acceptance, making efficient predictive algorithms essential for clinical translation [14,15].
Ensuring responsive and reliable prosthetic control requires models that are not only accurate but also practical for on-device execution. In this context, embedded systems provide a promising platform for real-time signal processing and control [16]. Yet, while architectures such as Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) have demonstrated strong predictive accuracy [17], their computational demands often exceed the capacity of low-power hardware [18]. This gap highlights the need for more efficient alternatives, such as convolutional models optimized for microcontrollers, and for signal processing methods specifically tailored to constrained environments [19].
This proof-of-concept study addressed these challenges by leveraging affordable IMU sensors, computationally efficient CNN-based models, and low-cost embedded hardware with dedicated accelerators for real-time knee angle prediction. The main contributions were:
- Multi-objective optimization of CNN architectures to balance accuracy and efficiency, yielding specialized models for short, natural, and long strides. 
- A lightweight gait classifier to select specialized models in real-time. 
- Deployment and validation of the optimized framework on a $40 TinyML platform (Sipeed MaixBit with Kendryte K210), demonstrating robust accuracy and low latency suitable for prosthetic applications. 
Recent studies in the field have been presented as proof-of-concept investigations, emphasizing technological feasibility as a critical first step toward clinical translation [20,21,22,23]. In line with this approach, our work integrated efficient model design with hardware-aware optimization and established the viability of real-time gait prediction on resource-constrained systems. These findings provided a foundation for larger-scale evaluations and pointed toward the development of prosthetic solutions that are effective and economically accessible.
The remainder of this paper is organized as follows. Section 2 reviews related work on IMU-based gait analysis and predictive modeling for prosthetics. Section 3 presents the materials and methods, including data acquisition, model architectures, and optimization strategies. Section 4 describes the experiments and evaluation procedures. Section 5 reports and discusses the experimental results. Finally, Section 6 concludes the paper with limitations and directions for future work.
  2. Related Work
Studies have shown that inertial measurement units (IMUs) can accurately measure joint angles in the sagittal plane, particularly knee flexion/extension, in gait analysis without magnetometers [24]. Achieving this accuracy, however, often requires complex multistep algorithms to extract meaningful parameters from the recorded raw data [25]. To overcome this computational burden, efficient algorithms using only accelerometers and gyroscopes have been developed for precise limb angle estimation [26]. Sensor placement and computational strategies strongly affect the performance of gait event detection, with thigh- and shank-mounted sensors proving effective for temporal analysis [27]. In addition, complementary filtering techniques improve IMU data fusion accuracy [28]. Our approach builds on these advances, focusing specifically on sagittal-plane prediction while addressing cost and complexity considerations.
For predictive modeling, LSTM networks effectively capture temporal dependencies in gait data, showing good predictive accuracy in terms of Root Mean Squared Error (RMSE) and other metrics [29,30]. However, deploying LSTMs on embedded systems remains challenging due to their computational demands. Meanwhile, recent efforts in deploying deep learning models on embedded microcontrollers offer promising, cost-effective solutions for active prosthesis control [31]. These studies find that Convolutional Neural Networks (CNN) generally outperform Multi-Layer Perceptrons (MLP) with limited data, although MLPs offer faster, real-time performance. Real-time knee joint angle prediction using RNNs on microcontrollers has also achieved robust predictive performance [32], highlighting the potential of embedded deep learning for gait analysis.
This study advanced prior work by addressing two key gaps: (i) the limited demonstration of CNN-based gait prediction on TinyML platforms [31], and (ii) the dependence on costly or proprietary sensor systems [32]. By leveraging affordable IMUs, multi-objective optimization of CNN architectures, and embedded deployment on a microcontroller, we demonstrated the feasibility of real-time knee angle prediction for accessible motorized prosthetics.
  4. Experiments
  4.1. Data Acquisition
Four healthy adult volunteers (two male, two female) participated in this study. Participants were recruited through personal invitation and volunteered to take part without compensation. Inclusion criteria required the absence of any self-reported musculoskeletal or neurological disorders that could affect gait. All participants were adults capable of walking independently without assistive devices. Informed consent was obtained from all participants prior to data collection. The sample size was defined in alignment with the exploratory scope of the study, allowing controlled evaluation while incorporating basic variability in gender and age group (young and middle-aged adults). While the cohort size was limited, this design is consistent with other exploratory studies in biomedical signal processing, where the objective is to establish feasibility of the approach [23]. Table 1 summarizes their demographic information.
Participants performed walking trials using three different stride lengths to simulate realistic gait variations:
- Short strides measuring 0.6 m. 
- Natural strides, equivalent to the participant’s average step length, at 0.82 m. 
- Extended strides of 1 m. 
For each stride length, participants performed five walking trials, resulting in approximately 2–3 min of recordings per stride type. All five trials were analyzed, but only segments without signal artifacts or sensor displacement were retained for processing. Noisy or incomplete samples were excluded through visual inspection and stability checks to ensure data quality and reliability.
All tests were conducted on a flat, uniform surface, with participants wearing standardized footwear to minimize external variability. Prior to data collection, participants underwent a familiarization period to ensure consistent execution of each stride length condition.
  4.2. Data Processing
In our methodology, the thigh angle served as the independent variable (X), while the knee angle was defined as the dependent variable (Y), computed as the angular difference between the thigh and shank segments at each point in the gait cycle. This approach enabled modeling how variations in thigh movement influence knee behavior across different stride lengths.
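As a minimal sketch of this definition, the snippet below derives the knee angle sample by sample as the thigh angle minus the shank angle. The function name, the sign convention, and the sample values are illustrative assumptions and are not taken from the study's code.

```python
def knee_angle(thigh_deg, shank_deg):
    """Knee angle per sample as the difference between segment angles (degrees)."""
    if len(thigh_deg) != len(shank_deg):
        raise ValueError("thigh and shank series must have the same length")
    return [t - s for t, s in zip(thigh_deg, shank_deg)]


# Hypothetical sagittal-plane segment angles over four samples of a gait cycle.
thigh = [20.0, 10.0, -5.0, 15.0]    # model input X
shank = [15.0, -5.0, -45.0, -20.0]
knee = knee_angle(thigh, shank)     # model target Y
```

With this convention, a fully extended leg (thigh and shank aligned) yields a knee angle of zero, and flexion appears as a positive difference.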
  4.3. Data Preprocessing and Augmentation
To improve model robustness and generalization, we applied data augmentation techniques, including Gaussian noise addition, scaling, Arbitrary Time Deformation (ATD), and Stochastic Magnitude Perturbation (SMP). These techniques were selected because they preserve the temporal dynamics of gait signals while exposing the model to inter- and intra-subject variability. ATD and SMP in particular have been shown to effectively expand IMU-based datasets without distorting stride-phase characteristics [40].
These transformations expanded the dataset from 33,800 to 169,000 data points. The data from all participants were then combined into a single dataset before being randomly split into training (70%), validation (20%), and testing (10%) sets.
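The following sketch illustrates two of the simpler augmentations (Gaussian noise and scaling) and a random 70/20/10 split. The noise level, scale factor, seed, and helper names are hypothetical; ATD and SMP are omitted because their exact formulation is not given in this section.

```python
import random


def add_gaussian_noise(signal, sigma, rng):
    """Return a copy of the signal with zero-mean Gaussian noise added."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


def scale(signal, factor):
    """Return a copy of the signal with its magnitude scaled."""
    return [x * factor for x in signal]


def split_indices(n, rng, train=0.7, val=0.2):
    """Shuffle indices 0..n-1 and split them into train/validation/test."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(n * train)
    n_val = round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


rng = random.Random(0)                      # fixed seed, for reproducibility only
window = [0.0, 5.0, 10.0, 5.0]              # hypothetical thigh-angle samples
noisy = add_gaussian_noise(window, sigma=0.5, rng=rng)
scaled = scale(window, 1.1)

# 169,000 points after augmentation, as reported above.
train_idx, val_idx, test_idx = split_indices(169_000, rng)
```

For 169,000 points, this split yields 118,300 training, 33,800 validation, and 16,900 test points.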
  4.4. Training Methodology
A range of neural network architectures were trained and adapted for deployment using TensorFlow Lite for Microcontrollers (https://www.tensorflow.org/lite/microcontrollers, accessed on 1 September 2025). All training was conducted on an NVIDIA GeForce GTX 960M GPU, with TensorFlow version 2.12.0 and Python version 3.11.1. Each model was trained for a maximum of 1000 epochs, using an EarlyStopping callback with a patience of 20 epochs to prevent overfitting. An adaptive learning rate with decay was employed: starting from its initial value, the rate was controlled by the ReduceLROnPlateau callback, which was configured to monitor the validation loss (val_loss). If this metric did not improve for 5 consecutive epochs (patience = 5), the learning rate was reduced by 50% (factor = 0.5). The minimum learning rate was set to 1 × 10⁻⁶ (min_lr = 1e-6) to prevent it from becoming too small.
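To make the interaction between the two callbacks concrete, the sketch below simulates the schedule in plain Python. It is a simplified stand-in for the Keras EarlyStopping and ReduceLROnPlateau callbacks rather than their actual implementation, and the initial learning rate and validation losses used in the example are hypothetical placeholders.

```python
def simulate_schedule(val_losses, initial_lr, factor=0.5, lr_patience=5,
                      min_lr=1e-6, stop_patience=20):
    """Return (learning rate used at each epoch, index of the last epoch run)."""
    lr = initial_lr
    best = float("inf")
    lr_wait = 0      # epochs without improvement since the last LR reduction
    stop_wait = 0    # epochs without improvement since the best val_loss
    lrs = []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr)
        if loss < best:
            best = loss
            lr_wait = 0
            stop_wait = 0
        else:
            lr_wait += 1
            stop_wait += 1
        if lr_wait >= lr_patience:
            lr = max(lr * factor, min_lr)   # halve, but never go below min_lr
            lr_wait = 0
        if stop_wait >= stop_patience:
            return lrs, epoch               # early stopping triggers here
    return lrs, len(val_losses) - 1


# Hypothetical run: the loss improves twice, then plateaus.
losses = [1.0, 0.9] + [0.95] * 30
lrs, last_epoch = simulate_schedule(losses, initial_lr=1e-3)
```

In this hypothetical run, the learning rate is halved every five stagnant epochs, and training stops after 20 epochs without improvement, well before the 1000-epoch cap.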
  4.5. Multi-Objective Optimization
To systematically explore the trade-offs between model complexity and predictive performance, we conducted a structured hyperparameter search. Table 2 summarizes the discrete values tested for model width, model depth, input window size, and prediction horizon during the multi-objective optimization process. By integrating the methods outlined in Section 3.3.1 and Section 3.3.2, these hyperparameters were systematically adjusted to select the best model configuration. This approach allowed simultaneous optimization of generalization capabilities, minimization of overfitting, and maintenance of computational efficiency, all essential for deployment in embedded environments.
The optimization procedure began with the computation of the Pareto Frontier, capturing the trade-offs between model accuracy and computational complexity. RMSE and model size (in bytes) were used as primary metrics, with each model configuration defined by a different combination of width and depth values.
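A Pareto frontier over two minimized objectives can be computed with a simple dominance check, as sketched below. The candidate names, RMSE values, and sizes are invented for illustration and do not correspond to the models evaluated in this study.

```python
def pareto_front(models):
    """Return the non-dominated models from a list of (name, rmse, size_bytes).

    A model is dominated if another model is no worse on both objectives
    and strictly better on at least one (both objectives are minimized).
    """
    front = []
    for name, rmse, size in models:
        dominated = any(
            (r <= rmse and s <= size) and (r < rmse or s < size)
            for _, r, s in models
        )
        if not dominated:
            front.append((name, rmse, size))
    return front


# Hypothetical candidates: (name, RMSE in degrees, model size in bytes).
candidates = [
    ("A", 3.0, 20_000),
    ("B", 2.5, 40_000),
    ("C", 3.2, 35_000),   # dominated by A
    ("D", 2.5, 60_000),   # dominated by B
    ("E", 4.0, 10_000),
]
front = pareto_front(candidates)
```

Here the frontier contains A, B, and E: each is either more accurate or smaller than every other remaining candidate.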
To incorporate additional architectural considerations, such as input window size and forecast horizon, we adopted a weighted metric approach as described in Section 3.3.2. Table 3 details the empirical weights assigned to each metric, reflecting the constraints and priorities defined in [38].
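One common way to realize such a weighted evaluation is a weighted sum of min-max normalized metrics, sketched below. This is a generic illustration under stated assumptions: the weights, the candidate values, and the choice to treat a longer horizon as desirable are all hypothetical, and the actual formulation is the one given in Section 3.3.2 and Table 3.

```python
def normalize(values):
    """Min-max normalize a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_scores(candidates, weights, maximize=()):
    """Weighted sum of normalized metrics; a lower score is better."""
    scores = [0.0] * len(candidates)
    for metric, weight in weights.items():
        column = normalize([c[metric] for c in candidates])
        if metric in maximize:
            column = [1.0 - v for v in column]   # invert metrics to maximize
        scores = [s + weight * v for s, v in zip(scores, column)]
    return scores


# Hypothetical Pareto-optimal candidates and weights (not the paper's values).
candidates = [
    {"rmse": 3.0, "size": 20_000, "window": 32, "horizon": 5},
    {"rmse": 2.5, "size": 40_000, "window": 64, "horizon": 10},
    {"rmse": 4.0, "size": 10_000, "window": 16, "horizon": 1},
]
weights = {"rmse": 0.4, "size": 0.3, "window": 0.1, "horizon": 0.2}
scores = weighted_scores(candidates, weights, maximize=("horizon",))
best = min(range(len(candidates)), key=scores.__getitem__)
```

With these illustrative numbers, the first candidate is selected because it is never the worst on any metric, which is the kind of balanced compromise a knee-point selection aims for.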
The input window size and forecast horizon are key parameters that significantly impact model performance. Larger input windows can capture more temporal dependencies, potentially enhancing accuracy for long-term patterns [41], but they also increase computational complexity and the risk of overfitting [36]. In contrast, smaller windows reduce computational demands but may overlook essential patterns.
Longer forecast horizons, though more challenging due to increased uncertainty [41], are vital in active prostheses to predict user intentions over sufficient time spans for smooth movements [42]. Short-term predictions may not allow proactive adjustments, leading to delayed responses. Balancing the forecast horizon ensures meaningful predictions with acceptable accuracy, thereby enhancing prosthetic functionality through anticipatory responses.
The results of our model selection process are illustrated in Figure 4. Models marked with a red cross (×) represent those on the Pareto Frontier, indicating they offer the most efficient trade-offs between the evaluated metrics. The model highlighted with a green circle is the one selected through this weighted evaluation (knee point).
  4.6. Combined vs. Specialized Gait Prediction Models
The combined gait prediction model was trained on a unified dataset comprising short, natural, and long stride data, aiming to predict knee joint angles from thigh motion sequences in a generalized fashion. While this approach captures common temporal and spatial patterns across gait types, it may underperform when facing type-specific variability.
To overcome this limitation, we adopted a dynamic gait adaptation strategy that classifies stride type in real time and routes the input to a specialized model optimized for that pattern. This modular design enables each predictor to capture the distinct dynamics of its respective gait type, thereby enhancing predictive accuracy while maintaining computational efficiency [43,44].
As illustrated in Figure 5, thigh motion data are first processed by a gait classification module. This classifier was implemented as a lightweight MLP model trained to categorize input sequences into short, natural, or long stride types. Each input consisted of time-series windows of filtered thigh IMU signals, with a window length of 64 samples and a hop of 10 samples between consecutive windows (so adjacent windows overlap by 54 samples). According to the classifier output, the system dynamically selects the appropriate specialized predictor. Both the classifier and the specialized models were deployed on the embedded platform, enabling fully on-device end-to-end inference.
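The windowing described above can be sketched as follows. The function name and the synthetic signal are illustrative assumptions; only the 64-sample window and the 10-sample spacing come from the text.

```python
def sliding_windows(samples, length=64, hop=10):
    """Split a 1-D signal into overlapping windows of `length` samples,
    starting a new window every `hop` samples. Trailing samples that do
    not fill a complete window are dropped."""
    return [
        samples[start:start + length]
        for start in range(0, len(samples) - length + 1, hop)
    ]


# Hypothetical filtered thigh signal of 200 samples (values are just indices).
signal = list(range(200))
windows = sliding_windows(signal)
```

A 200-sample recording yields 14 windows under these settings, since a window can start at samples 0, 10, ..., 130.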
Using specialized models for each gait type reduces the learning demand on individual models, enhancing their performance [45]. This approach combined the advantages of model specialization and adaptive selection, resulting in better performance compared to a single, generalized model.
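The classify-then-route structure can be sketched as a simple dispatch table. In the snippet below, the threshold classifier and the lambda predictors are hypothetical stand-ins for the MLP classifier and the specialized CNN models; the thresholds and gains are arbitrary and serve only to make the example runnable.

```python
def classify_stride(window):
    """Hypothetical stand-in for the MLP classifier: thresholds on the
    thigh range of motion (degrees) within the window."""
    range_of_motion = max(window) - min(window)
    if range_of_motion < 30.0:
        return "short"
    if range_of_motion < 45.0:
        return "natural"
    return "long"


# Hypothetical stand-ins for the three specialized knee-angle predictors.
predictors = {
    "short": lambda w: 1.0 * w[-1],
    "natural": lambda w: 1.2 * w[-1],
    "long": lambda w: 1.5 * w[-1],
}


def route(window, classify, models):
    """Classify the window, then run only the matching specialized model."""
    label = classify(window)
    return label, models[label](window)


label, prediction = route([0.0, 50.0], classify_stride, predictors)
```

Because only one specialized model runs per window, the cost of inference stays close to that of a single predictor plus the lightweight classifier.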
  4.7. Embedded Hardware and Implementation
To enable real-time inference, all neural models were converted from TensorFlow to TensorFlow Lite (TFLite) format and optimized through post-training quantization to 8-bit integer precision. The resulting TFLite models were then compiled into Kendryte KModel (.kmodel) format using the nncase toolchain (version 2.10) (https://github.com/kendryte/nncase, accessed on 1 September 2025), which enables hardware acceleration through the KPU. No structured pruning was applied, as the quantized and compiled models already met the real-time execution and memory constraints of the embedded platform. To leverage the dual-core architecture of the Kendryte K210 microcontroller, our system assigned real-time gait classification to one core, while the second core executed the corresponding specialized prediction model. This division of labor allowed for low-latency operation and ensured that the entire pipeline met real-time processing requirements without excessive computational overhead.
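To illustrate what 8-bit integer quantization does to a value, the sketch below applies the standard affine mapping, real ≈ scale × (q − zero_point), to a hypothetical range of knee angles. It shows the arithmetic only; it is not the TFLite converter or the nncase toolchain, and the value range is an assumption.

```python
def quant_params(rmin, rmax, qmin=-128, qmax=127):
    """Scale and zero point mapping [rmin, rmax] onto the int8 range."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)   # range must contain zero
    if rmax == rmin:
        raise ValueError("range must have non-zero width")
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to the nearest int8 code, clamped to the valid range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value from an int8 code."""
    return scale * (q - zero_point)


# Hypothetical activation range for knee angles, in degrees.
scale, zero_point = quant_params(-5.0, 95.0)
q = quantize(37.3, scale, zero_point)
approx = dequantize(q, scale, zero_point)   # close to 37.3, within one step
```

Each quantization step here spans about 0.39 degrees (100/255), which bounds the rounding error introduced for values inside the chosen range.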
  6. Conclusions
This proof-of-concept study introduced efficient CNN models for knee angle estimation in lower-limb prosthetics using low-cost IMUs and multi-objective optimization, optimized for deployment on resource-constrained microcontrollers. Specialized CNN models tailored to short, natural, and long strides consistently outperformed a combined baseline model, achieving an over 48% improvement in average RMSE with inference times of 16.8 ms, thereby demonstrating both real-time feasibility and predictive accuracy competitive with the state of the art. Deployment on the $40 Sipeed MaixBit with the Kendryte K210 further validated that advanced deep learning models can be executed on affordable hardware without compromising performance.
As with other exploratory studies, the main limitation lies in the small participant pool, which constrains generalizability. Moreover, the present analysis was restricted to sagittal-plane knee motion estimation, not accounting for multi-planar dynamics such as frontal or transverse movements. Future research will address these limitations by expanding the dataset, incorporating multi-planar analysis, and testing the models in embedded prosthetic prototypes under real-world conditions.
In summary, this work demonstrated the feasibility of specialized CNN-based TinyML models for real-time gait analysis on embedded platforms. By bridging efficient model design with low-cost deployment, it paves the way toward more responsive, adaptive, and accessible prosthetic technologies that can enhance mobility and quality of life for individuals with lower-limb impairments, particularly in resource-constrained communities.