Automated Deadlift Techniques Assessment and Classification Using Deep Learning

Grymyr, Wegar Lien; Lawal, Isah A.

doi:10.3390/ai6070148

by Wegar Lien Grymyr and Isah A. Lawal *

Department of Applied Data Science, Noroff University College, Tordenskjoldsgate 9, 4612 Kristiansand, Norway

* Author to whom correspondence should be addressed.

AI 2025, 6(7), 148; https://doi.org/10.3390/ai6070148

Submission received: 29 May 2025 / Revised: 4 July 2025 / Accepted: 4 July 2025 / Published: 7 July 2025

Abstract

This paper explores the application of deep learning techniques for evaluating and classifying deadlift weightlifting techniques from video input. The increasing popularity of weightlifting, coupled with the injury risks associated with improper form, has heightened interest in this area of research. To address these concerns, we developed an application designed to classify three distinct styles of deadlifts: conventional, Romanian, and sumo. In addition to style classification, our application identifies common mistakes such as a rounded back, overextension at the top of the lift, and premature lifting of the hips in relation to the back. To build our model, we created a comprehensive custom dataset comprising lateral-view videos of lifters performing deadlifts, which we meticulously annotated to ensure accuracy. We adapted the MoveNet model to track keypoints on the lifter’s joints, which effectively represented their motion patterns. These keypoints not only served as visualization aids in the training of Convolutional Neural Networks (CNNs) but also acted as the primary features for Long Short-Term Memory (LSTM) models, both of which we employed to classify the various deadlift techniques. Our experimental results showed that both models achieved impressive F1-scores, reaching up to 0.99 for style and 1.00 for execution form classifications on the test dataset. Furthermore, we designed an application that integrates keypoint visualizations with motion pattern classifications. This tool provides users with valuable feedback on their performance and includes a replay feature for self-assessment, helping lifters refine their technique and reduce the risk of injury.

Keywords: deadlift classification; exercise classification; pose estimation

1. Introduction

Exercise, particularly weightlifting, plays a significant role in many people’s fitness routines [1]. Among the various weightlifting exercises, the deadlift is a popular multi-joint movement that targets key muscle groups such as the back, hips, and glutes [2]. Deadlifts also come in various forms, each requiring specific execution criteria and offering distinct benefits. For instance, a conventional deadlift involves a shoulder-width stance with hands gripping the bar outside the knees. In contrast, the sumo deadlift employs a wider stance with feet pointing outward and the grip inside the knees. The Romanian deadlift closely resembles the conventional form but necessitates a slight knee flexion and a hip hinge for lifting. Despite their effectiveness, deadlifts carry a high risk of injury, especially when performed with heavy weights or poor technique [3,4,5]. Poor technique can lead to uneven distribution of weight and force across the joints and muscles.

Research highlights the prevalence of injuries among weightlifters. A 2018 study by Strömback et al. [4] found that out of 104 sub-elite weightlifters, 70% reported being currently injured with 87% experiencing an injury within the past year. The most frequently affected areas included the lower back, shoulders, and hips—all of which are heavily engaged during deadlifting. Similarly, a study by Alekseyev et al. [3] on CrossFit athletes revealed that 33.3% of them were currently injured with back injuries accounting for 32.2% of all injuries. Notably, deadlifts were the second leading cause of injuries, following squats [3]. These data underscore the importance of proper guidance when engaging in such exercises. Trainers or specialists can help ensure correct technique and reduce the risk of injury [6,7,8]. However, access to this expertise is not universal; personal or financial constraints often limit individuals’ opportunities to receive adequate instruction. Consequently, many newcomers may feel intimidated by complex lifts like deadlifts and worried about the potential for injury or embarrassment without proper support.

In recent years, machine learning techniques, particularly deep learning, have gained traction in analyzing performance across various sports, including golf [9], skiing [10], and cricket [11]. However, the application of these advanced methods in analyzing weightlifting performance remains largely uncharted. Exploring the potential of deep learning in this area could provide valuable insights and enhance the understanding of weightlifting techniques and athlete performance.

To address the challenges associated with deadlift training, this study proposes the development of a deep learning model that assesses and classifies deadlift techniques using video input captured from a user’s phone. By employing Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) models, and human pose estimation (HPE), the proposed method aims to classify different deadlift variations, evaluate the correctness of technique execution, and provide constructive feedback to users as needed. The main contributions of this research include the following:

A custom dataset of deadlift repetitions, which covers three styles (conventional, sumo, and Romanian) and different execution forms. This dataset features consistent annotations and data augmentations.
An implementation and evaluation of two approaches for classifying deadlift styles and forms: the 2+1D CNN and the LSTM, utilizing raw videos with keypoint annotations and solely keypoint scores, respectively.
Experimental insights into the limitations of using 3D CNNs for generalization, highlighting the effectiveness of a keypoint-based LSTM approach.
The development of an app that enables real-time deadlift classification and offers intelligent feedback. The app has potential not only for deadlift exercises but also for other similar repetition-based workouts.

The remainder of this paper is structured as follows: Section 2 provides an overview of related studies, Section 3 describes the methodology, Section 4 presents the experimental results, and Section 5 details the implementation of the deadlift web application. Finally, Section 6 provides the conclusions of the paper.

2. Related Work

A deep learning system can effectively utilize HPE to identify key body joints and their coordinates for classifying sports activities, such as the deadlift [12]. These joints, known as keypoints, connect to form pairs, which create a vector representing the human body. One significant advancement in HPE is Google’s DeepPose, which was developed by Toshev and Szegedy [13]. This methodology employs a deep neural network to estimate all joints in a frame without relying heavily on handcrafted features. Another notable advancement is BlazePose [14], which is a lightweight, real-time HPE system optimized for various platforms through MediaPipe integration. MoveNet [15] is also commendable for its high accuracy with minimal computational requirements, making it suitable for mobile applications. The intersection of deep learning and pose estimation has been notably applied in sports activity classification [16]. For instance, Ingwersen et al. [9] employed a CNN to analyze golf skills by predicting scores and creating classification groups. Similarly, Wang et al. [10] developed an AI coach that assesses freestyle skiing performance using pose estimation, providing feedback on execution. In another application, Singh et al. [17] used pose estimation to classify military press exercises, converting keypoint data into multivariate time series for detailed analysis. However, the benefits of these technologies extend beyond sports; they are instrumental in various motor learning applications.

When it comes to video classification for weightlifting analysis, while traditional CNNs are commonly employed, they primarily extract spatial features from individual images. Video classification necessitates capturing both spatial and temporal information, making single-frame analysis insufficient for distinguishing among similar techniques, like various deadlift styles. To tackle this challenge, spatiotemporal architectures such as 3D CNNs are often utilized. The R(2+1)D model, developed by Tran et al. [18], successfully performs action recognition in videos while minimizing the computational load associated with standard 3D CNNs. An alternative method for handling sequential data involves LSTMs, which are effective for human action recognition when poses are represented numerically. For example, Moodley and van der Haar [11] developed a cricket stroke recognition system that integrates CNN and LSTM, relying solely on extracted keypoint values as features, thereby overcoming limitations typically faced in standard CNN approaches.

To maximize the effectiveness of sports activity classification models, feedback mechanisms must be tailored to the tasks at hand. Feedback can take various forms, such as concurrent (provided during the task) or terminal (given after task completion), and they can be visual, auditory, haptic, or a combination of these. A study by Walsh et al. [8] examined novice endoscopists’ performance with both types of feedback during a colonoscopy simulation. The findings revealed that although both groups performed similarly in pre-, post-, and retention tests, those receiving terminal feedback significantly outperformed their peers in a transfer test. This suggests that terminal feedback enhances understanding, enabling learners to apply their knowledge in different contexts. In the realm of deadlift feedback, this means that users can transfer their understanding of proper form to other lifts requiring similar techniques, such as maintaining a neutral spine and executing a hip hinge. Additionally, terminal feedback can alleviate cognitive load during the task, allowing users to focus more on movement patterns rather than on instructions. This delayed form of feedback encourages self-reflection, promoting a deeper understanding of task execution and ultimately fostering independence in techniques [8].

Inspired by these successful methodologies, this paper employs pose estimation, CNNs, and LSTMs to classify deadlift styles and execution forms. These advanced techniques are well-suited for capturing precise body positioning and temporal movement intricacies, thus enhancing the overall accuracy of activity classification. Additionally, the paper presents an app that offers real-time feedback to users on their deadlift performance. Table 1 compares the main characteristics of various state-of-the-art methods for assessing and classifying sports activity with those of our proposed approach.

3. Methodology

3.1. Data Generation and Preprocessing

We developed our own dataset for deadlift exercises due to the limited availability of publicly accessible datasets in this area. To achieve this, we recorded videos of various deadlift styles and common mistakes using a smartphone mounted on a tripod. Each video focused on a single repetition of a specific deadlift style to ensure clarity and detailed analysis. We captured the recordings from the side to highlight crucial aspects, such as whether the lifter maintained a straight or rounded back during the lift. Our dataset comprises three styles of deadlifts: conventional, Romanian, and sumo. Each style is categorized based on its execution form, including Correct, Rounded Back (characterized by excessive spinal flexion), Overextension (hyperextension of the lower back at lockout), and Early Hip Elevation (lifting the hips before the back). Notably, Early Hip Elevation is excluded from the Romanian style, as a raised hip is a natural aspect of its execution. For clarity, the labeling system we used indicates the type of deadlift and the execution form. For example, “Conv_corr_1” refers to the first sample of a conventional deadlift performed correctly. Figure 1, Figure 2 and Figure 3 provide visual examples of each style executed correctly, while Figure 4 shows an example of an incorrectly executed deadlift. Table 2 summarizes the dataset distribution.

The videos were recorded on an iPhone 12 in high-quality MOV format. However, reading these high-resolution videos using the OpenCV library caused delays during training. To address this issue, we converted the videos to the more widely used MP4 format with FFmpeg, resizing them to a resolution of 480 × 270 pixels while maintaining their aspect ratio. This conversion significantly reduced the file sizes from 15–60 MB down to approximately 75–700 KB each, which enhanced processing speed. Additionally, to enrich our dataset, we utilized OpenCV to flip the videos horizontally. This technique enabled our model to learn from recordings taken from both sides of the subject. Each sample was processed to create a flipped version before being saved as an MP4 file, further enhancing the diversity and usability of our dataset.

In this initial phase of the project, the data collection was carried out solely by the authors, two males in their 30s and 40s. We recognize that this limitation may affect the size and diversity of our dataset. However, we plan to expand our dataset in the future by involving a more diverse group of participants. This initial work serves as a proof of concept, laying the groundwork for our ongoing research in this area.

3.2. Feature Generation

We employed the MoveNet Thunder model [15] to extract keypoint scores from videos due to its lightweight architecture and seamless integration with Keras machine learning models, which results in efficient output. The generated keypoint scores are structured in a tensor array with the dimensions (frames, n, m). Here, n denotes the number of keypoints detected, while m comprises the x and y coordinates of these points along with a confidence score that reflects the certainty of the model in predicting their positions. Since MoveNet identifies 17 keypoints, the tensor array is organized as (frames, 17, 3). There are three primary approaches to leverage the generated keypoint features:

Direct Model Input: The keypoint coordinates can be directly utilized in machine learning models like LSTM. This approach effectively minimizes noise from the background, clothing, and lighting, allowing the model to concentrate more accurately on joint locations. This focus can enhance the ability of the model to generalize across various environments.
Skeletal Representation for CNNs: The keypoint data can also be transformed into a skeletal representation of the individual, which is suitable for use in CNN models. Similar to the first method, this skeletal representation helps reduce noise and directs the model’s attention toward significant joint locations.
Overlaying on Video: Lastly, the keypoint data can be utilized to overlay a skeletal representation onto the subject within the original video. This technique preserves the context of the video, enabling the model to capture cues that might be overlooked by the human pose estimator.

By employing these methods, we optimize the analysis of human motion in the deadlift videos while maintaining clarity and focus on key joint locations.

3.3. Classification Approach

To classify deadlift activities more effectively, we used a two-step approach. This method is necessary because the different deadlift styles are quite similar in execution. In the first step, we developed a model to identify which of the three deadlift styles (conventional, Romanian, or sumo) each movement corresponds to. In the second step, we created individual models for each style to evaluate the specific execution details. For this project, we utilized two deep learning techniques: a CNN and an LSTM network. Figure 5 shows the flowchart of the two-step classification approach.
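The two-step approach can be sketched as a simple dispatch: one style classifier runs first, and its output selects the style-specific form classifier. This is a minimal illustration only; `classify_rep`, the `Classifier` type, and the label strings are hypothetical names, and the stand-in lambdas below take the place of trained models.

```python
from typing import Callable, Dict, Sequence, Tuple

# Any callable that maps a keypoint sequence to a label string.
# In practice this would wrap a trained model; here it is an assumption.
Classifier = Callable[[Sequence[Sequence[float]]], str]


def classify_rep(
    sequence: Sequence[Sequence[float]],
    style_model: Classifier,
    form_models: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Step 1: predict the style. Step 2: run that style's form model."""
    style = style_model(sequence)
    form = form_models[style](sequence)
    return style, form


# Stand-in models for illustration (not trained classifiers).
demo_style_model: Classifier = lambda seq: "sumo"
demo_form_models: Dict[str, Classifier] = {
    "conventional": lambda seq: "correct",
    "romanian": lambda seq: "correct",
    "sumo": lambda seq: "rounded_back",
}

demo_result = classify_rep([[0.0] * 51], demo_style_model, demo_form_models)
```

Keeping one form model per style lets each model have its own number of output classes, which matters because the Romanian style has one fewer execution form than the other two.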

In our study, we employed a CNN approach based on the 2+1D model framework proposed by Tran et al. [18]. This model diverges from conventional CNNs, which typically analyze images in two dimensions (height and width) along with color channels. The 2+1D framework introduces a time dimension by segmenting the 3D convolution process into two distinct steps: spatial convolution, which extracts features from individual frames, and temporal convolution, which captures movement across multiple frames. For our model, we input 10 evenly spaced frames from each video sample after resizing them to 224 × 224 pixels. This results in an input size of (10, 224, 224, 3). The data are processed through a 2+1D convolutional layer utilizing 16 filters with a kernel size of (3, 7, 7) for both spatial and temporal convolutions. We follow this step with batch normalization and a ReLU activation function; then, we downsample the spatial data. The downsampled output is passed through four consecutive residual CNN blocks, enhancing the feature extraction from the video sequences. Finally, a global average pooling layer converts the feature map into a single, compact vector, which is flattened for the classification of the deadlift style. The model architecture is similarly adapted to classify execution forms within each deadlift style, modifying the final dense layer to accommodate the number of forms unique to each style. For example, the Romanian deadlift has three classifications (correct, rounded back, and overextension), while the conventional and sumo styles have four.
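As a small illustration of the frame-sampling step, the sketch below picks 10 evenly spaced frame indices from a clip. The text does not state exactly how the frames are chosen, so including both the first and last frame is an assumption, and `evenly_spaced_indices` is a hypothetical helper name.

```python
from typing import List


def evenly_spaced_indices(num_frames: int, num_samples: int = 10) -> List[int]:
    """Return num_samples frame indices spread evenly over [0, num_frames - 1].

    Assumption: the first and last frames are always included. If the clip
    has fewer frames than num_samples, some indices repeat.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples == 1:
        return [0]
    step = (num_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


# Example: a 100-frame clip.
indices_100 = evenly_spaced_indices(100)
```

The selected frames would then be resized to 224 × 224 and stacked into the (10, 224, 224, 3) input described above.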

Additionally, we explored the use of an LSTM model to assess whether pose keypoint data alone could effectively classify different deadlift styles and identify errors in form execution. The LSTM algorithm was selected for its capability to understand and learn patterns in sequential movements, which is a feature that is crucial for analyzing deadlift techniques. In our approach, keypoint sequences serve as input to the LSTM, as outlined in Section 3.2. Given that the LSTM requires a consistent number of frames, we standardized our input by managing the frame count in the deadlift videos, which typically range from 83 to 301 frames. For videos exceeding 200 frames, we retained only the first 200 frames, focusing on the initial lifting phase critical for evaluating proper technique. In contrast, for videos shorter than 200 frames, we applied linear interpolation to evenly sample keypoints, ensuring that essential details from shorter clips remained intact. The interpolation is based on standard linear interpolation, where for a keypoint with coordinates $(x, y)$, across two frames $(x_{1}, y_{1})$ to $(x_{2}, y_{2})$, the interpolated value $y$ at a new position $x$ is computed as

$$y = y_{1} + \frac{(x - x_{1})(y_{2} - y_{1})}{x_{2} - x_{1}} \qquad (1)$$

For instance, if a clip has only 150 frames, we calculated positions for 50 additional frames to complete the 200-frame requirement by spreading the original frames across the full range from 1 to 200, utilizing NumPy’s interpolation tools [19]. After generating the complete keypoint sequences, we reshaped the initial array from (frames, 17, 3) to (frames, 51) by flattening the keypoints along with their coordinates and confidence scores. These keypoint sequences are then introduced to the first LSTM layer to extract temporal features, which are subsequently refined through a second LSTM layer for further condensation. To mitigate overfitting, we incorporated a dropout layer that randomly deactivates some neurons during training. Following this, a fully connected layer with a ReLU activation function introduces non-linearity, enabling the model to learn more intricate patterns. Finally, we classify the deadlift style using a dense layer with a softmax activation function, producing the final output.
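The length standardization and flattening described above might look like the following sketch. It assumes NumPy is available and uses `np.interp` column by column; the function name `standardize_sequence` and the choice to interpolate the confidence scores in the same way as the coordinates are assumptions for illustration, not details confirmed by the text.

```python
import numpy as np

TARGET_FRAMES = 200  # fixed sequence length stated in the text


def standardize_sequence(keypoints: np.ndarray, target: int = TARGET_FRAMES) -> np.ndarray:
    """Convert a (frames, 17, 3) keypoint array into a (target, 51) array.

    Longer clips are truncated to the first `target` frames; shorter clips
    are stretched with linear interpolation over the full frame range.
    """
    frames = keypoints.shape[0]
    flat = keypoints.reshape(frames, -1)  # (frames, 51)
    if frames >= target:
        return flat[:target]
    # Map the original and new frame positions onto a common [0, 1] axis.
    old_pos = np.linspace(0.0, 1.0, frames)
    new_pos = np.linspace(0.0, 1.0, target)
    # np.interp handles 1-D data, so each feature column is interpolated separately.
    columns = [np.interp(new_pos, old_pos, flat[:, j]) for j in range(flat.shape[1])]
    return np.stack(columns, axis=1)


# Example: a synthetic 150-frame clip becomes a (200, 51) sequence.
rng = np.random.default_rng(0)
demo_clip = rng.random((150, 17, 3))
demo_sequence = standardize_sequence(demo_clip)
```

Because the first and last positions coincide on the common axis, the first and last original frames are preserved exactly, and every new frame is a weighted mix of its two nearest original neighbors, as in Equation (1).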

4. Experimentation and Results

4.1. Experimental Setup

We processed the video dataset of deadlifts using OpenCV-python 4.11.0.86 and developed classification models with a machine-learning library in Python 3.9. To ensure a robust evaluation of our models, we divided the dataset into three parts: training, testing, and validation, using a 70/15/15 split. For our initial model, we employed a 2+1D CNN architecture, training it on raw video data annotated with keypoints. We utilized the Adam optimizer with a learning rate of 1 × 10⁻⁵ and set a batch size of 10 to achieve a balance between training and validation losses. To mitigate overfitting, we implemented early stopping and restored the best weights when the model’s performance showed no further improvement. In addition to the CNN model, we trained an LSTM model that focused specifically on the annotated keypoints extracted from the raw videos. For this model, we also used the Adam optimizer but opted for a higher learning rate of 1 × 10⁻⁴, employing categorical cross-entropy as the loss function. A batch size of 32 was selected, and similar to the CNN, we applied early stopping and weight restoration to address overfitting once the model’s learning plateaued.
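A 70/15/15 split can be sketched with the standard library alone. The text does not say whether the split was stratified by class or how the flipped copies were assigned, so the plain shuffled split below, the fixed seed, and the name `split_dataset` are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    samples: Sequence[str],
    train_frac: float = 0.70,
    val_frac: float = 0.15,
    seed: int = 42,
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle the samples and cut them into train, validation, and test lists."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains (about 15%)
    return train, val, test


# Example with hypothetical sample names following the paper's labeling style.
demo_names = [f"Conv_corr_{i}" for i in range(1, 101)]
demo_train, demo_val, demo_test = split_dataset(demo_names)
```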

To assess the performance of both models, we evaluated their classification accuracy using key performance metrics: precision, recall, and the F1-score, which are all applied to a test set. The F1-score, which ranges from 0 to 1, provides a comprehensive measure of model effectiveness by calculating the harmonic mean of precision and recall. A score close to 1 is particularly desirable, as it signifies strong classification performance. These metrics were selected due to their broad acceptance as standards for assessing the effectiveness of classification models, ensuring a reliable evaluation process.
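For reference, the per-class F1-score and its mean over classes can be computed as in the sketch below. The text reports "mean" F1-scores without specifying the averaging scheme, so the unweighted (macro) average used here is an assumption, and the function names are illustrative.

```python
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 per label: the harmonic mean of that label's precision and recall."""
    scores: Dict[str, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def mean_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1-scores (macro average)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy example: one conventional rep is misclassified as sumo.
toy_true = ["conv", "conv", "sumo", "sumo"]
toy_pred = ["conv", "sumo", "sumo", "sumo"]
toy_scores = per_class_f1(toy_true, toy_pred)
```

In the toy example, the conventional class has precision 1.0 and recall 0.5, while the sumo class has precision 2/3 and recall 1.0, which shows how a single error lowers the scores of both classes involved.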

Finally, we compared the strengths and weaknesses of the 2+1D CNN and LSTM models in the context of deadlift style and form classification, aiming to glean insights into their performance differences. All experiments were conducted on a PC with the following specifications: Intel(R) Core(TM) i7-14700F processor at 2.10 GHz, 32 GB RAM, NVIDIA GeForce RTX 4070 Ti SUPER, and Microsoft Windows 11 Home.

4.2. Results and Discussion

We performed three detailed experiments, each focusing on specific objectives. Our goals included identifying the most effective model for recognizing deadlift styles, determining the best model for assessing deadlift execution, and comparing the robustness of LSTM models with CNN-based methods. A summary of each experiment’s specifications is shown in Table 3, and the results are discussed in the following sections.

4.2.1. Deadlift Style Classifications

Figure 6 (left) presents the confusion matrix illustrating the LSTM model’s effectiveness in classifying deadlift styles. The model demonstrated impressive performance, successfully identifying all sumo and conventional deadlift styles and only missing one prediction for the Romanian style, resulting in an average F1-score of 0.99. In comparison, the 2+1D CNN model yielded a slightly lower mean F1-score of 0.97, with five misclassifications in the test set, as depicted in Figure 6 (right). To ensure a fair evaluation, we also trained another variant of the 2+1D CNN using raw video samples without keypoint overlays. This model achieved a test set mean F1-score of 0.96, which is marginally lower than the 0.97 of the CNN trained with keypoint overlays. This difference may be attributed to random weight initializations. Table 4 summarizes the performance comparison among the LSTM trained on keypoints and the two variations of the 2+1D CNN models. Overall, the LSTM consistently outperformed both versions of the CNN, while the CNN with keypoint overlays slightly surpassed the model that utilized only raw videos.

We evaluate the robustness of our top-performing LSTM-based model by comparing it with a state-of-the-art deep learning architecture for human activity classification. Specifically, we implement the deep 3DCNN proposed by Vrskova et al. [20]. For this comparison, we train the 3DCNN on the same deadlift video dataset both with and without keypoint annotations. As shown in Table 5, the performance metrics for all methods on the test set reveal that our proposed LSTM-based model performs comparably to the 3DCNN. Notably, both models achieve impressive mean F1-scores with the LSTM model scoring 0.99. The 3DCNN trained with annotated videos achieves a mean F1-score of 0.98, while the model trained using only raw video data scores slightly lower at 0.97. These findings reinforce our assertion that combining keypoint coordinates with a simpler deep learning model, such as LSTM, effectively enhances the recognition of weightlifting activities like the deadlift.

4.2.2. Deadlift Execution Form Classifications

We evaluated the models designed to analyze the execution forms of three deadlift styles: conventional, sumo, and Romanian. Our study included two distinct experimental setups. In one, we utilized LSTM models trained solely on keypoint data. On the other hand, we employed 2+1D CNN models trained on raw video data supplemented with keypoint overlays.

In Figure 7, we present the confusion matrices illustrating the performance of the models for conventional deadlift execution forms. The left matrix reflects results from the LSTM model, which achieved a mean F1-score of 0.91. However, it faced challenges, misclassifying a correctly executed form as overextension and confusing early hip elevation with rounded-back executions. The CNN model, depicted on the right, demonstrated superior performance with an average F1-score of 0.99. It made only one misclassification, mistakenly identifying an instance of overextension as early hip elevation. A review of the corresponding videos (Figure 8 and Figure 9) revealed that the two forms share notable similarities, particularly in their initial and final frames. However, there are clear differences in the movement patterns of the hips and shoulders, prompting us to consider whether the model’s decisions could be influenced by misleading features.

Turning to the Romanian deadlift execution forms, showcased in Figure 10, the LSTM model achieved an average F1-score of 0.85. It encountered difficulties, particularly with overextension samples, misclassifying seven of them as the correct form. In contrast, the CNN model excelled, achieving a perfect mean F1-score of 1.00 by accurately identifying all Romanian deadlift execution forms.

Figure 11 outlines the performance of both LSTM and CNN models for sumo deadlift execution forms. Here, the LSTM model achieves an average F1-score of 0.78 as it struggles to differentiate among correct forms, early hip elevation, and overextension, though it performed adequately in identifying rounded-back executions. The CNN model, on the other hand, achieved a higher mean F1-score of 0.95, albeit with some misclassifications between early hip elevation and rounded-back forms.

Lastly, we conducted a comparative analysis of model performance across the different deadlift styles and execution forms. Our findings, summarized in Table 6, indicate that the 2+1D CNN trained on videos with keypoint overlays consistently outperformed the LSTM model. The enhanced performance of the CNN can be attributed, in part, to its ability to leverage the context captured in the original video, allowing it to recognize cues that might be overlooked by the human pose estimator.

4.3. Comparative Analysis

Table 4 and Table 6 provide a comparison of the performance between the LSTM and CNN models in classifying deadlift style and form. Notably, Table 6 reveals that the CNN model, particularly when supplemented with keypoint overlays, consistently achieved better classification results for deadlift execution forms compared to the LSTM model. This aligns with the findings of Ingwersen et al. [9], who noted that incorporating keypoint data can enhance model accuracy. However, this improved performance comes at the cost of increased computational resources due to the additional processing requirements. In contrast, the LSTM model, trained solely on keypoint values, still demonstrated promising results. Despite its simpler input data and lower computational load, the LSTM produced competitive outcomes, suggesting that the keypoint sequences contain sufficient information to differentiate between various execution forms. One challenge faced by the LSTM model was its difficulty in distinguishing between overextension and proper execution, as these two forms are remarkably similar, with only subtle differences visible mid-lift before they converge again. This complexity is illustrated in Figure 12 and Figure 13. Interestingly, the LSTM was effective at identifying rounded-back executions, which raised initial concerns due to the absence of keypoints on the back. It turned out that the subject’s downward head tilt played a crucial role in this classification, as the head keypoints became significant indicators (see Figure 14). While this explains the LSTM’s strong performance in certain aspects, it is important to consider that such cues may not always be applicable in real-world scenarios, where spinal curvature during a lift could be influenced by heavy bar weights rather than merely head movement.

5. Deadlift Classification and Feedback Application

We have developed an innovative deadlift classification app that leverages our trained LSTM model to provide users with real-time feedback on their performance. This feedback is delivered through a combination of visual and textual prompts generated from the classifier’s analysis, ensuring that users receive clear and actionable insights. The choice of an LSTM model was strategic; it not only delivers excellent performance but also requires less computational power thanks to its streamlined input process. The app is built using Python and incorporates several key features that enhance the user experience:

A webcam feed powered by the OpenCV library, which captures the user’s movements in real time.
A detection pipeline that analyzes joint angles to accurately identify completed deadlift repetitions.
A sequence generation feature that prepares keypoints for classification using the LSTM model.
A feedback system that overlays visual cues on the user’s movements and delivers text prompts generated by the Gemini Large Language Model.

For those interested in the technical details, the implementation code and the generated dataset are available on our GitHub repository: https://github.com/weggry/deadlift-classifier, accessed on 26 May 2025.

5.1. Hip Angle-Based Rep Detection

When the app is launched, the webcam feed automatically begins. However, the user needs to get into position to start the lift, and recording this process does not provide useful information for our classification model. Therefore, we implement a system to start and stop the recording only during the deadlift activity. To identify when the deadlift begins and ends, we use keypoints from every frame of the webcam feed. All deadlift variations have specific angle thresholds between the knee, hip, and shoulder joints that indicate the start and end of the lift. We calculate the hip angle ($\theta_{hip}$) using vectors from these keypoints as follows:

$$\theta_{hip} = \cos^{-1}\left(\frac{(\vec{s} - \vec{h}) \cdot (\vec{k} - \vec{h})}{\lVert \vec{s} - \vec{h} \rVert \, \lVert \vec{k} - \vec{h} \rVert}\right) \qquad (2)$$

where $\vec{s}$, $\vec{h}$, and $\vec{k}$ are the keypoint vectors for the shoulder, hip, and knee positions, respectively. Once we have the hip angle, we can determine when to start recording the deadlift. We set a start threshold at a hip angle below 85 degrees, which indicates the beginning of the lift. For the top of the deadlift, we set a threshold of 160 degrees, corresponding to the highest point of the lift, which should be close to 180 degrees. The recording stops when the hip angle drops below 85 degrees again, as long as the start and top thresholds were previously met.
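The hip-angle formula and the threshold logic can be sketched as a small state machine. This is an illustrative reading of the description, not the app's actual code: `hip_angle` and `RepDetector` are hypothetical names, keypoints are taken as 2D (x, y) pairs, and treating an angle of exactly 160 degrees as reaching the top is an assumption.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def hip_angle(shoulder: Point, hip: Point, knee: Point) -> float:
    """Angle at the hip in degrees, between hip->shoulder and hip->knee."""
    v1 = (shoulder[0] - hip[0], shoulder[1] - hip[1])
    v2 = (knee[0] - hip[0], knee[1] - hip[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0.0:
        raise ValueError("shoulder or knee coincides with the hip")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


class RepDetector:
    """Tracks one repetition using the thresholds given in the text."""

    START_BELOW = 85.0  # recording starts (and later stops) below this angle
    TOP_ABOVE = 160.0   # lockout counts as reached at or above this angle

    def __init__(self) -> None:
        self.recording = False
        self.reached_top = False

    def update(self, angle: float) -> bool:
        """Feed one frame's hip angle; return True on the frame a rep completes."""
        if not self.recording:
            if angle < self.START_BELOW:
                self.recording = True
                self.reached_top = False
            return False
        if angle >= self.TOP_ABOVE:
            self.reached_top = True
        elif self.reached_top and angle < self.START_BELOW:
            self.recording = False
            self.reached_top = False
            return True
        return False


# Example: standing, bending down, lifting to lockout, lowering again.
detector = RepDetector()
demo_events = [detector.update(a) for a in [170, 80, 100, 165, 120, 80]]
```

Requiring the top threshold before the stop condition prevents a lifter who merely shifts position near the bar from being counted as completing a repetition.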

5.2. Classification and Feedback

Once the app detects a completed deadlift rep, it runs a series of steps to assess the style and form of the lift. First, it takes the keypoint features captured during the lift and organizes them into a structured format (200 frames with 51 features each) suitable for our LSTM model. These prepared data are then fed into the LSTM for classification. After the LSTM processes the data, its output is passed to a Gemini language model configured for this task, which generates clear and informative feedback for the user. We chose Google’s Gemini model (accessed through the google-genai 1.9.0 package) for its easy API access and implementation. The feedback begins by acknowledging how the lift was executed. If any issues are identified, such as a rounded back, early hip elevation, or overextension, the app provides tips on how to correct those specific problems. If the lift is classified as correct, users receive encouragement to keep up the good work. To enhance understanding, the app replays ten keypoint-annotated frames, evenly spaced across the recorded lift, in a GIF-like format. This visual representation helps users grasp the context of the feedback they receive. A screenshot of the app displaying the classification results and user feedback is shown in Figure 15.
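The sequence preparation and replay steps described above can be illustrated with a short sketch. It assumes that the 51 features per frame are MoveNet’s 17 keypoints with (y, x, score) values each, and that shorter recordings are zero-padded at the end while longer ones are truncated; the padding strategy and the function names `to_fixed_length` and `replay_indices` are assumptions for illustration, not a description of the released code.

```python
import numpy as np

SEQ_LEN = 200      # frames per sequence expected by the LSTM
N_FEATURES = 51    # assumed: 17 keypoints x (y, x, score)


def to_fixed_length(keypoints: np.ndarray, seq_len: int = SEQ_LEN) -> np.ndarray:
    """Truncate or zero-pad an (n_frames, 51) array to shape (seq_len, 51)."""
    keypoints = np.asarray(keypoints, dtype=np.float32)
    if keypoints.ndim != 2 or keypoints.shape[1] != N_FEATURES:
        raise ValueError(f"expected shape (n_frames, {N_FEATURES})")
    out = np.zeros((seq_len, N_FEATURES), dtype=np.float32)
    n = min(len(keypoints), seq_len)
    out[:n] = keypoints[:n]  # assumed strategy: pad at the end with zeros
    return out


def replay_indices(n_frames: int, n_samples: int = 10) -> list:
    """Indices of up to n_samples frames evenly spaced over the recording."""
    if n_frames <= 0:
        return []
    idx = np.linspace(0, n_frames - 1, num=min(n_samples, n_frames))
    return [int(round(float(i))) for i in idx]
```

For a 100-frame recording, `replay_indices(100)` selects frames 0, 11, 22, and so on up to 99, which always includes the first and last frames of the lift.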

6. Conclusions

This paper explores the assessment and classification of deadlifts through video analysis. We emphasize the significant advantages that deep learning, particularly keypoint-based models, brings to the automatic evaluation of deadlift techniques. Our research compared two deep learning approaches: CNN and LSTM. Our findings reveal that a specialized CNN, the 2+1D CNN, can accurately classify deadlift styles and execution forms, achieving a mean F1-score of at least 0.95. However, this method requires substantial computational power for training. In contrast, the LSTM model, which uses raw sequences of keypoint data, achieved an average F1-score of 0.99 in style classification and at least 0.78 for execution forms. This indicates that keypoint data alone are valuable for effective motion analysis. To further this research, we developed an application that implements real-time keypoint extraction, repetition detection, and LSTM-based classification, while providing feedback generated by an LLM. This application is a step toward accessible, objective, and personalized feedback for lifters, which can help mitigate injury risks and enhance training efficiency. Looking ahead, we plan to improve our training dataset by involving a more diverse group of participants. We also aim to investigate the integration of multiple sensors for richer data representation and to conduct thorough user testing to evaluate our system’s performance and understand its impact on the lifting community.

Author Contributions

W.L.G. conducted this research as part of his thesis project. I.A.L. contributed by planning and supervising the work as well as drafting and reviewing the manuscript. The methodology, analysis, and discussion were carried out collaboratively by all participants. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable, as the studies only involved the two authors, who have consented to publish this work.

Informed Consent Statement

We did not use pictures of individuals other than the authors of the manuscript, and we have all consented to publish this work.

Data Availability Statement

The source code and the generated dataset are available on our GitHub repository: https://github.com/weggry/deadlift-classifier, accessed on 26 May 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Statistics Norway (SSB). Sports and Outdoor Activities, Survey on Living Conditions. 2021. Available online: https://www.ssb.no/en/kultur-og-fritid/idrett-og-friluftsliv/statistikk/idrett-og-friluftsliv-levekarsundersokelsen (accessed on 21 September 2024).
2. Piper, T.J.; Waller, M.A. Variations of the Deadlift. Strength Cond. J. 2001, 23, 66–73.
3. Alekseyev, K.; John, A.; Malek, A.; Lakdawala, M.; Verma, N.; Southall, C.; Nikolaidis, A.; Akella, S.; Erosa, S.; Islam, R.; et al. Identifying the Most Common CrossFit Injuries in a Variety of Athletes. Rehabil. Process Outcome 2020, 9, 1–9.
4. Strömback, E.; Aasa, U.; Gilenstam, K.; Berglund, L. Prevalence and Consequences of Injuries in Powerlifting: A Cross-sectional Study. Orthop. J. Sport Med. 2018, 6, 2325967118771016.
5. Mateus, N.; Abade, E.; Coutinho, D.; Gómez, M.Á.; Peñas, C.L.; Sampaio, J. Empowering the Sports Scientist with Artificial Intelligence in Training, Performance, and Health Management. Sensors 2025, 25, 139.
6. Sigrist, R.; Rauter, G.; Riener, R.; Wolf, P. Terminal Feedback Outperforms Concurrent Visual, Auditory, and Haptic Feedback in Learning a Complex Rowing-Type Task. J. Mot. Behav. 2013, 45, 455–472.
7. Tharatipyakul, A.; Choo, K.T.W.; Perrault, S.T. Pose Estimation for Facilitating Movement Learning from Online Videos. In Proceedings of the International Conference on Advanced Visual Interfaces, New York, NY, USA, 28 September–2 October 2020.
8. Walsh, C.M.; Ling, S.C.; Wang, C.S.; Carnahan, H. Concurrent Versus Terminal Feedback: It May Be Better to Wait. Acad. Med. J. Assoc. Am. Med. Coll. 2009, 84, 54–57.
9. Ingwersen, C.K.; Xarles, A.; Clapés, A.; Madadi, M.; Jensen, J.N.; Hannemose, M.R.; Dahl, A.B.; Escalera, S. Video-based Skill Assessment for Golf: Estimating Golf Handicap. In Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports, New York, NY, USA, 29 October 2023; pp. 31–39.
10. Wang, J.; Qiu, K.; Peng, H.; Fu, J.; Zhu, J. AI Coach: Deep Human Pose Estimation and Analysis for Personalized Athletic Training Assistance. In Proceedings of the 27th ACM International Conference on Multimedia, New York, NY, USA, 21–25 October 2019; pp. 374–382.
11. Moodley, T.; van der Haar, D. CASRM: Cricket Automation and Stroke Recognition Model Using OpenPose. In Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management. Posture, Motion and Health, Proceedings of the 11th International Conference, DHM 2020, Copenhagen, Denmark, 19–24 July 2020; Duffy, V.G., Ed.; Springer: Cham, Switzerland, 2020; pp. 67–78.
12. Rossi, A.; Pappalardo, L.; Cintia, P. A Narrative Review for a Machine Learning Application in Sports: An Example Based on Injury Forecasting in Soccer. Sports 2022, 10, 5.
13. Toshev, A.; Szegedy, C. DeepPose: Human Pose Estimation via Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014.
14. Bazarevsky, V.; Grishchenko, I.; Raveendran, K.; Zhu, T.; Zhang, F.; Grundmann, M. BlazePose: On-device Real-time Body Pose tracking. arXiv 2020, arXiv:2006.10204.
15. Votel, R.; Li, N. Next-Generation Pose Detection with MoveNet and TensorFlow.js. 2021. Available online: https://blog.tensorflow.org/2021/05/next-generation-pose-detection-with-movenet-and-tensorflowjs.html (accessed on 9 November 2024).
16. Moustakas, L. Game Changer: Harnessing Artificial Intelligence in Sport for Development. Soc. Sci. 2025, 14, 174.
17. Singh, A.; Le, T.; Le Nguyen, T.; Whelan, D.; O’Reilly, M.; Caulfield, B.; Ifrim, G. Interpretable Classification of Human Exercise Videos through Pose Estimation and Multivariate Time Series Analysis. In AI for Disease Surveillance and Pandemic Intelligence; Studies in Computational Intelligence; Shaban-Nejad, A., Michalowski, M., Bianco, S., Eds.; Springer: Cham, Switzerland, 2021; Volume 1013, pp. 181–199.
18. Tran, D.; Wang, H.; Torresani, L.; Ray, J.; LeCun, Y.; Paluri, M. A Closer Look at Spatiotemporal Convolutions for Action Recognition. arXiv 2017, arXiv:1711.11248.
19. Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array programming with NumPy. Nature 2020, 585, 357–362.
20. Vrskova, R.; Hudec, R.; Kamencay, P.; Sykora, P. Human Activity Classification Using the 3DCNN Architecture. Appl. Sci. 2022, 12, 931.

Figure 1. Sample of the generated dataset showing a conventional deadlift style execution.

Figure 2. Sample of the generated dataset showing a Romanian deadlift style execution.

Figure 3. Sample of the generated dataset showing a sumo deadlift style execution.

Figure 4. Sample frames illustrate improper deadlift style, where the hips remain fixed in a bent position, and the weight is moved toward the stomach, resembling a rowing motion.

Figure 5. Flowchart of our proposed deadlift classification framework, showing the two-step classification process. First, we estimate the subject’s pose and extract keypoints from the frames using MoveNet. Then, we classify the style of the deadlift. Based on the recognized style, we select one of the three classifiers designed to further classify the deadlift execution form.

Figure 6. The deadlift style classification confusion matrices illustrate the performance of the LSTM model (on the left) and the 2+1D CNN model (on the right) on the test dataset. The diagonal entries of the matrices represent the correct predictions made by the classifiers, while the off-diagonal entries indicate the misclassifications.

Figure 7. The conventional deadlift execution form classification confusion matrices illustrate the performance of the LSTM model (on the left) and the 2+1D CNN model (on the right) when tested on the dataset. The diagonal entries of the matrices represent the correct predictions made by the classifiers, while the off-diagonal entries indicate the misclassifications.

Figure 8. A sample of the conventional deadlift with overextension that was misclassified as early hip elevation. The similarity between the two execution forms can be seen by comparing it with the example of a conventional deadlift with early hip elevation shown in Figure 9.

Figure 9. An example of a conventional deadlift performed with early hip elevation during the execution.

Figure 10. The Romanian deadlift execution form classification confusion matrices illustrate the performance of the LSTM model (on the left) and the 2+1D CNN model (on the right) when tested on the dataset. The diagonal entries of the matrices represent the correct predictions made by the classifiers, while the off-diagonal entries indicate the misclassifications.

Figure 11. The sumo deadlift execution form classification confusion matrices illustrate the performance of the LSTM model (on the left) and the 2+1D CNN model (on the right) when tested on the dataset. The diagonal entries of the matrices represent the correct predictions made by the classifiers, while the off-diagonal entries indicate the misclassifications.

Figure 12. A sample of a conventional deadlift style executed with the correct form.

Figure 13. A sample of a conventional deadlift style executed with an overextension form.

Figure 14. An illustration of the keypoints annotated on a sequence of a conventional deadlift style performed with a rounded back.

Figure 15. An illustration of the proposed app output, showing the identified deadlift style and execution form, feedback provided to the user for improvement, along with a replay of the annotated user’s deadlift for self-reflection.

Table 1. Main features of existing sports activity classification methods and their comparison with our proposed deadlift assessment and classification method.

Properties	[9]	[10]	[17]	[11]	Ours
Employs non-intrusive activity classifications	✓	✓	✓	✓	✓
Achieves substantial classification performances	✓	✓	✓	✓	✓
Requires inexpensive feature engineering	✓	✓		✓	✓
Employs human pose estimation		✓	✓	✓	✓
Incorporates feedback on user performance		✓			✓
Incorporates replay feature for self-evaluation					✓

Table 2. The class distribution of the generated dataset.

Dataset Files
File Name	Style	Number of Files
Conv_corr_n	Conventional Correct	100
Conv_ehe_n	Conventional Early Hip Elevation	100
Conv_oe_n	Conventional Back Over Extension	100
Conv_rb_n	Conventional Rounded Back	100
R_corr_n	Romanian Correct	100
R_oe_n	Romanian Back Over Extension	100
R_rb_n	Romanian Rounded Back	100
S_corr_n	Sumo Correct	100
S_ehe_n	Sumo Early Hip Elevation	100
S_oe_n	Sumo Back Over Extension	100
S_rb_n	Sumo Rounded Back	100

Table 3. The details of three experiments conducted to evaluate the proposed deadlift classification method are presented. For each experiment, we specify its goal, classifier input, and predictions along with corresponding figure and table numbers that display the results.

Exp.	Goal	Input	Prediction	Result
1	To assess the best classifier for deadlift style recognition	All deadlift video and/or keypoint data	Labels of the style of the deadlift being performed from previously unseen deadlift videos	Table 4 and Table 5, Figure 6
2	To assess the best classifier for identifying the correct deadlift execution form	All videos depicting the different forms for each deadlift style	Labels of the execution form from previously unseen deadlift videos	Table 6, Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11
3	To compare the robustness of LSTM against the CNN-based model designs	All deadlift video and/or keypoint data	Labels of the style and execution forms of the deadlift being performed from previously unseen deadlift videos	Table 4, Table 5 and Table 6

Table 4. The performance of our proposed methods for the three deadlift styles (conventional, Romanian, and sumo) classification on the test dataset.

Method	Metric	Romanian	Sumo	Conventional	Average
2+1D CNN (with video frames only)	Precision	0.92	0.98	0.97	0.96
	Recall	0.98	0.97	0.94	0.96
	F1-score	0.95	0.98	0.95	0.96
2+1D CNN (with video + keypoint overlay)	Precision	1.00	0.95	0.97	0.97
	Recall	0.96	0.97	0.98	0.97
	F1-score	0.98	0.96	0.98	0.97
LSTM (with keypoint vectors)	Precision	1.00	1.00	0.98	0.99
	Recall	0.98	1.00	1.00	0.99
	F1-score	0.99	1.00	0.99	0.99

Table 5. Comparison of the performance of our proposed LSTM-based method with another state-of-the-art deep learning model used for human activity classification on the test dataset.

Method	Metric	Romanian	Sumo	Conventional	Average
3DCNN (with annotated videos) [20]	Precision	1.00	1.00	0.97	0.99
	Recall	1.00	0.97	1.00	0.99
	F1-score	1.00	0.98	0.98	0.98
3DCNN (with raw videos) [20]	Precision	1.00	1.00	0.93	0.97
	Recall	0.94	0.97	1.00	0.97
	F1-score	0.97	0.98	0.96	0.97
LSTM (with keypoint vectors) (proposed)	Precision	1.00	1.00	0.98	0.99
	Recall	0.98	1.00	1.00	0.99
	F1-score	0.99	1.00	0.99	0.99

Table 6. A summary of the performance of the proposed 2+1D CNN and LSTM-based classifiers on the test dataset in predicting the execution forms for three deadlift styles.

Method	Style	Metric	Back Overextension	Rounded Back	Early Hip Elevation	Correct Form	Average
2+1D CNN	Conventional	Precision	1.00	1.00	0.94	1.00	0.99
		Recall	0.94	1.00	1.00	1.00	0.99
		F1-score	0.97	1.00	0.97	1.00	0.99
LSTM	Conventional	Precision	0.94	0.92	0.79	1.00	0.91
		Recall	1.00	0.75	0.94	0.94	0.91
		F1-score	0.97	0.83	0.86	0.97	0.91
2+1D CNN	Romanian	Precision	1.00	1.00	-	1.00	1.00
		Recall	1.00	1.00	-	1.00	1.00
		F1-score	1.00	1.00	-	1.00	1.00
LSTM	Romanian	Precision	1.00	1.00	-	0.70	0.90
		Recall	0.56	1.00	-	1.00	0.85
		F1-score	0.72	1.00	-	0.82	0.85
2+1D CNN	Sumo	Precision	0.94	0.88	1.00	1.00	0.96
		Recall	1.00	0.94	0.88	1.00	0.96
		F1-score	0.97	0.91	0.93	1.00	0.95
LSTM	Sumo	Precision	0.57	1.00	1.00	0.72	0.82
		Recall	0.75	1.00	0.56	0.81	0.78
		F1-score	0.65	1.00	0.72	0.76	0.78
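As a reading aid for Tables 4–6, the F1-score in each column is the harmonic mean of the precision and recall reported above it. The sketch below shows this relation using values from Table 6; the function name `f1_score` is chosen for illustration. Because the tabulated precision and recall are already rounded to two decimals, recomputing F1 from them can occasionally differ from the tabulated F1 by 0.01.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# LSTM, Romanian style, back overextension (Table 6): P = 1.00, R = 0.56
romanian_oe = round(f1_score(1.00, 0.56), 2)

# LSTM, sumo style, back overextension (Table 6): P = 0.57, R = 0.75
sumo_oe = round(f1_score(0.57, 0.75), 2)
```

Both recomputed values agree with the F1-scores listed in Table 6 for these two entries.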

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

