1. Introduction: Digital Rediscovery of Anthropometry
The use of human body measurements to establish identity has a long forensic tradition dating back to the late 19th century and the Bertillonage system developed by Alphonse Bertillon. Bertillon believed that individuals could be uniquely identified by a combination of physical measurements, such as head, finger, or ear length [1].
However, this system was critically questioned early on: the measurements were often inconsistent, depending on the person taking them and the equipment used, and many of the characteristics used proved to be insufficiently individual. In addition, the method was time-consuming and difficult to standardize, which limited its practical application [2, 3].
With the advancement of forensics, Bertillonage was eventually replaced by other biometric methods, first dactyloscopy (fingerprint analysis) and later DNA analysis. These were more reliable, easier to capture, and offered better proof of uniqueness [4].
However, in recent decades, anthropometric analysis has regained importance, especially in situations where established passive biometric features (such as the face or fingerprint) are obscured or unavailable. With the spread of digital video surveillance and the increase in visual data sources in forensic investigations, body proportions, posture and movement patterns are increasingly coming into focus as alternative identifiers [5]. Digitization has opened up new ways to extract anthropometric features from image and video material and to process them in three-dimensional (3D) space. These computer-assisted methods enable measurements to be taken under less controlled conditions and form the basis for modern dynamic anthropometry. Studies on height estimation and body measurement analysis from video material already show that precise measurements are possible even under non-ideal conditions [6].
At the same time, methods of human pose estimation (HPE), that is, the automatic estimation of human joint positions in two-dimensional (2D) and 3D space, have developed rapidly. Recent reviews show considerable progress in accuracy, robustness and efficiency, as well as a steadily growing pool of data for training and validating such models [7, 8]. These developments open up new perspectives for use in digital forensics and could reduce the time required. However, their use in forensic applications must be critically scrutinized.
This article therefore describes how the use of body measurements in a forensic context has developed from historical Bertillonage to modern methods of digital anthropometric rig matching and 3D human pose estimation. It also examines the mathematical and methodological foundations of these developments, analyzes current progress and existing limitations, and discusses future opportunities for forensic integration. The aim is to critically trace the evolution of these methods, assess the forensic validity of digital approaches, and highlight potential for future research and application.
2. Literature Search and Review Methodology
This article is structured as a narrative overview. The aim is to summarize the development and current status of forensic methods that use image and video material to compare perpetrators and suspects on the basis of body measurements. Manual methods are compared with the potential of variants based on human pose estimation.
The literature search was conducted between July 2025 and October 2025 using the PubMed, IEEE Xplore, CORE, and Google Scholar databases. The period covered was up to September 2025. Search terms used included: forensic anthropometry, video-based anthropometry, forensic body measurements, CCTV anthropometry, human pose estimation, 3D human pose estimation, 2D human pose estimation and OpenPose. In addition, reference lists of relevant articles were searched manually.
Publications were considered that dealt with anthropometric measurements for personal comparison or forensic applications involving body proportions or silhouettes, using video, CCTV (Closed-Circuit Television), or similar image material, as well as fundamental methodological developments in 2D or 3D HPE in this context. Peer-reviewed articles (journals and conferences) were considered, and historically relevant sources were included where necessary.
Exclusion criteria were studies that focused exclusively on the recognition of actions, medical or clinical anthropometry without forensic relevance, approaches to gait recognition without anthropometric interpretation, and non-peer-reviewed material, with the exception of historically relevant basic research.
The preselection was based on a review of titles and abstracts, followed by a full-text review of potentially relevant sources. Additional literature was added through reference chain searches.
3. Suitability of Anthropometric Measures in a Forensic Context
Anthropometric measurements, in particular linear body proportions and relationships between joint points, have great potential for forensic use, as they can be obtained from image or video material without the active involvement of the person concerned [9]. In forensic contexts in particular, where traditional biometric features such as the face or fingerprint are obscured or unavailable, they offer significant opportunities to supplement identity verification.
A series of empirical studies has shown that combinations of body measurements reflect individual differences to a sufficient degree to make them useful for personal identification. One of these studies [10] is based on the Army Anthropometric Survey (ANSUR) database [11], which contains anthropometric measurements from 3982 individuals. A duplicate probability of
was determined for a combination of all eight body measurements examined, highlighting the individualization characteristics. Duplicate probability describes the probability that two different individuals will have identical anthropometric patterns. In general, it can be said that the duplicate probability decreases as the number of measurements included increases [10].
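The relationship between the number of measurements and the duplicate probability can be illustrated with a simple pairwise count. The following Python sketch is not the estimation procedure of the cited studies; it only counts how many pairs of individuals agree in every measurement within a tolerance. The function name, the tolerance, and the sample values are hypothetical.

```python
from itertools import combinations


def empirical_duplicate_probability(people, tolerance_cm):
    """Fraction of distinct pairs whose measurements all agree within tolerance."""
    pairs = list(combinations(people, 2))
    if not pairs:
        raise ValueError("need at least two individuals")
    duplicates = sum(
        all(abs(a - b) <= tolerance_cm for a, b in zip(p, q))
        for p, q in pairs
    )
    return duplicates / len(pairs)


# Hypothetical (body height, shoulder height, arm length) in cm.
sample = [
    (180.0, 150.0, 60.0),
    (180.4, 152.0, 60.2),
    (165.0, 137.0, 55.0),
    (172.0, 143.0, 58.0),
]

# Body height alone: one of six pairs is indistinguishable.
p_one = empirical_duplicate_probability([m[:1] for m in sample], 0.5)
# All three measurements: no pair remains indistinguishable.
p_all = empirical_duplicate_probability(sample, 0.5)
print(p_one, p_all)
```

In this toy data set, two individuals share almost the same body height but differ in shoulder height, so adding measurements removes the only duplicate pair.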
The study “A frequentist estimation of duplicate probability as a baseline for person identification from image and video material using anthropometric measurements” [12] also presented a basic statistical model for the discriminatory power of body patterns. Here, the authors determined duplicate probabilities using a dataset of 340 individuals. By applying a frequentist density estimation, they showed that this duplicate probability for the body measurements used here is in the range of approximately
to
. This also points to the high theoretical discriminatory power of body proportions. The results thus provide a quantitative basis for the use of anthropometric characteristics for personal identification [12].
A follow-up study, “Analysing Distributions of Feature Similarities in the Context of Digital Anthropometric Pattern Matching Probability” [13], analyzed the statistical distributions of similarity values between anthropometric features of different individuals. By modeling the multivariate distributions, matching probabilities could be quantified, which serve as the basis for calculating forensically relevant probability statements. The authors emphasize that the precise definition of measurement points and the standardization of the extracted measurements are crucial for minimizing uncertainties in classification [13].
In addition, recent empirical studies in digital anthropometry show that measurements from real image and video material can achieve a high degree of accuracy, provided that camera calibration and reference standards are taken into account. For example, Ciampini et al. [6] were able to achieve body height deviations of less than 1 cm between digital estimates and real measurements using video analyses with multiple camera perspectives, suggesting a level of precision that is forensically usable.
In the project “Computer-based forensic motion analysis for the identification of individuals” (COMBI), Becker et al. [14] also demonstrated that physical characteristics reconstructed from 2D video recordings can potentially contribute to distinguishing between individuals or groups of individuals.
These results underscore that digitally determined anthropometric characteristics can represent a valid, passive source of biometric information when calibrated and applied correctly. For forensic use, however, the systematic quantification of uncertainties and error rates remains crucial in order to enable legally admissible conclusions.
In special cases, particularly in connection with children or older people, the changeability of anthropometric measurements must also be taken into account [10].
4. Comparison Using 3D Reconstruction of Silhouettes
The silhouette method can be understood as a further development of traditional photogrammetric methods, in which classic principles (capturing multiple views, reference scales, camera calibration) are combined with 3D models of silhouettes [9].
In classical photogrammetry, spatial measurements are performed using superimposed or stereoscopic images. Important fundamentals include the geometric determination of projection rays and the correction of perspective distortions, as well as the use of multiple viewing angles to estimate depth and relationships in space [15]. This requires a precise camera calibration (determination of the focal length, position, and orientation of the camera), as well as correction for distortions [15, 16].
In this method, a 3D model of the suspect’s silhouette is first created using photogrammetric reconstruction so that suspect and perpetrator can be compared. Several images are taken in different poses to better identify the joint points. The joint points are important because they ensure that the 3D model of the suspect can be moved correctly in later stages [9].
In addition to photogrammetry (stereo or multivision), other technologies could also be used for this purpose. One example would be structured light, whereby known light patterns are projected and the deformations of the projected light pattern are used to reconstruct the body surface [17]. Another example is time-of-flight imaging, in which the body surface is reconstructed based on the time it takes for modulated light to travel to the surface and back [17].
To create the perpetrator silhouette, an editable 3D model of a silhouette is inserted into the virtual crime scene, where a 3D model (created using Lidar technology) and image or video material of the perpetrator are superimposed to create a realistic image. This allows the editable 3D silhouette model to be moved into the position and pose of the perpetrator and then adjusted to match their silhouette [9].
In order to determine whether the suspect could be the perpetrator, the 3D models of both individuals are superimposed and compared in identical poses and original size [9].
This manual rig method offers advantages:
Realistic reconstruction: By transferring camera images into a 3D space, spatial relationships, proportions, and viewing angles can be precisely reconstructed [9].
Scaling and transferability: By combining the scan with control measurements, the correct scaling can be guaranteed [9]. This ensures the transferability of the 3D silhouette models.
But the method is also subject to challenges and limitations:
Effort: The many steps involved are very time-consuming. They include recording the suspect, measuring the crime scene, and post-processing (creating and adjusting 3D silhouettes of the suspect and 3D room models, rectifying camera images, reconstructing intrinsic and extrinsic camera parameters, adjusting the 3D silhouette of the perpetrator to the pose, and the comparison itself) [9].
Expertise: The entire process requires a certain degree of forensic expertise and depends on the visual-cognitive assessment of the person performing the work. Especially in perspective view, it is difficult to accurately estimate the silhouette [9].
Dependence on camera parameters and calibration: An exact metric reconstruction is only possible if the camera positions, focal lengths and distortions are known or can be precisely estimated. Errors in these parameters lead to systematic measurement deviations.
Variable factors: The clothing worn by the perpetrator can make it difficult to create an exact 3D model of the silhouette. In addition, a person’s silhouette is strongly influenced by their body weight and physical condition [9]. Therefore, the characteristics to be compared must be unchangeable and clearly defined.
Validation: Validation of the method is necessary for use in court. The method should be clearly defined and tested on a larger data set.
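The dependence on camera parameters noted in the list above can be made concrete with a simple pinhole camera model. The following Python sketch assumes an upright person standing fronto-parallel to the camera at a known distance; the focal lengths, distance, and body height are hypothetical values chosen for illustration, and lens distortion is ignored.

```python
def height_from_image(pixel_height, depth_m, focal_length_px):
    """Metric height of an upright, fronto-parallel object under a pinhole model."""
    return pixel_height * depth_m / focal_length_px


TRUE_FOCAL_PX = 1000.0     # hypothetical true focal length in pixels
ASSUMED_FOCAL_PX = 1020.0  # hypothetical calibration result, 2 % too long
DEPTH_M = 5.0              # hypothetical distance between camera and person
TRUE_HEIGHT_M = 1.80       # hypothetical true body height

# Height of the person in the image, as produced by the true camera.
pixel_height = TRUE_HEIGHT_M * TRUE_FOCAL_PX / DEPTH_M

# Reconstruction with the slightly wrong focal length.
estimated = height_from_image(pixel_height, DEPTH_M, ASSUMED_FOCAL_PX)
deviation_cm = (estimated - TRUE_HEIGHT_M) * 100
print(f"estimated height: {estimated:.3f} m, deviation: {deviation_cm:.1f} cm")
```

Under these assumptions, a focal length that is 2 % too long shortens every reconstructed length by roughly 2 %. The deviation is systematic rather than random, so it does not average out over repeated measurements.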
5. Digital Anthropometric Rig Alignment—State of the Art
An alternative to comparison using 3D reconstruction of silhouettes is the use of a so-called rig (person-specific digital skeleton). This method has been used increasingly in recent years and has its origins in the COMBI study [14].
The rig method with manually placed markers can also be understood as a further development of traditional photogrammetric methods, as presented in Section 4. The rig method also uses the superimposition of 3D room models (in this case created using terrestrial laser scans) and image or video recordings to generate realistic representations with metric scaling. Here, two 3D models are always created and superimposed with camera images, one of the room in which the suspect is being recorded and one of the crime scene [18].
However, instead of silhouettes, a rig is used to compare perpetrator and suspect. To create a rig for a suspect, recordings are made in a standardized mobile treatment room, with the suspect standing on a turntable. For this purpose, anatomical markers are placed on the skin at defined points of the joint system. These markers enable the creation of a rig in the virtual representation of the treatment room, including virtual cameras. The rig is a simplified representation of the person’s specific movement and skeletal system, which can be moved and posed in digital 3D space [19].
The rig is transferred to the 3D model of the crime scene, superimposed with video recordings of the crime, and manually adjusted to the pose of the perpetrator [14, 19].
In this pose, measurements of the head and shoulder height of the perpetrator and those of the rig are taken and compared [18, 19]. This means that the specificity of the movement and skeletal systems is taken into account, but not compared in all individual points. The application of this method in real cases is demonstrated by Rosenfelder et al. [18], as presented in Section 7.
This manual rig method offers several advantages:
Standardization: The entire method follows a clear workflow. The measurement of the suspect is subject to a standardized setup. Anatomical landmarks are clearly defined, directly visible through markers, and can be located very precisely [19].
Reconstruction: When measuring the suspect, multiple camera angles allow a robust three-dimensional reconstruction.
Scaling and transferability: After processing, the terrestrial laser scanner used delivers metrically scaled 3D models whose deviations are small enough to be ignored [20]. Combining this with a 3D reference model ensures correct scaling, which enables the transferability of the person-specific rig to another 3D space (specifically the crime scene) and thus comparison with unknown persons (perpetrators) [14].
Independence from body weight: Since this method relies on the movement and skeletal system and does not use silhouettes for comparison, body mass does not play a decisive role.
The method is subject to challenges and limitations:
6. Potential and Current Challenges of Human Pose Estimation for Use in Digital Forensic Comparison of Individuals
Developments in the field of human pose estimation [21] offer considerable potential to automate the previously manual marker rig method, thus significantly reducing effort and complexity. The origin of HPE is not primarily in forensic anthropometry, but in overlapping fields such as computer vision, animation, sports science, and human-machine interaction, where automatic estimation of joint positions in images or videos was identified early on as a key technology [22].
In the context of forensic applications, these methods could also be valid under certain conditions, such as sufficient image or video quality, known camera calibration, and established landmark models. Both 2D-HPE and 3D-HPE offer interesting approaches: 2D models are usually easier to train, faster to execute, and require less data, while 3D methods can depict spatial depth, perspective, and body proportions more realistically [23].
A recent example is the paper “Accuracy Evaluation of 3D Pose Reconstruction Algorithms Through Stereo Camera Information Fusion for Physical Exercises with MediaPipe Pose” [24], in which combining 2D pose estimates from stereo images with fusion algorithms significantly increased the accuracy of the 3D reconstruction. This and similar studies show that HPE models can, in principle, already achieve usable measurement accuracy under controlled or partially controlled conditions, which is a prerequisite for their potential use in a forensic context.
6.1. Two-Dimensional Human Pose Estimation
2D-HPE has developed rapidly in recent decades. Early methods used classic image processing and feature extraction algorithms, e.g., contours, edges, and template matching [25]. However, these methods were severely limited, especially in less than ideal lighting conditions or when the view was not frontal [25]. Since around the mid-2010s, modern methods have increasingly relied on deep learning, heatmaps and neural networks for joint point estimation [26].
The advent of deep learning techniques brought about a paradigm shift: networks no longer learned exclusively from handcrafted features, but extracted features directly from image data, used heatmap representations for joint point estimation, and employed neural networks with great depth and large amounts of data [26]. Two main architectures were established: on the one hand, top-down approaches, in which people are first localized and pose estimation is then performed within these regions; on the other hand, bottom-up approaches, in which joint points in the image are first detected and then assigned to individual people [26].
A milestone was the publication “OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields” [27], which introduced the OpenPose architecture. OpenPose is based on a bottom-up approach in which all joint points in the entire image are first detected independently of each other. To do this, the model uses confidence maps, which represent the probabilities of individual keypoints at each pixel, and part affinity fields (PAFs), which are two-dimensional vector fields that encode the spatial relationships between related joint points. The detected points are then grouped into individual persons using these PAFs. This allows OpenPose to capture multiple persons simultaneously without the need to calculate bounding boxes beforehand. This method was one of the first to combine real-time performance with multi-person recognition and high accuracy [27, 28].
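How a part affinity field supports the grouping of joint points can be sketched as follows: for two candidate keypoints, the field is sampled along the connecting segment and its alignment with the segment direction is averaged. The Python code below is a simplified illustration and not the OpenPose implementation, which additionally solves an assignment problem over all candidates. The function name, the toy 5 × 5 field, and the keypoint coordinates are hypothetical.

```python
import math


def paf_score(paf, start, end, samples=10):
    """Average alignment between a vector field and the segment start -> end.

    paf[y][x] holds a 2D vector (vx, vy); start and end are (x, y) pixels.
    samples must be at least 2.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb
    total = 0.0
    for s in range(samples):
        t = s / (samples - 1)
        x = round(start[0] + t * dx)
        y = round(start[1] + t * dy)
        vx, vy = paf[y][x]
        total += vx * ux + vy * uy  # dot product with the limb direction
    return total / samples


# Toy field: a horizontal "forearm" along image row 2, zero elsewhere.
WIDTH, HEIGHT = 5, 5
paf = [[(1.0, 0.0) if y == 2 else (0.0, 0.0) for x in range(WIDTH)]
       for y in range(HEIGHT)]

elbow = (0, 2)
wrist_a = (4, 2)  # lies on the limb encoded in the field
wrist_b = (4, 0)  # belongs to some other person

score_a = paf_score(paf, elbow, wrist_a, samples=5)
score_b = paf_score(paf, elbow, wrist_b, samples=5)
print(score_a, score_b)
```

The candidate that lies along the encoded limb receives the higher score, which is the signal used to decide which joint points belong to the same person.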
In addition to OpenPose [27], a number of other systems were developed in the following years that improved 2D-HPE in various aspects. These developments were primarily aimed at achieving greater accuracy in complex postures, more robust detection of multiple people, and more efficient network architectures. For example, the High-Resolution Network (HRNet) [29] used parallel high-resolution representations to capture finer joint positions. HigherHRNet [30], which was based on this, improved scale robustness in multi-person scenes in particular. Elsewhere, lighter systems such as BlazePose were introduced for mobile applications [31, 32]. A further development within the top-down family is YOLOPose [33], which integrates person detection and keypoint estimation into a unified, real-time pipeline. Building on the efficiency of the YOLO detector family, YOLOPose offers high-speed 2D-HPE with accuracy that is slightly below the state of the art but still competitive, making it particularly suitable for scenarios requiring fast and reliable person localization prior to pose estimation [33, 34].
These works mark the transition from general research prototypes to a variety of specialized and highly optimized systems for different application contexts.
These developments provide the methodological foundation for variants of the manual rig method already presented in the COMBI project, which use OpenPose to extract and compare anthropometric measurements and body proportions from 2D video or image material [14]. Joint points are detected using OpenPose and are used to create an OpenPose-based, person-specific digital skeleton (OpenPose rig), similar to the manual markers introduced in Section 5 [19]. Similarly, relevant surveillance footage showing the perpetrator is processed using OpenPose, which allows joint points (2D) for the perpetrator to be predicted automatically. This makes it possible to perform a point-by-point comparison after fitting the OpenPose rig to the pose of the perpetrator. For this purpose, variants of the root mean square deviation (RMSD), see Formula (1) [35], are used as a measure of dissimilarity [19].
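$$\mathrm{RMSD}_{ij} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} d^2\left(p_{ik}, s_{jk}\right)} \tag{1}$$
Here, $p_{ik}$ and $s_{jk}$ denote the $N$ key points of the $i$-th perpetrator and the $j$-th suspect. The common index $k$ represents the $k$-th joint point. $d^2(p_{ik}, s_{jk})$ corresponds to the squared Euclidean distance between the $k$-th pair of these $N$ points after superimposing the suspect rig and the perpetrator [35].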
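Current work shows that the root weighted square deviation (RWSD), see Formula (2), is a possible new measure of dissimilarity for evaluating the suspect-perpetrator comparison [35].
$$\mathrm{RWSD}_{ij} = \sqrt{\sum_{k=1}^{N} w_k \, d^2\left(p_{ik}, s_{jk}\right)} \tag{2}$$
The RWSD is a modification of the classic RMSD that includes keypoint-specific weighting factors $w_k$. As in the RMSD, $d^2(p_{ik}, s_{jk})$ is the squared Euclidean distance between the $k$-th key points of the $i$-th perpetrator and the $j$-th suspect. The weighting factors are defined as real values in the interval $[0, 1]$ and sum to 1 [35].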
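The two dissimilarity measures can be sketched in a few lines of Python. The sketch assumes that the RWSD is the square root of the weighted sum of squared keypoint distances, with non-negative weights that sum to 1; the function names, the keypoint coordinates, and the weights are hypothetical and do not come from the cited work.

```python
import math


def squared_distances(perpetrator, suspect):
    """Squared Euclidean distance for each pair of corresponding keypoints."""
    if len(perpetrator) != len(suspect) or not perpetrator:
        raise ValueError("both rigs need the same non-zero number of keypoints")
    return [math.dist(p, s) ** 2 for p, s in zip(perpetrator, suspect)]


def rmsd(perpetrator, suspect):
    """Root mean square deviation over all keypoints."""
    d2 = squared_distances(perpetrator, suspect)
    return math.sqrt(sum(d2) / len(d2))


def rwsd(perpetrator, suspect, weights):
    """Root weighted square deviation (assumed form, see lead-in)."""
    d2 = squared_distances(perpetrator, suspect)
    if len(weights) != len(d2) or any(w < 0 for w in weights):
        raise ValueError("need one non-negative weight per keypoint")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return math.sqrt(sum(w * d for w, d in zip(weights, d2)))


# Hypothetical 2D keypoints in pixels after superimposing rig and footage.
perpetrator = [(0.0, 0.0), (10.0, 0.0), (10.0, 20.0), (0.0, 20.0)]
suspect = [(0.0, 0.0), (10.0, 0.0), (13.0, 24.0), (0.0, 20.0)]

plain = rmsd(perpetrator, suspect)
uniform = rwsd(perpetrator, suspect, [0.25] * 4)
downweighted = rwsd(perpetrator, suspect, [0.3, 0.3, 0.1, 0.3])
print(plain, uniform, downweighted)
```

With uniform weights of 1/N the RWSD reduces to the RMSD. Lowering the weight of the third keypoint, for example because its prediction is known to be unstable, reduces its influence on the result.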
According to a small study, however, manual and OpenPose rigs cannot be compared with each other directly, although each method works well when considered on its own [19].
The advantages of the OpenPose variant are:
Reduction of effort and expertise: Manual marker placement becomes obsolete and rapid extraction of joint points from images or videos is enabled. This reduces the on-site effort and necessary interaction when treating a suspect. The anatomical expertise required to place manual markers is reduced.
Evaluation: By comparing each joint individually, more information can be included than with the manual rig method, which can increase the significance of the comparison.
However, a case study [18] discussed in more detail in Section 7 shows that the OpenPose variant of the rig method has some limitations. Fundamental limitations lie in:
Effort: Even though the OpenPose variant saves time, the effort involved remains considerable, as apart from marker placement, almost all other steps remain identical or at least similar in terms of effort.
Operational conditions and accuracy: In real crime scenes or surveillance situations, idealized camera positions and recording conditions are often not available [9], which limits the use of the method [36]. The estimation of joint points depends heavily on the HPE model used and on factors such as brightness, image resolution, perspective or occlusions. The robustness of the HPE model to such factors often depends on the training data used [8].
6.2. Three-Dimensional Human Pose Estimation
While 2D-HPE is limited to estimating joint points in image coordinates, 3D-HPE offers the potential to make statements about spatial depth. This is a feature that can be particularly important in forensic applications. The transition to 3D-HPE began with approaches that used multiple cameras or special depth sensors, or combined monocular images with model assumptions and statistical body models [37, 38, 39].
One milestone is the model “Keep it SMPL [(skinned multi-person linear)]: Automatic Estimation of 3D Human Pose and Shape from a Single Image” [39], which for the first time allows both pose and body shape to be estimated from a single Red, Green, Blue (RGB) image using the statistical SMPL model. Here, a 3D body model is fitted to the 2D pose result.
Another important step is “Monocular 3D Human Pose Estimation in the Wild using Improved CNN [(Convolutional Neural Network)] Supervision” [40], which handles images from uncontrolled environments by using training methods that exploit both 2D and 3D annotations to improve generalization.
“VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera” [41] is also worth mentioning. It shows that even with just one camera, real-time full-body 3D pose estimation is possible, which is particularly relevant for applications with limited hardware or in field conditions.
A modern and forensically relevant approach is MeTRAbs (Metric-Scale Truncation-Robust Heatmaps for Absolute 3D-HPE) [42]. Unlike many previous 3D approaches, which relied on an intermediate representation in image coordinates and often had scaling or occlusion problems, MeTRAbs defines volumetric heatmaps entirely in metric 3D space and is therefore not image space dependent. MeTRAbs is thus able to estimate metrically scaled 3D joint points from a single RGB image. The core of the method is that the network learns absolute metric depth information by combining special truncation-robust heatmaps and a coordinate-based regression scheme. MeTRAbs uses precise camera parameters to correctly project the predicted joint points into metric space, so that body measurements and spatial positions remain comparable between different camera systems. Thanks to robust training strategies, MeTRAbs is also largely insensitive to occlusions, truncated body parts, and varying recording conditions, delivering consistent, scale-correct 3D poses even in realistic scenes [42, 43, 44].
This development has created the basis for current forensically inspired concepts such as the concept paper “Potential approach for targeted matching of people in video footage based on 3D human pose estimation” [5]. The concept described in this paper is shown schematically in Figure 1. The study proposes comparing two individuals (perpetrator and suspect) using videos of each of them without additional measurements, recordings, or scans. MeTRAbs [42] is proposed as the basis for this, as it enables metric scaling when using precise camera parameters. This makes transferability between different recording systems possible.
To this end, the video recordings of the perpetrator and suspect are processed using MeTRAbs, which estimates the 3D joint points of the depicted individual in metric space for each video frame. Based on these joint points, the bone lengths per frame are then calculated. This is followed by an averaging over all frames in the video sequence. Unreliable estimates are identified and excluded by outlier detection. Since the method is applied to both the video material of the perpetrator and that of a suspect, two metrically scaled sets of bone lengths are produced, which can be understood as 3D rigs of the perpetrator and suspect. These would then provide the basis for a mathematical comparison, e.g., using a variant of the RMSD based on bone lengths, as shown in Formula (3) [5, 19].
$$\mathrm{RMSD}^{\mathrm{bone}}_{ij} = \sqrt{\frac{1}{M} \sum_{l=1}^{M} \left(b^{P}_{il} - b^{S}_{jl}\right)^2} \tag{3}$$
Here, $b^{P}_{il}$ and $b^{S}_{jl}$ denote the $M$ bone lengths of the $i$-th perpetrator and the $j$-th suspect. The common index $l$ stands for the $l$-th bone length.
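The processing chain described above (bone lengths per frame, outlier exclusion, averaging, comparison) can be sketched in Python as follows. The sketch does not call MeTRAbs; it starts from 3D joint positions that are assumed to be already available in metres. The two-bone skeleton, the outlier rule based on the median absolute deviation (MAD), and all coordinates are hypothetical choices for illustration, since the concept paper is not quoted here on these details.

```python
import math
from statistics import mean, median

# Hypothetical subset of a skeleton: pairs of joint names that form a bone.
BONES = [("hip", "knee"), ("knee", "ankle")]


def bone_lengths(frame):
    """Bone lengths in metres for one frame of 3D joint positions."""
    return [math.dist(frame[a], frame[b]) for a, b in BONES]


def robust_mean(values, max_mads=3.0):
    """Mean after dropping values far from the median (MAD rule, an assumption).

    If the MAD is zero, only values equal to the median are kept.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    kept = [v for v in values if abs(v - med) <= max_mads * mad]
    return mean(kept)


def build_rig(frames):
    """One averaged length per bone over all frames of a sequence."""
    per_frame = [bone_lengths(f) for f in frames]
    return [robust_mean(column) for column in zip(*per_frame)]


def rmsd_bones(rig_a, rig_b):
    """RMSD over corresponding bone lengths of two rigs."""
    return math.sqrt(mean((a - b) ** 2 for a, b in zip(rig_a, rig_b)))


good = {"hip": (0.0, 0.0, 1.0), "knee": (0.0, 0.0, 0.5), "ankle": (0.0, 0.0, 0.0)}
glitch = {"hip": (0.0, 0.0, 1.0), "knee": (0.0, 0.0, 0.25), "ankle": (0.0, 0.0, 0.0)}
perpetrator_rig = build_rig([good, good, glitch, good, good])

suspect_frame = {"hip": (0.0, 0.0, 1.125), "knee": (0.0, 0.0, 0.5),
                 "ankle": (0.0, 0.0, 0.0)}
suspect_rig = build_rig([suspect_frame] * 3)

dissimilarity = rmsd_bones(perpetrator_rig, suspect_rig)
print(perpetrator_rig, suspect_rig, dissimilarity)
```

In this toy example, the frame with the misplaced knee is discarded before averaging, so the perpetrator rig is not distorted by the single faulty estimate.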
As visualized in Figure 2, this approach would offer significant advantages over current methods and 2D HPE approaches:
Reduced effort: No measurements of the suspect or crime scene would be necessary, resulting in both time and financial savings. In general, the manual effort would be significantly reduced. The time required would mainly be determined by the 3D-HPE and the subsequent calculations.
Reduced expertise: The original high requirements for forensic expertise would be reduced and shifted.
Broader range of applications: If this idea delivers reliable results, the method could potentially be used in significantly more cases due to its high degree of automation and reduced need for expert knowledge.
However, the success of such a method is limited by:
Quality and robustness of 3D-HPE: The accuracy of pose estimation depends heavily on the quality and diversity of the training data. Such data is still significantly limited, particularly in the field of 3D-HPE [7]. At the same time, forensic applications often involve less than ideal recording conditions. The lack of representation of real crime or surveillance scenarios in the training data can lead to systematic estimation errors.
Dependence on camera parameterization and calibration: Without known camera parameters, it is not possible to perform a metrically correct reconstruction [42] of body measurements and thus a meaningful comparison between individuals. In real forensic scenarios, it is often difficult to obtain these calibration data.
Validation: For forensic applications, studies under ideal and real conditions are essential in order to prove the functionality of the method, identify limitations, establish evaluation procedures, and examine its admissibility in court [5].
7. Practical Example
In a published real-world case, both the manual method of digital anthropometric rig alignment (see Section 5) and its OpenPose-based variant were used [18]. This allows the advantages and disadvantages to be demonstrated under real-world conditions.
The case study concerns a kiosk robbery in which the perpetrator was partially covered. Two cameras installed in the kiosk recorded the crime. There were also recordings from the day before the robbery, in which the suspect could already be clearly identified [18].
Analogous to the process chains introduced in Section 5 and Section 6.1, a marker-based rig and an OpenPose-based rig were created in the digital representation of the treatment room for the suspect, see Figure 3 [18].
For the crime scene, as well as for the treatment room, a true-to-scale 3D room model was created, which can be overlaid with the footage of the crime and of the previous day. This ensures the transferability of the rigs and enables them to be fitted into the various poses [18]. The suspect’s rigs were matched to the footage of the suspect from the previous day and to the footage of the perpetrator during the robbery [18]. The matching to the footage of the suspect from the previous day is shown as an example for the manual rig method in Figure 4 and for the OpenPose-based variant in Figure 5.
The results obtained are shown in Table 1 and Table 2. In all comparisons, the differences between the rig of the suspect and the footage of the perpetrator are slightly smaller than those between the rig of the suspect and the footage of the suspect. This holds both for the body and shoulder heights in the manual rig method and for the RMSD values in the OpenPose-based variant. This indicates a high degree of consistency between the suspect and the perpetrator. Furthermore, the results underscore that both methods work in parallel [18].
The results of the manual rig method are subject to the visual-cognitive assessment of the processor. This applies to rig creation, rig fitting, and the measurements of shoulder and head height. Human expertise and experience play a role in several areas of rig creation. First, locating and marking the relevant points on the suspect’s body requires a high level of expertise. Second, the recognition of markers in the recorded image material and their transfer into the digital space is subject to the visual-cognitive assessment of the processor. Human errors in both marker placement and transfer to the digital space could lead to small differences in the rig of the suspect. Similarly, the visual-cognitive assessment of the processor can influence the correct positioning and alignment of the suspect’s rig in the pose of the reference person (in this case, the suspect from the previous day and the perpetrator [18]). This potentially affects the resulting body and shoulder heights in the corresponding pose. The difficulty of measuring body and shoulder heights is particularly influenced by the camera angle. In the present case, for example, the camera angle means that the endpoint of the head is not at the outer edge of the head, but rather in the middle. This makes it difficult to determine the exact position [18]. Human error at this point can also influence the final result. However, human errors cannot be clearly quantified. To account for this, a measurement error of one pixel is assumed in the practical example and the resulting differences in metric space are determined [18].
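How a one-pixel error translates into metric space depends on the camera and the distance to the person. The source does not state the conversion used, so the following is only a sketch under a simplified pinhole camera assumption (no lens distortion, object plane parallel to the sensor). The function name, the focal length, and the distance are hypothetical.

```python
def pixel_error_to_metres(pixel_error, distance_m, focal_length_px):
    """Approximate metric size of a pixel error under a pinhole camera model.

    pixel_error      error in the image, in pixels
    distance_m       distance between camera and measured point, in metres
    focal_length_px  focal length expressed in pixels
    """
    if focal_length_px <= 0:
        raise ValueError("focal length must be positive")
    return pixel_error * distance_m / focal_length_px


# Hypothetical setup: person 4 m from the camera, focal length of 1000 px
error_m = pixel_error_to_metres(pixel_error=1, distance_m=4.0, focal_length_px=1000.0)
print(f"1 px corresponds to about {error_m * 1000:.1f} mm")
```

Under these assumed values, one pixel corresponds to roughly 4 mm. The same pixel error grows proportionally with distance, which is one reason why a fixed pixel tolerance leads to different metric uncertainties for different camera positions.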
The dependence on human expertise and the time required could be reduced somewhat by the OpenPose-based method. Manual marker placement is not necessary here, which saves time and reduces human error at this point. When fitting the suspect’s rig to the pose of the reference person, the influence of the operator’s visual-cognitive assessment remains, as this step still has to be performed manually. However, human influence is reduced in the actual comparison. The OpenPose-based variant does not require measuring rods, but allows a point-to-point comparison, which can potentially increase the informative value of the comparison. However, the case study clearly shows that OpenPose predictions currently contain inaccuracies and outliers. This becomes apparent when both cameras available in the kiosk are to be used for fitting the OpenPose-based rig, as shown in Figure 6. The OpenPose predictions cannot be merged at this point. The case study also points out the high variability of certain key points. The hip stands out in particular: it is omitted from the calculation of the RMSD values altogether, as its high variability in the predictions would significantly increase the dispersion of the RMSD values and therefore reduce their informative value [18]. This means that although the OpenPose-based method reduces the time and expertise required as well as human influence, it is dependent on the accuracy of the OpenPose predictions.
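One conceivable way to decide which key points to leave out of a comparison is to look at how much each predicted key point scatters across frames in which the person holds roughly the same pose. The sketch below illustrates that idea only. The source does not describe the exclusion procedure, so the spread measure, the threshold of 5 px, the key point names, and all coordinates are assumptions.

```python
import math
from statistics import mean


def keypoint_spread(track):
    """RMS distance of one key point's predictions from their mean position."""
    cx = mean(x for x, _ in track)
    cy = mean(y for _, y in track)
    return math.sqrt(mean((x - cx) ** 2 + (y - cy) ** 2 for x, y in track))


def stable_keypoints(tracks, max_spread_px):
    """Names of key points whose spread stays within the given threshold."""
    return sorted(
        name for name, track in tracks.items()
        if keypoint_spread(track) <= max_spread_px
    )


# Hypothetical predictions for three frames of a near-static pose (pixels)
tracks = {
    "neck": [(200.0, 80.0), (201.0, 80.0), (200.0, 81.0)],
    "knee": [(205.0, 300.0), (205.0, 302.0), (206.0, 301.0)],
    "hip": [(210.0, 190.0), (230.0, 205.0), (195.0, 180.0)],
}

kept = stable_keypoints(tracks, max_spread_px=5.0)
print(kept)
```

With these invented values the hip is excluded while neck and knee are kept. Any such threshold would itself need to be justified and validated before use in casework.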
Overall, this case study demonstrates the potential of the OpenPose-based method, but also highlights the challenges posed by inaccuracies and inconsistencies in the predictions.
8. Conclusions
The development from historical Bertillonage to modern methods of 3D human pose estimation impressively demonstrates how body measurements for digital forensic comparisons have been transformed in the tension between technological progress and forensic applicability. This development marks the transition from static measurement systems to dynamic, data-driven methods that enable a more objective and reproducible contribution to identity analysis.
There are currently two manual methods based on the superimposition of 3D space and image or video recordings. One method uses the silhouettes of suspects and perpetrators for comparison [9], while the other method compares a rig of the suspect with the recordings of the perpetrator [14,19]. Both methods offer potential for further development with advances in the field of HPE. A variant of the manual rig method that uses 2D HPE, specifically OpenPose, already exists [14,19]. Ideas for the use of 3D HPE have also been presented [5]. The methods described are summarized and compared in Table 3.
A comparison of the variants of the rig method shows that increasing automation also brings new challenges. Although the manual rig method has already been successfully applied in real forensic cases [18], it is only used in relevant individual cases due to its high cost and the required expertise [14]. The same applies to the OpenPose-based variant of the method. Although this variant still offers potential for the future, it is clearly dependent on developments in 2D HPE. However, not only 2D HPE-based approaches but especially 3D HPE-based approaches offer the potential to significantly reduce effort, as shown schematically in Figure 2. This would also open up the possibility of extending the method to broader application scenarios. However, validation of such new methods is essential and represents the central hurdle for implementation in court.
In the long term, digital anthropometry, especially in conjunction with advanced 3D human pose estimation, could play a decisive role in forensic practice, not as a replacement, but as a complementary, passive biometric method to support traditional methods. This would close the historical circle: from Bertillonage as the analog origin, through the digital rebirth of anthropometric methods, to the prospective establishment of a scientifically validated digital skeleton as a forensically usable comparison feature.
Author Contributions
Conceptualization, S.R. and D.L.; formal analysis, S.R.; data curation, S.R.; writing (original draft preparation), S.R.; writing (review and editing), S.R. and D.L.; visualization, S.R.; supervision, D.L.; project administration, D.L.; funding acquisition, S.R. and D.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the European Social Fund Plus and the Free State of Saxony, grant number 100670478. The APC was funded by the University of Applied Sciences Mittweida. This work was supported by the Open Access Publication Fund of Mittweida University of Applied Sciences.
Data Availability Statement
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
Acknowledgments
During the preparation of this manuscript, the authors used Gemini 2.5 Flash and GPT-5 for the purposes of research and text generation. The authors have reviewed and edited the output and take full responsibility for the content of this publication.
Conflicts of Interest
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
| AI | artificial intelligence |
| ANSUR | Army Anthropometric Survey |
| CCTV | Closed-Circuit Television |
| CNN | Convolutional Neural Network |
| COMBI | Computer-based forensic motion analysis for the identification of individuals |
| HPE | human pose estimation |
| HRNet | High-Resolution Network |
| MeTRAbs | Metric-Scale Truncation-Robust Heatmaps for Absolute 3D-HPE |
| PAFs | part affinity fields |
| RGB | Red, Green, Blue |
| RMSD | root mean square deviation |
| RWSD | root weighted square deviation |
| SMPL | skinned multi-person linear |
| 2D | two-dimensional |
| 3D | three-dimensional |
References
1. Bertillon, A.; McClaughry, R.W. Signaletic Instructions Including the Theory and Practice of Anthropometrical Identification; Werner Company: Chicago, IL, USA, 1896.
2. Altes, K.B.; Ost, A.; Perez, D.; Constantino, A.; Carpenter, K.; Ferguson, D.; Bohne, C. Bertillon, Alphonse. In The International Encyclopedia of Biological Anthropology; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2018; pp. 1–2.
3. Jain, A.K.; Ross, A. Bridging the Gap: From Biometrics to Forensics. Phil. Trans. R. Soc. B 2015, 370, 20140254.
4. Lima, T.G.L. Better Justice Through Better Science-Technology? The Entanglements of Algorithms and Security and Legal Professionals. Ph.D. Thesis, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, Brazil, 2024; p. 81.
5. Richter, S.; Labudde, D. Potential Approach for Targeted Matching of People in Video Footage Based on 3D Human Pose Estimation. In INFORMATIK 2023; Gesellschaft für Informatik e.V.: Bonn, Germany, 2024; pp. 371–380.
6. Ciampini, C.; Petrillo, A.; Zomparelli, F.; Groutas, S. An Innovative Method for Human Height Estimation Combining Video Images and 3D Laser Scanning. J. Forensic Sci. 2024, 69, 301–315.
7. Guo, Y.; Gao, T.; Dong, A.; Jiang, X.; Zhu, Z.; Wang, F. A Survey of the State of the Art in Monocular 3D Human Pose Estimation: Methods, Benchmarks, and Challenges. Sensors 2025, 25, 2409.
8. Sun, R.; Lin, Z.; Leng, S.; Wang, A.; Zhao, L. An In-Depth Analysis of 2D and 3D Pose Estimation Techniques in Deep Learning: Methodologies and Advances. Electronics 2025, 14, 1307.
9. Maksymowicz, K.; Kuzan, A.; Szleszkowski, Ł.; Tunikowski, W. Anthropological Comparative Analysis of CCTV Footage in a 3D Virtual Environment. Appl. Sci. 2023, 13, 11879.
10. Lucas, T.; Henneberg, M. Comparing the Face to the Body, Which Is Better for Identification? Int. J. Legal Med. 2016, 130, 533–540.
11. Gordon, C.C.; Churchill, T.; Clauser, C.E.; Bradtmiller, B.; McConville, J.T.; Tebbetts, I.; Walker, R.A. 1988 Anthropometric Survey of U.S. Army Personnel: Methods and Summary Statistics; Final Report; NATICK/TR-89/044; U.S. Army Natick Research, Development and Engineering Center: Natick, MA, USA, 1989.
12. Heinke, F.; Heuschkel, M.-L.; Labudde, D. A Frequentist Estimation of Duplicate Probability as a Baseline for Person Identification from Image and Video Material Using Anthropometric Measurements. In INFORMATIK 2022; Gesellschaft für Informatik e.V.: Bonn, Germany, 2022; pp. 91–98.
13. Heinke, F.; Heuschkel, M.; Labudde, D. Analysing Distributions of Feature Similarities in the Context of Digital Anthropometric Pattern Matching Probability. In INFORMATIK 2023 – Designing Futures: Zukünfte Gestalten; Gesellschaft für Informatik e.V.: Bonn, Germany, 2023; pp. 573–582.
14. Becker, S.; Heuschkel, M.; Richter, S.; Labudde, D. COMBI: Artificial Intelligence for Computer-Based Forensic Analysis of Persons. KI-Kunstl. Intell. 2022, 36, 171–180.
15. Kraus, K. Photogrammetry: Geometry from Images and Laser Scans; De Gruyter: Berlin, Germany, 2007; pp. 1–183.
16. Luhmann, T.; Robson, S.; Kyle, S.; Boehm, J. Close-Range Photogrammetry and 3D Imaging; De Gruyter: Berlin, Germany, 2023; pp. 121–262.
17. Bartol, K.; Bojanic, D.; Petkovic, T.; Pribanic, T. A Review of Body Measurement Using 3D Scanning. IEEE Access 2021, 9, 67281–67301.
18. Rosenfelder, J.; Pistorius, E.; Labudde, D. Fallstudie zum Identitätsabgleich mittels digital-anthropometrischem Rig. In INFORMATIK 2025; Gesellschaft für Informatik e.V.: Bonn, Germany, 2025; pp. 417–425.
19. Pistorius, E.; Richter, S.; Labudde, D. The Digital Skeleton in Modern Video Analysis - Inter- and Intraspecific Comparsion of Individual Rigs. In INFORMATIK 2023 – Designing Futures: Zukünfte Gestalten; Gesellschaft für Informatik e.V.: Bonn, Germany, 2023; pp. 611–621.
20. Kersten, T.P.; Lindstaedt, M. Geometric Accuracy Investigations of Terrestrial Laser Scanner Systems in the Laboratory and in the Field. Appl. Geomat. 2022, 14, 421–434.
21. Ben Gamra, M.; Akhloufi, M.A. A Review of Deep Learning Techniques for 2D and 3D Human Pose Estimation. Image Vis. Comput. 2021, 114, 104282.
22. Toshpulatov, M.; Lee, W.; Lee, S.; Haghighian Roudsari, A. Human Pose, Hand and Mesh Estimation Using Deep Learning: A Survey. J. Supercomput. 2022, 78, 7616–7654.
23. Neupane, R.B.; Li, K.; Boka, T.F. A Survey on Deep 3D Human Pose Estimation. Artif. Intell. Rev. 2024, 58, 24.
24. Dill, S.; Ahmadi, A.; Grimmer, M.; Haufe, D.; Rohr, M.; Zhao, Y.; Sharbafi, M.; Hoog Antink, C. Accuracy Evaluation of 3D Pose Reconstruction Algorithms Through Stereo Camera Information Fusion for Physical Exercises with MediaPipe Pose. Sensors 2024, 24, 7772.
25. Poppe, R. Vision-Based Human Motion Analysis: An Overview. Comput. Vis. Image Underst. 2007, 108, 4–18.
26. Zheng, C.; Wu, W.; Chen, C.; Yang, T.; Zhu, S.; Shen, J.; Kehtarnavaz, N.; Shah, M. Deep Learning-Based Human Pose Estimation: A Survey. ACM Comput. Surv. 2023, 56, 1–37.
27. Cao, Z.; Hidalgo, G.; Simon, T.; Wei, S.-E.; Sheikh, Y. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 172–186.
28. Cao, Z.; Simon, T.; Wei, S.-E.; Sheikh, Y. Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1302–1310.
29. Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5686–5696.
30. Cheng, B.; Xiao, B.; Wang, J.; Shi, H.; Huang, T.S.; Zhang, L. HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–18 June 2020; pp. 5385–5394.
31. Chung, J.L.; Ong, L.Y.; Leow, M.C. Comparative Analysis of Skeleton-Based Human Pose Estimation. Future Internet 2022, 14, 380.
32. Bazarevsky, V.; Grishchenko, I.; Raveendran, K.; Zhu, T.; Zhang, F.; Grundmann, M. BlazePose: On-Device Real-Time Body Pose Tracking. arXiv 2020, arXiv:2006.10204.
33. Maji, D.; Nagori, S.; Mathew, M.; Poddar, D. YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 19–20 June 2022; pp. 2636–2645.
34. Ding, J.; Niu, S.; Nie, Z.; Zhu, W. Research on Human Posture Estimation Algorithm Based on YOLO-Pose. Sensors 2024, 24, 3036.
35. Heinke, F.; Heuschkel, M.; Labudde, D. Bildgestützte Biometrische Personenidentifizierung Anhand Des Digital-Anthropometrischen Rigabgleichs: Quantitativer Vergleich Mittels RWSD. In Polizei-Informatik 2025; Deutsche Hochschule der Polizei – Hochschulverlag: Münster, Germany, 2025; Volume 27, pp. 180–193.
36. Baldinger, M.; Reimer, L.M.; Senner, V. Influence of the Camera Viewing Angle on OpenPose Validity in Motion Analysis. Sensors 2025, 25, 799.
37. Hofmann, M.; Gavrila, D.M. Multi-View 3D Human Pose Estimation in Complex Environment. Int. J. Comput. Vis. 2012, 96, 103–124.
38. Shotton, J.; Sharp, T.; Kipman, A.; Fitzgibbon, A.; Finocchio, M.; Blake, A.; Cook, M.; Moore, R. Real-Time Human Pose Recognition in Parts from Single Depth Images. Commun. ACM 2013, 56, 116–124.
39. Bogo, F.; Kanazawa, A.; Lassner, C.; Gehler, P.; Romero, J.; Black, M.J. Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image. In Computer Vision – ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 9909, pp. 561–578.
40. Mehta, D.; Rhodin, H.; Casas, D.; Fua, P.; Sotnychenko, O.; Xu, W.; Theobalt, C. Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision. In Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October 2017; pp. 506–516.
41. Mehta, D.; Sridhar, S.; Sotnychenko, O.; Rhodin, H.; Shafiei, M.; Seidel, H.-P.; Xu, W.; Casas, D.; Theobalt, C. VNect: Real-Time 3D Human Pose Estimation with a Single RGB Camera. ACM Trans. Graph. 2017, 36, 1–14.
42. Sárándi, I.; Linder, T.; Arras, K.O.; Leibe, B. MeTRAbs: Metric-Scale Truncation-Robust Heatmaps for Absolute 3D Human Pose Estimation. IEEE Trans. Biom. Behav. Identity Sci. 2021, 3, 16–30.
43. Sárándi, I.; Linder, T.; Arras, K.O.; Leibe, B. Metric-Scale Truncation-Robust Heatmaps for 3D Human Pose Estimation. In Proceedings of the 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), Buenos Aires, Argentina, 16–20 November 2020; pp. 407–414.
44. Sárándi, I.; Hermans, A.; Leibe, B. Learning 3D Human Pose Estimation from Dozens of Datasets Using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 2955–2965.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).