Recognition of Acupoints on Human Back Based on Machine Vision and Deep Learning

Zhao, Zhike; Song, Linman; Li, Songying; Xue, Ruihao; Li, Peng

doi:10.3390/bdcc10070204

by Zhike Zhao ¹,², Linman Song ¹,², Songying Li ¹, Ruihao Xue ¹ and Peng Li ²,*

¹ School of Electrical Engineering, Henan University of Technology, Zhengzhou 450001, China
² Institute for Complexity Science, Henan University of Technology, Zhengzhou 450001, China
* Author to whom correspondence should be addressed.

Big Data Cogn. Comput. 2026, 10(7), 204; https://doi.org/10.3390/bdcc10070204 (registering DOI)

Submission received: 28 April 2026 / Revised: 12 June 2026 / Accepted: 16 June 2026 / Published: 23 June 2026

(This article belongs to the Special Issue AI, Computer Vision and Human–Robot Interaction)

Abstract

Traditional acupoint localization methods rely heavily on manual operation, resulting in high subjectivity and limited accuracy. To improve the precision and stability of acupoint detection, this study integrates machine vision technology with in situ projection to achieve automated recognition and real-time visualization of human acupoints. First, an automatic calibration method based on image processing is proposed for back acupoints. Spinal features are extracted from the blue channel, enhanced using adaptive histogram equalization, and processed through region of interest extraction, minimum-threshold binarization, and morphological operations. Key spinal curve points are then fitted using Bézier functions. Canny edge detection is used to extract the human silhouette, locate the acromion, and derive the pixel scale of the “cun” measurement, enabling coordinate computation for 141 back acupoints. In the deep learning component, an improved YOLOv8-Pose model is developed for acupoint localization. Unlike existing methods that use local attention or the original Object Keypoint Similarity (OKS) loss, we introduce two innovations: a non-local attention module, incorporated into the backbone to model global dependencies and enhance global feature extraction, and a novel Efficient Object Keypoint Similarity (EOKS) loss function that incorporates geometric constraints (width, height, and center distance) in addition to Euclidean distance to improve spatiogeometric regression accuracy. An inference mechanism is further introduced to derive the remaining acupoints from 49 detected keypoints. Experiments demonstrate that the improved model achieves 95.0% detection accuracy, outperforming the baseline by 2.62%, with an inference time of 14.5 ms. Finally, an in situ projection platform is constructed, combining camera calibration, four-point proportional scaling, and an OpenCV 4.5.4-based interactive interface. The system supports real-time translation, rotation, and scaling, enabling accurate projection of detected acupoints onto the human body.

Keywords:

acupoint recognition; image processing; YOLOv8-Pose; attention mechanism; in situ projection

1. Introduction

Traditional Chinese medicine (TCM), as an integral part of China’s traditional culture, has long played a unique role in disease treatment and health management. In TCM, techniques such as acupuncture, tuina, and moxibustion are widely employed due to their rapid efficacy and straightforward application. These techniques primarily target and manipulate acupoints [1].

Acupoint detection on the human back faces three unique challenges that distinguish it from general pose estimation or facial keypoint detection [2]:

(I): Weak texture. Unlike the face (with eyes, nose, and mouth) or hands (with creases and knuckles), the back surface lacks distinct visual features. Individual acupoints cannot be identified by local appearance alone.
(II): A large number of acupoints. The back contains 141 acupoints (including bilateral points), far exceeding the 17 keypoints in standard pose estimation datasets. This high density makes differentiation difficult.
(III): High inter-subject variability. Acupoint positions scale with body size (height, shoulder width, BMI) and vary with posture. A fixed coordinate mapping is insufficient.

Traditional methods for identifying and localizing acupoints rely heavily on the professional knowledge and clinical experience of healthcare practitioners [3]. Inaccurate localization of acupoints can result in improper treatment, which may significantly undermine therapeutic outcomes. Recent advancements in deep learning and machine vision technologies have enabled the automatic identification and localization of human acupoints, making this area of research increasingly prominent [4]. Chang Menglong et al. [5] proposed a method for facial acupoint localization using image processing techniques. This approach utilizes Canny edge detection to extract facial contours and identifies acupuncture points based on distinct facial features such as the nose and eyes. However, this method is limited to acupoints with prominent features. In another study, Fei Honglin et al. [6] introduced a technique for locating acupoints on the back using visual technology. This method involves manually marking visual feature points on the target and using a deep learning model to recognize them.

Additionally, Tang Ruiyin et al. [7] employed circular artificial markers placed at relevant acupoints on the human back to obtain preliminary location data. They then determined the acupoint coordinates based on edge gray-level distribution features. Notably, acupoint detection methods relying on external markers are heavily influenced by the quality of the dataset and may exhibit limitations in generalizability. Zhang Tingting et al. [8] introduced the FADbR model, a facial acupoint detection model based on feature representation learning. The model extracts implicit knowledge from facial features through face image reconstruction, but its limited ability to filter out irrelevant features reduces its sensitivity to acupoints. Wei Yu et al. [9] applied convolutional neural networks (CNNs) for acupoint detection on the human hand. They constructed a Faster R-CNN framework and used Dropout regularization to mitigate model overfitting, achieving an accuracy of 92% for hand acupoint detection. Xiaoyun Ji et al. [10] proposed a method combining graph convolution and point cloud depth to detect acupoints. This method uses a linear model of the skin and processes it with PointNet to predict acupoint coordinates. However, PointNet has limitations in extracting local features, which affects the accuracy of acupoint detection.

The YOLOv8 model presents a novel direction for acupoint keypoint detection. Its variant, YOLOv8-Pose, is specifically designed for human pose estimation. Both human pose estimation and acupoint detection share the common objective of identifying point-like features on the human body [11]. This shared feature makes YOLOv8-Pose a promising model for acupoint detection. Fu Yu et al. [12] enhanced the YOLOv8-Pose model by replacing the original C2f module with C2F-GHOST, which reduces the number of model parameters. Furthermore, a small target detection head was incorporated into the model to reduce the missed detection rate for small acupoints, improving the detection accuracy of key human pose points. However, the model’s robustness in complex scenarios requires further refinement. Wang Xiaopeng et al. [13] proposed an improved YOLOv8 algorithm for human keypoint detection, using ShuffleNetV2 to replace the Darknet-53 backbone feature network, which significantly improved detection accuracy. However, the model achieved a relatively low detection frame rate of 16.8 FPS, indicating limited real-time performance. Zijian Yuan et al. [14] developed an acupoint location network for the human face by integrating the Efficient Channel Attention (ECA) mechanism into YOLOv8-Pose. However, due to the small size of facial acupoints, positional inaccuracies persisted. Chen et al. [15] designed a back acupoint detection method based on Pose Tracking, successfully locating six back acupoints through regression heatmaps, in accordance with traditional Chinese medicine theory. Liu et al. [16] employed a deep convolutional generative adversarial network (DCGAN) to expand the human back sample set and utilized a Keypoint Region Convolutional Neural Network to train on the expanded dataset, achieving an average accuracy of 86.33% for twelve back acupoints.
These methods demonstrate the potential of attention mechanisms and network structure modifications to enhance keypoint detection. However, some models require substantial computational resources, limiting their real-time detection capabilities [17]. While methods for detecting acupoints on areas with distinct textures, such as the face and hands, show promising results, detecting acupoints on the smooth and featureless back presents significant challenges due to the scarcity of distinguishing features. Moreover, the larger number of acupoints on the back further complicates the task of improving detection accuracy. Based on deep learning principles, this study proposes an enhanced YOLOv8-Pose algorithm for human acupoint location recognition. By analyzing the distribution of acupoints on the back, 49 core reference acupoint landmarks were identified and localized using deep learning methods. To improve detection accuracy, non-local modules were integrated into the YOLOv8-Pose framework, enhancing the model’s ability to learn spatial dependencies among acupoints. Additionally, the OKS function was optimized to minimize prediction errors. Furthermore, to address the challenges posed by the dense distribution of acupoints on the back, an acupoint inference and localization mechanism was introduced during the detection phase. This mechanism, based on the 49 detected acupoint characteristic points, analyzes the correlations among acupoints and enables the identification of a total of 141 acupoints. To ensure systematic acupoint recognition and localization, a dynamic in situ projection system for human acupoints was developed using OpenCV.

2. Materials and Methods

This study proposes an adaptive recognition and positioning method for human back acupoints based on a fusion of acupoint feature extraction and acupoint reasoning. First, the blue component is extracted from the RGB color model and histogram equalization is applied to enhance the contrast between the spine line of the human back and the surrounding area. Second, the region of interest around the spine is cropped from the enhanced image, which reduces the computational load of the subsequent image processing algorithms; binarization and morphological processing of this region then yield a clearer image of the spine line. Third, the Bézier curve is used to fit the discontinuous spine line, so that spine curves with varying degrees of curvature can be extracted. Subsequently, the contour of the human back is extracted using the Canny operator, and the contour line is differentiated to find the points with the minimum average tangent slope, which are identified as the acromions. The “cun” unit of traditional Chinese medicine measurement is then calculated from the distance between the two acromion points and converted into a pixel distance to determine the pixel position of the Shenzhu acupoint. Finally, with the Shenzhu acupoint as the reference point and the spine line as the dividing line for the left and right symmetrical acupoints, a spatial position reasoning model for all acupoints on the human back is established to achieve the recognition of human back acupoints.

2.1. Back Acupoint Localization Based on Image Processing

2.1.1. Method for Extracting the Localization Curve of the Human Spinal Column

According to the national standard GB/T 40997-2021 [18], the Hua Tuo Jia Ji points are located 0.5 cun lateral to the posterior midline of the human body, on both sides of the spinous processes from the 1st thoracic vertebra to the 5th lumbar vertebra. In this paper, the midline is taken to be the human spine curve, so the spine curve can be used as the positioning curve for the Hua Tuo Jia Ji points. The 0.5 cun value is proportional to the individual’s body size and is not an absolute distance (i.e., it cannot be fixed at a value such as 0.5 cun = 12.7 mm). For a taller/wider individual, 0.5 cun is physically larger; for a smaller individual, 0.5 cun is physically smaller.

In this paper, the human spine curve extraction method is used to locate the basic position of the points. As shown in Figure 1, the spine curve extraction process consists of four steps: positioning curve feature enhancement, feature point extraction, curve fitting, and acupoint labeling. Each step is detailed in the following subsections.

The low contrast of the skin surface in the original back image blurs the contours of the spinal curve and makes it difficult to extract the characteristic curves for acupoint localization. To enhance the feature representation, the original images were split into their R, G, and B channels; the three-channel comparison is shown in Figure 2. Analysis of these three channel images shows that the blue channel highlights the spinal features of the human back most prominently.

The blue channel image was extracted and adaptive histogram equalization (AHE) was applied to enhance the local contrast in the back spine region. To avoid over-amplification of noise, the contrast limit of the histogram equalization was set to 2, so that the gray value enhancement was capped at twice the original value. To increase the processing speed and reduce the image size, the region of interest (ROI) was then extracted from the equalized back image. The steps for feature enhancement of the acupoint localization curves are shown in Figure 3.
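The enhancement step can be illustrated with a minimal sketch. It is a simplified stand-in, not the authors' implementation: it applies a single global, contrast-limited equalization rather than the tile-based adaptive variant, and it assumes the image is stored as rows of (B, G, R) tuples. The function names `extract_blue` and `clipped_equalize` are hypothetical.

```python
def extract_blue(image_bgr):
    """Return the blue channel of an image stored as rows of (B, G, R) tuples."""
    return [[px[0] for px in row] for row in image_bgr]


def clipped_equalize(gray, clip_limit=2.0, levels=256):
    """Global contrast-limited histogram equalization (simplified stand-in for AHE)."""
    pixels = [v for row in gray for v in row]
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    # Clip each bin at clip_limit times the mean bin height and
    # redistribute the clipped excess uniformly over all bins.
    ceiling = clip_limit * len(pixels) / levels
    excess = sum(max(h - ceiling, 0.0) for h in hist)
    clipped = [min(h, ceiling) + excess / levels for h in hist]

    # Build the cumulative distribution and map it to the output range.
    total = sum(clipped)
    lut, running = [], 0.0
    for h in clipped:
        running += h
        lut.append(round((levels - 1) * running / total))
    return [[lut[v] for v in row] for row in gray]
```

A lower `clip_limit` keeps the mapping closer to the identity and so limits noise amplification; a very large value reduces to ordinary histogram equalization. In practice a library routine that equalizes per tile would normally be used instead of this global version.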

Analysis of the ROI image shows that the pixels of the human spine curve are located at the brightness minimum of each row. We therefore design a minimum-value binarization process for feature point extraction; the steps are shown in Figure 4. An all-zero image of the same size as the ROI is first created. Each pixel f(i, j) in the ROI is then visited; if the pixel has the minimum brightness value in its row, the corresponding pixel in the all-zero image is set to one. Binarization by the row minima introduces white noise points, which interfere with the subsequent fitting of the spine curve, so morphological processing is used to eliminate them. For each pixel with a value of 1 in the binary image, the number of pixels with a value of 1 in its eight-connected neighborhood is calculated, and pixels with an eight-connected area of less than 3 are removed. Finally, the image is subjected to a morphological opening operation with a 2 × 2 structuring element whose values are all 1, which eliminates the remaining interfering points while preserving the basic shape of the curve.
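The row-minimum binarization and the neighborhood-based noise removal can be sketched as follows. This is an illustrative sketch under stated assumptions: the ROI is a list of rows of gray values, and the "eight-connected area" is interpreted as the number of foreground pixels in the 3 × 3 window including the pixel itself, so that a one-pixel-wide line survives. The function names are hypothetical, and the final morphological opening is omitted.

```python
def row_minimum_binarize(roi):
    """Mark, in each row, every pixel that equals that row's minimum brightness."""
    binary = [[0] * len(row) for row in roi]
    for i, row in enumerate(roi):
        row_min = min(row)
        for j, value in enumerate(row):
            if value == row_min:
                binary[i][j] = 1
    return binary


def remove_sparse_points(binary, min_area=3):
    """Clear foreground pixels whose 3x3 window holds fewer than min_area foreground pixels."""
    h, w = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for i in range(h):
        for j in range(w):
            if binary[i][j] != 1:
                continue
            # Count foreground pixels in the 3x3 window (centre included),
            # ignoring positions that fall outside the image.
            area = sum(
                binary[i + di][j + dj]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if 0 <= i + di < h and 0 <= j + dj < w
            )
            if area < min_area:
                cleaned[i][j] = 0
    return cleaned
```

Under this interpretation an isolated tie for the row minimum is removed, while interior points of a continuous vertical line are kept; the two end points of the line are also dropped, which the later curve fitting tolerates.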

The characteristic points on the spinal contour serve as essential references for fitting the spinal curve. Although the binarized image obtained after morphological processing contains these characteristic points, extracting the human spinal curve requires fitting a curve through them. This paper compares three methods for fitting these characteristic points [19]: spline interpolation, polynomial fitting, and Bézier function fitting, with the aim of obtaining a characteristic curve that accurately reflects the spine.

Spline interpolation fitting generates curves by constructing piecewise low-order polynomials between adjacent data points, as illustrated in Figure 5a. Polynomial fitting employs the least squares method to create polynomial functions that connect the original data points, resulting in curves suitable primarily for datasets with fewer feature points, as shown in Figure 5b. Bézier function fitting generates curves based on a series of control points, where the curve’s shape is dictated by the control points’ positions, as depicted in Figure 5c. Comparing the fitting results of the three methods, the curves produced by Bézier function fitting most accurately replicate the human spinal curve while ensuring smoothness and continuity at each node.

The evaluation was based on qualitative visual comparison rather than quantitative metrics, as the ground-truth spine curve is not available as a pixel-wise reference. The Bézier function fitting was visually confirmed by TCM practitioners to best represent the natural spinal curvature.

In the Bézier method, the start point $P_0$ and the end point $P_n$ among the feature points determine the start and end positions of the spine curve, while the intermediate control points $P_i$, with $i$ ranging from $1$ to $n - 1$, control the shape and curvature of the curve. By adjusting the number of feature control points, the Bézier function is able to restore the shape of the spine to the maximum extent possible. The Bézier curve is defined as follows:

$$B(t) = \sum_{i=0}^{n} P_i \, B_{i,n}(t) \tag{1}$$

where $B(t)$ is the parameterized form of the Bézier curve and $t$ is a variable between 0 and 1 that defines the position of the point on the curve; as $t$ changes from 0 to 1, the curve moves from the starting control point $P_0$ to the ending control point $P_n$. $P_i$ is a control point of the curve. When fitting the spine curve, the control points are selected from the characteristic points of the human spine, and each control point $P_i$ has a corresponding Bézier basis function $B_{i,n}(t)$. The basis function is defined as follows:

$$B_{i,n}(t) = \binom{n}{i} (1 - t)^{n-i} \, t^{i} \tag{2}$$

where $B_{i,n}(t)$ is the Bézier basis function that defines the degree of influence of the control point $P_i$ on the curve. The binomial coefficient $\binom{n}{i}$ is the weighting coefficient, equal to the number of ways of choosing $i$ control points from $n$ control points. It was experimentally verified that the best results were obtained when the number of control points was set to 100 during the fitting process, and the obtained spine curve fitting results are shown in Figure 6. In Figure 6, the green curve illustrates the spinal positioning curve derived from Bézier function fitting. This curve will serve as a benchmark for acupoint localization in subsequent experiments, thereby ensuring the precise marking of acupoint positions and providing reliable support for further acupoint detection and research.
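Equations (1) and (2) can be evaluated directly. The following is a minimal sketch, assuming the control points are given as a list of (x, y) pixel coordinates; the function names are hypothetical and the sampling density is an arbitrary choice.

```python
from math import comb


def bernstein(i, n, t):
    """Basis function B_{i,n}(t) of Eq. (2)."""
    return comb(n, i) * (1 - t) ** (n - i) * t ** i


def bezier_point(control_points, t):
    """Point B(t) of Eq. (1) for control points P_0 .. P_n and t in [0, 1]."""
    n = len(control_points) - 1
    x = sum(bernstein(i, n, t) * px for i, (px, _) in enumerate(control_points))
    y = sum(bernstein(i, n, t) * py for i, (_, py) in enumerate(control_points))
    return x, y


def bezier_curve(control_points, samples=50):
    """Sample the curve at evenly spaced parameter values from 0 to 1."""
    return [bezier_point(control_points, k / (samples - 1)) for k in range(samples)]
```

For example, `bezier_curve([(0, 0), (1, 2), (2, 0)], samples=3)` starts at the first control point, passes through (1.0, 1.0) at t = 0.5, and ends at the last control point. The curve interpolates only its end points; the intermediate control points pull the curve toward themselves without lying on it.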

2.1.2. Method for Extracting Acromion Landmark Points on the Human Body

The analysis of the “bone proportional measurement method,” one of the traditional Chinese medicine acupoint localization techniques [20], shows that the distance between the acromions on the left and right sides of the human body is 16 cun. The acromion refers to the outermost prominence of the scapula on both sides, forming the highest point of the shoulders. Therefore, in an image, the pixel distance corresponding to one cun can be determined by locating the positions of the acromions on the back. The specific implementation workflow is shown in Figure 7.

To simplify the image information, the input image is first converted to grayscale. This process transforms the color image into a single-channel intensity image, thereby reducing the volume of data to be processed. Since the acromion is located at the edge of the human body contour, a suitable edge detection operator is required to extract the body outline. By comparing the contour extraction effects of three different edge detection operators, Sobel, Prewitt, and Canny, this method adopts the Canny operator for extracting the human body contour [21]. The Canny operator, through non-maximum suppression and double-threshold detection, is capable of effectively removing noise while ensuring that the extracted contours remain sufficiently smooth.

After obtaining the complete human body contour, a bounding rectangle is fitted to the entire contour, and the vertical centerline of the rectangle is calculated. Based on this centerline, the contour region is divided into left and right halves, and contour points are stored sequentially outward from the centerline. The acromion is located on the lateral side of the human contour, where the contour tangent is usually close to horizontal. Therefore, by traversing the contour points in the upper half of the image and calculating the tangent slope at each point, the point on each side whose tangent slope is closest to zero is selected as the candidate acromion point. For a contour point $P_i(x_i, y_i)$, the tangent slope can be calculated from the differences with the adjacent points $P_{i-1}(x_{i-1}, y_{i-1})$ and $P_{i+1}(x_{i+1}, y_{i+1})$. The backward difference, taken with the previous point, is as follows:

$$\text{slope}_{i-1} = \frac{y_i - y_{i-1}}{x_i - x_{i-1}} \tag{3}$$

The forward difference, taken with the next point, is as follows:

$$\text{slope}_{i+1} = \frac{y_{i+1} - y_i}{x_{i+1} - x_i} \tag{4}$$

Here, $\text{slope}_{i-1}$ and $\text{slope}_{i+1}$ represent the slopes between the current point and the previous point, and between the current point and the next point, respectively. To obtain the tangent slope at point $P_i$, the average of the two differences is calculated as follows:

$$\text{tangent\_slope}_i = \frac{\text{slope}_{i-1} + \text{slope}_{i+1}}{2} \tag{5}$$

In the formula, $\text{tangent\_slope}_i$ represents the average tangent slope at point $P_i$. For the acromion point, the point whose average tangent slope is smallest in magnitude is selected. After obtaining the acromion points on the human body, the pixel coordinates of the left and right acromion points in the image are identified. Let the pixel distance between the two acromion points be $L$, and let the pixel distance corresponding to one “cun” be $d$. The calculation formula is $d = L/16$.
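The slope averaging of Eqs. (3) to (5) and the conversion d = L/16 can be sketched as below. The sketch assumes one side of the contour is given as an ordered list of (x, y) points, selects the point whose average slope is closest to zero, and treats vertical segments as having infinite slope so that they are never selected; these choices and the function names are illustrative assumptions.

```python
from math import hypot


def _slope(p, q):
    """Slope of the segment p -> q; vertical segments are treated as infinite."""
    dx = q[0] - p[0]
    return float("inf") if dx == 0 else (q[1] - p[1]) / dx


def average_tangent_slope(prev_pt, pt, next_pt):
    """Mean of the slopes to the previous and next contour points, Eq. (5)."""
    return (_slope(prev_pt, pt) + _slope(pt, next_pt)) / 2


def find_acromion(contour):
    """Return the interior contour point whose average slope is closest to zero."""
    best, best_mag = None, float("inf")
    for k in range(1, len(contour) - 1):
        mag = abs(average_tangent_slope(contour[k - 1], contour[k], contour[k + 1]))
        if mag < best_mag:
            best, best_mag = contour[k], mag
    return best


def pixels_per_cun(left_acromion, right_acromion):
    """Pixel length of one cun, d = L / 16, with L the acromion-to-acromion distance."""
    dx = right_acromion[0] - left_acromion[0]
    dy = right_acromion[1] - left_acromion[1]
    return hypot(dx, dy) / 16
```

With acromion points at (100, 50) and (420, 50), the distance L is 320 pixels, so one cun corresponds to 20 pixels.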

2.1.3. Coordinate Inference Model for Acupoint Locations

To identify the specific locations of acupoints on the human back from an image, the spinal curvature of the human body was analyzed using the aforementioned image processing techniques. Subsequently, the pixel distance corresponding to one “cun” was calculated. In accordance with the distribution of meridians and the relative positions of acupoints in traditional Chinese medicine, a coordinate correlation exists among the acupoints. The Dazhui acupoint serves as a reference benchmark for progressively calculating the coordinate positions of other back acupoints.

Notably, the axis of symmetry is the fitted spine curve (Bézier curve), not a straight line. Even when the spine is curved, the left–right symmetry of acupoints holds because bilateral acupoints are defined by their perpendicular distance to the spine curve at each vertebral level.

In GB/T 12346-2006 “Nomenclature and Location of Acupoints” [22], it is stated that the Dazhui acupoint is located in the depression below the spinous process of the seventh cervical vertebra, along the midline of the human body. The midline recorded in the national standard refers to the curve of the human spine; thus, the Dazhui acupoint can be located on the spine curve. We assume that the pixel coordinates of the Dazhui acupoint in the image are $P_{dz}(X_0, Y_0)$, and that the calculated pixel distance corresponding to one cun in the image is $d$. According to traditional Chinese medicine theory, the position three cun below the Dazhui acupoint is the Taodao acupoint; therefore, the pixel coordinates of the Taodao acupoint in the image are $(X_0, Y_0 + 3d)$. The schematic diagram of the acupoint reasoning process is shown in Figure 8. Based on the aforementioned principles, the 141 acupoints on the human back can be deduced.
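The offset-based reasoning can be illustrated with a short sketch. It makes a simplifying assumption that the spine is locally vertical, so that "below" maps to the image y direction and "lateral" to the x direction; the text instead measures bilateral offsets perpendicular to the fitted spine curve. The function names and the sample coordinates are hypothetical.

```python
def infer_point(reference, d, cun_down=0.0, cun_lateral=0.0):
    """Offset a reference acupoint by distances given in cun, converted with d pixels per cun."""
    x0, y0 = reference
    return (x0 + cun_lateral * d, y0 + cun_down * d)


def bilateral_pair(reference, d, cun_down, cun_lateral):
    """Left and right points placed symmetrically about the (locally vertical) midline."""
    left = infer_point(reference, d, cun_down, -cun_lateral)
    right = infer_point(reference, d, cun_down, cun_lateral)
    return left, right
```

With the Dazhui acupoint at (200, 80) and d = 20 pixels per cun, `infer_point((200, 80), 20, cun_down=3)` gives (200, 140) for the Taodao acupoint as described in the text, and a pair placed 1.5 cun lateral at the same level lands at x = 170 and x = 230.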

To verify the accuracy of the identified acupoints, the coordinates of the detected acupoints are compared with the reference acupoint coordinates provided by experts. Let the coordinates of the acupoint identified by the model be denoted as $(x_d, y_d)$, and the reference acupoint coordinates be denoted as $(x_a, y_a)$. To quantify the difference between the two sets of coordinates, the Euclidean distance serves as a metric to calculate the straight-line distance between the two acupoints. The formula for calculating the Euclidean distance is as follows:

$$D = \sqrt{(x_d - x_a)^2 + (y_d - y_a)^2} \tag{6}$$

where D represents the straight-line distance between the identified coordinates and the reference coordinates.
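Equation (6) applied to a whole set of acupoints can be sketched as follows, assuming the detected and reference points are supplied as two equally ordered lists of (x, y) pixel coordinates. The function names are hypothetical.

```python
from math import hypot


def localization_errors(detected, reference):
    """Per-acupoint Euclidean distance D of Eq. (6), in pixels."""
    return [
        hypot(xd - xa, yd - ya)
        for (xd, yd), (xa, ya) in zip(detected, reference)
    ]


def mean_error(detected, reference):
    """Average of the per-acupoint distances."""
    errors = localization_errors(detected, reference)
    return sum(errors) / len(errors)
```

For instance, a detection at (3, 4) against a reference at (0, 0) has an error of 5 pixels, and a perfect detection contributes 0.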

In the experiment, a total of 50 human back images were selected as input data, representing a range of body types and slight variations in posture. This selection was intended to assess the detection accuracy and robustness of the proposed method under diverse conditions. During the detection process, 38 images successfully completed the acupoint detection task, whereas 12 images failed to accurately identify acupoints due to issues such as abnormal aspect ratios or missing edge information. These results highlight the significant impact of the input image’s size ratio on the detection performance. Figure 9 presents the acupoint identification results on the human back, including labeled acupoints, reference acupoints, and the Euclidean distance between the detected and reference acupoints.

To provide a more intuitive understanding of the discrepancies between the identified acupoints and the reference acupoints, this section presents the error statistics for the acupoints from three selected images. Additionally, the distribution of errors for each acupoint is shown in Figure 10. The red polyline represents the distribution of detection errors for the 141 acupoints in Sample 1, with error values ranging from 0 to 8.83 pixels. Analyzing the three sample images, it is observed that in Sample 1, the acupoints with larger errors are primarily concentrated around the neck and underarm regions. The blue polyline indicates the error distribution for Sample 2, which exhibits generally lower and more stable error values. The green polyline represents the error distribution for Sample 3, showing a similar pattern to Sample 1, with notably higher error values at acupoints located on both sides of the neck and underarms.

Acupoints were successfully detected in 38 of the 50 images, a success rate of 76%, with an average error of 1.27 pixels and an average accuracy of 84.3%. These results indicate that there remains significant potential for further optimization of the detection algorithm, particularly regarding its generalization and accuracy in acupoint recognition.

2.2. Acupoint Keypoint Detection Based on Improved YOLOv8-Pose

This study utilizes the YOLOv8-Pose model to enhance the accuracy of acupoint detection on the human back. Specifically, we introduce a non-local attention mechanism to capture the spatial dependencies between acupoint keypoints while integrating global features by calculating the correlations among keypoint locations within the feature map. Additionally, we constructed a keypoint regression bounding box that allows the OKS loss function to incorporate geometric information, such as the area, width, and height of the bounding box, based on the Euclidean distance. This approach effectively reduces the prediction error of the model.
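As a point of reference, the standard Object Keypoint Similarity used in COCO-style keypoint evaluation can be sketched as below. This is the baseline OKS only, not the EOKS loss proposed in this paper, whose additional width, height, and center-distance terms are not specified in this section. The per-keypoint constants passed as `kappas` and the function name are illustrative assumptions.

```python
from math import exp


def oks(pred, truth, area, kappas, visible=None):
    """Standard OKS: mean over labelled keypoints of exp(-d^2 / (2 * s^2 * k^2)), with s^2 = area."""
    if visible is None:
        visible = [1] * len(truth)
    terms = []
    for (xp, yp), (xt, yt), k, v in zip(pred, truth, kappas, visible):
        if v > 0:  # only labelled keypoints contribute
            d2 = (xp - xt) ** 2 + (yp - yt) ** 2
            terms.append(exp(-d2 / (2 * area * k ** 2)))
    return sum(terms) / len(terms) if terms else 0.0
```

A perfect prediction scores 1.0, and the score decays toward 0 as the squared keypoint distance grows relative to the object area, which is why the measure depends only on Euclidean distance and object scale.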

2.2.1. Dataset Construction

To build a high-quality acupoint dataset, this study collected data from the back regions of diverse individuals, covering variations in BMI, age, gender, and skin color. Examples of some dataset images are shown in Figure 11. Back acupoints are primarily distributed along the Bladder Meridian of Foot-Taiyang, the Small Intestine Meridian of Hand-Taiyang, and the Governor Vessel. Acupoints on the Governor Vessel are mainly located in the depressions below the spinous processes of the vertebrae, representing key areas of the spinal column; those on the Bladder Meridian of Foot-Taiyang are situated approximately 1.5 or 3 cun lateral to the spinous processes, following a consistent positioning pattern; while acupoints on the Small Intestine Meridian of Hand-Taiyang are concentrated around the scapular spine, which can be precisely located based on the relative positions between thoracic vertebrae and the scapula. Additionally, the back region includes several fixed extra-meridian points that have unique therapeutic functions and significance. During the dataset construction, 49 fundamental acupoint feature points were selected and annotated according to their spatial distribution.

The annotation of acupoint locations is a crucial step in dataset construction, as it directly affects the accuracy and effectiveness of model training. In this study, the Labelme annotation software was used to complete the annotation of acupoints on the back. Labelme (version 5.0.1) is an open-source image annotation tool that supports various annotation forms. The annotation interface is shown in Figure 12. During the annotation process, the positioning of the acupoints strictly followed the national standards, and the accuracy of each annotated acupoint was ensured under the guidance of relevant traditional Chinese medicine practitioners.

During the dataset construction process, a total of 420 back images were annotated, covering a diverse population to ensure broad adaptability of the dataset. In the annotation process, all acupoint locations were stored as keypoints, with the two-dimensional pixel coordinates of each acupoint recorded in the image. These coordinate data accurately reflect the positions of acupoints within the images and are used for supervised learning in subsequent model training.

The annotated data images are shown in Figure 13. Different colors in the figure represent 49 different acupoints. All the annotation information is stored in a standard JSON file format. Each annotated JSON file contains the image file information, keypoint information, and acupoint number.
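Reading such an annotation file can be sketched as follows. The sketch assumes the field names commonly written by Labelme (`shapes`, `label`, `points`, `shape_type`, `imagePath`, `imageWidth`, `imageHeight`) and assumes each acupoint is stored as a point shape whose label is the acupoint number; the function name and the normalization to the range 0 to 1 are illustrative choices.

```python
import json


def load_keypoints(labelme_json_text):
    """Parse Labelme-style JSON text into (image path, {label: (x_norm, y_norm)})."""
    data = json.loads(labelme_json_text)
    width, height = data["imageWidth"], data["imageHeight"]
    keypoints = {}
    for shape in data["shapes"]:
        # Skip anything that is not a single-point annotation.
        if shape.get("shape_type") != "point":
            continue
        x, y = shape["points"][0]
        keypoints[shape["label"]] = (x / width, y / height)
    return data["imagePath"], keypoints
```

Normalizing by the image size makes the coordinates independent of resolution, which is the form keypoint detectors of the YOLO family typically expect in their label files.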

2.2.2. Improved YOLOv8-Pose Model

The YOLO algorithm is a deep-learning-based object detection model widely applied in real-time object detection tasks [23]. Its core concept treats object detection as a regression task, directly predicting all object categories and their localization information within an image through a single forward pass. Specifically, YOLO divides the input image into S × S grids, with each grid responsible for predicting whether an object exists within that region while simultaneously regressing the object’s bounding box and category probability.
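The grid assignment described above can be sketched in a few lines, assuming pixel coordinates with the origin at the top-left corner. The function name is hypothetical.

```python
def responsible_cell(x, y, image_w, image_h, s):
    """Return the (row, col) of the S x S grid cell that contains the point (x, y)."""
    col = min(int(x / image_w * s), s - 1)  # clamp points on the right edge
    row = min(int(y / image_h * s), s - 1)  # clamp points on the bottom edge
    return row, col
```

For a 640 × 480 image and S = 7, the image center (320, 240) falls in cell (3, 3), so that cell is the one responsible for an object centered there.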

YOLOv8 is a variant within the YOLO series, adopting the CSPDarknet53 backbone architecture with enhanced feature extraction capabilities [24]. It replaces the Decoupled-Head component in the head section with an optimized version. YOLOv8-Pose is a branch of the YOLOv8 model designed to simultaneously perform human object bounding box detection and human keypoint localization [25]. Its network architecture primarily consists of three components: backbone, neck, and head. The backbone serves as the main network, responsible for pre-training and feature extraction of input images through the CBS, C2f, and SPPF modules. The CBS module comprises a convolutional layer, a batch normalization layer, and an activation function. It first extracts features through convolution, then applies batch normalization to accelerate training, and finally employs the SiLU activation function to enhance the network’s nonlinear representation capabilities. The module structure is illustrated in Figure 14. This design effectively mitigates the vanishing gradient problem commonly encountered during small object detection.
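The order of operations inside a CBS block (convolution, batch normalization, SiLU) can be illustrated on a one-dimensional signal. This is a didactic sketch only: the real module operates on multi-channel 2D feature maps, and the batch normalization here omits the learnable scale and shift. All function names are hypothetical.

```python
from math import exp, sqrt


def conv1d(signal, kernel):
    """'Valid' sliding-window correlation of a 1D signal with a kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def batch_norm(values, eps=1e-5):
    """Normalize to zero mean and (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / sqrt(var + eps) for v in values]


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1 + exp(-x))


def cbs(signal, kernel):
    """Convolution, then batch normalization, then SiLU."""
    return [silu(v) for v in batch_norm(conv1d(signal, kernel))]
```

Unlike ReLU, SiLU is smooth and lets small negative values pass with reduced magnitude, so the output of `cbs` can contain slightly negative entries.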

The C2f module first performs preliminary processing on the input feature map via labeled convolutions, then divides the feature map into two branches: one part preserves the original features, while the other undergoes layer-by-layer processing through N Bottleneck modules. Each Bottleneck module consists of two CBS modules and one Concatenate (Concat) module. After processing through the Bottleneck modules, the feature stream splits into two paths: one path transforms the features and feeds them into the next Bottleneck module; the other path retains the current features and provides input for subsequent feature concatenation processing. After processing by N Bottleneck modules, features from all paths are fused. The network architecture of the C2f module is illustrated in Figure 15.

The SPPF module expands the network’s receptive field by applying multi-scale pooling operations to input feature maps, thereby enhancing the model’s ability to process objects of diverse sizes while improving detection performance without substantially increasing computational overhead. Specifically, input features undergo preliminary processing through a CBS module before branching into two paths: one path preserves the original features and feeds them into a Concat module for concatenation, while the other path passes the features sequentially through three max-pooling layers. The pooled features are also fed into the Concat module for fusion. The concatenated features are then processed through another CBS module to generate the final output. The network structure of the SPPF module is illustrated in Figure 16.
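The pooling-and-concatenation core of SPPF can be sketched in NumPy as follows. This is an illustrative sketch only: it omits the two CBS modules, and the 5 × 5 kernel with stride 1 and padding is an assumption taken from the public YOLOv8 implementation, since the text does not state the kernel size. The names `max_pool_same` and `sppf_core` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_pool_same(x, k=5):
    """Stride-1 max pooling with padding, so H and W are unchanged. x: (C, H, W)."""
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), constant_values=-np.inf)
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))  # (C, H, W, k, k)
    return windows.max(axis=(-2, -1))

def sppf_core(x, k=5):
    """Three chained poolings; the input and all pooled maps are concatenated."""
    p1 = max_pool_same(x, k)
    p2 = max_pool_same(p1, k)
    p3 = max_pool_same(p2, k)
    return np.concatenate([x, p1, p2, p3], axis=0)  # channel count becomes 4C

x = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)  # toy feature map
out = sppf_core(x)
```

Chaining the same small kernel three times widens the receptive field of the later branches while every branch keeps the spatial size needed for concatenation.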

The neck integrates and further processes the features extracted by the backbone through C2f, Concat, and Upsample modules, which enhances the model’s ability to generate clearer prediction results. The head processes the fused features through two CBS modules, followed by a Conv2D layer that outputs the final result.

However, in human acupoint detection, acupoints on the back exhibit a dense distribution and appear in large numbers. Particularly in high-resolution images, the smaller acupoint dimensions pose challenges for model differentiation. Therefore, this study proposes an improved network architecture to address the limitations of the existing model, as illustrated in Figure 17. In the red area (a) of Figure 17, we integrate the non-local attention mechanism. When the input image passes through the SPPF, it enters the non-local module, which captures the long-range dependencies of the acupoints by calculating the correlations among the keypoints. This integration allows for a more comprehensive feature representation. In the red area (b) of Figure 17, we enhance the original OKS loss function by incorporating geometric information into the bounding box regression. This modification transitions the focus from single keypoint regression to keypoint bounding box regression. Consequently, the model can fully account for geometric attributes such as width, height, and area during the regression process, thereby improving the accuracy of acupoint detection by effectively evaluating the discrepancies between predicted results and actual labels in the head prediction layer.

2.2.3. Improved Non-Local Attention Mechanism

During the detection process of acupoints on the human back, relative dependencies exist between different acupoints. The distribution of back acupoints exhibits inherent long-range spatial dependencies. First, acupoints along the Bladder Meridian of Foot-Taiyang (e.g., Feishu, Xinshu, Ganshu, Shenshu) are spaced with large longitudinal distances, which can span tens to hundreds of pixels in images, forming typical long-range dependencies. Second, bilaterally symmetric acupoints (e.g., left and right Dachangshu) maintain cross-hemisphere correlations. Conventional local attention mechanisms model feature dependencies through convolutional kernels with limited receptive fields, where the effective modeling distance is constrained by the number of convolutional layers and kernel size. Consequently, they struggle to capture the aforementioned long-range dependencies. In contrast, the non-local attention mechanism directly models global dependencies by computing pairwise similarities between all positions in the feature map, making it more suitable for back acupoint detection. To enable the model to better capture correlations among various acupoints, this paper introduces the non-local attention mechanism into the backbone network of the YOLOv8 model. The non-local module captures global features by calculating the similarity between any two positions within the input feature map. This mechanism dynamically adjusts the feature representation at each location, enabling the model to extract local features while incorporating global contextual information. The network architecture of the non-local attention mechanism is illustrated in Figure 18.

After SPPF, feature X is fed into the non-local attention mechanism for linear mapping. Here, T represents the number of time steps, H and W denote the height and width of the feature, respectively, and 1024 is the number of channels. First, the channels of feature X are compressed using a 1 × 1 × 1 convolution, resulting in three new features, θ, ϕ, and g, each with 512 channels. Subsequently, a similarity matrix of size THW × THW is derived by performing a dot-product operation between features θ and ϕ to assess the correlation between them. This similarity matrix is then normalized using the Softmax function, transforming it into a weight matrix. Each element in this matrix signifies the similarity weight of one position in relation to another, with values ranging from 0 to 1. The weight matrix is then multiplied by feature g to produce a new feature y_{i} of dimensions T × H × W × 512. Finally, a 1 × 1 convolution is applied to restore the number of channels to 1024, followed by a residual connection with the original input feature X, resulting in an output feature Z_{i} that encapsulates the long-range dependencies of the image. The formula for the dot-product operation between features θ and ϕ is as follows:

f(x_{i}, x_{j}) = θ(x_{i})^{T} ϕ(x_{j})

(7)

where x_{i} and x_{j} represent the i-th and j-th positions in the feature map X, respectively, and the superscript T denotes the transpose. After mapping x_{i} and x_{j} to a new feature space by 1 × 1 convolution, representations r and s are obtained, respectively. The new feature y_{i} at position x_{i} can be derived as:

y_{i} = \frac{1}{C(x)} \sum_{\forall j} f(x_{i}, x_{j}) g(x_{j})

(8)

In the formula, y_{i} is the new feature obtained after the dot-product operation between the weight matrix and feature g, \sum_{\forall j} is the weighted sum over all positions j in the input feature map, C(x) is the normalization factor, and g(x_{j}) is the result obtained by the linear transformation of the position features. The output feature Z_{i} of the non-local module is defined as follows:

Z_{i} = W_{z} y_{i} + X

(9)

where Z_{i} is the new feature map of position i after the fusion of long-range dependencies, W_{z} is the linear transform used to project the new feature into the same dimension as the original input feature, and X is the original feature before it enters the non-local module.
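Equations (7) to (9) can be traced in a few lines of NumPy. The sketch below is illustrative rather than the authors' implementation: positions are flattened to N = T·H·W rows, the 1 × 1 convolutions are written as plain matrix products, the Softmax normalization described in the text stands in for 1/C(x), and the toy channel sizes (8 and 4 instead of 1024 and 512) and all variable names are assumptions.

```python
import numpy as np

def non_local_block(x, w_theta, w_phi, w_g, w_z):
    """Non-local attention over flattened positions.

    x: (N, C) array, one row per position (N = T*H*W).
    w_theta, w_phi, w_g: (C, C_mid) projections (the 1x1 convolutions).
    w_z: (C_mid, C) projection that restores the channel count.
    """
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    f = theta @ phi.T                        # Eq. (7): pairwise dot products, (N, N)
    f = f - f.max(axis=1, keepdims=True)     # stabilise the softmax numerically
    weights = np.exp(f)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax normalisation per row
    y = weights @ g                          # Eq. (8): weighted sum over all j
    z = y @ w_z + x                          # Eq. (9): restore channels, add residual
    return z, weights

rng = np.random.default_rng(0)
n_pos, c, c_mid = 6, 8, 4                    # toy sizes, not the 1024/512 of the paper
x = rng.normal(size=(n_pos, c))
w_theta, w_phi, w_g = (rng.normal(size=(c, c_mid)) for _ in range(3))
w_z = rng.normal(size=(c_mid, c))
z, weights = non_local_block(x, w_theta, w_phi, w_g, w_z)
```

Because of the residual term, setting `w_z` to zero returns the input unchanged, which is why such a block can be inserted into a pretrained backbone without disturbing it at initialization.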

To compare the performance of various attention mechanisms in acupoint detection, this study selected the Convolutional Block Attention Module (CBAM) [26] and the Simple Attention Module [27]. Additionally, two common attention modules, namely SimAM [28] and the Activation Enhancement (AE) attention module [29], were compared with the non-local attention module. The performance evaluation criteria were precision, recall, and mAP. Table 1 presents the experimental results of the different attention mechanisms. The precision of CBAM reached 87.6%, although its recall was lower at 83.2%. SimAM exhibited a higher recall of 85.5%, while its precision and mAP were 85.4% and 83.8%, respectively. The metrics for AE were relatively balanced, with precision, recall, and mAP values of 85.9%, 85.4%, and 83.2%, respectively. The non-local attention mechanism achieved a precision of 88.1%, a recall of 85.7%, and an mAP of 85.8%, the highest values among the compared mechanisms on these metrics. By analyzing the relationship between the location of each keypoint in the input feature map and the locations of all other keypoints, the non-local attention mechanism can generate more discriminative feature representations, thereby enabling the model to identify and locate dense and small acupoints more accurately.

The superiority of non-local attention over CBAM, SimAM, and AE (Table 1) can be attributed to the unique spatial distribution of back acupoints. Unlike general object parts in natural images, back acupoints exhibit long-range dependencies and bilateral symmetry. Local attention mechanisms, with their limited receptive fields, cannot effectively model such dependencies. In contrast, non-local attention computes pairwise similarities across all spatial positions, making it inherently suitable for back acupoint detection.

2.2.4. Optimization Loss Function

In the task of detecting keypoints on acupoints, the primary role of the loss function is to measure the discrepancy between the model’s predictions and the ground-truth annotated data. The loss function of YOLOv8-Pose comprises multiple components, including Classification Loss, Keypoint Confidence Loss, Bounding Box Loss, keypoint pose localization loss, and Focal Loss. Among these, the keypoint pose localization loss evaluates the accuracy of keypoint predictions. It minimizes the Euclidean distance between the model’s predicted keypoint locations and the true keypoint locations, thereby enhancing the accuracy of keypoint detection.

In the original YOLOv8-Pose model, the OKS loss function serves as the keypoint pose localization loss. This function evaluates the keypoint by comparing the Euclidean distance between the predicted acupoint and the actual acupoint. The calculation principle of the keypoint pose localization loss is illustrated in Figure 19, which depicts four sets of predicted keypoint regression paths. In this figure, the blue sphere represents the actual keypoint position, while the red sphere denotes the predicted keypoint position. The loss is defined as the Euclidean distance between the predicted keypoint and the ground-truth keypoint. The OKS loss is computed as follows:

L_{OKS} = σ_{1} \cdot d(p_{1}, g_{1}) + σ_{2} \cdot d(p_{2}, g_{2}) + σ_{3} \cdot d(p_{3}, g_{3}) + σ_{4} \cdot d(p_{4}, g_{4})

(10)

where p is the predicted keypoint and g is the real keypoint. d(p_{1}, g_{1}) is the Euclidean distance between the first set of predicted keypoints and the ground-truth keypoint. σ is the weight coefficient corresponding to each group of keypoints, and the weight coefficient is determined by the number of keypoint groups. L_{OKS} calculates the total loss by summing the products of the Euclidean distance and the weight coefficient between each set of keypoints.
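As a small numerical illustration of Equation (10), the sketch below sums weighted Euclidean distances over four keypoint pairs. The coordinates are made up, and the equal weights of 1/4 are an assumption about how the coefficients follow from the number of keypoint groups.

```python
import math

def weighted_distance_loss(pred, gt, sigma):
    """Eq. (10): sum of sigma_k * Euclidean distance for each keypoint pair."""
    return sum(s * math.dist(p, g) for p, g, s in zip(pred, gt, sigma))

pred = [(3.0, 4.0), (1.0, 1.0), (0.0, 2.0), (5.0, 5.0)]   # hypothetical predictions
gt = [(0.0, 0.0), (1.0, 1.0), (0.0, 0.0), (5.0, 8.0)]     # hypothetical ground truth
sigma = [0.25] * 4          # assumed equal weights, 1 / number of groups
loss = weighted_distance_loss(pred, gt, sigma)             # distances 5, 0, 2, 3
```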

In the keypoint detection task, the original OKS loss function primarily relies on the Euclidean distance between the predicted and ground-truth points, neglecting the geometric relationships among these points. In contrast, the EIOU loss function, as an enhanced version of the IOU loss, not only accounts for the overlap area between the predicted box and the ground-truth bounding box but also integrates the width, height, and center-to-center distance of the bounding boxes. This approach provides a more comprehensive evaluation of the matching degree between the predicted and actual boxes.

Based on the concept of the EIOU loss function [28], this paper constructs a keypoint regression bounding box and introduces width, height, and center point distance losses in addition to the Euclidean distance loss to enhance the regression accuracy of acupoint keypoints. Specifically, the original OKS loss only computes the Euclidean distance between predicted and ground-truth keypoints, which cannot encode the relative positional constraints inherent in acupoint distribution. Inspired by EIOU, we construct a virtual bounding box for each keypoint, as illustrated in Figure 20, and introduce width difference, height difference, and center distance as penalty terms. The width and height terms implicitly enforce the standardized inter-acupoint spacing (e.g., 1.5 cun or 3 cun along the Bladder Meridian), while the center distance term ensures the spatial consistency of symmetric acupoints (e.g., left and right acromion points). The weighting coefficients α and β in Equation (12) were determined via grid search on the validation set: among the combinations {0.3, 0.5, 0.7} tested, α = β = 0.5 achieved the highest mAP.

The loss function is reconstructed by considering both bounding box regression and the Euclidean distance between the predicted point and the ground-truth keypoint. The loss function is defined as follows:

L_{EIOU} = 1 - IOU + \frac{ρ^{2}(b, b^{gt})}{c^{2}} + \frac{ρ^{2}(w, w^{gt})}{C_{w}^{2}} + \frac{ρ^{2}(h, h^{gt})}{C_{h}^{2}}

(11)

In this study, we define the Intersection Over Union (IOU) as the ratio of the area of overlap between the predicted bounding box and the ground-truth bounding box to the area of their union. Let ρ^{2}(b, b^{gt}) represent the square of the Euclidean distance between the center points of the predicted bounding box and the true bounding box. We denote c as the diagonal length of the minimum enclosing region that encompasses both the predicted and actual bounding boxes. The variables w and h refer to the width and height of the predicted bounding box, respectively, while w^{gt} and h^{gt} denote the width and height of the actual bounding box. Additionally, C_{w} and C_{h} are defined as the maximum values between the width and height of the predicted bounding box and the actual bounding box. The improved loss function is thus formulated as follows:

L_{EOKS} = α \cdot L_{OKS} + β \cdot L_{EIOU}

(12)

where L_{EOKS} represents the keypoint regression loss, L_{EIOU} denotes the keypoint bounding-box EIOU loss, and L_{OKS} signifies the Euclidean distance loss. α and β are the weighting factors used to balance the contribution of the two loss components, both set to 0.5 in our experiments. To evaluate the impact of the improved loss function L_{EOKS} on the model, this paper conducts comparative experiments under consistent configuration conditions.
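Equations (11) and (12) can be checked with a short, self-contained sketch. Several choices here are assumptions rather than details given in the text: boxes are written as (cx, cy, w, h), the virtual box around each keypoint has a hypothetical 20-pixel side, and C_{w} and C_{h} are taken as the width and height of the minimum enclosing box, as in the standard EIOU formulation.

```python
def eiou_loss(pred, gt):
    """Eq. (11) for boxes given as (cx, cy, w, h)."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt
    # Corner coordinates of both boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g_x1, g_y1, g_x2, g_y2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2
    # Intersection over union.
    iw = max(0.0, min(p_x2, g_x2) - max(p_x1, g_x1))
    ih = max(0.0, min(p_y2, g_y2) - max(p_y1, g_y1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter)
    # Minimum enclosing box: width C_w, height C_h, squared diagonal c^2.
    c_w = max(p_x2, g_x2) - min(p_x1, g_x1)
    c_h = max(p_y2, g_y2) - min(p_y1, g_y1)
    c2 = c_w ** 2 + c_h ** 2
    center = ((px - gx) ** 2 + (py - gy) ** 2) / c2
    width = (pw - gw) ** 2 / c_w ** 2
    height = (ph - gh) ** 2 / c_h ** 2
    return 1.0 - iou + center + width + height

def eoks_loss(l_oks, l_eiou, alpha=0.5, beta=0.5):
    """Eq. (12): weighted sum of the distance term and the EIOU term."""
    return alpha * l_oks + beta * l_eiou

box_gt = (100.0, 100.0, 20.0, 20.0)                       # hypothetical virtual box
perfect = eiou_loss(box_gt, box_gt)                       # identical boxes
shifted = eiou_loss((110.0, 100.0, 20.0, 20.0), box_gt)   # 10 px to the right
```

For the shifted box the overlap is half of each box, so IOU = 1/3 and the center term is 100/1300, giving a loss of about 0.74, while identical boxes give zero.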

Figure 21 shows the comparison results between the original OKS loss function and the improved EOKS loss function. Lower loss values indicate that the model progressively optimizes during training, reducing prediction errors. Comparing the two pose loss curves reveals that both loss functions exhibit significant decreases within the first 50 training iterations, with L_{OKS} showing slightly higher values than the improved function. As the number of iterations increases, the loss reduction rate slows and stabilizes. The loss value of the improved L_{EOKS} loss function consistently remains below that of L_{OKS}, exhibiting smaller fluctuations. L_{OKS} eventually stabilizes around 2.25, while the optimized L_{EOKS} stabilizes around 1.75, representing a 0.5 reduction compared to the original model. The improved loss function demonstrated lower values throughout the training process, indicating that the optimized loss function enhances the accuracy of keypoint detection and improves generalization performance.

2.2.5. Acupoint Inference Mechanism Design

To address the decline in model recognition accuracy caused by the rapid increase in acupoint quantity, this study designed an acupoint inference mechanism during the model detection phase. Based on the existing 49 fundamental feature point acupoints, the remaining 92 acupoint locations are inferred and localized according to anatomical relationships between acupoints and traditional Chinese medicine theory. The principle of the acupoint inference module is illustrated in Figure 22. First, the original image is input into the keypoint detection model. After feature extraction, the keypoint information of 49 basic benchmark acupoints is obtained, including detected acupoint numbers and acupoint coordinate information. Through a layered processing strategy, we ensure the accuracy of acupoint detection while mitigating the performance degradation that occurs when the number of detected acupoints is expanded.

According to the bone proportional measurement method used in traditional Chinese acupuncture point localization, and referencing the principles of human body proportion calculation, the distance between the left and right acromions is defined as 16 cun [30]. By traversing the fundamental acupoint feature points and identifying the left acromion point (K1) and the right acromion point (K2), the horizontal pixel difference between these two points is calculated. Dividing this difference by 16 yields the pixel distance corresponding to the unit of cun. This distance serves as a basis for establishing the correlation between the fundamental acupoint feature points and the acupoint to be inferred. Finally, the basic acupoint feature points and the inferred acupoint feature points are superimposed and annotated on the input image to achieve the localization and display of acupoints on the human back.

This study established a reasoning model system for locating acupoints on the human back based on the distribution patterns of the associated meridians. According to the regions of the meridians where the acupoints are situated, the acupoints on the human back were classified into four categories: the Extraordinary Points group, the Bladder Meridian of Foot-Taiyang acupoints group, the Small Intestine Meridian of Hand-Taiyang acupoints group, and the Huatuo Jiaji Points group.

Using the pixel distance S corresponding to one ‘cun’ obtained during parameter solving, and referring to the “Name and Location of Meridian Points” (GB/T 12346-2021) [31], the Dazhui acupoint lies on the posterior median line of the human body, in the depression beneath the spinous process of the seventh cervical vertebra. The Dingchuan points are located beneath the spinous process of the seventh cervical vertebra, 0.5 ‘cun’ to the left and right of the posterior median line. Among the basic acupoint feature points, K₁₇ is the Dazhui point, with coordinates (X₁₇, Y₁₇). Consequently, the pixel coordinates of the left Dingchuan point are (X₁₇ − 0.5S, Y₁₇), while those of the right Dingchuan point are (X₁₇ + 0.5S, Y₁₇). Based on these principles, the pixel coordinates of the inferred acupoints are presented in Table 2.
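The inference step for a laterally paired acupoint can be sketched as below, following the 16-cun acromion span and the stated 0.5 cun lateral offset of Dingchuan from Dazhui. The pixel coordinates are hypothetical, the helper names are not from the paper, and "left" is taken to mean the smaller x-coordinate in the image.

```python
def cun_in_pixels(left_acromion, right_acromion, span_cun=16.0):
    """Pixel length of one cun from the horizontal acromion distance."""
    return abs(right_acromion[0] - left_acromion[0]) / span_cun

def lateral_pair(center, offset_cun, s):
    """Two points offset horizontally by offset_cun on either side of center."""
    x, y = center
    dx = offset_cun * s
    return (x - dx, y), (x + dx, y)

k1, k2 = (120.0, 210.0), (520.0, 214.0)   # hypothetical acromion pixels (K1, K2)
k17 = (320.0, 180.0)                       # hypothetical Dazhui pixel (K17)
s = cun_in_pixels(k1, k2)                  # 400 px / 16 cun = 25 px per cun
left_dingchuan, right_dingchuan = lateral_pair(k17, 0.5, s)
```

The same two helpers would cover other paired points that sit a fixed number of cun from a detected reference point, with only the offset changing.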

In Table 2, K₁₁ and K₁₂ serve as the initial points for the Huatuo Jiaji acupoint group. K₁₃, K₁₄, K₁₅, and K₁₆ represent the starting points of the Foot-Taiyang Bladder Meridian. K₅₂ and K₅₃ correspond to the initial points of the Small Intestine Meridian of Hand-Taiyang. Lastly, K₅₀ and K₅₁ are identified as the starting points of the Jingweiqi group.

2.3. Construction of In Situ Projection System

The in situ projection platform for human back acupoints consists of three core components: camera calibration, projection device parameter analysis, and dynamic image correction. These components work synergistically to ensure precise alignment between the projected image and the actual anatomical features. First, the camera calibration process establishes the mapping relationship between three-dimensional space and two-dimensional images by acquiring both the intrinsic and extrinsic parameters of the imaging device. This step enables the calculation of the physical dimensions of the region to be detected. Second, the projection device parameter analysis function determines the proportional relationship between the projected image and the actual physical space, adjusting the scaling factor as necessary to ensure that the projected image accurately reflects real-world dimensions. Lastly, to address potential misalignment during projection, the platform integrates a dynamic projection correction module based on OpenCV. This module enables real-time adjustment of the position, size, scaling factor, and other relevant parameters of the projected image. The system architecture of the proposed in situ projection platform is illustrated in Figure 23.

2.3.1. Calibration of Monocular Cameras

In this study, Zhang’s checkerboard calibration method is adopted for monocular camera calibration [32]. The checkerboard used for calibration consists of six rows and nine columns, resulting in a total of 54 corner points, with each square measuring 30 mm. During the calibration process, the physical distance, denoted as z, between the camera and the checkerboard is initially set to 50 cm.

The checkerboard is aligned correctly, and images are captured from various angles and directions, ensuring that the checkerboard occupies 30% to 50% of the total image area. The resolution of the camera used in this study is 640 × 480 pixels. After capturing the checkerboard images, OpenCV is used for camera calibration. The number of row and column intersection points on the checkerboard is defined, along with the termination criteria for corner detection. To improve calibration accuracy, the iteration count is set to 30, and the corner detection precision threshold is set to 0.001. The detection process concludes either when the iteration count reaches the maximum of 30 or when the corner displacement is less than the specified precision threshold. Subsequently, the captured images are read, converted to grayscale, and the corners of the checkerboard are detected.

Following image capture and corner detection, the calibrateCamera function in OpenCV is applied to calculate the intrinsic parameter matrix and distortion coefficients of the camera. The intrinsic parameter matrix K includes the camera’s focal lengths (f_{x} and f_{y}) along the x-axis and y-axis, respectively, as well as the coordinates of the optical center (c_{x} and c_{y}) in the camera image. These parameters are crucial for understanding the geometric properties of the camera and ensuring precise image calibration. The internal parameter matrix obtained through this process is as follows:

K = [\begin{matrix} f_{x} & 0 & c_{x} \\ 0 & f_{y} & c_{y} \\ 0 & 0 & 1 \end{matrix}] = [\begin{matrix} 548.7715962 & 0 & 342.4261934 \\ 0 & 536.3619627 & 279.67487593 \\ 0 & 0 & 1 \end{matrix}]

(13)

Radial distortion occurs due to the deviation between the actual geometry of a lens and the idealized pinhole camera model. This distortion causes pixels farther from the optical axis to undergo more significant displacement than those closer to the axis, resulting in a characteristic “barrel” or “pincushion” effect in the image. Radial distortion is typically modeled using three coefficients: k_{1}, the primary radial distortion coefficient, which governs the first-order distortion; k_{2}, the secondary radial distortion coefficient, which accounts for the second-order distortion; and k_{3}, the tertiary radial distortion coefficient, which captures the higher-order radial distortion effects. In contrast, tangential distortion arises when the lens is not perfectly aligned with the image plane, leading to distortion of image points in both the horizontal and vertical directions. Tangential distortion is typically characterized by two parameters: p_{1}, the first tangential distortion coefficient, which controls the distortion along the horizontal axis, and p_{2}, the second tangential distortion coefficient, which governs the distortion along the vertical axis. These coefficients are determined through the calibration process, and the resulting distortion parameters are as follows:

[\begin{matrix} k_{1} & k_{2} & k_{3} & p_{1} & p_{2} \end{matrix}] = [\begin{matrix} -0.2179 & 1.3310 & -0.0049 & -0.0008 & -3.0898 \end{matrix}]

(14)
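The calibration procedure described above can be sketched with OpenCV's standard checkerboard workflow. This is a minimal sketch under stated assumptions, not the authors' script: the 11 × 11 sub-pixel search window and the function names `board_points` and `calibrate` are hypothetical, while the 9 × 6 corner grid, the 30 mm squares, and the termination criteria (30 iterations, 0.001) come from the text. OpenCV returns the distortion vector in the order (k1, k2, p1, p2, k3), which is worth keeping in mind when reading coefficients off its output.

```python
import numpy as np

PATTERN = (9, 6)        # inner corners per row and per column (54 in total)
SQUARE_MM = 30.0        # side length of one checkerboard square

def board_points(pattern=PATTERN, square=SQUARE_MM):
    """3-D corner coordinates of the flat board (Z = 0), in millimetres."""
    cols, rows = pattern
    pts = np.zeros((rows * cols, 3), np.float32)
    pts[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square
    return pts

def calibrate(image_paths):
    """Run checkerboard calibration on the given image files."""
    import cv2  # imported lazily so board_points() works without OpenCV
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    obj_pts, img_pts, size = [], [], None
    for path in image_paths:
        gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, PATTERN, None)
        if not found:
            continue  # skip views where the full grid was not detected
        corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        obj_pts.append(board_points())
        img_pts.append(corners)
        size = gray.shape[::-1]  # (width, height)
    # dist is ordered (k1, k2, p1, p2, k3) in OpenCV's convention.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
    return K, dist, rvecs, tvecs
```

Calling `calibrate` with the paths of the captured checkerboard images would return the intrinsic matrix, the distortion vector, and one rotation and translation vector per accepted view.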

During the calibration procedure, ten calibration images were selected, yielding ten rotation vectors. Each rotation vector has three components and is expressed in axis-angle (Rodrigues) form, as returned by OpenCV: its direction gives the rotation axis and its magnitude gives the rotation angle in radians. These rotation vectors describe the orientation of the calibration board relative to the camera in each image. The ten rotation vectors obtained in this study are presented in matrix form as follows:

Rotation Vectors = [\begin{matrix} - 0.0816 & - 0.0285 & 1.5941 \\ 0.0140 & 0.0624 & 1.9066 \\ - 0.0623 & 0.0094 & 1.6853 \\ - 0.0519 & - 0.0136 & 1.5122 \\ - 0.0510 & 0.0494 & 1.2038 \\ - 0.0402 & 0.0728 & 1.3124 \\ 0.0440 & 0.0647 & 1.8800 \\ 0.0366 & 0.0952 & 1.9173 \\ - 0.1165 & - 0.0313 & 1.3852 \\ 0.0038 & 0.0765 & 2.0885 \end{matrix}]

(15)

The relationship between the actual size of an object (X_{real}) and its pixel size in an image (X_{pixel}) can be calculated using the focal length (f_{x}) and the distance from the object to the camera (Z). Therefore, the formula for calculating the actual size of the object is as follows:

X_{real} = \frac{X_{pixel} \times Z}{f_{x}}

(16)

Y_{real} = \frac{Y_{pixel} \times Z}{f_{y}}

(17)

where X_{real} and Y_{real} represent the object’s width and height in the real world, measured in millimeters. X_{pixel} and Y_{pixel} denote the pixel dimensions of the object in the image. The variable Z indicates the distance from the object to the camera, also measured in millimeters. Additionally, f_{x} and f_{y} represent the camera’s focal lengths in pixels, which correspond to the field of view angle associated with each pixel in the image.
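A quick numerical check of Equations (16) and (17), using the focal lengths from Equation (13) and the 50 cm calibration distance, might look like this. The 200 × 150 pixel extent is a made-up example and `pixel_to_real` is a hypothetical helper name.

```python
FX, FY = 548.7715962, 536.3619627    # focal lengths in pixels, from Eq. (13)

def pixel_to_real(x_pixel, y_pixel, z_mm, fx=FX, fy=FY):
    """Eqs. (16)-(17): pixel extents to millimetres at depth z_mm."""
    return x_pixel * z_mm / fx, y_pixel * z_mm / fy

# A 200 x 150 pixel region seen at Z = 500 mm.
width_mm, height_mm = pixel_to_real(200, 150, 500.0)
```

At this distance the region corresponds to roughly 182 mm by 140 mm, so each pixel covers a little under one millimetre.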

2.3.2. Analysis of Projection Proportion Relation

To achieve accurate in situ projection of human acupoints, it is critical to determine the projection distance and angle of the projector before calculating the projection scaling factor. The projector used in this study is the EPSON CB-X31. For optimal projection performance, multiple projection distances were tested, including 100 cm, 150 cm, 200 cm, 250 cm, and 300 cm. During the experiment, the projector’s focal length was manually adjusted to ensure the sharpest possible image. The relationship between the projection distance and the dimensions of the projection screen is shown in Figure 24. At a projection distance of 250 cm, the projection screen dimensions are 140 cm by 90 cm, which fully cover the entire adult back, accommodating both standard and wider body types. Consequently, a projection distance of 250 cm was selected for the system.

In analyzing the projection angle, the projection distance was fixed at 250 cm. The built-in stand of the projector was adjusted to ensure a flat image at the optimal angle. The projection angle was then calculated based on the fixed projection distance and the horizontal height difference between the projector and the projection surface. The calculation diagram for the projection angle is shown in Figure 25.

After conducting measurements, when the distance between the projector and the target is 250 cm and the height difference is 5.4 cm, the projection angle can be calculated using the arctangent function. The formula for calculating the projection angle is given as follows:

θ_{h} = \tan^{-1}\left(\frac{h}{D}\right)

(18)

where the distance is represented by D, and the height difference is represented by h. The converted formula is as follows:

θ_{h} = \arctan\left(\frac{h}{D}\right)

(19)
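Substituting the measured values into Equation (19) gives the projection angle directly. The snippet below is only a numerical check, with `projection_angle_deg` as a hypothetical helper name.

```python
import math

def projection_angle_deg(height_diff_cm, distance_cm):
    """Eq. (19): projection angle in degrees from height difference and distance."""
    return math.degrees(math.atan(height_diff_cm / distance_cm))

theta_h = projection_angle_deg(5.4, 250.0)   # h = 5.4 cm, D = 250 cm
```

With h = 5.4 cm and D = 250 cm the angle is approximately 1.24°.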

When both the projection angle and projection distance of the projector are fixed, the projection scaling factor can be calculated using the four-point proportional analysis method. The calculation process for calculating the proportional relationship between the projected image and the actual physical projection is illustrated in Figure 26. Initially, the size of the image to be projected must be determined, which involves measuring the pixel width and pixel height of the original image.

In Figure 26, panel (a) shows the image to be projected, with vertices labeled A, B, C, and D. The pixel distance between points A and C is 683 pixels, while the pixel distance between points C and D is 770 pixels. Panel (b) shows the actual projected image, with vertices labeled E, F, G, and H. The physical distance between points G and H is 650 mm, and the physical distance between points F and H is 580 mm. Using these data, the relationship for calculating the projection scaling factor can be derived.

r_{x} = \frac{L_{x}}{L_{px}}

(20)

r_{y} = \frac{L_{y}}{L_{py}}

(21)

where L_{x} and L_{y} are the physical lengths of the projected image (650 mm and 580 mm), and L_{px} and L_{py} are the corresponding pixel lengths of the image to be projected (770 and 683 pixels). Based on these calculations, when the projection distance of the CB-X31 projector is set to 250 cm with a projection angle of 1.24°, each pixel corresponds to a physical distance of 0.84 mm in the x-direction (650/770) and 0.85 mm in the y-direction (580/683). These parameters allow the projected image to be scaled appropriately to accurately reflect the actual dimensions of the photographed object.
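The scaling factors of Equations (20) and (21) follow from the four measurements quoted for Figure 26. In the sketch below, pairing 650 mm with 770 pixels and 580 mm with 683 pixels is inferred from the reported 0.84 mm and 0.85 mm results, and `to_projector_pixels` is a hypothetical helper showing how the factors might be used.

```python
def mm_per_pixel(physical_mm, pixels):
    """Eqs. (20)-(21): physical length covered by one projected pixel."""
    return physical_mm / pixels

r_x = mm_per_pixel(650.0, 770)   # x-direction, about 0.84 mm per pixel
r_y = mm_per_pixel(580.0, 683)   # y-direction, about 0.85 mm per pixel

def to_projector_pixels(width_mm, height_mm):
    """Pixel size needed so the projected image has the requested physical size."""
    return round(width_mm / r_x), round(height_mm / r_y)
```

For instance, an image meant to appear 650 mm by 580 mm on the back would be rendered at 770 by 683 pixels.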

2.3.3. Dynamic Correction of Projection Screen Based on OpenCV

In the human acupoint in situ projection system, the dynamic correction platform, implemented using OpenCV 4.5.4, integrates the target surface image captured by the camera with the output image from the projector to enable real-time adjustment of the projection content. The core function of this system is to dynamically project images onto a designated projection area while supporting real-time operations such as panning, zooming, and rotation. Precise control over the projected image is crucial to ensure accurate alignment and presentation on the display. The platform consists of three main modules: the Load Display Module, the Parameter Calculation Module, and the Interactive Control Module, as shown in Figure 27.

The Load Display Module allows the platform to define the initial position of an image within the overall projection interface, while simultaneously displaying various parameters related to the platform’s configuration. The Parameter Calculation Module consists of two components: projector internal parameters and image parameters.

The internal parameters include projection distance, projection angle, and scaling factor, while the image parameters encompass the actual size of the image, real-time rotation angle, and scaling factor. These image parameters are updated in real-time, with any changes reflected immediately in the displayed values. The Interactive Control Module enables real-time manipulation of the image through button presses, allowing for functions such as translation, rotation, and resizing. The interface of the projection system is shown in Figure 28. Within this interface, the scale factor functions as a zoom factor, with its value adjustable via the “+” and “_” buttons. The center denotes the central coordinates of the projected image, while the rotation angle indicates the degree of rotation. Adjustments can be made by pressing the “R” key for clockwise rotation and the “T” key for counterclockwise rotation. The projection distance refers to the distance between the projector and the image, and the projection angle refers to the angle of projection. The frame rate (FPS) of the projector is also displayed. Projection size represents the actual physical size of the projected image, while pixel size refers to the physical dimensions of each pixel in the x- and y-directions. The coordinates of the image’s top-left, top-right, bottom-left, and bottom-right corners are labeled as the left vertex, right vertex, left bottom point, and right bottom point.
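The key-driven adjustments of the Interactive Control Module can be modelled as a small state update. The sketch below is an assumption-laden illustration: the step sizes, the initial center, and the names `ProjectionState` and `handle_key` are hypothetical, and only the key bindings ("+", "_", "R", "T") come from the text.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ProjectionState:
    scale: float = 1.0
    angle_deg: float = 0.0          # positive = counterclockwise
    center: tuple = (320, 240)      # hypothetical initial image center

SCALE_STEP = 0.05   # assumed step sizes, not given in the text
ANGLE_STEP = 1.0

def handle_key(state, key):
    """Return the updated state for one key press."""
    if key == "+":
        return replace(state, scale=state.scale + SCALE_STEP)
    if key == "_":
        return replace(state, scale=max(SCALE_STEP, state.scale - SCALE_STEP))
    if key in ("r", "R"):
        return replace(state, angle_deg=state.angle_deg - ANGLE_STEP)   # clockwise
    if key in ("t", "T"):
        return replace(state, angle_deg=state.angle_deg + ANGLE_STEP)   # counterclockwise
    return state

state = ProjectionState()
for key in "++RRT":
    state = handle_key(state, key)
```

In an OpenCV loop, the resulting center, angle, and scale could be passed to `cv2.getRotationMatrix2D`, which also treats positive angles as counterclockwise, and the matrix applied with `cv2.warpAffine`.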

3. Results

3.1. Training Environment

The hardware and software environment for model training is as follows: the operating system is Windows 11, and the programming language is Python 3.10. The GPU is an NVIDIA GeForce RTX 4080 SUPER with 16 GB of memory, the CPU is a 13th Gen Intel® Core™ i7-13700K, and the CUDA version is 11.8. Detailed training parameters are provided in Table 3. Training runs for 300 epochs, with an input image size of 640 × 640 pixels and a batch size of four. Early stopping is implemented with a patience of 100 epochs; training terminates if the model’s performance does not improve within this window. Parallel processing uses a single worker thread (workers = 1), and the Adam optimizer is employed with the initial learning rate (lr0) and final learning rate (lrf) both set to 0.01.
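The early-stopping rule described above can be sketched as follows. The dictionary collects the values from Table 3; its key names and the helper `epochs_until_stop` are illustrative assumptions rather than the authors' training script, and the "fitness" value stands for whatever validation metric is monitored.

```python
# Values from Table 3; the key names are illustrative.
TRAIN_CONFIG = {
    "epochs": 300,
    "imgsz": 640,
    "batch": 4,
    "patience": 100,
    "workers": 1,
    "close_mosaic": 10,
    "optimizer": "Adam",
    "lr0": 0.01,
    "lrf": 0.01,
}


def epochs_until_stop(fitness_history, patience):
    """Number of epochs actually run under patience-based early stopping.

    Training stops once `patience` epochs have passed since the epoch
    with the best fitness value without any further improvement.
    """
    best = float("-inf")
    best_epoch = 0
    for epoch, fitness in enumerate(fitness_history):
        if fitness > best:
            best, best_epoch = fitness, epoch
        elif epoch - best_epoch >= patience:
            return epoch + 1  # stopped early after this epoch
    return len(fitness_history)
```

With a patience of 100 and a budget of 300 epochs, early stopping only triggers if the monitored metric has plateaued for a third of the schedule.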

3.2. Evaluation Index

The performance of the model was evaluated using six key metrics: precision (P), recall (R), average precision (AP), mean average precision (mAP), number of parameters (Params/M), and frame rate (FPS). Precision (P) is defined as the proportion of true positive samples among all samples predicted as positive, providing an indication of the accuracy of positive sample predictions. In acupoint keypoint detection, a predicted keypoint is considered a true positive ($T_{P}$) if its Euclidean distance to the ground-truth keypoint is less than 0.5 × d, where d is the pixel distance corresponding to one “cun” for that image (calculated as d = L/16, see Section 2.1.2). This threshold is derived from the clinical tolerance in TCM (0.5 cun). Otherwise, the prediction is a false positive ($F_{P}$). A ground-truth keypoint with no predicted keypoint within this threshold is a false negative ($F_{N}$). The formula for calculating precision P is given as follows:

P = \frac{T_{P}}{T_{P} + F_{P}}

(22)

The recall rate denotes the proportion of samples that are accurately predicted as positive relative to the total number of samples that are genuinely positive. In other words, it reflects the likelihood of a sample being classified as positive when it is indeed positive. The formula for calculating recall is as follows:

R = \frac{T_{P}}{T_{P} + F_{N}}

(23)

where $T_{P}$ denotes a sample that is predicted to be positive and is indeed positive, $F_{P}$ refers to a sample that is predicted to be positive but is actually negative, and $F_{N}$ signifies a sample that is actually positive but is predicted to be negative. Here, $P$ represents precision, and $R$ denotes recall.
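The counting rule behind Equations (22) and (23) can be sketched as follows, assuming that predictions and ground-truth keypoints are paired by acupoint index and that a missing prediction is represented by `None`. The function names are hypothetical; `L_px` stands for the reference length L from Section 2.1.2, so one cun corresponds to L/16 pixels.

```python
import math


def cun_in_pixels(L_px):
    """Pixel length of one cun, d = L / 16."""
    return L_px / 16.0


def precision_recall(preds, gts, cun_px, tol_cun=0.5):
    """Precision and recall under a distance tolerance given in cun.

    preds[i] is the predicted (x, y) for keypoint i, or None if missing;
    gts[i] is the corresponding ground-truth (x, y).
    """
    threshold = tol_cun * cun_px
    tp = fp = fn = 0
    for pred, gt in zip(preds, gts):
        if pred is not None and math.dist(pred, gt) < threshold:
            tp += 1            # within tolerance: true positive
        else:
            fn += 1            # ground truth has no match within tolerance
            if pred is not None:
                fp += 1        # a prediction exists but lies too far away
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Under this pairing, a prediction that falls outside the tolerance counts as both a false positive and a false negative, whereas a missing prediction only lowers recall.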

$AP$ stands for the average precision of a class, whereas $mAP$ refers to the mean average precision across all classes. Higher values of $AP$ and $mAP$ indicate improved detection accuracy of the model. The mAP reported in this study follows the keypoint evaluation protocol. mAP50 refers to the average precision at an OKS [33] (Object Keypoint Similarity) threshold of 0.5, where the OKS is computed from $d_i$, the Euclidean distance between the predicted and ground-truth keypoints, $s$, the object scale, and $k_i$, a per-keypoint constant. The definitions of $AP$ and $mAP$ are as follows:

AP = \int_{0}^{1} P(R) \, dR

(24)

mAP = \frac{1}{n} \sum_{i = 1}^{n} AP(i)

(25)

where P denotes the precision of the model at a given recall R, n represents the total number of classes, and $AP(i)$ signifies the average precision for the i-th class. The total number of classes is n = 141 for the full acupoint detection task (all back acupoints), and the reported mAP values are computed over all 141 classes. The number of parameters is the total count of trainable parameters in the model, encompassing both the weights and the biases of the neural network. This metric is crucial for assessing the complexity and computational resource demands of the model. Additionally, the frame rate indicates the speed at which the model processes image or video frames, serving as a critical index for evaluating the actual detection speed of the model.
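A minimal sketch of these quantities is given below. The OKS function follows the commonly used COCO-style form, exp(−d_i² / (2 s² k_i²)) averaged over keypoints, and omits visibility flags; this form is an assumption, since the exact variant used in the study is not reproduced here. Equation (24) is approximated with the trapezoidal rule over sampled precision-recall points, which is a simplification of the interpolation schemes used by standard evaluation tools.

```python
import math


def oks(pred, gt, scale, k):
    """COCO-style Object Keypoint Similarity for one instance.

    pred, gt: lists of (x, y); scale: object scale s;
    k: per-keypoint constants k_i (same length as pred).
    """
    sims = [
        math.exp(-math.dist(p, g) ** 2 / (2.0 * scale ** 2 * ki ** 2))
        for p, g, ki in zip(pred, gt, k)
    ]
    return sum(sims) / len(sims)


def average_precision(recalls, precisions):
    """Area under P(R) by the trapezoidal rule; recalls must be ascending."""
    area = 0.0
    for i in range(len(recalls) - 1):
        width = recalls[i + 1] - recalls[i]
        area += width * (precisions[i] + precisions[i + 1]) / 2.0
    return area


def mean_average_precision(ap_per_class):
    """Equation (25): mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

For mAP50, a detection would count as correct when its OKS is at least 0.5, and the resulting precision-recall curve of each class would then be integrated as above.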

3.3. Ablation Experiment

To verify the effect of each enhancement on model performance, we assessed its impact on the model’s recall, precision, average precision, and parameter count within a consistent training environment. The results of the ablation experiments are illustrated in Figure 29. Specifically, the blue model denotes the basic YOLOv8-Pose model, the orange model represents the basic YOLOv8-Pose model with the non-local module added, the green model represents YOLOv8-Pose trained with the optimized EOKS loss function, and the purple model corresponds to the enhanced acupoint detection model proposed in this paper, in which the EOKS loss function is applied to the YOLOv8-Pose model that incorporates the non-local module. By comparing the performance metrics of the different models, including mAP50, P, R, and mAP50-90, the influence of each improved module on overall model performance can be effectively assessed.

The results of the ablation experiments reveal that the incorporation of the non-local attention mechanism into the YOLOv8-Pose model led to a 1.33% improvement in mAP50, with precision (P) and recall (R) increasing by 0.79% and 1.45%, respectively. Furthermore, mAP50-90 showed an increase of 0.2%. The non-local attention mechanism effectively captures the spatial dependencies among keypoints, thereby enhancing the model’s ability to interpret global contextual information, which in turn improves both detection accuracy and recall rates.

Additionally, the introduction of the optimized loss function, EOKS, resulted in a 1.45% improvement in mAP50, with precision and recall increasing by 2.15% and 1.04%, respectively, and mAP50-90 improving by 0.3%. The EOKS loss function offers a more accurate evaluation of the similarity between predicted and actual keypoints, especially in cases involving densely packed keypoints. This leads to a reduction in errors and an overall enhancement in detection performance.

When both the non-local attention mechanism and the optimized EOKS loss function were integrated, mAP50 improved by 2.62%, with precision and recall increasing by 3.23% and 2.13%, respectively, while mAP50-90 improved by 0.4%. The combination of the non-local attention mechanism and the EOKS loss function significantly boosted the detection accuracy and recall rate of the original model, enhancing the model’s ability to capture global features in acupoint detection and refining the measurement of keypoint similarity.

The results of the ablation experiments are summarized in Table 4. Module A incorporates the non-local attention mechanism, while Module B utilizes the EOKS loss function. Integrating both the non-local attention mechanism and the EOKS loss function increases the GFLOPS and the number of parameters (Params) of the model, which in turn slightly reduces the frame rate (FPS) from 92 to 90. However, the time required for processing a single image remains approximately 11 ms (1/90 s), which is sufficient to meet the requirements of practical applications.
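The relationship between frame rate and per-image time, and the relative overhead of the added modules, can be checked directly from the values in Table 4. The short sketch below does only this arithmetic; the dictionary layout and function names are illustrative.

```python
# (GFLOPS, Params in millions, FPS) as listed in Table 4.
TABLE4 = {
    "YOLOv8-Pose": (11.3, 3.7, 92),
    "YOLOv8-Pose + A": (12.1, 3.9, 91),
    "YOLOv8-Pose + B": (11.4, 3.7, 91),
    "A + B": (12.4, 4.0, 90),
}


def latency_ms(fps):
    """Per-image processing time implied by a frame rate."""
    return 1000.0 / fps


def relative_increase_pct(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


base = TABLE4["YOLOv8-Pose"]
full = TABLE4["A + B"]
print(round(latency_ms(base[2]), 1), round(latency_ms(full[2]), 1))
print(round(relative_increase_pct(full[0], base[0]), 1))  # GFLOPS overhead
print(round(relative_increase_pct(full[1], base[1]), 1))  # parameter overhead
```

On these figures, the combined model costs roughly 10% more GFLOPS and 8% more parameters than the baseline, while the per-image time rises by only about 0.2 ms.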

3.4. Comparison of Mainstream Keypoint Detection Algorithms

To assess the effectiveness of the proposed model in acupoint keypoint detection, we compared it with several mainstream keypoint detection networks (HRNet, RTMPose, and OpenPose) as well as the base YOLOv8-Pose model. The performance metrics used for comparison were the number of parameters and mAP50. The results are shown in Table 5. The base YOLOv8-Pose model has 3.7 M parameters and achieves an mAP50 of 92.4%. The HRNet model, with 28.5 M parameters, achieves an mAP50 of 92.1%. The RTMPose model has 11.5 M parameters and an mAP50 of 89.2%. OpenPose, which has by far the largest number of parameters (346 M), reports an mAP50 of 93.1%. In contrast, our improved model uses 4.0 M parameters and achieves the highest mAP50 of 95.0%, outperforming all other models in the comparison. The improved model thus achieves high detection accuracy while maintaining low computational complexity, making it more suitable for human acupoint detection than high-complexity models such as HRNet and OpenPose.

3.5. Result Visualization

To evaluate the detection efficacy of the proposed model, we selected back images from individuals with varying body shapes, ensuring that all images were resized to 640 × 640 pixels. The detection results of the improved model are shown in Figure 30, where the blue dots represent the acupoints identified by the model.

The experimental results indicate that the model demonstrates strong generalization across different postural conditions and can accurately detect and mark 141 acupoints on the human back. Figure 31 illustrates the actual projection effect of the human back image after model detection. The model accurately projects the detected left and right acromial points onto their corresponding locations on the real human body, achieving clear and precise projection for all detected acupoints.

4. Discussion

The acupoint recognition system proposed in this study relies on 49 fundamental detection points as the basis for inference, deriving an additional 92 acupoints through the application of bone proportional measurement and acupoint relationship rules.
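The offset rules listed in Table 2 illustrate this kind of inference. The sketch below encodes exactly those twelve rules (K50 to K61 derived from K11 to K17); the function name and data layout are hypothetical, and S is taken to be the pixel offset used in Table 2, whose derivation from the “cun” scale is described in the methods section and is not reproduced here.

```python
# Rules from Table 2: derived point -> (base point, x offset in S, y offset in S).
TABLE2_RULES = {
    50: (17, 1, 0),
    51: (17, -1, 0),
    52: (17, 2, 0),
    53: (17, -2, 0),
    54: (13, 0, 1),
    55: (14, 0, 1),
    56: (15, 0, 1),
    57: (16, 0, 1),
    58: (15, 0, 2),
    59: (16, 0, 2),
    60: (11, 0, 1),
    61: (12, 0, 1),
}


def infer_points(base, S):
    """Derive K50..K61 from detected base points.

    base: dict mapping point number (11..17) to pixel coordinates (x, y).
    S: pixel offset used in Table 2.
    """
    derived = {}
    for number, (src, dx, dy) in TABLE2_RULES.items():
        x, y = base[src]
        derived[number] = (x + dx * S, y + dy * S)
    return derived
```

Keeping the rules in a table rather than in code makes it straightforward to extend the same mechanism to the remaining inferred acupoints.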

However, there is a potential risk that detection errors in the fundamental points may be amplified during the geometric reasoning process. Therefore, it is essential to conduct a quantitative analysis of the error propagation. Initially, the average pixel error for the 49 basic detection points was analyzed, revealing an average Euclidean distance error of 1.8 pixels. Subsequently, during the acupoint inference process, some derived acupoints depend on the coordinates of one to three basic points for interpolation or proportional calculation. Given that these reasoning steps involve multiple coordinate transformations, the errors may propagate in both linear and nonlinear ways. To further quantify the error accumulation, a statistical analysis of the localization errors for the 92 inferred acupoints was performed. The results indicated an average Euclidean distance error of 2.4 pixels, which represents an approximate 30.4% increase compared to the base point error.

Among the inferred points, those with smaller error increases predominantly relied on single-axis interpolation methods (e.g., left–right symmetry), while larger errors were primarily observed in acupoints that involved interpolation across multiple base points or proportional calculations.
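One simple way to reason about this growth is a first-order error model for a point derived as K′ = K + (aS, bS). The sketch below assumes that the error of the base point K and the error of the offset S are zero-mean and independent, so that their squared contributions add; these assumptions, and the use of root-mean-square (RMS) error in place of the mean Euclidean error reported above, are simplifications for illustration and are not part of the study's analysis.

```python
import math


def propagated_rms(base_rms, scale_rms, a, b):
    """RMS position error of K' = K + (a*S, b*S).

    base_rms: RMS Euclidean error of the detected base point K (pixels).
    scale_rms: RMS error of the offset S (pixels).
    a, b: multiples of S applied along x and y.
    Assumes independent, zero-mean errors in K and S.
    """
    return math.sqrt(base_rms ** 2 + (a * a + b * b) * scale_rms ** 2)
```

Under this model, a point offset by 2S inherits twice the scale error of a point offset by S, which is consistent with larger errors for acupoints that depend on proportional calculations.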

5. Conclusions

To address the challenges associated with acupoint identification and localization in traditional Chinese medicine acupuncture treatment, this study explores the identification and localization of acupoints on the human back using machine vision technology. Corresponding acupoint recognition schemes were designed based on two approaches: image processing and deep learning. Additionally, to accurately project detected acupoint locations onto corresponding body positions, a projection adjustment platform was developed using OpenCV. The study first proposes an image processing-based method for detecting and locating back acupoints. By analyzing spinal features in the blue channel and enhancing image quality with adaptive histogram equalization, spinal feature points are extracted through region of interest (ROI) selection, binarization, and morphological processing. A Bézier curve is then used for precise fitting of the spinal curve. Subsequently, the acromion point location is determined through Canny edge detection and tangent slope calculation, establishing a pixel-scale model based on the bone-based “cun” measurement system. Utilizing the spinal curve and “cun” scale, the coordinates of 141 back acupoints are successfully inferred and marked.

Regarding deep learning approaches, this paper constructs an improved acupoint detection method based on the YOLOv8-Pose keypoint detection model. To address the specific demands of back acupoint detection, the model’s backbone incorporates a non-local attention mechanism to enhance global feature modeling capabilities. Additionally, an EOKS loss function is designed to optimize the extraction of spatial geometric information during keypoint regression. The model detects 49 fundamental acupoints, and the remaining points are inferred from them using inter-point relationships, enabling precise localization of all 141 back acupoints. Experimental results demonstrate that the improved model achieves a detection accuracy (mAP50) of 95.0%, a 2.62% improvement over the original model, with an inference time of 14.5 ms, ensuring an efficient real-time detection capability.

Additionally, this study designed an in situ projection platform for human back acupoints. Using Zhang’s calibration method to obtain camera intrinsic and extrinsic parameters, an image size restoration model was established. Combined with projector parameters, optimal projection distance and related projection parameter configurations were determined. The system employs a four-point proportional analysis method for precise correction of projected image dimensions. A projection platform with real-time interactive control functions was developed using OpenCV, supporting adjustments such as image translation, rotation, and scaling. Experimental validation confirmed the platform’s capability for accurate projection and localization of marked acupoints.

Ultimately, this research established a comprehensive acupoint recognition and projection system. This system integrates camera, projector, and computer hardware to automate the entire workflow from image acquisition and data processing to acupoint projection. Software-wise, combining high-precision acupoint recognition algorithms with an interactive interface, the system enables real-time detection, acupoint selection, and projection adjustment. Experimental results validate the system’s ability to accurately identify 141 back acupoints and achieve clear, precise real-time projection localization.

Author Contributions

Conceptualization, Z.Z. and P.L.; methodology, Z.Z. and L.S.; software, Z.Z. and S.L.; validation, R.X., L.S. and Z.Z.; formal analysis, Z.Z. and S.L.; investigation, L.S. and R.X.; resources, P.L.; data curation, S.L.; writing—original draft preparation, Z.Z. and L.S.; writing—review and editing, P.L.; visualization, L.S. and R.X.; supervision, P.L.; project administration, P.L.; funding acquisition, P.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Fund of the Institute of Complexity Science, Henan University of Technology (CSKFJJ-2026-6), the Henan Province Science and Technology Research Program (252102310444), the Innovation Training Program for College Students of Henan University of Technology (202510463068), and the Institute for Complexity Science, Henan University of Technology (CSYISKFJJ-2025-56).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki. This study involved the collection of anonymized back images with no identifiable personal information. According to Article 32(2) of the Measures for Ethical Review of Life Sciences and Medical Research Involving Human Subjects (National Health Commission of China, 2023) [34], research using anonymized information data that does not cause harm to human subjects, does not involve sensitive personal information, and has no commercial interests may be exempted from ethical review. Therefore, this study did not require ethics committee approval.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy and ethical restrictions.

Acknowledgments

The authors would like to thank all volunteers who participated in the back image collection.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AHE	Adaptive Histogram Equalization
ROI	Region of Interest
TCM	Traditional Chinese Medicine
YOLO	You Only Look Once
CNN	Convolutional Neural Network
OKS	Object Keypoint Similarity
EOKS	Efficient Object Keypoint Similarity
EIOU	Efficient Intersection Over Union
IOU	Intersection Over Union
mAP	Mean Average Precision
FPS	Frames Per Second
CBAM	Convolutional Block Attention Module
SimAM	Simple Attention Module
AE	Activation Enhancement
SPPF	Spatial Pyramid Pooling Fast

References

Zhu, B. On acupoints and acupoint specificity. Chin. Acupunct. Moxibustion 2021, 41, 943–950.
Li, J.; Fei, Z.; Xie, Y.; Deng, D.; Ming, X.; Niu, F. A review of acupoint localization based on deep learning. Chin. Med. 2025, 20, 116.
Zhai, Z.; Wang, Z.; Xu, L.; Zhang, L.; Zhang, Y.; Yin, J.; Zeng, P.; Li, C.; Sun, T.; Jiang, T. A systematic review of computer-aided acupoint localization. iScience 2025, 28, 113708.
Li, F.J.; Gao, M.; Yang, Y.Z. Exploration on the application of artificial intelligence in acupoint location technology in traditional Chinese medicine. Shanghai J. Tradit. Chin. Med. 2024, 58, 17–22.
Chang, M.; Zhu, Q. Automatic location of facial acupuncture-point based on facial feature points positioning. In 2017 5th International Conference on Frontiers of Manufacturing Science and Measuring Technology; Atlantis Press: Dordrecht, The Netherlands, 2017; pp. 545–549.
Fei, H.L.; Huang, L.J.; Lu, D.H.; Guo, C.; Yang, R. Vision-based method for acupoint location of lumbar and back meridian robot in traditional Chinese medicine. Mod. Chin. Med. 2023, 43, 24–30.
Tang, R.Y.; Peng, A.H.; Li, W. Acupoint patch location method for massage robot based on YOLOv5 and image processing. Mech. Des. Manuf. 2025, 290, 281–285.
Zhang, T.T.; Yang, H.Y.; Lin, Y. Traditional Chinese medicine facial acupoint detection framework integrating representation learning. J. Univ. Electron. Sci. Technol. China 2023, 52, 175–181.
Wei, Y.; Ma, X.Y.; Gao, Z.Y. Research on human acupoint recognition based on convolutional neural network. Chin. Med. Inf. 2024, 41, 39–43.
Ji, X.; Zhou, L. Location of acupuncture points based on graph convolution and 3D deep learning in virtual humans. Comput. Animat. Virtual Worlds 2023, 34, e2159.
Zhang, P.; Zhao, X.; Dong, L.; Lei, W.; Zhang, W.; Lin, Z. A framework for detecting fighting behavior based on key points of human skeletal posture. Comput. Vis. Image Underst. 2024, 248, 104123.
Fu, Y.; Gao, S.H. Research on improved YOLOv8s-Pose lightweight model for multi-person pose estimation. Comput. Sci. Explor. 2025, 19, 682.
Wang, X.P.; Shi, H. Improved YOLOv8 fall detection algorithm integrated with keypoints. J. Xidian Univ. 2024, 51, 149–164.
Yuan, Z.; Shao, P.; Li, J.; Wang, Y.; Zhu, Z.; Qiu, W.; Chen, B.; Tang, Y.; Han, A. YOLOv8-ACU: Improved YOLOv8-pose for facial acupoint detection. Front. Neurorobotics 2024, 13, 55857.
Chen, C.; Lu, P.; Wang, S.; Zhu, Z.; Qiu, W.; Chen, B.; Tang, Y.; Han, A. Posenet based acupoint recognition of blind massage robot. In Proceedings of the 2020 5th International Conference on Computer and Communication Systems; IEEE: New York, NY, USA, 2020; pp. 251–255.
Liu, Y.B.; Qin, J.H.; Zeng, G.F. Back acupoint location method based on prior information and deep learning. Int. J. Numer. Methods Biomed. Eng. 2023, 39, e3776.
Wang, F.; Wang, G.; Lu, B. YOLOv8-PoseBoost: Advancements in multimodal robot pose keypoint detection. Electronics 2024, 13, 1046.
GB/T 40997-2021; Names and Positioning of Extra-Meridian Qi Points. Standards Press of China: Beijing, China, 2021.
Burden, R.L.; Faires, J.D. Numerical Analysis, 10th ed.; Chapter 3: Interpolation and Polynomial Approximation; Chapter 6: Bézier Curves; Cengage Learning: Boston, MA, USA, 2016.
Zhang, Z.; Cao, Y.; Zhu, W.; Du, S.; Liu, C. An Analysis of the Current Research Status on “Lingshu·Bone Measurement”. Shanghai J. Acupunct. Moxibustion 2016, 35, 98–100.
Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, PAMI-8, 679–698.
GB/T 12346-2006; Nomenclature and Location of Acupoints [S]. Standards Press of China: Beijing, China, 2006.
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
Terven, J.; Córdova-Esparza, D.M.; Romero-González, J.A. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716.
Woo, S.; Park, J.; Lee, J.Y.; Kweon, I. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2018; pp. 3–19.
Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2018; pp. 7794–7803.
Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A simple, parameter-free attention module for convolutional neural networks. In International Conference on Machine Learning (ICML); PMLR: Cambridge, MA, USA, 2021; pp. 11863–11874.
Li, M.; Zhao, L.; Zhou, D.; Nie, R.; Liu, Y.; Wei, Y. AEMS: An attention enhancement network of modules stacking for low-light image enhancement. Vis. Comput. 2022, 38, 4203–4219.
Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157.
Godson, D.R.; Wardle, J.L. Accuracy and precision in acupuncture point location: A critical systematic review. J. Acupunct. Meridian Stud. 2019, 12, 52–66.
Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334.
Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2014; pp. 740–755.
National Health Commission of China. Measures for Ethical Review of Life Sciences and Medical Research Involving Human Subjects; National Health Commission of China: Beijing, China, 2023.

Figure 1. Hua Tuo Jia Ji Point marking procedure.

Figure 2. The three-channel comparison diagram.

Figure 3. Positioning curve feature enhancement.

Figure 4. Feature point extraction.

Figure 5. Comparative diagram of human spine curve fitting using different methods.

Figure 6. Comparative diagram of human spine curve fitting using Bézier function.

Figure 7. Human shoulder protrusion finding process.

Figure 8. Acupoint inference diagram.

Figure 9. Error analysis diagram of acupoint identification.

Figure 10. The distribution of errors for each acupoint.

Figure 11. Back dataset sample.

Figure 12. Labelme annotation interface diagram.

Figure 13. Dataset annotation results.

Figure 14. The schematic diagram of the network structure of the CBS module.

Figure 15. The network architecture of the C2f module.

Figure 16. The network structure of the SPPF module.

Figure 17. Network structure of acupoint keypoint detection model.

Figure 18. The network structure of the non-local attention mechanism.

Figure 19. Schematic diagram of keypoint loss.

Figure 20. Bounding box of keypoint regression.

Figure 21. The comparison chart of positioning loss results before and after it was improved.

Figure 22. Schematic diagram of acupoint reasoning mechanism.

Figure 23. The system composition of the in situ projection.

Figure 24. Mapping between projection distance and image size.

Figure 25. Schematic diagram of in situ projection.

Figure 26. Four-point proportional analysis diagram.

Figure 27. Overall frame diagram of in situ projection system.

Figure 28. Projection interface display.

Figure 29. Comparison of ablation experiments.

Figure 30. The actual detection effect of the model.

Figure 31. Projection effect of acupoint recognition.

Table 1. Comparison of different attention mechanisms.

Attention Mechanism	Precision (%)	Recall (%)	mAP50 (%)
CBAM	87.6	83.2	84.5
SimAM	85.4	85.5	83.8
AE	85.9	85.4	83.2
Non-local	88.1	85.7	85.8

Table 2. Reasoning relationship of acupoint coordinate points.

Point Number	Pixel Coordinate
K₁₁	(X₁₁, Y₁₁)
K₁₂	(X₁₂, Y₁₂)
K₁₃	(X₁₃, Y₁₃)
K₁₄	(X₁₄, Y₁₄)
K₁₅	(X₁₅, Y₁₅)
K₁₆	(X₁₆, Y₁₆)
K₁₇	(X₁₇, Y₁₇)
K₅₀	(X₁₇ + S, Y₁₇)
K₅₁	(X₁₇ − S, Y₁₇)
K₅₂	(X₁₇ + 2S, Y₁₇)
K₅₃	(X₁₇ − 2S, Y₁₇)
K₅₄	(X₁₃, Y₁₃ + S)
K₅₅	(X₁₄, Y₁₄ + S)
K₅₆	(X₁₅, Y₁₅ + S)
K₅₇	(X₁₆, Y₁₆ + S)
K₅₈	(X₁₅, Y₁₅ + 2S)
K₅₉	(X₁₆, Y₁₆ + 2S)
K₆₀	(X₁₁, Y₁₁ + S)
K₆₁	(X₁₂, Y₁₂ + S)

Table 3. Training parameters.

Parameter	Value	Parameter	Value
Epoch	300	workers	1
Batch	4	close_mosaic	10
Image size	640	lr0	0.01
Patience	100	lrf	0.01

Table 4. Comparison of parameters of ablation experimental models.

Model	GFLOPS	Params/M	FPS
YOLOv8-Pose	11.3	3.7	92
YOLOv8-Pose + A	12.1	3.9	91
YOLOv8-Pose + B	11.4	3.7	91
A + B	12.4	4.0	90

Table 5. Comparison of mainstream keypoint detection models.

Model	Params/M	mAP50
YOLOv8-Pose	3.7	92.4
HRNet	28.5	92.1
RTMPose	11.5	89.2
OpenPose	346	93.1
Improved model	4.0	95.0

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhao, Z.; Song, L.; Li, S.; Xue, R.; Li, P. Recognition of Acupoints on Human Back Based on Machine Vision and Deep Learning. Big Data Cogn. Comput. 2026, 10, 204. https://doi.org/10.3390/bdcc10070204

