In contrast to these studies in which segmentation and removal of the head and tail were performed using manual or semi-automated algorithms in MATLAB, the present study implemented fully automatic dorsal area segmentation using the YOLOv11 detection and segmentation model, previously trained to recognize and isolate the body region of interest. This approach enhances process efficiency and scalability while maintaining standardization of the segmented area, which likewise excludes the head and tail in accordance with the methodological guidelines established in earlier works.
3.2. Results of Applying the Trained Model to New Data for Weight Prediction
To evaluate the practical applicability of the trained segmentation model, it was applied to an independent subset comprising 25,994 images derived from the same overall monitoring period. These images were not used during the training, validation, or testing stages, ensuring an independent evaluation of the model within the observed conditions. This dataset captures variability in posture, interaction, and body size. The objective was to detect and segment pigs, verifying the presence of at least one valid instance per image, using a confidence threshold of 90%.
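The validity criterion described above (at least one instance per image at or above the 90% confidence threshold) can be sketched as follows. This is a minimal illustration, not the study's code: the function name `summarize_detections` and the input layout (a mapping from image name to per-instance confidence scores) are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def summarize_detections(
    confidences_by_image: Dict[str, List[float]],
    threshold: float = 0.90,
) -> Tuple[int, int]:
    """Return (total valid masks, images with at least one valid mask).

    `confidences_by_image` is a hypothetical structure: one list of
    per-instance confidence scores for each image.
    """
    total_masks = 0
    images_with_detection = 0
    for confidences in confidences_by_image.values():
        # Keep only the instances that reach the confidence threshold.
        valid = [c for c in confidences if c >= threshold]
        total_masks += len(valid)
        if valid:
            images_with_detection += 1
    return total_masks, images_with_detection


if __name__ == "__main__":
    # Illustrative scores only, not data from the study.
    example = {"img_001.jpg": [0.95, 0.91, 0.50], "img_002.jpg": [0.89], "img_003.jpg": []}
    print(summarize_detections(example))
```

Aggregating these two counts per capture date would give the kind of per-day summary of masks and valid images discussed in the following paragraphs.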
It is important to note that the model was trained using images in which pigs were in a standing posture and without significant occlusion, as accurate dorsal area extraction requires a clear and unobstructed view of the animal. Therefore, postures such as lying animals or strong overlap between individuals fall outside the intended application domain of the proposed approach and were not explicitly evaluated.
Figure 6 presents, for each capture date, the total number of generated masks and the number of images containing at least one valid detection. The model maintained consistent performance throughout the analyzed period, with a high number of segmented instances, particularly on days with a larger volume of images, such as 29 December and 31 December. Although natural variations across dates were observed—associated with differences in image quantity, animal density, and acquisition conditions—a substantial proportion of images contained at least one valid detection.
These results indicate that the model remains robust and stable under real-world, large-scale inference conditions, supporting its suitability for continuous automated monitoring applications, such as morphometric feature extraction and subsequent pig body weight prediction.
During the first few days of the evaluation period, fewer images with valid detections were observed, where a valid detection means at least one segmented mask with ≥90% confidence. This behavior can be attributed to the fact that, in the initial stages of growth, piglets tend to remain grouped together, with frequent social interactions and playful behaviors. This body proximity makes individual segmentation of the animals difficult, reducing the model’s ability to accurately delineate the backs for classification and for the extraction of morphological characteristics. In addition, the high confidence threshold adopted in this study contributes to filtering out detections under these conditions.
Similarly, on the final dates of the analyzed period, the number of valid detections declined again. This behavior is possibly associated with the larger physical size of the pigs at this stage, coupled with an increase in sedentary behaviors, such as prolonged periods of rest in lateral or sternal recumbency. Such postures reduce the visibility of the dorsal region and individual markings, compromising the effective segmentation of instances, especially under the strict confidence threshold applied.
On the other hand, the largest number of successful detections was concentrated between the end of December and the beginning of January. This interval coincides with an intermediate phase of animal growth, in which body size already allows for clear identification of dorsal markings, without prolonged resting behavior yet being predominant. Thus, this period constitutes a particularly favorable time window for the application of automatic segmentation, resulting in a high success rate and a greater number of masks generated. These results highlight the potential of the proposed model for applications in real zootechnical environments and reinforce the importance of temporal analyses to guide the implementation and optimization of the system throughout the different phases of the production cycle.
Figure 7 illustrates visual examples of the segmentation performed by the model in different phases of pig development. The columns represent three distinct periods: beginning (animals still small and very active), intermediate (growth and greater morphological definition), and end (large animals, with reduced movements). It is noted that the model’s performance remains adequate throughout the phases, despite the observed morphological and behavioral differences.
Beyond the visualization of individual pig segmentations and classifications, the masks generated by the model are fundamental for quantifying the animals’ body areas. These masks were used to compute the number of segmented white pixels per image through a Python function that converts the mask into binary format and performs pixel counting. Based on these counts, the relative dorsal area per day was estimated, enabling the application of a body mass prediction equation.
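The pixel-counting step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `count_white_pixels`, the 8-bit grayscale input, and the binarization threshold of 127 are choices made for the example and are not taken from the study.

```python
from typing import Iterable, Sequence


def count_white_pixels(mask: Iterable[Sequence[float]], threshold: float = 127) -> int:
    """Binarize a grayscale mask and count its foreground (white) pixels.

    `mask` is any 2D structure that yields rows of pixel intensities
    (nested lists here; a 2D array that iterates row by row also works).
    Pixels strictly above `threshold` are treated as white.
    """
    return sum(1 for row in mask for value in row if value > threshold)


if __name__ == "__main__":
    # Tiny illustrative mask: 255 = segmented pig, 0 = background.
    toy_mask = [
        [0, 255, 255],
        [0, 128, 127],
    ]
    print(count_white_pixels(toy_mask))
```

With a NumPy array, the same count could be obtained with `int((mask > threshold).sum())`; the pure-Python version is used here only to keep the sketch dependency-free.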
Similar findings were reported by [18], who emphasized the importance of generating binary masks after segmentation to accurately isolate the pig body, eliminate background noise, and facilitate the calculation of morphometric parameters. However, unlike the method proposed by [18], the approach presented in this study employs modern neural networks to perform segmentation automatically, without additional manual steps, making the process more robust and scalable.
Figure 8 illustrates examples of segmented images at different growth stages (small, medium, and large), along with their corresponding binary masks, highlighting the model’s ability to accurately identify the dorsal regions of the pigs.
Ref. [8] developed the PigMS R-CNN model to improve pig segmentation in group-housing environments, specifically addressing situations involving overlap between individuals. Although the focus of [8] was on detection accuracy and the separation of adjacent pigs, the present approach leverages the resulting segmentations for body area computation and subsequent body mass prediction, thereby extending the practical application of instance segmentation techniques.
Figure 9 presents the evolution of the mean segmented area (in pixels) over the monitoring period, considering the ten classes previously defined based on the animals’ body characteristics. For each day, the mean segmented area was calculated for all individuals belonging to each class, using the binary masks automatically generated by the YOLOv11 model. This approach enables continuous and non-invasive monitoring of relative body growth, providing relevant information on animal development throughout the production cycle.
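The daily aggregation described above can be sketched as a simple group-by over per-mask pixel counts. The record layout `(date, class_id, area_px)` and the function name are assumptions made for illustration; the values in the usage example are invented and are not results from the study.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# One record per segmented mask: (capture date, class identifier, area in pixels).
Record = Tuple[str, int, int]


def mean_area_per_day_and_class(records: Iterable[Record]) -> Dict[Tuple[str, int], float]:
    """Average the segmented area of all masks sharing a date and a class."""
    grouped: Dict[Tuple[str, int], List[int]] = defaultdict(list)
    for date, class_id, area_px in records:
        grouped[(date, class_id)].append(area_px)
    return {key: mean(values) for key, values in grouped.items()}


if __name__ == "__main__":
    toy_records = [
        ("29 December", 1, 1000),
        ("29 December", 1, 1200),
        ("29 December", 2, 900),
        ("31 December", 1, 1300),
    ]
    for (date, class_id), area in sorted(mean_area_per_day_and_class(toy_records).items()):
        print(date, class_id, area)
```

Because each daily mean depends on how many masks were accepted that day, days with few valid detections yield noisier averages.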
A progressive increase in the mean segmented area, in pixels, can be observed over the monitoring days for all analyzed classes, directly reflecting the body growth of the pigs throughout the production cycle. It is important to emphasize that classes numbered from 1 to 10 correspond exclusively to identifiers assigned to individuals during the segmentation process and therefore do not represent an increasing order of body weight or size. Nevertheless, the consistent upward trend in the segmented areas over time indicates that the model was able to robustly track the animals’ morphological development.
The small fluctuations observed between consecutive days may be attributed to factors inherent to animal behavior and image acquisition conditions, such as intense movement, overlapping between individuals, and unfavorable positioning during image capture, all of which directly affect segmentation quality. These aspects highlight the importance of continuous analyses and time-series evaluation to properly assess the model’s performance throughout the production cycle.
It is worth noting that on 17 January, a reduction in the mean segmented area was recorded compared to adjacent days, particularly for some specific classes. This behavior may be associated with a lower number of available images during this period, increased clustering and interaction among animals, or localized detection and segmentation failures for certain individuals. Since the classes represent only identifiers and not a weight-based hierarchy, it is plausible that occasional segmentation inconsistencies temporarily influenced the estimated average pixel area on that date.
3.3. Results of the Prediction Models
Figure 10 shows the evolution of the average weight of the pigs over the days, comparing the actual values obtained experimentally, the predictions of the Linear Regression model and the MLP Neural Network, as well as the reference curve used to monitor growth. It can be observed that both models consistently followed the trend of daily weight gain, with small variations between them. The standard deviation bars indicate the variability of the predictions for each day, while the reference line serves as an additional parameter to validate the expected behavior of the animals during the analyzed period.
The performance metrics for these models are shown in Table 3.
Previous studies, such as that of [3], investigated the estimation of pig body weight from 2D images using fully connected neural networks, relying on the manual extraction of geometric features, such as curvature and deviation, to compensate for postural variations in the animals. Although the authors achieved a coefficient of determination of R² = 0.79, the method exhibited greater variability in the errors and required significant manual intervention during the segmentation process. In contrast, in the present study, the direct use of the segmented area as a predictive variable, applied to both Linear Regression and MLP models, resulted in superior performance, with R² values of 0.96 and 0.95, respectively, as well as reduced average errors (RMSE ≤ 1.63 kg and MAE ≤ 1.25 kg), as shown in Table 3. These findings indicate that the direct use of segmented masks provides a more stable morphometric representation and is less sensitive to noise associated with animal posture and movement.
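For reference, the three metrics quoted above follow their standard definitions, which can be sketched as below. The function name `regression_metrics` is hypothetical, and the numbers in the usage example are illustrative values, not measurements from this study.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute R^2, RMSE, and MAE from observed and predicted weights (kg)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


if __name__ == "__main__":
    observed = [10.0, 20.0, 30.0]   # illustrative weights in kg
    predicted = [11.0, 19.0, 30.0]
    print(regression_metrics(observed, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, so a gap between the two values suggests that a few predictions deviate substantially from the observed weights.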
The temporal analysis of the mean daily weight predictions further reinforces this behavior, as both models consistently track the growth trend observed in the mean real weight throughout the experimental period. It is also observed that Linear Regression achieved slightly superior performance in terms of R² and absolute errors, whereas the MLP exhibited lower relative dispersion of predictions, reflected by a smaller percentage standard deviation (Std = 11.88%). This result suggests that, although the linear model more directly captures the global relationship between segmented area and body weight, the MLP may offer greater stability in the presence of intra-class variability observed in the images.
Additionally, Ref. [17] evaluated different machine learning algorithms for estimating pig body weight from 2D images and identified XGBoost as the best-performing model, achieving an MAE of 3.93 kg. Despite employing a broader set of predictive variables, the models proposed in the present study, based exclusively on the segmented area, achieved mean errors close to 1 kg, even under real farming conditions characterized by high behavioral and environmental variability. These findings indicate that, despite the structural simplicity of the adopted models, morphometric information derived directly from segmented masks constitutes a highly efficient, robust, and competitive predictor for estimating pig body weight.
Although Multilayer Perceptron (MLP) networks are capable of modeling complex nonlinear relationships, the results obtained in this study indicate that Linear Regression performed similarly to, and in some cases slightly better than, the MLP. This behavior can be explained by the fact that the relationship between segmented dorsal area and pig body weight exhibited a predominantly linear trend throughout the analyzed period.
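A single-predictor linear model of this kind can be fitted in closed form by ordinary least squares. The sketch below is illustrative only: the function names are hypothetical, the data are invented, and the study's own fitting procedure is not described in this section and may differ.

```python
from typing import Sequence, Tuple


def fit_line(area_px: Sequence[float], weight_kg: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of weight = slope * area + intercept."""
    n = len(area_px)
    if n < 2 or n != len(weight_kg):
        raise ValueError("need at least two paired observations")
    mean_x = sum(area_px) / n
    mean_y = sum(weight_kg) / n
    sxx = sum((x - mean_x) ** 2 for x in area_px)
    if sxx == 0:
        raise ValueError("all area values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(area_px, weight_kg))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_weight(area_px: float, slope: float, intercept: float) -> float:
    """Apply the fitted line to a new segmented area."""
    return slope * area_px + intercept


if __name__ == "__main__":
    # Invented, perfectly linear toy data (pixels, kg).
    slope, intercept = fit_line([1000, 2000, 3000], [10.0, 20.0, 30.0])
    print(slope, intercept, predict_weight(2500, slope, intercept))
```

With only two parameters to estimate, such a model has little capacity to overfit, which is consistent with the argument developed in the following paragraphs.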
In scenarios where there is a strong linear correlation between the predictor variable and the response variable, linear models tend to achieve competitive or superior performance compared to more complex models, particularly when the dataset is limited or structural variability is controlled [19,20,21]. In such situations, the use of more complex models may not yield significant gains in accuracy and may further increase the risk of overfitting.
Thus, the observed results corroborate findings in the literature indicating that, when the underlying relationship between variables is approximately linear, linear methods provide good generalization capability, greater interpretability, and lower computational cost, making them particularly suitable for practical applications in animal monitoring systems based on computer vision [22].
The next stage of the analysis aims to individually evaluate the performance of the Linear Regression and MLP Neural Network models for each of the ten identified pig classes.
Figure 11 presents the relationship between the mean observed weight and the predicted weights for each class. The plots allow visualization of the models’ ability to track weight variations specific to each individual, highlighting possible discrepancies within certain weight ranges and demonstrating the level of fit achieved by each approach.
Table 4 presents the performance of the Linear Regression and Multilayer Perceptron (MLP) Artificial Neural Network models in predicting body weight for each of the 10 classes analyzed individually. Overall, both models achieved high coefficients of determination (R²), ranging from 0.91 to 0.98, indicating strong agreement between the observed weight values and the estimated values. The associated errors, expressed by RMSE and MAE, remained low for most classes, demonstrating the good predictive capability of the models even when evaluated on an individual basis.
In particular, Classes 7 and 9 exhibited the lowest RMSE and MAE values in both models, reflecting greater estimation accuracy. In contrast, Class 8 showed inferior performance, with the lowest R² values and the highest prediction errors. Although this class presents a comparable number of samples, its behavior suggests increased estimation difficulty associated with greater variability in the relationship between dorsal area and body weight. Based on the temporal analysis of dorsal area (in pixels), Class 8 tends to exhibit consistently higher and more dispersed area values over time compared to other classes, as reflected in the broader distribution of pixel values throughout the monitoring period, without a proportional improvement in prediction accuracy. This pattern indicates increased variability in the area–weight relationship, particularly at higher weight ranges, where prediction errors become more pronounced. It therefore points to a less consistent mapping between input features and target values for this class rather than to a limitation of the model itself. Furthermore, segmentation metrics remained consistently high across all classes, and no direct correspondence was observed between segmentation performance and prediction error, suggesting that the observed variability is more strongly associated with biological and morphometric differences than with segmentation quality. Classes 5 and 10 also presented relatively higher errors compared to the others, although still within an acceptable range, further reinforcing the overall robustness of the proposed approach.
The results obtained for Class 8 also highlight an important aspect of the proposed approach related to the use of 2D dorsal area as a predictor of body weight. Although dorsal area proved to be a strong proxy for animal size and demonstrated high overall performance in this study, under specific conditions it may not fully capture variations associated with body volume, which is more directly related to mass. As a result, differences in body conformation, thickness, and mass distribution may introduce additional variability in the relationship between projected area and actual weight. This behavior becomes more evident in certain individuals or growth stages, as observed for Class 8, where greater dispersion in prediction errors was identified. Nevertheless, the overall results indicate that the approach is robust under the evaluated conditions, and the integration of complementary information, such as depth data, may represent a promising direction for further refinement.
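The distinction between projected area and volume can be made explicit with a simple scaling argument. The relations below are a sketch under assumptions that are not part of the study: animals are treated as geometrically similar bodies of uniform density $\rho$, with characteristic length $L$, projected dorsal area $A$, volume $V$, mass $M$, and mean body depth $\bar{h}$ beneath the projected area.

```latex
% Assumption: geometric similarity and uniform density (idealization, not from the study).
A \propto L^{2}, \qquad V \propto L^{3}, \qquad M = \rho V
\;\Longrightarrow\; M \propto A^{3/2}

% Without assuming similarity, mass depends on both area and mean body depth:
M \approx \rho \, A \, \bar{h}
```

Under this idealization, two animals with the same dorsal area but different mean body depth would differ in mass in proportion to $\bar{h}$, which a 2D projection cannot observe. Over a limited range of areas, a power law of this kind can still be closely approximated by a straight line, so the argument does not conflict with the predominantly linear trend reported here.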
Following the presentation of Table 4, which details the performance of the models for each class individually, Figure 12 illustrates the overall relationship between the average segmented area of the pigs (in pixels) and the corresponding average weight obtained from the model predictions. A clear linear trend is observed, showing that the increase in area is directly associated with the weight gain of the animals. The prediction curves obtained by Linear Regression and the MLP Neural Network consistently follow the trend of the real data, with small discrepancies mainly at the upper weight extremes, possibly caused by a smaller number of samples or model saturation. This result reinforces the robustness of the segmented area as a predictor variable for weight, indicating that, even in a global model without separation by classes, it is possible to obtain good accuracy in estimating body weight from images.
Recent advances in computer vision for pig weight estimation show a transition from two-dimensional methods based on image segmentation to three-dimensional systems that extract biometric measurements and predict weight with greater accuracy. Studies such as those by [2,23] present 3D approaches using depth cameras and automated computer vision, achieving high R² values and low error margins by incorporating variables such as animal volume, area, and shape descriptors. More recent hybrid methods, such as that proposed by [24], explore the combination of 2D segmentation with depth sensors to efficiently generate 3D point clouds, reinforcing the trend toward more comprehensive three-dimensional analyses.
The results obtained in this study demonstrate that the proposed 2D approach, combining instance segmentation with Linear Regression and MLP models, achieves performance levels consistent with recent findings in the literature. Previous studies have shown that models based solely on RGB images can provide reliable weight estimation even without the use of 3D sensors [17]. In this context, the present work reinforces that well-structured 2D strategies remain competitive, particularly due to their lower cost and easier implementation in commercial production systems. At the same time, such approaches can serve as a foundation for future hybrid or three-dimensional solutions, following current technological trends in precision livestock farming.
Despite the limited number of animals evaluated in this study, continuous monitoring over time enabled the collection of a large and representative dataset, capturing variations in growth, posture, and interaction among individuals. These results highlight the importance of data diversity, indicating that the variability and quality of the input data are key factors influencing model performance in real production environments. It is important to note that the proposed model was developed and validated primarily using images of pigs in a standing posture, where the dorsal region is clearly visible and suitable for morphometric extraction. As animals grow, behavioral changes may reduce the proportion of standing postures, which can limit the direct applicability of the method under certain conditions. From a practical perspective, this aspect represents a relevant constraint for deployment in commercial systems, where animal posture is not controlled. Therefore, future studies should consider including a larger and more diverse population, as well as exploring the impact of different postures (e.g., lying or overlapping animals) on weight prediction performance. Additionally, the integration of complementary features beyond dorsal area, or the development of posture-specific models, may further improve robustness across varying production scenarios.
While the proposed approach demonstrated high accuracy in both segmentation and weight prediction tasks, computational performance aspects such as inference speed (FPS), memory consumption, and model size were not evaluated, as they were beyond the scope of this study. These factors are critical for real-time deployment in embedded and resource-constrained systems, particularly in practical precision livestock farming scenarios. From an application perspective, evaluating and optimizing these parameters is essential to ensure scalability and operational feasibility. Therefore, future work should focus on benchmarking the model under such conditions and optimizing its architecture to achieve efficient real-time performance. Additionally, a comparative evaluation of different YOLOv11 model scales was not conducted and remains an important direction for future research, aiming to balance predictive accuracy and computational efficiency.
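As a starting point for the benchmarking proposed above, inference speed could be measured with a simple timing harness such as the one below. This is a generic sketch, not a procedure used in the study: `measure_fps` and its parameters are hypothetical, and `infer` stands for any callable that runs the model on one image.

```python
import time
from typing import Callable, Iterable


def measure_fps(infer: Callable[[object], object], inputs: Iterable[object], warmup: int = 1) -> float:
    """Estimate throughput (items per second) of `infer` over `inputs`.

    A few warm-up calls are made first and excluded from the timing, since
    the first inference is often slower (model loading, cache effects).
    """
    items = list(inputs)
    if not items:
        raise ValueError("inputs must contain at least one item")

    for item in items[:warmup]:
        infer(item)

    start = time.perf_counter()
    for item in items:
        infer(item)
    elapsed = time.perf_counter() - start

    return len(items) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would call the segmentation model here.
    def dummy_inference(_image: object) -> int:
        return sum(range(10_000))

    print(f"{measure_fps(dummy_inference, range(100)):.1f} items/s")
```

Results from such a harness depend strongly on hardware, input resolution, and batch size, so these conditions would need to be reported alongside any FPS figure.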