1. Introduction
In recent years, artificial intelligence (AI), and deep learning in particular, has led to significant advancements in numerous fields, including veterinary medicine, human medicine, and the forensic sciences. Deep learning architectures such as convolutional neural networks (CNNs) have been successfully applied in medical imaging with high accuracy, contributing to fracture detection, organ segmentation, and tumor classification [
1,
2]. In veterinary medicine, AI-based systems have begun to be used for the analysis of radiographic images, enabling fracture diagnosis, abnormality detection, and early identification of certain internal diseases [
3,
4].
Recent developments in deep learning have led to significant improvements in bone detection and classification tasks, particularly through the use of YOLO (You Only Look Once)-based object detection algorithms. Tariq and Choi [
5] demonstrated that an enhanced YOLO11 architecture could detect and localize wrist fractures in X-ray images with high precision, underscoring the diagnostic power of real-time convolutional networks in skeletal image analysis.
Research on deep learning has also enabled efficient segmentation of skeletal structures in veterinary imaging. For instance, Kvam et al. [
6] demonstrated the feasibility of using CNNs to automate the identification and segmentation of pig skeletons from CT scans, highlighting the growing potential of AI in animal anatomy research.
Recent advances in deep learning have enabled accurate assessments of skeletal development and trauma timing, even in non-human species. For example, Ergün and Güney [
7] demonstrated the use of convolutional neural networks for classifying bone maturity and fracture age in canine long bones using radiographic images.
Multi-view deep learning models have shown promise in improving the accuracy of radiographic interpretation in veterinary medicine in recent years. For instance, Dourson et al. [
8] developed the VET-DINO system, which demonstrated enhanced classification performance by leveraging multi-angle images in the analysis of veterinary radiographs.
However, the use of AI in the classification and automated identification of animal bones remains limited. Most existing studies have focused on human medicine, often addressing deformation analysis of human bones [
9].
In veterinary anatomical education, osteology holds critical importance for enabling students to learn bone structures and species-specific anatomical features. However, limitations in hands-on laboratory instruction and dependence on instructor availability may hinder the learning process. It has been reported that receiving real-time feedback can improve retention in learning by up to 40% [
10]. AI-supported mobile applications address this need by promoting individualized learning and facilitating digital transformation in educational environments [
11,
12].
The proliferation of mobile applications in anatomy education has opened new avenues for student engagement; however, the pedagogical rigor and scientific credibility of these tools vary widely. Rivera García et al. [
13] emphasized that while many anatomy apps are popular and enhance accessibility, few are developed within academic contexts or validated through structured evaluation methods, and their anatomical accuracy often remains questionable.
Recent large-scale studies have shown that screen-based 3D and augmented reality tools can significantly improve student engagement and learning experience in anatomy education [
14]. These tools provide spatial understanding and interactivity, which are especially useful in visual-heavy disciplines like anatomy.
Recent advances in computer vision have also yielded cloud-based web applications such as ShinyAnimalCV, which facilitate object detection, segmentation, and 3D visualization of animal data [
15]. This tool integrates pre-trained vision models and user-friendly interfaces to democratize access to image analysis methods for educators and students alike.
Interactive and augmented reality (AR) tools have emerged as powerful resources in veterinary anatomy education, offering immersive and intuitive experiences that surpass traditional teaching methods. Christ et al. [
16] demonstrated this potential by developing a mobile AR application for canine head anatomy, highlighting the feasibility of extending such digital workflows into veterinary curricula.
Such AR-based technologies have been shown to enrich anatomical education by offering interactive and spatially intuitive experiences. Jiang et al. [
17] developed an AR-based canine skull model that effectively supported veterinary students’ learning without compromising comprehension when compared to traditional methods.
Conversational AI tools such as ChatGPT have opened new horizons for interactive learning in veterinary anatomy, providing instant explanatory feedback to students [
18]. These chatbots have been shown to enhance anatomical knowledge retention while highlighting the continuing importance of hands-on dissection practices.
In disciplines such as forensic science, archaeology, and crime scene investigation, the rapid and accurate determination of whether a bone belongs to a human or an animal is of paramount importance. However, the absence of experts in field situations can delay the process and increase the likelihood of errors [
19,
20]. In such cases, AI-based systems may serve as supportive tools that augment expert decision-making without replacing human specialists.
The aim of this study was to train an artificial intelligence system using image processing methods to recognize the scapula, humerus, and femur of cattle, horses, and dogs, and to evaluate the system’s performance in identifying these bones through a custom-developed application.
2. Materials and Methods
2.1. Study Design and Data Collection
In this study, scapula, humerus, and femur belonging to cattle (Bos taurus), horses (Equus caballus), and dogs (Canis familiaris) were utilized. The bone images were obtained from specimens available in the Anatomy Laboratory of the Faculty of Veterinary Medicine at Erciyes University.
Two stages were defined during the preparation of the dataset. In the first stage, the YOLO model was trained to distinguish bone structures from other objects using a limited number of images containing bones from various species, including some not present in our main dataset (
Table 1). Because this detector automatically localized and cropped each bone, no additional manual cropping was required when creating the classification dataset.
The trained YOLO model was used to scan bone images placed sequentially on a table from various angles, and the cropped images were recorded under the corresponding animal class (
Figure 1).
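For illustration, the following sketch shows how such a detection-and-cropping step could be scripted with a trained YOLOv5 model loaded through torch.hub; the weight and folder paths, class folder name, and confidence threshold are assumptions rather than the exact pipeline used in this study.

```python
# Illustrative sketch (assumed paths, class folder, and threshold): use the trained
# YOLOv5 detector to crop bone regions from raw photographs into a class folder.
from pathlib import Path

import torch
from PIL import Image

# Load custom-trained YOLOv5 weights through torch.hub (ultralytics/yolov5 repo).
model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="runs/train/exp/weights/best.pt")

raw_dir = Path("raw_photos/cattle_femur")   # photographs of one bone-species class
out_dir = Path("dataset/cattle_femur")      # destination for the cropped images
out_dir.mkdir(parents=True, exist_ok=True)

for img_path in sorted(raw_dir.glob("*.jpg")):
    results = model(str(img_path))          # run detection on one photograph
    image = Image.open(img_path)
    for j, (x1, y1, x2, y2, conf, cls) in enumerate(results.xyxy[0].tolist()):
        if conf < 0.5:                      # keep confident detections only
            continue
        crop = image.crop((int(x1), int(y1), int(x2), int(y2)))
        crop.save(out_dir / f"{img_path.stem}_{j}.jpg")
```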
A total of 26,148 bone images were collected. Of these, 24,700 images were used for training and testing the model, while the remaining images were reserved for external validation. A systematic data collection protocol was implemented to ensure balanced representation across all classes. Regardless of the number of available physical bone specimens, exactly 2744 images were acquired for each bone–species combination (
Table 2). For categories with fewer bone samples (e.g., dog humerus: 38 bones), multiple images were captured from different angles, lighting conditions, and positions to reach the target number. For categories with more bone samples (e.g., cattle scapula: 62 bones), fewer images per bone were taken while maintaining diversity. This approach ensured equal representation across all nine classes and prevented model bias toward categories with a higher number of physical specimens.
All images were captured at 720p resolution and were subsequently cleaned to remove outliers. Care was taken to balance the dataset according to bone type and species. To enhance model performance and generalizability, data augmentation techniques such as brightness adjustment, rotation, cropping, and horizontal flipping were applied.
2.2. Image Processing and Annotation
The bone images were initially processed using the YOLOv5 algorithm, which automatically detected the relevant regions of interest. For the annotation process, the Labelme platform was utilized (
Figure 2), and distinct anatomical features of each bone (e.g.,
processus hamatus) were manually marked. The annotated data were exported in COCO JSON (Common Objects in Context) format for further use.
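As a simple illustration, the exported annotation file can be inspected with standard tooling; the file name below is hypothetical.

```python
# Quick sanity check of the exported COCO annotations (the file name is illustrative).
import json

with open("annotations/bones_coco.json") as f:
    coco = json.load(f)

print("images:", len(coco["images"]))
print("annotations:", len(coco["annotations"]))
print("categories:", [c["name"] for c in coco["categories"]])
```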
2.3. Deep Learning Architecture and Model Training
The dataset was divided into three subsets: training (85%), testing (10%), and validation (5%). Model training was conducted using the Python (version 3.12.0) programming language and the PyTorch (version 2.8.0) framework. In the initial stage, bone detection was performed using the YOLO algorithm. Subsequently, bone name and species classification were carried out using CNN architectures. The training process lasted approximately seven hours in total.
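A minimal sketch of such an 85/10/5 split is given below, assuming the cropped images are organized in class folders readable by torchvision's ImageFolder; the folder layout and random seed are illustrative.

```python
# Minimal sketch of the 85/10/5 split; the folder layout and seed are illustrative.
import torch
from torchvision import datasets, transforms

dataset = datasets.ImageFolder("dataset", transform=transforms.ToTensor())

n_total = len(dataset)
n_train = int(0.85 * n_total)
n_test = int(0.10 * n_total)
n_val = n_total - n_train - n_test          # remainder (~5%) for validation

train_set, test_set, val_set = torch.utils.data.random_split(
    dataset, [n_train, n_test, n_val],
    generator=torch.Generator().manual_seed(42),
)
```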
2.3.1. Model Selection and Rationale
In this study, a two-stage approach was adopted for bone detection and classification. In the first stage, the YOLOv5 algorithm was employed for object detection. In the second stage, species and bone type classification were performed using the ResNet34, SmallCNN, and AlexNet architectures, which were comparatively evaluated.
YOLO Architecture Configuration
YOLO algorithms are known for performing both object localization and classification tasks with high speed and accuracy using a single-stage approach.
In this study, the YOLOv5s (small) variant was selected for its balance between detection accuracy and computational efficiency, which makes it particularly suitable for mobile deployment. Two main factors influenced the selection of the YOLOv5 model. First, YOLOv5 integrates the Cross-Stage Partial Network (CSPNet) structure into Darknet, establishing CSPDarknet as its backbone architecture [
21]. CSPNet addresses the problem of duplicated gradient information encountered in large-scale backbone architectures by incorporating gradient variations into the feature maps. This reduces the number of model parameters and floating-point operations (FLOPs) while maintaining inference speed and accuracy, effectively minimizing model size. Speed and accuracy are crucial in the detection of bone structures.
Secondly, to enhance information flow, YOLOv5 employs a Path Aggregation Network (PANet) in its neck section [
22]. PANet improves the propagation of low-level features through a bottom-up pathway and introduces a new Feature Pyramid Network (FPN) topology. In addition, an adaptive feature pooling method connects the feature grid across all feature levels, allowing meaningful information from each level to be passed on to the subsequent subnetworks. Finally, the YOLO layer, which forms the head of YOLOv5, generates feature maps at three different scales for multi-scale prediction, allowing the model to process small, medium, and large objects effectively (
Figure 3).
CNN Classification Architectures
ResNet34
ResNet34 is a 34-layer deep convolutional neural network architecture developed by Microsoft Research [
15]. This architecture utilizes residual connections to address the vanishing gradient problem commonly encountered in deep networks. The key factors influencing the selection of ResNet34 in this study are as follows:
The residual learning approach learns residual functions with reference to the layer inputs instead of directly modeling the desired underlying mapping, as conventional CNNs do. It is based on the principle that optimizing the residual mapping is easier than optimizing the original, unreferenced mapping; in the extreme case where an identity mapping is optimal, pushing the residual toward zero is easier than fitting an identity function with a stack of nonlinear layers. Through skip connections, information can be transferred directly from lower to higher layers, facilitating better gradient propagation.
The deep structure of ResNet34 makes it a strong candidate for recognizing the complex features of bone morphology. It enables multi-level feature extraction, which is essential for capturing the fine details of anatomical structures (
Figure 4).
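As a sketch, the torchvision implementation of ResNet34 can be adapted to the nine bone-species classes by replacing its final fully connected layer; the use of ImageNet-pretrained weights here is an assumption for illustration only.

```python
# Sketch: adapt torchvision's ResNet34 to the nine bone-species classes by
# replacing the final fully connected layer (pretrained weights are an assumption).
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 9  # 3 bones x 3 species

model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
```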
SmallCNN
SmallCNN is a lightweight and efficient convolutional neural network architecture designed specifically for this study. It was developed to enable fast inference on mobile devices and to achieve high performance under limited computational resources. SmallCNN is based on the principle of achieving maximum performance with a minimal number of parameters, which avoids unnecessary complexity in relatively simple visual recognition tasks such as bone classification, thereby reducing the risk of overfitting and enhancing generalization capability. The developed architecture comprises a total of 387,000 parameters (
Figure 5).
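The listing below is an illustrative sketch of a compact classifier in the spirit of SmallCNN; the layer widths and depth are assumptions and are not intended to reproduce the exact 387,000-parameter architecture.

```python
# Illustrative compact classifier in the spirit of SmallCNN; layer widths and depth
# are assumptions and do not reproduce the exact 387,000-parameter design.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling keeps the head tiny
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x).flatten(1)
        return self.classifier(x)

model = SmallCNN()
print(sum(p.numel() for p in model.parameters()))   # rough parameter count
```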
AlexNet
AlexNet is a convolutional neural network architecture developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012 [
25]. This architecture won first place in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 with a top-5 error rate of 15.3%, marking the beginning of the deep learning revolution in computer vision. The primary reason for selecting AlexNet as a comparative architecture in this study is its pioneering role in introducing techniques that laid the foundation for modern CNN architectures (
Figure 6).
The use of the ReLU activation function instead of sigmoid and tanh functions reduced the vanishing gradient problem, enabling faster training. The incorporation of the dropout technique addressed the overfitting problem, particularly enhancing the model’s generalization performance on limited datasets. The AlexNet architecture consists of eight layers: five convolutional layers and three fully connected layers. With approximately 60 million parameters, it provides sufficient learning capacity to capture distinctive features of bone structures without introducing excessive complexity.
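For comparison, AlexNet can likewise be adapted from its torchvision implementation by resizing the last fully connected layer to nine outputs; initializing from ImageNet weights is again an assumption for illustration.

```python
# Sketch: reuse torchvision's AlexNet and resize its last fully connected layer
# to nine outputs (ImageNet initialization is an assumption).
import torch.nn as nn
from torchvision import models

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 9)
```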
2.3.2. Training Configuration and Hyperparameters
During the training of the YOLOv5s model, the input image size was set to 640 × 640 pixels. Although the training was initially scheduled for 150 epochs, an early stopping mechanism was triggered at the 65th epoch due to the stabilization of performance metrics and to prevent the risk of overfitting. The Adam optimizer was employed with an initial learning rate of 0.01 and a momentum value of 0.937. To enhance the accuracy of bone detection, several data augmentation techniques were applied: mosaic augmentation (100%) combined multiple bone samples into a single image to improve the model’s generalization capability; the mix-up technique (15%) mitigated overfitting by smoothing transitions between classes; and manipulations in the HSV color space (H: 0.015, S: 0.7, V: 0.4) improved robustness against varying lighting conditions. All hyperparameters are detailed in
Table 3.
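A hedged example of how the YOLOv5 training script could be invoked with these settings from Python is shown below; the repository path and dataset YAML name are assumptions, and the remaining values would be supplied through the hyperparameter configuration summarized in Table 3.

```python
# Hypothetical invocation of the YOLOv5 training script with the settings above;
# the repository path and dataset YAML name are assumptions, and the remaining
# values would come from the hyperparameter file summarized in Table 3.
import subprocess

subprocess.run([
    "python", "yolov5/train.py",
    "--img", "640",
    "--epochs", "150",
    "--data", "bones.yaml",        # dataset configuration (assumed name)
    "--weights", "yolov5s.pt",     # start from the small variant
], check=True)
```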
Model training was conducted using standardized hyperparameters to ensure a fair comparison across all CNN architectures. All models were trained using the Adam optimizer with a learning rate of 0.001, where the β1 and β2 parameters were set to 0.9 and 0.999, respectively. To optimize memory usage, the batch size was set to 32, and training was carried out for 100 epochs. Early stopping with a patience value of 15 was applied to prevent overfitting.
To ensure L2 regularization across all models, a weight decay value of 1e-4 was applied. The learning rate was systematically reduced using the StepLR scheduler with a step size of 30 epochs and a gamma factor of 0.1. To address potential class imbalance issues within the dataset, the cross-entropy loss function with balanced class weighting was utilized.
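The following sketch gathers these settings into a single training routine; the model, data loaders, and class weights are placeholders passed in by the caller, so this is an outline of the configuration rather than the exact training code used in the study.

```python
# Outline of the shared training configuration; model, loaders, and class weights
# are placeholders passed in by the caller, not the study's exact training code.
import torch
import torch.nn as nn

def train_classifier(model, train_loader, val_loader, class_weights,
                     epochs=100, patience=15):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                                 betas=(0.9, 0.999), weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
    criterion = nn.CrossEntropyLoss(weight=class_weights)   # balanced class weighting

    best_val, bad_epochs = float("inf"), 0
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()

        # Validation pass used only for early stopping
        model.eval()
        val_loss, n = 0.0, 0
        with torch.no_grad():
            for images, labels in val_loader:
                val_loss += criterion(model(images), labels).item() * len(labels)
                n += len(labels)
        val_loss /= max(n, 1)

        if val_loss < best_val:
            best_val, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:        # early stopping (patience = 15)
                break
    return model
```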
Data augmentation strategies were applied consistently across all models. Brightness adjustment (±20%), random rotation (±15 degrees), and horizontal flipping with a 50% probability were performed. Images were initially resized to 256 × 256 pixels, followed by random cropping to 224 × 224 pixels. ImageNet normalization statistics were applied to the RGB channels, using mean values of [0.485, 0.456, 0.406] and standard deviation values of [0.229, 0.224, 0.225] (
Table 4).
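One way to express these augmentation and normalization settings with torchvision transforms is sketched below; it mirrors the values reported above but is not necessarily the exact pipeline used.

```python
# Training-time preprocessing mirroring the augmentation values reported above
# (not necessarily the exact pipeline used in this study).
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.RandomRotation(degrees=15),            # ±15 degrees
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2),           # ±20% brightness
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),  # ImageNet statistics
])
```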
The training was conducted on the Google Colab platform using an NVIDIA Tesla T4 GPU (16 GB VRAM), an Intel Xeon CPU, and 12 GB of system RAM. The implementation was developed using the PyTorch 1.12.0 framework with CUDA 11.6 support for cloud-based GPU acceleration.
2.4. Model Evaluation
The model’s performance was evaluated not only based on accuracy but also using precision, recall, and F1-score metrics. These metrics provided more reliable insights, particularly in the context of imbalanced datasets. To account for variations in sample size among species, class-weighted F1-scores were calculated. Additionally, the model’s accuracy was assessed using an independent test dataset.
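As an illustration, these metrics can be computed with scikit-learn from the true and predicted labels of the test set; the short label lists below are dummies included only to keep the example runnable.

```python
# Computing the reported metrics with scikit-learn; the short label lists are
# dummies standing in for the true and predicted test-set labels.
from sklearn.metrics import (accuracy_score, classification_report,
                             precision_recall_fscore_support)

y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted")              # class-weighted scores
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} F1={f1:.3f}")
print(classification_report(y_true, y_pred, digits=3))
```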
2.5. Mobile Application
The trained system was adapted for student use in a laboratory setting. Students open the web page via the provided link and photograph a bone using the “Take Photo” button within the app. They then press the “Analyze” button, and the system analyzes the bone and reports the result below the image. If they wish, students can also access further details about the identified bone within the app.
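A hypothetical sketch of the server-side “Analyze” step is given below as a small Flask endpoint; the framework choice, route name, weight file, and class list are assumptions and do not describe the deployed application.

```python
# Hypothetical server-side "Analyze" endpoint; the framework (Flask), route name,
# weight file, and class list are assumptions, not the deployed implementation.
import io

import torch
import torch.nn as nn
from flask import Flask, jsonify, request
from PIL import Image
from torchvision import models, transforms

CLASS_NAMES = ["cattle_scapula", "cattle_humerus", "cattle_femur",
               "horse_scapula", "horse_humerus", "horse_femur",
               "dog_scapula", "dog_humerus", "dog_femur"]

# Rebuild the classifier and load trained weights (file name is illustrative).
model = models.resnet34(weights=None)
model.fc = nn.Linear(model.fc.in_features, len(CLASS_NAMES))
model.load_state_dict(torch.load("smart_osteology_resnet34.pt", map_location="cpu"))
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

app = Flask(__name__)

@app.route("/analyze", methods=["POST"])
def analyze():
    image = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    with torch.no_grad():
        probs = torch.softmax(model(preprocess(image).unsqueeze(0)), dim=1)[0]
    top = int(probs.argmax())
    return jsonify({"class": CLASS_NAMES[top], "confidence": float(probs[top])})

if __name__ == "__main__":
    app.run()
```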
2.6. Student Surveys
The surveys included informed voluntary consent forms, questions related to the study topic, a brief section providing information about the research, and participants’ feedback on learning outcomes after using the application.
The 45 students who participated in the survey and application trial were actively enrolled in the anatomy course. Having these students, who were new to the learning process, try the system and comment on it was expected to demonstrate its contribution to addressing learning difficulties. The application and survey were then administered to a further 105 students, and the final data set was compiled accordingly. These 105 students were upper-division students who had already passed the anatomy course, which prevented any academic pressure or bias.
An anonymous survey was conducted with 150 students who used our mobile application. The purpose of the survey was to assess the students’ level of interest in such mobile applications and their willingness to use them over an extended period. Additionally, the survey aimed to evaluate the functionality and user-friendliness of the application. The survey included five Likert-scale questions. Students responded using a 5-point Likert-type scale, with 1 indicating “strongly disagree,” 2 indicating “disagree,” 3 indicating “neutral,” 4 indicating “agree,” and 5 indicating “strongly agree.”
2.7. Statistical Analysis
The collected data were analyzed using IBM SPSS Statistics software, version 28.0 (IBM Corp., Armonk, NY, USA), and the threshold for statistical significance was set at p < 0.05. Mean values and standard deviations (SDs) were reported for satisfaction survey data and demographic characteristics. The normality of data distribution was assessed using the Shapiro–Wilk test. For data that did not show a normal distribution, non-parametric tests were preferred. Satisfaction scores across five different academic levels (from first-year to fifth-year students) were compared using the Kruskal–Wallis H test. The Mann–Whitney U test was applied to compare mean satisfaction scores between gender groups. Effect size calculations were reported using Cohen’s d and eta-squared (η2) values. The internal consistency of the survey was evaluated using Cronbach’s α, and the suitability of the factor analytic model was tested using the Kaiser–Meyer–Olkin (KMO) measure and Bartlett’s test.
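For reproducibility, the main non-parametric tests and Cronbach’s α can be computed as sketched below with SciPy and NumPy; the data frame is a small synthetic stand-in, and the column names are illustrative placeholders rather than the actual survey export.

```python
# Sketch of the main statistical tests; the DataFrame below is a small synthetic
# stand-in for the survey export (columns: year, gender, five Likert items).
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
n = 150
df = pd.DataFrame({
    "year": rng.integers(1, 6, n),                              # academic level 1-5
    "gender": rng.choice(["F", "M"], n),
    **{f"q{i}": rng.integers(3, 6, n) for i in range(1, 6)},    # Likert items
})
df["satisfaction"] = df[[f"q{i}" for i in range(1, 6)]].mean(axis=1)

# Normality check (Shapiro-Wilk) on the total satisfaction score
print(stats.shapiro(df["satisfaction"]))

# Kruskal-Wallis H test across the five academic levels
groups = [g["satisfaction"].to_numpy() for _, g in df.groupby("year")]
print(stats.kruskal(*groups))

# Mann-Whitney U test between gender groups
print(stats.mannwhitneyu(df.loc[df.gender == "F", "satisfaction"],
                         df.loc[df.gender == "M", "satisfaction"]))

# Cronbach's alpha over the five Likert items
items = df[[f"q{i}" for i in range(1, 6)]].to_numpy(dtype=float)
k = items.shape[1]
alpha = k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                       / items.sum(axis=1).var(ddof=1))
print("Cronbach's alpha:", round(alpha, 3))
```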
4. Discussion
The deep learning-based bone classification system developed in this study demonstrated high accuracy in identifying both bone type and the corresponding animal species. The achieved accuracy rate of 97.6% represents a superior performance compared to previous studies conducted on human bones [
2,
9]. In particular, the performance attained with the ResNet34 architecture exceeds the commonly reported accuracy range of 90–95% in the literature [
19].
The success of the current study in accurately identifying bone types and species aligns with previous research, such as that of Ergün and Güney [
7], who highlighted the diagnostic potential of AI in analyzing canine skeletal radiographs. Their work reinforces the notion that automated image-based systems can contribute meaningfully to both veterinary diagnostics and educational settings.
The advantage of incorporating multi-view inputs into AI architectures was also emphasized in the study by Dourson et al. [
8], where their model achieved improved accuracy in anatomical interpretation. This aligns with our findings, suggesting that models trained with diverse structural input data can offer more reliable species-level bone classification.
Our study aligns with the findings of Kvam et al. [
6], as both emphasize the utility of deep learning for precise skeletal identification in non-human species. However, while their approach focused on CT data, our work extends the applicability of AI to photographic images and mobile-friendly platforms, offering broader usability in educational and field settings.
In our study, the implementation of YOLOv5 for detecting bone regions prior to classification builds on the same single-stage detection principle, offering reliable and fast identification of anatomical structures from photographs. Similar to the success achieved by Tariq and Choi [
5] in clinical radiology, our findings confirm that modern YOLO-based models are highly suitable for veterinary osteological applications where accurate localization is essential.
Existing systems in the literature typically focus solely on human bone analysis and do not address interspecies comparative classification. Although projects such as OsteoID have reported high accuracy, these systems were usually tested on a limited number of species and did not provide publicly accessible datasets [
9]. In contrast, the use of a large and diverse dataset comprising bones from different species in the present study enhanced the model’s robustness in real-world applications, where a wide variety of samples may be encountered.
While this study focuses on AI-driven mobile bone classification rather than AR visualization, both approaches share the overarching goal of enhancing student engagement and spatial understanding in anatomy education. The success reported by Christ et al. [
16] underscores the value of technology-enhanced learning in veterinary settings, supporting the broader applicability of digital tools like
Smart Osteology.
While our current system does not yet integrate AR technology, it shares the same goal of enhancing anatomy education through accessible, student-centered digital tools. In line with Jiang et al. [
17], our approach also prioritizes learner engagement and real-world usability—especially by providing portable, offline functionality and intelligent anatomical feedback.
While this application does not utilize 3D or augmented reality technologies, it offers interactivity through real-time image analysis, speech-based queries, and dynamic feedback. These features align with the educational benefits highlighted by Barmaki et al. [
14], who emphasized that screen-based tools significantly improve student engagement and spatial understanding in anatomy learning.
The 98% satisfaction rate obtained from student surveys further highlights the educational potential of the system. Previous studies, such as those by Mayfield et al. [
12], have shown that mobile technologies can effectively support anatomical education. This study not only reinforces those findings but also demonstrates the potential of creating active learning environments that support individualized learning.
Echoing the insights of Choudhary et al. [
18], our system embraces AI-assisted feedback by delivering written and spoken anatomical explanations, yet it augments this with image-based bone identification to provide multimodal educational support. By balancing automated instruction with practical application, Smart Osteology addresses both the engagement benefits and limitations of virtual assistants noted by Choudhary et al.
This mobile application responds directly to the concerns raised in the review by Rivera García et al. [
13] by providing a scientifically grounded, academically developed system with demonstrable accuracy and user satisfaction. Unlike many commercially produced tools, our app was purpose-built for veterinary anatomical education and forensic support, bridging the gap between innovation and pedagogical reliability as advocated by Rivera García et al. [
13].
The analysis results show that positive response rates exceeded 90% across all survey items, with the highest satisfaction reported in the “ease of use of the application” category (98.0%). This finding suggests that students appreciate the use of technology in their educational processes. The relatively lower score for the “PDF export feature” is believed to reflect students’ preference for receiving instant feedback rather than storing or saving information.
When the responses of the participating students were analyzed by academic level, no statistically significant differences were found (p > 0.05). The higher mean satisfaction scores of first- and second-year students compared with the other academic levels are thought to reflect the fact that students in the earlier years are still actively engaged in the anatomy education process.
Our mobile system extends the concept demonstrated by cloud-based tools such as ShinyAnimalCV [15] beyond livestock-farming applications by enabling offline bone classification for multiple species directly from device cameras, without the need for internet or cloud infrastructure. While ShinyAnimalCV demonstrates the utility of web-based platforms in academic settings, Smart Osteology prioritizes field readiness and data privacy through a locally executable, AI-powered app.
5. Conclusions
This study successfully demonstrated the applicability of deep learning techniques in classifying the scapula, humerus, and femur of selected domestic animal species. Bone detection was performed using the YOLO algorithm, and species and bone identification were achieved with high accuracy using CNN architectures, most notably ResNet34. The obtained accuracy rate of 97.6% confirms that the developed system is a reliable tool for both educational and forensic purposes.
As a preliminary study, this work offers a novel and versatile digital solution applicable to fields such as veterinary anatomy education, forensic science, archaeology, and biological anthropology. Future studies aim to expand the scope of the system by increasing the diversity of bone types and animal species, comparing various AI architectures, and integrating interactive technologies such as augmented reality. Additionally, to enhance field applicability, there are plans to further refine user interfaces and establish collaborations with official institutions to develop modules tailored to specific needs.