Computer Vision in Spiritual Seeing: Recognition of Christian Saints in Orthodox Iconography

Sidiropoulos, Ilias I.; Apostolidis, Kyriakos D.; Vrochidou, Eleni; Papakostas, George A.

doi:10.3390/info17040340


by Ilias I. Sidiropoulos, Kyriakos D. Apostolidis, Eleni Vrochidou * and George A. Papakostas

MLV Research Group, Department of Informatics, Democritus University of Thrace, 65404 Kavala, Greece

* Author to whom correspondence should be addressed.

Information 2026, 17(4), 340; https://doi.org/10.3390/info17040340

Submission received: 11 February 2026 / Revised: 12 March 2026 / Accepted: 30 March 2026 / Published: 1 April 2026


Abstract

Christian Orthodox iconography is a fundamental element of the religious cultural heritage of many countries. Iconoclasm, vandalism, and the passage of time have ruined the appearance of many icons, making it difficult to recognize the depicted saints. This work tests the performance of 13 state-of-the-art deep learning models on the task of recognizing Christian Orthodox saints from images of preserved wooden hand-painted icons, a task that has not previously been reported in the literature. Additionally, this work introduces the first public image dataset (ICONSAINT, the ICONographic SAINT Recognition Dataset) of saint icons for classification tasks, including 2730 annotated images of 546 icons of 123 classes. All models were tested in three experimental setups, involving a balanced part of the dataset of six classes, an imbalanced part of 12 classes, and a medium-imbalanced part of eight classes, reporting accuracy of up to 89% with VGG19 for the balanced data, up to 78% with MobileNet for the imbalanced data, and up to 87% with DenseNet201 for the medium-imbalanced data. Moreover, Class Activation Maps (CAMs) were used to highlight the regions of the input image that most influenced the models’ decisions, adding explainability to the results through visual explanations.

Keywords:

religious cultural heritage; machine learning; computer vision; saint recognition; Christian icons; iconography; artificial intelligence; explainability; interpretability

1. Introduction

Christian Orthodox iconography is the painting of religious images, namely icons, depicting Jesus, Mary, angels, saints and biblical events, aiming to decorate and facilitate prayer within the Eastern Orthodox Church [1]. Icons, usually painted on walls or wooden panels, are not realistic; they use specific perspectives, features and symbolic colors to reveal spiritual truths [2]. Specifically, iconography flourished in Byzantium and developed a distinctive style and theological consistency [3]. Indeed, iconography follows a set of strict guidelines, preserved and passed through the centuries. Such consistency would ensure the visual recognition of important Christian figures and spiritual meanings across time and cultures. Therefore, Christian Orthodox iconography can be characterized as one of the most systematically consistent art forms worldwide, extending beyond art towards a sacred tradition [1,4].

The consistency of Christian Orthodox iconography makes it a well-suited problem for computer vision, particularly for the automatic recognition of saints, other sacred figures, and biblical events. This is an emerging area of research that combines theology, art, and artificial intelligence (AI). Automated recognition of saints in hand-painted Christian Orthodox icons therefore offers interdisciplinary research potential in multiple directions, especially considering that iconoclasm, vandalism, and the passage of time have ruined the appearance of many icons. Such research could support: cultural heritage preservation, through restoration efforts and digital cataloging of available icons; accessibility and education, by helping non-experts identify saints and understand the symbolism of Christian icons; and pattern recognition and iconographic analysis, to trace the evolution of iconography, distinguish periods and styles, detect forgery, and enable quantitative analysis of symbolic features.

Towards this direction, computer vision has been widely used in cultural heritage preservation research [5,6,7,8]. However, related research on Christian Orthodox iconography is limited. To this end, this work introduces a novel dataset of Christian Orthodox wooden hand-painted icons and tests the performance of state-of-the-art deep learning models for saints’ recognition, which, to the best of the authors’ knowledge, has not previously been reported in the academic literature. Similar works in the broader field of Christian saints’ recognition include that of Milani and Fraternali [9], who introduced a dataset and used a convolutional neural network (CNN) to identify saints in Christian religious paintings. Pinciroli Vago et al. [10] compared the performance of multiple algorithms for classifying characters in Christian art paintings. Stork et al. [11] used a CNN to identify figures in Christian artworks based on their leading attributes, e.g., George is accompanied by the dragon, Daniel by the lion, John by the eagle, Jesus by the cross, etc., towards interpreting the depicted source story and revealing its meaning. Yet, all the aforementioned works refer to Christian artworks in general. Christian Orthodox icons were the focus of only two related works, those of Tzouveli et al. [12] and Duan et al. [13]. In [13], the authors propose an active shape model (ASM) for face representation in Cypriot Byzantine-style icons, towards style identification and attribution. The authors managed to cluster icons with style similarities based on two types of facial features. A more closely related approach is presented in [12], where the authors combine image analysis with fuzzy description logics (DLs) to construct a fuzzy knowledge base for classifying figures in Byzantine icons. First, a large set of heuristics is provided by experts who interpreted the icons. The image analysis algorithm then detects the saint’s face region, eyes, and nose, while in a subsequent step the hair, forehead, cheeks, mustache, and beard are extracted, as well as the base color of the face. Finally, a semantic interpretation of each of these features is produced, together with formal assertions. Evaluation results report 80% accuracy for face detection and 96% accuracy for face-region detection, while correct classification rates for 20 classes range from 74.07% to 88.88%. It should be noted, though, that the data used in the latter work were not openly published, and that the proposed methodology involved several steps, such as icon analysis, semantic segmentation, feature extraction, semantic interpretation, knowledge representation, and reasoning.

To this end, compared with previous related works, the contributions of this work can be summarized in the following points:

•: This work introduces the first annotated public dataset of Christian Orthodox hand-painted icons on wood panels (available at https://github.com/MachineLearningVisionRG/ICONSAINT, accessed on 10 February 2026). The dataset includes 2730 images of 123 classes, depicting saints and religious events, and is tailored to the challenges of iconographic analysis.
•: A new computer vision task is formulated, i.e., saints’ recognition in Christian Orthodox iconography, which has not been previously benchmarked in the academic literature.
•: Thirteen different deep learning models are tested on the task of saints’ recognition. An extended set of experiments is designed to identify the best performing model by combining classes into balanced, imbalanced, and medium-imbalanced subsets of the dataset, including: a subset of the most balanced two and six classes, an imbalanced subset of 12 classes, and a medium-imbalanced subset of eight classes. Thus, a systematic benchmark framework is proposed to evaluate multiple pretrained architectures under consistent conditions, providing the first comparative reference for the problem under study.
•: The results are visually interpreted by using Class Activation Maps (CAMs) to detect representative regions of the image on which the classifier focused. Testing accuracy and loss, as well as confusion matrices, are also provided to evaluate the models’ performance. This analysis aims to provide domain-specific insights into the visual cues that the models rely on.

These elements collectively constitute the technical contribution of this work, which goes beyond the empirical performance comparison of models. In what follows, Section 2 reviews basic aspects of Christian Orthodox iconography. Section 3 presents materials and methods used in this work, while the experimental setup is described in Section 4. Results and discussions are included in Section 5, while Section 6 concludes the paper.

2. Christian Orthodox Iconography

Christian Orthodox iconography is the art of painting depictions of holy persons or religious events of the Eastern Orthodox Church [14]. The icons derive their content from the lives of Saints and the miracles they performed during their lifetimes. Icons serve as spiritual tools used in prayer, contemplation, and liturgical worship [15]. Their tradition dates back to the beginning of Christianity, flourished in the Byzantine Empire, and spread to Russia, Greece, Serbia and beyond [16]. Christian Orthodox iconography contains specific features and symbolisms linked with the life of the depicted saint [17]. The artists follow specific rules regarding the saints’ posture, appearance, gestures, clothing, and leading attributes. The main rules and symbolisms include the following [18,19]:

•: Eyes: are always large and vivid, revealing their mental tension, because they have seen supernatural things.
•: Ears: are usually big, because they listened to the commandments of the Lord.
•: Colors: are not realistic, e.g., red horses and blue or purple rocks, to help the viewer move into a transcendental and spiritual dimension. Each color has its own meaning; for example, white symbolizes light and purity, black symbolizes mystery, and yellow symbolizes divine glory and brilliance.
•: Hands: are making specific gestures. The raised hand or open palm symbolizes resistance to evil and the denial of earthly glory. In the icons of the Virgin Mary or the Holy Prodrome (the forerunner), the open palm expresses supplication.

In addition to the above symbolisms, there are several peculiarities in the icons, which are related to the lives of the saints depicted (Figure 1):

•: Martyrs: hold a cross in their hand (symbolizing their martyrdom) (Figure 1a) and wear red clothing (symbolizing the blood they shed for Christ). They may also hold/appear with the instrument of their martyrdom (Figure 1b), while military martyrs hold their weapons (Figure 1c).
•: Apostles: hold a scroll or book. The Apostles who left a written work hold a book, as the four Evangelists hold their Gospel (Figure 1d) and the Apostle Paul his letters (Figure 1e), while those who did not leave a written work hold a closed scroll.
•: Prophets: hold scrolls with excerpts from their prophecies (Figure 1f).
•: Hierarchs: wear priestly vestments, decorated with motifs of crosses, and hold the Gospel in one hand, because through it they preached the Word of God, while the other hand usually has a gesture of blessing (Figure 1g).
•: Ascetic Monks: wear dark robes, usually with a hood, and hold a prayer rope or a cross (symbolizing their spiritual struggle) (Figure 1h). They usually hold a scroll, open or closed, depending on whether they have left a written work behind them (Figure 1i).

3. Materials and Methods

3.1. Proposed Methodology

Captured images were used for training 13 deep learning models, while evaluation included performance metrics, CAMs, and confusion matrices. Several experimental setups have been investigated to evaluate our dataset on three types of problems: using a balanced, an imbalanced and a medium-imbalanced dataset. Details on each step of the process are provided in the rest of the section.

This work leverages a combination of deep learning and computer vision techniques to analyze and recognize saints within icons. Deep learning models are employed to extract and learn complex visual features from the captured image data, enabling the efficient classification and interpretation of unseen data. The proposed methodology aims to form a framework towards enhancing the automated analysis, identification, and understanding of sacred iconographic representations of the Christian Orthodox religion.

3.2. Dataset

The data collected consists of church frescoes and portable icons of saints, which can be found both in Christian Orthodox churches and in the homes of believers, in the area of the Municipality of Pangaio in the Prefecture of Kavala in northern Greece. The captured icons are of various conditions, from very well preserved (Figure 2a) to partially damaged (Figure 2b). A dataset containing both well-preserved and damaged icons is invaluable for developing robust and reliable AI models that can recognize and interpret sacred imagery across real-world conditions.

Usually, icons found in churches or private collections vary in age and condition. Therefore, the proposed dataset aims to reflect real-world variability as well as the historical reality. Moreover, training on diverse inputs would improve the generalization ability of models and enhance their robustness to noise, e.g., missing icon parts, discolorations, etc. Note that damaged icons are essential for digital restoration tasks as well as towards adapting easily to related tasks such as predicting missing faces, identifying saints based on partial features, or attributing icons to specific schools, artists or regions across time periods.

All icons were photographed by following a specific protocol. Each icon was photographed from a distance so that only the theme of the icon appears within the image. In cases where this was not possible, the image was taken so that the theme of the hagiography would appear as clear as possible. The difficulties in photographing the icon alone stemmed from lighting conditions and reflections on the protective glass covering the icons in most cases (Figure 2c), as well as the icons’ placement in inaccessible locations or at significant heights (Figure 2d), especially in churches, and in some cases, their large size.

Each icon was photographed from five different angles:

•: Frontal angle (0°);
•: Right angle (~60°);
•: Left angle (~−60°);
•: Intermediate angle between frontal and right (~30°);
•: Intermediate angle between frontal and left (~−30°).

The purpose of this protocol is twofold. First, it creates a multi-angle dataset that helps models recognize iconographic features regardless of perspective, thus improving their performance in real-world scenarios. Second, it serves a technical purpose: as already mentioned, icons are often displayed behind glass, and varying the angle helps to avoid reflections.

An example of the five angles/perspectives captured under this protocol is illustrated in Figure 3.

The final ICONSAINT dataset is a compilation of 546 different Christian Orthodox icons, each photographed from five different angles, resulting in a total of 2730 images of 123 different saints or religious events. The number of images per saint/event is listed in Table 1. As can be observed from Table 1, the final dataset of 123 classes is imbalanced, since the number of images per class varies from 495 down to 5. This is because some saints are more popular and are depicted in every church or are easier to find, while others are rare.
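As a quick consistency check on the figures above, the acquisition protocol can be sketched as an index of (icon, angle) pairs. This is only an illustrative sketch: the names `ANGLES_DEG` and `build_index` are hypothetical and do not reflect how the published dataset is organized on disk, and the angle values are the approximate ones listed in the protocol.

```python
from itertools import product

# Approximate capture angles (degrees) from the photographing protocol.
ANGLES_DEG = (0, 60, -60, 30, -30)

def build_index(num_icons):
    """Return one (icon_id, angle) pair per captured image (hypothetical layout)."""
    return list(product(range(num_icons), ANGLES_DEG))

index = build_index(546)   # 546 icons, as reported for ICONSAINT
print(len(index))          # 546 icons x 5 angles = 2730 images
```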

All images were taken using an iPhone 11 (Apple Inc., Cupertino, CA, USA), assembled in China, with a dual 12 MP camera with ultra-wide and wide-angle lenses, the apertures of which were f/2.4 and f/1.8, respectively, with the smart HDR function, thus allowing for high-definition and high-quality images to be captured.

3.3. Classification Models

To better evaluate the quality of the proposed dataset, a set of 13 deep learning models of different architectures was selected, aiming to determine which one can work most effectively for the problem under study:

•: DenseNet121 and DenseNet201: Both models are based on the same DenseNet architecture, in which each layer of the CNN is connected to all subsequent layers, maximizing the flow of information within the network and facilitating training. The main difference between the two models is the number of layers, with DenseNet121 comprising 121 layers and 8.1 M parameters, while DenseNet201 has 201 layers and 20.2 M parameters [20]. Testing both models on the same image classification problem can provide useful insights into their performance and efficiency, e.g., whether the deeper model extracts more complex features and achieves higher accuracy, which would indicate that added depth translates into better results on our specific dataset.
•: EfficientNet: EfficientNet is a family of CNN models that introduced a new scaling method, which significantly improved performance and efficiency over previous models like ResNet, Inception, and DenseNet. Specifically, the method scales the depth, width, and input resolution of the network uniformly; intuitively, if the input image is larger, the network needs more layers to increase its receptive field and more channels to capture finer patterns [21].
•: InceptionV3: InceptionV3 is one of the most successful models on the ImageNet benchmark. It is based on the use of Inception modules, which perform multiple types of convolutions in parallel and then concatenate the results. It is widely used for feature extraction in many transfer learning tasks [22].
•: MobileNet and MobileNetV2: The MobileNet model is designed to be fast and efficient, enabling real-time vision applications on mobile devices. It uses depthwise separable convolutions, which reduce the computational cost and the model’s size. MobileNetV2 introduces inverted residual blocks, which expand a low-dimensional feature map to a higher dimension, filter it with a depthwise convolution, and then project it back to a low dimension, as well as linear bottlenecks, which omit the non-linearity in the projection step to preserve information in the low-dimensional representation, making the model more accurate and efficient while keeping it light [23,24].
•: ResNet50 and ResNet101: The ResNet family uses residual (skip) connections, which let each block learn a residual function with respect to its input rather than a full mapping, enabling very deep networks and mitigating the vanishing gradient problem. The ResNet50 model consists of 50 layers, uses bottleneck residual blocks, and is very popular due to its balance between accuracy and efficiency. The ResNet101 model has 101 layers and more bottleneck blocks than ResNet50, and it is reported to provide better accuracy for fine-grained or complex datasets [25].
•: NasNetLarge and NasNetMobile: NasNet models are based on finding the optimal building blocks for image classification and scaling them for different resource constraints. Both NasNetLarge and NasNetMobile are pretrained on large-scale datasets such as ImageNet. NasNetLarge is usually employed when accuracy is critical and resources are not limited, while NasNetMobile is a lightweight architecture appropriate for mobile devices and real-time applications, preferred due to its speed and size while maintaining adequate accuracy [26].
•: VGG16 and VGG19: Both VGG16 and VGG19 are classic CNN architectures that have been widely used for feature extraction and transfer learning due to their simple architecture and ease of implementation [27]. The VGG16 model contains 16 weight layers (13 convolutional and three fully connected), while VGG19 contains three additional convolutional layers.
•: Xception: Xception is an extension of Inception that uses depthwise separable convolutions across the entire model instead of standard convolutions. The latter aims towards a reduction in parameters and computations, providing a simple yet accurate model [28].
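For reference, the compound scaling rule behind the EfficientNet family described in the list above can be written as follows, where φ is a user-chosen compound coefficient and α, β, γ are constants determined by a small grid search on the base network. The notation follows the original EfficientNet paper [21]; which EfficientNet variant (and hence which φ) was used in the experiments is not stated in the text.

```latex
\begin{aligned}
\text{depth: } & d = \alpha^{\phi}, \qquad
\text{width: } w = \beta^{\phi}, \qquad
\text{resolution: } r = \gamma^{\phi}, \\
\text{s.t. } & \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \qquad
\alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
\end{aligned}
```

Under this constraint, each unit increase of φ roughly doubles the computational cost (FLOPs) of the network.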

The selection of these 13 CNN architectures was based on three criteria: (1) all models are among the most widely adopted and validated models in transfer learning research, ensuring reliable baseline comparisons; (2) the models represent a diverse set of architectural families, including residual networks, densely connected networks, Inception-based models, mobile-optimized architectures, and classical deep CNNs; (3) the selected models span a broad range of computational complexity, from lightweight architectures suitable for real-time or mobile deployment (MobileNet, MobileNetV2, and NasNetMobile), through medium-complexity models that offer a trade-off between accuracy and computational cost (DenseNet121, EfficientNet, ResNet50, VGG16, Xception, and InceptionV3), to heavyweight networks suitable for fine-grained recognition (DenseNet201, ResNet101, VGG19, and NasNetLarge). This variety ensures a comprehensive evaluation of how different deep model architectures perform on the proposed dataset.
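To make the difference in computational complexity more concrete, the following minimal sketch compares the number of weights in a standard convolution with that of a depthwise separable convolution, the building block used by the MobileNet and Xception models listed above. The function names and the layer sizes (3×3 kernel, 128 input and 256 output channels) are assumptions chosen for illustration and are not taken from any specific model; biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights of a standard kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Weights of a depthwise kernel x kernel conv followed by a 1x1 pointwise conv."""
    depthwise = kernel * kernel * c_in   # one spatial filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution that mixes channels
    return depthwise + pointwise

# Illustrative layer sizes (assumed, not taken from any specific model).
std = standard_conv_params(3, 128, 256)
sep = depthwise_separable_params(3, 128, 256)
print(std, sep, round(sep / std, 3))     # 294912 33920 0.115
```

For this layer, the separable version needs roughly 11.5% of the weights of the standard convolution, which matches the general ratio 1/c_out + 1/kernel².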

3.4. Evaluation

The output of all classifiers is evaluated quantitatively by calculating evaluation metrics such as accuracy, precision, recall, F1-score and confusion matrices, as well as qualitatively by providing techniques for interpretability of the results and visual understanding, such as CAMs.

3.4.1. Quantitative Evaluation Through Performance Metrics and Confusion Matrices

In this work, the quantitative evaluation of all models is performed by using classification metrics of accuracy rate, precision, recall, and F1-score [29]. Confusion matrices are also provided to create a more complete picture of the classification models’ performance, especially due to the imbalanced nature of the dataset.

In a confusion matrix, the actual versus predicted classification samples are visually presented within a grid, as shown in Figure 4. True positives (TPs) and true negatives (TNs) refer to the correctly predicted positives (Ps) and negatives (Ns), respectively. False positives (FPs) and false negatives (FNs) refer to incorrectly predicted positives and negatives, respectively.

Accuracy reflects the overall correctness of predictions. Precision indicates how many of the predicted positives are correct, while recall indicates how many of the actual positives are correctly predicted. The F1-score is the harmonic mean of precision and recall, and it is preferable when classes are uneven. All metrics are calculated directly from the values of the confusion matrix as follows:

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)

\text{Precision} = \frac{TP}{TP + FP} \qquad (2)

\text{Recall} = \frac{TP}{TP + FN} \qquad (3)

\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)
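The metrics above extend to the multi-class case by treating each class in turn as the positive class. The following is a minimal sketch of that computation from a confusion matrix, assuming rows hold the actual classes and columns the predicted ones. The function name `per_class_metrics` and the toy matrix are made up for illustration and are not results from this work.

```python
def per_class_metrics(cm):
    """Overall accuracy plus per-class precision, recall and F1.

    cm[i][j] = number of samples of actual class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, actually another class
        fn = sum(cm[c]) - tp                        # actually c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return accuracy, metrics

# Toy two-class confusion matrix (illustrative numbers only).
toy_cm = [[8, 2],
          [1, 9]]
accuracy, metrics = per_class_metrics(toy_cm)
print(accuracy, metrics[0])
```

Averaging the per-class values (macro-averaging) gives a single figure per metric; whether the tables in this work use macro or weighted averaging is not specified in the text.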

3.4.2. Qualitative Evaluation Through Class Activation Maps

Qualitative evaluation aims to provide insights into the models’ behavior. In this work, CAMs are considered as a qualitative measure for the classification models’ performance, aiming to highlight the regions of the input image that mostly influenced the decision of the models. CAMs are selected to add explainability to our results through visual explanations, providing the necessary trust and transparency to support the interpretation of the models’ performance. CAMs are computed from the last convolutional layer of the deep classifier. The feature maps and the weights of the output layer are combined to generate a heatmap on the input image, where a color gradient from blue to red is used to indicate the intensity of activation: blue refers to low activation, meaning that the model did not pay attention to the specific region, while red refers to very high activation, signifying a crucial region for the prediction [30].

In this work, the Gradient-weighted CAM (Grad-CAM) variant [31] was employed, since it imposes no architectural requirements and can work with any CNN model. Moreover, Grad-CAM provides enhanced visual quality, i.e., better localization and more detailed heatmaps. This variant uses the gradients of the target class score flowing into the last convolutional layer, providing class-discriminative and visually insightful interpretations.
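The core Grad-CAM computation can be sketched in a few lines. The example below is a framework-free illustration under stated assumptions: the feature maps of the last convolutional layer and the gradients of the target class score with respect to them are passed in as plain nested lists, whereas in practice both would be obtained from the deep learning framework. The function name `grad_cam` and the toy inputs are hypothetical, and this is not the implementation used in this work.

```python
def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from K feature maps and their gradients.

    feature_maps, gradients: lists of K maps, each an H x W nested list.
    Returns an H x W heatmap scaled to [0, 1].
    """
    k = len(feature_maps)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel weights: global average pooling of the gradients.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum of feature maps followed by ReLU (keep positive evidence only).
    cam = [[max(0.0, sum(weights[c] * feature_maps[c][i][j] for c in range(k)))
            for j in range(w)] for i in range(h)]
    # Normalize so that the strongest activation equals 1.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam

# Toy input: channel 0 supports the class, channel 1 counts against it.
maps = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
heatmap = grad_cam(maps, grads)
print(heatmap)   # [[1.0, 0.0], [0.0, 0.0]]
```

The resulting low-resolution heatmap is then upsampled to the input size and overlaid on the image with a blue-to-red color map.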

4. Experimental Setup

All implementations were coded in Python v3.7/3.8 using TensorFlow 2.0 and Keras 2.3.1. The libraries NumPy 1.18, Matplotlib 3.2, Seaborn 0.11, scikit-learn 0.24, Pandas 1.0, and OpenCV 4.5 were used across data preprocessing, visualization, performance evaluation, model training, and image processing to implement and evaluate the proposed deep learning pipeline. Table 2 summarizes the setup for all deep models.

All images were subjected to basic preprocessing, including pixel normalization and resizing to fit the target size input of each model.

The evaluation of the selected models on the newly designed dataset was conducted through three experiments:

•: Experiment 1: The balanced problem of six (6) classes. In this case, a balanced subset is created, comprised of six classes of 45 images each. The selected classes are: (1) Saints Nicholas, Raphael and Irene, (2) Saint Athanasios the Great, (3) Saint John the Baptist, (4) Saint Demetrios, (5) Saint Paisios, and (6) Apostle Peter and Apostle Paul.
•: Experiment 2: The imbalanced problem of twelve (12) classes. In this case, an imbalanced subset of 12 classes is created, spanning from 445 to 45 images, including the following classes: (1) Saints Nicholas, Raphael and Irene, (2) Saint Athanasios the Great, (3) Saint George, (4) Saint Demetrios, (5) Saint John the Baptist, (6) Saint Nicolas, (7) Saint Paisios, (8) Saint Panteleimon, (9) Apostle Peter and Apostle Paul, (10) Jesus Christ, (11) Mother of God and Jesus Christ, and (12) Prophet Ilias.
•: Experiment 3: The medium-imbalanced problem of eight (8) classes. Finally, a medium-imbalanced subset of eight classes is created, spanning from 45 to 60 images, including the following classes: (1) Saints Nicholas, Raphael and Irene, (2) Saint George, (3) Saint Demetrios, (4) Saint John the Baptist, (5) Saint Nicolas, (6) Saint Paisios, (7) Jesus Christ, and (8) Mother of God and Jesus Christ.
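The degree of imbalance in the three setups can be summarized by the ratio between the largest and smallest class, computed here from the minimum and maximum image counts stated in the experiment descriptions above. The imbalance ratio is a common summary measure but is not one reported in this work, so the function name and the dictionary below are illustrative assumptions.

```python
def imbalance_ratio(max_count, min_count):
    """Ratio of the largest class size to the smallest class size."""
    return max_count / min_count

# (largest class, smallest class) image counts as stated for each experiment.
experiments = {
    "Experiment 1 (6 classes)": (45, 45),
    "Experiment 2 (12 classes)": (445, 45),
    "Experiment 3 (8 classes)": (60, 45),
}

ratios = {name: round(imbalance_ratio(largest, smallest), 2)
          for name, (largest, smallest) in experiments.items()}
for name, ratio in ratios.items():
    print(name, ratio)
```

A ratio of 1 corresponds to the balanced setup, about 1.33 to the medium-imbalanced one, and about 9.89 to the imbalanced one.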

The exact distribution of images in each experiment is included in Table 3. All results included in the tables (accuracy, precision, recall, and F1-score) in the following refer to the average after performing 5-fold cross-validation. In cases where limited data are available, such as in our case, 5-fold cross-validation provides a more reliable estimate of model performance than a single train–test split, ensuring that every sample is used for both training and validation across folds.
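A 5-fold split of this kind can be sketched as follows. The sketch assumes a stratified split, so that each fold preserves the class proportions, and it splits individual images; the text does not state whether stratification was used or whether the five views of one icon were kept in the same fold. The function name `stratified_k_fold` and the seed are hypothetical.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, val_indices) for k folds with per-class balance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the samples of each class round-robin over the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, val

# Balanced setup of Experiment 1: six classes of 45 images each.
labels = [c for c in range(6) for _ in range(45)]
splits = list(stratified_k_fold(labels))
print(len(splits), len(splits[0][0]), len(splits[0][1]))   # 5 216 54
```

Each image appears in exactly one validation fold, and the reported metrics are the averages over the five folds.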

It should also be noted that data augmentation was deliberately not applied. Some saints’ images are very similar to each other (high inter-class similarity), the images are affected by visual noise, i.e., strong illumination and multiple different perspectives and orientations, and some icons are very faded, aged, and partially damaged; additional transformations would therefore probably amplify ambiguity rather than enhance learning, further distorting important signal patterns. Moreover, the dataset includes five images per icon taken from different angles, producing natural variations in illumination, viewpoint, and orientation. These multi-view images, therefore, already serve as a form of natural augmentation. Finally, since pretrained models are used, they already encode generalized representations learned from broader data; thus, synthetic data augmentation is not considered critical. For all these reasons, it was decided to preserve data fidelity and rely solely on the natural variability provided by the multi-angle dataset.

5. Results and Discussion

5.1. Quantitative Results

5.1.1. Experiment 1

Experiment 1 is a six-class classification problem based on balanced data. Balanced data ensures that the models do not develop a bias towards the majority class. In balanced problems, such as in this case, accuracy is meaningful since, due to equal class representation, each correct prediction contributes equally to the overall accuracy. Table 4 summarizes all performance metrics of all models for Experiment 1.

The results in Table 4 indicate that the best performing models, with an accuracy of 0.89, are VGG19 and NasNetLarge. On the other hand, the model with the poorest performance is NasNetMobile, with an accuracy of 0.53. VGG19 and NasNetLarge are large models, which may make them better at capturing intricate patterns even in small datasets, suggesting that they benefit from transfer learning and manage to extract generalized features. NasNetMobile is a lightweight model, which seems to fail in classification tasks with subtle inter-class differences.

Despite the balanced dataset, the results depend mostly on the distinctiveness between classes. Classes that share similar features are hard to distinguish, especially for small model architectures. Figure 5 presents indicative confusion matrices from the best and worst performing models to illustrate the role of feature distinctiveness.

As can be observed from the confusion matrices, the models find it hard to recognize mainly Saint Athanasios the Great and Saint John the Baptist. Specifically, Saint John the Baptist is the class that every model fails to recognize reliably, and it is mainly misclassified as Saint Demetrios. These two classes are depicted in Figure 6. A closer observation reveals that both saints share similar features, e.g., the javelin with the cross at its edge and the second head at the bottom left of the image. Feature visualizations with CAMs are therefore critical to see what the models pay attention to.

5.1.2. Experiment 2

Experiment 2 is a twelve-class classification problem based on imbalanced data. Considering only the accuracy of models would be misleading in this case. Precision, recall and F1-score could help to point out the classes that the model struggles to discriminate. Visual breakdown of true versus predicted classes provided by the confusion matrices is also essential. Table 5 summarizes all performance metrics of all models for Experiment 2.

Both the increase in the number of classes and the imbalanced dataset led to a decrease in model performance. Minority classes with subtle iconographic differences are difficult for the models to learn.

The model with the lowest performance was NasNetMobile, with an accuracy of 57%, while the best performing model was MobileNet, with 78%. As can be seen from the indicative confusion matrices illustrated in Figure 7, the models fail to classify most of the less balanced classes: Saint Demetrios and Saint John the Baptist, as in the previous experiment, as well as Saint Athanasios the Great, Saint Panteleimon, Apostle Peter and Apostle Paul, and Prophet Ilias. However, the imbalanced class of Saint Paisios is correctly classified in all cases; Saint Paisios is the only saint in the used dataset who wears a black monastic robe, which becomes a highly discriminative feature. The confusion matrices reflect the fine-grained nature of Christian Orthodox iconography, where many saints share highly similar visual attributes. In such settings, the main objective of fine-grained classification is the efficient identification of informative regions in images [32].

Confusion matrices also indicate that misclassified images were mainly attributed to the classes of Mother of God and Jesus Christ. This reflects a model bias towards the majority class: for some of the minority classes, the models did not learn distinctive features well enough and therefore assigned their images to the most frequent class. This effect is typical in imbalanced multi-class problems [33], and here it is further amplified by the visual similarity between several saints. Given the increased complexity of the dataset, as well as its imbalanced nature, the results are considered satisfactory, since the models are capable of correctly classifying most of the testing images.

5.1.3. Experiment 3

Experiment 3 is an eight-class classification problem based on medium-imbalanced data. The use of a medium-imbalanced dataset aims to reveal the threshold at which class imbalance begins to affect the metrics significantly. Table 6 presents the performance metrics of all models for Experiment 3.

According to Table 6, most of the models report satisfactory classification performance. DenseNet201, MobileNet, MobileNetV2 and VGG16 are among the best performing models, with accuracies of 87%, 85%, 85% and 80%, respectively. NasNetLarge, NasNetMobile and Xception show the poorest performance, with accuracies of 47%, 50% and 50%, respectively. The confusion matrices for the four best performing models are included in Figure 8. From Figure 8, it can be observed that in all cases, the models fail to correctly classify the images of Saint Demetrios, attributing them to the Saint George class. An additional common mistake concerns Saint John the Baptist, whose images are misclassified either as Jesus Christ (mostly) or as the Mother of God and Jesus Christ.

Figure 9 includes images from these classes to reveal the similarities in their features. As seen from Figure 9a,b, Saint Demetrios and Saint George share striking similarities: both are mounted on horses and portrayed as courageous warriors defeating an enemy, in the same pose and holding the same weapon. Therefore, from a purely visual perspective, their icons share enough similarities to confuse the classification models, especially in our case, where the dataset is limited. Note that images of Saint Demetrios are predicted as Saint George, while the opposite does not occur, since the Saint George class contains more images. The same occurs for the class of Saint John the Baptist. While this class is small compared to that of Jesus Christ, the two share similar features, as shown in Figure 9c,d: both figures have long brown hair and a beard, have a halo, hold a scroll or book, and wear the same kind of simple clothing.

In the following, attention visualizations (Grad-CAM) are employed to verify which parts of the icon the models focus on.

5.2. Qualitative Results

CAMs are used here as an interpretation tool for understanding how the models recognize the depicted saints. Figure 10, Figure 11 and Figure 13 show the CAMs from the best performing model in Experiments 1, 2, and 3, respectively, for all classes involved.
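For readers unfamiliar with how such heatmaps are obtained, the following is a minimal NumPy sketch of the core Grad-CAM combination step described in [31]: the gradients of the class score are averaged per channel and used to weight the feature maps of the last convolutional layer. The function name `grad_cam` and the random inputs are illustrative assumptions; in practice both arrays would come from the fine-tuned network, and this is not necessarily the exact implementation used in this work.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Combine activations and class-score gradients into a heatmap.

    feature_maps, gradients: arrays of shape (H, W, K) for one image, taken
    from the last convolutional layer. Returns an (H, W) map in [0, 1].
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    # One weight per channel: global average of the gradients over space.
    weights = gradients.mean(axis=(0, 1))                        # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))   # shape (H, W)
    # Keep only regions that contribute positively to the class score.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Hypothetical inputs standing in for real activations and gradients.
rng = np.random.default_rng(0)
acts = rng.random((7, 7, 16))
grads = rng.normal(size=(7, 7, 16))
heatmap = grad_cam(acts, grads)

# If the weighted sum is non-positive everywhere, the map is entirely zero.
blank = grad_cam(np.ones((2, 2, 1)), -np.ones((2, 2, 1)))
```

The resulting low-resolution map is typically upsampled to the input size and overlaid on the icon image with a red-to-blue color scale.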

Figure 10 includes six CAM images, one indicative example for each class, for the best performing model of Experiment 1, VGG19. From the CAMs, we can see that VGG19 learns to classify Saints Nicholas, Raphael and Irene from the face (Figure 10a) and Saint Athanasios the Great (Figure 10b) from the crosses on his clothes. As for Saint John the Baptist and Saint Demetrios, the two saints that the model mixed up, Figure 10c,d shows that both images have areas of high activation at the bottom left, where the second human head is depicted, as initially supposed. Saint Paisios (Figure 10e) is recognized by the black hat, while Apostle Peter and Apostle Paul (Figure 10f) are recognized from the church miniature that they hold between them.

Figure 11 includes 12 CAM images, one indicative example for each class, referring to the best performing model of Experiment 2, MobileNet.

From the CAMs in Figure 11a, we can see that MobileNet paid attention to all three figures to detect the Saints Nicholas, Raphael and Irene class. Saint Athanasios the Great (Figure 11b) was detected mainly from the crosses on his clothes, as in Experiment 1. Regarding the classes of Saint Demetrios and Saint George, the model failed to correctly classify the images of Saint Demetrios, attributing them to the Saint George class. Both saints appear on horseback with comparable poses and weapons, and, as can be seen from the indicative CAMs in Figure 11c,d, the model pays attention to the horse, pose and weapon in both cases rather than to the subtle iconographic cues that differentiate them, which explains the quantitative results of Experiment 2 for these two classes. Saint John the Baptist (Figure 11e) is classified from his pose, as the model pays attention to the entire human figure. Saint Nicolas (Figure 11f) is recognized by his face and clothes. Note that, as seen in the confusion matrices of Figure 7, images of Saint Nicolas are classified as Saint Athanasios the Great. The CAM of Figure 11f reveals that the model learns features from the vestments of Saint Nicolas, which bear nearly the same cross pattern as those of Saint Athanasios the Great; the two classes are therefore, in some cases, very close to each other, which explains the overlapping Grad-CAM activations and the misclassification between them.

The Grad-CAM visualizations of the imbalanced Experiment 2 further illustrate how class imbalance shaped the models’ internal representations. Minority classes with limited training samples (Saint Panteleimon, Apostle Peter and Apostle Paul, and Prophet Ilias) exhibit more diffused activation maps, indicating that the model did not develop sufficiently discriminative features for these classes. These patterns highlight that the classification task is fundamentally fine-grained and that both visual similarity and class imbalance jointly shape the observed errors.

In the case of Saint Paisios (Figure 11g), all testing images were correctly classified, yet there is no clear heatmap activation in either of the images. This suggests that the model still finds enough abstract cues to make the correct decision. MobileNet has a lightweight architecture and might therefore rely on non-localized features rather than on distinct regions within the image. It uses depthwise separable convolutions, which significantly reduce the number of parameters and change how spatial features are captured. This can result in lower-resolution feature maps at later layers, which makes the CAM output faint or blank, especially for underrepresented classes.
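The parameter reduction from depthwise separable convolutions can be illustrated with a short calculation. The sketch below compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1×1 pointwise convolution; the layer size (3×3 kernel, 256 input and output channels) is an illustrative assumption and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise one."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution mixing the channels
    return depthwise + pointwise

# Hypothetical layer size chosen for illustration only.
k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)        # 589824
sep = depthwise_separable_params(k, c_in, c_out)  # 67840
ratio = sep / std                                 # equals 1/k**2 + 1/c_out

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.3f}")
```

For this layer the separable version needs about 11.5% of the weights of the standard one, which illustrates why MobileNet is considerably lighter than architectures such as VGG19.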

To further investigate whether the faint activation is specific to the lightweight MobileNet, the CAMs of the same class for more complex architectures, namely VGG19, DenseNet201 and EfficientNet, are illustrated in Figure 12. The results indicate attention regions on the hat and robe of the saint, confirming the lower activation visibility of the lightweight MobileNet.

As for Saint Panteleimon (Figure 11h), activation is observed on the saint’s face and clothes, while for Apostle Peter and Apostle Paul (Figure 11i), attention is paid to the church miniature between them, as in Experiment 1. In Figure 11j, we can observe that the main characteristics that the model uses to classify Jesus Christ are his head pose and beard. Therefore, the misclassification of Saint John the Baptist as Jesus Christ may be attributed to the shared head pose of the two figures, who do not look straight ahead, as most of the other saints do, but turn their heads slightly to the left. For the class of Mother of God and Jesus Christ (Figure 11k), the model captures both faces, as for the class of Prophet Ilias (Figure 11l). Yet, Prophet Ilias also tilts his head to the left, which explains why testing images from his class were misclassified as Saint John the Baptist in most of the cases (Figure 7).

Figure 13 includes eight CAM images, one indicative example for each class, referring to the best performing model of Experiment 3, DenseNet201.

In the case of Experiment 3, the CAMs reveal the expected regions of the image where the model paid attention, similar to the previous two experiments. The CAMs for Saint George, Saint Demetrios and Saint Paisios do not indicate activation regions. In some cases, correct predictions may occur via subtle activation patterns that are not concentrated strongly enough in one specific region for the CAMs to highlight, especially for well-separated classes, such as Saint Paisios and Saint George. For underrepresented classes, such as Saint Demetrios, the network may not learn strong, localized features. Indeed, in Experiment 3, DenseNet201 failed to correctly classify the images of Saint Demetrios, attributing them to the Saint George class. Moreover, the class of Saint John the Baptist was mainly misclassified as Jesus Christ; this can be explained from the CAMs of Figure 13d,g, where the model in both cases paid attention to the same hand gesture of the two figures.

Overall, the CAMs presented for all three experimental setups show that pretrained models of various architectures, from dense to simple, fine-tuned on our datasets of noisy images of hand-painted icons under balanced, imbalanced and medium-imbalanced conditions, are capable of learning to discriminate between the classes by using the same visual cues as a human observer.

5.3. Discussion

In this work, a novel Christian Orthodox icon dataset has been presented and tested with 13 different deep architectures for the recognition of the depicted saints. Various experimental setups have been tested, overall indicating the ability of specific deep models to correctly classify imbalanced data, as well as data that are noisy, e.g., images of poorly preserved icons, images affected by illumination, or images taken from different orientations. By providing experiments across balanced, imbalanced and medium-imbalanced datasets, we aimed to uncover how the composition of data would affect the models’ behavior.

The results offered a deeper understanding of performance trade-offs; the balanced dataset resulted in higher performance across all classes, the imbalanced dataset resulted in biased accuracy in favor of majority classes, and the medium-imbalanced dataset revealed the threshold at which the imbalanced classes begin to affect the models’ performance. Moreover, from the provided CAMs, it became clear that balanced setups offer richer learning, while in imbalanced setups, feature representation for minority classes tends to be underdeveloped.

The results verify that deep learning and computer vision can help in identifying and categorizing a wide range of different icons. Most of the models used were properly trained and achieved a high percentage of correct predictions on the set of testing images, despite the complexity of the multi-class problem. The task of Christian Orthodox saint identification is very complex, as many saints are likely to have a variety of different depictions (e.g., Saint George is not always on the horse, Jesus Christ has a halo or wears a crown of thorns), while some saints have similar characteristics that make their icons very similar (e.g., Saint Demetrios and Saint George) and difficult to classify correctly, not only for AI but also for experienced humans.

The performance results of the models, as well as their CAM visualizations, give room for future improvement and can be positively influenced by many configurations. Initially, collecting a larger amount of data from different saints in order to create a fully balanced dataset would be able to help the models in their training and final performance. Therefore, future work includes the enrichment of our dataset with more images, as well as with more classes, making it the first and biggest publicly available dataset of Christian Orthodox saint icons. Considering that Christian Orthodox iconography relies heavily on symbolic attributes such as crosses and characteristic vestments, future research would also benefit from the exploitation of explicit object-level detectors in the classification pipeline [34]. The fusion of whole-image classifiers with specific object detectors of key iconographic elements could potentially enhance the models’ ability to better distinguish between visually similar classes [35].

While data augmentation was intentionally avoided in this work, future research could investigate data augmentation strategies, considering the highly imbalanced nature of the dataset. Specifically, fine-grained recognition with feature-level data augmentation [36] could be beneficial, since many images share similar visual cues, and discriminative characteristics play a key role in distinguishing fine-grained classes. While image-level data augmentation is commonly used in deep learning classification tasks, it is less suitable for fine-grained problems, because randomly editing regions of the image can destroy the discriminative characteristics contained in subtle regions. Feature-level augmentation strategies could therefore be employed in future research to balance and enrich the dataset without risking the loss of discriminative details [37,38,39].
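To make the idea of feature-level augmentation concrete, the sketch below generates synthetic feature vectors for a minority class by interpolating between random pairs of same-class embeddings. This simple interpolation scheme, the function name `interpolate_features`, and the random embeddings are all illustrative assumptions; it is not the learnable semantic augmentation proposed in [36,37,38,39], only a minimal example of augmenting in feature space rather than in pixel space.

```python
import numpy as np

def interpolate_features(features, n_new, rng=None):
    """Create n_new synthetic feature vectors by mixing random same-class pairs.

    features: array of shape (N, D) holding embeddings of one class.
    Each synthetic vector is a convex combination of two real embeddings.
    """
    rng = np.random.default_rng() if rng is None else rng
    features = np.asarray(features, dtype=float)
    i = rng.integers(0, len(features), size=n_new)   # first member of each pair
    j = rng.integers(0, len(features), size=n_new)   # second member of each pair
    lam = rng.random((n_new, 1))                     # mixing coefficient in [0, 1)
    return lam * features[i] + (1.0 - lam) * features[j]

# Hypothetical embeddings: 35 images of a minority class, 128 features each.
rng = np.random.default_rng(0)
minority = rng.normal(size=(35, 128))
synthetic = interpolate_features(minority, n_new=90, rng=rng)
print(synthetic.shape)  # (90, 128)
```

Because the pixels of the icon are never edited, small discriminative regions such as crosses on vestments are not cropped or erased; whether the synthetic embeddings actually help the classifier would have to be verified experimentally.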

Our long-term goal is to create a robust model able to deal with the complex multi-class classification problem of saint recognition so as to detect saints’ iconographic variations across regions, as well as on non-preserved old historical icons. The latter would support the development of a digital tool for automatic saint categorization, useful for church cataloging or as an educational tool for theology students or for personal use by the religious.

Note that the identification of Christian Orthodox saints is a fine-grained task that can challenge even trained art historians or theologians, especially when icons are partially damaged, stylistically diverse or share overlapping iconographic features. Theologians, especially non-expert ones, typically rely on broader cues and may struggle to distinguish attributes that require expert knowledge of vestments, symbolic attributes, etc. Thus, compared to the identification capacity of art historians or non-specialist theologians, the evaluated models achieve notable performance, considering the complexity of the task and the imbalanced dataset. The models managed to classify the majority of the test images and exhibited consistent patterns aligned with interpretable human reasoning, as evaluated by the Grad-CAM visualizations. In this context, the practical impact and potential of the proposed system as an assistive tool for cataloging, education, or preliminary analysis are further highlighted.

From a deployment perspective, the evaluated model architectures differ significantly in computational cost and, thus, in suitability for real-world applications. Lightweight models, such as the MobileNet family, can offer fast inference and could be easily integrated into mobile applications [40], while heavier architectures could be employed for server-side processing [41]. Moreover, practical deployment should include a preprocessing pipeline to deal with varying lighting conditions, which are common in the field and affect the quality of images. By addressing such aspects, the proposed benchmark could be transformed into a practical and operational tool.

The use of AI on sacred content, however, may raise ethical and theological concerns [42,43]. Data acquisition should be respectful, ensuring that the religious significance of icons is protected, for example by relying on requested permissions and contextual awareness. Algorithmic representations and assumptions made by programmers need to be verified by theologians, clarifying that the proposed method aims to be an assistive tool, complementary to the expertise of theologians and art historians. Deployment in real-world settings must prevent misuse, such as inappropriate commercial exploitation or trivialization of sacred images. Use and distribution of sacred imagery need to be carried out with full respect, in a way that does not offend the divine and the faithful, leaving room for both ethical scientific exploration and practical applications. The involvement of religious and cultural heritage communities in the expansion of the dataset and the design and evaluation of the proposed system could help ensure that technological innovations align with the values and traditions associated with sacred art.

6. Conclusions

This work aims to test the performance of 13 deep learning models of various architectures for the task of Christian Orthodox saints’ recognition from images of preserved wooden hand-painted icons, which has never before been reported in the literature. For this reason, the first public image dataset of saint icons is introduced, including 2730 annotated images of 546 icons corresponding to 123 classes of both saints and religious events.

The models were tested in three experimental setups, referring to training and testing the models by using a balanced part of the dataset of six classes, an imbalanced part of the dataset of 12 classes and a medium-imbalanced part of the dataset of eight classes. The results reported an accuracy of up to 89% with VGG19 for the balanced data, of up to 78% for MobileNet with the imbalanced data, and of up to 87% with DenseNet201 for the medium-imbalanced data. Qualitative analysis of the results by using Grad-CAMs indicated that the models focused on specific features that characterize the depicted saint and exploited them to accomplish correct classifications. This work is the first step towards the development of a digital tool (e.g., a mobile application) for automatic Christian Orthodox saint categorization, towards accessible theological education and Christian Orthodox preservation of tradition, bridging the gap between religious tradition and modern AI-based technology.

Author Contributions

Conceptualization, G.A.P.; software, I.I.S. and K.D.A.; validation, E.V. and G.A.P.; formal analysis, E.V., I.I.S. and K.D.A.; investigation, E.V. and I.I.S.; resources, E.V. and I.I.S.; data curation, E.V. and I.I.S.; writing—original draft preparation, E.V. and I.I.S.; writing—review and editing, E.V. and G.A.P.; visualization, G.A.P.; supervision, G.A.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available in GitHub at https://github.com/MachineLearningVisionRG/ICONSAINT (accessed on 10 February 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CAM	Class Activation Map
CNN	Convolutional neural network
AI	Artificial intelligence
ASM	Active shape model
DLs	Description logics

References

1. Grabar, A. Christian Iconography; Princeton University Press: Princeton, NJ, USA, 2023; ISBN 9780691252094.
2. Jensen, R.M. Understanding Early Christian Art; Routledge: London, UK, 2023; ISBN 9781003216094.
3. Hunt, L. Eastern Christian Iconographic and Architectural Traditions: Oriental Orthodox. In The Blackwell Companion to Eastern Christianity; Wiley: Hoboken, NJ, USA, 2007; pp. 388–419. ISBN 9780631234234.
4. D-Vasilescu, E.E. Development of Eastern Christian Iconography. Transform. Int. J. Holist. Mission Stud. 2010, 27, 169–185.
5. Mitric, J.; Radulovic, I.; Popovic, T.; Scekic, Z.; Tinaj, S. AI and Computer Vision in Cultural Heritage Preservation. In Proceedings of the 2024 28th International Conference on Information Technology (IT), Zabljak, Montenegro, 21–24 February 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–4.
6. Mishra, M.; Lourenço, P.B. Artificial Intelligence-Assisted Visual Inspection for Cultural Heritage: State-of-the-Art Review. J. Cult. Herit. 2024, 66, 536–550.
7. Basu, A.; Paul, S.; Ghosh, S.; Das, S.; Chanda, B.; Bhagvati, C.; Snasel, V. Digital Restoration of Cultural Heritage with Data-Driven Computing: A Survey. IEEE Access 2023, 11, 53939–53977.
8. Kalampokas, T.; Mentizis, D.; Vrochidou, E.; Papakostas, G.A. Connecting National Flags—A Deep Learning Approach. Multimed. Tools Appl. 2023, 82, 39435–39457.
9. Milani, F.; Fraternali, P. A Dataset and a Convolutional Model for Iconography Classification in Paintings. J. Comput. Cult. Herit. 2021, 14, 1–18.
10. Pinciroli Vago, N.O.; Milani, F.; Fraternali, P.; da Silva Torres, R. Comparing CAM Algorithms for the Identification of Salient Image Features in Iconography Artwork Analysis. J. Imaging 2021, 7, 106.
11. Stork, D.G.; Bourached, A.; Cann, G.H.; Griffiths, R.-R. Computational Identification of Significant Actors in Paintings through Symbols and Attributes. Electron. Imaging 2021, 33, 15-1–15-18.
12. Tzouveli, P.; Simou, N.; Stamou, G.; Kollias, S. Semantic Classification of Byzantine Icons. IEEE Intell. Syst. 2009, 24, 35–43.
13. Duan, G.; Sawant, N.; Wang, J.Z.; Snow, D.; Ai, D.; Chen, Y.-W. Analysis of Cypriot Icon Faces Using ICA-Enhanced Active Shape Model Representation. In Proceedings of the 19th ACM International Conference on Multimedia, Scottsdale, AZ, USA, 28 November–1 December 2011; ACM: New York, NY, USA, 2011; pp. 901–904.
14. Parry, K. The Blackwell Companion to Eastern Christianity; Parry, K., Ed.; Wiley: Hoboken, NJ, USA, 2007; ISBN 9780631234234.
15. Tsakiridou, C.A. Icons in Time, Persons in Eternity: Orthodox Theology and the Aesthetics of the Christian Image; Routledge: London, UK, 2013; ISBN 9781409447672.
16. Haworth, D.K. André Grabar, Christian Iconography, A Study of its Origins. Art J. 1970, 29, 468.
17. Kokosalakis, N. Icons and Non-Verbal Religion in the Orthodox Tradition. Soc. Compass 1995, 42, 433–449.
18. Tsakiridou, C.A. The Orthodox Icon and Postmodern Art; Routledge: New York, NY, USA, 2024; ISBN 9781003265825.
19. Andreopoulos, A. Gazing on God: Trinity, Church and Salvation in Orthodox Thought and Iconography. Theology 2015, 118, 47–48.
20. Karthikeyan, M.; Raja, D. Deep Transfer Learning Enabled DenseNet Model for Content Based Image Retrieval in Agricultural Plant Disease Images. Multimed. Tools Appl. 2023, 82, 36067–36090.
21. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946v5.
22. Ahmed, M.; Afreen, N.; Ahmed, M.; Sameer, M.; Ahamed, J. An Inception V3 Approach for Malware Classification Using Machine Learning and Transfer Learning. Int. J. Intell. Netw. 2023, 4, 11–18.
23. Dong, K.; Zhou, C.; Ruan, Y.; Li, Y. MobileNetV2 Model for Image Classification. In Proceedings of the 2020 2nd International Conference on Information Technology and Computer Application (ITCA), Guangzhou, China, 18–20 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 476–480.
24. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 4510–4520.
25. Ma’rifah, P.N.; Sarosa, M.; Rohadi, E. Comparison of Faster R-CNN ResNet-50 and ResNet-101 Methods for Recycling Waste Detection. Int. J. Comput. Appl. Technol. Res. 2023, 12, 26–32.
26. Adedoja, A.O.; Owolawi, P.A.; Mapayi, T.; Tu, C. Intelligent Mobile Plant Disease Diagnostic System Using NASNet-Mobile Deep Learning. IAENG Int. J. Comput. Sci. 2022, 49, 216–231.
27. Ingle, Y.S.; Shaikh, N. Skin Cancer Recognition Using CNN, VGG16 and VGG19. In Smart Innovation, Systems and Technologies; Springer Nature: Singapore, 2023; pp. 131–144. ISBN 9789819940394.
28. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1800–1807.
29. Ferrer, L. Analysis and Comparison of Classification Metrics. arXiv 2023, arXiv:2209.05355.
30. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning Deep Features for Discriminative Localization. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 2921–2929.
31. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2020, 128, 336–359.
32. Yang, Z.; Luo, T.; Wang, D.; Hu, Z.; Gao, J.; Wang, L. Learning to Navigate for Fine-Grained Classification. In Proceedings of the European Conference on Computer Vision (ECCV); Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2018; pp. 438–454. ISBN 9783030012632.
33. Sahare, M.; Gupta, H. A Review of Multi-Class Classification for Imbalanced Data. Int. J. Adv. Comput. Res. 2012, 2, 160.
34. Hasan, A.S.M.M.; Diepeveen, D.; Laga, H.; Jones, M.G.K.; Sohel, F. Object-Level Benchmark for Deep Learning-Based Detection and Classification of Weed Species. Crop Prot. 2024, 177, 106561.
35. Chen, Z.-M.; Jin, X.; Zhao, B.-R.; Zhang, X.; Guo, Y. HCE: Hierarchical Context Embedding for Region-Based Object Detection. IEEE Trans. Image Process. 2021, 30, 6917–6929.
36. Pu, Y.; Han, Y.; Wang, Y.; Feng, J.; Deng, C.; Huang, G. Fine-Grained Recognition with Learnable Semantic Data Augmentation. IEEE Trans. Image Process. 2024, 33, 3130–3144.
37. Ye, S.; Peng, Q.; Cheung, Y.; Wang, Y.; Zou, Z.; You, X. FN-NET: Adaptive Data Augmentation Network for Fine-Grained Visual Categorization. Pattern Recognit. 2025, 165, 111618.
38. Li, H.; Zhang, X.; Tian, Q.; Xiong, H. Attribute Mix: Semantic Data Augmentation for Fine Grained Recognition. In Proceedings of the 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), Macau, China, 1–4 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 243–246.
39. Dalal, R.; Moh, T.-S. Fine-Grained Object Detection Using Transfer Learning and Data Augmentation. In Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Barcelona, Spain, 28–31 August 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 893–896.
40. Ogden, S.S.; Guo, T. Characterizing the Deep Neural Networks Inference Performance of Mobile Applications. arXiv 2019, arXiv:1909.04783.
41. Jeong, H.-J.; Jeong, I.; Lee, H.-J.; Moon, S.-M. Computation Offloading for Machine Learning Web Apps in the Edge Server Environment. In Proceedings of the 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), Vienna, Austria, 2–6 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1492–1499.
42. Papakostas, C. Mediating the Sacred in the Digital Age: Computational Approaches to Religious Practice, Ethics, and Discourse. In Proceedings of the 5th International Conference (NiDS 2025); Springer: Cham, Switzerland, 2026; pp. 237–250.
43. Nam, S.H. Digital Spirituality and Sacred Consciousness: Reclaiming Attention and Formation in the Age of AI. Expo. Times 2025, 137, 116–128.

Figure 1. Indicative icons from our dataset depicting different categories of saints: (a) Saint Great Martyr Irene; (b) Saint Andrew; (c) Saint Great Martyr George; (d) Saint John the Evangelist; (e) Apostle Paul; (f) Prophet Jeremiah; (g) Saint Nicolas; (h) Saint Arsenius of Cappadocia; (i) Saint Antony. Letterings within the icon indicate the saints’ names in Greek and are part of the original artwork.

Figure 2. Indicative images from the dataset, depicting the grading of the icons’ quality: (a) very well preserved; (b) partially damaged; (c) with strong reflections; (d) poorly placed. Letterings within the icon indicate the saints’ names in Greek and are part of the original artwork.

Figure 3. Indicative set of images from the dataset, depicting Saint Paisios from five different perspectives: (a) left angle; (b) intermediate frontal-left angle; (c) frontal angle; (d) intermediate frontal-right angle; (e) right angle. Letterings within the icon indicate the saints’ names in Greek and are part of the original artwork.

Figure 4. Illustration of the structure of a confusion matrix.

Figure 5. Indicative confusion matrices of Experiment 1: (a) best performing model, VGG19; (b) worst performing model, NasNetMobile.

Figure 6. Indicative images of: (a) Saint John the Baptist; (b) Saint Demetrios. Letterings within the icon indicate the saints’ names in Greek and are part of the original artwork.

Figure 7. Indicative confusion matrices of Experiment 2: (a) best performing model, MobileNet; (b) worst performing model, NasNetMobile.

Figure 8. Indicative confusion matrices of Experiment 3 for the four best performing models: (a) DenseNet201; (b) MobileNet; (c) MobileNetV2; (d) VGG16.

Figure 9. Indicative testing images from Experiment 3 for the classes of: (a) Saint Demetrios; (b) Saint George; (c) Saint John the Baptist; (d) Jesus Christ. Letterings within the icon indicate the saints’ names in Greek and are part of the original artwork.

Figure 10. Indicative examples of CAMs for Experiment 1 with VGG19 for different test images and classes: (a) Saints Nicholas, Raphael and Irene; (b) Saint Athanasios the Great; (c) Saint John the Baptist; (d) Saint Demetrios; (e) Saint Paisios; (f) Apostle Peter and Apostle Paul. Letterings within the icon indicate the saints’ names in Greek and are part of the original artwork. Heatmap colors span red (high contribution) to blue (low contribution).

Figure 11. Indicative examples of CAMs for Experiment 2 with MobileNet for different test images and classes: (a) Saints Nicholas, Raphael and Irene; (b) Saint Athanasios the Great; (c) Saint George; (d) Saint Demetrios; (e) Saint John the Baptist; (f) Saint Nicolas; (g) Saint Paisios; (h) Saint Panteleimon; (i) Apostle Peter and Apostle Paul; (j) Jesus Christ; (k) Mother of God and Jesus Christ; (l) Prophet Ilias. Letterings within the icon indicate the saints’ names in Greek and are part of the original artwork. Heatmap colors span red (high contribution) to blue (low contribution).

Figure 12. Indicative CAMs for the testing images of the Saint Paisios class with more complex architectures: (a) VGG19; (b) DenseNet201; (c) EfficientNet. Letterings within the icon indicate the saints’ names in Greek and are part of the original artwork. Heatmap colors span red (high contribution) to blue (low contribution).

Figure 13. Indicative examples of CAMs for Experiment 3 with DenseNet201 for different test images and classes: (a) Saints Nicholas, Raphael and Irene; (b) Saint George; (c) Saint Demetrios; (d) Saint John the Baptist; (e) Saint Nicolas; (f) Saint Paisios; (g) Jesus Christ; (h) Mother of God and Jesus Christ. Letterings within the icon indicate the saints’ names in Greek and are part of the original artwork. Heatmap colors span red (high contribution) to blue (low contribution).

Table 1. Breakdown of the dataset: number of icons and images per saint/event.

Class	Saint or Event	No. of Icons	No. of Images	Theme	Class	Saint or Event	No. of Icons	No. of Images	Theme
1	Mother of God and Jesus Christ	99	495	Saint	27	Saint Barbara	4	20	Saint
2	Jesus Christ	53	265	Saint	28	Saint Irene	4	20	Saint
3	Saint George	25	125	Saint	29	Saint Antony	4	20	Saint
4	Saint Nicolas	23	115	Saint	30	Saint David in Euboea	4	20	Saint
5	Saints Nicholas, Raphael and Irene	15	75	Saint	31	Saint John Chrysostom	4	20	Saint
6	Saint John the Baptist	13	65	Saint	32	Saint Minas	4	20	Saint
7	Saint Demetrios	12	60	Saint	33	Saint Fanourios	4	20	Saint
8	Saint Athanasios the Great	10	50	Saint	34	Saint Charalambos	4	20	Saint
9	Saint Paisios	10	50	Saint	35	Zoodochos Pigi	4	20	Event
10	Apostle Peter and Apostle Paul	9	45	Saint	36	The Three Holy Hierarchs	4	20	Saint
11	The Nativity of Christ	9	45	Event	37	Saint Catherine	3	15	Saint
12	Prophet Ilias	9	45	Saint	38	Saint Anastasia the Pharmacist	3	15	Saint
13	Saint Panteleimon	7	35	Saint	39	Saint Kyriaki	3	15	Saint
14	Saint Spyridon	7	35	Saint	40	Saints Anargyroi Kosmas and Damianos	3	15	Saint
15	Saint Irene Chrysovalantous	6	30	Saint	41	Saint Gregory Palamas	3	15	Saint
16	Saint Eleftherios	6	30	Saint	42	Saint Ephraim	3	15	Saint
17	Saint Nektarios of Aegina	6	30	Saint	43	Saint John the Theologian	3	15	Saint
18	The Assumption of the Virgin Mary	6	30	Event	44	Saint Parthenios	3	15	Saint
19	Saint Marina	5	25	Saint	45	Saint Tryfon	3	15	Saint
20	Saint Paraskevi	5	25	Saint	46	Apostle Paul	3	15	Saint
21	Saints Constantine and Helen	5	25	Saint	47	Archangel Michael	3	15	Saint
22	Saint George Karslidis	5	25	Saint	48	Matthew the Evangelist	3	15	Saint
23	Saint Stylianos of Paphlagon	5	25	Saint	49	The Crucifixion of Christ	3	15	Event
24	The Resurrection of Christ	5	25	Event	50	The Secret Supper	3	15	Event
25	The Baptism of Christ	5	25	Event	51	The Annunciation of the Virgin Mary	3	15	Event
26	The Archangels Michael and Gabriel	5	25	Saint	52	Saint Eugenia	2	10	Saint
53	Saint Efimia	2	10	Saint	77	Saint Calliope	1	5	Saint
54	Holy Trinity	2	10	Saint	78	Saint Mary of Egypt	1	5	Saint
55	Saints Sophia, Faith, Love and Hope	2	10	Saint	79	Saint Mary Magdalene	1	5	Saint
56	Saints Arsenios and Paisios	2	10	Saint	80	Saint Solomon	1	5	Saint
57	Saints Cyprian and Justine	2	10	Saint	81	Saint Chrisi	1	5	Saint
58	Saint Andrew	2	10	Saint	82	Saints Aquila and Priscilla	1	5	Saint
59	Saint Artemios	2	10	Saint	83	Saints Arsenios, Paisios and Porphyrios	1	5	Saint
60	Saint Basil	2	10	Saint	84	Saints Joachim and Anna	1	5	Saint
61	Saint Gregory the Theologian	2	10	Saint	85	Saints Nicholas, Panteleimon and Tryfon	1	5	Saint
62	Saint Dionysius, Bishop of Aegina	2	10	Saint	86	Saints Athanasios and Cyril	1	5	Saint
63	Saint Efthymios	2	10	Saint	87	Saint Alexander	1	5	Saint
64	Saint Theodore of Tyre	2	10	Saint	88	Saint Ambrose	1	5	Saint
65	Saint Jacob the Brother	2	10	Saint	89	Saint Amphilogios of Patmos	1	5	Saint
66	Agios Jacob Tsalikis	2	10	Saint	90	Saint Vlasios	1	5	Saint
67	Saint Jerome	2	10	Saint	91	Saint Gerasimos the Megalochori	1	5	Saint
68	Saint John the Russian	2	10	Saint	92	Saint Damianos	1	5	Saint
69	Saint Kosmas the Aetolian	2	10	Saint	93	Saint Dionysios of Mount Olympus	1	5	Saint
70	Saint Porphyrios	2	10	Saint	94	Saint Eustathios of Plakidas	1	5	Saint
71	Saint Savvas	2	10	Saint	95	Saint Ephraim the Syrian	1	5	Saint
72	John the Evangelist	2	10	Saint	96	Saint Herakleidios	1	5	Saint
73	Luke the Evangelist	2	10	Saint	97	Saint Theodosios	1	5	Saint
74	Mark the Evangelist	2	10	Saint	98	Saint Therapon Bishop of Cyprus	1	5	Saint
75	The Exaltation of the Holy Cross	2	10	Event	99	Saint Theonas Bishop of Thessaloniki	1	5	Saint
76	Saint Anna	1	5	Saint	100	Saint John of Damascus	1	5	Saint
101	Saint Kirikos	1	5	Saint
102	Saint Kyprianos	1	5	Saint
103	Saint Constantine	1	5	Saint
104	Saint Theodore the Neomartyr	1	5	Saint
105	Saint Nikephoros the Leper	1	5	Saint
106	Saint Seraphim of Sarov	1	5	Saint
107	Saint Simon the Athonite	1	5	Saint
108	Saint Stephen	1	5	Saint
109	Saint Simeon the Stylite	1	5	Saint
110	Saint Sozon	1	5	Saint
111	Saint Triantafyllos	1	5	Saint
112	Saint Philippe	1	5	Saint
113	Saint Christopher	1	5	Saint
114	Apostle Peter	1	5	Saint
115	The Holy Martyrs of Lesvos	1	5	Saint
116	Saint Luke	1	5	Saint
117	Prophet Zechariah	1	5	Saint
118	Prophet Ezekiel	1	5	Saint
119	The Holy Pentecost	1	5	Event
120	The Ascension of Christ	1	5	Event
121	The Vaiophoros	1	5	Event
122	The Savior’s Transfiguration	1	5	Event
123	The Nativity of the Virgin Mary	1	5	Event
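
The class distribution in Table 1 is strongly long-tailed. As a minimal sketch, the icon counts from the table can be tallied to check that they agree with the totals stated in the abstract (123 classes, 546 icons, 2730 images), given that every row of the table lists five images per icon. The variable names are illustrative and are not taken from the authors' code.

```python
# Icon counts per class, transcribed from Table 1.
# Classes 1-18 are listed individually; the tail (classes 19-123) is grouped
# as (icons_per_class, number_of_classes) to keep the sketch short.
head_icons = [99, 53, 25, 23, 15, 13, 12, 10, 10, 9, 9, 9, 7, 7, 6, 6, 6, 6]
tail_groups = [(5, 8), (4, 10), (3, 15), (2, 24), (1, 48)]

IMAGES_PER_ICON = 5  # every row of Table 1 has images = 5 * icons

total_classes = len(head_icons) + sum(n for _, n in tail_groups)
total_icons = sum(head_icons) + sum(k * n for k, n in tail_groups)
total_images = total_icons * IMAGES_PER_ICON

# Share of classes represented by a single icon (the extreme tail).
single_icon_share = tail_groups[-1][1] / total_classes

print(total_classes, total_icons, total_images)
print(f"{single_icon_share:.0%} of classes have a single icon")
```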

Table 2. Parametrization of deep classification models.

Hyperparameter	Value
Batch Size	32
Epochs	10
Optimizer	Adam, except NasNetLarge and NasNetMobile (RMSprop)
Weights	ImageNet
Target Size	(224 × 224), except NasNetLarge (331 × 331) and Xception (299 × 299)
Loss	Categorical cross-entropy
Output Layer	Dense (num_classes, activation = ‘softmax’)
Activation	ReLU (hidden) + softmax (output)
Class Mode	Categorical
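
The settings of Table 2 can be gathered into a small configuration helper. The sketch below is only an illustration: the dictionary keys, the function name `config_for`, and the model-name strings are assumptions and do not come from the authors' code. The values are those listed in the table.

```python
# Shared settings from Table 2 (key names are hypothetical).
BASE_CONFIG = {
    "batch_size": 32,
    "epochs": 10,
    "optimizer": "adam",
    "weights": "imagenet",
    "target_size": (224, 224),
    "loss": "categorical_crossentropy",
    "class_mode": "categorical",
}

# Per-model exceptions stated in Table 2.
OVERRIDES = {
    "NasNetLarge": {"optimizer": "rmsprop", "target_size": (331, 331)},
    "NasNetMobile": {"optimizer": "rmsprop"},
    "Xception": {"target_size": (299, 299)},
}


def config_for(model_name: str) -> dict:
    """Return the Table 2 settings for one model, with its exceptions applied."""
    return {**BASE_CONFIG, **OVERRIDES.get(model_name, {})}


print(config_for("Xception")["target_size"])
print(config_for("NasNetMobile")["optimizer"])
```

Keeping the exceptions in a separate mapping makes it easy to see that only three of the 13 models deviate from the shared settings.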

Table 3. Description of the used data for Experiments 1–3.

Experiment 1		Experiment 2		Experiment 3
Class	No. of Images	Class	No. of Images	Class	No. of Images
Saints Nicholas, Raphael and Irene	45	Saints Nicholas, Raphael and Irene	80	Saints Nicholas, Raphael and Irene	60
Saint Athanasios the Great	45	Saint Athanasios the Great	50	Saint Demetrios	45
Saint Demetrios	45	Saint George	140	Saint George	60
Saint John the Baptist	45	Saint Demetrios	60	Saint John the Baptist	60
Saint Paisios	45	Saint John the Baptist	70	Saint Nicolas	60
Apostle Peter and Apostle Paul	45	Saint Nicolas	115	Saint Paisios	50
		Saint Paisios	55	Jesus Christ	60
		Saint Panteleimon	45	Mother of God and Jesus Christ	60
		Apostle Peter and Apostle Paul	45
		Jesus Christ	215
		Mother of God and Jesus Christ	445
		Prophet Ilias	45
Total	270	Total	1365	Total	455
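
The labels "balanced", "imbalanced" and "medium-imbalanced" can be made concrete with a simple imbalance ratio: the size of the largest class divided by the size of the smallest. This ratio is an illustrative measure chosen here, not one reported by the authors. The counts are those of Table 3.

```python
# Images per class in each experiment, transcribed from Table 3.
experiments = {
    "Experiment 1": [45, 45, 45, 45, 45, 45],
    "Experiment 2": [80, 50, 140, 60, 70, 115, 55, 45, 45, 215, 445, 45],
    "Experiment 3": [60, 45, 60, 60, 60, 50, 60, 60],
}


def imbalance_ratio(counts):
    """Largest class size divided by smallest class size (1.0 = balanced)."""
    return max(counts) / min(counts)


for name, counts in experiments.items():
    print(name, len(counts), sum(counts), round(imbalance_ratio(counts), 2))
```

Under this measure the three setups give ratios of 1.0, about 9.9 and about 1.3, respectively, and the per-class counts add up to the totals in the last row of the table.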

Table 4. Performance evaluation of models in Experiment 1.

Model	Accuracy	Precision	Recall	F1-Score
DenseNet121	0.83	0.73	0.83	0.77
DenseNet201	0.80	0.87	0.80	0.77
EfficientNet	0.69	0.69	0.70	0.63
InceptionV3	0.83	0.73	0.83	0.77
MobileNet	0.86	0.87	0.86	0.86
MobileNetV2	0.66	0.50	0.66	0.56
NasNetLarge	0.89	0.93	0.90	0.89
NasNetMobile	0.53	0.42	0.53	0.46
ResNet50	0.76	0.68	0.76	0.70
ResNet101	0.86	0.90	0.86	0.83
VGG16	0.80	0.78	0.80	0.78
VGG19	0.89	0.91	0.90	0.90
Xception	0.83	0.73	0.83	0.77
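
The four columns of Table 4 can all be derived from a confusion matrix. The sketch below assumes macro averaging (an unweighted mean over classes); this section does not state which averaging scheme was used, and the 3-class matrix is made up purely for illustration.

```python
def macro_metrics(cm):
    """Accuracy and macro-averaged precision, recall and F1 from a square
    confusion matrix cm, where cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[i][c] for i in range(n))  # column sum
        actual = sum(cm[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy, made-up matrix with 10 test images per class (balanced).
toy_cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
m = macro_metrics(toy_cm)
print({k: round(v, 3) for k, v in m.items()})
```

When every class has the same number of test images, macro recall equals accuracy. This is consistent with Table 4, where the Accuracy and Recall columns agree to within 0.01 in every row.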

Table 5. Performance evaluation of models in Experiment 2.

Model	Accuracy	Precision	Recall	F1-Score
DenseNet121	0.65	0.47	0.43	0.41
DenseNet201	0.69	0.43	0.46	0.41
EfficientNet	0.73	0.61	0.57	0.55
InceptionV3	0.63	0.46	0.49	0.45
MobileNet	0.78	0.73	0.62	0.64
MobileNetV2	0.72	0.55	0.53	0.52
NasNetLarge	0.65	0.42	0.43	0.42
NasNetMobile	0.57	0.35	0.38	0.35
ResNet50	0.70	0.47	0.47	0.45
ResNet101	0.69	0.47	0.47	0.46
VGG16	0.65	0.41	0.42	0.41
VGG19	0.73	0.54	0.55	0.54
Xception	0.68	0.57	0.54	0.50

Table 6. Performance evaluation of models in Experiment 3.

Model	Accuracy	Precision	Recall	F1-Score
DenseNet121	0.69	0.65	0.70	0.66
DenseNet201	0.85	0.79	0.85	0.81
EfficientNet	0.64	0.69	0.65	0.62
InceptionV3	0.69	0.67	0.70	0.65
MobileNet	0.87	0.92	0.88	0.86
MobileNetV2	0.85	0.87	0.85	0.85
NasNetLarge	0.47	0.40	0.47	0.40
NasNetMobile	0.50	0.44	0.50	0.44
ResNet50	0.67	0.62	0.68	0.62
ResNet101	0.72	0.83	0.72	0.68
VGG16	0.80	0.72	0.80	0.74
VGG19	0.69	0.74	0.70	0.67
Xception	0.50	0.44	0.50	0.44

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

