Greek Sign Language Detection with Artificial Intelligence
Abstract
1. Introduction
2. Related Work
3. System Architecture
- (a) Input module: the mobile part of the system. It captures a video stream with a Raspberry Pi 4 connected to an IP camera.
- (b) Processing unit: executes a deep learning model based on YOLO11X-seg to recognize static and dynamic gestures. This unit can be a platform with high computational power and a GPU capable of running large, trained sign language models at high speed; smaller models can run on laptops or embedded computers. The system architecture is thus scalable and extensible according to the application.
- (c) Output module: displays the detected letter or word on a Graphical User Interface (GUI).
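The three-module flow above can be sketched as a minimal pipeline. This is an illustrative structural sketch only: the class names (`FrameSource`, `GestureRecognizer`, `Display`) are hypothetical stand-ins, and the real I/O (IP camera capture, YOLO11X-seg inference, GUI rendering) is stubbed out with plain Python.

```python
class FrameSource:
    """Stand-in for the input module (Raspberry Pi 4 + IP camera)."""
    def __init__(self, frames):
        self.frames = frames

    def stream(self):
        # A real source would yield frames read from the camera stream.
        yield from self.frames


class GestureRecognizer:
    """Stand-in for the processing unit running YOLO11X-seg."""
    def __init__(self, labels):
        self.labels = labels

    def predict(self, frame):
        # A real system would run segmentation-model inference here.
        return self.labels.get(frame, "?")


class Display:
    """Stand-in for the GUI output module."""
    def __init__(self):
        self.shown = []

    def show(self, text):
        self.shown.append(text)


def run_pipeline(source, recognizer, display):
    # Input -> processing -> output, one frame at a time.
    for frame in source.stream():
        display.show(recognizer.predict(frame))
    return display.shown
```

Because each module only exposes a narrow interface (`stream`, `predict`, `show`), the processing unit can be swapped between a GPU workstation and an embedded computer without touching the other two modules, which is the scalability property described above.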
3.1. System Platforms and Components
- Ultralytics is used to train the YOLO11X-seg model with our data. It also supports executing the model within the Python program.
- PyTorch enables model training and inference on the GPU instead of the CPU, which would be considerably slower.
- OpenCV is required for recording videos, editing images, drawing bounding boxes, and displaying the results.
- Pandas handles tabular data; it converts the YOLO11X-seg output into DataFrames for easier processing.
- NumPy handles arithmetic operations, arrays, and mathematical calculations.
- Pillow handles image editing, including rendering text as images with custom fonts so that Greek characters are displayed correctly.
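The Pandas role described above can be illustrated with a short sketch. The detection tuples and the `GREEK_LABELS` mapping below are made-up examples, not the system's actual output format; the point is converting per-detection records into a DataFrame for convenient filtering.

```python
import pandas as pd

# Hypothetical class-id -> Greek letter mapping (illustrative only).
GREEK_LABELS = {0: "Α", 1: "Β", 2: "Γ"}

# Hypothetical detections: (class id, confidence, bounding box).
detections = [
    (0, 0.97, (12, 30, 220, 310)),
    (2, 0.91, (40, 25, 260, 330)),
]

# Flatten each detection into one DataFrame row.
df = pd.DataFrame(
    [{"class_id": c, "label": GREEK_LABELS[c], "conf": conf,
      "x1": b[0], "y1": b[1], "x2": b[2], "y2": b[3]}
     for c, conf, b in detections]
)

# Pick the highest-confidence detection, e.g. for the GUI output.
best = df.loc[df["conf"].idxmax(), "label"]
```

With the detections in a DataFrame, thresholding (`df[df.conf > 0.5]`), sorting, and per-class aggregation become one-liners, which is the "easier processing" the list refers to.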
3.2. System Implementation Workflow
3.3. Brief Description of YOLO11X-seg
4. Dataset Creation
Labeling Images
5. Experimental Setup and Results
5.1. Training YOLO11X-seg Details
5.2. Analysis of Handform Success Rates in Different Light and Background Conditions
5.3. Performance Evaluation of the Greek Sign Language Recognition System
- True Positive (TP): the number of images correctly recognized as belonging to a class.
- True Negative (TN): the number of images correctly recognized as not belonging to a class.
- False Positive (FP, Type I error): the number of images falsely recognized as belonging to a class.
- False Negative (FN, Type II error): the number of images falsely excluded from the class to which they belong.
5.4. Comparison with Related Works—Discussion
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Cooper, H.; Ong, E.-J.; Pugeault, N.; Bowden, R. Sign Language Recognition Using Sub-Units. J. Mach. Learn. Res. 2012, 13, 2205–2231.
- Ben Haj Amor, A.; El Ghoul, O.; Jemni, M. Sign Language Recognition Using the Electromyographic Signal: A Systematic Literature Review. Sensors 2023, 23, 8343.
- Filipowska, A.; Filipowski, W.; Mieszczanin, J.; Bryzik, K.; Henkel, M.; Skwarek, E.; Raif, P.; Sieciński, S.; Doniec, R.; Mika, B.; et al. Pattern Recognition in the Processing of Electromyographic Signals for Selected Expressions of Polish Sign Language. Sensors 2024, 24, 6710.
- Umut, İ.; Kumdereli, Ü.C. Novel Wearable System to Recognize Sign Language in Real Time. Sensors 2024, 24, 4613.
- Liang, Y.; Jettanasen, C.; Chiradeja, P. Progression Learning Convolution Neural Model-Based Sign Language Recognition Using Wearable Glove Devices. Computation 2024, 12, 72.
- Buttar, A.M.; Ahmad, U.; Gumaei, A.H.; Assiri, A.; Akbar, M.A.; Alkhamees, B.F. Deep Learning in Sign Language Recognition: A Hybrid Approach for the Recognition of Static and Dynamic Signs. Mathematics 2023, 11, 3729.
- Huang, J.; Chouvatut, V. Video-Based Sign Language Recognition via ResNet and LSTM Network. J. Imaging 2024, 10, 149.
- Alsharif, B.; Altaher, A.S.; Altaher, A.; Ilyas, M.; Alalwany, E. Deep Learning Technology to Recognize American Sign Language Alphabet. Sensors 2023, 23, 7970.
- Kondo, T.; Narumi, S.; He, Z.; Shin, D.; Kang, Y. A Performance Comparison of Japanese Sign Language Recognition with ViT and CNN Using Angular Features. Appl. Sci. 2024, 14, 3228.
- Noor, T.H.; Noor, A.; Alharbi, A.F.; Faisal, A.; Alrashidi, R.; Alsaedi, A.S.; Alharbi, G.; Alsanoosy, T.; Alsaeedi, A. Real-Time Arabic Sign Language Recognition Using a Hybrid Deep Learning Model. Sensors 2024, 24, 3683.
- Kumari, D.; Anand, R.S. Isolated Video-Based Sign Language Recognition Using a Hybrid CNN-LSTM Framework Based on Attention Mechanism. Electronics 2024, 13, 1229.
- Akdag, A.; Baykan, O.K. Enhancing Signer-Independent Recognition of Isolated Sign Language through Advanced Deep Learning Techniques and Feature Fusion. Electronics 2024, 13, 1188.
- Lu, C.; Kozakai, M.; Jing, L. Sign Language Recognition with Multimodal Sensors and Deep Learning Methods. Electronics 2023, 12, 4827.
- Gu, Y.; Oku, H.; Todoh, M. American Sign Language Recognition and Translation Using Perception Neuron Wearable Inertial Motion Capture System. Sensors 2024, 24, 453.
- Wang, Y.; Jiang, H.; Sun, Y.; Xu, L. A Static Sign Language Recognition Method Enhanced with Self-Attention Mechanisms. Sensors 2024, 24, 6921.
- Pu, M.; Lim, M.K.; Chong, C.Y. Siformer: Feature-Isolated Transformer for Efficient Skeleton-Based Sign Language Recognition. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia, 28 October–1 November 2024; pp. 9387–9396.
- Li, R.; Meng, L. Multi-View Spatial-Temporal Network for Continuous Sign Language Recognition. arXiv 2022.
- Zuo, R.; Wei, F.; Mak, B. Towards Online Continuous Sign Language Recognition and Translation. arXiv 2024.
- Srivastava, S.; Singh, S.; Pooja; Prakash, S. Continuous Sign Language Recognition System Using Deep Learning with MediaPipe Holistic. arXiv 2024.
- Madhiarasan, M.; Roy, P.P. A Comprehensive Review of Sign Language Recognition: Different Types, Modalities, and Datasets. arXiv 2022.
- Awaluddin, B.-A.; Chao, C.-T.; Chiou, J.-S. A Hybrid Image Augmentation Technique for User- and Environment-Independent Hand Gesture Recognition Based on Deep Learning. Mathematics 2024, 12, 1393.
- Aldahir, R.; Grau, R.R. Using Convolutional Neural Networks for Visual Sign Language Recognition: Towards a System That Provides Instant Feedback to Learners of Sign Language. In Proceedings of the 21st International Web for All Conference, Singapore, 13–14 May 2024; pp. 70–74.
- Sharma, A.; Guo, D.; Parmar, A.; Ge, J.; Li, H. Promoting Sign Language Awareness: A Deep Learning Web Application for Sign Language Recognition. In Proceedings of the 2024 8th International Conference on Deep Learning Technologies, Suzhou, China, 15–17 July 2024; pp. 22–28.
- Bhadouria, A.; Bindal, P.; Khare, N.; Singh, D.; Verma, A. LSTM-Based Recognition of Sign Language. In Proceedings of the 2024 Sixteenth International Conference on Contemporary Computing, Noida, India, 8–10 August 2024; pp. 508–514.
- Alayed, A. Machine Learning and Deep Learning Approaches for Arabic Sign Language Recognition: A Decade Systematic Literature Review. Sensors 2024, 24, 7798.
- Borges-Galindo, E.A.; Morales-Ramírez, N.; González-Lee, M.; García-Martínez, J.R.; Nakano-Miyatake, M.; Perez-Meana, H. Sign Language Interpreting System Using Recursive Neural Networks. Appl. Sci. 2024, 14, 8560.
- Antad, S.M.; Chakrabarty, S.; Bhat, S.; Bisen, S.; Jain, S. Sign Language Translation Across Multiple Languages. In Proceedings of the IEEE 2024 International Conference on Emerging Systems and Intelligent Computing (ESIC), Bhubaneswar, India, 9–10 February 2024; pp. 741–746.
- Priyadharshini, D.S.; Anandraj, R.; Prasath, K.R.G.; Manogar, S.A.F. A Comprehensive Application for Sign Language Alphabet and World Recognition, Text-to-Action Conversion for Learners, Multi-Language Support and Integrated Voice Output Functionality. In Proceedings of the IEEE 2024 International Conference on Science Technology Engineering and Management (ICSTEM), Coimbatore, India, 26–27 April 2024; pp. 1–5.
- Ultralytics. YOLO11 Documentation. 2025. Available online: https://docs.ultralytics.com/models/yolo11/ (accessed on 28 July 2025).
- Dataset Used in This Article. Available online: https://drive.google.com/drive/folders/1TVxQ6NtMGzTIhgr5rTOerL1RwtF9JksA?usp=sharing (accessed on 28 July 2025).
Step | Description |
---|---|
1 | Capturing handforms in photos |
2 | Annotation using the LabelMe application |
3 | Creating the environment using Miniconda |
4 | Training the YOLO11X-seg model |
5 | Deploying on the Raspberry Pi 4 |
6 | Establishing a private network with ZeroTier |
7 | Streaming the display with the Moonlight app |
8 | Generating Python code for integration |
Dataset Specification | Value |
---|---|
Image resolution | 1280 × 720, 1912 × 1072, 1920 × 1080 |
Image extension | PNG |
Total images in dataset | 1669 |
Total classes (letters or words) | 28 |
Images in dataset per class | 30–60 |
Image size | 350–1700 KB |
Functional Domain | Operational Role | Instantiated Configurations | Parameters |
---|---|---|---|
Training Configuration | This encompasses parameters that regulate the learning process, such as optimization strategies, resource allocation, and training duration. These configurations have a direct impact on model convergence, training stability, and overall performance throughout the training phase. | optimization strategy | AdamW (selected automatically) |
 | | resource allocation | batch = 8, imgsz = 640 |
 | | training duration | epochs = 100 |
 | | precision management | amp = False (disabled because of the GPU) |
 | | determinism | deterministic = True, seed = 0 |
 | | weight regularization | weight_decay = 0.0005 |
Data Augmentation | This term refers to the suite of transformations implemented on input data throughout the training process to enhance dataset diversity artificially. Such augmentations serve to strengthen model robustness and generalization by replicating real-world variability. | augmentation status | disabled |
 | | auto augmentation policy | auto_augment = randaugment |
 | | geometric transformations | scale = 0.5, translate = 0.1, degrees = 0.0, shear = 0.0 |
 | | color space variation | hsv_h = 0.015, hsv_s = 0.7, hsv_v = 0.4 |
 | | mixing strategies | mosaic = 1.0, mixup = 0.0, cutmix = 0.0, copy_paste = 0.0 |
Validation/Testing | This encompasses parameters used during model evaluation to objectively assess performance on novel datasets. These parameters include thresholds, evaluation modes, dataset splits, and logging mechanisms, all of which are essential for rigorous and impartial performance measurement. | evaluation split | split = val |
 | | IoU threshold | iou = 0.7 |
 | | confidence threshold | conf = None |
 | | non-maximum suppression | agnostic_nms = False, nms = False |
 | | result visualization | plots = True |
Inference | The configuration includes parameters that manage model behavior during prediction. These parameters regulate input processing, hardware allocation, output post-processing, and performance optimization for both real-time and batch inference contexts. | task specification | task = segment |
 | | model file | model = yolo11x-seg.pt |
 | | max detections | max_det = 300 |
 | | precision mode | half = False |
 | | device allocation | device = None |
 | | export format | format = torchscript |
Visualization | This specifies the parameters for rendering, displaying, and saving annotated outputs, such as bounding boxes and labels. These configurations enhance interpretability, facilitate debugging, and support qualitative evaluation of model results. | display elements | show = True, show_boxes = True, show_labels = True, show_conf = True |
 | | save options | save = True, save_crop = False, save_txt = False, save_conf = False |
 | | rendering customization | line_width = None |
Meta/Control | This section outlines high-level operational directives that establish the task context, such as detection or segmentation, and specify the runtime mode, including training, validation, or inference. The parameters detailed herein determine which subsystem or configuration branch will be engaged. | operation mode | mode = train |
 | | task type | task = segment |
 | | run identification | name = train59, project = None |
 | | tracker config | tracker = botsort.yaml |
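The training-related parameters above can be collected into a single Ultralytics-style override file. The fragment below is a sketch, not the authors' actual configuration file; key names follow the Ultralytics configuration conventions and should be checked against the installed version.

```yaml
# Sketch of an Ultralytics-style training configuration mirroring the table.
task: segment
mode: train
model: yolo11x-seg.pt
epochs: 100
batch: 8
imgsz: 640
amp: false            # mixed precision disabled because of the GPU
deterministic: true
seed: 0
weight_decay: 0.0005
# geometric and color-space augmentation settings
scale: 0.5
translate: 0.1
degrees: 0.0
shear: 0.0
hsv_h: 0.015
hsv_s: 0.7
hsv_v: 0.4
mosaic: 1.0
mixup: 0.0
copy_paste: 0.0
# evaluation
split: val
iou: 0.7
max_det: 300
plots: true
```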
Letters/Words | White Background/Bright Lighting | Black Background/Bright Lighting | White Background/Low Light | Black Background/Low Light |
---|---|---|---|---|
Greek Alphabet | ||||
A | 30/30 | 30/30 | 30/30 | 28/30 |
Β | 30/30 | 30/30 | 30/30 | 30/30 |
Γ | 30/30 | 30/30 | 30/30 | 30/30 |
Δ | 30/30 | 30/30 | 30/30 | 30/30 |
Ε | 30/30 | 30/30 | 30/30 | 29/30 |
Ζ | 30/30 | 30/30 | 30/30 | 30/30 |
H | 30/30 | 30/30 | 30/30 | 30/30 |
Θ | 30/30 | 30/30 | 30/30 | 30/30 |
Ι | 30/30 | 30/30 | 30/30 | 29/30 |
Κ | 30/30 | 30/30 | 30/30 | 30/30 |
Λ | 30/30 | 30/30 | 29/30 | 28/30 |
Μ | 30/30 | 29/30 | 28/30 | 29/30 |
Ν | 30/30 | 29/30 | 29/30 | 28/30 |
Ξ | 30/30 | 30/30 | 30/30 | 29/30 |
O | 30/30 | 30/30 | 30/30 | 30/30 |
Π | 30/30 | 30/30 | 29/30 | 29/30 |
Ρ | 30/30 | 30/30 | 30/30 | 30/30 |
Σ | 30/30 | 30/30 | 30/30 | 30/30 |
Τ | 30/30 | 30/30 | 30/30 | 29/30 |
Υ | 29/30 | 28/30 | 28/30 | 27/30 |
Φ | 30/30 | 30/30 | 30/30 | 29/30 |
Χ | 30/30 | 30/30 | 30/30 | 29/30 |
Ψ | 30/30 | 30/30 | 30/30 | 29/30 |
Ω | 30/30 | 30/30 | 30/30 | 30/30 |
Greek Words | ||||
Αεροπλάνο (Airplane) | 30/30 | 30/30 | 30/30 | 30/30 |
Αερόστατο (Hot Air Balloon) | 30/30 | 30/30 | 30/30 | 30/30 |
Αυτοκίνητο (Car) | 30/30 | 30/30 | 30/30 | 30/30 |
Ελικόπτερο (Helicopter) | 30/30 | 30/30 | 30/30 | 30/30 |
Background | Success Rate (%) |
---|---|
White background—bright lighting | 99.88 |
Black background—bright lighting | 99.52 |
White background—low lighting | 99.17 |
Black background—low lighting | 97.86 |
Average success rate | 99.11 |
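As a quick arithmetic check: each condition covers the same number of attempts (28 classes at 30 trials each), so the overall figure is simply the mean of the four per-condition success rates.

```python
# Per-condition success rates from the table above (percent).
rates = [99.88, 99.52, 99.17, 97.86]

# With equal trial counts per condition, the overall rate is the plain mean.
avg = sum(rates) / len(rates)  # ~99.11
```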
Letters/Words | Woman, White Background/Bright Lighting | Man, White Background/Bright Lighting | Woman, Outdoors Background/Bright Lighting | Man, Outdoors Background/Bright Lighting |
---|---|---|---|---|
Alphabet | ||||
A | 30/30 | 30/30 | 29/30 | 28/30 |
Β | 30/30 | 30/30 | 30/30 | 30/30 |
Γ | 30/30 | 30/30 | 30/30 | 30/30 |
Δ | 30/30 | 30/30 | 30/30 | 30/30 |
Ε | 30/30 | 30/30 | 29/30 | 30/30 |
Ζ | 30/30 | 30/30 | 30/30 | 30/30 |
H | 30/30 | 30/30 | 30/30 | 30/30 |
Θ | 30/30 | 30/30 | 30/30 | 30/30 |
Ι | 30/30 | 30/30 | 29/30 | 29/30 |
Κ | 30/30 | 30/30 | 30/30 | 30/30 |
Λ | 30/30 | 30/30 | 28/30 | 28/30 |
Μ | 29/30 | 29/30 | 29/30 | 29/30 |
Ν | 28/30 | 29/30 | 29/30 | 30/30 |
Ξ | 30/30 | 30/30 | 30/30 | 29/30 |
O | 30/30 | 30/30 | 30/30 | 30/30 |
Π | 30/30 | 30/30 | 28/30 | 29/30 |
Ρ | 30/30 | 30/30 | 30/30 | 30/30 |
Σ | 30/30 | 30/30 | 30/30 | 30/30 |
Τ | 30/30 | 30/30 | 29/30 | 29/30 |
Υ | 28/30 | 28/30 | 27/30 | 27/30 |
Φ | 30/30 | 30/30 | 30/30 | 29/30 |
Χ | 30/30 | 30/30 | 30/30 | 30/30 |
Ψ | 30/30 | 30/30 | 29/30 | 29/30 |
Ω | 30/30 | 30/30 | 30/30 | 30/30 |
Greek Words | ||||
Αεροπλάνο (Airplane) | 30/30 | 30/30 | 30/30 | 30/30 |
Αερόστατο (Hot Air Balloon) | 30/30 | 30/30 | 30/30 | 30/30 |
Αυτοκίνητο (Car) | 30/30 | 30/30 | 30/30 | 30/30 |
Ελικόπτερο (Helicopter) | 30/30 | 30/30 | 30/30 | 30/30 |
Background | Success Rate (%) |
---|---|
Man or Woman, White background/bright lighting | 99.64 |
Man or Woman, Outdoors background/bright lighting | 98.33 |
Average success rate | 98.98 |
Class Name | Average Precision (AP) | True Positive (TP) | False Positive (FP) | False Negative (FN) |
---|---|---|---|---|
Greek Alphabet | ||||
A | 98.00% | 245 | 0 | 5 |
Β | 100.00% | 250 | 0 | 0 |
Γ | 100.00% | 250 | 0 | 0 |
Δ | 100.00% | 250 | 0 | 0 |
Ε | 99.20% | 248 | 2 | 0 |
Ζ | 100.00% | 250 | 0 | 0 |
H | 100.00% | 250 | 0 | 0 |
Θ | 100.00% | 250 | 0 | 0 |
Ι | 98.80% | 247 | 1 | 2 |
Κ | 100.00% | 250 | 0 | 0 |
Λ | 97.20% | 243 | 4 | 3 |
Μ | 96.80% | 242 | 4 | 4 |
Ν | 97.20% | 243 | 5 | 2 |
Ξ | 99.20% | 248 | 0 | 2 |
O | 100.00% | 250 | 0 | 0 |
Π | 98.00% | 245 | 3 | 2 |
Ρ | 100.00% | 250 | 0 | 0 |
Σ | 100.00% | 250 | 0 | 0 |
Τ | 98.80% | 247 | 2 | 1 |
Υ | 92.80% | 232 | 15 | 3 |
Φ | 99.20% | 248 | 0 | 2 |
Χ | 100.00% | 249 | 0 | 1 |
Ψ | 98.80% | 247 | 0 | 3 |
Ω | 100.00% | 250 | 0 | 0 |
Greek Words | ||||
Αεροπλάνο (Airplane) | 100.00% | 250 | 0 | 0 |
Αερόστατο (Hot Air Balloon) | 100.00% | 250 | 0 | 0 |
Αυτοκίνητο (Car) | 100.00% | 250 | 0 | 0 |
Ελικόπτερο (Helicopter) | 100.00% | 250 | 0 | 0 |
Overall Precision | 99.48% |
Overall Recall | 99.57% |
Overall F1-score | 99.48% |
Total True Positives (TP) | 6954 |
Total False Positives (FP) | 36 |
Total False Negatives (FN) | 30 |
mAP (mean Average Precision) | 99.07% |
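These overall figures can be recomputed from the per-class table. The sketch below (not taken from the paper's code) derives the micro-averaged precision and recall from the TP/FP/FN totals and the mAP as the plain mean of the 28 per-class Average Precision values.

```python
# Totals from the table: 6954 TP, 36 FP, 30 FN.
tp, fp, fn = 6954, 36, 30
precision = 100 * tp / (tp + fp)   # ~99.48%
recall = 100 * tp / (tp + fn)      # ~99.57%

# Per-class AP values (percent): 24 Greek letters A..Ω, then 4 words.
ap = [98.0, 100, 100, 100, 99.2, 100, 100, 100, 98.8, 100, 97.2, 96.8,
      97.2, 99.2, 100, 98.0, 100, 100, 98.8, 92.8, 99.2, 100, 98.8, 100,
      100, 100, 100, 100]
mean_ap = sum(ap) / len(ap)        # ~99.07%
```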
Reference | Target Sign Language | Technology Used | Systems Used | Signs | Recognition Time (ms) | Accuracy (%) |
---|---|---|---|---|---|---|
[3] | Polish | EMG, CNN | BIOPAC MP36 device, MyoWare 2.0 sensor | 24 | 51,479.8 | 95.53, 98.32 |
[6] | American | LSTM, YOLOv6 | NOT GIVEN | 32 | NOT GIVEN | 92.0, 96.0 |
[7] | Argentine | ResNet-LSTM | Intel Xeon CPU E5-2620 v4 @ 2.10 GHz, 16 GB of RAM, NVIDIA GeForce GTX 1080 Ti GPU 4 × 4 | 41 | NOT GIVEN | 86.25 |
[8] | American | AlexNet, ConvNeXt, EfficientNet, ResNet-50, VisionTransformer | NOT GIVEN | 26 | NOT GIVEN | 99.98, 99.95, 99.51, 99.50, 88.59 |
[9] | Japanese | ViT, CNN | NOT GIVEN | 46 | NOT GIVEN | 99.7, 99.3 |
[10] | Arabic | CNN, LSTM | Google Cloud, Ubuntu 20.04, CentOS Linux, Intel Xeon Platinum 8481C Processor, RAM 16 GB, Disk size 32 TB | 20 | NOT GIVEN | 94.4, 82.7 |
[11] | American | CNN-LSTM with MobileNetV2 | Intel(R) Core(TM) i7, NVIDIA GTX 1060 GPU, 16 GB RAM | 100 | 30.2 | 84.65 |
[12] | Turkish, Argentine | R3(2 + 1)D-SLR | NOT GIVEN | 744 | 116 | 94.52, 98.53 |
[13] | Japanese | CNN-BiLSTM | NOT GIVEN | 78 | NOT GIVEN | 84.13 |
[19] | Indian | LSTM | NOT GIVEN | 45 | NOT GIVEN | 88.23 |
[22] | British | YOLOv5 | NOT GIVEN | 26 | NOT GIVEN | 70.0 |
[26] | Mexican | RNN | 1270 MHz Apple M1 processor, 8 GB of RAM, 500 GB of storage, and the macOS Sonoma operating system | 20 | NOT GIVEN | 93.0 |
GSL system | Greek | YOLO11X-seg | Intel i7-9700K, NVIDIA GTX 1660 Super GPU, 16 GB RAM | 28 | 42.7 | 99.07 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Panopoulos, I.; Topalis, E.; Petrellis, N.; Hadellis, L. Greek Sign Language Detection with Artificial Intelligence. Electronics 2025, 14, 3241. https://doi.org/10.3390/electronics14163241