Search Results (13)

Search Parameters:
Keywords = optical symbol recognition

15 pages, 6740 KiB  
Article
Modulation Format Recognition Scheme Based on Discriminant Network in Coherent Optical Communication System
by Fangxu Yang, Qinghua Tian, Xiangjun Xin, Yiqun Pan, Fu Wang, José Antonio Lázaro, Josep M. Fàbrega, Sitong Zhou, Yongjun Wang and Qi Zhang
Electronics 2024, 13(19), 3833; https://doi.org/10.3390/electronics13193833 - 28 Sep 2024
Viewed by 1135
Abstract
We exploit the discriminative ability of the discriminator in a conditional generative adversarial network and propose a scheme that uses only a few symbols to achieve high-accuracy recognition of modulation formats under low signal-to-noise ratio conditions in coherent optical communication. In a 1000 km G.654E optical fiber transmission system, transmission experiments are conducted on the PDM-QPSK/-8PSK/-16QAM/-32QAM/-64QAM modulation formats at 8/16/32 GBaud, and the signal-to-noise ratio is swept across the experimental conditions. As a key technology for next-generation elastic optical networks, the proposed modulation format recognition scheme achieves 100% recognition of the above five modulation formats without distinguishing signal transmission rates. The optical signal-to-noise ratio thresholds required to achieve 100% recognition accuracy are 12.4 dB, 14.3 dB, 15.4 dB, 16.2 dB, and 17.3 dB, respectively.
(This article belongs to the Special Issue Advances in Optical Communication and Optical Computing)
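The decision rule described in the abstract (aggregating a discriminator's class scores over a few received symbols) can be sketched as follows. The format list matches the paper, but the function names and the simple posterior-averaging policy are illustrative assumptions, not the authors' implementation.

```python
import math

FORMATS = ["QPSK", "8PSK", "16QAM", "32QAM", "64QAM"]  # formats studied in the paper

def softmax(logits):
    """Convert raw discriminator scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify_from_few_symbols(per_symbol_logits):
    """Average the discriminator's posterior over a handful of received
    symbols and pick the most likely modulation format (illustrative)."""
    n = len(per_symbol_logits)
    avg = [0.0] * len(FORMATS)
    for logits in per_symbol_logits:
        probs = softmax(logits)
        avg = [a + p / n for a, p in zip(avg, probs)]
    return FORMATS[max(range(len(avg)), key=avg.__getitem__)], avg
```

Averaging posteriors over symbols is one simple way "few symbols" can still yield a confident decision at low SNR, since per-symbol noise tends to cancel.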

26 pages, 12966 KiB  
Article
Optical Medieval Music Recognition—A Complete Pipeline for Historic Chants
by Alexander Hartelt, Tim Eipert and Frank Puppe
Appl. Sci. 2024, 14(16), 7355; https://doi.org/10.3390/app14167355 - 20 Aug 2024
Cited by 2 | Viewed by 1339
Abstract
Manual transcription of music is tedious work that can be greatly facilitated by optical music recognition (OMR) software. However, OMR software is error-prone, in particular for older handwritten documents. This paper introduces and evaluates a pipeline that automates the entire OMR workflow in the context of the Corpus Monodicum project, enabling the transcription of historical chants. In addition to typical OMR tasks such as staff line detection, layout detection, and symbol recognition, the rarely addressed tasks of text and syllable recognition and of assigning syllables to symbols are tackled. For quantitative and qualitative evaluation, we use documents written in the square notation developed in the 11th–12th century, but the methods apply to many other notations as well. Quantitative evaluation measures the number of interventions necessary for correction: about 0.4% for layout recognition including the division of text into chants, 2.4% for symbol recognition including pitch and reading order, and 2.3% for syllable alignment with correct text and symbols. Qualitative evaluation showed an efficiency gain over manual transcription with an elaborate tool by a factor of about 9. In a second use case with printed chants in similar notation from the “Graduale Synopticum”, the evaluation results for symbols are much better, except for syllable alignment, indicating the difficulty of this task.

13 pages, 3017 KiB  
Article
Comparative Analysis of Telepresence Robots’ Video Performance: Evaluating Camera Capabilities for Remote Teaching and Learning
by Aleksei Talisainen, Janika Leoste and Sirje Virkus
Appl. Sci. 2024, 14(1), 233; https://doi.org/10.3390/app14010233 - 27 Dec 2023
Cited by 3 | Viewed by 2509
Abstract
The COVID-19 outbreak demonstrated the viability of various remote working solutions, telepresence robots (TPRs) being one of them. High-quality video transmission is one of the cornerstones of such solutions, as most information about the environment is acquired through vision. This study compares the camera capabilities of four popular telepresence robot models using compact reduced LogMAR and Snellen optometry charts as well as text displayed on a projector screen. Symbols are extracted from the images with the Google Vision OCR (Optical Character Recognition) software, and the recognition results are compared with the symbols on the charts. The Double 3 TPR provides the best-quality images of the optometric charts, but the OCR measurements on the projector image do not show a clear advantage of any single model over the others. The results demonstrated by the Temi 2 and Double 3 TPRs are generally better than those of the others, suggesting that these TPRs are better suited for teaching and learning scenarios.
(This article belongs to the Special Issue Advanced Robotics and Mechatronics)
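Scoring OCR output against a known optometry chart can be sketched with a simple sequence-alignment metric; the longest-common-subsequence accuracy below and the helper name `chart_recognition_accuracy` are illustrative assumptions, not the study's exact scoring.

```python
def chart_recognition_accuracy(ocr_text, chart_lines):
    """Fraction of ground-truth chart symbols that the OCR output
    reproduced in order (illustrative metric, not the study's own).

    Whitespace is ignored and comparison is case-insensitive, since
    optometry charts use isolated capital letters."""
    recognized = "".join(ocr_text.split()).upper()
    expected = "".join("".join(chart_lines).split()).upper()
    # Longest common subsequence gives partial credit for partially read lines.
    m, n = len(expected), len(recognized)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if expected[i] == recognized[j] \
                else max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n] / m if m else 1.0
```

A metric of this shape tolerates the insertions and drops that OCR engines typically produce on low-resolution camera frames.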

19 pages, 20258 KiB  
Article
Design of a Semantic Understanding System for Optical Staff Symbols
by Fengbin Lou, Yaling Lu and Guangyu Wang
Appl. Sci. 2023, 13(23), 12627; https://doi.org/10.3390/app132312627 - 23 Nov 2023
Viewed by 1526
Abstract
Symbolic semantic understanding of staff images is an important technological support for achieving “intelligent score flipping”. Due to the complex composition of staff symbols and the strong semantic correlation between symbol spaces, it is difficult to determine the pitch and duration of each note when the staff is performed. In this paper, we design a semantic understanding system for optical staff symbols. The system uses YOLOv5 for the low-level semantic understanding stage, which recognizes pitch and duration in natural scales along with the other symbols that affect pitch and duration. The proposed note encoding reconstruction algorithm implements the high-level semantic understanding stage: it resolves the logical, spatial, and temporal relationships between natural scales and other symbols based on music theory and outputs digital codes for the pitch and duration of the main notes as performed. The model is trained on a self-constructed SUSN dataset. Experimental results with YOLOv5 show a precision of 0.989 and a recall of 0.972. The system's error rate is 0.031, and its omission rate is 0.021. The paper concludes by analyzing the causes of semantic understanding errors and offers recommendations for further research. The results provide a method for multimodal music artificial intelligence applications such as notation recognition through listening, intelligent score flipping, and automatic performance.

20 pages, 925 KiB  
Review
Advancing OCR Accuracy in Image-to-LaTeX Conversion—A Critical and Creative Exploration
by Everistus Zeluwa Orji, Ali Haydar, İbrahim Erşan and Othmar Othmar Mwambe
Appl. Sci. 2023, 13(22), 12503; https://doi.org/10.3390/app132212503 - 20 Nov 2023
Cited by 2 | Viewed by 4941
Abstract
This paper comprehensively assesses the application of active learning strategies to enhance natural language processing-based optical character recognition (OCR) models for image-to-LaTeX conversion. It addresses the existing limitations of OCR models and proposes practices to strengthen their accuracy. Key components of this study include the augmentation of training data with LaTeX syntax constraints, the integration of active learning strategies, and the employment of active learning feedback loops. The paper first examines the current weaknesses of OCR models, with a particular focus on symbol recognition, complex equation handling, and noise moderation. These limitations serve as a framework against which the subsequent research methodologies are assessed. Augmenting the training data with LaTeX syntax constraints is a crucial strategy for improving model precision. Incorporating symbol relationships, wherein contextual information is considered during recognition, further enriches error correction. The active learning feedback loop leads to progressive improvements in accuracy. The article underlines the importance of uncertainty and diversity sampling in sample selection, ensuring that the dynamic learning process remains efficient and effective. Appropriate evaluation metrics and ensemble techniques improve the operational learning effectiveness of the OCR model, allowing it to adapt and perform more effectively across diverse application domains.
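The uncertainty-sampling step of an active learning loop can be sketched as follows; entropy-based selection is one common criterion, and the review's surveyed systems may use others (e.g., least confidence or margin sampling).

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(pool_probs, k):
    """Return the indices of the k unlabeled pool samples whose predicted
    distributions are most uncertain (highest entropy). These are the
    samples a human annotator would label next in the feedback loop."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]), reverse=True)
    return ranked[:k]
```

Selecting high-entropy samples concentrates annotation effort where the OCR model is least sure, which is what drives the progressive accuracy gains the abstract describes.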

17 pages, 1195 KiB  
Article
Kernel Density Estimation and Convolutional Neural Networks for the Recognition of Multi-Font Numbered Musical Notation
by Qi Wang, Li Zhou and Xin Chen
Electronics 2022, 11(21), 3592; https://doi.org/10.3390/electronics11213592 - 3 Nov 2022
Cited by 4 | Viewed by 2449
Abstract
Optical music recognition (OMR) refers to converting musical scores into digitized information using electronics. In recent years, little OMR research has involved numbered musical notation (NMN), and existing NMN recognition algorithms struggle because numbered notation fonts vary. In this paper, we build a multi-font NMN dataset. Using the presented dataset, we apply kernel density estimation with proposed bar-line criteria to measure the relative height of symbols, achieving an accurate separation of the melody and lyrics lines in the notation. Furthermore, we develop a structurally improved convolutional neural network (CNN) to classify the symbols in melody lines. The proposed network processes melody lines hierarchically according to the symbol arrangement rules of NMN and contains three parallel small CNNs called Arcnet, Notenet, and Linenet, each of which adds a spatial pyramid pooling layer to adapt to the diversity of symbol sizes and styles. The experimental results show that our algorithm can accurately detect melody lines. Taking the average accuracy of identifying the various symbols as the recognition rate, the improved network reaches a recognition rate of 95.5%, which is 8.5% higher than that of a traditional convolutional neural network. Through audio comparison and evaluation experiments, we find that the generated audio maintains a high similarity to the original audio of the NMN.
(This article belongs to the Section Artificial Intelligence)
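Height-based line separation via kernel density estimation can be illustrated with a 1-D Gaussian KDE over symbol vertical centers: the valley between the two density modes separates melody-line symbols from lyrics-line symbols. This is a simplified stand-in for the paper's KDE plus bar-line criteria, with an assumed bandwidth.

```python
import math

def gaussian_kde(samples, bandwidth, xs):
    """Evaluate a 1-D Gaussian kernel density estimate at points xs."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                       for s in samples) for x in xs]

def split_height(symbol_ys, bandwidth=3.0):
    """Return a y-threshold at the deepest density valley between the
    two height clusters (melody symbols vs. lyric symbols)."""
    lo, hi = min(symbol_ys), max(symbol_ys)
    xs = [lo + (hi - lo) * i / 200 for i in range(201)]
    dens = gaussian_kde(symbol_ys, bandwidth, xs)
    # Interior local minima of the density curve are candidate valleys.
    valleys = [i for i in range(1, 200)
               if dens[i] <= dens[i - 1] and dens[i] <= dens[i + 1]]
    return xs[min(valleys, key=dens.__getitem__)]
```

Because the KDE smooths over font-dependent symbol sizes, the valley stays stable even when the exact glyph heights change between fonts, which is presumably why a density estimate is preferred over a fixed pixel threshold.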

16 pages, 3340 KiB  
Article
Exploiting the Two-Dimensional Nature of Agnostic Music Notation for Neural Optical Music Recognition
by María Alfaro-Contreras and Jose J. Valero-Mas
Appl. Sci. 2021, 11(8), 3621; https://doi.org/10.3390/app11083621 - 17 Apr 2021
Cited by 20 | Viewed by 3038
Abstract
State-of-the-art Optical Music Recognition (OMR) techniques follow an end-to-end or holistic approach, i.e., a sole stage for completely processing a single-staff section image and retrieving the symbols that appear therein. Such recognition systems are characterized by not requiring an exact alignment between each staff and its corresponding labels, hence facilitating the creation and retrieval of labeled corpora. Most commonly, these approaches consider an agnostic music representation, which characterizes music symbols by their shape and height (vertical position in the staff). However, this dual nature is ignored since, in the learning process, these two features are treated as a single symbol. This work aims to exploit this characteristic that differentiates music notation from other similar domains, such as text, by introducing a novel end-to-end approach to solve the OMR task at the staff-line level. We consider two Convolutional Recurrent Neural Network (CRNN) schemes trained to simultaneously extract shape and height information and propose different policies for eventually merging them at the actual neural level. The results obtained on two corpora of monophonic early music manuscripts prove that our proposal significantly decreases the recognition error, by figures ranging between 14.4% and 25.6% in the best-case scenarios, when compared to the considered baseline.
(This article belongs to the Special Issue Advances in Music Reading Systems)
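One conceivable merging policy for the two network outputs is to combine independent shape and height posteriors into a joint agnostic-symbol posterior by an outer product. This is illustrative only; the paper proposes and compares its own policies at the neural level.

```python
def merge_shape_height(shape_probs, height_probs):
    """Late fusion of independent shape and height posteriors: the joint
    probability of agnostic symbol (shape s, height h) is taken as the
    product of the two marginals. Returns the argmax pair and the full
    joint distribution (illustrative sketch)."""
    joint = {}
    for si, sp in enumerate(shape_probs):
        for hi, hp in enumerate(height_probs):
            joint[(si, hi)] = sp * hp
    best = max(joint, key=joint.get)
    return best, joint
```

Treating shape and height as separate output heads shrinks each head's label space from |shapes| × |heights| classes to |shapes| + |heights|, which is the advantage the abstract's "dual nature" argument points to.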

18 pages, 9041 KiB  
Article
Usage of Real Time Machine Vision in Rolling Mill
by Jiří David, Pavel Švec, Vít Pasker and Romana Garzinová
Sustainability 2021, 13(7), 3851; https://doi.org/10.3390/su13073851 - 31 Mar 2021
Cited by 15 | Viewed by 3707
Abstract
This article deals with computer vision on a rolling mill. Its main goal is to describe the designed and implemented algorithm for the automatic identification of the character string on billets entering the rolling mill. The algorithm converts image information from the front face of a billet entering the rolling process into a string of characters, which is then used to control the technological process. The purpose of this identification is to prevent input pieces from being confused, because different rolling-process parameters are set for different pieces. Solving this task required designing the optimal technical equipment for image capture, choosing appropriate lighting, locating the text, recognizing the individual symbols, and inserting them into the control system. The research methodology is based on the empirical-quantitative principle: the analysis of experimentally obtained data (photographs of billet faces) in real operating conditions, leading to their interpretation (transformation into a digital character string). The first part of the article briefly describes the billet identification system in terms of technology and hardware resources. The following parts are devoted to the main parts of the automatic identification algorithm: optical recognition of strings and recognition of the individual characters of a string using artificial intelligence. Optical character recognition using artificial neural networks is the basic algorithm of the automatic billet identification system and eliminates ambiguities during further processing. Successful implementation of the automatic inspection system will increase the share of operation automation and ensure automatic inspection of steel billets according to the production plan. This issue is related to the trend of digitizing individual technological processes in metallurgy and to the social sustainability of processes, namely the elimination of human errors in the management of the billet rolling process.
(This article belongs to the Special Issue Green ICT, Artificial Intelligence and Smart Cities)

26 pages, 6378 KiB  
Article
A Digitization and Conversion Tool for Imaged Drawings to Intelligent Piping and Instrumentation Diagrams (P&ID)
by Sung-O Kang, Eul-Bum Lee and Hum-Kyung Baek
Energies 2019, 12(13), 2593; https://doi.org/10.3390/en12132593 - 5 Jul 2019
Cited by 65 | Viewed by 23492
Abstract
In the Fourth Industrial Revolution, artificial intelligence technology and big data science are emerging rapidly. To apply these information technologies to the engineering industries, it is essential to digitize the data that are currently archived in image or hard-copy format. For previously created design drawings, consistency between design products is reduced in the digitization process, and the accuracy and reliability of equipment and material estimates from the digitized drawings are remarkably low. In this paper, we propose a method and system for automatically recognizing and extracting design information from imaged piping and instrumentation diagram (P&ID) drawings and automatically generating digitized drawings from the extracted data, using digital image processing techniques such as template matching and the sliding window method. First, symbols are recognized by template matching, extracted from the imaged P&ID drawing, and registered automatically in the database. Then, lines and text are recognized and extracted from the imaged P&ID drawing using the sliding window method and aspect ratio calculation, respectively. The extracted symbols for equipment and lines are associated with the attributes of the closest text and stored in the database in a neutral format. This is mapped to the predefined intelligent P&ID information and transformed into commercial P&ID tool formats together with the associated information. As the validation case studies illustrate, with the intelligent digitized drawings generated by the above automatic conversion system, the consistency of the design product is maintained, and the problems of the traditional, manual P&ID input method used by engineering companies, such as time consumption, missing items, and misspellings, are solved through the final fine-tuning validation process.
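The template-matching step can be illustrated with a naive pixel-agreement matcher over a binarized drawing. A production system would more likely use normalized cross-correlation and multi-scale search; the threshold below is an assumed parameter, not one from the paper.

```python
def match_template(image, template, threshold=0.95):
    """Slide a binary symbol template over a binary drawing (lists of 0/1
    rows) and return the top-left (row, col) positions where the fraction
    of agreeing pixels reaches the threshold. Minimal illustrative sketch
    of template matching for P&ID symbol recognition."""
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    area = h * w
    hits = []
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            agree = sum(1 for dy in range(h) for dx in range(w)
                        if image[y + dy][x + dx] == template[dy][dx])
            if agree / area >= threshold:
                hits.append((y, x))
    return hits
```

Each hit would then be registered in the symbol database and later associated with the nearest recognized text, as the abstract describes.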

28 pages, 13418 KiB  
Article
Staff, Symbol and Melody Detection of Medieval Manuscripts Written in Square Notation Using Deep Fully Convolutional Networks
by Christoph Wick, Alexander Hartelt and Frank Puppe
Appl. Sci. 2019, 9(13), 2646; https://doi.org/10.3390/app9132646 - 29 Jun 2019
Cited by 9 | Viewed by 5525
Abstract
Even today, the automatic digitisation of scanned documents in general, and the automatic optical music recognition (OMR) of historical manuscripts in particular, remains an enormous challenge, since both handwritten musical symbols and text have to be identified. This paper focuses on the Medieval so-called square notation developed in the 11th–12th century, which is already composed of staff lines, staves, clefs, accidentals, and neumes, which are, roughly speaking, connected single notes. The aim is to develop an algorithm that captures the neumes, and in particular the melody, which can be used to reconstruct the original writing. Our pipeline is similar to the standard OMR approach and comprises a novel staff line and symbol detection algorithm based on deep Fully Convolutional Networks (FCNs), which perform pixel-based predictions for either staff lines or symbols and their respective types. The staff line detection then combines the extracted lines into staves and yields an F1-score of over 99% for detecting both lines and complete staves. For the music symbol detection, we choose a novel approach that skips the step of identifying neumes and instead directly predicts note components (NCs) and their respective affiliation to a neume. Furthermore, the algorithm detects clefs and accidentals. Our algorithm predicts the symbol sequence of a staff with a diplomatic symbol accuracy rate (dSAR) of about 87%, which includes symbol type and location. If only the NCs (without their respective connection to a neume), clefs, and accidentals are of interest, the algorithm reaches a harmonic symbol accuracy rate (hSAR) of approximately 90%. In general, the algorithm recognises a symbol in the manuscript with an F1-score of over 96%.
(This article belongs to the Section Computing and Artificial Intelligence)
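A symbol accuracy rate of this kind is commonly computed as 1 minus the normalized edit distance between predicted and reference symbol sequences; the sketch below shows that reading, while the paper's exact dSAR/hSAR definitions may differ. The symbol label strings are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences (single-row DP)."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (x != y))  # substitution/match
    return dp[len(b)]

def symbol_accuracy_rate(predicted, reference):
    """1 - normalized edit distance: one plausible reading of a symbol
    accuracy rate over a staff's symbol sequence (illustrative)."""
    if not reference:
        return 1.0 if not predicted else 0.0
    return 1.0 - edit_distance(predicted, reference) / len(reference)
```

Under this reading, a dSAR of 87% would mean roughly 13 edits per 100 reference symbols are needed to correct a predicted staff.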

14 pages, 7786 KiB  
Article
State-of-the-Art Model for Music Object Recognition with Deep Learning
by Zhiqing Huang, Xiang Jia and Yifan Guo
Appl. Sci. 2019, 9(13), 2645; https://doi.org/10.3390/app9132645 - 29 Jun 2019
Cited by 44 | Viewed by 7683
Abstract
Optical music recognition (OMR) is an area in music information retrieval, and music object detection is a key part of the OMR pipeline. Notes record pitch and duration and carry semantic information; note recognition is therefore the core of music score recognition. This paper proposes an end-to-end detection model based on a deep convolutional neural network and feature fusion. The model directly processes the entire image and then outputs the symbol categories and the pitch and duration of notes. We present a state-of-the-art recognition model for general music symbols, which achieves 0.92 duration accuracy and 0.96 pitch accuracy.
(This article belongs to the Special Issue Sound and Music Computing -- Music and Interaction)

21 pages, 5551 KiB  
Article
A Baseline for General Music Object Detection with Deep Learning
by Alexander Pacha, Jan Hajič and Jorge Calvo-Zaragoza
Appl. Sci. 2018, 8(9), 1488; https://doi.org/10.3390/app8091488 - 29 Aug 2018
Cited by 53 | Viewed by 14010
Abstract
Deep learning is bringing breakthroughs to many computer vision subfields, including Optical Music Recognition (OMR), which has seen a series of improvements to musical symbol detection achieved by using generic deep learning models. However, so far, each such proposal has been based on a specific dataset and different evaluation criteria, which has made it difficult to quantify the new deep learning-based state of the art and assess the relative merits of these detection models on music scores. In this paper, a baseline for general detection of musical symbols with deep learning is presented. We consider three datasets of heterogeneous typology but with the same annotation format and three neural models of different natures, and we establish their performance in terms of a common evaluation standard. The experimental results confirm that direct music object detection with deep learning is indeed promising, but at the same time they illustrate some of the domain-specific shortcomings of general detectors. A qualitative comparison then suggests avenues for OMR improvement, based both on properties of the detection models and on how the datasets are defined. To the best of our knowledge, this is the first time that competing music object detection systems from the machine learning paradigm have been directly compared to each other. We hope that this work will serve as a reference for measuring the progress of future developments of OMR in music object detection.
(This article belongs to the Special Issue Digital Audio and Image Processing with Focus on Music Research)

23 pages, 3681 KiB  
Article
End-to-End Neural Optical Music Recognition of Monophonic Scores
by Jorge Calvo-Zaragoza and David Rizo
Appl. Sci. 2018, 8(4), 606; https://doi.org/10.3390/app8040606 - 11 Apr 2018
Cited by 67 | Viewed by 21465
Abstract
Optical Music Recognition is a field of research that investigates how to computationally decode music notation from images. Despite the efforts made so far, there are hardly any complete solutions to the problem. In this work, we study the use of neural networks that work in an end-to-end manner. This is achieved with a neural model that combines the capabilities of convolutional neural networks, which work on the input image, and recurrent neural networks, which deal with the sequential nature of the problem. Thanks to the so-called Connectionist Temporal Classification loss function, these models can be trained directly from input images accompanied by their corresponding transcripts into music symbol sequences. We also present the Printed Music Scores dataset, containing more than 80,000 monodic single-staff real scores in common western notation, which is used to train and evaluate the neural approach. Our experiments demonstrate that this formulation can be carried out successfully. Additionally, we study several considerations regarding the codification of the output musical sequences, the convergence and scalability of the neural models, and the ability of this approach to locate symbols in the input score.
(This article belongs to the Special Issue Digital Audio and Image Processing with Focus on Music Research)
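How a CTC-trained model's frame-wise output is read out can be illustrated with standard greedy decoding: take the per-frame argmax, merge repeats, and drop blanks. This is generic CTC practice, not code from the paper.

```python
def ctc_greedy_decode(frame_probs, blank=0):
    """Collapse a frame-wise best path into a symbol sequence.

    frame_probs: one probability distribution over symbol classes per
    time frame (class 0 is the CTC blank by convention). The best path
    is collapsed by merging consecutive repeats and removing blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], blank
    for sym in best:
        if sym != blank and sym != prev:
            out.append(sym)
        prev = sym
    return out
```

The blank symbol is what lets CTC emit the same music symbol twice in a row: a blank frame between two identical argmax runs breaks the repeat-merging.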
