Search Results (8)

Search Parameters:
Keywords = sound sources separation and reconstruction

18 pages, 1461 KiB  
Article
Two-Stage Unet with Gated-Conv Fusion for Binaural Audio Synthesis
by Wenjie Zhang, Changjun He, Yinghan Cao, Shiyun Xu and Mingjiang Wang
Sensors 2025, 25(6), 1790; https://doi.org/10.3390/s25061790 - 13 Mar 2025
Viewed by 629
Abstract
Binaural audio is crucial for creating immersive auditory experiences. However, due to the high cost and technical complexity of capturing binaural audio in real-world environments, there has been increasing interest in synthesizing binaural audio from monaural sources. In this paper, we propose a two-stage framework for binaural audio synthesis. Specifically, monaural audio is initially transformed into a preliminary binaural signal, and the shared common portion across the left and right channels, as well as the distinct differential portion in each channel, are extracted. Subsequently, the POS-ORI self-attention module (POSA) is introduced to integrate spatial information of the sound sources and capture their motion. Based on this representation, the common and differential components are separately reconstructed. The gated-convolutional fusion module (GCFM) is then employed to combine the reconstructed components and generate the final binaural audio. Experimental results demonstrate that the proposed method can accurately synthesize binaural audio and achieves state-of-the-art performance in phase estimation (Phase-l2: 0.789, Wave-l2: 0.147, Amplitude-l2: 0.036). Full article
(This article belongs to the Section Intelligent Sensors)
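
The abstract's common/differential split of the two ear channels is essentially a mid/side-style decomposition. Below is a minimal numpy sketch of that split and its exact inverse; it is illustrative only, the function names and toy signals are assumptions, and the paper's learned two-stage network is not reproduced here.

import numpy as np

def split_common_diff(left, right):
    # Shared (common) portion across the two channels plus per-channel residuals.
    common = 0.5 * (left + right)
    return common, left - common, right - common

def merge_common_diff(common, diff_left, diff_right):
    # Exact inverse of the split: recombine into left/right channels.
    return common + diff_left, common + diff_right

# Toy check on 1 s of correlated stereo noise at 48 kHz.
rng = np.random.default_rng(0)
left = rng.standard_normal(48_000)
right = 0.8 * left + 0.2 * rng.standard_normal(48_000)
common, d_l, d_r = split_common_diff(left, right)
l_rec, r_rec = merge_common_diff(common, d_l, d_r)
assert np.allclose(l_rec, left) and np.allclose(r_rec, right)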

18 pages, 7985 KiB  
Article
Noise Separation Technique for Enhancing Substation Noise Assessment Using the Phase Conjugation Method
by Shengping Fan, Jiang Liu, Linyong Li and Sheng Li
Appl. Sci. 2024, 14(5), 1761; https://doi.org/10.3390/app14051761 - 21 Feb 2024
Cited by 1 | Viewed by 1232
Abstract
The intrinsic noise of different transformers in the same substation is of the same type: it is strongly coherent and difficult to separate, which greatly increases the cost of substation noise assessment and treatment. To solve this problem, this paper proposes a noise separation technique based on the phase conjugation method to separate the intrinsic noise signals of different transformers: first, the sound source information is reconstructed by the phase conjugation method from the measurements and emissions of a line array; second, the intrinsic noise signals of each source are obtained by the equivalent point source method. The error of the separation technique is analyzed through point source simulations, and the optimal arrangement of the microphone line array is studied. A validation experiment in a semi-anechoic chamber shows that the separation error is less than 2 dBA, which is within the error tolerance of engineering applications. Finally, a noise separation test of three transformers is performed in a substation using the proposed technique. The results show that the technique can separate the intrinsic noise of each transformer in the substation, which is of practical value for substation noise assessment and management. Full article
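
As a rough illustration of the phase-conjugation (time-reversal) idea underlying the method, the sketch below phase-conjugates single-frequency array pressures and back-propagates them with free-field Green's functions; the focused maximum indicates the source position. This is a toy, single-frequency sketch under assumed geometry, not the paper's full line-array measurement/emission and equivalent-point-source procedure.

import numpy as np

def backpropagate_phase_conjugate(p_mic, mic_pos, grid_pos, k):
    # Distances between every candidate grid point and every microphone.
    r = np.linalg.norm(grid_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
    green = np.exp(-1j * k * r) / (4 * np.pi * r)   # free-field Green's function
    # Phase conjugation: conjugate the measured field, then propagate it back.
    return green @ np.conj(p_mic)

# Toy example: one point source at x = 0.5 m in front of a 16-element line array.
c, f = 343.0, 500.0
k = 2 * np.pi * f / c
mic_pos = np.stack([np.linspace(-0.75, 0.75, 16), np.full(16, 1.0), np.zeros(16)], axis=1)
src = np.array([0.5, 0.0, 0.0])
r_src = np.linalg.norm(mic_pos - src, axis=1)
p_mic = np.exp(-1j * k * r_src) / (4 * np.pi * r_src)      # simulated measurement
grid = np.stack([np.linspace(-1, 1, 81), np.zeros(81), np.zeros(81)], axis=1)
focus = np.abs(backpropagate_phase_conjugate(p_mic, mic_pos, grid, k))
print("estimated source x =", grid[np.argmax(focus), 0])   # peaks near 0.5 m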

9 pages, 258 KiB  
Article
Patient Blood Management in Microsurgical Procedures for Reconstructive Surgery
by Maria Beatrice Rondinelli, Luca Paolo Weltert, Giovanni Ruocco, Matteo Ornelli, Pietro Francesco Delle Femmine, Alessandro De Rosa, Luca Pierelli and Nicola Felici
Diagnostics 2023, 13(17), 2758; https://doi.org/10.3390/diagnostics13172758 - 25 Aug 2023
Viewed by 1565
Abstract
Introduction: The main purpose of reconstructive surgery (RS) is to restore the integrity of soft tissues damaged by trauma, surgery, congenital deformity, burns, or infection. Microsurgical techniques consist of harvesting tissues that are separated from the vascular sources of the donor site and anastomosed to the vessels of the recipient site. In these procedures, there are some preoperative modifiable factors that have the potential to influence the outcome of the flap transfer and its anastomosis. The management of anemia, which is always present in the postoperative period and plays a decisive role in the implantation of the flap, is of significant importance and is associated with clinical and laboratory findings of chronic inflammation. Methods: Chronic inflammatory anemia (also known as anemia of chronic disease, ACD) is a constant condition in patients who have undergone RS and correlates with the perfusion of the free flap. The aim of this treatment protocol is to reduce the transfusion rate by maintaining good organ perfusion and correcting the patient's anemic state. From January 2017 to September 2019, we studied 16 patients (16 males, mean age 38 years) who underwent microsurgical procedures for RS. Their hemoglobin (Hb) levels, corpuscular indexes, transferrin saturation (TSAT), ferritin concentration and creatinine clearance were measured on the first day after surgery (T0), after the first week (T1), and after five weeks (T2). At T0, all the patients showed low hemoglobin levels (average 7.4 g/dL, SD 0.71, range 6.2–7.4 g/dL), with an MCV of 72, MCH of 28, MCHC of 33, RDW of 16, serum iron of 35, ferritin of 28, Ret% of 1.36, TRF of 277 and creatinine clearance of 119, and high ferritin levels (range 320–560 ng/mL) with TSAT less than 20%. All the patients were assessed for their clinical status, medical history and comorbidities before the beginning of the therapy. Results: A collaboration between the two departments (Department of Transfusion Medicine and Department of Reconstructive Surgery) resulted in the application of a therapeutic protocol with erythropoiesis-stimulating agents (ESAs) (Binocrit 6000 IU/week) and intravenous iron every other day, starting the second day after surgery. Thirteen patients received ESAs and FCM (ferric carboxymaltose, 500–1000 mg per session), and three patients received ESAs and iron gluconate (one vial every other day). No patients received blood transfusions. No side effects were observed and, most importantly, no limb or flap rejection occurred. Conclusions: Preliminary data from our protocol show an optimal therapeutic response, notwithstanding the very limited scientific literature and data available in this specific surgical field. The enrollment of further patients will allow us to validate this therapeutic protocol with statistically sound data. Full article
17 pages, 4643 KiB  
Article
Multiple Sound Source Localization, Separation, and Reconstruction by Microphone Array: A DNN-Based Approach
by Long Chen, Guitong Chen, Lei Huang, Yat-Sze Choy and Weize Sun
Appl. Sci. 2022, 12(7), 3428; https://doi.org/10.3390/app12073428 - 28 Mar 2022
Cited by 20 | Viewed by 5896
Abstract
Simultaneous localization, separation, and reconstruction of multiple sound sources are often required in various situations, such as conference rooms, living rooms, and supermarkets. To improve the intelligibility of speech signals, deep neural networks (DNNs) have achieved considerable success in time-domain signal separation and reconstruction. In this paper, we propose a hybrid microphone array signal processing approach for the nearfield scenario that combines the beamforming technique and a DNN. Using this method, the challenge of identifying both the sound source location and its content can be overcome. Moreover, the use of a sequenced virtual sound field reconstruction process makes the proposed approach well suited to a sound field that contains a dominant, stronger sound source and masked, weaker sound sources. Using this strategy, all traceable main sound sources in a given sound field can be discovered by iterating this procedure. The operational duration and accuracy of localization are further improved by substituting the broadband weighted multiple signal classification (BW-MUSIC) method for the conventional delay-and-sum (DAS) beamforming algorithm. The effectiveness of the proposed method for localizing and reconstructing speech signals was validated by simulations and experiments with promising results: the localization results were accurate, and the similarity and correlation between the reconstructed and original signals were high. Full article
(This article belongs to the Special Issue Machine Learning in Vibration and Acoustics)
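
For reference, the conventional delay-and-sum (DAS) beamformer that the paper replaces with BW-MUSIC can be written in a few lines. The sketch below is a narrowband DAS direction-of-arrival scan for a uniform line array; the array geometry, sampling rate and test signal are illustrative assumptions, not taken from the paper.

import numpy as np

def das_doa(signals, fs, mic_x, angles_deg, f0, c=343.0):
    # Narrowband delay-and-sum: steer a line array (mic positions mic_x, in metres)
    # over candidate angles and return the steered response power at frequency f0.
    n = signals.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    X = np.fft.rfft(signals, axis=-1)                # (num_mics, num_bins)
    b = np.argmin(np.abs(freqs - f0))                # analysis bin closest to f0
    power = []
    for theta in np.deg2rad(angles_deg):
        delays = mic_x * np.sin(theta) / c           # far-field plane-wave delays
        steer = np.exp(2j * np.pi * freqs[b] * delays)
        power.append(np.abs(steer @ X[:, b]) ** 2)
    return np.array(power)

# Toy check: a 1 kHz tone arriving from 30 degrees on an 8-microphone array.
fs, f0, c = 16_000, 1_000.0, 343.0
mic_x = np.arange(8) * 0.05                          # 5 cm spacing
t = np.arange(2048) / fs
true_delays = mic_x * np.sin(np.deg2rad(30.0)) / c
signals = np.stack([np.cos(2 * np.pi * f0 * (t - d)) for d in true_delays])
angles = np.arange(-90, 91)
print("estimated DOA:", angles[np.argmax(das_doa(signals, fs, mic_x, angles, f0))], "deg")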

15 pages, 8091 KiB  
Article
Electromagnetic Safety of Remote Communication Devices—Videoconference
by Artur Przybysz, Krystian Grzesiak and Ireneusz Kubiak
Symmetry 2021, 13(2), 323; https://doi.org/10.3390/sym13020323 - 16 Feb 2021
Cited by 8 | Viewed by 5725
Abstract
Devices powered by electricity become sources of electromagnetic emissions in the course of their operation. In the case of devices that process information, these emissions can have the character of revealing emissions, i.e., emissions whose reception and analysis allow the related data to be reconstructed remotely. The best-known example of this phenomenon is the formation of revealing emissions during the operation of imaging devices: monitors, projectors or printers. Increasingly often, these components are used for communication in the form of videoconferences with other network users. The article presents the results of tests and analyses of the threats that the use of such solutions (monitors, personal computers, VoIP terminals) poses to the confidentiality of conversations and of the data presented during them. The focus is on video signals; however, the potential for revealing speech signals is also indicated. This phenomenon poses a serious threat to data confidentiality, because the combination of graphics and sound can contain much more information about the protected data than graphics or sound alone. The presented analyses concern graphic data, the possibilities of non-invasive acquisition of such data, the similarity between patterns and reconstructed images, and image recognition. The results indicate that there is still a risk of losing data confidentiality through electromagnetic leakage, and that under favorable circumstances no specialized instrumentation is required to intercept it. This may particularly apply to audio data that may be accidentally received by home radio receivers. The presented analyses relate to this Special Issue of Symmetry, which covers security and privacy in communication systems and networks, signal processing, video and image processing, multimedia communications and electromagnetic compatibility. All these scientific and technical areas have either symmetrical or asymmetrical approaches, and they have to be considered as a whole in order to choose the best combinations to protect the processed information. Full article
(This article belongs to the Special Issue Symmetry and Asymmetry in Communications Engineering)

25 pages, 6602 KiB  
Article
Low-Frequency Sound Prediction of Structures with Finite Submerge Depth Based on Sparse Vibration Measurement
by Wenbo Wang, Desen Yang and Jie Shi
Appl. Sci. 2021, 11(2), 768; https://doi.org/10.3390/app11020768 - 14 Jan 2021
Cited by 3 | Viewed by 1911
Abstract
In a non-free field, reflections from the boundary often change the vibration characteristics of a submerged structure, which may significantly influence the configuration of the measurement system. To reduce the engineering cost of low-frequency sound prediction for a submerged structure at finite depth, two methods based on the theory of acoustic radiation modes (ARMs) are proposed. One is the vibration reconstruction equivalent source method (VR-ESM), which uses the ARMs to reconstruct the total vibration of the structure and completes the sound prediction with the equivalent source method (ESM); the other is the compressed modal equivalent source method (CMESM), which uses compressive sensing (CS) together with the ARMs to enforce sparsity of the source strengths. Sound field separation (SFS) is combined with both methods to construct the ARMs accurately in the non-free field. Simulations show that both methods are efficient. Compared with the traditional method based on structural modal analysis, the ARM-based methods can considerably reduce the scale of the measurement system; however, the measurement point arrangement should be optimized to keep the prediction accurate. In this paper, the optimization is completed with the effective independence (EFI) method. Some factors that may affect the prediction accuracy are also analyzed. When the submergence depth is large enough, the construction of the ARMs can be further simplified. These results can help reduce the engineering cost of predicting the low-frequency sound radiation of submerged structures. Full article
(This article belongs to the Section Acoustics and Vibrations)
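
The equivalent source method (ESM) that both proposed schemes build on amounts to fitting point-source strengths to measured pressures and then re-radiating them to the prediction points. The sketch below is a minimal single-frequency, free-field version with Tikhonov-regularized least squares; the geometry is a toy assumption, and the paper's ARM, compressive sensing and sound-field-separation steps are not included.

import numpy as np

def esm_predict(p_meas, mic_pos, src_pos, field_pos, k, reg=1e-6):
    # Fit equivalent point-source strengths to measured pressures, then predict
    # the field at new positions. Free-field Green's functions, single frequency.
    def green(a, b):
        r = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return np.exp(-1j * k * r) / (4 * np.pi * r)
    G = green(mic_pos, src_pos)                        # measurement matrix
    A = G.conj().T @ G + reg * np.eye(G.shape[1])      # regularized normal equations
    q = np.linalg.solve(A, G.conj().T @ p_meas)        # equivalent source strengths
    return green(field_pos, src_pos) @ q               # predicted pressures

# Toy example: recover the field of one monopole from 12 pressure measurements.
c, f = 1500.0, 200.0                                   # underwater sound speed, Hz
k = 2 * np.pi * f / c
rng = np.random.default_rng(1)
mic_pos = rng.uniform(-1, 1, size=(12, 3)) + np.array([0.0, 0.0, 2.0])
src_pos = np.array([[0.0, 0.0, 0.0]])                  # equivalent source grid (one point)
r = np.linalg.norm(mic_pos - src_pos[0], axis=1)
p_meas = np.exp(-1j * k * r) / (4 * np.pi * r)
field_pos = np.array([[0.0, 0.0, 5.0]])
print("predicted |p| at 5 m:", np.abs(esm_predict(p_meas, mic_pos, src_pos, field_pos, k))[0])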

16 pages, 4038 KiB  
Article
Flow Synthesizer: Universal Audio Synthesizer Control with Normalizing Flows
by Philippe Esling, Naotake Masuda, Adrien Bardet, Romeo Despres and Axel Chemla-Romeu-Santos
Appl. Sci. 2020, 10(1), 302; https://doi.org/10.3390/app10010302 - 31 Dec 2019
Cited by 33 | Viewed by 6046
Abstract
The ubiquity of sound synthesizers has reshaped modern music production, and novel music genres are now sometimes even entirely defined by their use. However, the increasing complexity and number of parameters in modern synthesizers make them extremely hard to master. Hence, the development of methods that allow users to easily create and explore sounds with synthesizers is a crucial need. Recently, we introduced a novel formulation of audio synthesizer control based on learning an organized latent audio space of the synthesizer’s capabilities, while constructing an invertible mapping to the space of its parameters. We showed that this formulation makes it possible to simultaneously address automatic parameter inference, macro-control learning, and audio-based preset exploration within a single model, and that it can be efficiently addressed by relying on Variational Auto-Encoders (VAEs) and Normalizing Flows (NFs). In this paper, we extend our results by evaluating our proposal on larger sets of parameters and show its superiority in both parameter inference and audio reconstruction against various baseline models. Furthermore, we introduce disentangling flows, which learn the invertible mapping between two separate latent spaces while steering the organization of some latent dimensions to match target variation factors by splitting the objective into partial density evaluations. We show that the model disentangles the major factors of audio variation into latent dimensions, which can be directly used as macro-parameters. We also show that our model is able to learn semantic controls of a synthesizer while smoothly mapping to its parameters. Finally, we introduce an open-source implementation of our models inside a real-time Max4Live device that is readily available for evaluating creative applications of our proposal. Full article
(This article belongs to the Special Issue Digital Audio Effects)
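
The invertible mappings used by normalizing flows are typically built from coupling layers. The sketch below is a single RealNVP-style affine coupling layer in numpy, showing the exact invertibility and tractable log-determinant such flows rely on; it is a didactic stand-in rather than the authors' VAE/flow model, and the linear scale/shift "networks" are assumptions.

import numpy as np

class AffineCoupling:
    # Affine coupling layer: the first half of the vector conditions a scale/shift
    # applied to the second half. Invertible, with a tractable log-determinant.
    def __init__(self, dim, rng):
        half = dim // 2
        self.Ws = 0.1 * rng.standard_normal((half, dim - half))   # "scale net" (linear)
        self.Wt = 0.1 * rng.standard_normal((half, dim - half))   # "shift net" (linear)

    def forward(self, x):
        x1, x2 = x[: x.shape[0] // 2], x[x.shape[0] // 2:]
        log_s, t = self.Ws.T @ x1, self.Wt.T @ x1
        y2 = x2 * np.exp(log_s) + t
        return np.concatenate([x1, y2]), np.sum(log_s)             # output, log|det J|

    def inverse(self, y):
        y1, y2 = y[: y.shape[0] // 2], y[y.shape[0] // 2:]
        log_s, t = self.Ws.T @ y1, self.Wt.T @ y1
        return np.concatenate([y1, (y2 - t) * np.exp(-log_s)])

rng = np.random.default_rng(0)
layer = AffineCoupling(8, rng)
x = rng.standard_normal(8)
y, logdet = layer.forward(x)
assert np.allclose(layer.inverse(y), x)        # the mapping is exactly invertible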

21 pages, 586 KiB  
Article
Audlet Filter Banks: A Versatile Analysis/Synthesis Framework Using Auditory Frequency Scales
by Thibaud Necciari, Nicki Holighaus, Peter Balazs, Zdeněk Průša, Piotr Majdak and Olivier Derrien
Appl. Sci. 2018, 8(1), 96; https://doi.org/10.3390/app8010096 - 11 Jan 2018
Cited by 24 | Viewed by 7037
Abstract
Many audio applications rely on filter banks (FBs) to analyze, process, and re-synthesize sounds. For these applications, an important property of the analysis–synthesis system is the reconstruction error; it has to be minimized to avoid audible artifacts. Other advantageous properties include stability and low redundancy. To exploit some aspects of auditory perception in the signal chain, some applications rely on FBs that approximate the frequency analysis performed in the auditory periphery, the gammatone FB being a popular example. However, current gammatone FBs only allow partial reconstruction and stability at high redundancies. In this article, we construct an analysis–synthesis system for audio applications. The proposed system, referred to as Audlet, is an oversampled FB with filters distributed on auditory frequency scales. It allows perfect reconstruction for a wide range of FB settings (e.g., the shape and density of filters), efficient FB design, and adaptable redundancy. In particular, we show how to construct a gammatone FB with perfect reconstruction. Experiments demonstrate performance improvements of the proposed gammatone FB when compared to current gammatone FBs in terms of reconstruction error and stability, especially at low redundancies. An application of the framework to audio source separation illustrates its utility for audio processing. Full article
(This article belongs to the Special Issue Sound and Music Computing)
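
For context, a conventional gammatone analysis bank on an ERB frequency scale, the kind of auditory filter bank the Audlet framework generalizes, can be sketched as below. This FIR analysis stage is illustrative only and, unlike the Audlet construction, comes with no perfect-reconstruction guarantee; the sampling rate, filter length and channel count are assumptions.

import numpy as np

def erb_space(f_low, f_high, num):
    # Centre frequencies uniformly spaced on the ERB-rate scale (Glasberg & Moore, 1990).
    erb = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)        # Hz -> ERB number
    erb_inv = lambda e: (10 ** (e / 21.4) - 1.0) * 1000.0 / 4.37     # ERB number -> Hz
    return erb_inv(np.linspace(erb(f_low), erb(f_high), num))

def gammatone_ir(fc, fs, dur=0.05, order=4, b=1.019):
    # Impulse response of a gammatone filter centred at fc (Hz).
    t = np.arange(int(dur * fs)) / fs
    bw = b * 24.7 * (4.37 * fc / 1000.0 + 1.0)                       # ERB bandwidth in Hz
    return t ** (order - 1) * np.exp(-2 * np.pi * bw * t) * np.cos(2 * np.pi * fc * t)

# Analysis by FIR filtering; a synthesis stage would be needed for reconstruction.
fs = 16_000
centres = erb_space(80.0, 7_000.0, 32)
bank = [gammatone_ir(fc, fs) for fc in centres]
x = np.random.default_rng(0).standard_normal(fs)                     # 1 s of noise
subbands = np.stack([np.convolve(x, h, mode="same") for h in bank])
print(subbands.shape)                                                 # (32, 16000)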
