Search Results (8)

Search Parameters:
Keywords = SpeakerBeam

17 pages, 6833 KiB  
Article
A Regional Brightness Control Method for a Beam Projector to Avoid Human Glare
by Hyeong-Gi Jeon and Kyoung-Hee Lee
Appl. Sci. 2024, 14(4), 1335; https://doi.org/10.3390/app14041335 - 6 Feb 2024
Viewed by 1304
Abstract
In this study, we propose a system that reduces the speaker’s discomfort from the strong light of a beam projector by applying regional brightness control over the screen. Since the original image and the one projected on the screen differ considerably in area, brightness, and color, the proposed system first transforms them so that they have the same area and a similar color tone. Then, to accurately determine the difference between those images, we introduce an SSIM map, a perception-based measure of image similarity. An image segmentation model then determines the speaker’s silhouette from the SSIM map. We applied two well-trained segmentation models, Selfie and DeepLab-v3, provided with MediaPipe. The experimental results demonstrate the operability of the proposed system and show that it identifies most of a lecturer’s body area on the screen. To evaluate the system’s effectiveness more closely, we measured error rates consisting of false-positive and false-negative errors in the confusion matrix. The measured error rates were low and stable enough that the proposed system offers a practical benefit for speakers, especially when DeepLab-v3 is applied. These results imply that an accurate segmentation model can considerably improve the effectiveness of the system.
(This article belongs to the Special Issue Multimedia Systems Studies)
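The SSIM-map step described in the abstract can be sketched in a few lines. The following is a minimal illustration (not the authors’ code) of computing a per-pixel SSIM map between an original frame and a projected capture, then thresholding it into a silhouette mask; the window size, the 0.5 threshold, and the synthetic images are assumptions for illustration only:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ssim_map(img1, img2, win=7, data_range=1.0):
    """Per-pixel SSIM over a sliding win x win window
    (standard SSIM formula with uniform weighting)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu1 = uniform_filter(img1, win)
    mu2 = uniform_filter(img2, win)
    var1 = uniform_filter(img1 * img1, win) - mu1 * mu1
    var2 = uniform_filter(img2 * img2, win) - mu2 * mu2
    cov = uniform_filter(img1 * img2, win) - mu1 * mu2
    return ((2 * mu1 * mu2 + c1) * (2 * cov + c2)) / (
        (mu1 * mu1 + mu2 * mu2 + c1) * (var1 + var2 + c2)
    )

# Hypothetical aligned grayscale frames: the slide image and the camera
# capture of the screen, already warped to the same size and color tone.
rng = np.random.default_rng(0)
original = rng.random((240, 320))
projected = original.copy()
projected[60:180, 100:220] *= 0.3  # darkened patch where a person blocks the beam

smap = ssim_map(original, projected)
# Low-similarity pixels are silhouette candidates; 0.5 is illustrative.
silhouette_mask = smap < 0.5
```

In identical regions the map equals 1, so only the occluded patch falls below the threshold; a segmentation model would then refine this coarse mask into the speaker’s silhouette.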

15 pages, 1345 KiB  
Article
GERP: A Personality-Based Emotional Response Generation Model
by Ziyi Zhou, Ying Shen, Xuri Chen and Dongqing Wang
Appl. Sci. 2023, 13(8), 5109; https://doi.org/10.3390/app13085109 - 19 Apr 2023
Cited by 3 | Viewed by 3111
Abstract
It is important for chatbots to communicate emotionally with users. However, most emotional response generation models produce responses based simply on a specified emotion, neglecting the impact of the speaker’s personality on emotional expression. In this work, we propose a novel model named GERP to generate emotional responses based on a pre-defined personality. GERP simulates the human emotion conversion process during conversation to make the chatbot more anthropomorphic, and adopts the OCEAN model to precisely define the chatbot’s personality. It can generate a response containing the emotion predicted from that personality. Specifically, to select the most appropriate response, a proposed beam evaluator was integrated into GERP. A Chinese sentiment vocabulary and a Chinese emotional response dataset were constructed to facilitate the emotional response generation task. The effectiveness and superiority of the proposed model over five baseline models were verified by experiments.
(This article belongs to the Special Issue Advances in Speech and Language Processing)

44 pages, 6909 KiB  
Article
Inaudible Attack on AI Speakers
by Seyitmammet Saparmammedovich Alchekov, Mohammed Abdulhakim Al-Absi, Ahmed Abdulhakim Al-Absi and Hoon Jae Lee
Electronics 2023, 12(8), 1928; https://doi.org/10.3390/electronics12081928 - 19 Apr 2023
Cited by 3 | Viewed by 5460
Abstract
The modern world does not stand still. We used to be surprised that technology could speak, but voice assistants have now become real family members. They do not simply turn on the alarm clock or play music; they communicate with children, help solve problems, and sometimes even take offense. Since all voice assistants use artificial intelligence, when communicating with the user they take into account changes in location, the time of day and day of the week, search query history, previous orders in online stores, etc. However, voice assistants, which are part of modern smartphones and smart speakers, pose a threat to their owner’s personal data, since their main function is to capture audio commands from the user. Generally, AI smart speakers and assistants such as Siri, Google Assistant, and Google Home are moderately harmless. Yet as voice assistants become more versatile, like any other product, they can be used for nefarious purposes. There are many common attacks that people with bad intentions can use to hack a voice assistant. We show experimentally that a laser beam can control Google Assistant, smart speakers, and Siri. The attacker does not need physical contact with the victim’s equipment or any interaction with the victim: as long as the attacker’s laser can hit the smart speaker, it can send commands. In our experiments, we achieved a successful attack that transmits inaudible commands by aiming lasers at the microphone from up to 87 m away. We also discovered the possibility of attacking Android and Siri devices through the built-in voice assistant module via the charging port.

19 pages, 3063 KiB  
Article
An Electroglottograph Auxiliary Neural Network for Target Speaker Extraction
by Lijiang Chen, Zhendong Mo, Jie Ren, Chunfeng Cui and Qi Zhao
Appl. Sci. 2023, 13(1), 469; https://doi.org/10.3390/app13010469 - 29 Dec 2022
Cited by 3 | Viewed by 2115
Abstract
The extraction of a target speaker from mixtures of different speakers has attracted extensive attention and research. Previous studies have proposed several methods, such as SpeakerBeam, that tackle this speech extraction problem using clean speech from the target speaker as auxiliary information. However, clean speech cannot be obtained immediately in most cases. In this study, we addressed this problem by extracting features from the electroglottographs (EGGs) of target speakers. An EGG is a laryngeal function detection technology that measures the impedance and condition of the vocal cords. Because of the way they are collected, EGGs have excellent anti-noise performance and can be obtained in rather noisy environments. To obtain clean speech from target speakers out of mixtures of different speakers, we used deep learning methods with EGG signals as additional information to extract the target speaker, removing the need for clean speech from the target speakers. Based on the characteristics of EGG signals, we developed an EGG-auxiliary network to train a speaker extraction model under the assumption that EGG signals carry information about speech signals. Additionally, we took the correlations between EGGs and speech signals in silent and unvoiced segments into consideration to develop a new network involving EGG preprocessing. We achieved improvements in the scale-invariant signal-to-distortion ratio improvement (SISDRi) of 0.89 dB on the Chinese Dual-Mode Emotional Speech Database (CDESD) and 1.41 dB on the EMO-DB dataset. In addition, our methods alleviated the poor performance observed when the target and interfering speakers are of the same gender, as well as the greatly reduced precision under low-SNR conditions.
(This article belongs to the Special Issue Automatic Speech Recognition)
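The SISDRi figures quoted above are differences in scale-invariant SDR between the extracted signal and the unprocessed mixture. A minimal sketch of how that metric is typically computed (the signals and noise levels below are synthetic illustrations, not the paper’s data):

```python
import numpy as np

def si_sdr(estimate: np.ndarray, target: np.ndarray) -> float:
    """Scale-invariant signal-to-distortion ratio (SI-SDR) in dB."""
    target = target - target.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the target to remove scaling differences.
    s_target = (estimate @ target) / (target @ target) * target
    e_noise = estimate - s_target
    return 10 * np.log10((s_target @ s_target) / (e_noise @ e_noise))

rng = np.random.default_rng(1)
clean = rng.standard_normal(16000)
mixture = clean + 0.5 * rng.standard_normal(16000)   # noisy input
enhanced = clean + 0.1 * rng.standard_normal(16000)  # hypothetical extractor output

# SI-SDRi: improvement of the extracted signal over the unprocessed mixture.
sisdri = si_sdr(enhanced, clean) - si_sdr(mixture, clean)
```

The projection step makes the metric invariant to overall gain, which is why it is preferred over plain SDR for speech separation benchmarks.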

14 pages, 5740 KiB  
Article
Nonlinear Piezoelectric Energy Harvester: Experimental Output Power Mapping
by Ioan Burda
Vibration 2022, 5(3), 483-496; https://doi.org/10.3390/vibration5030027 - 27 Jul 2022
Cited by 2 | Viewed by 2668
Abstract
In this paper, the output power map of a nonlinear piezoelectric energy harvester (PEH) made of a cantilever beam and the membrane of a resonant vibration speaker is analyzed experimentally. The PEH uses two large piezoelectric patches (PZT-5H) bonded into a parallel bimorph configuration. The nonlinear response of the deformable structure provides a wider bandwidth in which power can be harvested, compensating for the mistuning sensitivity of linear counterparts. The nonlinear response of the proposed PEH is analyzed from the perspective of its electrical performance. The proposed experimental method is novel in that it measures the effects of the deformable structure’s nonlinearity on the output power map. The objective of this analysis is to optimize the size of the PZT patch relative to the size of the cantilever beam, providing experimental support for the design. The presentation of the most significant experimental results of a nonlinear PEH, followed by experimental mapping of the output power, ensured that the proposed objective was achieved. The accuracy of the experimental results was supported by the high degree of automation in the experimental setup, assisted by advanced data processing.

18 pages, 9518 KiB  
Article
Sound Localization and Speech Enhancement Algorithm Based on Dual-Microphone
by Tao Tao, Hong Zheng, Jianfeng Yang, Zhongyuan Guo, Yiyang Zhang, Jiahui Ao, Yuao Chen, Weiting Lin and Xiao Tan
Sensors 2022, 22(3), 715; https://doi.org/10.3390/s22030715 - 18 Jan 2022
Cited by 18 | Viewed by 4705
Abstract
To reduce the complexity and cost of microphone arrays, this paper proposes a dual-microphone sound localization and speech enhancement algorithm. Based on the time-delay estimation of the signals received by the two microphones, the method combines energy-difference estimation with controllable beam response power to compute the 3D coordinates of the acoustic source and achieve dual-microphone sound localization. Using the azimuth angle of the acoustic source and an independence analysis of the speech signals, the speaker’s signal is separated from the mixture. On this basis, Wiener post-filtering is used to amplify the speaker’s voice and suppress interference, achieving speech enhancement. Experimental results show that the proposed dual-microphone sound localization algorithm accurately identifies the sound location, and the speech enhancement algorithm is more robust and adaptable than the original algorithm.
(This article belongs to the Section Navigation and Positioning)
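The time-delay estimation that underpins the localization step can be sketched with a plain cross-correlation peak search (a generic illustration, not the paper’s algorithm; the delay, signal, and sampling rate below are made up):

```python
import numpy as np

def estimate_delay(reference: np.ndarray, delayed: np.ndarray, fs: float) -> float:
    """Estimate how much `delayed` lags `reference`, in seconds,
    from the peak of their full cross-correlation."""
    corr = np.correlate(delayed, reference, mode="full")
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return lag / fs

fs = 16000
true_delay = 8  # samples; illustrative extra sound-path delay to the second mic
rng = np.random.default_rng(2)
src = rng.standard_normal(4096)
mic_near = src
mic_far = np.concatenate([np.zeros(true_delay), src[:-true_delay]])

delay = estimate_delay(mic_near, mic_far, fs)
# With delay d and mic spacing L, the azimuth follows from
# sin(theta) = c * d / L, where c is the speed of sound (~343 m/s).
```

Real systems typically weight the correlation (e.g., GCC-PHAT) to stay robust against reverberation, which a raw peak search is not.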

9 pages, 4432 KiB  
Article
Heterodyne Angle Deviation Interferometry in Vibration and Bubble Measurements
by Ming-Hung Chiu, Jia-Ze Shen and Jian-Ming Huang
Appl. Sci. 2016, 6(7), 205; https://doi.org/10.3390/app6070205 - 19 Jul 2016
Cited by 2 | Viewed by 5397
Abstract
We propose heterodyne angle deviation interferometry (HADI) for angle deviation measurements. The phase shift of an angular sensor (which can be a metal film or a surface plasmon resonance (SPR) prism) is proportional to the deviation angle of the test beam. In this paper, the method is demonstrated in bubble and speaker-vibration measurements. In the speaker vibration measurement, the voltage from the phase channel of a lock-in amplifier encodes the vibration level and frequency. In the bubble measurement, we can count the number of bubbles passing through the cross-section of the laser beam and measure the bubble size from the phase pulse signal.

23 pages, 1012 KiB  
Review
Local Control of Audio Environment: A Review of Methods and Applications
by Jussi Kuutti, Juhana Leiwo and Raimo E. Sepponen
Technologies 2014, 2(1), 31-53; https://doi.org/10.3390/technologies2010031 - 10 Feb 2014
Cited by 11 | Viewed by 14068
Abstract
The concept of a local audio environment is to have sound playback locally restricted such that, ideally, adjacent regions of an indoor or outdoor space could exhibit their own individual audio content without interfering with each other. This would enable people to listen to their content of choice without disturbing others next to them, yet without any headphones to block conversation. In practice, perfect sound containment in free air cannot be attained, but a local audio environment can still be satisfactorily approximated using directional speakers. Directional speakers may be based on regular audible frequencies or they may employ modulated ultrasound. Planar, parabolic, and array form factors are commonly used. The directivity of a speaker improves as its surface area and sound frequency increase, making these the main design factors for directional audio systems. Even directional speakers radiate some sound outside the main beam, and sound can also reflect off objects. Therefore, directional speaker systems perform best when there is enough ambient noise to mask the leaked sound. Possible areas of application for local audio include information and advertisement audio feeds in commercial facilities, guiding and narration in museums and exhibitions, office space personalization, control room messaging, rehabilitation environments, and entertainment audio systems.
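The review’s point that directivity grows with surface area and frequency matches the textbook circular-piston model, in which the first-null half-angle of the beam obeys sin θ = 1.22 λ/D. A small sketch (an idealized model with illustrative numbers, not taken from the review):

```python
import math

def first_null_angle_deg(diameter_m: float, freq_hz: float, c: float = 343.0) -> float:
    """First-null half-angle (degrees) of an idealized circular piston source:
    sin(theta) = 1.22 * wavelength / diameter. Smaller angle = tighter beam."""
    wavelength = c / freq_hz
    x = 1.22 * wavelength / diameter_m
    if x >= 1.0:
        return 90.0  # no null: the source radiates essentially omnidirectionally
    return math.degrees(math.asin(x))

# Beam narrows as frequency (or aperture) grows:
wide = first_null_angle_deg(0.10, 2_000)     # 10 cm cone at 2 kHz: no directivity
narrow = first_null_angle_deg(0.10, 40_000)  # same cone at 40 kHz ultrasound
```

This is why modulated-ultrasound (“parametric”) speakers achieve narrow beams from small apertures: at 40 kHz the wavelength is under 1 cm, so even a 10 cm emitter is many wavelengths across.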
