
Article

63 Citations

8,434 Views

19 Pages

SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams

Madina Abdrakhmanova, Askat Kuzdeuov, Sheikh Jarju, Yerbolat Khassanov, Michael Lewis and Huseyin Atakan Varol

Sensors 2021, 21(10), 3465; https://doi.org/10.3390/s21103465


16 May 2021

We present SpeakingFaces as a publicly-available large-scale multimodal dataset developed to support machine learning research in contexts that utilize a combination of thermal, visual, and audio data streams; examples include human–computer interact...

111 Results Found

SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams

Annotated-VocalSet: A Singing Voice Dataset

How Do You Speak about Immigrants? Taxonomy and StereoImmigrants Dataset for Identifying Stereotypes about Immigrants

Mental Illness Stigma and Associated Factors among Arabic-Speaking Religious and Community Leaders

A Personalized Multi-Turn Generation-Based Chatbot with Various-Persona-Distribution Data

Utilization of the Spanish Bisyllable Word Recognition Test to Assess Cochlear Implant Performance Trajectory

Updated Swiss Growth References 2025: No Height Differences, but BMI Variations Associated with Migration

Improving the Accuracy of Automatic Facial Expression Recognition in Speaking Subjects with Deep Learning

DisCaaS: Micro Behavior Analysis on Discussion by Camera as a Sensor

Detecting Hateful and Offensive Speech in Arabic Social Media Using Transfer Learning

Object Recognition System for the Visually Impaired: A Deep Learning Approach using Arabic Annotation

Visual Lip Reading Dataset in Turkish

Cooking Is Creating Emotion: A Study on Hinglish Sentiments of Youtube Cookery Channels Using Semi-Supervised Approach

Atom Search Optimization with Deep Learning Enabled Arabic Sign Language Recognition for Speaking and Hearing Disability Persons

The Real-Time Image Sequences-Based Stress Assessment Vision System for Mental Health

Effects of Sinusoidal Model on Non-Parallel Voice Conversion with Adversarial Learning

Recommending Advanced Deep Learning Models for Efficient Insect Pest Detection

BELMASK—An Audiovisual Dataset of Adversely Produced Speech for Auditory Cognition Research

Machine Learning-Based Identification of Phonological Biomarkers for Speech Sound Disorders in Saudi Arabic-Speaking Children

MMER-LMF: Multi-Modal Emotion Recognition in Lightweight Modality Fusion

Electroencephalogram Dataset of Visually Imagined Arabic Alphabet for Brain–Computer Interface Design and Evaluation

Robust Audio–Visual Speaker Localization in Noisy Aircraft Cabins for Inflight Medical Assistance

Improved YOLO-V3 with DenseNet for Multi-Scale Remote Sensing Target Detection

Pre-Trained Language Models for Mental Health: An Empirical Study on Arabic Q&A Classification

Learning the Relative Dynamic Features for Word-Level Lipreading

Intelligent IoT (IIoT) Device to Identifying Suspected COVID-19 Infections Using Sensor Fusion Algorithm and Real-Time Mask Detection Based on the Enhanced MobileNetV2 Model

MCD-Temporal: Constructing a New Time-Entropy Enhanced Dynamic Weighted Heterogeneous Ensemble for Cognitive Level Classification

Cognitive Assessment of Japanese Older Adults with Text Data Augmentation

Feature Fusion for Emotion Recognition †

Atypical Genotypes for Canine Agouti Signaling Protein Suggest Novel Chromosomal Rearrangement

Evaluation of an Arabic Chatbot Based on Extractive Question-Answering Transfer Learning and Language Transformers

DialogCIN: Contextual Inference Networks for Emotional Dialogue Generation

Identity Leadership, Employee Burnout and the Mediating Role of Team Identification: Evidence from the Global Identity Leadership Development Project

Simplicial-Map Neural Networks Robust to Adversarial Examples

Multi-Path and Group-Loss-Based Network for Speech Emotion Recognition in Multi-Domain Datasets

Identification of 3D Lip Shape during Japanese Vowel Pronunciation Using Deep Learning

Extracting Information from Unstructured Medical Reports Written in Minority Languages: A Case Study of Finnish

Understanding Customers’ Transport Services with Topic Clustering and Sentiment Analysis

Toward the Alleviation of the H₀ Tension in Myrzakulov f(R,T) Gravity

Deepsign: Sign Language Detection and Recognition Using Deep Learning

Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech

Voluntary Singlehood in a Greek-Speaking Cohort: Different Priorities and Giving Up Intimate Relationships as Reasons for Singlehood

Automatic Segmentation and Classification of Heart Sounds Using Modified Empirical Wavelet Transform and Power Features

Evaluating Voice Biomarkers and Deep Learning for Neurodevelopmental Disorder Screening in Real-World Conditions

Evaluating the Performance of Large Language Models in Predicting Diagnostics for Spanish Clinical Cases in Cardiology

A Web-Based Model to Predict a Neurological Disorder Using ANN

VAD-CLVA: Integrating CLIP with LLaVA for Voice Activity Detection

Depression Detection Based on Hybrid Deep Learning SSCL Framework Using Self-Attention Mechanism: An Application to Social Networking Data

Continuous Arabic Sign Language Recognition Models

Abstractive vs. Extractive Summarization: An Experimental Review

