Leveraging MongoDB in Real-Time Emotion Recognition from Video for Scalable and Efficient Data Handling †
Abstract
1. Introduction
2. Related Work
3. Methodology
3.1. Video Processing and Emotion Feature Extraction
- Input Layer: Each video frame, 224 × 224 pixels with three color channels (RGB), is converted into an input of dimensions 224 × 224 × 3.
- Convolutional Layers: The CNN uses multiple convolutional layers to extract spatial features from images. The first layer (Conv1) applies 64 filters with a 3 × 3 kernel to capture simple patterns and is followed by a pooling layer that reduces dimensionality [15].
- Hidden Layers (Fully Connected Layers): After feature extraction, the output of the convolutional layers is passed through multiple fully connected layers, which connect all neurons, to produce the emotion prediction.
- Output Layer: The output layer has as many neurons as there are emotion classes (e.g., seven classes: happy, angry, sad, scared, surprised, disgusted, and neutral). A Softmax activation function is used to generate probabilities for each detected emotion class.
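The output-layer step described above can be illustrated with a minimal sketch. It assumes a particular ordering of the seven classes and uses made-up logits for one frame; neither comes from the paper, and the function names are hypothetical.

```python
import math

# Seven emotion classes named in the text; this ordering is an assumption.
EMOTIONS = ["happy", "angry", "sad", "scared", "surprised", "disgusted", "neutral"]


def softmax(logits):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(logits):
    """Return (label, confidence) for the most probable class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


# Hypothetical logits for a single frame (illustrative values only).
label, confidence = predict_emotion([0.2, -1.0, 3.1, 0.0, -0.5, -2.0, 1.4])
print(label, round(confidence, 3))
```

The probability of the winning class is the value that would be stored as the confidence score alongside the emotion label.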
3.2. Data Storage and Management Using MongoDB
- Timestamp: The time at which each frame is analyzed, recorded for easy tracking.
- Emotion label: The emotion detected in the frame.
- Confidence score: A score that indicates the model’s level of confidence in the recognized emotion.
3.3. System Scalability and Performance with MongoDB
- Sharding distributes data across multiple servers, enabling horizontal scalability to manage increasingly large data loads. It also reduces processing latency and optimizes load balancing.
- Indexing is applied to emotion labels and timestamps to speed up data retrieval, so that queries can be served efficiently with minimal delay.
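As a hedged illustration of the indexing step, the sketch below creates one compound index on the emotion label and the timestamp. The field names `emotion` and `timestamp` follow the schema in Section 4.1; the helper name, the choice of a compound index, and the sort directions are assumptions. The function expects an object that behaves like a pymongo `Collection` (that is, one exposing `create_index(keys)`); a small stand-in class is included only so the sketch runs without a MongoDB server.

```python
# Same integer values as pymongo.ASCENDING and pymongo.DESCENDING.
ASCENDING, DESCENDING = 1, -1


def ensure_emotion_indexes(collection):
    """Create a compound index on emotion label and timestamp.

    `collection` is assumed to behave like a pymongo Collection.
    Returns whatever create_index returns (the index name in pymongo).
    """
    keys = [("emotion", ASCENDING), ("timestamp", DESCENDING)]
    return collection.create_index(keys)


class RecordingCollection:
    """Stand-in for a real collection; it only records the requested keys."""

    def __init__(self):
        self.calls = []

    def create_index(self, keys):
        self.calls.append(keys)
        # Build a name in the "<field>_<direction>" style.
        return "_".join(f"{field}_{direction}" for field, direction in keys)


demo = RecordingCollection()
index_name = ensure_emotion_indexes(demo)
print(index_name)
```

With such an index, a query that filters on an emotion label and sorts by most recent timestamp can be answered from the index rather than by scanning every document.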
3.4. Real-Time Processing and Data Capture
- Replication in MongoDB ensures data redundancy and system reliability. In the event of a failure on one server, data remains available on other servers without significant disruption.
- Real-time processing allows the system to provide emotion analysis results immediately, so it is suitable for applications that require instant processing such as emotion recognition in streaming videos.
3.5. Performance Testing and Evaluation
- Emotion recognition accuracy.
- Processing speed per frame.
- Data retrieval latency, compared to SQL and some other NoSQL databases.
4. Implementation
- Hardware: A workstation equipped with an Intel Core i7-11700K CPU (3.6 GHz, 8 cores), 32 GB RAM, NVIDIA GeForce RTX 3060 GPU (12 GB VRAM), and 1 TB SSD was used for video processing.
- Software: MongoDB v6.0 (MongoDB Inc., New York, NY, USA) was used for data storage, and TensorFlow v2.12 (Google LLC, Mountain View, CA, USA) was used for emotion recognition. Zhang et al. demonstrated the effective integration of machine learning models with MongoDB for managing real-time video data [20].
4.1. MongoDB Database Schema
- _id: The _id field stores the unique identifier for each document in MongoDB. The identifier is generated automatically in the ObjectId format, which ensures that every document is uniquely identifiable across the database. This is illustrated by the entries in the provided image where the ObjectId is represented as a string of alphanumeric characters: 678513b82d762fdca2f2e170 for the first document, 678513b82d762fdca2f2d3b9 for the second one, and so on. This field guarantees uniqueness even if multiple documents are stored in the same collection.
- Emotion: The emotion field captures the emotion label detected by the CNN model from the video. In the first example image, the emotion detected is “sad”, the second image detects “neutral”, and the previously mentioned example (not shown in these images) detected “happy”. This field allows the system to classify emotions such as happiness, sadness, fear, anger, etc., based on facial expressions or other cues in the video data:
- Example for Sad: The emotion is labeled as “sad” with a confidence score of 0.95279.
- Example for Neutral: The emotion is labeled as “neutral” with a confidence score of 0.85447.
- The previously discussed happy label had a confidence score of 0.999995, indicating very high confidence in detecting that emotion.
- Confidence: The confidence field stores a value that represents how certain the model is about the detected emotion. This is a floating-point number that ranges between 0 and 1, with higher values indicating higher confidence in the accuracy of the detected emotion. Take the following for example:
- For “sad”, the confidence score is 0.95279, indicating a fairly high level of certainty.
- For “neutral”, the confidence score is 0.85447, still reasonably high but lower than “sad.”
- The happy emotion previously showed a confidence score of 0.999995, signifying near-perfect certainty in its detection.
- Timestamp: The timestamp field stores the time when the video frame was analyzed in ISODate format, which is crucial for tracking and referencing when the emotion is detected. This timestamp helps to align the emotion data with specific moments in the video, enabling precise real-time processing:
- For “sad”, the timestamp is recorded as “2025-01-13T13:23:04.726Z”.
- For “neutral”, the timestamp is recorded as “2025-01-13T13:23:04.580Z”.
- The timestamp for happy would be recorded similarly, ensuring temporal alignment with the respective video frame.
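The fields described above can be assembled into a single per-frame document, sketched below under stated assumptions. The helper name `make_emotion_record` and its validation rules are hypothetical; the field names and the example values ("sad", 0.95279, 2025-01-13T13:23:04.726Z) are taken from the text.

```python
from datetime import datetime, timezone


def make_emotion_record(emotion, confidence, analyzed_at):
    """Build one per-frame document with emotion, confidence, and timestamp.

    The _id field is omitted on purpose: MongoDB generates an ObjectId
    automatically when the document is inserted.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if analyzed_at.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return {
        "emotion": emotion,
        "confidence": float(confidence),
        "timestamp": analyzed_at.astimezone(timezone.utc),
    }


# Values from the "sad" example; "+00:00" is the UTC offset written as "Z" in the text.
analyzed_at = datetime.fromisoformat("2025-01-13T13:23:04.726+00:00")
record = make_emotion_record("sad", 0.95279, analyzed_at)
print(record)
```

With pymongo, a dictionary like `record` could be stored using `collection.insert_one(record)`, where `collection` is a handle to whichever collection holds the emotion results.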
4.2. Experiment Setup
4.3. Data Type
- Video Frames: Each video clip is divided into multiple image frames, with each frame serving as input for the Convolutional Neural Network (CNN) model. These frames are carefully analyzed to extract visual features related to emotional expressions, which are essential for accurate emotion detection.
- Emotion Labels: Every video frame is labeled with the corresponding emotion detected by the model, such as “happy,” “sad,” and “angry”. These labels represent the classification targets of the model, helping it identify specific emotional states from the facial expressions captured in the frames.
- Confidence Score: Along with the emotion label, the model generates a confidence score for each prediction, indicating how certain the model is about the accuracy of its emotion detection. This score plays a vital role in assessing the reliability of the model’s predictions.
- Timestamp: Each video frame is also paired with a timestamp, marking the precise moment when the frame was captured.
4.4. Real-Time Video Data Management System Architecture for Emotion Recognition Using MongoDB
5. Results and Discussion
5.1. Accuracy (%)
- Accuracy is measured based on how well the model identifies or predicts emotions from the video data.
- The accuracy calculation is performed by comparing the predicted results of the model (as produced by the emotion recognition system) with the correct label or ground truth.
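- The general formula for accuracy is as follows: $\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \times 100\%$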
- If the model correctly predicts 92 out of 100 video frames, then the accuracy is 92%.
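The accuracy calculation described above can be sketched as a short function that compares predicted labels with ground-truth labels. The function name and the sample label lists are illustrative assumptions; the 92-out-of-100 case mirrors the example in the text.

```python
def accuracy_percent(predicted, ground_truth):
    """Percentage of frames whose predicted label matches the ground truth."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("at least one frame is required")
    correct = sum(p == t for p, t in zip(predicted, ground_truth))
    return 100.0 * correct / len(ground_truth)


# 100 frames, 92 of which are predicted correctly.
truth = ["happy"] * 100
preds = ["happy"] * 92 + ["sad"] * 8
print(accuracy_percent(preds, truth))  # 92.0
```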
5.2. Processing Time (ms/Frame)
- Processing Time measures the time taken to process one video frame when the system is backed by a given database (in this case MongoDB, SQL, or another NoSQL database).
- To measure this, we can record the time taken to process a certain number of video frames and then calculate the average processing time per frame.
- The formula used is as follows: $\text{Processing time per frame} = \frac{\text{Total processing time}}{\text{Number of frames processed}}$
- If the system processes 1000 frames in 15 s, then the processing time per frame is 15 ms/frame.
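The per-frame timing measurement described above can be sketched with the standard-library timer. The function names are hypothetical, and `process_frame` stands for whatever per-frame work (inference plus database write) is being measured; the paper does not specify how its timings were taken.

```python
import time


def ms_per_frame(total_seconds, frame_count):
    """Average processing time per frame, in milliseconds."""
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    return total_seconds * 1000.0 / frame_count


def measure_ms_per_frame(process_frame, frames):
    """Time process_frame over all frames and return the average in ms/frame."""
    frames = list(frames)
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed_s = time.perf_counter() - start
    return ms_per_frame(elapsed_s, len(frames))


# The example from the text: 1000 frames in 15 s.
print(ms_per_frame(15, 1000))  # 15.0
```

Averaging over many frames smooths out per-frame jitter, which is why the measurement is taken over a batch rather than a single frame.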
5.3. Data Retrieval Latency (ms)
- Data Retrieval Latency measures the time taken by the system to retrieve data (for example, video frames or emotion recognition results) from the database.
- To measure this, we record the time taken to retrieve data from the database during query execution or when data are accessed.
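- The formula for latency is as follows: $\text{Latency} = t_{\text{response}} - t_{\text{request}}$, where $t_{\text{request}}$ is the time at which the query is issued and $t_{\text{response}}$ is the time at which the requested data are returned.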
- If the time taken to access the video frame data is 25 ms, then the latency is 25 ms.
- MongoDB performs better because it uses features such as sharding, which divides data across multiple servers, and indexing, which speeds up data retrieval.
- SQL is slower in terms of data processing and retrieval because it is based on a more rigid data model and lacks sharding and indexing mechanisms optimized for big data management.
- Other NoSQL databases also outperform SQL in some respects, but they may not be as efficient as MongoDB at managing large data volumes with real-time processing.
5.4. MongoDB Performance Comparison Table with Other Databases in Various Aspects of Data Processing
- Operational Performance: Data processing speed (such as insertion, retrieval, and query execution operations). MongoDB shows an advantage with sharding that improves the efficiency of big data processing.
- Scalability: The ability to handle horizontal data growth, with sharding allowing MongoDB to manage very large volumes of data.
- Storage Efficiency: How MongoDB optimizes storage space for large datasets, as well as the overhead that may arise compared to more rigid relational databases like MySQL.
- Real-Time Support: MongoDB’s efficiency in real-time data processing, compared to other databases in the context of semi-stream processing or streaming data.
- Schema Flexibility: MongoDB’s flexible schema advantage allows for unstructured data management, compared to databases that require a rigid schema structure such as MySQL.
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Elgabli, A.; Liu, K.; Aggarwal, V. Optimized preference-aware multi-path video streaming with scalable video coding. IEEE Trans. Mobile Comput. 2019, 19, 159–172.
- Barhoumi, C.; BenAyed, Y. Real-time speech emotion recognition using deep learning and data augmentation. Artif. Intell. Rev. 2024, 58, 49.
- Tan, C.; Ceballos, G.; Kasabov, N.; Puthanmadam Subramaniyam, N. Fusionsense: Emotion classification using feature fusion of multimodal data and deep learning in a brain-inspired spiking neural network. Sensors 2020, 20, 5328.
- Mehmood, E.; Anees, T. Performance analysis of not only SQL semi-stream join using MongoDB for real-time data warehousing. IEEE Access 2019, 7, 134215–134225.
- Zhu, F.; Yuan, M.; Xie, X.; Wang, T.; Zhao, S.; Rao, W.; Zeng, J. A data-driven sequential localization framework for big telco data. IEEE Trans. Knowl. Data Eng. 2019, 33, 3007–3019.
- Baruffa, G.; Femminella, M.; Pergolesi, M.; Reali, G. Comparison of MongoDB and Cassandra databases for supporting open-source platforms tailored to spectrum monitoring as-a-service. IEEE Trans. Netw. Serv. Manag. 2020, 17, 346–360.
- Liu, X.; Cheng, X.; Lee, K. GA-SVM based facial emotion recognition using facial geometric features. IEEE Sens. J. 2020, 21, 11532–11542.
- Abbes, H.; Boukettaya, S.; Gargouri, F. Learning ontology from Big Data through MongoDB database. In Proceedings of the 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), Marrakech, Morocco, 17–20 November 2015; pp. 1–7.
- Gonzalez, H.; George, R.; Muzaffar, S.; Acevedo, J.; Hoppner, S.; Mayr, C.; Elfadel, I. Hardware acceleration of EEG-based emotion classification systems: A comprehensive survey. IEEE Trans. Biomed. Circuits Syst. 2021, 15, 412–442.
- Satriyawan, H.; Susanto, D.S. Optimasi keamanan smart grid melalui autentikasi dua lapis: Meningkatkan efisiensi dan privasi dalam era digital. J. RESTIKOM 2023, 5, 319–333.
- Zhong, H.; Wu, F.; Xu, Y.; Cui, J. QoS-aware multicast for scalable video streaming in software-defined networks. IEEE Trans. Multimed. 2020, 23, 982–994.
- Park, J.; Chung, K. Layer-assisted video quality adaptation for improving QoE in wireless networks. IEEE Access 2020, 8, 77518–77527.
- Yang, J.; Qian, T.; Zhang, F.; Khan, S.U. Real-time facial expression recognition based on edge computing. IEEE Access 2021, 9, 76178–76190.
- Zhang, K.; Li, Y.; Wang, J.; Cambria, E.; Li, X. Real-time video emotion recognition based on reinforcement learning and domain knowledge. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 1034–1047.
- Chen, L.; Li, M.; Su, W.; Wu, M.; Hirota, K.; Pedrycz, W. Adaptive feature selection-based AdaBoost-KNN with direct optimization for dynamic emotion recognition in human–robot interaction. IEEE Trans. Emerg. Top. Comput. Intell. 2021, 5, 205–213.
- Xiang, L.; Huang, J.; Shao, X.; Wang, D. A MongoDB-based management of planar spatial data with a flattened R-tree. ISPRS Int. J. Geo-Inf. 2016, 5, 119.
- Eyada, M.M.; Saber, W.; El Genidy, M.M.; Amer, F. Performance evaluation of IoT data management using MongoDB versus MySQL databases in different cloud environments. IEEE Access 2020, 8, 110656–110668.
- Patil, M.M.; Hanni, A.; Tejeshwar, C.H.; Patil, P. A qualitative analysis of the performance of MongoDB vs MySQL database based on insertion and retrieval operations using a web/android application to explore load balancing—Sharding in MongoDB and its advantages. In Proceedings of the 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 10–11 February 2017; pp. 325–330.
- Sharma, M.; Sharma, V.D.; Bundele, M.M. Performance analysis of RDBMS and NoSQL databases: PostgreSQL, MongoDB and Neo4j. In Proceedings of the 2018 3rd International Conference and Workshops on Recent Advances and Innovations in Engineering (ICRAIE), Jaipur, India, 22–25 November 2018; pp. 1–5.
- Mearaj; Maheshwari, P.; Kaur, M.J. Data conversion from traditional relational database to MongoDB using XAMPP and NoSQL. In Proceedings of the 2018 Fifth HCT Information Technology Trends (ITT), Dubai, United Arab Emirates, 28–29 November 2018; pp. 94–98.
- Soussi, N.; Boumlik, A.; Bahaj, M. Mongo2SPARQL: Automatic and semantic query conversion of MongoDB query language to SPARQL. In Proceedings of the 2017 Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 17–19 April 2017; pp. 1–6.
- Yilmaz, N.; Alatli, O.; Ciloglugil, B.; Erdur, R.C. Evaluation of storage and query performance of sensor-based Internet of Things data with MongoDB. In Proceedings of the 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), Malatya, Turkey, 28–30 September 2018; pp. 1–6.
- Chatterjee, R.; Mazumdar, S.; Sherratt, R.S.; Halder, R.; Maitra, T.; Giri, D. Real-time speech emotion analysis for smart home assistants. IEEE Trans. Consum. Electron. 2021, 67, 68–76.
- Guetari, R.; Chetouani, A.; Tabia, H.; Khlifa, N. Real time emotion recognition in video stream, using B-CNN and F-CNN. In Proceedings of the 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Sousse, Tunisia, 2–5 September 2020; pp. 1–6.
- Kim, S.-H.; Yang, H.-J.; Nguyen, N.A.T.; Prabhakar, S.K.; Lee, S.-W. WeDea: A New EEG-Based Framework for Emotion Recognition. IEEE J. Biomed. Health Inform. 2022, 26, 264–275.
- Hafeez, T.; Xu, L.; McArdle, G. Edge intelligence for data handling and predictive maintenance in IIoT. IEEE Access 2021, 9, 49355–49371.
- Anand, V.; Rao, C.M. MongoDB and Oracle NoSQL: A technical critique for design decisions. In Proceedings of the 2016 International Conference on Emerging Trends in Engineering, Technology and Science (ICETETS), Pudukkottai, India, 24–26 February 2016; pp. 1–4.
- Kang, Y.-S.; Park, I.-H.; Rhee, J.; Lee, Y.-H. MongoDB-based repository design for IoT-generated RFID/sensor big data. IEEE Sens. J. 2016, 16, 485–497.
- Liu, Y.; Wang, Y.; Jin, Y. Research on the improvement of MongoDB auto-sharding in cloud environment. In Proceedings of the 2012 7th International Conference on Computer Science & Education (ICCSE), Melbourne, VIC, Australia, 14–17 July 2012; pp. 851–854.
- Roh, Y.; Heo, G.; Whang, S.E. A survey on data collection for machine learning: A big data-AI integration perspective. IEEE Trans. Knowl. Data Eng. 2021, 33, 1328–1347.
Database | Accuracy (%) | Processing Time (ms/Frame) | Data Retrieval Latency (ms) |
---|---|---|---|
MongoDB | 92.5 | 15 | 25 |
SQL | 85.3 | 45 | 100 |
No. | Authors | Aspects Compared | MongoDB vs. Other Databases |
---|---|---|---|
1 | Patil, M. M., Hanni, A., Tejeshwar, C. H., & Patil, P. [17] | Operational Performance (Insertion, Retrieval, Sharding) | MongoDB vs. MySQL in data insertion and retrieval operations, benefits of sharding |
2 | Eyada, M. M., Saber, W., El Genidy, M. M., & Amer, F. [16] | Efficiency in a Cloud Environment | MongoDB vs. MySQL in IoT data management in the cloud, speed, latency, and scalability |
3 | Mehmood, E., & Anees, T. [4] | Real-Time and Data Warehousing Support | MongoDB vs. other databases in real-time data management for data warehousing |
4 | Kang, Y.-S., Park, I.-H., Rhee, J., & Lee, Y.-H. [28] | IoT Data Storage Efficiency | MongoDB vs. other alternatives for sensor- and RFID-based IoT data management |
5 | Baruffa, G., Femminella, M., Pergolesi, M., & Reali, G. [6] | Latency and Throughput Efficiency | MongoDB vs. Cassandra for spectrum monitoring applications, latency, throughput, storage efficiency |
Aspects | MongoDB | Redis | Cassandra |
---|---|---|---|
Scalability | Supports horizontal scalability with sharding, ideal for big data. | Limited scalability as it uses in-memory storage. | Excellent horizontal scalability for big data and distribution. |
Real-time Data Processing | Real-time processing support with advanced indexing and dynamic metadata storage. | Low latency, ideal for applications that require fast response. | Not optimal for real-time data processing, more suitable for write-heavy applications. |
Video Data Processing | Document-based storage enables the management of large video data with great flexibility. | Not designed for large video data storage. | Distributed storage, but less efficient for processing large video data. |
Indexing | Supports advanced indexing to speed up data retrieval. | Does not support indexing for complex data search. | Limited indexing, focusing more on fast write operations. |
Data Access Speed | Fast, but can be affected by very high read/write loads without optimal configuration. | Very fast for in-memory data access. | Fast for write operations, but low-latency data capture can be slower. |
Availability and Replicability | Automatic replication for data redundancy and high availability. | Does not have integrated replication features for high availability. | Automatic replication and the ability to support high availability despite server failures. |
Weaknesses | Performance may degrade at high loads if sharding and indexing are not configured properly. | Not suitable for large data storage as it uses memory. | Performance is sub-optimal in real-time applications and data capture latency is high. |
Advantage | Scalability, schema flexibility, and efficient management of big data. | High speed and low latency, suitable for caching. | Excellent scalability and replication for write-heavy and distributed data applications. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kurniawan, H.M.; Maulidan, M.; Zulmaulidin, M.F.; Sujjada, A. Leveraging MongoDB in Real-Time Emotion Recognition from Video for Scalable and Efficient Data Handling. Eng. Proc. 2025, 107, 84. https://doi.org/10.3390/engproc2025107084