Article

EmotiCloud: Cloud System to Monitor Patients Using AI Facial Emotion Recognition

by Ana-María López-Echeverry 1,2,*,†, Sebastián López-Flórez 2,*,†, Jovany Bedoya-Guapacha 1,† and Fernando De-La-Prieta 3
1 Engineering Faculty, Universidad Tecnológica de Pereira, Pereira 660003, Colombia
2 Doctorate in Computer Science, Doctoral School, Universidad de Salamanca, 37008 Salamanca, Spain
3 Department of Computer Science and Automation, Faculty of Science, Universidad de Salamanca, 37008 Salamanca, Spain
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Systems 2025, 13(9), 750; https://doi.org/10.3390/systems13090750
Submission received: 4 July 2025 / Revised: 22 August 2025 / Accepted: 26 August 2025 / Published: 29 August 2025
(This article belongs to the Section Artificial Intelligence and Digital Systems Engineering)

Abstract

Comprehensive healthcare seeks to uphold the right to health by providing patient-centred care in both personal and work environments. However, the unequal distribution of healthcare services significantly restricts access in remote or underserved areas—a challenge that is particularly critical in mental health care within low-income countries, where there is on average only one psychiatrist for every 200,000 people, severely limiting early diagnosis and continuous monitoring in patients’ daily environments. In response to these challenges, this research explores the feasibility of an information system that integrates cloud computing with an intelligent Facial Expression Recognition (FER) module, enabling psychologists to remotely and periodically monitor patients’ emotional states and thereby supporting comprehensive clinical assessment, early detection, ongoing management, and personalised treatment in mental health care. This applied research follows a descriptive and developmental approach, aiming to design, implement, and evaluate an intelligent cloud-based solution for remote monitoring of patients’ emotional states through FER. The methodology integrates principles of user-centred design, software engineering best practices, and machine learning model development, ensuring a robust and scalable solution aligned with clinical and technological requirements. The development process followed the Software Development Life Cycle (SDLC) and included functional, performance, and integration testing. To assess overall system quality, we defined an evaluation framework based on ISO/IEC 25010 quality characteristics: functional suitability, performance efficiency, usability, and security. The intelligent FER model achieved strong validation results, with a loss of 0.1378 and an accuracy of 96%, as confirmed by the confusion matrix and associated performance metrics.

1. Introduction

Primary Health Care (PHC) is a comprehensive approach aimed at promoting equitable health and well-being by addressing the needs and preferences of individuals and communities—from health promotion to palliative care—within their everyday environments [1]. In Colombia, national policies such as Resolution 3202 of 2016 and the Ten-Year Public Health Plan reflect this model [2], which emphasises patient-centred care and the provision of services within patients’ immediate personal or work environments. These policies advocate for integrated service delivery networks and comprehensive care routes tailored to individual health needs. However, implementing such models remains challenging in geographically dispersed and rural areas with limited access to health infrastructure. The Mental Health Action Plan 2030, endorsed at the 72nd World Health Assembly, calls for a 20% increase in service coverage for mental disorders and regular monitoring of key indicators such as prevalence and access to treatment [3]. However, in low- and middle-income countries, significant disparities persist. Although half of the global population resides in rural areas, only 36% of the health workforce serves these regions [4]. Countries such as Canada, the U.S., Brazil, Bangladesh, and India illustrate this imbalance, with urban areas having significantly more health professionals than rural ones. The shortage extends across all healthcare roles and severely restricts service accessibility. These challenges are particularly acute in mental health care, where consistent monitoring and follow-up are essential. According to the WHO [5], low-income countries invest as little as USD 0.25 per capita in mental health and may have one psychiatrist—or fewer—for every 200,000 people. Such limitations hinder early diagnosis, timely interventions, and the possibility of community-based treatments, often resulting in delayed care and reliance on specialised institutions. Recognising these barriers, the WHO also highlights the potential of digital health technologies to enhance mental health systems by improving public education, supporting healthcare providers, enabling remote care, and facilitating self-help solutions [6]. In line with this, the 13th OECD Rural Development Conference emphasised that innovative remote care models—including telemedicine and digital solutions—can significantly improve healthcare access in rural and underserved areas [7].
Patient monitoring, through ambulatory assessment, experience sampling, and real-time data capture (EMA), includes a range of methods to evaluate patient behaviour within their natural environments [8]. These techniques allow integration with other clinical measures, providing medical professionals with a richer, context-aware understanding of the patient’s condition over time. Given that emotional states fluctuate and directly influence mental health severity, Experience Sampling Methods (ESMs) and related outpatient monitoring approaches enable the continuous tracking of emotional patterns contributing to a dynamic and comprehensive picture of the patient’s mental state. ESMs facilitate structured, in situ self-reporting, often through daily or random entries, capturing both affective and contextual data [9] using tools such as self-report questionnaires or electronic diaries. An emerging alternative is Facial Emotion Recognition (FER), which enables the real-time capture and analysis of facial expressions, providing objective and continuous emotional assessment [10]. Based on Ekman’s discrete model of emotion, six basic emotions—anger, joy, sadness, fear, disgust, and surprise—are universally expressed through facial cues [11]. Supporting this, Mehrabian’s study found that emotions are communicated 55% through facial expressions, 38% through tone of voice, and only 7% through verbal language [12]. We adopt an applied, mixed-methods approach that combines descriptive research to analyse user needs, technical requirements, and clinical considerations; developmental research to design and implement a functional prototype integrating an intelligent Facial Expression Recognition (FER) module within a secure cloud platform; and experimental evaluation to assess functional suitability, performance, usability, and security following international standards. The proposed system leverages information and communication technologies to enable remote patient monitoring through web-based data acquisition and a scalable cloud infrastructure. It records patient interactions and incorporates an intelligent classification module based on image and video analysis to assess and classify patients’ emotional states automatically. The development process involves defining the operating model, identifying essential modules, components, and functionalities, designing the system architecture, implementing and integrating the intelligent FER model, and performing comprehensive validation tests on the integrated solution.
This research is structured as follows: Section 2 presents a review of related work on remote patient monitoring and sentiment analysis. Section 3 presents the materials and methods, detailing the sentiment analysis process using artificial intelligence applied to video data and outlining the system architecture enabling patient–physician interaction. Section 4 presents the implementation and results, Section 5 offers the discussion, and Section 6 presents the conclusions and directions for future work.

2. Literature Review

The authors of [1,6,13] present the goals and benefits of e-health for universal health coverage, in which technology plays a transformative role in PHC by enabling timely risk identification, improving diagnostics, optimising patient data management, and enhancing service coordination across the health system. In addition, the authors in [14] report a consensus, based on the experience of a group of clinical and scientific experts, that a key advantage of e-mental health is improved access, which narrows the care gap, particularly in rural areas with limited health service provision.
Considering the vision of clinical experts, we can establish the relevance of the proposed solution, as it enables a reduction in the gap in patient care through a technological tool that integrates remote care mechanisms, thereby expanding access and improving care delivery. In addition, in response to global mental health priorities, it could support primary care staff in the detection and management of mental health conditions through digital tools that aid in early identification, basic intervention, and follow-up, particularly in underserved areas.
Remote patient monitoring has benefited from technological advances that enable stable and robust systems for data collection, transmission, and processing, meeting the defined requirements for a specific class of service through communication processes that use the Internet. Below are several initiatives that implement solutions using digital technologies to support healthcare processes at both general and mental health levels, some including patient tracking.
The authors in [15] present a system for continuous, long-term patient monitoring, including a real-world case study in which they construct a Parkinson’s disease remote monitoring architecture (PDRMA). Similarly, this research presents a technological architecture that enables patient monitoring and provides an integrated operating model that considers the active participation of both the patient and the medical staff.
The work presented in [16] detects freezing of gait in patients with Parkinson’s disease using wearable acceleration sensors placed on the legs and hips, enabling real-time remote monitoring. While this demonstrates the feasibility of monitoring, a complete solution also requires, beyond data collection and processing with a classification model, the ability to track the patient’s progress with visualisation capabilities for medical staff, allowing decisions to be made regarding the course of treatment.
The authors in [17] implement an asynchronous cloud-based system for telemonitoring of patients to support clinicians in assessing neurological speech disorders and dysarthria through video analysis with a convolutional network. In their implementation, they follow the methodology proposed in [18] to measure facial alignment from RGB videos of ALS and stroke patients. This asynchronous monitoring approach is considered a suitable alternative to adjust to the periodicity required according to the personalised treatment of each patient. A health monitoring tool should contemplate this flexibility, as not all processes need real-time monitoring, reserving the latter for emergency care situations.
The authors in [19] present a framework under the IoMT healthcare paradigm in which they include a module for emotional recognition focused on the nursing care needs of patients. Their contributions include a local descriptor that preserves signal characteristics, a hierarchical cognitive model to classify emotions in multimodal data, and a robust tracking model with Bluetooth and 5G technology for patient location by caregivers. Following this approach, the present research concretely defines the operating model for the care processes, addressing users’ interaction needs through a web application and providing access to data storage and visualisation of the historical information of patients in treatment.
Some previous work related to sentiment analysis is presented as an essential element that helps to define the scope of the present study. The study considers integrating a system with information and communication technologies to monitor patient emotional states, including an intelligent module and user management modules with patient and medical profiles.
The authors in [20] used a novel approach to detect emotion from lip structure, applying a recurrent neural network to analyse the pattern over time and classifying emotions into six classes. They found that the proposed D-FES model can track and classify emotions accurately in a real-time environment, results that build confidence in the implementation of such technological solutions, although the authors suggest convolutional neural networks to improve accuracy. The present research considers complete face information and uses convolutional neural networks (CNNs).
The authors in [21] propose a video analysis system and a facial expression state algorithm supported by a video analysis server that uses Hikvision’s facial action units (AUs), combined with user-side acquisition devices and services such as cloud storage, messaging, and databases. This architecture separates the application and analysis processes, demonstrating that components can be integrated into engineering-level solutions for specific contexts. The solution presents an architecture that integrates components for tracking emotions; however, identification of the emotional state is performed by a third-party service. The solution proposed here instead deploys an analysis module that implements an intelligent classifier with a CNN.
The authors in [22] present an edge computing system for emotion recognition based on a CNN with an accuracy of 96.6%. In this research, they integrate edge computing, where data preprocessing and recognition are performed, and cloud computing, where they store the preprocessed images. CNNs are a class of deep neural networks that do not require manual feature analysis but require computational capabilities to process operations. Based on the results obtained, the present project adopts convolutional neural networks for emotion classification and considers elements of cloud architecture. However, due to the operation model, both processing and storage are performed exclusively in the cloud.
Recent studies, refs. [23,24,25], have reported advances in facial emotion recognition using various datasets, methodologies, and intelligent models such as CNN, VGG, and their optimised versions, applied to databases including CK+, FER-2013, and JAFFE. We considered these works for performance benchmarking and to identify potential directions for future research.
Table 1 presents a consolidated analysis of the information from the literature review, allowing quick reference to the elements assessed in each, considering their limitations and strengths.

3. Materials and Methods

This study proposes a cloud-based platform, EmotiCloud, for identifying patients’ moods. This platform allows integration into mental health care pathways, providing timely follow-up for individuals in remote areas with access to a computer or a smartphone with an internet connection. This methodology integrates cloud computing, intelligent emotion recognition, and user-centred design within a rigorous software engineering framework. It ensures that the developed prototype is technically robust, clinically relevant, secure, and scalable while remaining fully aligned with international quality guidelines and ethical standards.
Figure 1 exhibits the main steps we consider necessary for conducting the research, which is an iterative process that commences with task analysis, followed sequentially, as indicated by the arrows, by the definition of requirements, the design and development of the prototype, and ultimately the implementation of the system and its operational testing.
  • Building on the findings presented by the authors in [26], we propose a user-centred design approach aimed at understanding users and their needs, defining the requirement specification, designing and implementing the prototype, and evaluating system performance—all with a consistent focus on the needed tasks.
  • We define the solution’s operational framework by identifying the tasks assigned to each actor within the system and outlining their interaction requirements, including those of medical staff, patients, and management personnel. This design incorporates recommendations, best practices, and regulations established by national and international organisations for delivering world-class, technology-mediated remote health services. To support this stage, we conduct a systematic information search using predefined search strings, enabling the analysis and synthesis of key elements.
  • Once the task analysis is complete, we define the system requirements in terms of the users and the infrastructure needed for operation.
  • Following these considerations, we establish the physical deployment diagram of the system, shown in Part A of Figure 2, as a private cloud infrastructure that leverages virtualisation with VMware technology to provide isolated, scalable computing resources. The system actors are the administrator, the doctor-user, and the patient-user, who access the system from a web browser over a secure HTTPS connection after authenticating.
  • Implementation of the information system: The design and implementation of the information system follow the Software Development Life Cycle (SDLC), which, as described by the authors in [27], defines the phases and activities required to develop a software application from conception through implementation and support. In this study, the process comprises the following steps: requirements identification, high-level system architecture design, low-level system design, software coding, functional testing, and performance evaluation of the solution.
    We define the system requirements based on the elements that shaped the operating model, including its modular components aligned with the planned interactions for each user profile. The use case diagram in Part B of Figure 2 illustrates how the different actors within the cloud system, EmotiCloud, interact with it and the actions they can perform. Upon authentication through the web application front end, patients can access remote scheduled follow-up and complete a virtual interview, responding to mood-related questions presented through a chat interface. The doctor-user, once authenticated, can access their patients’ records, schedule follow-up sessions, review interviews, and examine the preliminary diagnosis generated by the intelligent module after requesting its analysis. Based on their clinical expertise, the doctor can then establish a definitive diagnosis and download a report containing the patient’s diagnostic information.
    In addition, we consider the MVC model for the architecture design because it provides a solid and scalable structure, separating the application into three layers: data, logic, and presentation [28]. The defined layers are the following:
    • Presentation layer: This layer is the web application frontend. It is responsible for the user interface and for interaction with users, built with technologies such as HTML, CSS, JavaScript, and the Bootstrap framework, which allow the creation of an attractive and easy-to-use interface. This layer interacts with the backend application services.
    • Business logic layer: This layer defines the system’s core business logic, encompassing user management, the execution of interviews and assessments, and the processing of collected data, ensuring that the system does not perform sensitive computation on the client side. We implement it using the Python 3.7 programming language in combination with the Django framework. Through backend services, which reside and execute within the cloud environment, we deliver the business logic required to handle user requests.
    • Data Access Layer: This layer ensures the persistence of system data, enabling high availability, redundancy, and adherence to internal data governance policies. It is hosted in the private cloud, isolated within secured subnets, and accessed only by authorised application components. The solution employs both MySQL and MongoDB databases to store and retrieve user information, interview records, and other relevant data. We define a polyglot persistence architecture to leverage the strengths of different database paradigms. The SQL database, as noted in [29], ensures strong consistency through its ACID (Atomicity, Consistency, Isolation, Durability) properties. In parallel, we use a NoSQL database to maximise availability and resilience, following the BASE (Basically Available, Soft State, Eventually Consistent) model [30]. This combination allows the system to optimise performance for different data types and operational requirements. Following the synthesis presented in [31], the technical specifications for development are established, along with the design of the user interface (UI) and user experience (UX). These guide the coding phase, ensuring the seamless integration of all system components.
    Since the intelligent model constitutes the core component of the solution, the methodology for its development is described in detail below.
  • Implementation and Testing of the Intelligent Model
    As noted in [32], convolutional neural networks (CNNs) are particularly effective for image processing as they automatically learn and extract the most relevant features without requiring extensive expert intervention to identify and analyse them. This particularity enables the model to leverage the complete set of features while minimising the need for manual feature engineering.
    According to [33], a CNN typically consists of an input layer, which receives the raw data; one or more convolutional layers, which act as filters to extract high-level features; pooling layers, which perform subsampling to reduce dimensionality and eliminate redundant details, thereby lowering computational complexity; and a classification layer composed of fully connected layers for final output generation.
    In this study, a CNN is implemented by first defining its architecture, specifying the convolutional, pooling, and fully connected layers in alignment with the characteristics of video-based data. We prepare and preprocess the dataset using normalisation and data augmentation techniques to enhance model robustness. We train the model using the Adam optimisation algorithm due to its adaptive learning rate capabilities, enabling effective weight adjustment. We evaluate model performance using precision-based metrics to minimise false positives and ensure reliable classification.
  • System Operation Tests.
    We evaluate the system using the quality model defined in ISO/IEC 25010 [34], focusing on the characteristics most relevant to the prototype stage: functional suitability, performance efficiency, usability, and security. For each characteristic, we define a specific metric and calculate a final weighted metric that integrates all results.
    • Functional suitability. We measure the system’s ability to perform the required actions by assessing functional completeness and functional correctness.
    • Performance efficiency. We assess the application’s behaviour under varying load conditions by measuring time behaviour, resource utilisation, and capacity. The metrics measure system performance and resource usage during testing. They include the average CPU usage under load, calculated as the mean percentage of CPU consumption across all samples, and the maximum CPU usage (peak), which records the highest CPU consumption observed. Total disk activity is measured in megabytes by summing the bytes transferred per second and converting to MB. Likewise, total network activity follows the same approach, summing bytes sent or received per second.
    • Usability. We conduct end-user testing to measure recognisability, learnability, operability, user error protection, user interface aesthetics, and accessibility, using two questions for each aspect for both patient users and physician users.
    • Security. We examine the system’s resilience and trustworthiness by evaluating confidentiality, authenticity, and access control. The evaluation is performed through automated Python scripts using the requests library to simulate HTTP requests and analyse server responses. We run three sets of tests, each focusing on a specific security subfeature.
      Authentication Compliance Rate. This metric measures the percentage of tested protected pages that successfully blocked anonymous (non-authenticated) access. It is calculated by dividing the number of pages that denied anonymous access by the total number of protected pages tested, then multiplying by 100%.
      Access Control Effectiveness by Role. This metric evaluates the effectiveness of role-based access control. It represents the percentage of privileged endpoints tested that successfully blocked requests from users with incorrect roles. It is obtained by dividing the number of endpoints that rejected access for unauthorised roles by the total number of privileged endpoints tested, then multiplying by 100%.
      Confidentiality. This metric measures the percentage of tested pages containing sensitive data that successfully prevented unauthorised access. It is calculated by dividing the number of sensitive-data pages that blocked access by the total number of sensitive-data pages tested, then multiplying by 100%.
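For clarity, the three ratios described above can also be written as formulas (the notation is ours; the definitions are taken directly from the prose):

```latex
\begin{align*}
\text{Authentication compliance rate} &= \frac{\text{protected pages blocking anonymous access}}{\text{protected pages tested}} \times 100\% \\
\text{Access control effectiveness by role} &= \frac{\text{privileged endpoints rejecting wrong-role requests}}{\text{privileged endpoints tested}} \times 100\% \\
\text{Confidentiality} &= \frac{\text{sensitive-data pages blocking unauthorised access}}{\text{sensitive-data pages tested}} \times 100\%
\end{align*}
```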
Finally, we discuss the results to draw conclusions and establish future lines of work.

4. Implementation and Results

The following describes each stage implemented and its respective outcomes.

4.1. Operation Model

Figure 3 illustrates the system’s operating model. The process begins with an interview between the patient and the treating physician, who, based on the patient’s health status, defines a virtual follow-up plan. The patient registers a user account, which the system administrator activates, granting access to the interview interface via a chatbox. This interface prompts the patient with questions regarding their mood and recent activities while securely storing the associated video and audio recordings in a database. Intelligent models process these recordings to detect the patient’s emotions and generate corresponding labels. These labels are made available to the treating physician during scheduled follow-ups, enabling timely and informed decisions regarding the patient’s treatment.
After registering in the system, users can access and perform the actions allowed according to their profile. The patient can select the scheduled follow-up and enter the automatic interview module, where the system presents nine questions related to their mood, which the patient answers individually. After preprocessing, the videos are encrypted and stored in a database to guarantee their confidentiality, appending a specific ID.
The treating physician logs into the system using a physician user profile and accesses the database records for each of their patients. The physician can schedule remote follow-up sessions through the system and view the video responses to each question the patient answered during the different interviews. The intelligent model in the backend analyses each response and assigns a corresponding label.
The cloud-based information system stores videos for each interview question and analysis results generated by an AI module in the backend, utilising a non-relational database for the videos and a relational database for the labels. The frontend enables users to register and access the system according to their specific profiles.
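As an illustration of this split between the two stores, the following minimal sketch uses pymongo and mysql-connector-python (both listed in Section 4.2.2); the collection, table, and column names are hypothetical and only indicate how an encrypted video and its emotion label might be persisted.

```python
import mysql.connector            # relational store for structured labels
from pymongo import MongoClient   # non-relational store for video blobs


def store_interview_response(patient_id: int, question_id: int,
                             encrypted_video: bytes, emotion_label: str) -> None:
    """Persist one interview answer across both databases (illustrative only)."""
    # 1. Unstructured data: the encrypted video bytes go to MongoDB.
    mongo = MongoClient("mongodb://localhost:27017/")
    video_doc = {
        "patient_id": patient_id,
        "question_id": question_id,
        "video": encrypted_video,   # stored as BSON binary
    }
    video_id = mongo["emoticloud"]["interview_videos"].insert_one(video_doc).inserted_id

    # 2. Structured data: the label and a reference to the video go to MySQL.
    conn = mysql.connector.connect(host="localhost", user="emoticloud",
                                   password="***", database="emoticloud")
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO interview_labels (patient_id, question_id, video_ref, emotion)"
        " VALUES (%s, %s, %s, %s)",
        (patient_id, question_id, str(video_id), emotion_label),
    )
    conn.commit()
    cursor.close()
    conn.close()
    mongo.close()
```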
Figure 4 displays the different interfaces presented to the patient, who can create an account, sign an electronic consent and, once the account is activated, select a virtual interview for scheduled monitoring. The blue screen provides instructions to initiate the interview, enabling the patient to respond to the questionnaire administered by the virtual interviewer, Ethan. The interview system requires a camera and a microphone. Upon completion, the patient receives a notification confirming the successful submission of the interview.
Figure 4 also displays the interface the physician accesses when selecting the patient management option. Upon choosing "Diagnosis", a panel opens to facilitate patient monitoring. Within this panel, the physician can review recordings of the virtual interviews conducted by the chatbot "Ethan" and view the preliminary diagnosis produced by the intelligent module, gaining better context before entering the final diagnosis in the system.

4.2. Information System Implementation

The developed information system facilitates asynchronous interaction between patients and treating physicians. Following a management plan defined by the physician, patients autonomously complete diagnostic interviews within the system using a set of predefined questions. The system records the patients’ responses on video, enabling physicians to evaluate them and conduct a preliminary assessment through an intelligent module that identifies the patients’ specific emotions.

4.2.1. Design

Based on the established requirements and scope, we defined a cascading methodology for development. This approach involved defining the system architecture, designing and implementing the modules, integrating an intelligent emotion-identification model, and conducting validation tests on the integrated components.
We mention the main functionalities of the system below:
  • Access Security: This feature allows registered users to access their accounts using login credentials securely. It safeguards the privacy and protection of personal data and includes password recovery through the user’s registered email to maintain security.
  • Patient follow-up scheduling: This feature enables the physician to schedule the remote follow-up plan through the system, making it accessible for the patient once authorised by the platform administrator.
  • Video capture: This feature enables users to record videos expressing their feelings or sharing experiences in response to posed questions.
  • Emotional Evaluation: This feature enables the physician to request the intelligent model to analyse the patient’s emotions based on the recorded video.
  • Result Visualisation: Upon completing the interview or assessment, the system confirms the end of the recording. The emotion classification module then preprocesses and stores the videos for subsequent analysis.
We developed the server modules considering security practices and designed them for scalability to accommodate growth and adapt to the evolving requirements of the business model. The user management system supports registration, authentication, and account administration, with role-based permissions governing access. Cryptographic protocols are employed to safeguard the confidentiality and integrity of all data transmitted between the client and server.

4.2.2. Deployment

To implement the solution, we utilised a server (Asus, manufactured in China) with the following specifications to ensure proper system operation: a 3.0 GHz multi-core processor, 16 GB of RAM, 1 TB of storage, and a stable Internet connection. The software environment included the Linux operating system, Python 3.7 or higher, the Django framework, MySQL for the relational database, MongoDB for the non-relational database, and Python libraries such as pymysql, pymongo, moviepy, cryptography, ffmpeg, speechrecognition, mysql-connector-python, email, and hashlib.
We developed the relational database using MySQL, implementing tables with relationships and constraints according to the design. We also configured MongoDB by creating collections and defining indexes to optimise data retrieval. In the following, we describe the modules implemented in the solution:
  • User Authentication and Management Module: This module handles user login, registration, profile management, and password recovery. It interacts with the SQL database to store and retrieve user credentials, personal data, and access permissions.
  • Video Capture and Processing Module: This module enables users to record videos expressing their feelings or sharing experiences related to their mood. It utilises the MoviePy library for video processing and FFmpeg for format conversion. To ensure data security and compliance with personal data protection regulations, we implemented encryption using the Python cryptography library with the SHA-1 hashing function. Recorded videos are stored in the NoSQL database to leverage its capability for managing unstructured data.
  • Interview Module: Through a chat interface, users can complete interviews based on a predefined set of mood-assessment questions. We integrated an algorithm that uses the SpeechSynthesis API to read questions aloud, facilitating seamless interaction with the video capture and processing module.
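The conversion and protection steps of the video capture and processing module could be sketched as follows. This is an illustrative example under our own assumptions: MoviePy drives the FFmpeg re-encoding, and the encryption step uses the Fernet symmetric scheme from the cryptography library (the paper mentions the SHA-1 hash, which by itself does not encrypt); paths and key handling are hypothetical.

```python
from moviepy.editor import VideoFileClip   # wraps FFmpeg for format conversion
from cryptography.fernet import Fernet     # symmetric encryption (illustrative choice)


def convert_and_encrypt(raw_path: str, key: bytes) -> bytes:
    """Re-encode a recorded answer to MP4 and return its encrypted bytes."""
    # Re-encode the browser recording (e.g. WebM) to MP4 via FFmpeg.
    clip = VideoFileClip(raw_path)
    converted_path = raw_path.rsplit(".", 1)[0] + ".mp4"
    clip.write_videofile(converted_path, codec="libx264", audio_codec="aac")
    clip.close()

    # Encrypt the converted file before handing it to the NoSQL store.
    with open(converted_path, "rb") as f:
        return Fernet(key).encrypt(f.read())


# Usage sketch: generate a key once, keep it server-side, and pass the
# ciphertext to the storage routine of the data access layer.
# key = Fernet.generate_key()
# blob = convert_and_encrypt("/tmp/answer_q1.webm", key)
```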

4.3. Emotion Recognition

For the intelligent emotion analysis component, we integrated a model based on convolutional neural networks (CNNs) capable of identifying emotions from images and video. We integrated a compressed version of the model, guaranteeing data entry in the required formats and storing the processing results in a database for later viewing by the doctor.
The intelligent module consists of a classification model based on global facial features. Below, we present the database used and describe the classification model.

4.3.1. Dataset

In this study, we used the Extended Cohn-Kanade (CK+) database. This is a video-based, action-unit-coded database generated from the CK database of posed expressions, covering neutral, anger, disgust, fear, happiness, sadness, and surprise, with spontaneous emotions added [35,36].
We reorganised the CK+ database structure to ensure efficient operation and compatibility with previously created scripts. The resulting structure consists of a main folder with seven subfolders, each associated with one of the emotions to be classified; each subfolder contains the corresponding images in PNG format.
Although CK+ includes spontaneous expressions, we evaluated the resulting model with other available databases to verify generalisation, including the Japanese Female Facial Expression (JAFFE) database [37], FACES [38], and AffectNet. According to the authors in [39,40], there is concern regarding the fairness of intelligent models trained on databases drawn from a specific population, which may introduce biases related to phenotypic and gender subgroups. Based on these results, we decided to train the model with a mixed database composed of the AffectNet and RAF-ML datasets.
The Balanced AffectNet Dataset [41] is a class-balanced, standardised version of the Affect-fer composite dataset, optimised for deep learning in Facial Emotion Recognition (FER). Derived from AffectNet’s over one million annotated facial images, it supports both categorical and continuous emotion models. This resource enhances model performance and comparability, with baseline deep neural networks outperforming traditional approaches.
The Real-world Affective Faces Multi-Label (RAF-ML) [42] dataset contains 4908 diverse facial images with blended emotions, collected from the Internet under varied identities, poses, lighting, and occlusions. The dataset includes a six-dimensional expression distribution vector supporting multi-label emotion recognition in real-world conditions.

4.3.2. Learning Model Implementation

We implement a convolutional neural network (CNN) model, designed in Keras, that processes 640 × 490 × 1 grayscale images. It starts with a convolutional layer of 64 filters of size 3 × 3, which extracts high-level features, followed by batch normalisation, which reduces covariate shift and speeds up learning; a ReLU activation layer, which introduces sparsity and improves the generalisation of the network; a 2 × 2 max pooling layer, which retains core features while reducing redundancy; and a 20% dropout layer, which helps prevent overfitting. This pattern is repeated with a second block of 128 filters of size 5 × 5, a third block of 512 filters of size 3 × 3, and a fourth block of 512 filters of size 3 × 3, each followed by batch normalisation, ReLU activation, max pooling, and 20% dropout. After these convolutional blocks, two fully connected layers are added: one with 256 neurons and one with 512, with ReLU activation and dropout rates of 25% and 20%, respectively. The output layer is fully connected, with seven neurons and softmax activation for seven-class classification. We compile the model using the Adam optimiser with a learning rate of 0.0001 and the categorical cross-entropy loss function, with accuracy as the evaluation metric.
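The description above translates into the following Keras sketch (our transcription, using the TensorFlow Keras API with default padding; it is not the authors' original code). The commented training call reflects the batch size and epoch count reported in Section 4.5.2.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_fer_cnn(input_shape=(640, 490, 1), num_classes=7) -> keras.Model:
    """CNN for seven-class facial emotion classification (sketch of Section 4.3.2)."""
    model = keras.Sequential([keras.Input(shape=input_shape)])

    # Four convolutional blocks: Conv -> BatchNorm -> ReLU -> MaxPool -> Dropout.
    for filters, kernel in [(64, 3), (128, 5), (512, 3), (512, 3)]:
        model.add(layers.Conv2D(filters, (kernel, kernel)))
        model.add(layers.BatchNormalization())   # reduces covariate shift
        model.add(layers.Activation("relu"))
        model.add(layers.MaxPooling2D((2, 2)))   # keeps core features, cuts redundancy
        model.add(layers.Dropout(0.20))          # helps prevent overfitting

    # Classification head: two fully connected layers, then softmax output.
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.25))
    model.add(layers.Dense(512, activation="relu"))
    model.add(layers.Dropout(0.20))
    model.add(layers.Dense(num_classes, activation="softmax"))

    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# model = build_fer_cnn()
# model.fit(train_images, train_labels, batch_size=16, epochs=30,
#           validation_data=(val_images, val_labels))  # setup reported in Section 4.5.2
```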

4.4. Operation and Performance Tests

We perform operation and performance tests to establish the solution’s operating levels under normal and high load conditions.
We design and execute the solution’s operational tests in alignment with the established methodology, following the ISO/IEC 25010 standard to evaluate functionality, performance, usability, and security. We define the scope of each test and present the corresponding results.
We evaluate functional adequacy through automated load tests (pytest) simulating the complete "Patient" workflow—account creation, login, and a full nine-question interview with video upload. We consider the functionality correct only when the end-to-end process succeeds under five concurrency levels (one to five simultaneous users), verifying stability as load increases.
Performance evaluation consists of automated load tests designed to identify the system’s breaking point across five scenarios ranging from one to five concurrent users. In each case, the simulated users log in and complete a nine-question interview, generating nine video uploads per user. Server resources, including CPU, memory, disk, and network, are continuously monitored at one-second intervals throughout the tests to assess performance under increasing load.
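A resource-sampling loop of the kind described here could look as follows; this is a minimal sketch that assumes the psutil library (the paper does not name the monitoring tool) and summarises the samples into the average CPU, peak CPU, and total disk and network activity metrics defined in Section 3.

```python
import psutil  # assumption: psutil is used for sampling; the paper does not name a tool


def monitor_resources(duration_s: int = 60, interval_s: float = 1.0) -> dict:
    """Sample CPU, disk, and network once per interval and summarise the run."""
    cpu_samples = []
    disk_start, net_start = psutil.disk_io_counters(), psutil.net_io_counters()

    for _ in range(int(duration_s / interval_s)):
        # cpu_percent blocks for the interval, so this paces the sampling loop.
        cpu_samples.append(psutil.cpu_percent(interval=interval_s))

    disk_end, net_end = psutil.disk_io_counters(), psutil.net_io_counters()
    to_mb = lambda b: b / (1024 * 1024)
    return {
        "avg_cpu_percent": sum(cpu_samples) / len(cpu_samples),
        "peak_cpu_percent": max(cpu_samples),
        "total_disk_mb": to_mb((disk_end.read_bytes - disk_start.read_bytes)
                               + (disk_end.write_bytes - disk_start.write_bytes)),
        "total_network_mb": to_mb((net_end.bytes_sent - net_start.bytes_sent)
                                  + (net_end.bytes_recv - net_start.bytes_recv)),
    }
```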
We conduct usability tests with five patient-type users and three doctor-type users to measure recognisability, ease of learning, operability, protection against user errors, user interface aesthetics, and accessibility, using two questions for each aspect.
We perform the security evaluation using automated Python scripts that utilise the requests library to simulate HTTP requests and analyse server responses. We run three sets of tests, each focusing on a specific security sub-feature.
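As an example of this approach, the following sketch checks the authenticity sub-feature: it issues anonymous requests against protected pages and computes the authentication compliance rate. The host and endpoint paths are hypothetical; only the use of the requests library reflects the description above.

```python
import requests

BASE_URL = "https://emoticloud.example.org"                       # hypothetical host
PROTECTED_PATHS = ["/patients/", "/interviews/", "/diagnosis/"]   # hypothetical endpoints


def authentication_compliance_rate() -> float:
    """Share of protected pages that deny anonymous access (metric from Section 3)."""
    blocked = 0
    for path in PROTECTED_PATHS:
        # Anonymous request: no session cookie, redirects not followed.
        response = requests.get(BASE_URL + path, allow_redirects=False, timeout=10)
        # A protected page should redirect to the login form or return 401/403.
        if response.status_code in (301, 302, 401, 403):
            blocked += 1
    return 100.0 * blocked / len(PROTECTED_PATHS)


if __name__ == "__main__":
    print(f"Authentication compliance rate: {authentication_compliance_rate():.1f}%")
```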

4.5. Results

4.5.1. Operational Results

According to ISO/IEC 25010, Functional Adequacy measures the extent to which a system delivers the required functions to meet stated and implied needs under defined conditions. In this study, we calculate this metric as the average of the results of the functional completeness and functional correctness test. Table 2 presents the results of the functionality tests. Analysis of these results confirms that the web portal is functionally complete for the patient workflow, with all the required features for registration and interview execution fully implemented and operational. However, testing revealed a limitation in functional correctness under load: the application fails to maintain expected behaviour with more than three simultaneous users, resulting in incomplete interview processes when 27 concurrent video uploads occurred.
In Table 3, we can observe that the load test analysis precisely identifies the web portal’s operational capacity under stress.
The system maintains stable performance with 1–3 concurrent users, successfully processing and storing up to 27 videos while completing all interviews within acceptable response times despite high CPU usage. However, at four or more concurrent users, sustained CPU saturation at 100% leads to severe performance degradation, excessive timeouts, and the inability to complete interview flows, marking the system’s performance breaking point.
In Table 4, we can observe the usability testing results, which indicate good overall usability, with an average score of 7.87/10 (8.17 for patients and 7.57 for doctors). Patients report a significantly better experience, suggesting a more polished and intuitive workflow for their role, with operability as their strongest feature (9.0/10). Doctors rate the interface aesthetics highest (8.33/10), reflecting a positive perception of the platform’s visual design.
In Table 5, we can observe the results of security validation. Automated testing confirmed the application’s robustness in key security aspects. In authenticity testing, it correctly blocked access to all six protected URLs, redirecting anonymous users to the login page. Access control tests validated proper role-based privilege management, preventing patients from accessing interviews or altering their role to doctor. Confidentiality tests verified adequate protection of interview video pages, with all ten tested pages denying access to unauthenticated users.
After completing each set of tests on the defined characteristics, the results were normalised on a 0–100% scale to calculate a final weighted average. Based on the outcomes for functionality (80%), performance (60%), usability (78.7%), and security (100%), the system achieves an overall performance score of 79.68%.
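Assuming the four characteristics carry equal weights (the weights themselves are not stated), the reported overall score is reproduced exactly:

```latex
\text{Overall score} = \frac{80\% + 60\% + 78.7\% + 100\%}{4} = 79.675\% \approx 79.68\%
```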

4.5.2. Model Results

During training of the CNN with the CK+ dataset, we determined that a batch size larger than 16 exceeded the system’s computational capacity. Consequently, we fixed the batch size at 16 and trained the model over 30 epochs.
Figure 5 shows examples of classes of emotion images from the AffectNet dataset included in the intelligent model training process: anger, disgust, fear, happiness, neutral, sadness, and surprise.
Figure 5 also presents emotional heat maps that visually depict bodily sensations associated with different emotions. These maps typically employ a colour-coding scheme, using warm tones to indicate higher activation and cool tones to represent lower activation, offering an intuitive representation of how we experience various emotions.
Figure 6 presents the resulting confusion matrices for CNN CK+, CNN JAFFE, CNN FACES, and CNN AffectNet, corresponding to the validation of the model trained with the CK+ dataset, from which we derived the global performance metrics. For the CK+ dataset, the performance metrics are shown in Table 6.
The confusion matrix for CNN CK+ shows that the model achieves a high overall accuracy of 96.31%, with most predictions correctly matching the true emotion labels and only a few scattered errors. Neutral emerges as the primary source of confusion, as several other emotions, including anger, disgust, happiness, sadness, and surprise, are occasionally misclassified as neutral. Fear stands out as being correctly classified with no errors, while other classes exhibit only minor misclassifications. Overall, the model demonstrates strong performance in distinguishing emotions, with potential improvements focused on reducing confusion with the neutral category through enhanced feature extraction or better class balancing. Furthermore, Table 7 provides the classification report per-class, including precision, recall, F1-score, and support.
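Per-class figures of this kind can be produced directly from the validation predictions; the sketch below assumes scikit-learn (the paper does not name the evaluation tool) and uses the seven emotion labels of the CK+ setup.

```python
from sklearn.metrics import classification_report, confusion_matrix

EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]


def evaluate(y_true, y_pred):
    """y_true: ground-truth class indices; y_pred: argmax of the CNN softmax outputs."""
    print(confusion_matrix(y_true, y_pred))                        # basis of Figure 6
    print(classification_report(y_true, y_pred,
                                target_names=EMOTIONS, digits=4))  # Table 7 style
```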
The training and validation curves of the CNN CK+ model shown in Figure 7 indicate that the model achieved rapid convergence, with both loss values decreasing sharply during the initial epochs and accuracy surpassing 90% within the first six epochs. After approximately epoch 10, the training loss stabilised at a very low level, while the validation loss remained similarly low, aside from occasional spikes that correspond to brief drops in validation accuracy. These fluctuations suggest slight sensitivity to specific validation samples rather than systematic overfitting, as training and validation performance remained closely aligned throughout. At the end of training, the model maintained high accuracy (95–98% validation) and low loss across both datasets, demonstrating strong generalisation capability.
The authors in [43] present a comparison of the performance of different intelligent models with the CK+ dataset, which we can observe in Table 8, corroborating the good results with CNN models.
The results obtained for the validation of the model with the datasets JAFFE, FACES, and AffectNet are in Table 9. Overall, the performance results indicate that the CNN model achieves moderate accuracy across datasets, with substantial variability in class-wise precision, recall, and F1-scores. In the JAFFE dataset, performance is relatively balanced for certain emotions, such as happiness and surprise, but significantly lower for others, notably fear and anger, suggesting difficulty in recognising less represented or more nuanced expressions.
The FACES dataset reflects similar patterns, with perfect recall for anger and happiness, yet complete failure to detect fear and neutral, indicating class-specific biases and possible data imbalance effects. In AffectNet, the markedly lower overall accuracy and macro F1-score highlight the increased complexity of in-the-wild facial expression recognition, with inferior results for the sadness and neutral classes. These results collectively suggest that while the model can achieve high precision and recall for certain distinct expressions, its generalisation across all classes, especially in more diverse and unconstrained datasets, remains limited and requires further optimisation in feature representation, data balancing, and class-specific discrimination.
Figure 6 also presents the resulting confusion matrix for the CNN model trained and validated with the Mixed Dataset from AffectNet and RAF-ML from which we derived the global performance metrics in Table 10 and the specific emotions metrics in Table 11.
The results indicate that the model achieves an overall accuracy of 66.3%, with relatively balanced macro precision (0.6351), recall (0.6417), and F1-score (0.6223), suggesting moderate performance across all classes without extreme bias toward any single one. The model performs best for happiness, with high precision, recall, and F1-score, followed by neutral and sadness, which show balanced and reliable detection. Angry achieves moderate results, while surprise has very high precision but low recall, missing many true cases. Disgust and fear are the weakest classes, with notably low recall and, in the case of fear, low precision as well. These results indicate a strong performance for explicit expressions, but difficulty with subtle or less distinct emotions.

5. Discussion

Integrating the components and subjecting them to system performance tests enables us to assess their alignment with the operational model of the patient emotional-state monitoring process.
We can evaluate the performance of the proposed solution through functional testing of integrated front-end components, such as data acquisition and result display modules, in conjunction with backend processing and analysis modules.
By evaluating the different components of the system, we can define a weighted measurement metric to indicate whether the web application integrated with an intelligent emotion classification module responds to the established business strategy.
According to the initial design, the correct operation of the functionalities allows us to establish that the response levels of the solution meet expectations. However, based on end-user interactions, usability elements related to ease of interaction and font size, among other things, must be considered.
Performance tests allow us to determine whether the prototype solution works adequately to withstand future operational testing in a real-patient context.
Comparing results with different intelligent models and databases allows us to establish the need to address new approaches in the implementation processes of intelligent models for the detection of mental illnesses.
A scalable development allows us to incorporate new functionalities in line with future needs identified by end users, as well as additional modules related to diagnostic processes based on intelligent models, providing elements for comparison between different approaches to disease detection and enabling prediagnosis validation.

6. Conclusions

The innovation and primary contribution of this research lie in the development of a prototype-level tool that ensures compliance with the LEAD standard for longitudinal patient analysis. It uses technologies that make patient care and monitoring accessible, even in geographically dispersed areas, provided users have an internet connection supporting multimedia communication. The following additional contributions accompany this work.
The tool includes an intelligent module based on an emotion identification model with a performance comparable to that of the state of the art.
The prototype tool offers the possibility of monitoring mental health, allowing the participation of patients and medical staff, within the framework of a series of follow-up appointments that enable the systematisation of diagnostic interviews at defined times.
The interviews can be viewed and analysed by the doctor and by the intelligent module, allowing the system to generate a preliminary emotional diagnosis that provides the doctor with additional information and context to assess the patient’s state of health, enabling earlier interventions by identifying emotional changes that indicate the development or worsening of mental disorders such as depression, anxiety, or bipolar disorder.
The contribution of this research is grounded in an interdisciplinary approach that extends beyond developing a computational tool or intelligent model. It envisions an integrated technological solution that addresses contextual needs while ensuring the creation of a robust, efficient, and secure software system.
The solution incorporates an intelligent module with state-of-the-art performance, enabling the integration of advanced artificial intelligence developments. This approach allows the system to deliver practical, real-world solutions tailored to specific application contexts.
The application offers a solid and flexible structure, with a polyglot microservice architecture that allows storing structured data using MySQL and unstructured data using MongoDB.
We ensure software quality through the SDLC software development cycle, which includes functional, performance, and integration testing.
We define an evaluation metric based on ISO/IEC 25010 quality characteristics—functional suitability, performance efficiency, usability, and security—to assess overall system performance, obtaining 79.68%, with usability (78.7%), security (100%), performance (60%), and functionality (80%). We validate the generated solution at the operational and component integration levels, and the intelligent model shows a loss of 0.1378 and an accuracy of 96%. We also identify future improvements for the application and the intelligent model.
This solution provides a tool for accessing healthcare services to monitor patients’ emotional states remotely. Its adoption will reduce the healthcare gap in rural and dispersed areas, allowing primary care personnel to expand their care capabilities.
Integrating information systems and cloud computing with artificial intelligence provides an innovative solution for monitoring patient treatments, allowing for their clinical evaluation.
Future work could include improving components of the solution at the encryption level, chatbox, intelligent models with a higher level of generalisation, and the identification and integration of a solution for text generation.

Author Contributions

Conceptualisation, methodology, software, validation, formal analysis, and investigation, A.-M.L.-E. and S.L.-F.; data curation, writing—original draft preparation, and project administration, A.-M.L.-E.; resources, funding acquisition, and supervision, J.B.-G. and F.D.-L.-P.; writing—review and editing, S.L.-F., J.B.-G., and F.D.-L.-P. All authors have read and agreed to the published version of the manuscript.

Funding

This work is based on studies and research carried out within a PhD study, with the collaboration of the Research Group Nyquist of Universidad Tecnológica de Pereira in Colombia, and Research Group Bisite of Universidad de Salamanca in Spain.

Institutional Review Board Statement

This study is part of the project “Model For The Automatic Diagnosis Of Depression From Audio And Video With Deep Learning,” which was validated by the PhD in Engineering curricular committee with the support of external academic peers who reviewed the subject matter and scope and recommended approval of the PhD thesis project.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study during validation of the solution’s functionalities.

Data Availability Statement

Data from interviews will be made available on request for research and experimental purposes after signing an agreement. Data for the AI model should be requested from https://www.kaggle.com/datasets/davilsena/ckdataset, after signing an agreement that governs its use and returning the completed form. Accessed on 12 March 2024.

Acknowledgments

This research is part of the International Chair Project on Trustworthy Artificial Intelligence and Demographic Challenge within the National Strategy for Artificial Intelligence (ENIA), in the framework of the European Recovery, Transformation and Resilience Plan (Reference: TSI-100933-2023-0001). This project is funded by the Secretary of State for Digitalisation and Artificial Intelligence and by the European Union (NextGenerationEU). We conducted this research with the support of the Technological University of Pereira, through the Nyquist telecommunications research group and the engineering doctoral program of the Faculty of Engineering. We also had the support of the Bisite research group of the University of Salamanca and the accompaniment of psychologist Miguel Ángel Gómez Bermeo, coordinator of the psychology program of the Universidad Cooperativa de Colombia, Cartago campus, who contributed his experience by providing feedback on the final solution as a possible end user of the results generated in a future pilot test.

Conflicts of Interest

The authors declare no conflicts of interest that might have influenced the work presented in this paper.

Abbreviations

The following abbreviations are used in this manuscript:
FER   Facial Emotion Recognition
SDLC  Software Development Life Cycle
ISO   International Organization for Standardization
CNN   Convolutional Neural Network
PHC   Primary Health Care
WHO   World Health Organisation
OECD  Organisation for Economic Co-operation and Development
ESM   Experience Sampling Methods

References

  1. World Health Organization; United Nations Children’s Fund (UNICEF). A Vision for Primary Health Care in the 21st Century: Towards Universal Health Coverage and the Sustainable Development Goals; Technical Documents; World Health Organization: Geneva, Switzerland, 2018. [Google Scholar]
  2. Ministerio de Salud y Protección Social. Plan Decenal de Salud Pública PDSP 2022–2031; 2022. Available online: https://www.minsalud.gov.co/plandecenal/Paginas/PDSP-2022-2031.aspx (accessed on 19 June 2024).
  3. Moitra, M.; Santomauro, D.; Collins, P.Y.; Vos, T.; Whiteford, H.; Saxena, S.; Ferrari, A.J. The global gap in treatment coverage for major depressive disorder in 84 countries from 2000–2019: A systematic review and Bayesian meta-regression analysis. PLoS Med. 2022, 19, e1003901. [Google Scholar] [CrossRef]
  4. World Health Organization. WHO Guideline on Health Workforce Development, Attraction, Recruitment and Retention in Rural and Remote Areas: A Summary; World Health Organization: Geneva, Switzerland, 2021; p. 5. [Google Scholar]
  5. Organización Mundial de la Salud. Depresión; Organización Mundial de la Salud: Geneva, Switzerland, 2023; Available online: https://www.who.int/es/news-room/fact-sheets/detail/depression (accessed on 19 June 2024).
  6. Organización Mundial de la Salud. Informe Mundial Sobre Salud Mental: Transformar la Salud Mental Para Todos. Panorama General [World Mental Health Report: Transforming Mental Health for all. Executive Summary]; Organización Mundial de la Salud: Geneva, Switzerland, 2022. [Google Scholar]
  7. Bryce, B.A. Rural Proofing; Organisation for Economic Co-operation and Development: Paris, France, 2024. [Google Scholar] [CrossRef]
  8. Myin-Germeys, I.; Kuppens, P. The Open Handbook of Sampling Methodology: A Step-by-Step Guide to Designing, Conducting, and Analyzing ESM Studies; The Center for Research on Experience Sampling and Ambulatory Methods Leuven: Leuven, Belgium, 2021. [Google Scholar]
  9. Csíkszentmihályi, M.; Larson, R.W. Validity and Reliability of the Experience-Sampling Method. J. Nerv. Ment. Dis. 1987, 175, 526–536. [Google Scholar] [CrossRef] [PubMed]
  10. McDuff, D.; El Kaliouby, R.; Picard, R.W. Affective sensing: Analyzing facial expressions and their application to cognitive affective states. IEEE Trans. Affect. Comput. 2016, 6, 145–160. [Google Scholar]
  11. Ekman, P. Universals and cultural differences in facial expressions of emotion. In Proceedings Nebraska Symposium on Motivation; University of Nebraska Press: Lincoln, NE, USA, 1971. [Google Scholar]
  12. Mehrabian, A. Communication without words. In Communication Theory, 2nd ed.; Routledge: London, UK, 2008. [Google Scholar]
  13. Rajan, D.; Rouleau, K.; Winkelmann, J.; Kringos, D.; Jakab, M.; Khalid, F. (Eds.) Implementing the Primary Health Care Approach: A Primer; Global Report on Primary Health Care; World Health Organization: Geneva, Switzerland, 2024. [Google Scholar]
  14. Seiferth, C.; Vogel, L.; Aas, B.; Brandhorst, I.; Carlbring, P.; Conzelmann, A.; Esfandiari, N.; Finkbeiner, M.; Hollmann, K.; Lautenbache, H.; et al. How to e-mental health: A guideline for researchers and practitioners using digital technology in the context of mental health. Nat. Ment. Health 2023, 1, 542–554. [Google Scholar] [CrossRef]
  15. d’Angelis, O.; Di Biase, L.; Vollero, L.; Merone, M. IoT architecture for continuous long term monitoring: Parkinson’s Disease case study. Internet Things 2022, 20, 100614. [Google Scholar] [CrossRef]
  16. Ghosh, N.; Banerjee, I. IoT-based freezing of gait detection using grey relational analysis. Internet Things 2021, 13, 100068. [Google Scholar] [CrossRef]
  17. Migliorelli, L.; Berardini, D.; Cela, K.; Coccia, M.; Villani, L.; Frontoni, E.; Moccia, S. A store-and-forward cloud-based telemonitoring system for automatic assessing dysarthria evolution in neurological diseases from video-recording analysis. Comput. Biol. Med. 2023, 163, 107194. [Google Scholar] [CrossRef]
  18. Bandini, A.; Rezaei, S.; Guarín, D.L.; Kulkarni, M.; Lim, D.; Boulos, M.I.; Zinman, L.; Yunusova, Y.; Taati, B. A new dataset for facial motion analysis in individuals with neurological disorders. IEEE J. Biomed. Health Inform. 2020, 25, 1111–1119. [Google Scholar] [CrossRef]
  19. Zhang, T.; Liu, M.; Yuan, T.; Al-Nabhan, N. Emotion-Aware and Intelligent Internet of Medical Things Toward Emotion Recognition During COVID-19 Pandemic. IEEE Internet Things J. 2020, 8, 16002–16013. [Google Scholar] [CrossRef]
  20. Sharma, S.; Kumar, K.; Singh, N. D-FES: Deep facial expression recognition system. In Proceedings of the 2017 Conference on Information and Communication Technology (CICT), Gwalior, India, 3–5 November 2017; pp. 1–6. [Google Scholar]
  21. Zhu, J.; Wang, B.; Sun, W.; Dai, J. Facial Expression Recognition Video Analysis System based on Facial Action Units: A Feasible Engineering Implementation Scheme. In Proceedings of the 2020 13th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 12–13 December 2020; pp. 238–243. [Google Scholar] [CrossRef]
  22. Muhammad, G.; Hossain, M.S. Emotion Recognition for Cognitive Edge Computing Using Deep Learning. IEEE Internet Things J. 2021, 8, 16894–16901. [Google Scholar] [CrossRef]
  23. Sălăgean, G.L.; Leba, M.; Ionica, A.C. Leveraging Symmetry and Addressing Asymmetry Challenges for Improved Convolutional Neural Network-Based Facial Emotion Recognition. Symmetry 2025, 17, 397. [Google Scholar] [CrossRef]
  24. Meena, G.; Mohbey, K.K.; Indian, A.; Khan, M.Z.; Kumar, S. Identifying emotions from facial expressions using a deep convolutional neural network-based approach. Multimed. Tools Appl. 2023, 83, 15711–15732. [Google Scholar] [CrossRef]
  25. Ajlouni, N.; Özyavaş, A.; Ajlouni, F.; Takaoğlu, F.; Takaoğlu, M. Enhanced hybrid facial emotion detection & classification. Frankl. Open 2025, 10, 100200. [Google Scholar] [CrossRef]
  26. Begnum, M.E.N. Common approaches to universal design of IT. J. Des. Res. 2022, 20, 243–255. [Google Scholar] [CrossRef]
  27. Wang, Y.M.; Elhag, T.M. Fuzzy TOPSIS method based on alpha level sets with an application to bridge risk assessment. Expert Syst. Appl. 2006, 31, 309–319. [Google Scholar] [CrossRef]
  28. Kulesza, R.; de Sousa, M.F.; de Araújo, M.L.M.; de Araújo, C.P.; Filho, A.M. Evolution of web systems architectures: A roadmap. In Special Topics in Multimedia, IoT and Web Technologies; Springer: Cham, Switzerland, 2020; pp. 3–21. [Google Scholar]
  29. Kazanavičius, J.; Mažeika, D.; Kalibatienė, D. An approach to migrate a monolith database into multi-model polyglot persistence based on microservice architecture: A case study for mainframe database. Appl. Sci. 2022, 12, 6189. [Google Scholar] [CrossRef]
  30. Khine, P.P.; Wang, Z. A review of polyglot persistence in the big data world. Information 2019, 10, 141. [Google Scholar] [CrossRef]
  31. Hossain, M. Software Development Life Cycle (SDLC) Methodologies for Information Systems Project Management. Int. J. Multidiscip. Res. 2023, 5, 1–36. [Google Scholar] [CrossRef]
  32. Zhang, F.; Zhang, B.; Guo, S.; Zhang, X. MFCC-CNN: A patient-independent seizure prediction model. Neurol. Sci. 2024, 45, 5897–5908. [Google Scholar] [CrossRef]
  33. Sonmez, E.; Kacar, S.; Uzun, S. A new deep learning model combining CNN for engine fault diagnosis. J. Braz. Soc. Mech. Sci. Eng. 2023, 45, 644. [Google Scholar] [CrossRef]
  34. ISO/IEC 25002:2024(en); Systems and Software Engineering—Systems and Software Quality Requirements and Evaluation (SQuaRE)—Quality Model Overview and Usage. International Organisation for Standardisation and International Electrotechnical Commission: Geneva, Switzerland, 2024.
  35. Kanade, T.; Cohn, J.F.; Tian, Y. Comprehensive database for facial expression analysis. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG’00), Grenoble, France, 28–30 March 2000; pp. 46–53. [Google Scholar]
  36. Lucey, P.; Cohn, J.F.; Kanade, T.; Saragih, J.; Ambadar, Z.; Matthews, I. The Extended Cohn-Kanade Dataset (CK+): A complete expression dataset for action unit and emotion-specified expression. In Proceedings of the Third International Workshop on CVPR for Human Communicative Behavior Analysis (CVPR4HB 2010), San Francisco, CA, USA, 13–18 June 2010; pp. 94–101. [Google Scholar]
  37. Lyons, M.J.; Akamatsu, S.; Kamachi, M.; Gyoba, J. Coding Facial Expressions with Gabor Wavelets. In Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 14–16 April 1998; pp. 200–205. [Google Scholar]
  38. Ebner, N.C.; Riediger, M.; Lindenberger, U. FACES—A database of facial expressions in young, middle-aged, and older women and men: Development and validation. Behav. Res. Methods 2010, 42, 351–362. [Google Scholar] [CrossRef]
  39. Buolamwini, J.; Gebru, T. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency, New York, NY, USA, 23–24 February 2018; Friedler, S.A., Wilson, C., Eds.; PMLR, Proceedings of Machine Learning Research; Volume 81, pp. 77–91. [Google Scholar]
  40. Datta, A.; Swamidass, S. Fair-Net: A Network Architecture for Reducing Performance Disparity between Identifiable Sub-populations. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence. SCITEPRESS-Science and Technology Publications, Virtual Event, 3–5 February 2022. [Google Scholar] [CrossRef]
  41. Mollahosseini, A.; Hasani, B.; Mahoor, M.H. AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild. IEEE Trans. Affect. Comput. 2019, 10, 18–31. [Google Scholar] [CrossRef]
  42. Li, S.; Deng, W. Blended Emotion in-the-Wild: Multi-label Facial Expression Recognition Using Crowdsourced Annotations and Deep Locality Feature Learning. Int. J. Comput. Vis. 2019, 127, 884–906. [Google Scholar] [CrossRef]
  43. Dada, E.G.; Oyewola, D.O.; Joseph, S.B.; Emebo, O.; Oluwagbemi, O.O. Facial Emotion Recognition and Classification Using the Convolutional Neural Network-10 (CNN-10). Appl. Comput. Intell. Soft Comput. 2023, 2023, 2457898. [Google Scholar] [CrossRef]
Figure 1. Methodology steps for the project.
Figure 2. Physical deployment and use case diagrams.
Figure 3. System’s operating model.
Figure 4. Patient and doctor display screens.
Figure 5. Emotion sample images from the AffectNet dataset [41].
Figure 6. Confusion matrices for CNN CK+, CNN JAFFE, CNN FACES, CNN AffectNet, and CNN AffectNet+RAF.
Figure 7. Comparison of accuracy and loss during training and validation for CNN CK+.
Table 1. Summary of studies on technological solutions in health *.

| Reference | Theme | Method | Objective | Limitations | Proposal |
|---|---|---|---|---|---|
| [1,6,13] | Regulations | Technical report and book | Goals and benefits of e-health for universal health coverage | Requires commitment | Broaden coverage solution |
| [14] | Experts | Questionnaire | Benefits of e-health | Represents a desire or need | Bridge the care gap |
| [15] | Remote monitoring | Cloud-fog-edge architecture | Telemedicine solution for symptom monitoring | Does not include an operating model or access to data by health personnel | Participation of patients and medical staff |
| [16] | Remote monitoring | IoT architecture for FoG detection | Gait-freezing detection with leg and hip sensors | Only considers the classification model | Cloud system for monitoring, data collection, and visualisation of results |
| [17] | Remote monitoring | Cloud architecture with CNN | Evaluate dysarthria in neurological diseases | Does not analyse the emotional state of the patient | Emotional tracking with a CNN-based module |
| [19] | Remote monitoring | Emotion-aware IoMT | Remote monitoring and decision making based on emotion | Does not follow periodic treatment processes | Structured remote follow-up for video-based emotional analysis |
| [20] | FER | RNN model for emotion classification | Real-time emotion classification | Only considers lip structure | As recommended, use of a CNN to improve accuracy |
| [21] | FER | Cloud computing | Implement Hikvision facial action units | Emotional identification through third parties | Intelligent CNN-based classifier to ensure independence |
| [22] | FER | Edge and cloud computing with a CNN model | Edge computing system with intelligent capabilities and storage in the cloud | The operating model is not specifically applied to the medical field | Validation of the use of a CNN for emotion classification, with both processing and storage performed entirely in the cloud for higher levels of security |
| [23,24,25] | FER | Research | Improve the performance of intelligent models | New approaches | Performance benchmarking and future research directions |
* The table presents the progression from the identified need for technological solutions to support primary healthcare processes—particularly in mental health—to the technologies and AI advancements that enable the integration of comprehensive solutions for remote patient care and monitoring.
Table 2. Functional adequacy metrics.

| Sub-Characteristic | Metric | Result |
|---|---|---|
| Functional Completeness | Flow Feature Coverage | 100.00% |
| Functional Correctness | Functional Success Rate Under Load | 60.00% |
| Functional Suitability (Average) | | 80.00% |
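A quick consistency check, assuming the average in Table 2 is the unweighted mean of the two sub-characteristic results (a reading of the table, not a formula stated elsewhere):

```latex
\text{Functional Suitability} = \frac{100.00\% + 60.00\%}{2} = 80.00\%
```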
Table 3. Performance metrics by concurrent users.

| Users | Success Rate (%) | Execution Time (s) | Average CPU (%) | Max CPU (%) | Average Memory (%) | Total Disk Activity (MB) | Total Network Activity (MB) |
|---|---|---|---|---|---|---|---|
| 1 | 100.00 | 68.60 | 35.80 | 97.50 | 28.30 | 15.1 | 11.9 |
| 2 | 100.00 | 68.15 | 46.10 | 98.20 | 28.90 | 26.6 | 11.8 |
| 3 | 100.00 | 87.37 | 85.40 | 100.00 | 29.90 | 15.1 | 11.9 |
| 4 | 0.00 | 251.40 | 80.10 | 100.00 | 35.10 | 26.6 | 11.8 |
| 5 | 0.00 | 256.07 | 76.50 | 100.00 | 34.50 | 45.9 | 18.2 |
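As an illustrative sketch only, and not the test harness used in this study, concurrency figures of this kind can be gathered by launching N simultaneous client flows and recording the success rate and total execution time. The endpoint URL and success criterion below are hypothetical, and CPU, memory, disk, and network usage would be sampled separately with an operating-system monitor.

```python
# Minimal concurrency-check sketch (illustrative; not the project's actual harness).
import time
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "https://example-emoticloud-api.invalid/analysis"  # hypothetical endpoint
TIMEOUT_S = 300  # generous per-request timeout for video analysis


def run_flow(user_id: int) -> bool:
    """Simulate one user submitting a request; count HTTP 200 as success."""
    try:
        response = requests.post(BASE_URL, json={"user": user_id}, timeout=TIMEOUT_S)
        return response.status_code == 200
    except requests.RequestException:
        return False


def measure(concurrent_users: int) -> tuple[float, float]:
    """Return (success rate in %, total execution time in seconds) for N parallel flows."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        results = list(pool.map(run_flow, range(concurrent_users)))
    elapsed = time.perf_counter() - start
    return 100.0 * sum(results) / len(results), elapsed


if __name__ == "__main__":
    for users in range(1, 6):
        rate, seconds = measure(users)
        print(f"{users} users: success {rate:.2f}%, time {seconds:.2f} s")
```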
Table 4. Summary of usability normalisation scores by role.

| Sub-Characteristic | Doctor (Nk, 0–10) | Patient (Nk, 0–10) |
|---|---|---|
| Appropriateness Recognisability | 7.50 | 8.50 |
| Learnability | 7.08 | 8.75 |
| Operability | 7.50 | 9.00 |
| User Error Protection | 7.08 | 7.75 |
| User Interface Aesthetics | 8.33 | 7.25 |
| Accessibility | 7.92 | 7.75 |
| Role Usability | 7.57 | 8.17 |

Global system usability (scale 0 to 10): 7.87.
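The aggregation in Table 4 can be checked by hand, assuming each role score is the unweighted mean of its six sub-characteristic scores and the global score is the mean of the two role scores:

```latex
U_{\mathrm{doctor}} = \tfrac{1}{6}(7.50 + 7.08 + 7.50 + 7.08 + 8.33 + 7.92) \approx 7.57,\qquad
U_{\mathrm{patient}} = \tfrac{1}{6}(8.50 + 8.75 + 9.00 + 7.75 + 7.25 + 7.75) \approx 8.17,\qquad
U_{\mathrm{global}} = \tfrac{1}{2}(7.57 + 8.17) = 7.87
```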
Table 5. Summary of security metrics.

| Sub-Characteristic (ISO/IEC 25010) | Test Cases | Proposed Metric | Obtained Result | Ideal Goal |
|---|---|---|---|---|
| Authenticity | 6 | Authentication Compliance Rate | 100% | 100% |
| Access Control | 3 | Role-Based Access Control Efficacy | 100% | 100% |
| Confidentiality | 10 | Resource Protection Coverage | 100% | 100% |
| Global Security (Average) | | | 100% | |
Table 6. Global metrics CNN CK+ (Validation/Test).

| Metric | Value |
|---|---|
| Accuracy | 0.9631 |
| Macro Precision | 0.9684 |
| Macro Recall | 0.9684 |
| Macro F1-score | 0.9682 |
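For reference, the macro-averaged figures in Table 6 follow the standard definition of unweighted means over the C = 7 emotion classes, with per-class precision P_c, recall R_c, and F1 as reported in Table 7:

```latex
\text{Macro Precision} = \frac{1}{C}\sum_{c=1}^{C} P_c,\qquad
\text{Macro Recall} = \frac{1}{C}\sum_{c=1}^{C} R_c,\qquad
\text{Macro F1} = \frac{1}{C}\sum_{c=1}^{C} \frac{2\,P_c R_c}{P_c + R_c}
```

For example, averaging the seven per-class precisions in Table 7 gives (1.0000 + 0.8976 + 0.9857 + 1.0000 + 0.9363 + 0.9660 + 0.9933)/7 ≈ 0.9684, matching the macro precision in Table 6.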
Table 7. Classification report CNN CK+ (per class).

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| anger | 1.0000 | 0.9730 | 0.9863 | 111 |
| disgust | 0.8976 | 0.9421 | 0.9194 | 121 |
| fear | 0.9857 | 1.0000 | 0.9928 | 69 |
| happiness | 1.0000 | 0.9854 | 0.9926 | 205 |
| neutral | 0.9363 | 0.9441 | 0.9402 | 358 |
| sadness | 0.9660 | 0.9726 | 0.9693 | 146 |
| surprise | 0.9933 | 0.9613 | 0.9770 | 155 |
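Per-class and macro-averaged figures of this kind can be reproduced from true and predicted labels with scikit-learn. The snippet below is a generic sketch: the label arrays are synthetic placeholders, not the evaluation script or data used in this study.

```python
# Generic sketch for a per-class report and macro averages with scikit-learn.
# y_true and y_pred below are toy arrays for illustration only.
import numpy as np
from sklearn.metrics import classification_report, precision_recall_fscore_support

CLASSES = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]

rng = np.random.default_rng(0)
y_true = rng.integers(0, len(CLASSES), size=500)
# Corrupt ~10% of the labels to simulate classifier errors.
y_pred = np.where(rng.random(500) < 0.9, y_true, rng.integers(0, len(CLASSES), size=500))

# Per-class precision/recall/F1/support, analogous to Table 7.
print(classification_report(y_true, y_pred,
                            labels=np.arange(len(CLASSES)),
                            target_names=CLASSES, digits=4))

# Macro-averaged summary, analogous to Table 6.
macro_p, macro_r, macro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"Macro Precision {macro_p:.4f}  Macro Recall {macro_r:.4f}  Macro F1 {macro_f1:.4f}")
```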
Table 8. Performance comparison for CK+ [43].

| Metric | VGG19 | InceptionV3 | ViT | CNN-10 |
|---|---|---|---|---|
| Accuracy | 0.77 | 0.76 | 0.52 | 0.99 |
| Macro Precision | 0.76 | 0.77 | 0.54 | 1.00 |
| Macro Recall | 0.80 | 0.56 | 0.57 | 1.00 |
| Macro F1-score | 0.80 | 0.58 | 0.59 | 1.00 |
Table 9. Global metrics CNN (JAFFE, FACES, AffectNet datasets).

| Metric | JAFFE | FACES | AffectNet |
|---|---|---|---|
| Accuracy | 0.4648 | 0.4722 | 0.2606 |
| Macro Precision | 0.6838 | 0.4087 | 0.2554 |
| Macro Recall | 0.4658 | 0.4722 | 0.2975 |
| Macro F1-score | 0.4625 | 0.3944 | 0.2494 |
Table 10. Global metrics CNN Mixed Dataset (AffectNet-RAF).

| Metric | Value |
|---|---|
| Accuracy | 0.6630 |
| Macro Precision | 0.6351 |
| Macro Recall | 0.6417 |
| Macro F1-score | 0.6223 |
Table 11. Classification report for CNN Mixed Dataset (AffectNet-RAF).

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| anger | 0.5315 | 0.6282 | 0.5758 | 1880 |
| disgust | 0.6692 | 0.2915 | 0.4061 | 2720 |
| fear | 0.4695 | 0.7008 | 0.5623 | 1738 |
| happiness | 0.9261 | 0.8701 | 0.8973 | 3889 |
| neutral | 0.7274 | 0.8862 | 0.7990 | 3048 |
| sadness | 0.6176 | 0.6722 | 0.6438 | 2062 |
| surprise | 0.5046 | 0.4429 | 0.4717 | 2249 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

