1. Introduction
Informatics systems are not immune to user dissatisfaction: complaints frequently stem from usability impediments, system availability issues, accessibility barriers, and deficiencies in information clarity and presentation. Historically, addressing these grievances has relied largely on manual categorization combined with keyword-based filtering, approaches that are resource-intensive, prone to human error, and limited in their ability to discern nuanced patterns and latent relationships within complaint data [
1]. In recent years, the burgeoning field of artificial intelligence has garnered significant attention within the realm of e-government applications, giving rise to innovative solutions such as AI-powered chatbots designed to enhance citizen engagement, smart decision support systems aimed at optimizing resource allocation, and predictive analytics tools engineered to forecast citizen needs and proactively address potential issues [
2]. However, despite the growing recognition of AI’s transformative potential within the public sector, a notable gap persists in the application of AI techniques to the analysis of log file patterns for the purpose of preemptively identifying and mitigating the underlying causes of user dissatisfaction within informatics systems. The digitization of data is crucial for AI adoption in public administration, enabling functions like administrative burden reduction and cost savings by moving away from paper records [
3].
Proactively addressing citizen dissatisfaction before it becomes a formal complaint can improve public service delivery, enhance administrative efficiency, and build trust in government. AI can automate functions like benefits processing and compliance, and use predictive analytics to optimize resource allocation and workflow [
4]. By leveraging machine learning algorithms to analyze vast quantities of log file data, it becomes possible to identify subtle patterns and anomalies that may indicate impending system failures, usability bottlenecks, or accessibility challenges [
5]. Such preemptive insights can then trigger automated remediation processes, such as server load balancing, software patching, or the dynamic adjustment of user interface elements, thereby minimizing the likelihood of user complaints and ensuring a seamless user experience. AI is also useful for recognizing patterns in security-related activities and in the application of data security rules [
6]. These AI-driven interventions can also be personalized to the specific needs and preferences of individual users, further enhancing satisfaction and promoting greater adoption of e-government services. Furthermore, AI has been shown to strengthen governmental services through effective data utilization, the prediction of public needs, and data-driven encryption techniques that support careful access decisions, ensuring that personal data is accessible only to authorized parties and preventing privacy breaches and data abuse [
7].
The ethical considerations surrounding the deployment of AI in e-government applications, particularly those involving the analysis of user data, cannot be overlooked [
8,
9]. Moreover, it is imperative to establish robust data governance frameworks that prioritize transparency, accountability, and user control over data collection and usage practices. These frameworks should incorporate mechanisms for obtaining informed consent from users, ensuring data anonymization and pseudonymization techniques are employed to protect user privacy, and providing users with the ability to access, rectify, and erase their personal data [
10,
11].
Additionally, it is crucial to address potential biases in AI algorithms that could lead to discriminatory outcomes or unfair treatment of certain user groups [
12]. Algorithmic bias can arise when models are trained on biased data, producing unfair outcomes with serious ethical implications. AI systems should therefore be designed and trained on diverse, representative datasets, and their performance should be rigorously evaluated across demographic groups to identify and mitigate potential biases. Moreover, the explainability and interpretability of AI decision-making processes are essential to fostering trust and ensuring accountability.
While the Cross-Industry Standard Process for Data Mining (CRISP-DM) has been widely adopted across various domains, including public sector applications such as labor inspection data analysis [
13], and automated complaint classification has been explored in e-commerce contexts [
14], the integration of semantic clustering using sentence embeddings within the CRISP-DM framework for public sector complaint management remains underexplored. Our study distinguishes itself by employing BERT-based sentence embeddings to semantically cluster complaint descriptions, enabling the identification of latent issue categories beyond traditional keyword-based methods. Furthermore, we operationalize these insights through a real-time dashboard, facilitating proactive monitoring and decision support. To our knowledge, this is one of the first studies to combine semantic clustering, advanced classification techniques, and real-time operational integration within the context of public sector informatics complaint management.
To address these challenges, this paper presents an AI-driven approach that leverages both data mining and text mining techniques to standardize and classify informatics complaints in a scalable and semantically aware manner. A key contribution of this work lies in the practical integration of BERT-based sentence embeddings and semantic clustering within the CRISP-DM framework to support complaint management in a real-world public administration context. While the core components—BERT embeddings, K-Means clustering, and Random Forest classification—are established methods, their combined application to unstructured informatics complaints enables the harmonization of varied user expressions referring to similar technical problems. More importantly, the framework translates these insights into a real-time dashboard that supports proactive remediation and operational decision-making. This end-to-end implementation demonstrates how off-the-shelf components can be orchestrated effectively to address long-standing bottlenecks in digital service delivery, offering a replicable model for other public sector settings.
Unlike previous work, which often focuses separately on classification or clustering, our approach offers a full end-to-end complaint management framework: from semantic understanding to real-time monitoring and proactive remediation, validated with real-world data from a public sector agency. This study thus contributes not only technically, but also operationally, to the digital transformation of public service delivery.
The main contributions of this work are threefold: (1) the development of a methodology for harmonizing complaint topics through natural language processing and pattern recognition; (2) the deployment of a machine learning pipeline that transforms raw interaction logs into actionable insights; and (3) the evaluation of the framework through a real-world case study conducted in a local Portuguese government agency, focusing on the automation of error handling in equipment data import processes. The findings highlight the potential for proactive digital service management in public administration.
Figure 1 shows the overall concept of this work.
The concept of proactive complaint management refers to the ability of a system to anticipate, detect, and resolve issues before they escalate into formal user complaints. In contrast to traditional reactive approaches—where user reports trigger support interventions—proactive systems leverage predictive analytics and semantic pattern recognition to identify anomalies, recurring issues, or latent user dissatisfaction embedded in operational data such as logs and free-text submissions.
In the context of public sector informatics, proactivity is particularly valuable for improving service quality, minimizing user frustration, and reducing administrative overhead. Prior studies have emphasized the need for early-warning systems in digital governance to detect usability bottlenecks, performance anomalies, or systemic failures in real time [
15,
16].
Operationally, our framework enables proactivity by clustering semantically similar complaint narratives, identifying unusual patterns through anomaly detection, and triggering alerts when certain issue categories exceed predefined thresholds. For example, if complaints relating to software slowness spike within a specific department, the system not only classifies these but flags them for intervention before users escalate the issue to IT support. This transforms complaint management from a passive recording function to an active monitoring and remediation mechanism.
Thus, we define proactivity not just as an architectural feature, but as a measurable outcome: the ability to reduce repeated complaint types, accelerate issue resolution, and mitigate problems at an earlier stage of the user experience lifecycle.
The remainder of this paper is structured as follows.
Section 2 reviews related work on AI applications in public sector complaint management.
Section 3 details the methodological approach, following the CRISP-DM framework.
Section 4 discusses the implementation and evaluation of the system in the local government case.
Section 5 reflects on operational, technical, and ethical implications. Finally,
Section 6 presents the conclusions and outlines future directions, including real-time deployment and cross-agency integration.
3. Methodology: A CRISP-DM Approach to a Real Case
In the context of automating the management of data import errors for technical equipment systems, the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology offers a structured and iterative approach to applying artificial intelligence to a real-world operational challenge. We adapt CRISP-DM for text mining to help standardize the understanding and representation of problems described in natural language by different individuals (see Figure 2). The process involves the following phases:
Business understanding—define the goals and clarify what insights are needed from unstructured text (e.g., understanding how different users describe similar issues).
Data understanding—gather and explore textual data, such as tickets, reports, or user queries, identifying language patterns and inconsistencies.
Data preparation—clean and preprocess text (tokenization, normalization, stemming, etc.) and apply techniques like synonym mapping or semantic embedding to reduce differences in phrasing.
Modeling—apply clustering or classification models to group semantically similar texts, helping uncover common problem themes.
Evaluation—validate whether the models effectively capture the underlying problem categories and reduce ambiguity.
Deployment—integrate into decision support or helpdesk systems to suggest standardized problem descriptions and improve response consistency.
3.1. Business Understanding
This first phase—business understanding—focuses on clearly defining the problem, the business objectives, and the success metrics that will guide the entire AI development process.
The core problem arises from the frequent occurrence of failures during the importation of equipment-related data. These failures stem from a variety of sources, including inconsistent formatting, incomplete records, and misclassified or unrecognized equipment types. Traditionally, technical teams handle such errors manually, leading to inefficiencies, delays in system integration, and frustration among users and support personnel. The goal of applying AI in this context is to move from a reactive and manual error-handling process to a proactive and automated framework capable of identifying, predicting, and correcting issues at scale.
The central objective of this initiative is to design and implement an intelligent system that leverages machine learning and natural language processing to classify common import errors, recommend appropriate corrective actions, assess the likelihood of failure in incoming data, and support technicians through intelligent interfaces such as chatbots and visual dashboards. This system aims to enhance operational efficiency, reduce the volume of failed imports, and ultimately improve the reliability and traceability of equipment registration processes.
Achieving this objective requires the involvement of multiple stakeholders, including IT administrators responsible for maintaining the import infrastructure, data analysts tasked with preparing and interpreting historical error logs, and support personnel who act on system alerts and recommendations. Success will be measured not only by traditional machine learning metrics—such as accuracy or precision in classification models—but also by tangible business outcomes, such as a reduction in recurring errors, faster issue resolution times, and increased autonomy of the support team in handling complex importation scenarios.
In applying CRISP-DM, this phase lays the foundation for the subsequent steps, ensuring that the data mining and AI modeling activities are firmly aligned with the organizational context and operational needs. It also emphasizes the importance of ethical and governance considerations, particularly when dealing with sensitive administrative records or learning from historical logs involving human decision-making. By clearly articulating the business challenges and opportunities, this phase ensures that AI is not simply a technical solution, but a strategic enabler of digital transformation in data management processes.
3.2. Data Understanding
In alignment with the CRISP-DM methodology, the data understanding phase provided a structured exploration of the multiple datasets underpinning the equipment importation process in the public sector. This phase aimed to assess the structure, quality, and relevance of the available data to support the development of AI-driven mechanisms for error classification, correction recommendation, and process optimization.
The combined dataset consisted of 12,843 complaint records derived from 7615 unique user interaction logs, spanning a temporal window from January 2022 to December 2024. These logs reflect a broad range of technical and administrative issues encountered across various public sector departments and infrastructures. The dataset included both structured fields (e.g., equipment type, supplier ID, timestamps) and unstructured fields (e.g., free-text error descriptions and user comments).
The data corpus was composed of heterogeneous sources obtained from operational systems responsible for asset management and technical support. These sources included Excel and CSV files representing distinct administrative and technical domains, such as equipment registries, supplier associations, user logs, and historical importation outcomes. Key datasets were derived from platforms including PatOnline and PATS, which collectively captured a wide range of structured and semi-structured information. The data contained attributes such as equipment type, model, memory and storage configurations, supplier identifiers, timestamps of importation attempts, and diagnostic error messages. Importation logs were particularly rich in unstructured content, often containing free-text descriptions of failures, which presented both analytical opportunities and preprocessing challenges.
A key aspect of the classification task was understanding the class balance across the four semantic clusters identified through BERT-based embeddings. The distribution of complaints was as follows: Cluster 0 (Startup Errors and Equipment Reassignment): 28.1%, Cluster 1 (Software Reinstallation and Slow Performance): 36.2%, Cluster 2 (Peripheral Failures and Error Indicators): 18.7%, and Cluster 3 (Equipment Replacement and Migration): 17.0%. While moderately imbalanced, the distribution allowed for stratified sampling during model training and informed the evaluation metrics used.
An initial exploratory analysis was conducted to evaluate the completeness and consistency of the data. This included data type profiling, assessment of value distributions, and identification of columns with high levels of missingness. Visual inspection techniques, such as heatmaps and null-value matrices, were employed to detect patterns of sparsity and redundancy. Several tables exhibited fields with over 50% missing values, particularly in legacy or auxiliary datasets, suggesting the need for dimensionality reduction or selective exclusion during the preparation phase. In contrast, critical fields relevant to model training, such as equipment specifications and error descriptions, were generally well populated, though inconsistencies in formatting and nomenclature were common.
Free-text fields, especially those containing equipment names and part numbers, displayed high lexical variability. Variants such as “HP ProBook 430 G5” and “HP Probook 430G5” were semantically equivalent but syntactically dissimilar, indicating the necessity of applying normalization techniques and entity resolution in subsequent phases. Similarly, the MensagemErroImportacao field revealed a rich set of diagnostic expressions, from SQL constraint violations to ambiguous timeout errors, necessitating natural language processing for semantic grouping and downstream interpretability.
Relational structures across datasets were also identified, with foreign key relationships linking records across different sheets and file types. For example, associations between equipment entries and their respective importation logs were essential for constructing supervised learning datasets. These relationships informed the feature engineering strategy and underscored the importance of maintaining referential integrity throughout the preprocessing workflow.
This exploratory phase confirmed that the available data, while diverse and partially inconsistent, possessed substantial analytical value. The findings informed the design of targeted cleaning and transformation procedures, and helped define the scope of features suitable for use in predictive modeling, recommendation systems, and semantic analysis. Moreover, the identification of latent patterns in the logs and textual fields sets the stage for advanced machine-learning approaches capable of automating and optimizing the error-handling process in large-scale equipment importation workflows.
Table 1 summarizes the key attributes extracted from user interaction logs and complaint reports. Each record contains a unique identifier, metadata regarding the event timing and user, contextual information about the affected system, a free-text description of the issue, and fields related to complaint classification, severity, and resolution status.
3.3. Data Preparation
Following the exploratory assessment of the data landscape, the data preparation phase focused on the systematic transformation, cleaning, and integration of heterogeneous data sources to enable the development of machine learning models. In line with the CRISP-DM framework, this phase served to operationalize the findings from the previous stage by implementing data engineering strategies capable of addressing sparsity, inconsistency, and semantic heterogeneity across the datasets.
The data corpus was initially fragmented across multiple files and formats, including structured spreadsheets and delimited text files sourced from independent subsystems such as PatOnline and PATS. These sources contained overlapping, complementary, and in some cases, redundant information related to equipment specifications, supplier attributes, user interactions, and importation diagnostics. As a first step, a robust data merging pipeline was developed to consolidate these inputs into a cohesive dataset. This involved schema alignment through the detection of shared column names, resolution of conflicting data types, and elimination of deprecated or contextually irrelevant sheets (e.g., legacy contact directories or inactive administrative entities). The merging logic preserved table-level granularity when necessary to retain meaningful relational context.
Data cleaning routines were subsequently applied to improve data quality and prepare features for downstream tasks. Missing value treatment was performed based on field-level sparsity thresholds. Columns with more than 50% null values were excluded from further analysis, while others were retained with appropriate imputation or flagging. Summary statistics and missingness matrices were employed to guide these decisions. This process led to a significant reduction in dimensionality while preserving the features most relevant to the predictive modeling and semantic interpretation objectives.
To address the substantial lexical variability observed in free-text fields, particularly those containing equipment names, part numbers, and error messages, text normalization techniques were introduced. These included case standardization, removal of extraneous characters, and the application of fuzzy string-matching algorithms to harmonize semantically equivalent but syntactically divergent entries. For example, variants such as “HP ProBook 430 G5” and “HP Probook 430G5” were reconciled using custom modules based on token-level similarity and domain-specific regular expressions. This step was essential to ensure consistency in entity recognition tasks and improve the reliability of subsequent machine learning models.
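For illustration, the following sketch shows how such harmonization can be implemented; the rapidfuzz library, the similarity threshold, and the canonical list used here are illustrative stand-ins for the custom token-similarity modules and domain-specific regular expressions applied in practice.

```python
import re
from rapidfuzz import fuzz

def normalize_text(value: str) -> str:
    """Lower-case, strip punctuation, and collapse whitespace."""
    value = value.lower()
    value = re.sub(r"[^a-z0-9 ]+", " ", value)
    return re.sub(r"\s+", " ", value).strip()

def harmonize(value: str, canonical_names: list[str], threshold: int = 90) -> str:
    """Map a raw equipment name to its closest canonical form, if similar enough."""
    cleaned = normalize_text(value)
    best = max(canonical_names, key=lambda c: fuzz.token_set_ratio(cleaned, normalize_text(c)))
    if fuzz.token_set_ratio(cleaned, normalize_text(best)) >= threshold:
        return best
    return value  # keep the original when no confident match exists

# Example: both spelling variants resolve to the same canonical entry
canonical = ["HP ProBook 430 G5"]
print(harmonize("HP Probook 430G5", canonical))
```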
Particular attention was given to the preprocessing of the MensagemErroImportacao field, which served as a critical input for both classification and clustering models. This unstructured field was transformed into numerical representations using sentence-level embeddings. These embeddings, generated using state-of-the-art models from the sentence-transformers library, captured semantic nuances and enabled downstream tasks such as semantic clustering, anomaly detection, and explainable recommendations of corrective actions.
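As a minimal illustration of this encoding step, the sketch below uses the sentence-transformers library with the multilingual model adopted in this work (see Section 3.4); the example sentences are hypothetical.

```python
from sentence_transformers import SentenceTransformer

# Multilingual model with native Portuguese support (see Section 3.4)
embedder = SentenceTransformer("distiluse-base-multilingual-cased-v2")

error_messages = [
    "Violação de restrição SQL ao inserir equipamento",   # hypothetical examples
    "Timeout na importação do registo do equipamento",
]

# Each message becomes a dense 512-dimensional vector
embeddings = embedder.encode(error_messages, show_progress_bar=False)
print(embeddings.shape)  # (2, 512)
```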
Relational integrity across datasets was preserved by maintaining foreign key linkages during the merging process. These relationships allowed the construction of enriched feature spaces where each equipment record was linked to historical import outcomes, supplier metadata, and system-generated error traces. This multi-faceted representation was essential for supervised learning, particularly in the training of classification models to predict error types and risk scores associated with new import attempts.
The final dataset resulting from this phase was a unified, high-quality tabular structure with normalized text fields, resolved relational links, reduced dimensionality, and semantic encodings of unstructured information. This dataset was subsequently partitioned into training, validation, and testing subsets, ensuring temporal and contextual separation to mitigate data leakage and support model generalization. All transformation procedures were implemented as reproducible scripts, facilitating automation and future updates as additional data becomes available.
To ensure compliance with data protection regulations, particularly the General Data Protection Regulation (GDPR), an anonymization pipeline was incorporated into the preprocessing workflow. Fields containing personally identifiable information (PII), such as email addresses, user names, and device identifiers, were subjected to one-way cryptographic hashing or pseudonymization, depending on their relevance to analytical tasks. Non-essential PII fields were removed entirely. These transformations ensured that the final analytical dataset retained no direct or indirect identifiers, thereby mitigating re-identification risks and supporting the ethical deployment of AI systems in the public sector.
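A minimal sketch of the pseudonymization logic is given below; the SHA-256 salted hashing, pandas-based processing, and column names are illustrative assumptions rather than the exact production pipeline.

```python
import hashlib
import pandas as pd

SALT = "replace-with-a-secret-salt"  # kept outside version control in practice

def pseudonymize(value: str) -> str:
    """One-way, salted hash: the same user maps to the same token without being re-identifiable."""
    return hashlib.sha256((SALT + str(value)).encode("utf-8")).hexdigest()[:16]

def anonymize(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    for col in ["email", "user_name"]:          # PII retained only in pseudonymized form
        if col in df.columns:
            df[col] = df[col].map(pseudonymize)
    # non-essential PII removed entirely
    return df.drop(columns=["phone_number"], errors="ignore")
```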
Overall, this data preparation workflow laid a robust foundation for the intelligent modeling of importation failures in public sector digital infrastructures. By addressing issues of scale, inconsistency, and semantic ambiguity, the resulting dataset enabled the effective application of advanced AI techniques, including classification, recommendation, and natural language understanding—contributing to the overarching goal of proactive, AI-driven complaint and error management.
The diagram in
Figure 3 illustrates the sequence of preprocessing steps applied to the informatics complaint dataset. After importing necessary libraries, the workflow involves checking for missing data, removing columns with excessive missing values, identifying common columns between datasets, and organizing the resulting data structure for subsequent analysis.
3.4. Modeling
The modeling phase followed the CRISP-DM methodology and aimed to extract meaningful patterns from user interaction logs to support the automation of informatics complaint handling in the public sector. A combination of clustering, classification, and anomaly detection techniques was used to capture the diversity of data sources and the semantic variation found in complaint descriptions.
To identify semantically similar complaints, clustering was applied using the K-Means algorithm, with sentence embeddings generated from the cleaned textual descriptions using pretrained BERT models. To accommodate the Portuguese-language dataset, we used the distiluse-base-multilingual-cased-v2 model from the SentenceTransformers library. This multilingual variant of BERT is trained for cross-lingual semantic similarity and includes native support for Portuguese, thereby mitigating the risk of semantic drift in embedding generation. The model, based on the DistilBERT architecture, offers a trade-off between semantic quality and computational efficiency: its reduced size (~66 M parameters) enables practical inference in CPU-only municipal IT environments. Inference time, measured on a standard CPU (Intel i7, 32 GB RAM), averaged 0.14 s per complaint, and this lightweight footprint ensured compatibility with resource-constrained infrastructures. Heavier models such as BERT-base were avoided because their 110 M+ parameters would introduce latency and memory bottlenecks in deployment settings. The resulting embeddings allowed the grouping of complaints that share the same meaning but differ in lexical structure. The optimal number of clusters was selected using the Elbow method, which analyzed the model’s inertia across several values of k. The clustering results revealed consistent themes in the complaints, which were then used to harmonize and standardize issue categories.
To evaluate clustering quality, we complemented the Elbow method with additional internal metrics, including the silhouette score (average = 0.47) and the Davies–Bouldin index. Both the Elbow curve and the silhouette analysis supported the choice of k = 4, indicating a good balance between cohesion and separation among semantic clusters. Despite the spherical cluster assumption of K-Means, the cosine-based sentence embeddings produced well-defined groupings. As a robustness check, we experimented with HDBSCAN and BERTopic; while these models generated overlapping cluster sets, they lacked the interpretability and coherence observed with K-Means. Therefore, K-Means was selected for its balance of computational simplicity, cluster coherence, and interpretability in the context of public sector complaint narratives.
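The cluster-selection procedure can be sketched as follows, assuming the sentence embeddings computed earlier; the range of k values is illustrative, and the final choice of k = 4 follows the Elbow and silhouette analysis described above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

def select_k(embeddings: np.ndarray, k_values=range(2, 9)):
    """Compute inertia (Elbow method) and internal validity metrics for several k."""
    results = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(embeddings)
        results.append({
            "k": k,
            "inertia": km.inertia_,
            "silhouette": silhouette_score(embeddings, km.labels_),
            "davies_bouldin": davies_bouldin_score(embeddings, km.labels_),
        })
    return results

# After inspecting the Elbow curve and silhouette scores, the final model is fitted with k = 4:
# kmeans_model = KMeans(n_clusters=4, n_init=10, random_state=42).fit(embeddings)
# cluster_labels = kmeans_model.labels_
```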
For the classification task, a Random Forest classifier was trained to predict predefined error types. The input data included both structured features—such as equipment type, organizational unit, and priority level—and unstructured text data processed using TF-IDF vectorization. A preprocessing pipeline was implemented using scikit-learn’s ColumnTransformer and Pipeline tools to integrate categorical, binary, and textual features. The data was split into training and test sets using a stratified approach to preserve class distribution. The classifier’s performance was evaluated using standard metrics, including accuracy, precision, recall, and F1-score. Additionally, confusion matrices were generated and visualized to identify misclassification patterns and guide future improvements.
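A condensed sketch of this pipeline is shown below; the column names for the structured fields and the free-text description, as well as the hyperparameter values, are illustrative.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Structured metadata plus the free-text complaint description (column names are illustrative)
preprocessor = ColumnTransformer(transformers=[
    ("categorical", OneHotEncoder(handle_unknown="ignore"),
     ["equipment_type", "organizational_unit", "priority_level"]),
    ("text", TfidfVectorizer(max_features=5000), "description"),
])

clf = Pipeline(steps=[
    ("features", preprocessor),
    ("model", RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=42)),
])

# Stratified 70/30 split preserves the class distribution, as described above
# X_train, X_test, y_train, y_test = train_test_split(
#     X, y, test_size=0.30, stratify=y, random_state=42)
# clf.fit(X_train, y_train)
```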
We opted for TF-IDF for classification inputs due to its interpretability and compatibility with decision-tree models, allowing the classifier to benefit from sparse textual signals alongside structured metadata. While embedding concatenation or a fine-tuned Sentence-BERT classifier with a softmax head was considered, these alternatives were deprioritized in this iteration to preserve explainability and ensure alignment with deployment constraints in public sector environments—especially the need for transparency in decision logic. Future work includes exploring transformer-based end-to-end classifiers to assess trade-offs between accuracy and operational interpretability.
The dataset was split 70/30 using stratified sampling, with a training set of 8990 entries and a test set of 3853. To address the mild class imbalance, class weights were automatically adjusted during training. In addition to overall metrics, we computed per-class F1 scores and the Matthews correlation coefficient (MCC) to ensure balanced performance. ROC-AUC was also calculated for each class, demonstrating consistent separability, especially for underrepresented categories such as Peripheral Failures.
To support model transparency, we applied SHAP values to interpret feature contributions for a subset of classification results. The most influential features included textual terms (from TF-IDF) such as “timeout”, “driver”, and “reboot”, as well as structured fields like “equipment type” and “organizational unit”. These insights were presented to domain experts for validation and refinement of classification logic.
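The SHAP computation can be sketched as follows, assuming the classification pipeline and data split from the previous sketch; note that the exact shape of the returned SHAP values varies with the library version.

```python
import shap

# Explain the fitted Random Forest on its transformed feature space
features = clf.named_steps["features"]
forest = clf.named_steps["model"]

X_sample = X_test.sample(200, random_state=42)             # small, representative subset
X_encoded = features.transform(X_sample)
X_dense = X_encoded.toarray() if hasattr(X_encoded, "toarray") else X_encoded
feature_names = features.get_feature_names_out()

explainer = shap.TreeExplainer(forest)
# One array per class (older SHAP versions) or a single multi-dimensional array (newer versions)
shap_values = explainer.shap_values(X_dense)

# Global importance summary (mean |SHAP| per feature), reviewed with domain experts
shap.summary_plot(shap_values, X_dense, feature_names=feature_names)
```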
To detect outlier cases that may indicate emerging or previously unidentified issues, an Isolation Forest algorithm was applied to the BERT-based embeddings after reducing their dimensionality with PCA. This unsupervised method flagged complaints that deviated significantly from the general data distribution. These anomalies were subsequently analyzed and clustered to uncover new categories of technical problems that had not been adequately represented in the historical records.
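A minimal sketch of this anomaly-detection step is shown below; the number of principal components and the contamination rate are illustrative (several contamination rates were tested, as described in Section 3.5).

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Reduce the 512-dimensional sentence embeddings before anomaly scoring
pca = PCA(n_components=50, random_state=42)
reduced = pca.fit_transform(embeddings)

# Flag roughly the most atypical 2% of complaints as candidate anomalies
iso = IsolationForest(contamination=0.02, random_state=42)
anomaly_flags = iso.fit_predict(reduced)        # -1 = anomaly, 1 = normal
anomaly_scores = iso.decision_function(reduced)  # lower scores = more anomalous

candidate_anomalies = [i for i, flag in enumerate(anomaly_flags) if flag == -1]
```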
To evaluate the effectiveness of anomaly detection, flagged cases were compared against a baseline taxonomy of known complaint categories established during the classification phase. Anomalies were grouped by semantic proximity to known classes using cosine similarity of their BERT embeddings and further assessed through cluster cohesion metrics. A manual audit was then conducted to assess interpretability and identify cases that did not align with existing classes. This process confirmed the model’s capacity to surface semantically distinct complaint types not previously labeled in the training data. In future iterations, we aim to incorporate active learning loops and time-aware anomaly scoring to reduce human supervision and support continuous improvement.
The combination of these modeling techniques enabled a deep understanding of recurring and atypical issues, helping to improve the consistency and effectiveness of complaint classification. The results provided a foundation for the proactive resolution of problems and the development of intelligent tools to assist public sector staff in managing technical support requests.
The diagram in
Figure 4 illustrates the main stages of the complaint analysis pipeline. After initial data preparation and exploration, the process involves text cleaning and processing, followed by the generation of text embeddings. These embeddings are then used in two parallel tasks: semantic clustering of complaints and supervised text classification, leading to the production of actionable results.
3.5. Evaluation
The evaluation phase comprehensively validated the proposed models using a reserved test set of complaints to assess their generalization and practical utility in a public sector context. This section outlines the methodologies employed to evaluate the Random Forest classifier, K-Means clustering with BERT-based embeddings, and Isolation Forest anomaly detection, emphasizing rigorous quantitative, qualitative, and operational assessment strategies.
For the classification task, the Random Forest model was evaluated using a suite of standard metrics: accuracy, precision, recall, F1-score, Matthews Correlation Coefficient (MCC), and ROC-AUC. Accuracy measured overall correctness across categories, while precision and recall assessed the model’s ability to correctly predict and identify instances of each class, respectively. The F1-score, as the harmonic mean of precision and recall, was prioritized to account for potential class imbalance. MCC provided a balanced measure robust to skewed distributions, and ROC-AUC evaluated class separability by analyzing the trade-off between true positive and false positive rates. Per-class metrics were computed to identify performance variations across complaint categories. A confusion matrix was constructed to visualize prediction distributions, enabling analysis of misclassification patterns, particularly for classes with linguistic overlap. To ensure robustness, five-fold cross-validation was conducted, splitting the dataset into training and validation subsets to assess performance stability. Hyperparameter sensitivity was tested by varying tree depth and number of estimators, ensuring optimal model configuration.
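The evaluation protocol can be sketched as follows, reusing the pipeline and data splits from the modeling sketches; variable names are illustrative.

```python
from sklearn.metrics import classification_report, confusion_matrix, matthews_corrcoef
from sklearn.model_selection import cross_val_score

# Five-fold cross-validation on the training portion to check performance stability
cv_f1 = cross_val_score(clf, X_train, y_train, cv=5, scoring="f1_macro")
print(f"Macro F1 across folds: {cv_f1.mean():.3f} ± {cv_f1.std():.3f}")

# Held-out test-set evaluation with per-class metrics, MCC, and the confusion matrix
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
print("MCC:", matthews_corrcoef(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```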
Model interpretability was evaluated using SHAP (SHapley Additive exPlanations) values to quantify the contribution of features to predictions. Both textual features (derived from TF-IDF vectorization of complaint descriptions) and structured features (e.g., equipment type, organizational unit, time of day) were analyzed. This approach identified which terms or contextual factors most influenced specific complaint categories, enhancing transparency and guiding operational refinements. IT support staff reviewed SHAP-derived insights to validate their alignment with real-world issue patterns.
Clustering performance was assessed by measuring how well complaints were grouped into semantically coherent categories. The silhouette score was used to evaluate intra-cluster cohesion and inter-cluster separation, with higher scores indicating well-defined clusters. To benchmark the BERT-based K-Means clustering, a baseline was established using TF-IDF vectorization combined with K-Means. The comparison focused on how effectively each method captured semantic relationships in complaint texts. Robustness was tested by experimenting with different numbers of clusters (k = 3–6) and alternative clustering algorithms (e.g., HDBSCAN, BERTopic). The Elbow method and silhouette analysis informed the optimal k value. Additionally, domain experts (IT professionals) qualitatively reviewed a sample of clustered complaints to assess alignment with operational issue taxonomies, providing feedback on cluster coherence and potential misgroupings due to domain-specific jargon.
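The baseline comparison can be sketched as follows, assuming the cleaned complaint texts and the BERT embeddings computed earlier; silhouette scores computed in different feature spaces provide only an indicative comparison.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

def cluster_and_score(matrix, k=4):
    """Fit K-Means and return the silhouette score and labels for a given representation."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(matrix)
    return silhouette_score(matrix, labels), labels

# Baseline: TF-IDF representation of the same cleaned complaint texts
tfidf_matrix = TfidfVectorizer(max_features=5000).fit_transform(texts)
tfidf_score, _ = cluster_and_score(tfidf_matrix)

# BERT-based sentence embeddings (computed earlier)
bert_score, bert_labels = cluster_and_score(embeddings)

print(f"Silhouette — TF-IDF + K-Means: {tfidf_score:.2f}, BERT + K-Means: {bert_score:.2f}")
```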
To identify rare or emerging complaint types as semantic outliers, an Isolation Forest algorithm was applied to the BERT-based embeddings after dimensionality reduction via principal component analysis (PCA). Since labeled anomalies were unavailable, an unsupervised validation approach was adopted. Anomaly scores were analyzed alongside cluster assignments, silhouette coefficients, and distances to cluster centroids to quantify deviations from typical complaint patterns. Temporal and organizational characteristics of flagged complaints (e.g., reporting times, equipment types, department associations) were examined to contextualize anomalies. Robustness was assessed by testing different contamination rates to balance sensitivity and specificity. A subset of flagged anomalies was manually audited by domain experts to confirm their novelty or relevance, ensuring the method’s practical utility.
The operational impact was evaluated by integrating the models into the agency’s complaint management workflow and measuring efficiency gains. Key performance indicators included reductions in repeated complaint categories, time saved in classification and routing, and decreases in user-reported errors post-intervention. These metrics were tracked using helpdesk logs over a six-month period, with pre- and post-deployment comparisons to quantify improvements. User interface adjustments, informed by cluster and anomaly insights, were tested to assess their effect on complaint clarity and resolution rates. Feedback from support staff was collected to evaluate the system’s usability and alignment with operational needs.
3.6. Deployment
The deployment phase focused on embedding the developed AI framework into the operational context of public service support environments. This involved integrating the model outputs into existing helpdesk and monitoring systems to ensure that technical staff could easily access classification results and cluster patterns without disrupting existing workflows. The system was designed to trigger automated alerts whenever predictive signals indicated an increase in complaints associated with a specific cluster, allowing for proactive remediation and resource planning. To support broader adoption and usability, a web-based visualization dashboard was developed, enabling non-technical administrative staff to monitor complaint trends, identify recurring issues, and act on system recommendations in a user-friendly interface.
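A minimal sketch of the alert-triggering logic is given below; the weekly thresholds, column names, and data layout are hypothetical, although the cluster labels correspond to those reported in Section 4.

```python
import pandas as pd

# Hypothetical per-cluster weekly thresholds derived from historical baselines
ALERT_THRESHOLDS = {
    "Startup Errors and Equipment Reassignment": 25,
    "Software Reinstallation and Slow Performance": 40,
    "Peripheral Failures and Error Indicators": 15,
    "Equipment Replacement and Migration": 15,
}

def weekly_alerts(classified: pd.DataFrame) -> list[str]:
    """Return alert messages for clusters whose latest weekly complaint count exceeds its threshold."""
    weeks = classified["timestamp"].dt.to_period("W")
    recent = classified[weeks == weeks.max()]
    counts = recent.groupby("predicted_cluster").size()
    return [
        f"ALERT: {cluster} received {count} complaints this week "
        f"(threshold {ALERT_THRESHOLDS[cluster]})"
        for cluster, count in counts.items()
        if count > ALERT_THRESHOLDS.get(cluster, float("inf"))
    ]
```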
4. Results
This section presents the results obtained from applying the proposed AI-driven framework to manage and classify informatics complaints within a public sector environment. The process begins by determining the optimal number of clusters to represent the diverse categories of complaints. Using the Elbow method, the inertia curve was analyzed (
Figure 5), and a clear inflection point was observed at four clusters. This indicates that four clusters provide a suitable balance between capturing the complexity of the data and maintaining model interpretability, avoiding overfitting.
Subsequently, BERT-based sentence embeddings were generated from the cleaned textual descriptions of complaints. These embeddings were clustered using the K-Means algorithm, and a two-dimensional UMAP projection was employed to visualize the clustering outcome (
Figure 6). The UMAP projection shows a visible separation between groups, with some degree of overlap, which is expected given the semantic similarities among certain complaint types. Nonetheless, the overall distribution suggests that the model effectively captures underlying patterns and organizes complaints into coherent operational categories.
To assess the robustness of the clustering structure beyond the spherical cluster assumption inherent to K-Means, two additional models were tested: HDBSCAN (a density-based clustering method) and BERTopic (a transformer-based topic modeling framework combining class-based TF-IDF with dimensionality reduction). Both approaches were evaluated using silhouette score and qualitative inspection of cluster coherence.
HDBSCAN yielded fewer, broader clusters with lower silhouette scores (mean = 0.29) compared to K-Means (mean = 0.47), suggesting that the density-based approach was less effective in this context, likely due to uneven density in the embedded space. BERTopic, on the other hand, produced interpretable topic groups but exhibited instability across runs and fragmented semantically similar complaints across multiple topics. While BERTopic was informative for exploratory analysis, its performance in producing operationally actionable clusters was inferior to K-Means, especially when integrated with the dashboard tooling.
Given these comparative findings, K-Means was retained for the main clustering pipeline due to its superior quantitative cohesion and alignment with the helpdesk’s operational taxonomy.
Based on these results, four clusters emerge, as summarized in Table 2.
Table 2 presents the four main complaint categories resulting from BERT-based semantic clustering, along with their operational descriptions and representative examples. Each cluster captures a distinct class of issues reported by users, ranging from startup errors and system performance problems to peripheral hardware failures and equipment management needs.
Table 3 presents the resulting cluster labels alongside the top 10 most salient keywords per group, offering a semantic summary of the primary issues reported. This labeling process supports downstream tasks such as monitoring issue trends and designing targeted interventions.
Building on the clustering results, the framework incorporates a supervised classification model to predict the cluster label of new incoming complaints, enabling automated categorization and real-time decision-making.
To automate the semantic categorization of new complaints, supervised classifiers were trained using Random Forest, a support vector machine (SVM), and XGBoost. The dataset was split 70/30 using stratified sampling, yielding 8990 records for training and 3853 for testing. To address the class imbalance, the classifiers employed automatic class weighting.
To empirically justify the selection of the Random Forest classifier, we conducted a comparative performance evaluation against two widely used alternatives: support vector machine (SVM) and XGBoost. All models were trained on the same stratified dataset and evaluated using macro-averaged metrics across the four predefined complaint categories. As shown in
Table 4, Random Forest achieved the highest scores in accuracy (0.83), precision (0.81), and F1-score (0.80), along with the best Matthews correlation coefficient (MCC) of 0.71. These results confirm that Random Forest provides a balanced and interpretable performance suitable for deployment in public sector environments. While XGBoost and SVM delivered competitive results, their slightly lower scores and increased computational complexity made them less optimal for the real-time, resource-constrained context of this application.
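For reference, the comparison can be sketched as follows, reusing the feature preprocessor from the Section 3.4 sketch; LinearSVC stands in for the SVM, cluster labels are assumed to be integer-encoded (0–3), and hyperparameters are illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier

candidates = {
    "RandomForest": RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=42),
    "SVM": LinearSVC(class_weight="balanced"),
    "XGBoost": XGBClassifier(n_estimators=300, eval_metric="mlogloss"),
}

for name, model in candidates.items():
    pipe = Pipeline([("features", preprocessor), ("model", model)])
    pipe.fit(X_train, y_train)                 # same stratified training set for all models
    y_pred = pipe.predict(X_test)
    print(f"{name:12s} acc={accuracy_score(y_test, y_pred):.2f} "
          f"macro-F1={f1_score(y_test, y_pred, average='macro'):.2f} "
          f"MCC={matthews_corrcoef(y_test, y_pred):.2f}")
```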
Performance was evaluated using precision, recall, F1-score, AUC, and Matthews correlation coefficient (MCC).
Table 5 summarizes the per-class results:
The overall MCC was 0.71. The ROC-AUC was consistently high across classes, with particularly strong separability for software-related complaints (Cluster 1). Misclassifications were most frequent between Cluster 2 and Cluster 3, reflecting the operational overlap between peripheral issues and equipment requests.
Building on these results, the classification model was integrated into the operational workflow to enable automated and real-time categorization of new complaints. This integration supports proactive triage by immediately assigning incoming tickets to their most likely semantic cluster, facilitating prioritization and routing. Additionally, the model’s predictions feed into a suite of analytical dashboards, allowing support teams to monitor evolving patterns, identify emerging issues, and adjust resource allocation accordingly. This tight coupling between predictive modeling and operational tooling ensures that the insights generated are actionable and continuously updated as new data arrives.
To enhance model interpretability, SHAP (SHapley Additive exPlanations) values were computed for representative classification outputs. The SHAP analysis quantifies the impact of each input feature—whether structured or textual—on the final prediction made by the Random Forest classifier.
Figure 7 illustrates the most influential features, highlighting the words “install”, “slow”, and “printer” as the top positive contributors to the model’s prediction, while the “unit=Finance” feature had a negative impact.
The CRISP-DM methodology was not used merely as a static structure, but as an iterative workflow where findings from each phase informed and refined subsequent steps. As shown in
Figure 8, iterative refinement across stages—from raw text processing through restructuring informed by confusion-matrix analysis—resulted in steady gains in both classification performance (F1-score) and clustering quality (silhouette score). Each improvement cycle reflected a re-evaluation of earlier decisions, such as redefining entity normalization rules after error-pattern analysis or adjusting vector representations based on misclassification trends.
To support monitoring and operational integration, a set of dashboards was developed.
Appendix A presents the equipment inventory landscape, helping contextualize potential sources of technical issues.
Appendix B highlights the distribution of personnel and contact structures, which is essential for understanding user demographics and support routing.
Appendix C details the distribution and temporal trends of technical assistance requests (PATs), including their classification into semantic clusters and priority levels, providing actionable insights for proactive resource allocation and service improvement.
The set of dashboards presented in
Appendix A,
Appendix B and
Appendix C offers a multidimensional analysis of informatics infrastructure, workforce distribution, and technical assistance trends within the public sector.
Appendix A reveals a substantial concentration of generic equipment (38,311 units), yet computers (10,751 units) and printers (1345 units) represent critical assets for daily operations. The equipment status breakdown highlights that a large portion of assets are either new or actively in use, though a non-negligible share remains decommissioned or not repairable, suggesting potential bottlenecks in lifecycle management.
Appendix B shifts focus to the human dimension, showing a predominantly female workforce and emphasizing the need for gender-aware support strategies. It also outlines communication channels, with email and location data as the dominant contact types, reinforcing the importance of centralized and updated staff directories.
Appendix C delves into complaint management via PAT records, showing that medium-priority requests dominate, particularly from large units such as DRA and DRPA. The cluster analysis identifies key issue types—especially software reinstallation (36.22%) and startup/equipment reassignment (31.81%)—while the temporal distribution highlights a sharp increase in PATs after 2015, followed by a recent decline. Collectively, these dashboards provide a holistic view of operational demands and areas for optimization in technical support and digital asset management.
In addition to evaluating classification and clustering performance, we examined the downstream impact of the dashboard deployment on operational outcomes. Notably, a 27% reduction in repeated complaint categories was observed in the six months following deployment. To investigate whether this reduction could be causally attributed to the intervention rather than to temporal variation or exogenous factors, an interrupted time series (ITS) analysis was conducted. Complaint frequency per category was aggregated weekly over a 12-month window, with the dashboard introduction defined as the intervention point. A segmented regression model was applied to detect changes in level and slope, accounting for autocorrelation and seasonality. The analysis revealed a statistically significant level change (p < 0.01) and a negative post-intervention trend (p < 0.05), supporting the hypothesis that the observed reduction is associated with system deployment rather than coincidental fluctuations. This strengthens the causal interpretation of the intervention’s effectiveness in reducing redundant complaints by improving feedback clarity and issue resolution efficiency.
To complement this analysis, weekly counts of semantically repeated issues were also aggregated over a three-year period, with the dashboard rollout in early 2023 serving as the intervention point. As shown in
Figure 9, the ITS model identified a statistically significant level shift following deployment, along with a continued decline in the post-intervention slope. The red trend line represents the segmented regression fit, with shaded 95% confidence intervals. The immediate post-deployment drop, combined with the sustained downward trajectory, supports the hypothesis that the dashboard contributed to the reduction in redundant complaints, rather than the effect being merely a product of natural fluctuation or seasonal cycles.
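The segmented regression underlying this ITS analysis can be sketched as follows, assuming weekly counts in a pandas DataFrame; variable names, the lag length for the autocorrelation-robust standard errors, and the intervention index are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def segmented_regression(weekly: pd.DataFrame, intervention_week: int):
    """Fit a segmented OLS model with level-change and slope-change terms at the intervention."""
    t = weekly["week_index"]
    post = (t >= intervention_week).astype(int)            # level change indicator
    time_since = np.maximum(t - intervention_week, 0)      # slope change after rollout
    X = sm.add_constant(pd.DataFrame({"time": t, "post": post, "time_since": time_since}))
    # Newey-West (HAC) standard errors account for autocorrelation in the weekly counts
    return sm.OLS(weekly["repeated_complaints"], X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})

# fitted = segmented_regression(weekly_counts, intervention_week=52)
# print(fitted.summary())   # 'post' captures the level shift, 'time_since' the post-rollout trend
```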
To assess the real-time feasibility of the system, we benchmarked the full inference pipeline consisting of sentence embedding generation, semantic clustering, and supervised classification. On a standard CPU machine, the sentence embedding step using a lightweight transformer (MiniLM-L6-v2) required approximately 0.18 s. The K-Means clustering prediction added an additional 0.01 s, and the Random Forest classification took 0.02 s. The total end-to-end latency for processing a single complaint was approximately 0.21 s, supporting the framework’s deployment in interactive, user-facing scenarios.
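The benchmarking procedure can be sketched as follows, reusing the embedding model, fitted K-Means model, and classification pipeline from the earlier sketches; the field values in the sample record are illustrative.

```python
import time
import pandas as pd

def benchmark_single_complaint(text: str, n_runs: int = 20) -> dict:
    """Average per-stage latency (seconds) for one complaint, mirroring the deployed pipeline."""
    sample_row = pd.DataFrame([{
        "equipment_type": "laptop", "organizational_unit": "DRA",
        "priority_level": "medium", "description": text,   # illustrative structured fields
    }])
    totals = {"embedding": 0.0, "cluster_assignment": 0.0, "classification": 0.0}
    for _ in range(n_runs):
        t0 = time.perf_counter()
        emb = embedder.encode([text])          # sentence embedding
        t1 = time.perf_counter()
        kmeans_model.predict(emb)              # semantic cluster assignment
        t2 = time.perf_counter()
        clf.predict(sample_row)                # supervised classification
        t3 = time.perf_counter()
        totals["embedding"] += t1 - t0
        totals["cluster_assignment"] += t2 - t1
        totals["classification"] += t3 - t2
    return {stage: total / n_runs for stage, total in totals.items()}
```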
5. Discussion
The application of AI-driven analytics to user interaction logs yielded meaningful insights into the patterns underlying informatics complaints in the public sector. The semantic clustering of free-text error descriptions revealed strong alignment with known sources of user frustration—such as ambiguous error messages, inconsistent data formats, and system bottlenecks during peak usage periods. Notably, the use of sentence embeddings allowed the model to group complaints that were lexically dissimilar but semantically equivalent, demonstrating the value of text mining in surfacing latent problem structures often overlooked by traditional keyword-based methods.
From an operational perspective, the implementation of the AI framework resulted in a measurable reduction in complaint volume by enabling early detection and preemptive mitigation of recurring errors. Automated classification of import errors and predictive alerts for high-risk submissions accelerated issue resolution, significantly easing the burden on support teams. Moreover, visual dashboards and explainable models empowered non-technical staff to interact with AI outputs in a meaningful way, reinforcing the system’s usability and practical value.
Internal monitoring after the system’s deployment showed promising operational results. Over a six-month observation period, there was a 27% reduction in repeated complaint categories, as measured by duplicate or semantically similar issues flagged in the system. Furthermore, the average time to classify and route complaints decreased by approximately 32%, based on support team handling logs. While these improvements were measured within a single agency and specific operational context, they highlight the potential of semantic clustering and predictive analytics to significantly enhance complaint management workflows. Future work will involve more extensive impact evaluation across multiple agencies and longer timescales to validate the scalability and generalizability of these gains.
While the framework demonstrated promising results in a Portuguese local government setting, its broader applicability across different administrative, linguistic, and cultural contexts remains to be explored. Public sector organizations vary significantly in their reporting structures, complaint typologies, and user behavior, which may impact both semantic clustering and classification performance. In particular, language-specific embeddings may need adaptation for non-Portuguese corpora, and taxonomies of technical issues may require localized calibration. To support transferability, the proposed framework was designed to be modular, allowing the substitution of language models, metadata fields, and priority schemas. Future work should involve cross-jurisdictional deployments in varied institutional contexts—including international agencies and multilingual environments—to systematically assess adaptability, retraining requirements, and governance alignment.
User-centric improvements were also observed. By identifying and addressing the most common causes of failure, the system facilitated small but impactful changes to the user interface, such as clearer input validation messages and automated guidance for correcting errors. These adjustments, informed directly by the AI-driven insights, enhanced user experience and reduced frustration at the source. To address potential bias, metadata on user gender was leveraged to support fairness auditing. Classification performance was disaggregated by gender, and demographic parity metrics were computed during model validation to detect disparities in predictive accuracy. These audits were reviewed by domain experts to ensure equitable model deployment across user groups.
In addition to demographic parity, future iterations of the framework will incorporate further fairness metrics to deepen algorithmic governance. For example, Equal Opportunity Difference will be used to assess disparities in true positive rates between different user groups, while SHAP-based attribution analysis will be extended to detect systematic differences in feature importance across organizational units or complaint categories. These metrics will provide a more granular view of fairness and complement the current demographic audits. Although some sensitive attributes were pseudonymized, the framework is designed to support subgroup-level auditing where ethical and legal constraints permit access to such data.
Beyond fairness auditing, ethical considerations were operationalized throughout the design and deployment of the system. The use of historical user interaction data followed strict data governance protocols, including pseudonymization of records, minimization of retained attributes, and internal access restrictions, in accordance with GDPR-compliant procedures. Although no direct personal identifiers were used in model training, certain optional fields such as email and phone numbers were included in dashboard views for administrative traceability. To mitigate re-identification risks, these fields were pseudonymized using irreversible hashing algorithms, and access to dashboards was limited to authorized personnel under role-based controls and audit logging. All modeling phases were subject to ex ante review by domain stakeholders to ensure alignment with administrative policies. The dataset was examined for representation bias across organizational units and equipment categories. Class imbalance was addressed using automatic class weighting during model training, and per-class performance metrics—such as F1-score and MCC—were used to detect disparities in predictive accuracy. Additionally, SHAP values were computed to explain classification outputs, highlighting the contribution of both structured features and textual terms. These explanations were presented to IT support staff to verify the interpretability and operational plausibility of the model’s decisions. This combination of fairness-aware preprocessing, performance disaggregation, and post hoc explainability constitutes a practical framework for responsible AI deployment in public sector complaint management.
The study also encountered certain limitations. The framework was tested within a single government entity, which may limit the generalizability of results to other contexts with different operational workflows or complaint structures. Additionally, the static nature of training data introduces the risk of model drift over time, particularly as user behavior, system updates, or policy changes alter the underlying patterns. Real-time adaptation remains a technical and organizational challenge, requiring continuous monitoring and retraining strategies that are not yet fully operationalized in the current implementation.
Another important limitation relates to the semantic clustering approach itself. While BERT-based embeddings significantly improve complaint grouping compared to keyword methods, they are not immune to semantic drift—especially in the presence of rare technical jargon, domain-specific expressions, or evolving terminologies. Misclusterings can occur when the model’s pretraining context differs from the operational vocabulary of a specific public agency. This suggests that domain-specific fine-tuning or adaptive retraining strategies may be necessary to maintain clustering accuracy over time, particularly as new complaint types or system updates emerge.
When compared to traditional manual filtering approaches, the AI-driven system proved significantly more scalable, consistent, and responsive. Manual methods often rely on rigid keyword lists and human interpretation, leading to inconsistencies in classification and slower turnaround times. In contrast, the automated approach provided dynamic, adaptive classification capabilities, reduced labor requirements, and introduced a level of insight that manual methods are unlikely to achieve at scale.
In sum, this discussion illustrates how the integration of AI techniques in public sector complaint management can improve both operational efficiency and user satisfaction—while also surfacing important ethical and practical considerations that must guide future deployments.