Pattern Recognition and Deep Learning Technologies, Enablers of Industry 4.0, and Their Role in Engineering Research

The purpose of this study is to summarize the pattern recognition (PR) and deep learning (DL) artificial intelligence methods developed for data management in the last six years. The methodology used to study the documents is content analysis. For this study, 186 references are considered, of which 120 are selected for the literature review. First, a general introduction to artificial intelligence is presented, in which PR/DL methods are studied and their relevance to data management is evaluated. Next, a literature review of the most recent applications of PR/DL is provided, and the capacity of these methods to process large volumes of data is evaluated. The analysis of the literature also reveals the main applications, challenges, approaches, advantages, and disadvantages of using these methods. Moreover, we discuss the main measurement instruments; the methodological contributions by study area and research domain; and the major databases, journals, and countries that contribute to the field of study. Finally, we identify emerging research trends, their limitations, and possible future research paths.


Introduction
With the creation of ever larger volumes of data, the need to analyze them keeps growing. The management of large compilations of data within an asymmetrical data structure is an ever-growing problem for companies. Likewise, challenges related to operational skills, data integration, and IT infrastructure require continuous upgrades of data management software for big data (BD). As a consequence, data management makes automation essential for operations management, which results in BD management. For Ying et al. [1], the rapid evolution of information architecture, along with the ability to analyze, access, and manage BD, makes timely decision making increasingly critical. Currently, the management of complex heterogeneous data, all the more so at the speed of a BD environment, exceeds human capacities. This is not the case, however, for machine learning (ML). Data management demands the correlation, integration, and superposition of artificial intelligence (AI) with different intelligent computational techniques, such as ML, DL, BD, and data mining/science [2]. AI and ML have been created to solve different problems related to massive data analysis, pattern identification, and automation, and to generate accurate predictions. Disruptive technologies are techniques that offer research and development options and assist in the management of data. Among them, AI is having a significant impact in all engineering domains. Through AI, data management is transforming the way business is conducted, leading to improvements in customer and provider services, competitive advantages, new business opportunities, efficiency and cost reduction, and personalized products and services [13].
Additionally, Figure 1, compiled with the Clarivate Web of Science (WOS) tool, shows the number of publications that demonstrate the use of AI methods in engineering. The data show that the use of most methods has increased in the last decade; however, the number of studies using fuzzy logic techniques and k-means methods has not changed significantly. Despite the growing popularity of neural networks, the number of new studies centered on their use has remained constant over the past six years. Conversely, the number of studies using ML, PR, MI, support vector machines (SVMs), and Bayesian methods has increased noticeably. Deep architectures, such as convolutional neural networks and deep belief networks, have also been a trending area of research in recent years.
Symmetry 2023, 15, 535

Scope and Motives for the Study
The field of machine learning has seen the emergence of new techniques that can be used as intelligent methodologies for managing data. This study focuses on the use of PR/DL methods and explores their potential applications in the field of data management. By understanding the latest developments in PR/DL methods, we can assess their impact and how they can contribute to different domains and research areas. As such, the findings of this study can serve as a roadmap for discovering long-term solutions for PR/DL in data management engineering.
The purpose of this review is to summarize the theoretical foundations of the methods, give a general understanding of their uses, review recent developments in the field, examine case studies and research proposals, and explore potential avenues for future research.

The main objective of this review is to provide an in-depth examination of the key aspects of PR/DL, including open challenges, potential applications, and perspectives on computational tools.
The solutions to many problems can be found in the data, and AI techniques, including PR/DL, are considered effective and dependable tools for managing data. PR/DL technologies have progressed thanks to the development of advanced algorithms, increased computational capability and performance, and reduced hardware and firmware costs. This advancement has resulted in vast amounts of data that require powerful algorithms for summarization and interpretation. These algorithms have enabled smart machines not only to capture and store data but also to perform complex generalization and abstraction processes, which are useful in fields such as data mining, medicine, engineering, and computer vision for tasks such as classification, grouping, identification, and control.
This review serves as an introduction to other topics within PR/DL. To make the study easier to follow, a conceptual framework was established to classify the methodological proposals (Figure 2). Section 1 of the framework summarizes the current PR/DL domains used in various engineering applications, dividing PR/DL methodologies into eight domains: computer-aided diagnostics (CAD), architecture design, forecasting, control systems, fault detection, image analysis, data imputation, and security. Section 2 of the framework highlights the main areas of study affected by recent developments in PR/DL over the past five years; these areas are examined within the scope of this article. A total of 13 areas of study are included in the proposed conceptual framework: agriculture, medicine, telecommunications, business, transportation, computing, infrastructure, electric power, weather, technology, manufacturing, marketing, and information security. Finally, Section 3 of the framework presents the different computational paradigms discussed in this study and their interrelations.
… (CC), ML, smart sensors, data analytics, robotics, and others, has been the subject of previous research. Mathematical computational models are used in semantic segmentation to describe processes within organizations [14]. Yaqoob et al. [15] discuss the use of blockchain, IoT, and AI for managing healthcare data systems. Izonin et al. [16] use AI techniques, primarily ANNs, to manage missing data in air monitoring. Zhuang et al. [17] propose a process traceability and data management system based on digital twins for complex products, and Wang et al. [18] survey recent research trends in AI for urban trajectory data management and describe the qualities that a trajectory data management system should possess to maximize flexibility. Kong et al. [19] propose using a digital twin system in IoT to provide efficient data management for building workshops, and Putz et al. [20] conduct semistructured interviews with experts to evaluate a model that uses digital twins for decentralized data exchange. They also propose an access control model to address integrity and confidentiality in data management.
Recently, researchers have compared statistical and AI methods for imputing missing data in electrical energy records. Experimental results using 2 years of electrical energy data from Taiwan show that AI methods generally perform better than statistical methods [21]. Pan and Zhang [22] developed a digital twin framework that integrates construction information modeling, IoT, and data mining for advanced project management. The results indicated that AI improves data communication and exploration, leading to a better understanding, prediction, and optimization of physical construction. Liu et al. [23] proposed an allocation model for corporate human resources that uses AI methodologies, specifically an ANN classifier, for data mining; the results showed that the model is highly capable of combining data to achieve an ideal distribution of human resources. Jiang et al. [24] presented an integrated framework combining statistical techniques and AI to measure and improve social media publicity related to household waste management. Hashmi et al. [25] developed an architecture and design framework and implemented an automated electrical energy management system based on IoT and CC that generates a consumer load profile in terms of current, voltage, and available power through a portal. Shao et al. [26] conducted an exploratory study on the implementation of Industry 4.0 at the supply chain level and its data management system.
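To make the comparison between statistical and AI imputation concrete, the sketch below contrasts a statistical baseline (filling gaps with the mean of the observed values) with a simple learning-based alternative (k-nearest-neighbor imputation) on a tiny synthetic series. The data, the function names `mean_impute` and `knn_impute`, and the choice of k = 2 are illustrative assumptions; this is not the method or the data used in [21].

```python
import numpy as np


def mean_impute(target):
    """Statistical baseline: replace every NaN with the mean of the observed values."""
    filled = target.copy()
    filled[np.isnan(filled)] = np.nanmean(target)
    return filled


def knn_impute(features, target, k=2):
    """Replace each NaN with the mean target value of the k nearest fully observed rows."""
    filled = target.copy()
    observed = ~np.isnan(target)
    for i in np.flatnonzero(~observed):
        # Euclidean distance from the incomplete row to every observed row in feature space.
        dist = np.linalg.norm(features[observed] - features[i], axis=1)
        nearest = np.argsort(dist, kind="stable")[:k]
        filled[i] = target[observed][nearest].mean()
    return filled


# Hypothetical data: a load value that depends linearly on an hour-of-day feature.
hour = np.arange(1, 9, dtype=float).reshape(-1, 1)
truth = 2.0 * hour.ravel()
load = truth.copy()
load[[2, 6]] = np.nan  # simulate two missing readings
missing = np.isnan(load)

mae_mean = np.abs(mean_impute(load)[missing] - truth[missing]).mean()
mae_knn = np.abs(knn_impute(hour, load)[missing] - truth[missing]).mean()
print(f"mean imputation MAE = {mae_mean:.3f}, kNN imputation MAE = {mae_knn:.3f}")
# prints: mean imputation MAE = 4.000, kNN imputation MAE = 0.000
```

The toy series is deliberately easy for the neighbor-based method; on real energy data the gap between the two approaches can be smaller, and k, the distance metric, and feature scaling would need tuning.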
Previous research articles have highlighted the use of AI in data management, but they have primarily focused on traditional techniques. This review article aims to provide a broader perspective on research efforts in the use of two emerging AI methods, PR/DL, for data management in engineering.

Main Contributions of This Paper
This review article aims to (1) analyze and summarize the methodologies related to PR/DL technologies for data management over the past six years; (2) identify the key challenges, trends, and emerging future directions for using PR and DL in data management; and (3) discuss the limitations, performance, evaluation methods, and scope of the PR/DL methodologies reviewed for data management.
The structure of this review article is as follows: Section 1 introduces several disruptive technologies used in data management and presents the authors' discussion of how AI-integrated advances are being used in conjunction with other technologies to collect and analyze data; the main limitations and contributions of the study are also discussed in this section. Section 2 describes the methodology used in the search and selection of relevant documents for this study. Section 3 presents the literature review, introducing new PR and DL techniques and providing a critical analysis of their relevance, challenges, limitations, and scope for data management; this section also includes a descriptive analysis of the studies. Section 4 discusses potential future research paths and emerging trends for the use of PR/DL methods in data management. Finally, Section 5 concludes the literature review with a brief summary.

Literature Review
This section examines, evaluates, and discusses the primary methodologies and algorithms of the latest generations of PR and DL applications for data management. First, the main concepts of PR/DL are discussed through the theoretical framework. Then, the information from each study is analyzed, extracted, and classified using a systematic literature review approach.

Pattern Recognition
The concept of PR was first introduced by Oliver Selfridge in 1955, who defined it as the extraction of significant features from a background of irrelevant data [27]. More recently, PR has been defined as a scientific discipline that aims to classify input data into classes or patterns by extracting significant properties that allow the classes under study to be separated (i.e., classifying objects into categories). The real-world observations classified by a PR system are captured through sensors. Depending on the application, PR can be applied to images, video, text, electromagnetic signals, the web, sounds, microarray gene data, odors, or any other type of measurement that requires classification [28]. PR has been developed mainly within scientific and engineering disciplines such as biology, medicine, computer vision, AI, feature recognition, digital marketing, computer-assisted diagnostics, and voice recognition, among others [29].
In PR, a pattern is a description of an object: a group of attributes used to define it (a category is determined through shared attributes). A class of patterns is a group of similar patterns. A characteristic (feature) vector is a group of properties that distinguish object patterns; only the properties that differentiate an object are retained.
PR consists of two phases: (1) the learning or training phase (original information retrieval, preprocessing, feature extraction, feature normalization, feature analysis, and feature selection) and (2) the classification or test phase (classifier design and performance evaluation). In the learning phase, the machine is trained through the extraction and selection module of a PR system to recognize specific objects or patterns, and the classifier is trained and calibrated to partition the feature space. In the classification phase, an unknown pattern is compared using the trained classifier and assigned to the class it most closely resembles. The classification error rate is then assessed by the evaluation module [30].
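The two phases described above can be illustrated with one of the simplest classifiers mentioned in this review, a minimum-distance (nearest-centroid) classifier. In the sketch below, the learning phase normalizes the feature vectors and stores one prototype per class, and the classification phase assigns each unknown pattern to the nearest prototype before the error rate is computed. The toy data and the function names `train`, `classify`, and `error_rate` are assumptions made for illustration only.

```python
import numpy as np


def train(features, labels):
    """Learning phase: z-score normalization plus one centroid (prototype) per class."""
    mu, sigma = features.mean(axis=0), features.std(axis=0)
    z = (features - mu) / sigma
    classes = np.unique(labels)
    centroids = np.array([z[labels == c].mean(axis=0) for c in classes])
    return mu, sigma, classes, centroids


def classify(model, features):
    """Classification phase: assign each pattern to the class whose centroid is closest."""
    mu, sigma, classes, centroids = model
    z = (features - mu) / sigma  # reuse the training-set normalization
    dist = np.linalg.norm(z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dist.argmin(axis=1)]


def error_rate(predicted, actual):
    """Evaluation: fraction of misclassified patterns."""
    return float(np.mean(predicted != actual))


# Hypothetical two-feature patterns from two classes.
train_x = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
train_y = np.array([0, 0, 0, 1, 1, 1])
test_x = np.array([[1.1, 0.9], [4.9, 5.2]])
test_y = np.array([0, 1])

model = train(train_x, train_y)
predicted = classify(model, test_x)
print(predicted, error_rate(predicted, test_y))
```

The normalization statistics are computed on the training patterns only and reused in the test phase, so the test data do not influence the learned model.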
In summary, PR is a form of ML, which is itself a field within AI. PR mainly relies on statistical and ML approaches, with an increasing focus on DL methods in recent years. According to Zhang et al. [31], a complete PR system includes data acquisition, processing, feature selection, retrieval, and decision-making classification. For Paolanti and Frontoni [32], depending on the presence or absence of prior knowledge, PR classification methods can be divided into four main groups: (i) supervised, (ii) unsupervised, (iii) semisupervised, and (iv) reinforced. Supervised PR uses labeled data to train the system and make predictions. Training on labeled data involves repeatedly computing a cost function that compares predicted and actual outcomes and adjusting the weights and biases to minimize it; this process relies on the gradient, i.e., the rate at which the cost changes with respect to each weight or bias. Unsupervised PR, for which no labeled classification examples are available, uses multivariate automatic classification algorithms to reveal similarities between feature values and create clusters. Semisupervised PR uses predefined object classes to find new relationships and define new groups, and reinforced PR improves decisions iteratively through a feedback mechanism and a reward system.
Table 1 summarizes the main PR algorithms that use statistical, ML, and DL methods. According to Table 1, the types of problems addressed included classification, regression, and prediction. These structures mainly focused on solutions for artificial vision, computer-assisted diagnostics, acoustic recognition, and optical character recognition, among others. In supervised learning, the main methodologies used were ANN, SVM, RF, decision tree, NB, and KNN algorithms. The most common classifiers were SVM, KNN, NB, and minimum distance. Clustering algorithms such as k-means, balanced iterative reducing and clustering using hierarchies (BIRCH), self-organizing maps, WaveCluster, mean-shift, density-based spatial clustering of applications with noise (DBSCAN), fuzzy k-means, fuzzy C-means, sequential k-means, CHAMELEON, hierarchical clustering, clustering using representatives (CURE), clustering in quest (CLIQUE), the expectation-maximization algorithm, and the statistical information grid-based method (STING) were used. The most commonly used preprocessing methods were feature combination and stepwise selection. For dimensionality reduction and feature selection, the most used methods were principal component analysis, quadratic discriminant analysis, the linear Fisher discriminant, LDA, Bayes' theorem, and wrapper methods. Among statistical methods, generative models, such as the hidden Markov model (HMM), Gaussian distribution, KNN, latent Dirichlet allocation, and Parzen window, were the most used, as were discriminative models such as SVM and decision trees.
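The supervised training loop described above (compute a cost, take its gradient with respect to the weights and bias, and step in the direction that lowers the cost) can be sketched for the simplest possible model: one weight and one bias fitted by least squares. The data, learning rate, iteration count, and the function name `fit_linear_gd` are illustrative assumptions; PR/DL models apply the same idea to many parameters, with the gradients obtained by backpropagation.

```python
import numpy as np


def fit_linear_gd(x, y, lr=0.5, n_iter=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error cost."""
    w, b = 0.0, 0.0
    for _ in range(n_iter):
        residual = (w * x + b) - y            # predicted minus actual outcome
        cost = np.mean(residual ** 2)         # cost function being minimized
        grad_w = 2.0 * np.mean(residual * x)  # d(cost)/dw
        grad_b = 2.0 * np.mean(residual)      # d(cost)/db
        w -= lr * grad_w                      # step against the gradient
        b -= lr * grad_b
    return w, b, cost


# Hypothetical noise-free data generated by y = 3x + 1.
x = np.linspace(0.0, 1.0, 20)
y = 3.0 * x + 1.0
w, b, cost = fit_linear_gd(x, y)
print(f"w = {w:.4f}, b = {b:.4f}, final cost = {cost:.2e}")
```

The learning rate matters: too large a step makes the updates overshoot and diverge, while too small a step needs many more iterations to reach the minimum.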
From Table 1, 54 research articles, four review articles, and two surveys made up the most significant contributions. The most used PR method for data management was the CNN, used in 16% of publications, followed by ANN methodological frameworks, used in 13%. Feature selection methods such as PCA were used in 10% of publications. Data partition methods based on the well-known centroid algorithm, k-means, were used in 7%. NN, SVM, and LDA algorithms were each used in 5% of publications, and SOM in 3%. Finally, methods such as hierarchical cluster analysis, random forest, the concentrated method, Bayesian networks, mixture regression, radial basis neural networks, deep belief networks, dynamic time warping, SAE, Gaussian distribution, hidden Markov models, spiking neural networks, fast Fourier transforms, k-medoids, and autoregressive models each accounted for 2% of the publications.
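Because centroid-based partitioning with k-means appears frequently among the PR methods in Table 1, a compact version of Lloyd's algorithm may help show how it works: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until nothing changes. The farthest-point initialization used here is a simple deterministic choice made for this sketch; practical implementations commonly use k-means++ seeding or several random restarts. The data and the function name `kmeans` are hypothetical.

```python
import numpy as np


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    # Initialization: start from the first point, then repeatedly add the point
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[dist_to_nearest.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(max_iter):
        # Assignment step: label each point with its nearest centroid.
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


# Hypothetical 2-D data with two well-separated groups.
pts = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(pts, k=2)
print(labels)
print(centroids)
```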

Deep Learning
When selecting a DNN, it is important to consider the type of classification or pattern recognition task and whether unsupervised learning can be used. DNNs are ML models that can be used for both supervised and unsupervised learning and are effective in analyzing large amounts of data. They are characterized by having many layers, which leads to a higher level of complexity [92]. The number of layers and the type of neural network are chosen in advance, and the weights are then determined by the training process. Currently, popular types of DNNs include the multilayer perceptron (MLP) [93], convolutional neural network (CNN) [94], recurrent neural network (RNN) [95], generative adversarial network (GAN) [96], deep belief network (DBN) [97], SAE [98], and graph neural network (GNN) [99]. Some authors refer to DNNs in general without specifying the type of architecture. Table 2 provides a summary of the main DL methods used for data management. According to Table 2, the main types of problems addressed using DL methods were pattern classification, feature learning, and feature representation. These methods were primarily used in computer vision, process control, multimedia analysis and understanding, image recognition, image super-resolution, data recovery, understanding, transmission, and NLP. The literature review showed that in supervised learning, the most used techniques were CNNs, DNNs, and RNNs, including gated recurrent unit (GRU) and LSTM approaches. In unsupervised learning, generative networks, dimensionality reduction, and clustering (GANs, autoencoders, restricted Boltzmann machines (RBMs), etc.), as well as RNNs, including GRU and LSTM approaches, were widely used. For sequential data processing and retrieval tasks, such as image/video/music analysis and NLP, RNNs were commonly used, and their extension, recursive neural networks, has been shown to be effective.
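The point that a DNN is a stack of layers whose sizes are chosen in advance, while the weights come from training, can be seen in a minimal multilayer perceptron forward pass. In this sketch the layer sizes, ReLU hidden activations, softmax output, He-style random initialization, and the function names `init_mlp` and `forward` are assumptions; the weights are random rather than trained, since training (for example, by backpropagation) is outside the scope of the example.

```python
import numpy as np


def init_mlp(layer_sizes, seed=0):
    """Create one (weights, bias) pair per layer for the given layer widths."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def forward(params, x):
    """Apply ReLU hidden layers, then a softmax output layer giving class probabilities."""
    h = x
    for weights, bias in params[:-1]:
        h = np.maximum(0.0, h @ weights + bias)
    weights, bias = params[-1]
    logits = h @ weights + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


# A small "deep" network: 4 input features, two hidden layers of 16 units, 3 output classes.
params = init_mlp([4, 16, 16, 3])
batch = np.random.default_rng(1).normal(size=(5, 4))  # five hypothetical input patterns
probs = forward(params, batch)
print(probs.shape, probs.sum(axis=1))
```

Changing the list passed to `init_mlp` is all it takes to make the network deeper or wider, which mirrors the design decision the text describes.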
From Table 2, 54 research articles, 1 literature review article, and 5 surveys were found to be the most significant contributions in the field of DL for data management. The most used DL method was the CNN, used in 22% of the publications, followed by DNNs, used in 13%. Other methods commonly used for pattern identification and classification included DBNs, RNNs, MLPs, and SAEs, each used in 10% of the publications. Another unsupervised DL method, GANs, was used in 8% of the publications. Finally, GNNs, a type of algorithm designed to perform inference on data represented as graphs, were used in 7% of the publications.
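Since CNNs were the most used DL method in the reviewed studies, it may help to show the operation that gives them their name. The sketch below slides a small kernel over an image and sums the element-wise products (a "valid" 2-D cross-correlation, which is what common DL libraries compute under the name convolution), then applies a ReLU to obtain a feature map. The image and kernel values and the function name `conv2d_valid` are arbitrary illustrations; in a CNN the kernel values would be learned during training.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation: no padding, stride 1, kernel not flipped."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


image = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])
kernel = np.array([[0.0, 1.0],
                   [2.0, 3.0]])
feature_map = np.maximum(0.0, conv2d_valid(image, kernel))  # ReLU activation
print(feature_map)
# [[25. 31.]
#  [43. 49.]]
```

Because the same small kernel is reused at every position, a convolutional layer needs far fewer weights than a fully connected layer over the same input.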

Methodology
A systematic literature review (SLR) was conducted, which involved a thorough search of the most relevant pattern recognition/deep learning (PR/DL) artificial intelligence studies in the field of data management. SLRs are studies that synthesize the available scientific evidence. In this study, the qualitative and quantitative aspects of previous studies were reviewed with the aim of analyzing, classifying, and summarizing the existing information on the topic. The SLR methodology included a content analysis, a valid technique for studying scientific documents [165], which was used to read the articles thoroughly and select the relevant information from each of the studies included in the SLR. To conduct an SLR successfully, it is necessary to have a good understanding of the methodologies used in previous studies [166][167][168][169][170][171][172][173].
The goal of this literature review was to examine the main PR/DL methodologies as reliable and efficient tools in the field of data management. The methodological development of this study provided an understanding of the most recent PR/DL algorithms used in data management in engineering. Hamid et al. [174] used an SLR to analyze the methodological development of AI for big data processes in order to identify the main challenges and gaps in the use of application systems for smart tourism.
Ribeiro-Navarrete et al. [175] used an SLR to study AI technologies for controlling COVID-19 and future pandemics by analyzing the collection and large-scale management of data from users' mobile devices. Similarly, Garg and Mago [176] used an SLR to summarize and describe the role of various ML techniques in medical applications.

Study Search
The research followed the PRISMA methodology [177,178], as outlined in Figure 3.
Articles from 2016 to 2021 were included in the study. The search was conducted using Google Scholar and digital platforms such as Springer Link, Emerald Insight, ScienceDirect, Wiley Online Library, Taylor & Francis Group, and IEEE Xplore Digital Library, among others. The main focus of the search was to identify the key types of studies, domains, case structures, and algorithms that contributed to the development of methodological proposals for data management using PR and DL. The initial search for the term ML returned over four million results; therefore, the search had to be refined using advanced search options. This included combining the following keywords with AND/OR connectors: "data management" AND "deep learning", "data management" AND "pattern recognition", "data management" AND "artificial neural networks", and other terms for techniques used to retrieve, analyze, and manage big data, restricted to methodological proposals for PR/DL.
As a result, 1087 articles were identified in the first step. The second step was the selection of the most relevant titles, and the third step was the reading of the abstracts. The fourth and final step was a complete reading of the articles. The remaining studies were then reviewed against the exclusion criteria (Table 3), and 120 articles were finally selected for the SLR.

Study Selection
The strategy for selecting studies in this research included six inclusion criteria and seven exclusion criteria. The articles had to address the research question of the study and focus on PR/DL methodologies using AI. The selection process also considered two quality measures: the h5-index from Google Scholar [179] and the SJR quartile (Q1 and Q2) from the SCImago Journal Rank [180]. A summary of these criteria can be found in Table 3.

Table 3. Inclusion and exclusion criteria.
Inclusion criteria: (1) studies directly answer the research question of the study; (2) studies clearly show an AI focus using PR or DL in a large portion of their methodology; (3) if a study was published in more than one journal or conference, the most recent version is included.
Exclusion criteria: (1) not written in English; (2) lack of focus on domains and research areas; (3) no use of PR/DL methodologies.


Descriptive Study Analysis
Figure 4, generated with the VOSviewer software, summarizes the main keywords found in the SLR. The distribution of publications by quartile shows that the study was mainly limited to high-impact journals in the engineering area: 116 studies belonged to quartile one and only four to quartile two [39,42,52,135] (Figure 5). The number of references per year, shown in Figure 6, confirms the growing interest of researchers in PR/DL methodological proposals for data management; among the studies that met the inclusion and exclusion criteria, 31 were published in 2021.
The distribution of studies by domain shows that the SLR covered a total of eight domains with 120 articles. The DL image analysis domain provided the largest contribution, with 15 studies, followed by DL forecasting (13), PR control systems (13), PR architecture design (11), PR fault detection (9), PR-CAD (7), DL data imputation (5), and DL security (2) (Figure 7).
According to the conceptual framework of the SLR, the 13 areas with the greatest methodological contributions were (1) DL computing (20 studies), (2) PR technology (10 studies), (3) PR infrastructure (8 studies), (4) engineering solutions in DL medicine (7 studies), (5) PR/DL manufacturing processes (6 studies), (6) PR/DL transport, electrical energy, and telecommunications (5 studies each), (7) PR agriculture (3 studies), (8) smart decision-making tools in PR/DL business (2 studies each), and (9) DL marketing, weather conditions, and information security (1 study each) (Figure 8).
The journals with the most publications were IEEE Access (eight publications), Nature Communications (six), and Pattern Recognition (five) (Table S1). Table S1 lists the author, year of publication, journal name, h5-index (median), SJR index, SJR quartile, and the number of citations that each article had received as of the aforementioned date.
The country with the most research on methodological proposals for PR/DL was China, with 43 publications, followed by the USA with 19 publications and the United Kingdom with 5 publications (Figure S1). The ScienceDirect and IEEE databases had the most publications on this topic, with 39 publications each, followed by the Springer database with 12 publications (Figure S2).
In the SLR, the main performance metrics used to evaluate the DNN algorithms were the correlation coefficient (R), mean absolute percentage error (MAPE), area under the curve (AUC), mean absolute error (MAE), root mean squared error (RMSE), variance of absolute percentage error (VAPE), and root mean square prediction error (RMSEP). Through an analysis of these statistical indicators, accuracy was found to be the most influential factor, accounting for 27% of the metrics, closely followed by performance (24%), convergence (18%), speed (11%), tolerance to failure (7%), volume (6%), scaling (4%), and latency (3%).
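As a rough illustration of how several of these indicators are computed, the sketch below implements MAE, RMSE, MAPE, VAPE, the correlation coefficient R, and AUC in plain Python. It is a minimal sketch, not the procedure used in the reviewed studies: the function names are hypothetical, VAPE is assumed here to be the population variance of the absolute percentage errors, and RMSEP is taken to be the RMSE formula applied to predictions on held-out data, so it is not coded separately.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Error metrics for paired observations and predictions.

    Assumes y_true contains no zeros, because MAPE and VAPE divide by it.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_pct = [abs(e / t) * 100.0 for e, t in zip(errors, y_true)]

    # Pearson correlation coefficient R between observations and predictions.
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)

    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAPE": sum(abs_pct) / n,
        # Assumption: VAPE as the population variance of absolute percentage errors.
        "VAPE": statistics.pvariance(abs_pct),
        "R": cov / math.sqrt(var_t * var_p),
    }


def auc(labels, scores):
    """Area under the ROC curve as the probability that a random positive
    is scored above a random negative (ties count as one half)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(regression_metrics([2, 4, 5, 8], [3, 4, 4, 8]))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

With these sample values, MAE is 0.5, MAPE is 17.5%, and AUC is 0.75. AUC is computed from the pairwise-ranking definition, which avoids having to build the ROC curve explicitly.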
The study found that current DNN models, such as the CNN, DBN, SAE, RNN, MLP, and GAN, were the most used PR techniques among ANN models. Other models, such as reservoir computing, the time-delay neural network (TDNN), transformers, SOMs, the radial basis function network (RBFN), the single-layer perceptron (SLP), the probabilistic neural network (PNN), and the RBM, also showed good performance in PR applications. Additionally, new methodological proposals were put forward to address limitations or issues that may arise with DNN models.

Discussion
The 4.0 data ecosystem, which merges the real and virtual worlds, requires the integration of AI and BD to foster ML systems. The interconnectivity between systems and computers enables the processing of BD from the IoT and local supply chains. Analyzing BD is possible because of the MI generated through AI techniques. These ML systems can now extract valuable insights by turning BD into actionable ideas and, in many instances, they enable autonomous decision making without human intervention.
BD analysis is a complex process involving multiple variables that need to be identified and processed. If the goal of an ML model is to predict a continuous variable, it performs regression; if the objective is to predict a discrete variable, it performs classification. Forecasting is a high-value research domain that, driven by ML technologies, has been a trend in scientific discussion in recent years. The SLR identified important PR/DL methodological advancements in areas such as forecasting of energy consumption, climate, commodity prices, traffic flow, accidents, and tourist demand, among other topics. These constant methodological advancements have led to the development and improvement of DNN proposals with exceptional predictive performance. As a result, DNNs have become an essential tool for PR in a wide range of applications, such as image classification, iris scanning, object detection, fingerprint scanning, video compression, and optical character recognition. In comparison, traditional PR methods have certain disadvantages: they require greater effort to design and learn meaningful, high-level features, they are limited to learning superficial features, and they need well-labeled BD to train the models, which restricts them to learning from static data.
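The regression/classification distinction can be made concrete with a toy example. The sketch below, which uses made-up temperature and consumption values and hypothetical helper names, fits an ordinary least-squares line to predict a continuous quantity (regression) and a nearest-centroid rule to predict a discrete label (classification) from the same input feature. Neither model is taken from the reviewed studies.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - b * mean_x, b


def fit_centroids(xs, labels):
    """Mean feature value of each class (the 'training' step of nearest centroid)."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


def classify(x, centroids):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Hypothetical data: temperature (input), energy use (continuous), demand level (discrete).
temperature = [10, 15, 20, 25, 30]
energy_use = [100, 120, 140, 160, 180]
demand = ["low", "low", "low", "high", "high"]

a, b = fit_line(temperature, energy_use)
print("regression:", a + b * 22)  # continuous prediction
print("classification:", classify(22, fit_centroids(temperature, demand)))  # discrete label
```

For an input of 22, the regression branch returns 148.0 and the classifier returns "high": the same feature feeds both tasks, and only the type of target changes.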
In PR systems, the performance of an algorithm mainly depends on its parameters. The algorithms used must be able to take into account the different factors that affect the distribution of input data, as this allows them to cope with real-world variation. Algorithms that can abstract and process a wide range of phenomena have the strongest predictive power. While DL methods require a larger volume of data than statistical and ML methods, the performance of all three positively correlates with the volume of data processed.
According to the SLR, the success of PR applications relies heavily on the methodological advancements in statistical and ML methods, with ML and its evolution towards DL being powerful and efficient tools for managing BD.The constant evolution and development of DL methodologies has led to a new paradigm for ML, where the constant optimization of its algorithms results in new and improved capabilities.
PR methods have a clear capacity for managing big data; their ability to identify specific trends and patterns gives them a significant advantage, making PR an essential support tool for decision-making in engineering.PR's capacity for data retrieval, classification, and categorization allows it to gain experience and acquire knowledge, resulting in improved accuracy and efficiency.PR methods are precise in handling multivariable and multidimensional data and can perform well in changing or unknown environments.
The SLR on PR applications in engineering for data management identified several challenges, including long processing times, mainly due to the complexity and scale of the process (handling data volumes of megabytes, gigabytes, and even terabytes). Insufficient data can lead to accuracy problems and complicate object recognition logic. The SLR also highlighted other challenges, such as overfitting, difficulties in object detection, noise (image restoration), uncertainty, difficulty in recognizing identical images, data issues (incomplete, dispersed, and dynamic data), complex relationships between fields, challenges in voice recognition accuracy, hierarchy, and data, the complexity of DNN models, recognition issues with surface materials, interpretability of results, incorporation of domain knowledge, more accurate recognition of unsupervised activities, active user interaction, flexible models for recognizing high-level activities and integrating with other systems, and the recognition of planned attacks, among others.
According to the SLR, understanding which aspects of the input data drive the decisions of ML and DL models requires insight into the inner workings of the black-box predictive models that run them [32]. Thus, the decision-making process in DNNs has become a subject of active research.
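One widely used, model-agnostic way to probe which inputs drive a black-box model's decisions is permutation feature importance: shuffle one input column and measure how much the error grows. The sketch below is an illustrative assumption rather than a method attributed to the SLR or to [32]; the `predict` callable stands in for any trained model, and the toy model, data, and function names are hypothetical.

```python
import random


def mae(y_true, y_pred):
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, metric=mae, n_repeats=10, seed=0):
    """Average increase in error when each feature column is randomly shuffled.

    X is a list of rows (lists); predict maps one row to a prediction.
    """
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(metric(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical black box that, unknown to the analyst, only uses feature 0.
def black_box(row):
    return 3.0 * row[0]


X = [[float(i), float(i % 3)] for i in range(20)]
y = [3.0 * i for i in range(20)]
print(permutation_importance(black_box, X, y))
```

Shuffling feature 1 leaves every prediction unchanged, so its importance is exactly zero, while shuffling feature 0 raises the error. The method only needs the model's predictions, which is why it applies to DNNs treated as black boxes.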
Currently, DL methods, particularly CNN architectures, have shown remarkable success in PR tasks. According to the SLR, CNNs account for the largest number of methodological proposals, for tasks such as emotion recognition, image recognition, facial expression recognition, video recognition, and voice and text recognition. In general, CNNs have performed well on image data, PR, classification, and regression [111], and offer accuracy advantages for solving real-life problems in BD management. One of the notable advantages of CNNs is their ability to provide greater precision and improve system performance due to their distinctive features, such as local connectivity and shared weights [66].
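To make local connectivity and shared weights concrete, the sketch below computes a single-channel 2D convolution (strictly, the cross-correlation that CNN libraries implement) with stride 1 and no padding. It is a minimal illustration with a hand-picked edge-detecting kernel, not a trained CNN layer; the function name and example values are assumptions.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation of a single-channel image, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Local connectivity: output (i, j) sees only a kh x kw patch of the input.
            # Shared weights: the same kernel values are reused at every position.
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# A 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]
vertical_edge = [[-1, 1], [-1, 1]]
for row in conv2d(image, vertical_edge):
    print(row)
```

Each output row is [0.0, 2.0, 0.0], so the response peaks where the edge is. The operation uses only the 4 kernel weights, whereas a fully connected layer mapping the 16 input pixels to the 9 outputs would need 144 weights, which is one way to see why shared weights make CNNs efficient on images.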
CNNs have several advantages over other DNN models. These include being more similar to the human visual processing system, having a highly optimized structure for processing 2D and 3D images, and being effective in learning and identifying abstract 2D features [32]. Additionally, CNNs are an effective method for diagnosing failures [118], detecting lanes [181], imputing data [124], extracting features [135], classifying images [145], classifying audio [48], and building recommendation systems [182], among other uses.
Additionally, SAEs can be used in encoding-decoding processes to learn representations of high-level features through an unsupervised learning scheme. This type of network has been used for image quality assessment [114,183], handling noise in data [121,184], computer vision [110,185], process analytics [147], and forecasting [155], among other applications.
Similarly, RNNs are efficient prediction algorithms for PR on sequential data. Because their recurrent structure is built around data sequences, they retain valuable information from earlier steps, making them an excellent tool for sequential ML tasks such as voice processing and NLP. They are used in predictive models for electrical energy consumption [108], urban traffic [115], early emergency warnings [133], and failure detection [150], among others.
Adversarial learning has shown significant advancements in generative models, with GANs, which specialize in unsupervised learning, being one of the best examples. In a GAN, a generator competes against a discriminator. The generator attempts to model the data distribution by generating fake images from a noise vector input and uses them to deceive the discriminator, while the discriminator learns to distinguish real images from fake ones [131]. GANs have proven effective in applications such as enhancing image quality [106], imputing data [120], automatic image recognition [129], computer security [144], and generating synthetic images [149].
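The recurrence that lets RNNs carry information along a sequence can be shown with a scalar Elman-style cell, h_t = tanh(w_xh·x_t + w_hh·h_{t-1} + b). The sketch below is a simplification under stated assumptions: real RNN layers use weight matrices and learned parameters, whereas here the weights are fixed, hypothetical scalars chosen only to show that the hidden state retains a decaying memory of earlier inputs.

```python
import math


def rnn_forward(inputs, w_xh, w_hh, b_h=0.0, h0=0.0):
    """Run a scalar Elman RNN over a sequence and return every hidden state.

    h_t = tanh(w_xh * x_t + w_hh * h_{t-1} + b_h)
    """
    h = h0
    states = []
    for x in inputs:
        # The previous state h feeds back into the update: this is the recurrence.
        h = math.tanh(w_xh * x + w_hh * h + b_h)
        states.append(h)
    return states


# A single impulse followed by zeros: the state decays but does not vanish at once.
with_memory = rnn_forward([1.0, 0.0, 0.0], w_xh=1.0, w_hh=0.5)
without_memory = rnn_forward([1.0, 0.0, 0.0], w_xh=1.0, w_hh=0.0)
print(with_memory)
print(without_memory)
```

With w_hh = 0.5 the second and third states stay positive even though their inputs are zero, whereas with w_hh = 0 the cell forgets the impulse immediately; the recurrent weight is what gives the network its memory of the sequence.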
DBNs (deep belief networks) are generative graphical models composed of multiple stacked RBMs. They are commonly used for tasks such as voice recognition, NLP, and image and audio classification. DBNs have connections between layers but not within a single layer, and these layers can be trained using unsupervised algorithms [66]. DBNs have been used in various applications, such as failure diagnostics [105], data imputation [116], urban traffic flow [125], facial expression recognition [130], price forecasting [143], and control systems [152].
MLPs, on the other hand, are a common type of network based on a simpler network called the "simple perceptron." They have one or more hidden layers and an output layer, and information flows in one direction only (feedforward). The input layer introduces and propagates information from the outside, and its number of neurons must match the number of inputs in the data. The hidden layers perform the nonlinear processing of information. The number of neurons in the input and output layers varies depending on the specific application. MLPs are commonly trained with the backpropagation algorithm, which searches for the connection weights that make the network output closely match the desired output by minimizing the error. MLPs are relatively easy to implement and produce high-quality models with relatively low training times compared to more complex methods. They are commonly used in applications such as quality control [101], time series forecasting [112], CC [119], information life cycles [127], data imputation [21], and automated decision making [142].
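The backpropagation procedure described above can be sketched for a tiny MLP with one sigmoid hidden layer trained on the XOR problem, a classic case that a single perceptron cannot solve because the classes are not linearly separable. This is a minimal pure-Python illustration under assumed settings (four hidden neurons, squared-error loss, learning rate 0.5, fixed random seed); the class and function names are hypothetical, and it is not an implementation from the reviewed studies.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """Feedforward network: inputs -> sigmoid hidden layer -> one sigmoid output."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w1, self.b1)]
        output = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, output

    def train_step(self, x, target, lr):
        """One backpropagation update for the loss E = 0.5 * (output - target) ** 2."""
        hidden, y = self.forward(x)
        delta_out = (y - target) * y * (1 - y)  # dE/d(output pre-activation)
        # Propagate the error back through the (not yet updated) output weights.
        delta_hidden = [delta_out * self.w2[k] * h * (1 - h) for k, h in enumerate(hidden)]
        for k, h in enumerate(hidden):
            self.w2[k] -= lr * delta_out * h
        self.b2 -= lr * delta_out
        for k, ws in enumerate(self.w1):
            for i, xi in enumerate(x):
                ws[i] -= lr * delta_hidden[k] * xi
            self.b1[k] -= lr * delta_hidden[k]


def mean_loss(net, data):
    return sum(0.5 * (net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
net = TinyMLP(n_inputs=2, n_hidden=4, seed=1)
loss_before = mean_loss(net, XOR)
for _ in range(5000):
    for x, t in XOR:
        net.train_step(x, t, lr=0.5)
loss_after = mean_loss(net, XOR)
print(loss_before, loss_after)
print([round(net.forward(x)[1], 2) for x, _ in XOR])
```

Whether training reaches a near-zero loss depends on the random initialization; the sketch only shows gradient-descent weight updates driven by backpropagated errors, so a correct XOR solution should not be assumed for every seed.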
Recent research found in the SLR suggests a growing interest in GNN methodologies for graph data arising from different real-world scenarios. GNNs are DNNs that operate on graphs, in which each node aggregates information from its neighboring nodes; in other words, data are transmitted between nodes and incorporated into the properties of the receiving node (message passing).
GNNs are a powerful tool for unsupervised learning and are commonly used for forecasting, imputing missing data, and 3D modeling in a wide range of fields, such as social networks, movement planning, knowledge graphs, recommendation systems, molecular compositions, search engines, power blackouts, and financial markets, among others [157]. They are typically used for tasks such as node classification, link prediction, clustering, and graph classification. The GNN process consists of four phases: graph preprocessing, graph construction, graph representation, and graph classification [148]. GNNs take a formatted graph as input and produce a vector of numeric values that represents important information about the nodes and their relationships. This output is an embedding, a vector representation of each node's data together with the knowledge it has gathered from other nodes in the graph. In summary, the SLR mainly proposed the use of convolutional GNN architectures, followed by other taxonomies, such as recurrent GNNs, graph autoencoders, and spatial-temporal GNNs.
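The message-passing idea can be sketched as a single graph layer in which each node averages its neighbors' feature vectors, combines the result with its own features through two weight matrices, and applies a ReLU, loosely in the style of a mean-aggregation (GraphSAGE-like) layer. This is a toy under explicit assumptions: the weights are fixed identity matrices rather than learned, and the graph, features, and function names are hypothetical.

```python
def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def message_passing_layer(features, neighbors, w_self, w_neigh):
    """One synchronous message-passing step with mean aggregation and ReLU.

    features: node -> feature vector; neighbors: node -> list of adjacent nodes.
    """
    updated = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Messages: the neighbors' current feature vectors, averaged per dimension.
            agg = [sum(features[u][d] for u in nbrs) / len(nbrs) for d in range(len(h))]
        else:
            agg = [0.0] * len(h)
        combined = [s + m for s, m in zip(matvec(w_self, h), matvec(w_neigh, agg))]
        updated[node] = [max(0.0, c) for c in combined]  # ReLU
    return updated


def graph_readout(node_embeddings):
    """Mean of node embeddings: a simple whole-graph vector for graph classification."""
    vectors = list(node_embeddings.values())
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Hypothetical path graph a - b - c with 2D node features.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
identity = [[1.0, 0.0], [0.0, 1.0]]

embeddings = message_passing_layer(features, neighbors, identity, identity)
print(embeddings)
print(graph_readout(embeddings))
```

Node b, which sits between a and c, ends up with the embedding [1.0, 1.5], a blend of its own features and those of both neighbors; stacking more layers would let information travel further across the graph. The readout function shows how node embeddings can be pooled into one vector for graph classification.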

Conclusions
The growth of BD is driven by key technological trends, such as IoT technology, Industry 4.0, and the data paradigm 4.0. The effective management of BD requires the ongoing development of ML concepts, in which the combination of PR/DL methods should lead to reliable and efficient data management frameworks. This study focused on the field of engineering and specifically examined the PR/DL AI techniques currently used in diverse data management applications. This document complements the study of Joel et al. [186], which proposes a conceptual framework for the analysis of major emerging ML methodology proposals. Due to space limitations, the review of each article focused on the type of research, topic content, case structure considered, and AI methodology used.
This review highlighted the latest developments in PR/DL methodological proposals for data management and provided a general overview of these methods. The research articles were selected from highly reputable and relevant journals with a high SJR and Google Scholar h5 index. The review covered the period from 2016 to 2021; a total of 186 studies were considered, of which 120 were included in the SLR.
This SLR encompassed various methodological supports that complement PR/DL solutions, such as component analysis, support vector machines, hierarchical cluster analysis, K-means clustering, LDA, the focused method, decision trees, self-organizing maps, RF, the hidden Markov model, NB, mixture regression, K nearest neighbors, dynamic time warping, the Gaussian distribution, fast Fourier transforms, k-medoids, and the autoregressive model. These methods were applied to a variety of data types and engineering applications over the past six years.
The literature suggests that CNN architectures have contributed significant advancements as predictive PR solutions for data mining in engineering problems. SAEs, RNNs, DBNs, GANs, and GNNs were also highlighted as notable methods. The fields of computer science and technological development had the highest number of methodological proposals for PR/DL solutions. Research on PR/DL proposals was led by China, followed by the United States. The ScienceDirect, IEEE, and Springer databases and the journals IEEE Access, Nature Communications, and Sensors provided the most significant contributions.
This study found that PR/DL methods have been adapted and increasingly used to address various issues in data management. This conclusion was supported by other literature reviews, such as [28,30,54,186]. These studies indicated a positive correlation between new PR/DL technologies and emerging AI methods for big data management, which is a crucial aspect of Industry 4.0. The findings of this study aligned with the conclusions of other authors [2,66,78,148], who claimed that PR/DL methods have greater capabilities for resolving complex interactions of parameters in data management, making it possible to solve problems that are difficult to address using traditional statistical methods.
As ML becomes more prevalent, there is an increasing need to acquire larger amounts of data, particularly for next-generation DNNs.DL is a powerful tool for interpreting engineering data.More advanced DNN methods focus on the properties that define the object, which results in faster processing speeds and improved recognition efficiency, even with low-quality patterns.
The use of multiple clustering algorithms presents a challenging problem.The SLR suggested methodological advancements that involve the deliberate or random selection of certain clustering techniques, with modifications to the initial conditions and the use of different subgroups for parameters.
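One simple reading of varying the initial conditions is k-means with several random initializations, keeping the run with the lowest within-cluster sum of squares (inertia). The sketch below shows only that restart strategy, under assumed settings (Euclidean distance, a fixed seed, hypothetical 2D points and function names); it is not the specific ensemble-clustering procedure proposed in the reviewed studies.

```python
import random


def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans_restarts(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means repeated from n_init random starts; returns (inertia, centers)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centers = rng.sample(points, k)  # a different random initial condition per run
        for _ in range(max_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda c: dist2(p, centers[c]))
                clusters[nearest].append(p)
            # Keep the old center if a cluster ends up empty.
            new_centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
            if new_centers == centers:
                break
            centers = new_centers
        inertia = sum(min(dist2(p, c) for c in centers) for p in points)
        if best is None or inertia < best[0]:
            best = (inertia, centers)
    return best


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
inertia, centers = kmeans_restarts(points, k=2)
print(inertia, sorted(centers))
```

Because a single k-means run can settle in a poor local optimum that depends on its starting centers, comparing several runs by inertia is a cheap safeguard; more elaborate schemes combine runs that also differ in algorithm or parameter subsets.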
The SLR produced several key findings:
(1) In the field of detection systems, PR/DL methods have the potential to become next-generation approaches for engineering evaluations, mainly in the diagnostics of rollers, bearings, gear assemblies, electrical systems, engines, and pumps.
(2) In the field of computer-assisted image and diagnostic analysis, the SLR showed that PR/DL methods can process dimensional graphical information from sensors; the main methodological contributions focused on medical image analysis, location detection, visual recognition, facial expressions, CC, automatic image quality, remote sensing, substance and microorganism detection, quality control, color mixture recognition, chemical patterns, structural health, and detailed visual recognition.
(3) The SLR suggested that PR/DL methods could be used in neural network architectural design and control systems to describe outdoor/indoor systems, classify objects, analyze sensitivity parameters, detect emergency events, control urban traffic, provide robotic assistance, and support network topology and network interpretation, among others.
(4) In the field of forecasting, PR/DL methods were mainly used for time series analysis of urban traffic, commodity pricing, process analytics, consumption profiles, emergency events, demand, flow regimes, and tool wear, among others.
(5) The fields of data imputation and information security had relatively few solution methodologies; some DL solutions were found for power data, traffic data, structural sensor data, IoT scenarios, and facial image manipulation.
The latest advanced PR/DL methods have driven numerous changes, large and small, that have had a positive impact on society. AI is making inroads from all angles into areas that were previously accessible only to human intelligence. Its full potential is within reach, provided that it is developed ethically and sustainably.
According to the results of the SLR analysis, future studies should include more in-depth research to better clarify problems and object location and to improve scalability with a larger number of layers. Research should also address scaling or normalization, encoder-decoder architectures in NLP modeling and automatic translation, large-scale graphic processing performance, control graphics performance, solutions to vanishing or exploding gradients during training (backpropagation), challenges in selecting parameters for ANN design in relation to PR, challenges in computer vision and NLP for PR, information security problems (advanced persistent threats), and new challenges in failure recognition, among others. A multidisciplinary research team is needed to provide a comprehensive and complementary PR/DL approach that enhances the performance and success of applications.

Figure 1. Research perspective on the use of different AI methods in engineering.

Figure 2. Conceptual framework proposed for the classification of the methodological proposals.

1.2. Previous Research
AI and engineering are commonly used terms in the field of data analysis. This integrated term, along with technologies such as the Internet of Things (IoT), cloud computing

Figure 3. PRISMA flow diagram in three levels.

h5 index ≥ 40.
Data management capacity in engineering.
SJR index ≥ 40.
Scientific publications not reviewed by peers.
Studies must be classified under journal quartile Q1 or Q2, preferably Q1.
Studies must be published between 2016 and 2021.
Number of citations.
Database reputation.
Number of citations, excluding publications from 2021.

Figure 4, using the VOSviewer software, summarizes the main keywords found in the SLR.

Figure 5. Distribution by quartile based on the SJR index.

Figure 6. Distribution of references by year of publication.

Figure 7. Distribution of studies by domains.

Figure 8. Distribution of references by study area.

Figures 9-11 show, in order, the distribution trends of the Google Scholar h5 index, the SJR index, and the number of citations received by each reference as of 2/20/2022.

Figure 9. Distribution of the h5 index in the SLR.

Figure 10. Distribution of the SJR index in the SLR.

Figure 11. Distribution of SLR references by number of citations.

Table 1. PR applications in engineering for data management.

Type of Study | Domain | Case Structure | Used PR Method for Data Management
Research article [60] | Characteristic retrieval, selection, and classification | Control graphics pattern behavior | Radial basis function neural network
Research article [61] | Characteristic classification | Prosthetic control for upper extremities | Linear discriminant analysis (LDA)
Research article [62] | Photonic device design | Dimensional reduction technique | PCA

Table 2. DL applications in engineering for data management.