Artificial Intelligence Methodologies for Data Management

This study analyses the main challenges, trends, technological approaches, and artificial intelligence methods developed by new researchers and professionals in the field of machine learning, with an emphasis on the most outstanding and relevant works to date. This literature review evaluates the main methodological contributions of artificial intelligence through machine learning. The methodology used to study the documents was content analysis; the basic terminology of the study corresponds to machine learning, artificial intelligence, and big data between the years 2017 and 2021. For this study, we selected 181 references, of which 120 are part of the literature review. The conceptual framework includes 12 categories, four groups, and eight subgroups. The study of data management using AI methodologies presents symmetry in the four machine learning groups: supervised learning, unsupervised learning, semi-supervised learning, and reinforced learning. Furthermore, the artificial intelligence methods with more symmetry in all groups are artificial neural networks, Support Vector Machines, K-means, and Bayesian Methods. Finally, five research avenues are presented to improve the prediction of machine learning.


Introduction
Information asymmetry in business based on data management can be reduced by using machine learning (ML) techniques, allowing free competition between market agents. Information asymmetry in data management comes from two sources: (i) patterns of public information not observed by some actors in the negotiation and (ii) actions carried out by an economic actor that are difficult to read by the rest of the market. The concept of ML has contributed to the new industrial revolution (industry 4.0) in particular, with the massive use of big data (BD) and cloud techniques. Obtaining information through real-time data processing is a strategy that offers competitive advantages for decision-making, regardless of the size and commercial sector of the organization.
The machine learns by improving its calculation results without human intervention. The machine needs three fundamental elements for this learning: process data, communication with the cloud and BD, and calculation models. These three elements require technological advances in data science, ML, and artificial intelligence (AI).
antecedents and consequences of the adoption and integration of robotics by companies in their customer service, technology marketing, and information technology operations.
Recently, Hoyer et al. [23] proposed a new framework to understand the role of new technologies powered by AI (internet of things (IoT), augmented reality, virtual reality, mixed reality (MR), virtual assistants, chatbots, robots, blockchain, and 3D printing) in the customer/buyer process. In addition, Duan et al. [24] and Liebowitz [25] analyzed the advancement of AI technology and its capacity to process BD for decision-making. Duan et al. put forward twelve research proposals for AI in information systems. Kokina and Davenport [26] discussed cognitive AI capabilities in auditing processes, with four large accounting firms launching numerous projects. Singh et al. [27] developed a conceptual framework of companies' capabilities to operate with "one voice" to offer fluid, harmonious, and reliable interactions through various interfaces. Kreutzer and Sirrenberg [28] evaluated the capacity of AI systems for prediction and profiling of potential customers, conversational commerce, sentiment analysis, and the creation and distribution of content. Furthermore, Heller et al. [29] proposed an integrated framework to automate services based on augmented reality.
The articles in this debate highlight methodological advances aimed at developing AI applications, mainly in the service industry (obtaining and managing data to aid decision-making). Methods such as RPA, PR, and ML have seen remarkable developments and increased use in database development and optimization in recent years. Recently, to address the limitations of RPA, authors such as Berruti et al. [30] have proposed intelligent process automation, which refers to the combination of AI, ML, and cognitive automation.
The popularity of ML is by and large due to the availability of powerful new computing tools and hardware and the increasing ease of generating and accessing large datasets. Taking all web search categories into account, a Google Trends analysis [31] (Figure 1) of the popularity of ML, BD, and AI reveals an increase in the arithmetic mean for ML since 2016, reaching a peak between 2018 and 2020. Meanwhile, interest in AI was stable, without high peaks, from 2011 to 2016. However, the popularity of AI increased from 2016 to 2018, which corresponds to the rise of ML. Finally, interest in BD remained stable from 2014 to 2021, with some popularity peaks every year. Together, this information suggests a positive correlation between ML, BD, and AI, which is also reflected in the growing number of articles per year. In this study, the comparison of AI/ML/BD trends is only illustrative, showing the recent growth of AI and ML compared to BD.
The review article is structured as follows. Section 1 presents a general introduction to the subject of AI and its importance for obtaining and managing data. Within the subject, we present the opinions of different authors, where some advances in AI applications are discussed. Next, we correlate the popularity of the ML domain, and the main limitations and contributions of the study are revealed. Section 2 describes the methodology used. Section 3 presents a conceptual framework for classifying studies and provides a literature review of the latest AI methodologies used in the ML domain, where the differences between these techniques are detailed. Additionally, a descriptive analysis of the studies is carried out. Section 4 discusses the subject of the study. Finally, the conclusions are provided in Section 5.

Limitations and Contributions
This review article presents a broad perspective of research efforts on using emerging ML-supported AI methods for data collection and management. Research question: What are the main methodological proposals for data management that contribute to the development of the ML domain? This study is limited to the literature regarding AI/ML applications in relation to engineering disciplines. For each ML group, the review of each article focuses on the domain addressed by the study, the subgroup and type of research to which the study belongs, the research results, and the AI method used.
The contributions of this review article are: (1) studying and summarizing the AI categories used to obtain customer information; (2) defining and analyzing the main groups and subgroups that make up the ML theme; (3) identifying study types, future directions, and emerging approaches that use AI supported by ML methodologies for data management; (4) identifying the main AI methodological approaches used in the last five years to obtain and manage data; and (5) highlighting the main AI categories, areas of knowledge, research results, and current limitations/challenges of AI methods with ML.
The ML domain is constantly growing, and it is not possible to cover all of the algorithms in a single article. The multidisciplinary nature of ML was the most challenging difficulty to overcome in this review. However, the contributions of co-authors allowed the search to be limited to widely-used AI methodologies.

Methodology
This work is an extensive review of the published literature on recent advances in methodological proposals that contribute to developing the AI concept supported by ML technology. The methodology used for this literature review was content analysis, a valid technique for the study of scientific documents [32] that has been used to identify, classify, and analyze services in smart cities [33], study advances in nanotechnology applied to innovative packaging [34], propose a conceptual framework for strategic management [35], and analyze reverse-logistics models aimed at solid waste management [36][37][38].
This review identifies and analyzes research that proposes new AI methods emerging as reliable and efficient tools in data management. The development of the proposed methodology provides technical background on the indicated methods and knowledge on using these algorithms for data management problems. AI methodological developments for BD processing as a solution for data management were used by Allam and Dhunny [39] to propose a framework that regulates and formulates BD processing policies through AI and ML aimed at the smart city concept.
Using the same methodology, Henrique et al. [40] analyzed different ML methods and techniques to predict financial market values, resulting in a bibliographic review of the most critical studies on this topic. Likewise, van Klompenburg et al. [41] used it to extract and synthesize ML algorithms used in predictive studies of agricultural crop yield.
This study is divided into categories, groups, and subgroups. The categories are represented by the 12 proposed emerging AI technologies. The groups constitute the four AI techniques represented in the ML domain. Each group contains the methodological contributions (subgroups) that illustrate some of the most outstanding algorithms used in the ML, identifying their degree of development through the investigations.
In this work, a systematic review of the scientific literature published between the years 2017 and 2021 has been carried out. For its preparation, the guidelines of the PRISMA statement [42,43] have been followed. Figure 2 summarizes the proposed PRISMA methodology. The systematic search was carried out with the Google Scholar search engine using the WOS and Scopus digital platforms, mainly databases, such as Springer Link, EmeraldInsight, Science Direct, Wiley Online Library, Taylor & Francis Group, and IEEE Xplore Digital Library. The keywords used were machine learning, data management, BD, and artificial intelligence. In addition, to choose an article, two types of quality measures were mainly considered: journal impact factor (JIF) and journal citation indicator (JCI), while not being excluding factors (Table S1). Initially, we reviewed about 4000 publications in scientific journals, identifying 883 articles for the first step. For the second step, the studies were selected by reviewing the most relevant titles. Subsequently, in the third step, the summaries were read.
After the final choice (reading the abstracts), in the fourth and last step, we read the complete publication. Then, the articles were reviewed in terms of the inclusion criteria: (1) scientific studies that are part of the WOS and SCOPUS digital platforms; (2) which propose methodological solutions for data collection and management; (3) that in the context of methodological advance, the conceptual bias is studied; (4) that develop comparisons between the solution methods and results obtained; and (5) that have been published between the years 2017 and 2021. The exclusion criteria were: (1) data

Categorical Classification of the Emerging AI Technologies, ML Groups and Subgroups
The 120 investigations classified in the literature review contribute to the development of the main AI technologies. To facilitate the analysis of the ML domain, based on the 12 technologies proposed by Purcell and Curram [44], we sought to establish a conceptual framework, which is summarized in Figure 3. This study proposes to adapt and classify the AI technologies proposed by [44] into 12 AI categories used to obtain and manage data (Table 1); four of these are mature categories offering commercial value and impact on customer perception (AI-enhanced analytics solutions, DL platform, natural language generation (NLG), and speech analytics) [45]. These technologies can be used at different levels to provide optimal solutions to specific problems.

Table 1. AI categories used for data collection and management.

AI Categories Description

AI-enhanced analytics solutions
These solutions are a new generation of business intelligence. They rely on NLP and NLG, as well as information retrieval, to respond to user queries. Many of these solutions use ML to personalize the user experience (improvements in service delivery), automatically raising alerts based on preferences learned from the user. Today these solutions still need to learn from human analysts.

Conversational service solutions
They act as virtual agents, using NLP and ML to understand and address individual customer service problems. These solutions provide a conversation interface that is generally text-based but can also include voice or image, allowing users to participate through natural language. A virtual chat agent can quickly answer routine questions providing the requested information.

DL platform
A branch of ML, which provides access to DL algorithms (interconnected neural networks). These platforms are used for image and video recognition and auditory analysis. Each algorithm in the hierarchy applies a non-linear transformation to its input and uses what it learns to create a statistical model as the output. The number of layers and the iterations continue until the output has reached an acceptable level of precision.

Facial recognition
A type of application software that allows people to be identified by analyzing the biometric characteristics of faces (facial patterns). Examples: unlocking of electronic devices, identification of faces in social networks-marketing, virtual payments, security and online education, among others.
Similar to image and video analysis, IR has issues with privacy.

Image and video analysis
Comprises tools and technology that analyze images and videos in order to understand and interpret objects and their characteristics. Although many of these tools are pre-trained, they must be refined to fit specific needs (transfer learning). Currently, these technologies require a considerable repository of relevant and correctly classified images.

Intelligent recommendation solutions
Smart recommendation tools leverage AI to provide users with information search results close to their needs. To do this, these new engines (1) continually learn from individual behavior and conversational interactions, (2) use DL to classify images, identify interests, and suggest products, and (3) use NLP to surface users' needs and wants.

Intelligent research solutions
They help examine large amounts of structured and unstructured and internal and external data by leveraging NLP, ML, and, in some cases, NLG to generate information that can be used by product developers, sales teams, marketing specialists, scientific research teams, among others.

Machine learning platforms (ML platforms)
These platforms provide users with tools to build, implement, and monitor ML algorithms. Some platforms offer pre-built algorithms and interactive workflows, while others require a greater understanding of development and coding (regressions, decision trees, Bayesian models, unsupervised grouping methods, and statistical models, among others).

NLG
Includes tools and technology that use advanced models to produce high-quality texts in natural language, generally from a corpus of answers or made up of defined textual components. Currently, this technology provides value in areas such as the production of media content (academic evaluation reports, articles for online newspapers or medical, engineering, financial reports, among others).

Pre-trained vertical solutions
Solutions trained in a specific vertical data corpus with functionality adapted to each sector. Examples: agriculture-collection management, phytosanitary control, machinery and equipment control; financial services-detection of transactions and fraudulent accounts; real estate-synchronization of real estate with different portals, optimization of information for monitoring and commercial actions; investment advisers-client portfolio management; medicine-diagnosis of diseases, optimal treatment methods; journalism-writing articles, among others.

Speech analytics
Also called audio mining, it includes tools that understand and interpret the spoken word. This technology is made up of three parts: acoustic speech recognition; speech to text transcription and text analysis. It is an example of a technology that makes unstructured data ready for analysis.

Text analytics
Text analysis allows you to identify hidden information patterns and structures in the data and gain insight from the document collection for decision-making. Text analytics converts unstructured text data to analyzable structured data. Among the tasks carried out by the text analysis are: descriptive statistical analysis, entity extraction, concept extraction and self-tracking, cross-relevance analysis, sentiment analysis, and automatic categorization.

Table 2 defines the ML groups and subgroups used for classifying the studies.

Table 2. General description of the ML domain for the literature review.

ML Groups Concept

Supervised learning (SL)
The algorithms work with labeled data, trying to find a model/function that, given the input variables, assigns the appropriate output label. The algorithm is trained with a history of real data and thus learns to assign the appropriate output label to a new value, predicting the output value. If the objective of the model is to forecast continuous variables, it is classified as a regression. However, if the goal is to predict discrete variables, it is known as classification.

Unsupervised learning (UL)
Unlike SL, UL is only given the characteristics without providing the algorithm with any labels. Its function is clustering; therefore, the algorithm should catalog by similarity and create groups. The system itself forms the groups from the input patterns.

Semi-supervised learning (SSL)
Aims to produce better classifiers, combining SL and UL techniques to learn from both labeled and unlabeled data. To later retrain the model, SSL methods use the untagged data to modify or reform the assumptions obtained only from the tagged data.

Reinforced learning (RL)
RL tries to get an agent or intelligent machine to learn to decide through its own experience. In other words, in a given situation and depending on the environment (real or virtual), the agent executes the best possible action through an interactive trial-and-error process, based on the observed state (knowledge of the environment). As a result, the agent gets the best possible reward.

ML Subgroups Concept

Regression
A regression function attempts to predict the value of a response variable based on one or more independent variables of interest; that is, it predicts the Y value (dependent variable) given values of the X variable. Models can be linear, exponential, or logarithmic. It is used to forecast continuous variables.

Classification
Used when the expected result is a discrete label. Different performance metrics are used to evaluate the classification models (accuracy, precision, sensitivity, specificity, and F1 score). Depending on the target classes, the model prediction can be binary or multiclass.

Clustering
Consists of grouping a series of vectors according to a criterion in groups or clusters. Usually, the criterion is similarity (grouping similar vectors into groups). These models predict which is the best grouping of data.

Dimension reduction
The process of reducing the number of random variables involved (dimension reduction). These algorithms map a dataset to subspaces derived from the original space of less dimension, which describes the data at a lower cost.

Inductive
Methods that build a classifier that can generate predictions for any object in the input space.

Transductive
Methods that are limited to obtaining label predictions for the unlabeled data points seen during the training phase. These methods are typically graph-based.

Control
Flexible and adaptable methods that can be introduced into a control system to analyze differential equations (components of the control loop).
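The classification metrics named in Table 2 (accuracy, precision, sensitivity, specificity, and F1 score) can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch for illustration only; it is not taken from any reviewed study, and the function name `binary_metrics` and the sample labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute common binary classification metrics from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

With these sample labels the confusion matrix has 3 true positives, 5 true negatives, 1 false positive, and 1 false negative, so accuracy and specificity differ even though precision and sensitivity coincide.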

Literature Review
The literature review corresponds to the period 2017-2021. Different authors have provided models to the SL, UL, SSL, and RL groups during this period. These authors propose AI methodologies to solve ML problems in data collection and management. For each group, the maximum limit was 30 articles. A total of 120 studies were selected and are classified in Table 3.

Supervised Learning Models for Data Collection and Management
Table 4 classifies and lists the group 1 SL model (Table 3) literature investigations into 12 AI categories (mentioned in Table 1). SL can be used in regression problems and classification problems. In regression problems, the outputs are continuous, while in a classification problem, the outputs are categorical. In SL, the correct input/output pairs are available, and the goal is to correctly map them from the input space to the output space. Table 4 shows that 70% of studies correspond to the classification subgroup, 17% to the regression subgroup, and the remaining 13% to studies where both subgroups are addressed. This indicates a greater interest in the development of categorical methodologies for solving classification problems. Among the studies in Table 4, we found 22 research articles, 4 literature review articles, 3 surveys, and one case study. Thus, the literature on SL shows significant progress in the collection and organization of knowledge, reflected in the solid understanding of the approaches and algorithms developed. Table 4 shows that the most popular AI methodology for SL is ANN (50% of publications). Next come multidimensional AI methodologies, such as SVM (10% of publications), followed by highly flexible non-parametric methodologies, such as decision trees (7%). Another recursive partition method, which bases predictions on a collection of individual decision trees, is random forests (7%).
The literature reviewed for each SL subgroup has expanded in volume and scope and now encompasses a broad algorithmic spectrum. The main AI methodologies used included comparisons with models of (1) regression: ensemble methods, regression analysis, learning metrics, regression tree, non-linear regression, Bayesian model, among others; and (2) classification: K-nearest neighbor, Bayesian belief networks, principal component analysis, linear discriminant analysis, ensemble methods, learning metrics, collaborative filtering, etc.
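The distinction between the two SL subgroups can be made concrete with a small sketch: a least-squares line produces a continuous output (regression), while a K-nearest neighbor vote produces a discrete label (classification). This is an illustrative example under simple assumptions (one input variable, hypothetical data); the helper names `fit_line` and `knn_predict` are not from the reviewed studies.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


# Regression: labeled pairs (x, y) with a continuous target.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # hypothetical data generated from y = 2x + 1
a, b = fit_line(xs, ys)
predicted_value = a * 5.0 + b  # continuous output

# Classification: labeled pairs (x, label) with a discrete target.
train = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
         (8.0, "high"), (8.5, "high"), (9.0, "high")]
predicted_label = knn_predict(train, 7.0)  # discrete output
```

Both models are trained on input/output pairs; only the type of the target, and therefore of the prediction, differs.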
Three AI categories feature the most significant breakthroughs: image and video analytics, ML platforms, and pre-trained vertical solutions, all with five publications. However, we did not find studies for the categories conversational service solutions and facial recognition.

Unsupervised Learning Models for Data Collection and Management
Table 5 details the methodological contributions found in the literature review of the group 2 UL model (Table 3) across the 12 AI categories (mentioned in Table 1). UL is used to build clustering or dimension reduction models based on the input data without the corresponding output labels [75]. The output data is not available in UL, and the goal is to find patterns in the input data. Table 5 shows that 80% of the studies correspond to the clustering subgroup, 13% to the dimension reduction subgroup, and the remaining 7% of studies addressed both subgroups. This indicates a greater interest in developing exploratory methodologies (structural description of data) for solving clustering problems. According to the analysis of the studies, 28 research articles were found, one literature review article, and one survey; no case studies were found. Thus, UL takes advantage of large amounts of unlabeled data. Currently, there are significant developments in mathematical modeling, reflected in the number of research articles.
According to Table 5, the most widely used AI solution methodology is the ANN (43% of publications). AI methodologies such as k-means (20% of publications) aim to determine the number, quality, and cohesion of the groupings in a data set. Statistical methodologies, such as Markov models (7% of publications), relate observable events and hidden events.
The AI category that has made the greatest advancements is DL platforms, with five publications. We did not find studies for the category facial recognition.
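Since k-means is the most frequently cited dedicated clustering method in this group, a short sketch may help show how groups emerge from unlabeled input alone. The code below is a plain implementation of Lloyd's algorithm under stated assumptions: the data points and the two starting centroids are hypothetical, and the starting centroids are passed in explicitly so the run is deterministic.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        assignment = [
            min(range(len(centroids)),
                key=lambda j: sum((p - c) ** 2
                                  for p, c in zip(point, centroids[j])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [pt for pt, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignment


# Hypothetical unlabeled 2-D data with two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignment = kmeans(points, centroids=[points[0], points[3]])
```

No labels are supplied at any point; the cluster indices in `assignment` are produced purely from similarity between input vectors, which is the defining property of the clustering subgroup.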

Semi-supervised Learning Models for Data Collection and Management
The SSL approach builds inductive or transductive models based on the original tagged data and the untagged data with new tags [105]. Table 6 presents an updated methodological description of the group 3 SSL model (Table 3) across the 12 AI categories (mentioned in Table 1). SSL takes advantage of large amounts of labeled and untagged data. Conceptually, it is situated between SL and UL. Table 6 shows that 67% of studies correspond to the inductive subgroup, 23% to the transductive subgroup, and the remaining 10% to studies that address both subgroups. This indicates more interest in developing methodologies that optimize predictive models for solving classification problems. In the literature review, we found 26 research articles, 2 literature review articles, one survey, and one case study. In recent years, research in this area has followed the general trends observed in machine learning, with much attention directed to models based on ANNs and generative learning.
According to Table 6, the most widely used AI solution methodology was ANN (36% of publications). Next were AI methodologies that analyze networks, known as graph theory (7% of publications). Probabilistic graphical methodologies, such as Bayesian methods (7% of publications), provide simple ways to visualize the structure and properties of a probability model. Other commonly used methods were the Gaussian mixture model for clustering data (7%) and the k-nearest neighbor classification algorithm (7%).
The AI category with the greatest advancements was image and video analysis, with six publications. We found no publications for the AI categories AI-enhanced analytics solutions, conversational service solutions, speech analytics, and text analytics. We found only one recent survey that collects and organizes this knowledge, which may hamper the ability of researchers and engineers to use SSL. The literature on this subject has expanded in volume and scope and now encompasses a broad spectrum of theories, algorithms, and applications.
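To illustrate how an inductive SSL model can use untagged data to revise assumptions learned from tagged data, the sketch below uses self-training with a nearest-centroid classifier. Self-training is one common SSL scheme chosen here for brevity; it is not claimed to be the method of any reviewed study. The names `centroids_of`, `predict`, and `self_train`, the confidence threshold `min_margin`, and the data are all hypothetical, and the sketch assumes one numeric feature and at least two classes.

```python
def centroids_of(labeled):
    """Mean feature value per class from (x, label) pairs."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Return (label, margin); margin is the distance gap between the two closest centroids."""
    ranked = sorted(centroids, key=lambda label: abs(x - centroids[label]))
    best, second = ranked[0], ranked[1]
    margin = abs(x - centroids[second]) - abs(x - centroids[best])
    return best, margin


def self_train(labeled, unlabeled, min_margin=2.0, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and refit the model."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = centroids_of(labeled)
        scored = [(x, predict(centroids, x)) for x in pool]
        accepted = [(x, label) for x, (label, margin) in scored
                    if margin >= min_margin]
        if not accepted:
            break  # nothing confident enough is left to pseudo-label
        labeled.extend(accepted)
        taken = {x for x, _ in accepted}
        pool = [x for x in pool if x not in taken]
    return centroids_of(labeled), pool


# Hypothetical data: two tagged points and five untagged points.
tagged = [(1.0, "a"), (9.0, "b")]
untagged = [2.0, 3.0, 4.5, 7.0, 8.0]
model, leftover = self_train(tagged, untagged)
```

The resulting `model` is inductive in the sense of Table 2: it can label any new point in the input space, including the ambiguous point left in `leftover` that was never pseudo-labeled during training.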

Reinforced Learning Models for Data Collection and Management
An RL algorithm aims to maximize cumulative rewards by learning strategies through interaction with the environment. Table 7 classified and listed the group 4 RL model (Table 3) literature investigations into 12 AI categories (mentioned in Table 1).
RL is a framework for decision-making problems where the agent interacts through trial and error with its environment to discover optimal behavior. Table 7 shows that 57% of studies correspond to the control subgroup, 24% to the classification subgroup, and 18% to studies that address both subgroups. The above indicates a greater interest in methodological developments where computers make decisions on complex and stochastic systems to solve control problems. In the content analysis, we found 23 research articles, 4 literature review articles, and 3 surveys; no findings were presented for case studies.
According to the literature, RL is the most popular technique for artificial agents to learn a near-optimal strategy through experience. Such techniques are validated by the algorithmic developments found in the reviewed studies. Different studies analyzed models and solved RL problems using Markov decision process theory, Monte Carlo methods, and dynamic programming. RL is a potent engineering tool for modeling dynamic behaviors and achieving goals based on rewards and penalties. Table 7 shows that the most popular AI methodology for RL is ANN (50% of publications). Next come control methodologies that assign probabilities, such as direct policy search (10%), followed by methodologies to obtain optimal policies, such as Markov decision processes (7%).
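The link between Markov decision processes, dynamic programming, and cumulative reward can be sketched with value iteration on a tiny deterministic chain. This example is an assumption-laden illustration: the four-state environment, the reward of 1.0 for reaching the last state, and the discount factor of 0.9 are hypothetical. Value iteration also assumes the transition model is known, so it shows the planning side of RL rather than trial-and-error learning.

```python
def value_iteration(n_states=4, gamma=0.9, tol=1e-9):
    """Dynamic programming on a deterministic chain MDP with a terminal goal."""
    goal = n_states - 1
    actions = {"left": -1, "right": +1}

    def step(state, move):
        # Moves are clipped at both ends of the chain.
        nxt = min(max(state + move, 0), goal)
        reward = 1.0 if nxt == goal else 0.0
        return nxt, reward

    values = [0.0] * n_states  # the terminal goal state keeps value 0
    while True:
        delta = 0.0
        for s in range(goal):
            # Bellman optimality backup over both actions.
            best = max(r + gamma * values[n]
                       for n, r in (step(s, m) for m in actions.values()))
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break

    # Greedy policy: pick the action with the highest one-step lookahead value.
    policy = {}
    for s in range(goal):
        policy[s] = max(
            actions,
            key=lambda a: step(s, actions[a])[1]
            + gamma * values[step(s, actions[a])[0]])
    return values, policy


values, policy = value_iteration()
```

States closer to the goal receive higher values because the reward is discounted less, and the greedy policy derived from those values moves right in every non-terminal state.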
The two AI categories with the biggest breakthroughs are smart research solutions (8 publications) and DL platforms (7 publications). We found no studies for the categories AI-conversational service solutions, NLG, speech analytics, and text analysis.
The distribution of publications by study type: the most common study type was research articles (99 investigations), followed by literature reviews (11), surveys (8), and case studies (2) (Figure 5).
The number of publications per subgroup: eight subgroups with 120 articles were used in the literature review. The classification subgroup made the largest contribution (28 publications across the SL and RL groups combined), followed by the clustering subgroup (24 publications), the inductive subgroup (20 publications), the control subgroup (17 publications), transductive (7 publications), regression (5 publications), and dimension reduction (4 publications) (Figure 6).
The number of articles per year: according to the analysis in Figure 1, the growing popularity of ML, AI, and BD is evident. This confirms the increasing interest that researchers have shown in ML over the last five years. In this research, 35 articles belonging to the literature review were published in 2020 (Figure 7).
Distribution of AI categories: according to the literature review, five AI categories make the most significant methodological contributions. As shown in Figure 8, the most important is the DL platform (with 21 studies), followed by intelligent research solutions (with 20), image and video analytics and ML platforms (with 17), and pre-trained vertical solutions (with 16).

Descriptive Analysis of the Studies
Distribution of the areas of knowledge: the analysis of the 120 studies that make up this literature review shows that the area of computer engineering and systems presents the most widely used methodological developments (with 62 investigations). This is followed by telecommunications (with 14), infrastructure (with 12), transportation (with 10), health (with 10), the financial area (with 6), marketing and news (with 4), and agriculture (with 2) (Table S1). Regarding the origin of the research by country, China contributed the most ML methodological developments (39), followed by the USA (32) (Figure S1). We reviewed 883 publications under the ML, SL, UL, SSL, and RL criteria and/or criteria related to obtaining and managing information. Finally, 120 publications were selected in the fourth step; the ScienceDirect, IEEE, and Springer databases contributed the largest number (Figure S2). In total, 113 articles and 7 conference articles made the most important contributions.

Discussion
The most widely used ML technique corresponds to SL; even so, today's BD requires UL and RL learning paradigms. However, the accuracy of UL and RL techniques is accompanied by high computational costs.
The literature review suggests using different metrics to evaluate the performance and efficiency of ML models and AI methodologies. The metrics we found include area under the curve (AUC), Nash-Sutcliffe coefficient radius (NS), relative percentage difference (RPD), precision, accuracy, median absolute error (MedAE), recall, normalized mean squared error (NMSE), root mean square prediction error (RMSEP), mean squared prediction error (MSPE), correlation coefficient (R), and specificity. Accuracy was the metric most used by the classification subgroup. For the regression subgroup, the most used metrics were mean absolute percentage error (MAPE), mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), F-score, and normalized root mean square deviation or error (NRMSE). Future work is needed to reach precision levels close to 100%.
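To make a few of the metrics named above concrete, the following minimal Python sketch computes accuracy, precision, recall, specificity, and F-score for a binary classifier, and MAE, MSE, RMSE, and MAPE for a regressor. The function names and the toy inputs are hypothetical and chosen only for illustration; the formulas are the standard definitions, not an implementation drawn from any reviewed study.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Standard binary-classification metrics from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f_score": ratio(2 * precision * recall, precision + recall),
    }


def regression_metrics(y_true, y_pred):
    """Standard regression errors; MAPE assumes no true value is zero."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


if __name__ == "__main__":
    print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
    print(regression_metrics([2.0, 4.0], [1.0, 5.0]))
```

Accuracy alone can be misleading on imbalanced classes, which is one reason the reviewed studies report it alongside precision, recall, and specificity.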
The analysis of the data partition formats suggests that the most typical partition ratio is (80:20) training/testing. Additionally, other studies adopted the training/validation/test data partition (80:10:10). We found no studies that stated a general rule for adopting data partitioning. The literature suggests that partition formats (80/20) following the Pareto rule provide optimal divisions for AI and ML data analysis.
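The two partition formats mentioned above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `split_dataset`, its parameters, and the fixed random seed are assumptions introduced here, and the reviewed studies may shuffle or stratify their data differently.

```python
import random


def split_dataset(samples, train_frac=0.8, val_frac=0.0, seed=0):
    """Shuffle samples and split them into train, validation, and test lists.

    The test set receives whatever remains after the train and validation
    fractions, so train_frac=0.8 with val_frac=0.0 gives an 80:20 split and
    train_frac=0.8 with val_frac=0.1 gives an 80:10:10 split.
    """
    if not 0.0 < train_frac < 1.0 or val_frac < 0.0 or train_frac + val_frac > 1.0:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(train_frac * len(shuffled))
    n_val = round(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    data = list(range(100))
    train, val, test = split_dataset(data)                 # 80:20
    print(len(train), len(val), len(test))
    train, val, test = split_dataset(data, val_frac=0.1)   # 80:10:10
    print(len(train), len(val), len(test))
```

Shuffling before splitting avoids ordering effects in the source data; for time series or grouped data a different partition strategy would be needed.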
Based on this study, we observed five possible research avenues to improve ML predictions: (1) integration of two or more AI methodologies; (2) integration of new AI methodologies with soft computing or other conventional methods; (3) use of data decomposition techniques to improve data set quality; (4) use of a set of methods to generalize models and reduce uncertainty; and (5) use of complementary algorithms to improve the quality of new AI methodological proposals.
Deep RL, the combination of DL and RL, promises to revolutionize the future of AI in areas such as autonomous driving, NLP, and robotics, among others. The findings of the review mainly suggest the use of two types of RL models: when the environment and state are known, studies use model-based RL solutions (e.g., AlphaZero); when the environment and the state are only partially known, they use model-free RL, whose algorithms are mainly Q-learning (value-based) and policy gradient algorithms (probability-based).
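As an illustration of the value-based, model-free family mentioned above, the following is a minimal tabular Q-learning sketch in Python. The five-state chain environment, the hyperparameters (`alpha`, `gamma`, `epsilon`), and all function names are hypothetical choices made for this example; they are not taken from any of the reviewed studies.

```python
import random

N_STATES = 5       # toy chain with states 0..4; state 4 is terminal (assumed)
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy dynamics: reward 1.0 only on reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2,
                     max_steps=200, seed=0):
    """Learn a Q-table with epsilon-greedy exploration and the Q-learning update."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)              # explore
            else:
                best = max(q[state])                      # exploit, random tie-break
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: no future value is added past a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the Q-table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train_q_learning()
    print(greedy_policy(q_table))
```

The agent never uses a model of `step`; it only observes transitions and rewards, which is what distinguishes this approach from the model-based solutions mentioned in the text.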
Through the IoT concept, ML architectures, particularly CNN, analyze and interpret large and complex volumes of data. The analysis of learning rates in ANN found that most studies use fixed rates. However, some studies suggest the use of adjustable rates set by special algorithms. Regarding the activation function, it was observed that the linear function was the most used. Additionally, among the ANN findings, some information processing architectures were found for signals supported on graphs, known as graph neural networks (GNN). Most of these methodological proposals rely on deep learning and mainly propose GNN architectures based on CNN, recurrent ANN, and deep autoencoders.
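The difference between a fixed and an adjustable learning rate can be sketched as follows. Step decay is used here only as one common example of an adjustable schedule; the source does not say which algorithms the reviewed studies used, and the function names and default values below are assumptions.

```python
def fixed_rate(base_lr, epoch):
    """Fixed schedule: the learning rate never changes."""
    return base_lr


def step_decay(base_lr, epoch, drop=0.5, every=10):
    """Adjustable schedule (assumed example): multiply the rate by `drop`
    once every `every` epochs."""
    return base_lr * drop ** (epoch // every)


if __name__ == "__main__":
    for epoch in (0, 10, 25):
        print(epoch, fixed_rate(0.1, epoch), step_decay(0.1, epoch))
```

A decaying schedule takes large steps early in training and smaller steps later, which can help an ANN settle near a minimum instead of oscillating around it.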
The next great challenge lies in the superposition of the four ML groups: the ability to select the most appropriate AI method. This involves anticipating various scenarios (selection of parameters) and dealing with different levels of uncertainty (missing or incomplete data, computational capacity, classification precision, among others).
Despite multiple advancements in the 12 AI categories used for data collection and management, there are still multiple problems, challenges, methodologies, and future trends that AI/ML must address. While some UL techniques remove unnecessary data, there is still a need for massive processing power capable of analyzing all scenarios. NLG processing is a long way from producing natural and accurate translation; jargon, accents, and language understanding remain big challenges for ML. Likewise, although image classification is a settled issue, the machine does not really understand the meaning of the image. For now, we continue to classify everything without defining intermediate states, despite the constant developments in fuzzy or soft systems.
The lack of video training is a sensitive topic for ML. Video data sets are much richer in content than still images; therefore, ML needs deeper systems capable of learning and responding efficiently with little input data. This challenge requires solving storage capacity (memory capacity to store past events) through technologies such as a collective memory network between all artificial thinking entities and differentiable neural computers, added to a modular system that integrates different algorithms. The ML reasoning ability is associated with the future development of a model of ideas; this model should serve as an interface, helping to interpret ML's own language.
Changes in the importance and frequency of participation in online activities before and during COVID-19 have created new challenges for ML. According to Mouratidis and Papagiannakis [165], during the pandemic, there were substantial increases in the importance of teleworking (31% increase), teleconferencing (34% increase), e-learning (34% increase), and telehealth (21% increase), among others. To reduce the effect of the pandemic on the education sector, most educational institutions were forced to teach online classes. As an academic tool (Zoom, Microsoft Teams, Moodle, Google Classroom, virtual reality applications, etc.), the web provides a global open platform for storing data and presenting it in text, graphic, audio, and video formats, together with tools for synchronous and asynchronous communication [166]. AI-supported e-learning (AIeL) refers to the use of AI techniques in e-learning (the use of computer and network technologies for learning or training) [167]. Through web platforms, AIeL proposes ML approaches such as identifying the learning style and personalizing learning experiences [168,169], personalized hybrid recommenders that adjust or associate content to students [170,171], DL algorithms for monitoring student emotions in real time [172,173], a multi-agent system to improve the Moodle platform in intelligent tutoring systems [174], a cyber threat detection model for e-learning systems [175], and fuzzy ANN for learning English [176]. Finally, different authors have evaluated the impact of AIeL during the COVID-19 pandemic [177-181].

Conclusions
This review article presents the importance of continuous AI methodological developments for ML applications during 2017-2021. A total of 181 studies were used, of which 120 are part of the literature review. The literature indicated that, among the numerous methods, ML has been increasingly adopted and used to develop emerging AI technologies. In general, ML areas are closely related, as they fundamentally overlap in scope.
The most used tools to evaluate the performance of AI methods were accuracy, RMSE, R, specificity, MAPE, MPAE, followed by MSE, in addition to generalizability, robustness, calculation cost, and speed. The most commonly used ML algorithm was ANN, followed by SVM, k-means, and Bayesian methods. Some studies adopted hybrid methodologies to harness the power of different techniques and compensate for the weaknesses of specific techniques. Among the knowledge areas evaluated, software and systems engineering provides the next-generation approaches to data collection and management, with fault diagnosis being the application area with the most solution proposals, followed by robotics, autonomous computing, and driving. This review showed that classification is the task most frequently addressed by so-called intelligent systems, whether through statistical algorithms or AI. The literature also suggests that AI-based DL platforms require less information, which helps with complicated decision problems and makes them the alternative with the most AI methodological proposals.
Based on the methodology proposed in this study, mature AI technologies, such as speech analytics, facial recognition, NLG, and conversational service solutions, had slower methodological developments, showing researchers are interested in less developed AI categories.
Future work should amplify the discussions in the proposed study areas. For example, one of the concepts that needs to be expanded is the emerging PR AI method; the objective of this study would be to know the impact that PR advances generate in different areas of engineering, from a perspective framed in generative models versus discriminative models.
It is essential to broaden the literature review, also focusing on the emerging DL AI method, for example, to analyze the advances that CNN applications have had in different areas of engineering. Likewise, it is important to analyze the methodological advances of DL architectures such as recurrent neural networks, autoencoders, and deep belief networks, among others. Finally, the design of an adequate methodology that integrates multiple emerging AI technologies to facilitate data collection and management is envisaged.

Conflicts of Interest:
The authors declare that there is no conflict of interest regarding the publication of this paper.

Abbreviations
The