Trends in the Application of Drill-Down Analysis in Scientific Studies: A Systematic Review

Abstract: As new server technologies come to market, it becomes necessary to update or create new methodologies for data analysis and exploitation. Applied methodologies range from decision tree categorization to artificial neural networks (ANN), which implement artificial intelligence (AI) for decision making. One of the least used strategies is drill-down analysis (DD), which belongs to the decision tree subcategory and, because it does not rely on AI resources, has lost interest among researchers. However, its easy implementation makes it a suitable tool for database processing systems. This research developed a systematic review to understand the prospects of DD analysis in the scientific literature, in order to establish a knowledge platform and determine whether it is convenient to drive it toward integration with superior methodologies, such as those based on ANN, and thereby produce better diagnoses in future works. A total of 80 scientific articles published from 1997 to 2023 were reviewed, showing the highest frequency in 2021 and experimental as the predominant methodology. Of 100 problems solved, 42% used the experimental methodology, 34% descriptive, 17% comparative, and just 7% post facto. We detected 14 unsolved problems, of which 50% fall in the experimental area. By study type, methodologies included correlation studies, processes, decision trees, plain queries, granularity, and labeling. Only one work focused on mathematics, which reduces expectations of new knowledge production; likewise, only one work reported ANN usage.


Introduction
Currently, large computers measure their speed in petaflops, and processors manage up to 15 cores for command processing [1]. This kind of power opens a window of opportunity for hard disk, bus, and dynamic memory design, leaving software engineering lagging behind. The computer industry and its products keep losing pace against this dynamic change because software development trails hardware [2]. In this scenario, data warehouse (DW) techniques have allowed the development of new database versions, such as Oracle 19c, whose Transparent Data Encryption technology allows the encryption of sensitive data stored in tables and tablespaces, granting the user privacy and security where it was not granted previously [3]. Processing time during encryption administration increases considerably when only 32-bit processors or less are available, so encryption and volume require hardware capable enough to support them. Even so, to improve DW efficiency, new techniques have been created, such as Deep Reinforcement Learning, in which an agent is trained on historical data of storage and retrieval operations [4]. In the case of normalization, there has been a lack of standardization and a disconnection between the theoretical environment and practical applications, which has resulted in methodologies that suggest mapping out data structures, documenting designs, and using monitoring panels [5]. Other techniques that have improved DW performance are the data cube (DC), which is frequently used in On-Line Analytical Processing (OLAP) [6], and the drill-up method, which adds elements to the process line by clustering nodes and edges with similar derivation histories [7]. However, there is also a tendency to use DD analysis to make cross-references that focus on patterns and observations in independent sources [8], and business communication between users is also improved with DD interaction [9].
Between storage technology and mining design, the question arises as to whether it is necessary to redesign techniques to exploit all of this potential. As an answer, DD analysis emerges, which has been implemented within deterministic parameters, although it is underutilized because it lacks AI application by not using machine learning (ML) techniques. That is why this research pursues the answer to the question of whether the lack of ML tool implementation in DD analysis has constituted a limitation for its technological evolution in the scientific literature.
Existing works have shown the use of DD as a support tool for finding solutions to data analysis problems. However, under current circumstances, it has limitations that could be overcome if the methodology were improved with ANN modeling, which would be not only an advance in technological terms but also a way to exploit new hardware and software resources. Moreover, this investigation proposes a methodology classification that could be used to clarify the state of the art of DD analysis and help researchers introduce in their own works the methodology that will help them reach their objectives.
As this is a systematic review, it is important to note that patterns of productivity in scientific journals are discipline-specific, meaning that each discipline has its own measures and that productivity will be projected in statistics or in journal impact [10], in this case depending on DD references. The frequency of DD analysis usage in the scientific literature will determine whether it is convenient to improve this technique through the implementation of ANN, overcoming the limitations currently confronted, such as the shortage of knowledge generation.
This manuscript is structured as follows: Section 2 explains the general characteristics of DD analysis along with their thematic context. Section 3 explains the origin of the dataset and the methodology for the experimental work. Section 4 presents the results of the analysis. Section 5 elaborates a discussion of the findings, and Section 6 offers the conclusions.

Theoretical Foundations
The objective of this investigation is to examine DD analysis from the perspective of the results of its implementation, using as a search guide, among other things, the concepts of DD (the subject of study), overfitting (one of its most frequent problems), data mining (DM, which uses it as a tool), and the deterministic model (whose designs are the most frequently used).
In addition, we also want to retrieve a list of all the methodologies used and the results produced, both for and against their particular objectives. In this way, we can propose a methodology that generalizes results and suggests solutions that can be justified in the course of experimentation.

DD Analysis
DD analysis is a deterministic model that helps provide different views of the data in reports, schemas, and spreadsheets, which makes it simple and helps reveal the origin of the trends exposed during the study phase [11]. DD analysis applications range from medicine [12] and production [13] to malware detection [14], visualization [15], and data administration [16].
In modern decision panels it is easy to understand but not to implement, because its principal limitation is DM exploitation, which uses models ranging from probabilistic to deterministic and generates some problems, such as recursivity and setting aside base knowledge production. In ML, overfitting occurs when a given model performs very well in the training stage but falls significantly short in the test stage [17]. Figure 1 shows the nature of overfitting with respect to the model used and the error obtained, using the optimum as an arbitrary frontier determined by the application model.

Overfitting
Figure 1. Overfitting in ML, according to [17]: data behavior when training and test error surpass the optimum mark established by the application model.
Overfitting occurs when the parsimony principle is violated in the use of models or procedures, that is, when more terms than necessary are included or more complicated approaches than required are used. There are two types of overfitting [18]:
1. Using a model more flexible than it needs to be, and
2. Over-representing performance on a dataset.
Models with overfitting tend to memorize all the training data, including noise, instead of learning the knowledge hidden inside the data. Some solutions to avoid this problem, according to [19], are:
• Early stopping, which halts training once accuracy stops improving beyond a certain point.
• Network reduction, which reduces the amount of noise by reducing the size of the classification model.
• Training-data expansion, which improves the quantity and quality of the training dataset, especially in supervised learning areas.
When using ANN, the increase in parameters demands a great quantity of training data to tune the hyper-parameters. To reduce overfitting, even a perfect training set must not only be large in size but also include limited doses of noise.
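To make the early-stopping idea above concrete, the following is a minimal sketch under stated assumptions: train_epoch and validation_loss are hypothetical placeholders for one training pass and one held-out evaluation in any ML framework, not a specific library API.

```python
# Minimal early-stopping sketch (illustrative; model, train_epoch and
# validation_loss are hypothetical placeholders, not a real framework API).
def train_with_early_stopping(model, train_epoch, validation_loss,
                              max_epochs=100, patience=5):
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_epoch(model)                 # one pass over the training data
        loss = validation_loss(model)      # error on held-out data
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                          # stop before the model memorizes noise
    return model
```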

Deterministic Models
Deterministic models are those that, when subjected to the same stimulus, show no uncertainty. In other words, their output can be predicted with certainty, and their behavior is evaluated with effectiveness or efficacy measures.
Deterministic models are classified into three programming methods according to [20]: linear programming, mixed-integer linear programming, and algorithms.
Some techniques used in linear programming are the identification of the variables that influence supply losses, fuzzy linear programming to improve the supply chain, and integer linear programming focused on heuristic aspects. Their limitations lie in the fact that all these methods have the objective of maximizing earnings or minimizing costs. In the case of mixed-integer linear programming, variables are required to take integer, non-negative values, with which results on coordination and control can be obtained for subsequent studies; however, there is a restriction when two variables are integers and binary numbers are used. In the case of algorithms, these are used because of the complexity of production systems and the objective to be achieved. They are a solution to problems that cannot be solved by conventional methods, using different types such as multi-objective or genetic algorithms, among others.
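As a hedged illustration of the linear programming case described above (maximizing earnings or minimizing costs under resource constraints), the following sketch uses SciPy's linprog with purely hypothetical coefficients; it is not drawn from any of the reviewed works.

```python
# Minimal linear programming sketch (hypothetical coefficients):
# maximize earnings 3x + 2y subject to resource constraints.
from scipy.optimize import linprog

# linprog minimizes, so negate the profit coefficients to maximize.
c = [-3.0, -2.0]                      # profit per unit of products x and y
A_ub = [[1.0, 1.0],                   # machine hours used per unit
        [2.0, 1.0]]                   # raw material used per unit
b_ub = [40.0, 60.0]                   # available hours and material
result = linprog(c, A_ub=A_ub, b_ub=b_ub,
                 bounds=[(0, None), (0, None)], method="highs")
print(result.x, -result.fun)          # optimal production plan and earnings
```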
The above arises because DD analysis has always been developed and implemented within deterministic models, lagging behind when state-of-the-art models, such as those provided by ANN, are used.

Materials and Methods
To conduct the systematic review, it was necessary to adopt a quantitative profile, because the process is performed on scientific literature available online with normalized vectors (described below). This allows for the retrieval of sizable, quantified, and predictable results, which is the concern of quantitative research.

Data Source
The search criterion included journal articles listed in the Journal Citation Reports among those published by IOP, Nature, IEEE, MDPI, and others, whose topics used DD analysis in their particular methodologies or used methodologies leading to DM. During the first stage, dataset collection, articles were grouped by type of work, methodology, and solved and unsolved problems, when the authors decided to report them. Table 1 shows the collection criteria.
Based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [21] specifications for the selection process, 200 works were reviewed, and from them only 80 were selected as the base. Special emphasis was placed on those referring to DD analysis or DM processes in their methodology. The search focused on data science engines and, in some cases, on engineering related to computer science.

Table 1. Collection criteria: Work (the research work), Methodology (the applied methodology), Problem Solved (the problem or problems solved), and Problem Unsolved (the problem or problems not solved, when the researchers reported them).

To facilitate normalization, the dataset was enriched by adding the methodology classification proposed by Khaldi [22]. The experimental design includes: true experimental, when variables can be manipulated; quasi-experimental, when variables are manipulated in a controlled environment; and single subject, in very specific cases, which has not been included in this research. Non-experimental design, in which variables cannot be manipulated, was divided into: descriptive, when there is data collection; comparative, when relationships are looked for; correlational, when relationships are possible but not forced; survey, when it relies on surveys; and post facto, when it focuses on effects and tries to establish causes.

Works Clustered by Methodology
Once enriched, the dataset was clustered by methodology following these steps: first, ordering by the newly added methodology attribute; second, ordering by publication year; and third, ordering by methodology approach, as sketched below. It was then divided into tables by methodology to offer easier visualization. This procedure permits categorization by methodology and its inner types and, at the same time, avoids the use of a single table. The methodology approach, in the Type column, refers either to the type of work, which could be a mathematical analysis, or to a process performed. The types of methodology referred to are described next.
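The ordering and splitting procedure above can be expressed in a few lines of pandas; the work identifiers, years, and column names below are hypothetical placeholders for the enriched dataset, not the actual review data.

```python
# Minimal sketch of the clustering-by-methodology procedure
# (rows and column names are hypothetical stand-ins for the enriched dataset).
import pandas as pd

df = pd.DataFrame({
    "Work":        ["W1", "W2", "W3", "W4"],          # placeholder identifiers
    "Year":        [2010, 2015, 2018, 2021],
    "Methodology": ["comparative", "descriptive", "experimental", "post facto"],
    "Type":        ["correlation", "process", "query", "label"],
})

# 1) order by methodology, 2) by publication year, 3) by methodology approach
ordered = df.sort_values(["Methodology", "Year", "Type"])

# Split into one table per methodology for easier visualization.
for methodology, table in ordered.groupby("Methodology"):
    print(methodology)
    print(table.to_string(index=False))
```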

Tree
It is inferred that classification and regression trees are used to identify local structures in both large and small datasets. Classification trees include models in which the dependent variables are categorical, while in regression trees they are continuous [23].
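A minimal scikit-learn sketch, with synthetic data, of the distinction drawn above: a classification tree for a categorical dependent variable and a regression tree for a continuous one.

```python
# Classification vs. regression tree sketch (synthetic, illustrative data).
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y_categorical = ["low", "low", "high", "high"]     # categorical dependent variable
y_continuous = [0.1, 0.9, 2.1, 3.2]                # continuous dependent variable

clf = DecisionTreeClassifier(max_depth=2).fit(X, y_categorical)
reg = DecisionTreeRegressor(max_depth=2).fit(X, y_continuous)

print(clf.predict([[1.5, 1.5]]), reg.predict([[1.5, 1.5]]))
```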

Query
There are numerous query processing techniques, of which the most popular are those based on random sampling, where selection is performed on small samples and later extrapolated to the rest of the database [24].
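A minimal sketch of the sampling idea above, assuming a toy column of values: an aggregate is computed on a small random sample and extrapolated to the full table.

```python
# Sampling-based query estimation sketch (hypothetical data).
import random

population = list(range(1_000_000))          # stands in for a large table column
sample = random.sample(population, 1_000)    # small random sample

# Extrapolate the sample aggregate to the full table.
estimated_sum = sum(sample) * (len(population) / len(sample))
print(estimated_sum, sum(population))        # estimate vs. exact value
```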

Correlation
Correlation analysis allows users to specify two or more key attributes in a dataset, with the aim of performing an analysis by calculating the correlation between each pair of selected columns, typically producing a result matrix [25].
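A short pandas sketch of the pairwise correlation analysis described above, using hypothetical columns; the result is the usual correlation matrix.

```python
# Pairwise correlation sketch (hypothetical columns and values).
import pandas as pd

df = pd.DataFrame({
    "sales":   [10, 12, 15, 20, 22],
    "visits":  [100, 110, 140, 190, 200],
    "returns": [2, 3, 2, 5, 4],
})

# Correlation between each pair of selected columns, returned as a matrix.
print(df[["sales", "visits", "returns"]].corr())
```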

Granularity
This is the task of grouping a universe into granules, groups, classes, or clusters in the process of solving a problem [26].
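As one possible illustration of granulation, the following sketch groups a small synthetic universe into clusters with k-means; the choice of algorithm and the data are assumptions for illustration only.

```python
# Granulation sketch: group a universe of points into clusters
# (k-means is only one possible granulation method; data are synthetic).
from sklearn.cluster import KMeans

universe = [[1, 1], [1, 2], [8, 8], [9, 8], [4, 5], [5, 4]]
granules = KMeans(n_clusters=3, n_init=10, random_state=0).fit(universe)
print(granules.labels_)      # granule (cluster) assigned to each element
```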

Normalizing
To shape the research quantitatively, normalization was applied by manually vectorizing columns, separating the publication year, methodology, and type, in addition to the number of problems solved and not solved, as Table 2 shows.

Variable Definition
To define the research variables, it was necessary to look up concepts of key performance indicators, which are the elements that can measure the achievement of objectives [27]. They were adapted to the standard definitions of ISO 9000, so that the performance indicators created could be related to activity management [28]; they are specified in Table 3, where:
• The sum of works establishes the quantity of knowledge around DD analysis in the scientific community; the denominator is the time span defined for the process [29].
• The mode indicates the year in which the technique was most used, which will be compared with the stagnation of new knowledge or the absence of ANN techniques.
• The sums of problems solved and problems not solved determine the proportion of successful methods and the causes responsible for their non-usage in subsequent works. In set theory terms, this is represented by the method belonging to the objective searched.
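The indicators above can be computed with a short sketch; the column names and values are hypothetical placeholders for the normalized dataset of Table 2, not the actual review counts.

```python
# Sketch of the indicators in Table 3 (hypothetical placeholder values,
# not the real counts of the reviewed dataset).
import pandas as pd

df = pd.DataFrame({
    "Year":             [2010, 2015, 2021, 2021, 2022],
    "ProblemsSolved":   [1, 2, 1, 3, 0],
    "ProblemsUnsolved": [0, 1, 0, 1, 1],
})

works_per_year = len(df) / df["Year"].nunique()   # works over the defined time span
modal_year = df["Year"].mode().iat[0]             # year the technique was most used
solved = df["ProblemsSolved"].sum()
unsolved = df["ProblemsUnsolved"].sum()
solved_ratio = solved / (solved + unsolved)       # proportion of successful outcomes

print(works_per_year, modal_year, solved_ratio)
```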

Dataset and Software
The dataset and processing software can be found in [30].

Results
The results obtained from the systematic review of the scientific literature are shown below.

Data Distribution
Considering that the work [31] from 1997 is an extreme value, used as an initial reference for the implementation of DD analysis in procedural language on SQL (PL/SQL), and counting from 2004 to 2022, the mean number of works per year was 4. Figure 2 shows growth after the start of 4G communication network technology and a surge during the COVID-19 pandemic in 2021, produced by works focused on research caused by this disease. The normalized and categorized tables are shown next.

Comparative
Those works in which performance, processes, a pairwise correlation, or a simple query are compared are shown in Table 4. All works that intended to describe data and their relationships are listed in Table 5, sorted by year and analysis type. One of the best ways to approach data science is the experimental approach; Table 6 shows the works that focused on results, specifically when performance or queries were involved. Table 7 lists the works whose methodology looked for effects with the intention of justifying causes; that so few studies fall into this category emphasizes the lack of theories subject to verification. Once the dataset was processed with the application software, the incidence of both the independent and the dependent variables was obtained, along with the sum of studies related to them. Table 8 shows the results.

Problems Solved
The studies solved a great diversity of problems; they are described below, classified by category.

Comparative Methodology
In [32], the authors presented a range tree used to compact and mark out correlations in metadata, which produces improved scalability and adaptiveness. In the case of visualization, random arrangement and multi-compare groups make it possible to compare clustering algorithms and analyze multidimensional data [15]. Interactively visualizing a sequence of filters and logical combinations produces a faster and more efficient workflow [36]. Comparing production process data against data obtained through simulation makes it possible to predict outstanding failures with accuracy [13]. Prototype implementations of expand-ahead make drill-down faster [33]. Metric access methods make it possible to understand data organization [35]. When integrating layers in SOA systems, it was found that the service bus allows a declarative definition of how to react to anomalies and diagnose the origins of problems [37].
Graphical statistical methods as well as data mining methods produce knowledge discovery techniques [38]. Human-machine interactive methods perform data mining to carry out data classification and relativity analysis [34]. To understand data sub-clustering behavior when adding filters progressively, ref. [40] shows how trend deviation is attributed to a local change, also named the drill-down fallacy. The Post-Study System Usability Questionnaire (PSSUQ), a tool made for testing based on user satisfaction, makes it possible for SOLAP measures to complete OLAP visualizations of operations and data [41]. Kitchenham's technique for selecting and clustering makes it possible to research learning analytics on big data, which generally intends to improve learning processes [42]. Hierarchization for data clustering and a hybrid data warehouse model for extraction and analysis solve obstacles in data mining process algorithms, even on a data cube [39].

Descriptive Methodology
For Angryk and Petry [44], mining multi-level knowledge makes it possible to enhance the methodology for applying scientific data mining. Identifying relevant metrics while exploring data cubes helps support decision-making functions, which are integrated into a commercial OLAP [16]. Substring hole analysis for viewing the coverage of huge datasets identifies coverage holes [47]. Machine learning used to construct tree-shaped structures that interpret dependencies on a KPI allows business analysts to process them, even if they depend on lower-level metrics [48]. Among the techniques based on histograms to reduce sliding windows, one relying on a multi-structure tree has been proposed [50]. Cross-lateral identification supports traffic classification through multilateral and hierarchical identification [56]. A bi-level framework that unifies macroscopic and microscopic measures of spam pinpoints suspicious results in rating datasets used in restaurant websites [57]. The use of unsupervised techniques to discover the everyday activities of smart home residents produces automatic identification of such activities [58]. Grouping information flows into matrix design cells makes it possible to identify patterns and instances of flows and other distinguished quantities from the larger network [60]. To create a concise schema design, the data grouping process must be adjusted to understand the relationship between data sources, so the structure design implements a robust, documented, and updatable architecture [61]. Aggregation represented by UML diagrams and the PRR language makes it possible to produce class diagrams and typology [53]. Describing each column as a rule with format f(a b*n) optimizes problems [63]. A self-organizing map working as an unsupervised learning algorithm to render visualizations of multivariate data, producing an initial cluster and at first showing only representative clusters, makes it possible to show the inherited global structure [43]. AI application technology makes it possible to highlight production issues and easily analyze information [45]. The use of vector-type methods to validate every DD operation may clarify whether such a method is really efficient [46]. The approach to the workflow field from a data-centric workflow viewpoint is possible only if processes are connected by a record and the system is able to connect processes with different data formats [49]. Using a plug-in architecture, which permits module development to return data, allows the building of websites with sophisticated DD operations [51]. An approach to discover knowledge by integrating summarization and rendering techniques reduces the time needed to search for and identify information [52]. A wise appreciation of system and job execution reduces code volume, splits data, and leverages open-source tools [54]. Linear data analysis is much better when a tree map is adapted along with the calendar metaphor, using time as the principal hierarchical attribute [59]. To conduct methods for reproducible science, it is necessary to accumulate traces by grouping edges and nodes with the same derivation [7]. Designing and developing the design process produces executive information [64]. The OLAP visualization approach with tree-like analysis views generates multidimensional expressions [65]. For the implementation of a new data cube, a hierarchy algorithm is necessary to implement spatial indexing and non-relational techniques [67].
The DD view adjusts to perceive noise in data analysis (noise in vibrating mechanical parts), allowing design optimizations and the ability to study data simulation noise much faster [68]. The development of a tool based on a pie chart benefits the visual analysis of categorical data [55]. To explore data sub-clustering behavior on learning analytics dashboards, [66] propose a perspective that recommends a profound DD for LAD users. An architecture for solving big data queries on NoSQL stores, which pre-computes results at the granular level for de-grouped collections, proves the model's effectiveness in applying DD and drill-up queries through extensive experimental evaluations [62].

Experimental Methodology
Sen et al. [77] show that OLAP operations on multidimensional models are possible after adding smaller cuboids partitioned according to their cardinality. Entropy is maximized when its information principles are used to determine proxy databases [78]. Opinion-mining techniques and visualization tools quantify the opinion of voters [82]. An analytics solution focused on team metrics allows for visual design and navigation [90]. Regression functions and forecasting make trend detection possible [95]. The use of online dynamic queries on data layers establishes correlations, trends, or outlier identification [96]. The MediSyn tool is used for selecting, connecting, elaborating, exploring, and sharing qualified insights via interactions [12]. Group-by-group aggregation for performance evaluation makes alternatives possible for moving groups of computed objects [83]. For implementing functionality, information retrieval is useful to combine text ranking and searching techniques [69]. In some cases, materialized views of OLAP cubes can originate from data models with hierarchical and multidimensional definitions [71]. The use of flash memory in energy-efficient environments is possible due to a storage-centric sensor network [73]. The improved complete algorithm Glide for view updates eliminates data anomalies [74]. In query algorithms, parallel closed cubes decrease and the number of data blocks increases [75]. Data collection is made possible by a multi-layered, constraint-based language based on offline and DD analysis [84]. Shortened response times in performance evaluation are achieved through experimental evaluation of open-source software [85]. Running several instances of fixed window sizes is made possible by an algorithm that supports intense traffic [86]. Trend and statistical analyses are performed with the help of frameworks supported by event collection and aggregation [89]. Malware inspection on data networks allows activating or deactivating a verification timeout [14]. The effectiveness of a spatio-temporal simulation model provides feedback on geo-spatial data [94]. A perspective on performance and process shows errors that can be produced along a manual decision tree trace [97]. Gaussian alerts of unreachability incident levels are possible thanks to the average raw rate of HTTP pings [98]. Reliability performance analysis on large datasets is possible by extracting transaction data with a fast model [99]. Comparison and identification are possible with a hierarchy and pivot visualization breakdown [70]. Education data can be integrated, analyzed, and processed with the Panda application system [80]. Multidimensional analysis tools affect the outline of each function [81]. Citation was designed considering usability and user experience, fulfilling the usability goals of effectiveness, efficiency, and learnability [87]. DW instantiation using a document-oriented system makes modeling and cross-model comparison possible [88]. Efficiency on heavy hitters and frequency queries relies on specific algorithms [91]. The limitations established by tuple-shaped data can be redefined by OLAP queries [92]. The visual narrative potential of e-learning can be demonstrated with a narrative approach [93]. Node labeling on the hierarchy tree makes it possible to choose table categories through the labeling method [31]. Locating targets is made more accurate by using a distortion algorithm of fisheye design [72].
The foreground and background models of feedback texts, supported by weighting schemes, lay the foundation for cluster contrasts [76]. The sizing of interface elements, including shaping emotions in interface design, can be guided by the use of concept hierarchies [100]. The algorithm using a dynamic data structure identifies a Galois connection with well-defined abstraction and concretization functions [79].
Multi-layer networks as data models make it possible to generate EER diagrams and demonstrate model flexibility and suitability [101].

Post Facto Methodology
Odoni et al. [102] presented Orbis, an extensible environment for DD analysis with multiple annotation tasks and versioning, which makes entity recognition, disambiguation, and entity typing possible. The generation of generic knowledge needs a big set of rules and then a search down to the basic one with semantic analysis [103]. The use of a panel that leads introspection down to the facility level makes it possible to identify probable problems and retrieve eight performance indicators visualized in several views, enabling DD analysis on specific data [104].

Problems Not Solved
Although scientific research inherently involves difficulties, and errors and inaccuracies are frequently encountered, not all authors reported failures or limiting circumstances in the process. Those who did, grouped by category, are listed below.

Descriptive Methodology
In 2018, Jiménez [61] stated that to create a concise schema design, the data grouping process must be adjusted to understand the relationship between data sources, and such a schema must be updated to prevent future problems. The plug-in architecture, which permits developing server-side modules, does not permit expansion [51]. The development of a tool based on a pie chart relies on only a short usability study [55]. Regarding the architecture for solving big data queries on NoSQL stores that pre-computes results at the granular level for de-grouped collections, note that the proposed architecture was tested only in specific case studies, and temporal data were considered important due to their low granularity [62].

Experimental Methodology
Mathrani [96] experimented with dynamic online queries on data layers and showed that the deployment was not ready to allow evaluation of understanding during performance. The perspective on performance and process suggests that errors may be produced by data overfitting [97]. In the case of heavy hitters, the algorithms showed a slight overhead [91]. The OLAP query redefinition does not allow one to see a list of problems to solve [92]. Node labeling on the hierarchy tree reports that, in the absence of a label, reading must be done at the detail level [31]. Concept hierarchies that render a dynamic interface lack mobile applications [100]. Multi-layer networks as data models do not report the existence of total verification [101].

Post Facto Methodology
Additionally, for Orbis, Odoni et al. [102] noted that multiple annotation tasks and versioning do not integrate significance-testing statistics, and proposed building plug-ins for monitoring and developing support for extra evaluations. The creation of base knowledge is sometimes omitted in the works, although it is an obligatory step to improve the adoption of these techniques [103]. The use of a panel that leads introspection down to the facility level makes it possible to identify probable problems; note that, for better efficiency, not all data were included [104].

Methodologies Application
This section refers to the use given to the methodologies in the aforementioned works. Each study, as explained before, was normalized to be classified within a methodology category, which could help in understanding its nature and be useful for later studies in the same area. Table 9 shows the results of the methodologies classified by category. For better appreciation, Figure 3 outlines the percentage distribution by category of the research studies that applied DD analysis or DM techniques with some similarities in the analysis approach. It follows from the above that for each category a diverse set of techniques was used, which are depicted in Table 10. The methodology type was not limited to one particular category but was also combined to obtain results suited to the particular objective of each researcher. Figure 4 shows the interrelationships between each methodology category and the techniques used. For lack of a third dimension, a dotted line had to be used to connect the granularity technique, which is used in experimental as well as descriptive methodologies. Ultimately, a generally coincident technique is the use of methodologies for data extraction or querying.

Perspective
At the beginning of this systematic review, the perspective revolved around the application of DD in macro- and micro-economic research, but because of the little material found, it was expected that it would not be enough to back up the desired results; hence, the investigation profile was redefined, focusing more on knowing the frequency of works and their methodological profile. As a result, its application was framed within general research and, from there, split by category and profiled by quantity rather than by specificity with respect to the exploitation of this technique.
The works collected on the basis of DD analysis only coincide in the fundamental methodological core, which is the profound analysis of data. Each one focused its own analysis on the resolution of specific problems in the researcher's selected area. Within this panorama, Tables 4-7 show the methodology type and the year each study was made. Regarding the possibility of reproducing the experimentation, with the exception of Wang & Iver [31], who presented the PL/SQL code for a relational database, there were no available codes or open databases.
At the moment, there are few DD works focused on this particular topic; hence, studies in this area could be considered emergent technology.
According to the results, trends were found toward query manipulation and decision tree labeling, whose common factor is data exploitation leading to efficiency. It is interesting to see the scarce number of methodologies focused on mathematics, a circumstance that explains the lack of base knowledge and, as a consequence, the loss of evolutionary usability of DD analysis.

Discussion
The lack of DD analysis usage in research works results in the display of biased conclusions. While it is true that deterministic models applied to DD lag behind the implementation of new AI tools, it must be considered that rehabilitating this technique through the application of ANN would keep pace with technological advances in software, in such a way that it could be re-established at the vanguard of scientific as well as economic research.
If the methodologies used in the previous studies were applied in combination and the lack of knowledge-base generation were fixed, a new methodology would be obtained that is capable of reducing the overfitting problem and integrating AI technologies, as well as avoiding redundancy problems. Generating new knowledge to establish future theories will be the researchers' responsibility in future technology application works.
It was observed that decision trees and labeling-related models drive attention to granularity processes, which is appropriate in the case of conclusive results or those that require streamlining process speed. However, it is recommended to avoid this, because it does not permit the generation of specific knowledge, since data grouping may obey the need to establish trends rather than to solve specific problems. Figure 5 displays the problems solved and not solved with respect to the applied methodology. The majority of solved problems correspond to the experimental (42%) and descriptive (34%) methodologies, from which it is inferred that researchers exhibit a tendency to establish observation processes or to describe them from a third party's perspective. Only 17% compare processes with the objective of obtaining new knowledge, and just 7% correspond to the post facto methodology.
The trend in the works that reported unsolved problems indicates that 50% had issues during experimentation, 29% in the description of the sample, and 21% in data analysis. This coincides with the small interest, shown in the last figure, in adopting such a methodology as an objective, or with disturbances in the data process, as in the overfitting case. It was noticed that no unsolved problems were reported for the comparative methodology, which is expected given that comparison is just an observation activity.

Conclusions
With respect to the methodologies examined in the analysis of the state of the art, it was found that, besides exhibiting biases in their methodologies, they have not been empowered by ANN algorithms as an AI tool. With respect to the application of DD and of DD combined with ANN, no studies were found showing the usage of such methodologies.
Most of the dissected works used DD analysis as a supervision technique and not as a method for producing conclusions. In other words, DD is used as a vehicle for other methodologies and not as a methodology per se.
The low frequency in the number of works produced by researchers per year demonstrates the pursuit of newer techniques as novelties rather than the updating of resources already available in the technological world, which are, for that reason, easily accessible and effective, and which only need new methodologies to be able to compete, as is currently the case for the application of ANN in any technological area. What this research has really done is to appreciate the wide application of DD analysis in all the facets data can offer, depending on the point of view.

Acknowledgments: This research was carried out with the help of the Autonomous University of Querétaro (UAQ).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations were used: