SQL and NoSQL Database Software Architecture Performance Analysis and Assessments—A Systematic Literature Review

Abstract: The right software architecture plays a crucial role in the difficult task of big data processing for SQL and NoSQL databases. SQL databases were created to organize structured data and typically expand vertically. NoSQL databases, on the other hand, support horizontal scalability and can efficiently process large amounts of unstructured data. Organizational needs determine which paradigm is appropriate, yet selecting the best option is not always easy. Differences in database design are what set SQL and NoSQL databases apart, and each NoSQL database type employs its own mixed-model approach. It is therefore challenging for cloud users to transfer their data among different cloud service providers (CSPs). Several different paradigms are supported by the various cloud platforms (IaaS, PaaS, SaaS, and DBaaS). The purpose of this SLR is to examine the articles that address cloud data portability and interoperability, as well as the software architectures of SQL and NoSQL databases. Numerous studies comparing the capabilities of SQL and NoSQL databases, particularly Oracle RDBMS and the NoSQL document database MongoDB, in terms of scale, performance, availability, consistency, and sharding were examined as part of the state of the art. Research indicates that NoSQL databases, with their specifically tailored structures, may be the best option for big data analytics, while SQL databases are best suited for online transaction processing (OLTP) purposes.


Introduction
The architecture of a particular software application addresses non-functional characteristics such as dependability, usability, scalability, performance, interoperability, portability, adaptability, and data sharding. There are always trade-offs among the set of quality attributes, and a software architect faces the difficult task of balancing them. Big data systems [1] are intrinsically distributed, and data sharding and replication within vast data systems produce data availability and consistency difficulties. Due to the increasing expansion of data applications, database technologies have experienced substantial variations. Over the course of more than a decade, NoSQL databases have grown exponentially, although classic database automation has persisted. The traditional model forces a rigid schema structure, which leads to scaling difficulties and inhibits data modification across clusters. In contrast, NoSQL databases support simple prototypes. The principal properties of NoSQL database designs include:
• a schema-less structure;
• permitting data representations to grow effectively and dynamically;
• scaling horizontally, through data replication collections and sharding, over massive clusters.
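The horizontal-scaling bullet above relies on sharding: routing each record to one of several nodes by key. A minimal sketch in Python follows; the node names and document keys are hypothetical, and real systems (e.g., MongoDB) use richer shard-key and chunk-balancing machinery.

```python
import hashlib

# Hypothetical 3-node cluster; names are illustrative only.
NODES = ["node-a", "node-b", "node-c"]

def shard_for(key: str, nodes=NODES) -> str:
    """Route a document key to a shard using a stable hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Distribute a few document keys across the cluster.
placement = {k: shard_for(k) for k in ["user:1", "user:2", "order:77"]}
```

Because the hash is stable, every node can compute the same placement for a key without coordination, which is what lets such clusters scale horizontally.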
In recent years, numerous organizations have accumulated vast volumes of data, which relational databases cannot process efficiently. In the past four decades, there has been a huge rise in the use of relational databases. They adhere to the ACID (atomicity, consistency, isolation, and durability) properties and are designed for structured data. "Big Data", by contrast, comprises scalable tools and technologies that manage massive amounts of data at any scale. Big data is characterized by the 5Vs (volume, velocity, variety, veracity, and value) and a massive amount of unstructured data of a diverse nature. Numerous frameworks, including Hadoop/MapReduce, Spark, Flink, and Samza, are utilized for large-scale data processing [1].
The subject of SQL query performance and optimization for enterprise, production, parallel databases, and big data [2] has received increased attention in recent years. Ineffective, non-optimized queries may consume system and server resources, resulting in database locking and data loss issues. Information mining entails extracting the facts and the logical correlation structure from the original data set, as opposed to the information itself. Query optimization refers to selecting the optimal query execution strategy with minimal cost and system resource consumption. Data mining algorithms conduct in-depth and extensive database queries to extract patterns and knowledge from comprehensive data [3]. Alternative methodologies, such as XML and object databases, have never achieved the same level of popularity as RDBMS technology.
Over the past decade, science and online vendors have questioned the "one-size-fits-all" nature of data store technology. This line of thinking resulted in the development of a new alternative database system known as NoSQL, which stands for "Not only SQL." NoSQL describes web developers' usage of non-relational databases [4]. The term NoSQL [5] was first used in 1998, and the non-relational databases conference in San Francisco drew greater attention to it. Figure 1 describes the key aspects of NoSQL databases. Eric Brewer presented the CAP (consistency, availability, and partition tolerance) theorem [6,7]. The main characteristics of the CAP theorem are given in Table 1.

Consistency
• Consistency means that the data in the database remain consistent after the execution of an operation.
• For example, after an update operation, all clients see the same data.

Availability
• Availability means that the system will not have downtime (100% service uptime guaranteed).
• Every node (if it has not failed) always executes the query.

Partition Tolerance
• Partition tolerance means that the system continues to function even when the communication among servers is unreliable.
• The servers may be partitioned into multiple groups that cannot communicate with one another.
Theoretically, it is impossible to fulfil all three requirements simultaneously. Therefore, the CAP theorem requires a distributed system to satisfy only two of the three properties. Hence, all current NoSQL databases support various combinations of C, A, and P, as described in Figure 2.
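The two-of-three trade-off can be sketched as a small lookup table. The groupings below are commonly cited associations given for illustration, not findings of the reviewed studies:

```python
# Illustrative CAP classification (commonly cited groupings; not exhaustive).
CAP_COMBINATIONS = {
    ("C", "P"): ["MongoDB", "HBase"],        # consistency + partition tolerance
    ("A", "P"): ["Cassandra", "CouchDB"],    # availability + partition tolerance
    ("C", "A"): ["traditional RDBMS"],       # single-site relational systems
}

def guarantees(db: str):
    """Return the CAP pair a database is usually associated with."""
    for pair, systems in CAP_COMBINATIONS.items():
        if db in systems:
            return set(pair)
    return None
```

For example, `guarantees("Cassandra")` yields the availability/partition-tolerance pair, matching its BASE-style design.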
The following are the advantages of NoSQL databases:
• Volume: data at rest (terabytes to exabytes of existing data to process).
NoSQL databases differ from relational databases and have their own models and architectures. When storing NoSQL databases in the cloud, caution is necessary due to the diversity of these databases and the need for interoperability, portability, and security measures. Cloud service providers (CSPs) [8-12] offer scalability, availability, and privacy, but it is important to encrypt sensitive user data before granting access to the CSP. This can be challenging due to the varied nature of NoSQL databases. Database as a Service (DBaaS) [13,14] is a cloud platform that converts traditional architectures into cloud architectures.
The main contributions of this SLR are the following:
• This SLR addresses SQL and NoSQL database architecture assessments, scaling capabilities, and performance analysis, particularly for Oracle RDBMS and the NoSQL document database MongoDB. In addition, data movement among various databases across multiple cloud platforms is explored.
• A total of 142 studies have been analyzed to accomplish the research goals mentioned earlier.

• This article identifies the research gaps in the associated architectures and their causes.

State of the Problem
The proliferation of big data has led to the need for scalable systems that can effectively process massive amounts of data. Relational databases, which are based on SQL, can manage structured, semi-structured, and unstructured data, to a certain extent, but have limitations in terms of scalability. On the other hand, NoSQL databases, which follow the BASE property and can expand their storage capacity horizontally, are better suited to managing vast volumes of data and adapting to changes in data type and structure.
There are four types of NoSQL databases (key-value, document, column, and graph databases), each with its own distinct features and applications. For example, a NoSQL graph database stores information in nodes rather than tables and stores the associations among nodes directly, so traversing a relationship (the equivalent of a join) takes constant time. This gives it an advantage over SQL databases when handling highly interconnected data.
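The constant-time association lookup can be illustrated with a toy adjacency structure. This is a sketch of the idea only, not any particular graph database's storage format:

```python
# Toy graph store: each node keeps direct references to its neighbours,
# so following an association is a constant-time dictionary lookup,
# unlike a relational join over a link table.
graph = {
    "alice": {"friends": ["bob", "carol"]},
    "bob":   {"friends": ["alice"]},
    "carol": {"friends": ["alice"]},
}

def friends_of(node: str):
    return graph[node]["friends"]          # O(1) hop, no join needed

def friends_of_friends(node: str):
    """Two-hop traversal: follow stored references twice."""
    result = set()
    for friend in friends_of(node):
        result.update(friends_of(friend))
    result.discard(node)
    return result
```

A relational schema would answer the same two-hop query by joining a friendship table with itself, work that grows with table size rather than with the node's degree.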
MongoDB is an example of a NoSQL database that has effectively managed huge data due to its excellent scalability characteristics. However, converting from OLTP databases to MongoDB can cause issues, such as unique indices, composite keys, data inconsistency, and data duplications due to the complexity of their schema.
Another major concern in cloud storage transfers is the protection of users' private information. Current cloud service provider (CSP) designs are developed without interoperability and portability issues being taken into consideration, making it difficult to offer and create a unified cloud solution for NoSQL models.
Finally, in this SLR, the state-of-the-art security techniques and policies for NoSQL databases are examined in detail.

Method
This SLR follows the PRISMA guidelines and aims to synthesize high-quality primary studies based on evidence associated with specific research questions. SLRs have been widely used in software engineering research, and our study focuses on SQL and NoSQL database architecture assessments and performance evaluations, as well as data portability and interoperability across multiple cloud platforms. The characteristics that differentiate our study from existing ones are its systematic data collection process, comprehensive list of covered studies, focused study scope, and detailed classification and analysis of the selected studies. The remainder of the paper is organized as follows: Section 2 outlines the research questions, search strings, inclusion/exclusion criteria, data extraction, and classification, while Section 3 presents the results of the selected papers. The discussion and research gaps are described in Section 4, and the conclusion and future work are summarized in Section 5.

Objectives and Research Questions
The purpose, focus, and objectives of the current SLR are to:
1. Address the existing SQL and NoSQL document approaches and techniques by considering big data processing.
2. Perform a systematic literature review associated with SQL and NoSQL databases.
3. Review selected study subsets in depth.
4. Assess the strengths and weaknesses of SQL and NoSQL databases on the basis of the evidence collected and analyzed from these studies.
5. Highlight the research gap in the area.
6. Formulate the following research questions to achieve the main objective of our study:
• Considering big data (structured and unstructured data): What is the need for NoSQL?
• Why does the NoSQL database follow the BASE property instead of the SQL database's ACID property?
• Does DBaaS tackle data interoperability and portability efficiently in various NoSQL databases?

Search Criteria
The search criteria for primary studies involve identifying and collecting the literature that meets the inclusion and exclusion criteria. To accomplish this, various search techniques were used, such as an electronic database search, manual search, and snowballing, along with searching associated journals and conference proceedings. For our SLR, we followed a protocol proposed by [15,16] and the seven phases investigated in [17]. Figure 3 illustrates the selection of associated studies.

First, we performed the database search suggested by [18]. The search strings mentioned in the search strategy section were used for categorization and classification, i.e., tools, methods, and frameworks.

Search Resources

In Phase I, we derived a set of associated search strings and used them to find the related papers. The largest databases were selected for finding the associated articles, including the Wiley Online Library (onlinelibrary.wiley.com). We found 13,000 papers during the search process using article titles, abstracts, and keywords. After removing duplicates in Phase I, 2431 papers met the inclusion and exclusion criteria. Backward snowballing in Phase V led to 89 articles being selected after repeatedly validating the inclusion/exclusion criteria from Phases II through IV.

Search Strategy
Various combinations of search strings were created. The devised search strings were run on the mentioned search resources to identify the associated literature:

Selection Process and Criteria

We included all associated papers by applying the inclusion/exclusion criteria. In Phase I, we evaluated the quality and characteristics of the papers according to our research questions and finalized the selected list. Our research papers are organized by source as follows:
• Step 1: Total number of documents based on:
• Repeat the entire process; go to Step 1.
Table 2 depicts the number of associated papers remaining after each phase of filtration. To increase the reliability and efficiency of the SLR, the corresponding author reviewed and investigated the impact and methodology of the included papers.

Exclusion Criteria
The research papers were excluded on the basis of the following criteria: To reduce the threat to the reliability of the SLR, the co-authors (2nd and 3rd) rechecked the excluded papers according to the checklist of exclusion criteria.
The latest version of the proposed SLR will be used when it is published in the journal or presented at a conference. The document quality was checked by the corresponding author and co-authors.

Data Collection and Extraction
After collecting the 142 associated studies, two reviewers reviewed them and extracted valuable data that satisfied the research questions [19]. The following data were obtained from each selected paper: We selected mostly empirical articles. The empirical studies consisted of the following categories: evaluations, discussion assessments, experiments, and reviews of existing techniques (SQL and NoSQL).

Data Analysis and Classification
Furthermore, information about each of the extracted data was arranged and tabulated in accordance with the research questions as follows:
1. Considering big data (structured and unstructured data): What is the need for NoSQL?
2. Why does the NoSQL database follow the BASE property instead of the SQL database's ACID property?
3. Does DBaaS tackle data interoperability and portability efficiently in various NoSQL databases?
During analysis, we grouped all extracted strategies into categories and summarized our findings into three groups: research methods, research process phases, and evaluation.

Validity Threats and Evaluations
Threats to the SLR process, as suggested by [11], should be consistently evaluated. These threats were categorized into different groups, including descriptive validity, theoretical validity, interpretive validity, generalizability validity, and repeatability [20,21]. Professor Zhang Cheng of Anhui University analyzed the SLR validity checks, and we incorporated his suggested changes into the protocol.

Results
In this section, we summarize our chosen research by publication year, paper genre, and number of chosen studies from a particular digital library (presented in full in Table A1, Appendix A). Based on the selection procedure and criteria, most of the chosen articles were empirical research articles. According to the research literature, researchers utilized both types of databases for their recommended methodologies and studies. In addition to empirical investigations, we also discovered survey articles concerning SQL and NoSQL databases. Following the associated review selection procedure, we categorized the selected studies and publications into three main categories. Figure 4 illustrates the category pie chart for the studies.
Figure 6 depicts the number of papers connected to relational and NoSQL databases for each digital library. Most relevant articles are in the IEEE and Springer digital libraries. Other sources in Figure 6 include white papers, book chapters, and technical reports from a variety of publishers, including Oracle, MIT, Academic Journal, SciTePress, IJACSA, and IOPScience. Figure 7 depicts the paper category (journal/conference proceeding) of each digital library from Figure 6. Many researchers compared the performance and properties of databases such as MongoDB, MySQL, Cassandra, Couchbase, Oracle, SQL Server, PostgreSQL, Neo4j, and HBase throughout the chosen studies. Figure 8 shows the variety of databases used across the selected research works.

No. of Papers Per Source

As can be seen in Figure 8, most studies comparing and analyzing performance and attributes used one or more of the following databases: MongoDB, MySQL, Oracle, Cassandra, PostgreSQL, Neo4j, SQL Server, or CouchDB.

Empirical Studies Analysis

Empirical research and articles fall into the categories of comparisons, evaluations, experiments, categorizations, dialogues, and surveys. Articles that we believed were relevant to our study topics were chosen from the literature and assessed in light of our hypotheses and selection criteria.
RQ1: Considering big data (structured and unstructured data): What is the need for NoSQL databases?
RQ2: Why do NoSQL databases follow the BASE property instead of the SQL database ACID property?
Modeling a database helped us anticipate the types of data that will be stored in it and how they will be stored.
NoSQL, an acronym for "Not only SQL" [22], is an approach to database management that excels at handling massive amounts of unstructured data and big data [23] analytics [14]. Many different query languages can be used with these databases, and they do not adhere to a strict, predetermined schema structure. In contrast, over the past few decades, relational databases have used the industry-standard SQL language. Document-oriented databases are a subset of NoSQL databases; databases that focus on storing and retrieving documents include MongoDB and CouchDB. Databases of this type are utilized for the storage and administration of data that is primarily document-based. Complex data formats, such as JSON, BSON, XML, and PDF, are used to store information in document-oriented databases. Both MongoDB and CouchDB are free and open source; however, MongoDB [24] is better suited to a distributed setting and to JSON. Researchers in [25] investigated several different NoSQL database features. One popular database that was built specifically with JSON in mind is MongoDB [22,26], which is written in C++. MongoDB uses dynamic schema [17] structures [1] instead of predefined, static documents. Data analysis and retrieval are quick and accurate because of improvements in query processing, indexing support, and in-memory aggregation. In addition to offering security, it provides recovery and backup utilities. SQL and NoSQL databases are shown in Figure 9 along with their respective data storage structures. SQL (Oracle, MySQL, and SQL Server) and NoSQL (MongoDB, Neo4j) databases have been the subject of numerous studies comparing their structure, design, and performance. These studies used the proposed techniques for situations involving SQL and NoSQL databases and analyzed the outcomes. NoSQL databases have different features and applications. Because of their ability to scale horizontally, NoSQL databases cannot guarantee the ACID properties.
In this regard, the Neo4j graph database [51,52] is an excellent option. Graph databases [53-55] have been shown to effectively organize and store data with complex dependencies. Documents serve as MongoDB's primary focus. MongoDB uses the BSON format, which is an offshoot of JSON.
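The document model described above can be sketched in plain Python: a collection is a set of schema-less documents that serialize naturally to JSON (MongoDB stores the binary BSON variant). The sample documents and the `find` helper are illustrative assumptions, not MongoDB's actual driver API:

```python
import json

# A "collection" as a list of schema-less documents: fields may differ
# per document, unlike rows in a fixed relational schema.
collection = [
    {"_id": 1, "name": "Ada", "skills": ["sql", "python"]},
    {"_id": 2, "name": "Lin", "address": {"city": "Hefei"}},  # extra nested field
]

def find(coll, **criteria):
    """Match documents on top-level fields, equality-query style."""
    return [doc for doc in coll
            if all(doc.get(k) == v for k, v in criteria.items())]

# Documents serialize naturally to JSON text.
payload = json.dumps(find(collection, name="Ada")[0])
```

Note that the two documents carry different fields; the dynamic schema simply accepts both, whereas a relational table would require NULL columns or a schema migration.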
With huge data volumes, MongoDB performs well and has proven itself repeatedly. Research in [56] compared the NoSQL database MongoDB with the relational database PostgreSQL for a social network service and streaming sensor data. In the scenarios where the two databases were compared, MongoDB came out on top [56]. Unlike relational database management systems (RDBMS), a NoSQL [57] database is well-suited for use in a cluster setting, where its flexible and powerful architecture can accommodate vast amounts of data in a wide variety of forms. MongoDB and MySQL were compared with the canonical database management system in [35]; MongoDB outperformed MySQL in data retrieval and data insertion. Based on the comparisons they presented, it was clear that NoSQL MongoDB was the superior option to MySQL when handling large amounts of data. In-depth research on the prerequisites for effectively managing massive amounts of unstructured data was conducted in articles [43,58]; their authors preferred the NoSQL MongoDB database over MySQL RDBMS. The relational and non-relational database models were examined in detail in [44]. Based on their analysis, the NoSQL MongoDB database outperformed the MySQL database. They suggested that for a small dataset with basic queries, MySQL is efficient, while for large datasets with complicated queries, MongoDB is the more acceptable solution. The authors of [40] compared the performance of Oracle RDBMS with the NoSQL MongoDB database. As a result of their research, they concluded that NoSQL databases are not a viable alternative to SQL databases. They [40] claimed that companies can select a suitable database by considering the specifics of their business.
As for data scalability, high performance, and data availability, we discovered many publications [59-68] that discuss migrating from a relational database to a NoSQL database, as well as other papers [59,69-78] on the topic. Because of its scalability, integrity, dissemination, security [24,59,69-74], and customizable designs, MongoDB is well-suited for processing massive amounts of data. According to the cited works, switching from a relational database to the NoSQL database MongoDB is the way to go. MongoDB stores its information in the Binary JSON (BSON) format. Because it does not require joins as relational databases do, MongoDB is able to store and retrieve large numbers of documents in a single collection efficiently. MongoDB is versatile enough to handle and process data in a wide variety of formats, including structured, semi-structured, and unstructured data. Relational databases were created with the express purpose of handling organized data, and as such, they can handle a certain amount of data by scaling vertically. To be clear, NoSQL databases are not meant to replace relational databases, but their increased performance and utility mean that they have a place alongside relational databases in some contexts. According to the research [60,79], the relational database system does not scale well with massive amounts of data. Automatic mapping from MySQL RDBMS to a MongoDB NoSQL database utilizing metadata saved in MySQL RDBMS was proposed in article [61]. NoSQL databases do not promise to offer the same atomicity, consistency, isolation, and durability that are the hallmarks of relational database management systems [80]. The BASE [81] property is observed by NoSQL databases.
NoSQL [60] is an umbrella term for a collection of technologies that share the goals of data availability, efficient data scaling, effective storage management, and improved performance. Research in [25] looks at the motivations for the shift from relational database management systems to nonrelational document-oriented databases. Meanwhile, the ACE (availability, consistency, and efficiency) aspects of the big data system have been explored in both databases [82,83].
Research into the topic of automatic schema transformation from SQL to NoSQL can be found in [84].
Articles [70,85] and [86] compared the efficiency of several NoSQL databases (including Couchbase, MongoDB, RethinkDB, and in-memory databases) when used with real-world data. While planning their proposed trials, the authors considered several factors, such as the response time of the various databases. Comparative performance analyses between Oracle NoSQL and MongoDB were conducted in the study [71]. The databases were studied from multiple angles, including their storage model, scalability, concurrency, and replication. The authors drew the conclusion that, overall, MongoDB was more popular than Oracle's NoSQL offering. Based on the DB-Engines rankings, MongoDB is the fifth best NoSQL database, whereas Oracle NoSQL is ranked number 78. The paper [73] claimed that, compared with MongoDB's MapReduce, Oracle's RDBMS aggregation performed better. On the other hand, regarding the speed with which queries were answered, MongoDB bested Oracle RDBMS. According to [40], Oracle aggregation was superior to MongoDB aggregation for SUM, COUNT, and AVG workloads. The work in [40] used a dataset with approximately 500,000 records, which was insufficient for big data analytics. The process flow diagram of the SQL Select statement in Oracle 11g RDBMS is depicted in Figure 10. MongoDB data retrieval is simple and fast because of the sub-document structure and does not require checking constraints or any clause, unlike the SQL Select statement of Oracle 11g RDBMS. The SQL Insert statement process flow is shown in Figure 11.
In comparison with Oracle RDBMS, MongoDB insertion is faster since it does not have to verify the steps shown in Figure 11 of the SQL Insert statement.
In contrast, MapReduce is a programming model [87] that performs well in distributed scenarios and is better suited to big data [88] analysis than basic aggregation [73] in cluster (multiple-server) contexts. Before commencing the map stage, a MapReduce operation can apply arbitrary sorting and limiting to the documents of the single collection it uses as input. The two phases of MapReduce, "Map" and "Reduce", are required for the MapReduce algorithm to work; MongoDB applies the map phase to each document fed into MapReduce. The map function returns key-value pairs. MongoDB then uses the reduce phase to gather and condense the aggregated data, and stores the results in a collection. For example, in the Chicago crime dataset, the map function counts the crimes against each day, and the reduce function takes the day as the key and aggregates the corresponding values (key: values). The MapReduce-Merge framework was developed by the authors of [89] by incorporating a merge operation into the MapReduce architecture. Thanks to the merge operation, the performance of MapReduce increased, and it became able to compute relational algebra and process data across the cluster. In order to maximize MapReduce productivity and enhance the cluster's query performance, others proposed the MRShare framework [90]. Nevertheless, Oracle RDBMS aggregation outperformed MongoDB's MapReduce aggregation.
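The crimes-per-day example above can be sketched as a minimal MapReduce in plain Python; the crime records are hypothetical stand-ins for the Chicago dataset, and a dictionary plays the role of the output collection.

```python
from collections import defaultdict

# Hypothetical input documents (stand-ins for Chicago crime records).
records = [
    {"day": "Monday", "crime": "theft"},
    {"day": "Monday", "crime": "assault"},
    {"day": "Tuesday", "crime": "theft"},
]

def map_phase(record):
    # Map: emit one (key, value) pair per input document — here (day, 1).
    yield record["day"], 1

def reduce_phase(key, values):
    # Reduce: collapse all values emitted for one key into a single result.
    return sum(values)

# Shuffle/group step: collect all values emitted under the same key.
grouped = defaultdict(list)
for rec in records:
    for key, value in map_phase(rec):
        grouped[key].append(value)

# Apply the reduce phase per key; the result would be stored in a collection.
crimes_per_day = {k: reduce_phase(k, v) for k, v in grouped.items()}
print(crimes_per_day)  # {'Monday': 2, 'Tuesday': 1}
```

In a real cluster, the map and reduce phases run in parallel across shards, which is why the model scales where single-server aggregation does not.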
Over the past decade, Oracle RDBMS has been widely adopted by businesses of all sizes. Due to their single-image abstract system architecture, SQL RDBMSs did not fare well with regard to processing unstructured data.
The study [72] investigated several features and classifications of NoSQL databases operating on top of Hadoop [91][92][93]. The study [72,94] found that NoSQL databases could be broken down into subcategories according to factors such as scalability and data type (connected vs. document). Table 3 compares Oracle RDBMS with MongoDB and lists their primary features; notably, Oracle RDBMS does not perform reads/writes very quickly in big data analytics, whereas MongoDB performs reads/writes very quickly because of its memory-mapping function, while both support relational algebra [60]. The key strengths of the Oracle relational database and the MongoDB NoSQL database are described in Table A3 (Appendix A), while Table A2 (Appendix A) compares NoSQL databases (MongoDB, Neo4j) with relational databases (Oracle, MySQL, and SQL Server).
According to the cited works [95][96][97], there has been a notable increase in the quantity of geospatial and geolocated data. Many applications were developed to handle and use geospatial data [98][99][100][101][102][103], and this is true across many different fields (emergency management, archaeology, IoT, and smart cities). In order to process the massive amounts of geographical data effectively [27], a powerful database management system is strictly required.
When handling massive amounts of data, NoSQL databases, rather than SQL databases, are the best option for web applications [57,[104][105][106][107]. Geospatial and location data have also been managed using NoSQL databases [108,109]. It was suggested in multiple reports [110][111][112][113] that NoSQL databases were able to effectively process vast amounts of unstructured data. The study [114] reported that researchers first became aware of spatial data through the foundational literature on Geographic Information Systems. Two problems with effective geospatial query processing were investigated in the research [115,116]. The first was that conventional optimization methods were inadequate for solving geographical queries. The second was that, once big data analytics was taken into account, every method and approach to geographic queries tried thus far proved inadequate for the massive amounts of data involved. In contrast to traditional textual data, which contain only a limited set of attributes, geospatial data offer a wide range of additional features. Extensive geospatial data processing, according to several studies [113,[117][118][119][120][121][122][123][124][125][126], necessitates well-established methods for processing the enormous quantity of geospatial data at hand. The article [127] demonstrated that well-known RDBMS systems encountered numerous difficulties when attempting to analyze geographical data. Using a performance comparison analysis, the authors of [108] looked at how the NoSQL document database MongoDB stacked up against the relational database management system PostGIS. As their tests showed, MongoDB performed better than PostGIS when handling geospatial data.
The Oracle Spatial storage data model [128] had two basic components: location and form. Query analysis employed storage engines, such as the Index Engine and the Geometry Engine, which made use of the SDO_GEOMETRY data type. For location services, a geocoder was utilized to translate an address into SDO_GEOMETRY information. Oracle Maps and Map Viewer were used for visualization.
SQL Server 2016 (https://www.microsoft.com/en-us/cloud-platform/sql-server (accessed on 7 January 2020)) and its cloud version, the Azure SQL database (https://azure.microsoft.com/en-us/services/sql-Database (accessed on 9 January 2020)), both include various geo-functions for geospatial data analytics, as does the Oracle Spatial SQL database, while NoSQL databases, e.g., Azure DocumentDB (https://azure.microsoft.com/en-us/services/documentdb (accessed on 10 January 2020)) and MongoDB (https://www.mongodb.com (accessed on 15 January 2020)), also enable geographical capabilities. The Database as a Service (DBaaS) paradigm utilized by the Azure SQL database features the same functionalities as Microsoft SQL Server and provides cloud services. PostgreSQL (https://www.postgresql.org (accessed on 17 January 2020)) is a free and open relational database system. Comparatively, PostGIS (http://postgis.net (accessed on 18 January 2020)) is an extension of PostgreSQL that serves as a spatial database for working with geospatial information. The NoSQL database Azure DocumentDB is built by Microsoft and provides the same geographic operations and functionalities as MongoDB. MongoDB supports the GeoJSON standard data format. The authors of [129] undertook a performance analysis of geographical data. According to their research, Azure DocumentDB was faster than the Azure SQL database but less scalable. The primary geospatial properties of popular SQL and NoSQL databases are outlined in Table 4.
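A GeoJSON point (the standard format MongoDB supports) and a proximity filter of the kind these geo-functions provide can be sketched as follows; the coordinates, place names, and 5 km radius are illustrative assumptions, and the great-circle distance is computed directly rather than via a spatial index.

```python
import math

def geojson_point(lon, lat):
    # GeoJSON stores coordinates as [longitude, latitude].
    return {"type": "Point", "coordinates": [lon, lat]}

def haversine_m(p1, p2):
    # Great-circle distance in metres between two GeoJSON points.
    lon1, lat1 = map(math.radians, p1["coordinates"])
    lon2, lat2 = map(math.radians, p2["coordinates"])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))

# Hypothetical query: find places within 5 km of a centre point.
center = geojson_point(-87.63, 41.88)           # Chicago
places = {"nearby": geojson_point(-87.62, 41.89),
          "far": geojson_point(2.35, 48.85)}    # Paris
near = [name for name, p in places.items()
        if haversine_m(center, p) < 5000]
print(near)  # ['nearby']
```

A real geospatial database would answer such a query through a spatial index (e.g., an R-tree or a 2dsphere index) rather than scanning every point.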
NoSQL MongoDB Data Modeling
Figure 12 depicts the shard nodes, configuration servers, and routing servers (mongos) that make up the MongoDB architecture, as described in [42,130,131]. The data are stored in shards, and a MongoDB cluster cannot be constructed without one or more shards. When a node fails, the information for that shard is held by its replicas. Data transactions (read/write) determine which shard is used. A replicated node uses one or more servers, and each secondary node is modeled on the primary server. In case the primary server goes down, one of the backup servers takes over. The primary server is the hub around which all operations (read/write) revolve. Eventually, the cluster synchronizes all of the distributed read transactions. A set of configuration servers in the cluster is responsible for storing metadata. These servers identify the data shards and relay which data chunk belongs to which shard. The client requests that the routing servers (mongos) carry out an action. MongoDB assigns each user task to the appropriate shard on the basis of the task type, combines the resulting data, and then returns confirmation to the client. Since mongos [23] are stateless, they can be used in a distributed setting.
Memory-mapped files are used by MongoDB to make the most of available memory, which, in turn, boosts performance. Indexing in the MongoDB [132] database uses B-trees. In a MongoDB cluster [133], a user can claim ownership of a certain partition collection by using a shard key.
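The way a stateless router can map a shard key to a shard might be sketched with hashed sharding; the shard names, key values, and the use of a bare MD5 hash are illustrative assumptions, as real MongoDB routes via config-server chunk metadata rather than a fixed modulo.

```python
import hashlib

# Hypothetical cluster of three shards.
SHARDS = ["shard0", "shard1", "shard2"]

def route(shard_key_value):
    # Hash the shard key and map it onto a shard deterministically.
    digest = hashlib.md5(str(shard_key_value).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Any router instance computes the same placement from the key alone,
# which is why the routing tier can remain stateless.
target = route("user_42")
assert route("user_42") == target
assert target in SHARDS
```

Because placement is a pure function of the key (plus shared metadata), routers can be replicated freely without coordination, matching the stateless mongos behaviour described above.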

RQ3: Does DBaaS tackle data interoperability and portability efficiently in various NoSQL databases?
In-depth research on the DBaaS architecture literature was conducted for this work. The data analysis and extraction show that the cloud DBaaS approach developed for relational databases is not optimal for NoSQL databases. Expert time and effort can be reduced, and security can be improved, if scientists and researchers can find a unified [134][135][136][137][138] DBaaS [139] solution for both SQL and NoSQL databases. There is also no need to re-engineer programs for use with different CSPs thanks to standard APIs. Data portability and interoperability among different cloud providers are the primary obstacles to overcome. Interoperability is defined differently by each of the three paradigms [136]: IaaS, PaaS, and SaaS. Open standards [137] help mitigate the interoperability issue; however, our solution is focused on the IaaS layer specifically. When transferring information among different cloud providers, unified APIs are typically necessary [138]. There is not a single data storage model utilized by all cloud services. When a developer migrates from one CSP to another, the data are transferred according to the high-level architecture depicted in Figure 13.
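The unified-API idea can be sketched as a thin client that delegates to provider-specific adapters, so application code is unchanged when data is ported between CSPs; the provider classes and their method names here are entirely hypothetical stand-ins for incompatible cloud storage APIs.

```python
class ProviderA:
    """Hypothetical CSP with a put_item/fetch API."""
    def __init__(self):
        self.store = {}
    def put_item(self, key, value):
        self.store[key] = value
    def fetch(self, key):
        return self.store[key]

class ProviderB:
    """Hypothetical CSP with an insert/find_one API."""
    def __init__(self):
        self.docs = {}
    def insert(self, key, value):
        self.docs[key] = value
    def find_one(self, key):
        return self.docs[key]

class UnifiedClient:
    """One put/get surface over incompatible provider interfaces."""
    def __init__(self, backend):
        self.backend = backend
    def put(self, key, value):
        if isinstance(self.backend, ProviderA):
            self.backend.put_item(key, value)
        else:
            self.backend.insert(key, value)
    def get(self, key):
        if isinstance(self.backend, ProviderA):
            return self.backend.fetch(key)
        return self.backend.find_one(key)

# Porting data between CSPs: read through one adapter, write through the other.
src, dst = UnifiedClient(ProviderA()), UnifiedClient(ProviderB())
src.put("k1", {"name": "record"})
dst.put("k1", src.get("k1"))
assert dst.get("k1") == {"name": "record"}
```

The application only ever sees `put`/`get`, so switching CSPs requires swapping the adapter rather than re-engineering the program.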
Databases [140] that are both consistent and connected, as well as management [141] that is both effective and efficient, are key and critical concerns in the modern information technology era. The fundamental features of a database system (DBS) are the permanent data store's assurance of the data's independence from the underlying physical storage medium, as well as the query processing capabilities enabled by declarative queries. In the database sector, a wide variety of approaches to the various domains have been observed. Multiple data management systems, including ACID-transactional, object-oriented (OOM), XML, and data warehousing systems, are based on the relational data model. However, the BASE [81] property is supported in NoSQL databases, which are primarily intended for processing large amounts of data.
New cloud services [142] and capabilities are being offered by cloud service providers to customers at a low cost and with high efficiency. However, multiple cloud service providers offer the same features using varying implementations and user interfaces [143], which inevitably causes problems with interoperability [144], incompatibility [145], and portability. These are difficult problems for cloud service providers while embracing and facilitating cloud technology [146]. IaaS, PaaS, and SaaS interoperability are all different terms with different meanings and applications within the realm of cloud services [136]. Cloud users may need to move from one CSP to another for the following reasons [137]: a higher rate of downtime or failures, contract termination, changes in corporate plans, better alternatives at lower cost, and legal difficulties. Customers using multiple cloud providers cannot easily port their data among them. Because their key architectures lack interoperability, the cloud service models effectively control the customer's capabilities; the result is vendor lock-in, which poses a significant security concern for cloud-based models [147]. Data portability [148] will improve as increasingly more cloud service providers (CSPs) adopt open standards to solve interoperability problems. Until then, it can be challenging for a developer to ensure that data and applications are consistent across different cloud services.
As expected, the CIMI standard is compatible with the IaaS API and helps reduce the interoperability effort required between cloud users and their infrastructure service providers. By using OCCI, interoperability issues between CSPs can be reduced while still producing adequate results. Because these standards do not incorporate interoperability features into their underlying architectures, they are not widely adopted by CSPs. In addition, there is a problem with the availability of standardized interfaces and APIs.
Design patterns, cloud middle infrastructures, the Service Delivery Cloud Platform (SDCP), and migration tools have all been developed by other academics [137,152,153] to facilitate the movement of data among different clouds. Despite saving users time, these methods do not solve the portability problem among different CSPs that arises from having to learn and implement several APIs. Amazon Web Service (AWS), Microsoft Azure, and Google App Engine (GAE) are examples of cloud service providers that help consumers build and launch cloud-based apps. In addition, they provide the Database as a Service (DBaaS) cloud platform to support the database developers. Data porting in DBaaS, which can occur between SQL [154] and NoSQL [155] databases as well as inside each type of NoSQL database, can potentially be problematic due to the interoperability issue. Many types of NoSQL databases adhere to incompatible storage formats and data paradigms. There is a need for standardized APIs that can regulate data movement among various cloud storage platforms.
To host their data and applications, software developers can take advantage of DBaaS [139], a highly scalable and available backend cloud service. As a result, DBaaS is currently the most alluring option for cloud customers [154]. Database as a Service (DBaaS) [156] is a cloud service offered by cloud service providers (CSPs) that facilitates the transition from traditional database architecture to cloud database architecture. SaaS facilitates remote access to computer programs and their features and functionality. PNUTS, HBase, SimpleDB, and Google BigTable are some other cloud-based database services. Potentially, the DBaaS framework can do a good job of handling traditional databases. However, DBaaS performance declines in areas such as data consistency, confidentiality, integrity, availability, and security due to the different approaches followed by the many databases. If data privacy and security were guaranteed, the outsourcing model known as DBaaS could be a financial success. An expert virtualization consultant for private-table database clouds was proposed in the research [157]. Cloud computing allows for the integration of numerous applications into the framework of the service itself. The scalability and adaptability of cloud computing come at a lower price than ever before.
To provide more data interoperability, portability, and security during data porting, the papers [134,135] investigated two unified API frameworks, CDPort and SecloudDB, for SQL and NoSQL databases. The API that they offered ensured the privacy of users' critical information before handing it over to a third party (the CSP). As part of their proposed MCTool, requests are transformed into their corresponding models and then communicated to the appropriate database, considering the models that it can handle. Metadata encryption/decryption keys ensure that only authorized users and the DBA have access to, and can make changes to, the data stored in the various clouds. Encryption and decryption are supported by their proposed framework.

Discussion and Classification
The SQL vs. NoSQL debate is not about relational versus non-relational databases, despite the names. Both databases' transaction models are analyzed and contrasted. When an application is executed, the database performs a series of operations known as "transactions". All transactions in a SQL database depend on ACID properties. In this case, ACID refers to the principles of atomicity, consistency, isolation, and durability. However, NoSQL database designers concluded that the ACID property was an unnecessary roadblock to handling huge data. Consequently, in the early 2000s, Professor Eric Brewer proposed a new theory. The CAP principles are consistency, availability, and partition tolerance. The theorem states that not all of these qualities can be obtained concurrently by designers working in a distributed environment; at most two can be guaranteed at once. As a trade-off against partition tolerance, developers can implement a database guaranteeing consistency and availability using the CA model. If the developer of a database places more value on availability and partition tolerance than on consistency, then the database is said to be AP-based. To achieve consistency and partition tolerance while compromising availability, a CP-based database is deployed. Table 5 describes the main features supported by both flavors of databases. Efforts to classify DBMSs have led to several sub-categories. The foundation of any database management system is the data model upon which it is built. Many contemporary commercial DBMSs rely heavily on the relational data model. The object data model has seen little adoption, even though it has been used in a small number of commercial systems. Many legacy applications continue to run on database systems based on hierarchical and network data models. We can classify the NoSQL business model as one of the types defined in Table 6.
NoSQL database systems have been given a variety of designations, which can be attributed to the fact that their implementations do not include conventional SQL. Several distinct types of NoSQL databases each make use of a unique set of querying procedures. Software developers are often responsible for designing query execution appropriately, rather than depending on a declarative query language that composes queries and delivers fast query execution plans. Users of the system are responsible for performing extra tasks, such as validating and interpreting the information that has been gathered. In addition, the responsibility for ensuring data consistency, data replication, and availability during concurrent changes in shared and replicated databases is shifting more and more into the hands of developers. The combination of massive data systems with NoSQL databases can affect software designs and architectures in several different ways.
Brewer formulated a set of baseline requirements for distributed databases using his CAP theorem [7]. In the case where there is a network partition (P: information is lost randomly between the nodes in the cluster), a trade-off must be made between consistency (C: present identical data to all users) and availability (A: every client must receive an acknowledgement, as either success or failure). As captured by Abadi's PACELC formulation, if there is no partition (P), a framework trades off latency (L) against consistency (C); when there is a partition (P), it trades off availability (A) against consistency (C). Consistency and availability (CA): this database design prioritizes data consistency and accessibility using a replication approach. Such databases do not provide partition tolerance; if the nodes are partitioned, the data will go out of sync. Vertica, Greenplum, and relational database management systems are examples of this type of database.
Consistency and partition tolerance (CP): the primary objective of such a database management system is to ensure the integrity of the data it stores; however, high availability is not supported. The data are stored on various nodes, and when a node crashes, the data become unavailable in order to preserve consistency between nodes. Partition tolerance is maintained by blocking data resynchronization. Hypertable, BigTable, and HBase are a few examples of CP-oriented database systems.
Availability and partition tolerance (AP): in this type of database, providing data availability and partition tolerance is the top priority. A communication breakdown among nodes does not affect the status of any individual node. After a partition is resolved, the data are resynchronized, but consistency is not ensured. These principles are followed by databases such as Riak, CouchDB, and KAI.
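The consistency/availability trade-off behind these categories can be sketched with a toy quorum system: with N replicas, choosing a write quorum W and a read quorum R such that W + R > N guarantees that a read overlaps the latest write, while smaller quorums remain more available but may return stale data. The replica layout and worst-case read placement are illustrative assumptions.

```python
# Three replicas, each holding a value and a version number.
N = 3
replicas = [{"value": "old", "version": 0} for _ in range(N)]

def write(value, version, w):
    # Only W replicas acknowledge the write before it is confirmed.
    for r in replicas[:w]:
        r.update(value=value, version=version)

def read(r_count):
    # Read from R replicas; model the worst case by reading the replicas
    # least likely to have been written (the tail of the list).
    answers = replicas[-r_count:]
    return max(answers, key=lambda r: r["version"])["value"]

write("new", 1, w=2)
assert read(r_count=2) == "new"  # W + R = 4 > N: quorums overlap, consistent
assert read(r_count=1) == "old"  # W + R = 3 = N: a stale read is possible
```

CP-style systems effectively insist on overlapping quorums (refusing requests when they cannot be formed), while AP-style systems accept the possibility of the stale read shown on the last line.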
Basically available: when one element of a database goes down but the others keep running, the database is said to be "basically available". In the event of a node failure, operation continues by replicating the data among the remaining nodes.

Soft state: data are subject to change over time, depending on factors such as the level of user participation. The usefulness of such information may also deteriorate after a certain period has passed. Therefore, the data must be updated or refreshed for the information to remain useful in the system.

Eventual consistency: after every data modification, the data do not instantly become consistent throughout the entire system, but they will become consistent eventually; the data are expected to converge to an accurate state in the foreseeable future.
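Eventual consistency can be sketched as two replicas that accept writes independently and later converge through an anti-entropy exchange; the last-writer-wins merge rule on a logical timestamp and the sample data are illustrative assumptions (real systems may use vector clocks or CRDTs instead).

```python
# Each replica maps a key to a (value, logical_timestamp) pair.
replica_a = {"key": ("v1", 1)}   # write accepted on node A
replica_b = {"key": ("v2", 2)}   # concurrent, newer write on node B

def anti_entropy(r1, r2):
    # Exchange states: for every key, both replicas converge to the
    # value carrying the highest timestamp (last-writer-wins).
    for k in set(r1) | set(r2):
        winner = max([r1.get(k, (None, -1)), r2.get(k, (None, -1))],
                     key=lambda vt: vt[1])
        r1[k] = r2[k] = winner

# Before the exchange the replicas disagree; afterwards they converge.
anti_entropy(replica_a, replica_b)
assert replica_a == replica_b == {"key": ("v2", 2)}
```

Between writes and the anti-entropy pass, readers may observe either value; the guarantee is only that, absent new writes, all replicas eventually agree.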
Today, it is not enough to rely solely on structured data, as the astronomical volume of unstructured data necessitates fast data analytics and information extraction methods. SQL database capabilities are limited in their ability to effectively process various forms of large data. The capabilities of NoSQL databases allow for the efficient processing of massive data sets. NoSQL databases excel in the areas of storage capacity expansion, schema flexibility, scalability, and real-time access. NoSQL databases adhere to the BASE properties. Instead of prioritizing data consistency and security, NoSQL databases place an emphasis on improving data read/write efficiency; the application is responsible for ensuring data consistency. For this reason, they are an excellent option for handling large datasets. In addition, NoSQL databases do not apply restrictions at the data, column, or table levels as is done in structured databases.
This research analyzed approximately 140 previous studies that compared the usefulness, productivity, and dependability of SQL databases with NoSQL databases. This research considered vast amounts of data. According to our investigations, NoSQL databases offer a greater capacity for expansion [23,37,68,158,159]. SQL databases are better suited for transactional systems and use more resources to ensure data integrity and consistency, whereas NoSQL databases are better equipped to handle huge and diverse datasets and use fewer resources to ensure data integrity and consistency. On the other hand, NoSQL databases [24,95] are distinct in the sense that they place a larger priority on the accessibility of data. The results of our testing indicate that relational databases cannot be effectively replaced by NoSQL databases. Because both databases have their advantages and disadvantages, the database that an organization chooses to use will depend on the requirements that are unique to that business. To give just one illustration, NoSQL databases, which make it possible to use the MapReduce programming module, are better suited to parallel computing [124,160] when they are implemented within a cluster environment.
NoSQL databases are constructed on a schema that is more fluid and dynamic, in contrast to relational databases, which are strongly dependent on a preset data structure that is referred to as a "schema" (tabular form) [24,62,68,81,85,158,161]. For example, to keep track of student data, one should use the StdRegNo, StdName, and StdAddress fields. When working with relational databases, the first thing that needs to occur is the construction of a schema that satisfies all the necessary domain and integrity requirements.
After that, one can save the relevant student data while complying with the necessary constraints. Now consider extending the scope of the current database by incorporating two new columns. Those responsible for modifying the existing schema will also be responsible for migrating data from the previous schema to the new one. When dealing with large data sets, this procedure can become impractically slow and drawn out. The most significant challenge for agile software development is that relational schemas do not automatically accommodate new changes [25]. When working with NoSQL databases, a predetermined schema is not required in order to make any kind of change. Table A4 of Appendix A lists the databases used in the selected research.
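The contrast above can be sketched with Python's built-in sqlite3 standing in for a relational database and a plain list of dictionaries standing in for a schema-less collection; the student fields (including the two added columns) are the hypothetical names used in the running example.

```python
import sqlite3

# Relational side: a fixed schema must exist before data can be stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student "
             "(StdRegNo TEXT PRIMARY KEY, StdName TEXT, StdAddress TEXT)")
conn.execute("INSERT INTO student VALUES ('R1', 'Alice', 'Lahore')")

# Extending the schema with two new columns requires an explicit migration,
# and every existing row must then be backfilled with the new attributes.
conn.execute("ALTER TABLE student ADD COLUMN StdEmail TEXT")
conn.execute("ALTER TABLE student ADD COLUMN StdPhone TEXT")
row = conn.execute("SELECT StdEmail FROM student").fetchone()
print(row)  # (None,) -- existing rows hold NULL until backfilled

# Schema-less side: old and new documents simply coexist, no migration step.
collection = [{"StdRegNo": "R1", "StdName": "Alice", "StdAddress": "Lahore"}]
collection.append({"StdRegNo": "R2", "StdName": "Bob",
                   "StdAddress": "Karachi", "StdEmail": "bob@example.com"})
```

On a table with billions of rows, the ALTER-and-backfill step is exactly the slow migration the text describes, whereas the document collection absorbs the new fields immediately.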
Data normalization to control anomalies, relational schemas (attributes), domain constraints, check constraints, unique constraints, and NOT NULL constraints, among other things, all aid safe data integration in relational databases [24,26,37], as opposed to NoSQL databases [24,26,37]. Relational databases also provide well-established security and authentication systems for users, and they surpass NoSQL databases in terms of both performance and durability. SQL is the standard interface for all relational databases, whereas NoSQL databases do not use SQL; each uses its own interface.
SQL and NoSQL databases follow a wide variety of models [60,68,126]. In addition, the underlying data model differs for each NoSQL database type [72,76]. Given the prevalence of multiple CSPs, it is difficult to accomplish data portability [134,138]. The rapid expansion of commercial databases presents another difficulty for cloud storage: AWS DBaaS [1], for instance, offers flexible storage expansion options, whereas those for Azure cloud databases are somewhat restricted. Product categories, architectures, database types, and languages are outlined in Table 7.

Research Gap
Achieving significant levels of both availability and scalability requires the implementation of complicated distribution systems. Sharding and partitioning take place at the foundational layers of the software architecture: the application tier, the caching tier, and the back-end storage tier. Achieving great scalability can be challenging, however, because of the atomic abstraction presented by a typical framework that uses SQL as the gold-standard non-procedural language. In addition, the software needs to be intelligent enough to resolve the replica inconsistencies brought about by collisions between concurrent changes to replicas. With regard to quality criteria such as scalability, consistency, performance, and durability, each variety of NoSQL database has its own unique set of drawbacks and limitations. The architect is therefore fundamentally required to research the characteristics of each candidate database against the imposed requirements when choosing the appropriate one. The gaps of each flavor of NoSQL database are described in Table 8. If an application simply needs to store and retrieve data items that are opaque to the DBMS and can be identified by a key, a key-value store may be the best option. A key-value store, however, cannot serve a query based on the value of any attribute other than the key, nor can a single field of a stored value be updated or retrieved in isolation.
When applications require granular control over which records to obtain, which fields within a record to change, and which fields to retrieve based on criteria other than the primary key, document databases are an excellent option. Document data stores provide more query flexibility than key-value stores.
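The query-flexibility gap between the two models can be made concrete with in-memory stand-ins (plain Python structures, not a real DBMS API; record contents are illustrative).

```python
import json

# Key-value store: the value is an opaque blob; retrieval is by key only.
kv_store = {"R001": json.dumps({"name": "Alice", "city": "Lahore"})}
value = json.loads(kv_store["R001"])  # whole value, fetched by key

# Document store: every field is visible, so queries can filter and project.
doc_store = [
    {"_id": "R001", "name": "Alice", "city": "Lahore"},
    {"_id": "R002", "name": "Bob",   "city": "Karachi"},
]

def find(collection, criteria, fields):
    """Document-style query: filter on any attribute, return only chosen fields."""
    return [{f: d[f] for f in fields if f in d}
            for d in collection
            if all(d.get(k) == v for k, v in criteria.items())]

result = find(doc_store, {"city": "Lahore"}, ["name"])  # query on a non-key field
```

Querying `kv_store` by city would require fetching and decoding every value, whereas the document query filters on `city` and returns only the `name` field, mirroring MongoDB's filter-and-projection style.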
When applications need to store data with hundreds or thousands of fields but need to access only a subset of these fields in most queries, column-family data stores are an efficient solution. Such data repositories are well suited for massive data sets.
Graph databases are ideally suited for use cases that include storing and analyzing data on entities with complex relationships to one another. In a graph database, entities and their connections are treated with equal importance.
The term "big data" [88,158,162] is frequently used to refer to data obtained from largescale sensor networks or user-generated information from social networks [88,158,162]. However, because many people do not have the necessary skills, they do not know how to handle data to accomplish a desired outcome for their businesses. This is because a significant number of the technologies, techniques, and disciplines concerning big data are relatively recent developments. Relational data stores, othe other hand, are unable to process such a large data collection, which necessitates the creation of new storage system capabilities. Data stored in a NoSQL database do not require a centralized schema. Because of their greater scalability, data availability, fault tolerance, and the rapid processing of massive amounts of unstructured data, NoSQL databases are suitable for handling big data. There are a lot of questions that have not been solved yet because of the differences between relational databases and NoSQL databases.
The paper concludes that NoSQL is not a wholesale substitute for relational databases, although it is an excellent option for heterogeneous big data. Each of these fields has numerous unfilled research voids. The simplicity, scalability, and performance of NoSQL database designs are areas with significant room for study. Unlike the relational database's strict tabular schema, NoSQL databases use flat-file or key/value data models. NoSQL databases, owing to their horizontal scalability, are well suited to the vast quantities and varieties of data currently in use; SQL databases, in contrast, scale vertically. When comparing NoSQL with relational databases, scalability and performance are the two most important factors. How to integrate the features [49] of relational databases with those of non-relational databases also remains an open research question, as does the problem of simple schema design for NoSQL databases [163–165].
The development of flexible data migration frameworks [50,54,62,65,76] from a relational database to a NoSQL data store is another area of research that has received increased interest in recent years. Much is at stake in the success of SQL-to-NoSQL data transfer: every company needs to extract useful insights from its own data, such as system resource use and employee performance reviews.
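A hedged sketch of one migration step such frameworks automate: reading rows from a relational table and emitting JSON-like documents of the shape a MongoDB `insert_many` would accept. The table and column names here are illustrative, not taken from any specific framework cited in [50,54,62,65,76].

```python
# SQL-to-NoSQL migration sketch: relational rows -> schema-less documents.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, cpu_hours REAL);
    INSERT INTO employee VALUES (1, 'Alice', 12.5), (2, 'Bob', 7.0);
""")

def table_to_documents(conn, table):
    """Convert each row of a relational table into a document (dict)."""
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [c[0] for c in cur.description]   # column names become field names
    return [dict(zip(cols, row)) for row in cur]

docs = table_to_documents(conn, "employee")
# docs could now be handed to, e.g., pymongo's collection.insert_many(docs)
```

Real migration frameworks must additionally map foreign keys to embedded or referenced documents and preserve constraints, which is precisely where the open research lies.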
Additionally, NoSQL is superior to relational databases on the cloud because of the cloud platform's distributed nature. To reduce cloud consumers' effort when moving data across cloud platforms, and to control data interoperability and portability concerns, the state of the art requires unified API frameworks [134,135,138].

Prediction and Occurrences of DBMSs against a Particular DBMS
Our data set-up is based on Table A4. First, we created (x, y) pairs; for a line listing more than two database names, we used a combination formula to generate every (x, y) pair, constructing 301 pairs in total. For pre-processing, we used a label encoder to convert x and y from strings to numbers, and we trained a Gaussian Naïve Bayes model on the encoded data. Once the model was trained, we obtained the unique database names from x, encoded them, and tested each one individually. We then obtained the probability of every class for each encoded input and represented the result as each unique encoded x against the probabilities of the remaining databases. The resulting table is therefore n by n, where n is the number of unique databases.
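The pair-construction step above can be sketched as follows. The rows here are illustrative, not the actual Table A4 data, and a dependency-free co-occurrence estimate of P(y | x) stands in for the study's label-encoded Gaussian Naïve Bayes so the sketch stays self-contained.

```python
# Build (x, y) pairs from lines listing multiple databases, then an
# n-by-n conditional-probability table (co-occurrence stand-in for GaussianNB).
from itertools import permutations
from collections import Counter, defaultdict

rows = [                      # illustrative rows, one per study
    ["MongoDB", "MySQL"],
    ["MongoDB", "Cassandra", "HBase"],
    ["MySQL", "Oracle"],
]

# 1. Every ordered pair from each row (combinations expanded in both directions).
pairs = [p for row in rows for p in permutations(row, 2)]

# 2. Estimate P(y | x) from co-occurrence counts -> n-by-n probability table.
counts = defaultdict(Counter)
for x, y in pairs:
    counts[x][y] += 1
prob_table = {x: {y: c / sum(ys.values()) for y, c in ys.items()}
              for x, ys in counts.items()}
```

With the real 301-pair dataset, a row of `prob_table` plays the same role as the per-class probabilities read off the trained Naïve Bayes model, and the MongoDB-dominance bias discussed below shows up directly as inflated P(MongoDB | x) entries.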
Analysis of the collected data: We trained the Gaussian Naïve Bayes model on the collected data. The main issue, however, is that it predicted MongoDB for every database name except MongoDB itself. MongoDB occurrences in the label set (y) far outnumber the others: Figure A1 and Table A6 of Appendix A show that MongoDB accounts for more than 22% of all data. The model is therefore data-biased. To address this data imbalance, we generated additional data.
Analysis of the collected and generated data: On the basis of the analysis presented in Figure A1, we concluded that the databases MongoDB, Cassandra, MySQL, and HBase dominate the collected dataset. We therefore adopted the strategy of excluding these database names when creating combinations of the other database names. This decreases the share of the dominant names and increases that of the others, balancing the dataset. As Figure A2 and Table A7 of Appendix A show, the dataset now appears balanced, and the results differ from those obtained on the collected dataset.

Conclusions
The findings indicate that there is no need to transition from relational databases to NoSQL databases. As both types have advantages and disadvantages, a company can choose the DBMS that best meets its requirements. An organization that prioritizes data standardization and consistency can use an SQL database; NoSQL should be used when a business has a large quantity of unstructured data and data availability is a high priority. A relational database may be preferred over a NoSQL database for aggregating small datasets, and vice versa for big data analytics. MapReduce, owing to its distributed nature, is best suited for use in clusters: although it is slow during aggregation, it is more effective in parallel computing and was developed to handle massive amounts of unstructured data. NoSQL databases, because of their distributed and highly scalable nature, are well suited for applications that generate voluminous data with diverse features. When geospatial data are involved, the scalability of relational databases is superior to that of NoSQL databases, yet NoSQL databases have a faster data response time, particularly when handling huge amounts of geospatial data.
NoSQL data stores offer an alternative to conventional RDBMS, but it is probable that organizations may not immediately decide which one to use. The optimal strategy for choosing the proper NoSQL database is to determine what the application requires that a relational database management system cannot deliver. If a Relational Database Management System (RDBMS) can effectively manage the data, a NoSQL storage system may not be required.
Moreover, it is generally agreed that NoSQL databases are a newer development in the database space, although they are built on generally recognized theories. Despite their benefits, NoSQL-based systems are not without flaws, such as the lack of widely accepted standards or of a well-known query language for NoSQL databases. Each database has its own characteristics and ways of working, and these systems are still new and rapidly evolving. Because NoSQL databases do not offer strict ACID guarantees, there is no assurance that all data will be written to the data store correctly.
Furthermore, NoSQL databases make rapid development simple because of their dynamic/flexible schema. Different from the rigid structure of relational databases, the models and architectures of NoSQL databases are more adaptable.
Because NoSQL databases use different storage models, changing from one model to another is challenging for developers. Different CSPs likewise use incompatible protocols and interfaces; since each CSP has developed its own APIs for its specific services, sharing data between them is difficult. To manage data portability and interoperability effectively, NoSQL databases require a single, standardized cloud solution.
Future directions for work on structured data include adopting a denormalized strategy for the SQL RDBMS and comparing the outcomes of inserting, updating, and retrieving data in MongoDB with those of the other system. Alternatively, the performance of MongoDB and an SQL RDBMS can be compared in a specific scenario by retaining the normalized approach and then evaluating the outcomes of data insertion, update, and retrieval. The parallel geospatial approaches used by NoSQL databases demand more attention if they are to serve large numbers of users efficiently. Computer vision [166], object detection [167], signal classification [168], and various other deep learning applications [169] can benefit from the expansion of big data [170] methodologies.

Conflicts of Interest:
The authors declare no conflict of interest.